Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 10000 |
Missing cells | 2986 |
Missing cells (%) | 4.3% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 664.1 KiB |
Average record size in memory | 68.0 B |
Variable types
Text | 2 |
---|---|
Numeric | 4 |
Categorical | 1 |
Dataset
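Description | Chemical formula information from the Ceramic Materials Information Bank (세라믹소재정보은행) of the Korea Institute of Ceramic Engineering and Technology (한국세라믹기술원). |
---|---|
Author | Korea Institute of Ceramic Engineering and Technology (한국세라믹기술원) |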
URL | https://www.data.go.kr/data/15072098/fileData.do |
원소순번 is highly overall correlated with 몰수 | High correlation |
몰수 is highly overall correlated with 원소순번 | High correlation |
상의몰계수 has 746 (7.5%) missing values | Missing |
구성원소 has 1086 (10.9%) missing values | Missing |
몰수 has 1154 (11.5%) missing values | Missing |
몰수 is highly skewed (γ1 = 87.66106099) | Skewed |
상의몰계수 has 130 (1.3%) zeros | Zeros |
원소순번 has 429 (4.3%) zeros | Zeros |
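The percentages in the overview and alert lines above follow directly from the raw counts. The sketch below recomputes them as a cross-check. The one-decimal formatting is an assumption about how the report rounds, and `pct` is a hypothetical helper, not part of ydata-profiling.

```python
# Counts copied from the "Dataset statistics" table and the alert lines above.
n_rows = 10_000
n_cols = 7
missing_by_column = {"상의몰계수": 746, "구성원소": 1086, "몰수": 1154}
zeros_by_column = {"상의몰계수": 130, "원소순번": 429}
total_size_kib = 664.1


def pct(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal (assumed rounding)."""
    return f"{100 * count / total:.1f}%"


# Missing cells are counted over all cells (rows x columns).
total_missing = sum(missing_by_column.values())
missing_cells_pct = pct(total_missing, n_rows * n_cols)

# Per-column shares are counted over rows only.
missing_pct = {col: pct(n, n_rows) for col, n in missing_by_column.items()}
zero_pct = {col: pct(n, n_rows) for col, n in zeros_by_column.items()}

# Average record size in bytes: total size (KiB -> bytes) divided by row count.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(total_missing, missing_cells_pct)
print(missing_pct)
print(zero_pct)
print(f"{avg_record_bytes:.1f} B")
```

The three per-column missing counts sum to the reported 2986 missing cells, so no other column contributes missing values.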
Reproduction
Analysis started | 2023-12-12 22:17:46.093644 |
---|---|
Analysis finished | 2023-12-12 22:17:48.814448 |
Duration | 2.72 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
소재시퀀스
Text
Distinct | 7705 |
---|---|
Distinct (%) | 77.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Value | Count | Frequency (%) |
m112311 | 6 | 0.1% |
m102986 | 5 | < 0.1% |
m119477 | 5 | < 0.1% |
m111819 | 5 | < 0.1% |
m116905 | 5 | < 0.1% |
m116008 | 4 | < 0.1% |
m111989 | 4 | < 0.1% |
m107664 | 4 | < 0.1% |
m108171 | 4 | < 0.1% |
m119182 | 4 | < 0.1% |
Other values (7695) | 9954 | 99.5% |
Most occurring characters
Value | Count | Frequency (%) |
1 | 19107 | 27.3% |
M | 10000 | 14.3% |
2 | 7134 | 10.2% |
0 | 5886 | 8.4% |
3 | 4328 | 6.2% |
4 | 4190 | 6.0% |
7 | 4099 | 5.9% |
6 | 4089 | 5.8% |
8 | 3948 | 5.6% |
5 | 3618 | 5.2% |
Other values (3) | 3603 | 5.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 60000 | 85.7% |
Uppercase Letter | 10001 | 14.3% |
Lowercase Letter | 1 | < 0.1% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 19107 | 31.8% |
2 | 7134 | 11.9% |
0 | 5886 | 9.8% |
3 | 4328 | 7.2% |
4 | 4190 | 7.0% |
7 | 4099 | 6.8% |
6 | 4089 | 6.8% |
8 | 3948 | 6.6% |
5 | 3618 | 6.0% |
9 | 3601 | 6.0% |
Uppercase Letter
Value | Count | Frequency (%) |
M | 10000 | > 99.9% |
R | 1 | < 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 60000 | 85.7% |
Latin | 10002 | 14.3% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 19107 | 31.8% |
2 | 7134 | 11.9% |
0 | 5886 | 9.8% |
3 | 4328 | 7.2% |
4 | 4190 | 7.0% |
7 | 4099 | 6.8% |
6 | 4089 | 6.8% |
8 | 3948 | 6.6% |
5 | 3618 | 6.0% |
9 | 3601 | 6.0% |
Latin
Value | Count | Frequency (%) |
M | 10000 | > 99.9% |
R | 1 | < 0.1% |
a | 1 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 70002 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 19107 | 27.3% |
M | 10000 | 14.3% |
2 | 7134 | 10.2% |
0 | 5886 | 8.4% |
3 | 4328 | 6.2% |
4 | 4190 | 6.0% |
7 | 4099 | 5.9% |
6 | 4089 | 5.8% |
8 | 3948 | 5.6% |
5 | 3618 | 5.2% |
Other values (3) | 3603 | 5.1% |
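The category breakdowns above (Decimal Number, Uppercase Letter, and so on) are Unicode general categories. As a minimal sketch, the same kind of tally can be produced with Python's standard `unicodedata` module. The `CATEGORY_NAMES` mapping and `category_counts` helper are illustrative names introduced here, and the three identifiers are taken from the sample rows of this report; this is not necessarily how ydata-profiling computes the tables internally.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
# Illustrative subset only, not a complete Unicode category table.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Three 소재시퀀스 values copied from the sample rows of the report.
sample_ids = ["M114217", "M117011", "M102247"]

counts = category_counts(sample_ids)
char_counts = Counter("".join(sample_ids))

print(counts)
print(char_counts.most_common(3))
```

Each identifier contributes one uppercase letter and six digits, which matches the roughly 1:6 ratio of Uppercase Letter to Decimal Number counts in the tables above.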
화학식번호
Real number (ℝ)
Distinct | 9 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.2626 |
Minimum | 0 |
---|---|
Maximum | 8 |
Zeros | 1 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 1 |
Q3 | 1 |
95-th percentile | 3 |
Maximum | 8 |
Range | 8 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 0.65535043 |
---|---|
Coefficient of variation (CV) | 0.51904834 |
Kurtosis | 13.928001 |
Mean | 1.2626 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 3.3086343 |
Sum | 12626 |
Variance | 0.42948419 |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
1 | 8170 | 81.7% |
2 | 1295 | 13.0% |
3 | 359 | 3.6% |
4 | 107 | 1.1% |
5 | 52 | 0.5% |
6 | 12 | 0.1% |
7 | 3 | < 0.1% |
0 | 1 | < 0.1% |
8 | 1 | < 0.1% |
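The descriptive statistics for 화학식번호 can be cross-checked against the value counts above, because the nine distinct values cover all 10000 rows. The sketch below expands the frequency table and recomputes the summary with the standard `statistics` module. It assumes the report uses the sample (n − 1) variance, which is the convention that reproduces the reported figure.

```python
import statistics

# Value -> count, copied from the frequency table above.
value_counts = {1: 8170, 2: 1295, 3: 359, 4: 107, 5: 52, 6: 12, 7: 3, 0: 1, 8: 1}

# Expand the table back into one observation per row.
values = [value for value, count in value_counts.items() for _ in range(count)]

n = len(values)
total = sum(values)
mean = total / n
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation
median = statistics.median(values)

print(n, total, mean)
print(round(variance, 8), round(std, 8), round(cv, 8))
print(median)
```

The median of 1 together with Q1 = Q3 = 1 explains why both the IQR and the MAD are 0: more than three quarters of the rows hold the value 1.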
Minimum 10 values
Value | Count | Frequency (%) |
0 | 1 | < 0.1% |
1 | 8170 | 81.7% |
2 | 1295 | 13.0% |
3 | 359 | 3.6% |
4 | 107 | 1.1% |
5 | 52 | 0.5% |
6 | 12 | 0.1% |
7 | 3 | < 0.1% |
8 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
8 | 1 | < 0.1% |
7 | 3 | < 0.1% |
6 | 12 | 0.1% |
5 | 52 | 0.5% |
4 | 107 | 1.1% |
3 | 359 | 3.6% |
2 | 1295 | 13.0% |
1 | 8170 | 81.7% |
0 | 1 | < 0.1% |
상의몰계수
Real number (ℝ)
Alerts: MISSING, ZEROS
Distinct | 216 |
---|---|
Distinct (%) | 2.3% |
Missing | 746 |
Missing (%) | 7.5% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.6894431 |
Minimum | 0 |
---|---|
Maximum | 100 |
Zeros | 130 |
Zeros (%) | 1.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0.08 |
Q1 | 1 |
median | 1 |
Q3 | 1 |
95-th percentile | 1 |
Maximum | 100 |
Range | 100 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 6.1676447 |
---|---|
Coefficient of variation (CV) | 3.650697 |
Kurtosis | 113.34029 |
Mean | 1.6894431 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 9.9416658 |
Sum | 15634.106 |
Variance | 38.039841 |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
1.0 | 7143 | 71.4% |
0.0 | 130 | 1.3% |
0.5 | 108 | 1.1% |
0.16 | 106 | 1.1% |
0.84 | 103 | 1.0% |
0.33 | 89 | 0.9% |
0.97 | 76 | 0.8% |
0.25 | 50 | 0.5% |
0.05 | 48 | 0.5% |
0.96 | 47 | 0.5% |
Other values (206) | 1354 | 13.5% |
(Missing) | 746 | 7.5% |
Minimum 10 values
Value | Count | Frequency (%) |
0.0 | 130 | 1.3% |
0.001 | 1 | < 0.1% |
0.002 | 7 | 0.1% |
0.003 | 2 | < 0.1% |
0.004 | 10 | 0.1% |
0.005 | 1 | < 0.1% |
0.008 | 4 | < 0.1% |
0.01 | 35 | 0.4% |
0.011 | 1 | < 0.1% |
0.013 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
100.0 | 5 | 0.1% |
96.0 | 1 | < 0.1% |
93.0 | 1 | < 0.1% |
88.2 | 2 | < 0.1% |
87.0 | 1 | < 0.1% |
85.7 | 1 | < 0.1% |
85.0 | 1 | < 0.1% |
84.4 | 1 | < 0.1% |
81.48 | 1 | < 0.1% |
78.26 | 2 | < 0.1% |
에이비엑스코드
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | C |
---|---|
2nd row | B |
3rd row | B |
4th row | B |
5th row | C |
Common Values
Value | Count | Frequency (%) |
A | 4165 | |
B | 3510 | |
C | 2325 |
원소순번
Real number (ℝ)
Alerts: HIGH CORRELATION, ZEROS
Distinct | 16 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.5191 |
Minimum | 0 |
---|---|
Maximum | 15 |
Zeros | 429 |
Zeros (%) | 4.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 1 |
Q3 | 2 |
95-th percentile | 4 |
Maximum | 15 |
Range | 15 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 1.2192554 |
---|---|
Coefficient of variation (CV) | 0.80261698 |
Kurtosis | 21.535803 |
Mean | 1.5191 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 3.6288419 |
Sum | 15191 |
Variance | 1.4865838 |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
1 | 6512 | 65.1% |
2 | 1739 | 17.4% |
3 | 762 | 7.6% |
0 | 429 | 4.3% |
4 | 322 | 3.2% |
5 | 91 | 0.9% |
6 | 49 | 0.5% |
7 | 26 | 0.3% |
8 | 18 | 0.2% |
10 | 16 | 0.2% |
Other values (6) | 36 | 0.4% |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 429 | 4.3% |
1 | 6512 | 65.1% |
2 | 1739 | 17.4% |
3 | 762 | 7.6% |
4 | 322 | 3.2% |
5 | 91 | 0.9% |
6 | 49 | 0.5% |
7 | 26 | 0.3% |
8 | 18 | 0.2% |
9 | 15 | 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
15 | 2 | < 0.1% |
14 | 2 | < 0.1% |
13 | 4 | < 0.1% |
12 | 4 | < 0.1% |
11 | 9 | 0.1% |
10 | 16 | 0.2% |
9 | 15 | 0.1% |
8 | 18 | 0.2% |
7 | 26 | 0.3% |
6 | 49 | 0.5% |
구성원소
Text
Alerts: MISSING
Distinct | 115 |
---|---|
Distinct (%) | 1.3% |
Missing | 1086 |
Missing (%) | 10.9% |
Memory size | 156.2 KiB |
Length
Max length | 11 |
---|---|
Median length | 10 |
Mean length | 10.001907 |
Min length | 2 |
Characters and Unicode
Total characters | 89157 |
---|---|
Distinct characters | 19 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 21 |
---|---|
Unique (%) | 0.2% |
Sample
1st row | C010200003 |
---|---|
2nd row | C000000022 |
3rd row | C000000022 |
4th row | C000000026 |
5th row | C000000022 |
Value | Count | Frequency (%) |
c000000008 | 1843 | 20.7% |
c000000022 | 736 | 8.3% |
c000000041 | 469 | 5.3% |
c000000040 | 360 | 4.0% |
c000000011 | 323 | 3.6% |
c000000030 | 321 | 3.6% |
c000000026 | 309 | 3.5% |
c000000056 | 305 | 3.4% |
c000000082 | 302 | 3.4% |
c000000019 | 266 | 3.0% |
Other values (105) | 3680 | 41.3% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 64653 | 72.5% |
C | 8888 | 10.0% |
2 | 3395 | 3.8% |
1 | 3133 | 3.5% |
8 | 3011 | 3.4% |
3 | 1528 | 1.7% |
4 | 1392 | 1.6% |
6 | 949 | 1.1% |
5 | 875 | 1.0% |
9 | 604 | 0.7% |
Other values (9) | 729 | 0.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 79994 | 89.7% |
Uppercase Letter | 8913 | 10.0% |
Lowercase Letter | 200 | 0.2% |
Open Punctuation | 25 | < 0.1% |
Close Punctuation | 25 | < 0.1% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 64653 | 80.8% |
2 | 3395 | 4.2% |
1 | 3133 | 3.9% |
8 | 3011 | 3.8% |
3 | 1528 | 1.9% |
4 | 1392 | 1.7% |
6 | 949 | 1.2% |
5 | 875 | 1.1% |
9 | 604 | 0.8% |
7 | 454 | 0.6% |
Lowercase Letter
Value | Count | Frequency (%) |
n | 50 | 25.0% |
d | 50 | 25.0% |
e | 50 | 25.0% |
f | 25 | 12.5% |
i | 25 | 12.5% |
Uppercase Letter
Value | Count | Frequency (%) |
C | 8888 | 99.7% |
U | 25 | 0.3% |
Open Punctuation
Value | Count | Frequency (%) |
[ | 25 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
] | 25 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 80044 | 89.8% |
Latin | 9113 | 10.2% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 64653 | 80.8% |
2 | 3395 | 4.2% |
1 | 3133 | 3.9% |
8 | 3011 | 3.8% |
3 | 1528 | 1.9% |
4 | 1392 | 1.7% |
6 | 949 | 1.2% |
5 | 875 | 1.1% |
9 | 604 | 0.8% |
7 | 454 | 0.6% |
Other values (2) | 50 | 0.1% |
Latin
Value | Count | Frequency (%) |
C | 8888 | 97.5% |
n | 50 | 0.5% |
d | 50 | 0.5% |
e | 50 | 0.5% |
U | 25 | 0.3% |
f | 25 | 0.3% |
i | 25 | 0.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 89157 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 64653 | 72.5% |
C | 8888 | 10.0% |
2 | 3395 | 3.8% |
1 | 3133 | 3.5% |
8 | 3011 | 3.4% |
3 | 1528 | 1.7% |
4 | 1392 | 1.6% |
6 | 949 | 1.1% |
5 | 875 | 1.0% |
9 | 604 | 0.7% |
Other values (9) | 729 | 0.8% |
몰수
Real number (ℝ)
Alerts: HIGH CORRELATION, MISSING, SKEWED
Distinct | 906 |
---|---|
Distinct (%) | 10.2% |
Missing | 1154 |
Missing (%) | 11.5% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 3.7400003 |
Minimum | 0 |
---|---|
Maximum | 4725 |
Zeros | 69 |
Zeros (%) | 0.7% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0.02 |
Q1 | 0.4 |
median | 1 |
Q3 | 2 |
95-th percentile | 8 |
Maximum | 4725 |
Range | 4725 |
Interquartile range (IQR) | 1.6 |
Descriptive statistics
Standard deviation | 51.407837 |
---|---|
Coefficient of variation (CV) | 13.74541 |
Kurtosis | 8045.5325 |
Mean | 3.7400003 |
Median Absolute Deviation (MAD) | 0.8 |
Skewness | 87.661061 |
Sum | 33084.042 |
Variance | 2642.7657 |
Monotonicity | Not monotonic |
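The summary figures for 몰수 are related by simple arithmetic, and they show why the column is flagged as skewed. The sketch below derives the IQR and the coefficient of variation from the reported values, then compares the maximum against Tukey's upper fence (Q3 + 1.5 × IQR). The fence is a common rule of thumb brought in here for illustration; the report itself does not use it.

```python
# Summary values copied from the quantile and descriptive statistics above.
q1, q3 = 0.4, 2.0
p95 = 8.0
maximum = 4725.0
mean = 3.7400003
std = 51.407837

iqr = q3 - q1                  # interquartile range
cv = std / mean                # coefficient of variation
upper_fence = q3 + 1.5 * iqr   # Tukey's rule of thumb (illustrative assumption)

# How many fence-widths above the fence the maximum lies.
max_to_fence_ratio = maximum / upper_fence

print(f"IQR = {iqr:.1f}, CV = {cv:.5f}")
print(f"upper fence = {upper_fence:.1f}")
print(f"95th percentile above fence: {p95 > upper_fence}")
print(f"maximum / fence = {max_to_fence_ratio:.0f}")
```

Because even the 95th percentile exceeds the fence, at least 5% of the non-missing values count as outliers under this rule, and the single value 4725 is likely to account for much of the reported skewness and kurtosis.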
Common values
Value | Count | Frequency (%) |
1.0 | 1705 | |
3.0 | 1196 | 12.0% |
2.0 | 545 | 5.5% |
0.5 | 370 | 3.7% |
4.0 | 183 | 1.8% |
0.05 | 163 | 1.6% |
0.2 | 148 | 1.5% |
0.02 | 138 | 1.4% |
0.1 | 119 | 1.2% |
0.48 | 112 | 1.1% |
Other values (896) | 4167 | 41.7% |
(Missing) | 1154 | 11.5% |
Minimum 10 values
Value | Count | Frequency (%) |
0.0 | 69 | 0.7% |
0.0005 | 1 | < 0.1% |
0.001 | 6 | 0.1% |
0.00135 | 1 | < 0.1% |
0.0015 | 1 | < 0.1% |
0.002 | 9 | 0.1% |
0.0025 | 2 | < 0.1% |
0.003 | 5 | 0.1% |
0.004 | 8 | 0.1% |
0.005 | 27 | 0.3% |
Maximum 10 values
Value | Count | Frequency (%) |
4725.0 | 1 | < 0.1% |
123.0 | 1 | < 0.1% |
101.5 | 2 | < 0.1% |
101.2 | 1 | < 0.1% |
100.0 | 2 | < 0.1% |
99.84 | 1 | < 0.1% |
99.06 | 2 | < 0.1% |
99.0 | 2 | < 0.1% |
98.3 | 1 | < 0.1% |
98.25 | 1 | < 0.1% |
Variable | 화학식번호 | 상의몰계수 | 에이비엑스코드 | 원소순번 | 몰수 |
---|---|---|---|---|---|
화학식번호 | 1.000 | 0.261 | 0.107 | 0.136 | 0.000 |
상의몰계수 | 0.261 | 1.000 | 0.054 | 0.000 | 0.000 |
에이비엑스코드 | 0.107 | 0.054 | 1.000 | 0.379 | 0.000 |
원소순번 | 0.136 | 0.000 | 0.379 | 1.000 | 0.000 |
몰수 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
Variable | 화학식번호 | 상의몰계수 | 원소순번 | 몰수 | 에이비엑스코드 |
---|---|---|---|---|---|
화학식번호 | 1.000 | -0.396 | -0.290 | 0.114 | 0.047 |
상의몰계수 | -0.396 | 1.000 | 0.187 | -0.122 | 0.030 |
원소순번 | -0.290 | 0.187 | 1.000 | -0.513 | 0.246 |
몰수 | 0.114 | -0.122 | -0.513 | 1.000 | 0.000 |
에이비엑스코드 | 0.047 | 0.030 | 0.246 | 0.000 | 1.000 |
Index | 소재시퀀스 | 화학식번호 | 상의몰계수 | 에이비엑스코드 | 원소순번 | 구성원소 | 몰수 |
---|---|---|---|---|---|---|---|
63712 | M114217 | 1 | 1.0 | C | 1 | C010200003 | 3.0 |
49367 | M117011 | 1 | 1.0 | B | 1 | C000000022 | 1.0 |
21661 | M102247 | 1 | 1.0 | B | 1 | C000000022 | 0.89 |
69199 | M116090 | 1 | 1.0 | B | 1 | C000000026 | 0.485 |
82138 | M124181 | 1 | 0.5 | C | 1 | <NA> | <NA> |
775 | M101742 | 1 | 1.0 | B | 1 | C000000022 | 0.88 |
49307 | M117057 | 1 | 1.0 | B | 2 | C000000073 | 0.0 |
3112 | M104031 | 1 | 1.0 | B | 2 | C000000041 | 2.0 |
91511 | M122869 | 1 | 1.0 | B | 2 | C010200006 | 0.1 |
56888 | M116210 | 1 | 1.0 | B | 3 | C000000022 | 0.3475 |
Index | 소재시퀀스 | 화학식번호 | 상의몰계수 | 에이비엑스코드 | 원소순번 | 구성원소 | 몰수 |
---|---|---|---|---|---|---|---|
84646 | M122063 | 1 | 1.0 | B | 2 | C000000013 | 0.5 |
24867 | M107886 | 1 | 0.05 | A | 1 | C000000082 | 1.0 |
50798 | M117099 | 1 | 1.0 | C | 1 | C000000008 | 2.07 |
92370 | M122251 | 1 | <NA> | A | 1 | C000000025 | 0.352 |
69765 | M118828 | 1 | 1.0 | B | 1 | C000000013 | 0.5 |
34433 | M113736 | 1 | 1.0 | B | 5 | C000000040 | 0.525 |
2659 | M103932 | 1 | 1.0 | B | 2 | C000000051 | 0.04 |
42184 | M114435 | 1 | 1.0 | B | 3 | C000000022 | 0.43 |
91961 | M121439 | 1 | 1.0 | A | 4 | C000000013 | 0.02 |
88693 | M120597 | 2 | 1.0 | B | 2 | C000000008 | 1.0 |