Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 2986
Missing cells (%): 4.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 664.1 KiB
Average record size in memory: 68.0 B
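
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming every cell counts equally and that 1 KiB is 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 2_986
total_size_kib = 664.1

# Share of missing cells among all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, both values agree with the figures reported above.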

Variable types

Text: 2
Numeric: 4
Categorical: 1

Dataset

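Description: Chemical formula information from the Ceramic Materials Information Bank (세라믹소재정보은행) of the Korea Institute of Ceramic Engineering and Technology (한국세라믹기술원).
Author: Korea Institute of Ceramic Engineering and Technology (한국세라믹기술원)
URL: https://www.data.go.kr/data/15072098/fileData.do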

Alerts

원소순번 is highly overall correlated with 몰수 (High correlation)
몰수 is highly overall correlated with 원소순번 (High correlation)
상의몰계수 has 746 (7.5%) missing values (Missing)
구성원소 has 1086 (10.9%) missing values (Missing)
몰수 has 1154 (11.5%) missing values (Missing)
몰수 is highly skewed (γ1 = 87.66106099) (Skewed)
상의몰계수 has 130 (1.3%) zeros (Zeros)
원소순번 has 429 (4.3%) zeros (Zeros)
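
The missing-value alerts can be cross-checked against the dataset statistics. The sketch below recomputes each percentage from its count and sums the counts; the helper `pct` and the dictionary name are hypothetical, and the row total of 10000 is taken from the overview:

```python
n_observations = 10_000

# Missing-value counts per column, copied from the alerts above.
missing_by_column = {"상의몰계수": 746, "구성원소": 1086, "몰수": 1154}

def pct(count: int, total: int) -> str:
    """Format a count as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"

missing_pct = {name: pct(count, n_observations)
               for name, count in missing_by_column.items()}
total_missing = sum(missing_by_column.values())

for name, share in missing_pct.items():
    print(name, share)
print("total missing cells:", total_missing)
```

The three counts add up to the 2986 missing cells reported in the dataset statistics, consistent with the other four columns having no missing values.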

Reproduction

Analysis started: 2023-12-12 22:17:46.093644
Analysis finished: 2023-12-12 22:17:48.814448
Duration: 2.72 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

소재시퀀스
Text

Distinct: 7705
Distinct (%): 77.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 7
Mean length: 7.0002
Min length: 7

Characters and Unicode

Total characters: 70002
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
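
As an illustration of the character properties mentioned above, the following sketch tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the input strings are the five sample values listed for this variable; this is not necessarily how the report computes its own figures.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

# The five sample values shown for this variable.
sample = ["M114217", "M117011", "M102247", "M116090", "M124181"]
counts = category_counts(sample)

# "Lu" is Uppercase Letter and "Nd" is Decimal Number.
for code, count in counts.most_common():
    print(code, count)
```

The standard library exposes general categories but not scripts or blocks, so those two breakdowns would need a third-party package or a lookup table.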

Unique

Unique: 5809
Unique (%): 58.1%

Sample

1st row: M114217
2nd row: M117011
3rd row: M102247
4th row: M116090
5th row: M124181

Value                Count  Frequency (%)
m112311              6      0.1%
m102986              5      < 0.1%
m119477              5      < 0.1%
m111819              5      < 0.1%
m116905              5      < 0.1%
m116008              4      < 0.1%
m111989              4      < 0.1%
m107664              4      < 0.1%
m108171              4      < 0.1%
m119182              4      < 0.1%
Other values (7695)  9954   99.5%

Most occurring characters

Value             Count  Frequency (%)
1                 19107  27.3%
M                 10000  14.3%
2                 7134   10.2%
0                 5886   8.4%
3                 4328   6.2%
4                 4190   6.0%
7                 4099   5.9%
6                 4089   5.8%
8                 3948   5.6%
5                 3618   5.2%
Other values (3)  3603   5.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    60000  85.7%
Uppercase Letter  10001  14.3%
Lowercase Letter  1      < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      19107  31.8%
2      7134   11.9%
0      5886   9.8%
3      4328   7.2%
4      4190   7.0%
7      4099   6.8%
6      4089   6.8%
8      3948   6.6%
5      3618   6.0%
9      3601   6.0%

Uppercase Letter
Value  Count  Frequency (%)
M      10000  > 99.9%
R      1      < 0.1%

Lowercase Letter
Value  Count  Frequency (%)
a      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  60000  85.7%
Latin   10002  14.3%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      19107  31.8%
2      7134   11.9%
0      5886   9.8%
3      4328   7.2%
4      4190   7.0%
7      4099   6.8%
6      4089   6.8%
8      3948   6.6%
5      3618   6.0%
9      3601   6.0%

Latin
Value  Count  Frequency (%)
M      10000  > 99.9%
R      1      < 0.1%
a      1      < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  70002  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
1                 19107  27.3%
M                 10000  14.3%
2                 7134   10.2%
0                 5886   8.4%
3                 4328   6.2%
4                 4190   6.0%
7                 4099   5.9%
6                 4089   5.8%
8                 3948   5.6%
5                 3618   5.2%
Other values (3)  3603   5.1%

화학식번호
Real number (ℝ)

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2626
Minimum: 0
Maximum: 8
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.65535043
Coefficient of variation (CV): 0.51904834
Kurtosis: 13.928001
Mean: 1.2626
Median Absolute Deviation (MAD): 0
Skewness: 3.3086343
Sum: 12626
Variance: 0.42948419
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)

Common values

Value  Count  Frequency (%)
1      8170   81.7%
2      1295   13.0%
3      359    3.6%
4      107    1.1%
5      52     0.5%
6      12     0.1%
7      3      < 0.1%
0      1      < 0.1%
8      1      < 0.1%

Minimum values

Value  Count  Frequency (%)
0      1      < 0.1%
1      8170   81.7%
2      1295   13.0%
3      359    3.6%
4      107    1.1%
5      52     0.5%
6      12     0.1%
7      3      < 0.1%
8      1      < 0.1%

Maximum values

Value  Count  Frequency (%)
8      1      < 0.1%
7      3      < 0.1%
6      12     0.1%
5      52     0.5%
4      107    1.1%
3      359    3.6%
2      1295   13.0%
1      8170   81.7%
0      1      < 0.1%
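
Because 화학식번호 takes only nine distinct values, several of its descriptive statistics can be recomputed from the value counts alone. The sketch below does this with the standard library; the use of the sample variance (n - 1 in the denominator) is an assumption, chosen because it reproduces the reported figure.

```python
import math

# Value counts for 화학식번호, copied from the frequency table above.
value_counts = {0: 1, 1: 8170, 2: 1295, 3: 359, 4: 107, 5: 52, 6: 12, 7: 3, 8: 1}

n = sum(value_counts.values())
total = sum(v * c for v, c in value_counts.items())
mean = total / n

# Sample variance (n - 1 denominator); an assumption that matches the report.
variance = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(f"n={n} sum={total} mean={mean:.4f}")
print(f"variance={variance:.8f} std={std:.8f} cv={cv:.8f}")
```

The same approach does not carry over to columns such as 몰수, whose frequency tables fold most values into an "Other values" row.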

상의몰계수
Real number (ℝ)

MISSING  ZEROS 

Distinct: 216
Distinct (%): 2.3%
Missing: 746
Missing (%): 7.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6894431
Minimum: 0
Maximum: 100
Zeros: 130
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.08
Q1: 1
median: 1
Q3: 1
95-th percentile: 1
Maximum: 100
Range: 100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 6.1676447
Coefficient of variation (CV): 3.650697
Kurtosis: 113.34029
Mean: 1.6894431
Median Absolute Deviation (MAD): 0
Skewness: 9.9416658
Sum: 15634.106
Variance: 38.039841
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1.0                 7143   71.4%
0.0                 130    1.3%
0.5                 108    1.1%
0.16                106    1.1%
0.84                103    1.0%
0.33                89     0.9%
0.97                76     0.8%
0.25                50     0.5%
0.05                48     0.5%
0.96                47     0.5%
Other values (206)  1354   13.5%
(Missing)           746    7.5%

Minimum values

Value  Count  Frequency (%)
0.0    130    1.3%
0.001  1      < 0.1%
0.002  7      0.1%
0.003  2      < 0.1%
0.004  10     0.1%
0.005  1      < 0.1%
0.008  4      < 0.1%
0.01   35     0.4%
0.011  1      < 0.1%
0.013  1      < 0.1%

Maximum values

Value  Count  Frequency (%)
100.0  5      0.1%
96.0   1      < 0.1%
93.0   1      < 0.1%
88.2   2      < 0.1%
87.0   1      < 0.1%
85.7   1      < 0.1%
85.0   1      < 0.1%
84.4   1      < 0.1%
81.48  1      < 0.1%
78.26  2      < 0.1%

에이비엑스코드
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

A: 4165
B: 3510
C: 2325

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: B
3rd row: B
4th row: B
5th row: C

Common Values

Value  Count  Frequency (%)
A      4165   41.6%
B      3510   35.1%
C      2325   23.2%

Length

Histogram of lengths of the category

Common Values (Plot)

(Plot not reproduced.)

Value  Count  Frequency (%)
a      4165   41.6%
b      3510   35.1%
c      2325   23.2%

원소순번
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5191
Minimum: 0
Maximum: 15
Zeros: 429
Zeros (%): 4.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 4
Maximum: 15
Range: 15
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.2192554
Coefficient of variation (CV): 0.80261698
Kurtosis: 21.535803
Mean: 1.5191
Median Absolute Deviation (MAD): 0
Skewness: 3.6288419
Sum: 15191
Variance: 1.4865838
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)

Common values

Value             Count  Frequency (%)
1                 6512   65.1%
2                 1739   17.4%
3                 762    7.6%
0                 429    4.3%
4                 322    3.2%
5                 91     0.9%
6                 49     0.5%
7                 26     0.3%
8                 18     0.2%
10                16     0.2%
Other values (6)  36     0.4%

Minimum values

Value  Count  Frequency (%)
0      429    4.3%
1      6512   65.1%
2      1739   17.4%
3      762    7.6%
4      322    3.2%
5      91     0.9%
6      49     0.5%
7      26     0.3%
8      18     0.2%
9      15     0.1%

Maximum values

Value  Count  Frequency (%)
15     2      < 0.1%
14     2      < 0.1%
13     4      < 0.1%
12     4      < 0.1%
11     9      0.1%
10     16     0.2%
9      15     0.1%
8      18     0.2%
7      26     0.3%
6      49     0.5%

구성원소
Text

MISSING 

Distinct: 115
Distinct (%): 1.3%
Missing: 1086
Missing (%): 10.9%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 10
Mean length: 10.001907
Min length: 2

Characters and Unicode

Total characters: 89157
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 21
Unique (%): 0.2%

Sample

1st row: C010200003
2nd row: C000000022
3rd row: C000000022
4th row: C000000026
5th row: C000000022

Value               Count  Frequency (%)
c000000008          1843   20.7%
c000000022          736    8.3%
c000000041          469    5.3%
c000000040          360    4.0%
c000000011          323    3.6%
c000000030          321    3.6%
c000000026          309    3.5%
c000000056          305    3.4%
c000000082          302    3.4%
c000000019          266    3.0%
Other values (105)  3680   41.3%

Most occurring characters

Value             Count  Frequency (%)
0                 64653  72.5%
C                 8888   10.0%
2                 3395   3.8%
1                 3133   3.5%
8                 3011   3.4%
3                 1528   1.7%
4                 1392   1.6%
6                 949    1.1%
5                 875    1.0%
9                 604    0.7%
Other values (9)  729    0.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     79994  89.7%
Uppercase Letter   8913   10.0%
Lowercase Letter   200    0.2%
Open Punctuation   25     < 0.1%
Close Punctuation  25     < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      64653  80.8%
2      3395   4.2%
1      3133   3.9%
8      3011   3.8%
3      1528   1.9%
4      1392   1.7%
6      949    1.2%
5      875    1.1%
9      604    0.8%
7      454    0.6%

Lowercase Letter
Value  Count  Frequency (%)
n      50     25.0%
d      50     25.0%
e      50     25.0%
f      25     12.5%
i      25     12.5%

Uppercase Letter
Value  Count  Frequency (%)
C      8888   99.7%
U      25     0.3%

Open Punctuation
Value  Count  Frequency (%)
[      25     100.0%

Close Punctuation
Value  Count  Frequency (%)
]      25     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  80044  89.8%
Latin   9113   10.2%

Most frequent character per script

Common
Value             Count  Frequency (%)
0                 64653  80.8%
2                 3395   4.2%
1                 3133   3.9%
8                 3011   3.8%
3                 1528   1.9%
4                 1392   1.7%
6                 949    1.2%
5                 875    1.1%
9                 604    0.8%
7                 454    0.6%
Other values (2)  50     0.1%

Latin
Value  Count  Frequency (%)
C      8888   97.5%
n      50     0.5%
d      50     0.5%
e      50     0.5%
U      25     0.3%
f      25     0.3%
i      25     0.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  89157  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 64653  72.5%
C                 8888   10.0%
2                 3395   3.8%
1                 3133   3.5%
8                 3011   3.4%
3                 1528   1.7%
4                 1392   1.6%
6                 949    1.1%
5                 875    1.0%
9                 604    0.7%
Other values (9)  729    0.8%

몰수
Real number (ℝ)

HIGH CORRELATION  MISSING  SKEWED 

Distinct: 906
Distinct (%): 10.2%
Missing: 1154
Missing (%): 11.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.7400003
Minimum: 0
Maximum: 4725
Zeros: 69
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.02
Q1: 0.4
median: 1
Q3: 2
95-th percentile: 8
Maximum: 4725
Range: 4725
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 51.407837
Coefficient of variation (CV): 13.74541
Kurtosis: 8045.5325
Mean: 3.7400003
Median Absolute Deviation (MAD): 0.8
Skewness: 87.661061
Sum: 33084.042
Variance: 2642.7657
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1.0                 1705   17.1%
3.0                 1196   12.0%
2.0                 545    5.5%
0.5                 370    3.7%
4.0                 183    1.8%
0.05                163    1.6%
0.2                 148    1.5%
0.02                138    1.4%
0.1                 119    1.2%
0.48                112    1.1%
Other values (896)  4167   41.7%
(Missing)           1154   11.5%

Minimum values

Value    Count  Frequency (%)
0.0      69     0.7%
0.0005   1      < 0.1%
0.001    6      0.1%
0.00135  1      < 0.1%
0.0015   1      < 0.1%
0.002    9      0.1%
0.0025   2      < 0.1%
0.003    5      0.1%
0.004    8      0.1%
0.005    27     0.3%

Maximum values

Value   Count  Frequency (%)
4725.0  1      < 0.1%
123.0   1      < 0.1%
101.5   2      < 0.1%
101.2   1      < 0.1%
100.0   2      < 0.1%
99.84   1      < 0.1%
99.06   2      < 0.1%
99.0    2      < 0.1%
98.3    1      < 0.1%
98.25   1      < 0.1%

Interactions

(16 interaction plots; images not reproduced.)

Correlations

                화학식번호  상의몰계수  에이비엑스코드  원소순번  몰수
화학식번호      1.000       0.261       0.107           0.136     0.000
상의몰계수      0.261       1.000       0.054           0.000     0.000
에이비엑스코드  0.107       0.054       1.000           0.379     0.000
원소순번        0.136       0.000       0.379           1.000     0.000
몰수            0.000       0.000       0.000           0.000     1.000

                화학식번호  상의몰계수  원소순번  몰수    에이비엑스코드
화학식번호      1.000       -0.396      -0.290    0.114   0.047
상의몰계수      -0.396      1.000       0.187     -0.122  0.030
원소순번        -0.290      0.187       1.000     -0.513  0.246
몰수            0.114       -0.122      -0.513    1.000   0.000
에이비엑스코드  0.047       0.030       0.246     0.000   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
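
To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between the missing-value indicators of two columns. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the dataset; the report's own plot may be produced differently.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is fully present or fully missing.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy columns; None marks a missing value.
element = ["C000000022", None, "C000000008", None, "C000000041"]
moles = [1.0, None, 2.07, None, 0.5]
coefficient = [1.0, 1.0, None, 1.0, 1.0]

r_same = nullity_correlation(element, moles)
r_diff = nullity_correlation(element, coefficient)
print(r_same, r_diff)
```

A value near 1 means two columns tend to be missing in the same rows, while a negative value means one tends to be present when the other is missing.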

Sample

First rows

(index)  소재시퀀스  화학식번호  상의몰계수  에이비엑스코드  원소순번  구성원소    몰수
63712    M114217     1           1.0         C               1         C010200003  3.0
49367    M117011     1           1.0         B               1         C000000022  1.0
21661    M102247     1           1.0         B               1         C000000022  0.89
69199    M116090     1           1.0         B               1         C000000026  0.485
82138    M124181     1           0.5         C               1         <NA>        <NA>
775      M101742     1           1.0         B               1         C000000022  0.88
49307    M117057     1           1.0         B               2         C000000073  0.0
3112     M104031     1           1.0         B               2         C000000041  2.0
91511    M122869     1           1.0         B               2         C010200006  0.1
56888    M116210     1           1.0         B               3         C000000022  0.3475

Last rows

(index)  소재시퀀스  화학식번호  상의몰계수  에이비엑스코드  원소순번  구성원소    몰수
84646    M122063     1           1.0         B               2         C000000013  0.5
24867    M107886     1           0.05        A               1         C000000082  1.0
50798    M117099     1           1.0         C               1         C000000008  2.07
92370    M122251     1           <NA>        A               1         C000000025  0.352
69765    M118828     1           1.0         B               1         C000000013  0.5
34433    M113736     1           1.0         B               5         C000000040  0.525
2659     M103932     1           1.0         B               2         C000000051  0.04
42184    M114435     1           1.0         B               3         C000000022  0.43
91961    M121439     1           1.0         A               4         C000000013  0.02
88693    M120597     2           1.0         B               2         C000000008  1.0