Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 3416
Missing cells (%): 5.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 576.2 KiB
Average record size in memory: 59.0 B
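
The derived figures above follow directly from the raw counts. As a sanity check, the short Python sketch below recomputes them from numbers that appear in this report (the per-column missing counts come from the Alerts list); the variable names are illustrative only.

```python
# Figures copied from the report's "Dataset statistics" and "Alerts" sections.
n_variables = 6
n_observations = 10000
missing_per_column = {"원소": 603, "데이터": 2813}  # the only columns reporting missing values
total_size_kib = 576.2

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_variables * n_observations)
avg_record_bytes = total_size_kib * 1024 / n_observations

print(missing_cells)               # 3416
print(round(missing_pct, 1))       # 5.7
print(round(avg_record_bytes, 1))  # 59.0
```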

Variable types

Text: 2
Numeric: 2
Categorical: 2

Dataset

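Description: Material composition information from the Ceramic Materials Information Bank (세라믹소재정보은행) of the Korea Institute of Ceramic Engineering and Technology (한국세라믹기술원).
Author: Korea Institute of Ceramic Engineering and Technology (한국세라믹기술원)
URL: https://www.data.go.kr/data/15072095/fileData.do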

Alerts

원소 has 603 (6.0%) missing values [Missing]
데이터 has 2813 (28.1%) missing values [Missing]
데이터 is highly skewed (γ1 = 84.7761759) [Skewed]
데이터 has 624 (6.2%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 23:33:10.403011
Analysis finished: 2023-12-12 23:33:11.731526
Duration: 1.33 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
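
A report of this kind can in principle be regenerated with the same software. The sketch below is a minimal illustration, not the configuration used for this report: the function name, CSV path, report title, and output file name are all assumptions, and the original run may have used sampling or options recorded only in config.json.

```python
def build_report(csv_path: str, html_path: str = "report.html") -> None:
    """Profile a CSV export of the dataset and write an HTML report."""
    # Imported lazily so the function can be defined without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the download is a plain CSV that pandas can read with defaults.
    df = pd.read_csv(csv_path)
    # The title is illustrative; it is not taken from the original report.
    report = ProfileReport(df, title="Material composition profile")
    report.to_file(html_path)
```

Calling `build_report("composition.csv")` (a hypothetical file name) would write `report.html` to the working directory.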

Variables

소재시퀀스 (material sequence ID)
Text

Distinct: 4639
Distinct (%): 46.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 140000
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
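
To make the category counts concrete, the following sketch classifies the characters of one sample identifier with Python's standard `unicodedata` module. It covers general categories only; the standard library does not expose scripts or blocks, so those parts of the report are not reproduced here.

```python
import unicodedata
from collections import Counter

sample = "MAT-1000008848"  # first sample row shown in the report

# unicodedata.category returns two-letter general-category codes:
# "Lu" = Uppercase Letter, "Nd" = Decimal Number, "Pd" = Dash Punctuation.
category_counts = Counter(unicodedata.category(ch) for ch in sample)

print(len(sample))            # 14
print(dict(category_counts))  # {'Lu': 3, 'Pd': 1, 'Nd': 10}
```

Since every value in this column has length 14 and the same MAT- prefix, multiplying these per-value counts by the 10000 observations gives the category totals listed for this variable.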

Unique

Unique: 1922
Unique (%): 19.2%
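
"Distinct" and "Unique" are easy to confuse. Assuming the usual definitions in this kind of report (distinct = number of different values, unique = number of values that occur exactly once), the toy example below shows the difference. The identifiers are hypothetical and only mimic the column's format.

```python
from collections import Counter

# Hypothetical toy column; not taken from the dataset.
values = ["MAT-1", "MAT-1", "MAT-2", "MAT-3", "MAT-3", "MAT-4"]

counts = Counter(values)
distinct = len(counts)                              # different values present
unique = sum(1 for c in counts.values() if c == 1)  # values that occur exactly once

print(distinct, unique)  # 4 2
```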

Sample

1st row: MAT-1000008848
2nd row: MAT-1000008583
3rd row: MAT-1000006056
4th row: MAT-1000002115
5th row: MAT-1000002186
Value                 Count   Frequency (%)
mat-1000002834           12   0.1%
mat-1000002846           11   0.1%
mat-1000006912           10   0.1%
mat-1000007806           10   0.1%
mat-1000002880           10   0.1%
mat-1000007813           10   0.1%
mat-1000002863            9   0.1%
mat-1000003471            9   0.1%
mat-1000002874            9   0.1%
mat-1000002843            9   0.1%
Other values (4629)    9901   99.0%

Most occurring characters

Value              Count   Frequency (%)
0                  53298   38.1%
1                  12918   9.2%
M                  10000   7.1%
A                  10000   7.1%
T                  10000   7.1%
-                  10000   7.1%
8                   4724   3.4%
2                   4669   3.3%
7                   4634   3.3%
6                   4192   3.0%
Other values (4)   15565   11.1%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number     100000   71.4%
Uppercase Letter    30000   21.4%
Dash Punctuation    10000   7.1%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
0       53298   53.3%
1       12918   12.9%
8        4724   4.7%
2        4669   4.7%
7        4634   4.6%
6        4192   4.2%
3        4058   4.1%
4        3984   4.0%
9        3975   4.0%
5        3548   3.5%

Uppercase Letter
Value   Count   Frequency (%)
M       10000   33.3%
A       10000   33.3%
T       10000   33.3%

Dash Punctuation
Value   Count   Frequency (%)
-       10000   100.0%

Most occurring scripts

Value     Count   Frequency (%)
Common   110000   78.6%
Latin     30000   21.4%

Most frequent character per script

Common
Value   Count   Frequency (%)
0       53298   48.5%
1       12918   11.7%
-       10000   9.1%
8        4724   4.3%
2        4669   4.2%
7        4634   4.2%
6        4192   3.8%
3        4058   3.7%
4        3984   3.6%
9        3975   3.6%

Latin
Value   Count   Frequency (%)
M       10000   33.3%
A       10000   33.3%
T       10000   33.3%

Most occurring blocks

Value    Count   Frequency (%)
ASCII   140000   100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
0                  53298   38.1%
1                  12918   9.2%
M                  10000   7.1%
A                  10000   7.1%
T                  10000   7.1%
-                  10000   7.1%
8                   4724   3.4%
2                   4669   3.3%
7                   4634   3.3%
6                   4192   3.0%
Other values (4)   15565   11.1%

순번 (sequence number)
Real number (ℝ)

Distinct: 50
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.4351
Minimum: 1
Maximum: 51
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 7
95th percentile: 13
Maximum: 51
Range: 50
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 5.1692998
Coefficient of variation (CV): 0.95109561
Kurtosis: 21.77003
Mean: 5.4351
Median Absolute Deviation (MAD): 3
Skewness: 3.7215188
Sum: 54351
Variance: 26.72166
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
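
Several of the statistics reported for 순번 are simple functions of the others. The sketch below recomputes them from the published values as a consistency check; it uses only numbers from this report, and the variable names are illustrative.

```python
# Values copied from the report for the 순번 column.
mean, std = 5.4351, 5.1692998
q1, q3 = 2, 7
minimum, maximum = 1, 51
n = 10000  # the column has no missing values

cv = std / mean                  # coefficient of variation
variance = std ** 2
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum
total = mean * n                 # sum of all values

print(round(cv, 4))        # 0.9511
print(round(variance, 3))  # 26.722
print(iqr, value_range)    # 5 50
print(round(total))        # 54351
```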
Most frequent values
Value               Count   Frequency (%)
1                    2316   23.2%
4                    1838   18.4%
5                    1233   12.3%
7                     788   7.9%
3                     782   7.8%
6                     737   7.4%
9                     474   4.7%
8                     464   4.6%
10                    245   2.5%
2                     212   2.1%
Other values (40)     911   9.1%

10 smallest values
Value   Count   Frequency (%)
1        2316   23.2%
2         212   2.1%
3         782   7.8%
4        1838   18.4%
5        1233   12.3%
6         737   7.4%
7         788   7.9%
8         464   4.6%
9         474   4.7%
10        245   2.5%

10 largest values
Value   Count   Frequency (%)
51          1   < 0.1%
50          2   < 0.1%
49          3   < 0.1%
48          4   < 0.1%
47          6   0.1%
46          6   0.1%
45          5   0.1%
44          4   < 0.1%
43          5   0.1%
42          4   < 0.1%

원소 (element)
Text

Alerts: Missing

Distinct: 564
Distinct (%): 6.0%
Missing: 603
Missing (%): 6.0%
Memory size: 156.2 KiB

Length

Max length: 30
Median length: 28
Mean length: 6.7052251
Min length: 1

Characters and Unicode

Total characters: 63009
Distinct characters: 80
Distinct categories: 12
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 192
Unique (%): 2.0%

Sample

1st row: K2CO3
2nd row: YAG
3rd row: CdO
4th row: YO3/2
5th row: CuO
Value                Count   Frequency (%)
tio2                   928   8.6%
pbo                    499   4.6%
nb2o5                  490   4.5%
zro2                   382   3.5%
baco3                  320   3.0%
bi2o3                  320   3.0%
cuo                    290   2.7%
zno                    264   2.5%
na2co3                 259   2.4%
mgo                    225   2.1%
Other values (561)    6793   63.1%

Most occurring characters

Value               Count   Frequency (%)
0                    8475   13.5%
O                    8360   13.3%
2                    5767   9.2%
3                    4332   6.9%
C                    4070   6.5%
i                    2838   4.5%
1                    2416   3.8%
a                    1893   3.0%
N                    1456   2.3%
T                    1431   2.3%
Other values (70)   21971   34.9%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      23794   37.8%
Uppercase Letter    22340   35.5%
Lowercase Letter    12521   19.9%
Other Punctuation    1877   3.0%
Space Separator      1427   2.3%
Close Punctuation     301   0.5%
Open Punctuation      301   0.5%
Dash Punctuation      243   0.4%
Math Symbol           195   0.3%
Other Letter            8   < 0.1%
Other values (2)        2   < 0.1%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
i                    2838   22.7%
a                    1893   15.1%
b                    1403   11.2%
n                    1046   8.4%
r                    1023   8.2%
l                     676   5.4%
e                     658   5.3%
u                     555   4.4%
g                     474   3.8%
o                     366   2.9%
Other values (17)    1589   12.7%

Uppercase Letter
Value               Count   Frequency (%)
O                    8360   37.4%
C                    4070   18.2%
N                    1456   6.5%
T                    1431   6.4%
B                    1139   5.1%
Z                     874   3.9%
S                     846   3.8%
P                     735   3.3%
M                     724   3.2%
L                     609   2.7%
Other values (14)    2096   9.4%

Decimal Number
Value   Count   Frequency (%)
0        8475   35.6%
2        5767   24.2%
3        4332   18.2%
1        2416   10.2%
5        1406   5.9%
4         750   3.2%
9         209   0.9%
6         168   0.7%
8         143   0.6%
7         128   0.5%

Other Punctuation
Value   Count   Frequency (%)
,        1331   70.9%
.         344   18.3%
/         146   7.8%
%          20   1.1%
·          19   1.0%
:          10   0.5%
"           4   0.2%
*           3   0.2%

Close Punctuation
Value   Count   Frequency (%)
)         292   97.0%
]           9   3.0%

Open Punctuation
Value   Count   Frequency (%)
(         292   97.0%
[           9   3.0%

Math Symbol
Value   Count   Frequency (%)
+         152   77.9%
~          43   22.1%

Space Separator
Value     Count   Frequency (%)
(space)    1427   100.0%

Dash Punctuation
Value   Count   Frequency (%)
-         243   100.0%

Other Letter
Value            Count   Frequency (%)
(not rendered)       8   100.0%

Other Number
Value            Count   Frequency (%)
(not rendered)       1   100.0%

Control
Value            Count   Frequency (%)
(not rendered)       1   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    34836   55.3%
Common   28140   44.7%
Greek       25   < 0.1%
Hangul       8   < 0.1%

Most frequent character per script

Latin
Value               Count   Frequency (%)
O                    8360   24.0%
C                    4070   11.7%
i                    2838   8.1%
a                    1893   5.4%
N                    1456   4.2%
T                    1431   4.1%
b                    1403   4.0%
B                    1139   3.3%
n                    1046   3.0%
r                    1023   2.9%
Other values (39)   10177   29.2%

Common
Value               Count   Frequency (%)
0                    8475   30.1%
2                    5767   20.5%
3                    4332   15.4%
1                    2416   8.6%
(space)              1427   5.1%
5                    1406   5.0%
,                    1331   4.7%
4                     750   2.7%
.                     344   1.2%
)                     292   1.0%
Other values (18)    1600   5.7%

Greek
Value   Count   Frequency (%)
α          22   88.0%
δ           3   12.0%

Hangul
Value            Count   Frequency (%)
(not rendered)       8   100.0%

Most occurring blocks

Value         Count   Frequency (%)
ASCII         62956   99.9%
None             45   0.1%
Compat Jamo       8   < 0.1%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                    8475   13.5%
O                    8360   13.3%
2                    5767   9.2%
3                    4332   6.9%
C                    4070   6.5%
i                    2838   4.5%
1                    2416   3.8%
a                    1893   3.0%
N                    1456   2.3%
T                    1431   2.3%
Other values (65)   21918   34.8%

None
Value            Count   Frequency (%)
α                   22   48.9%
·                   19   42.2%
δ                    3   6.7%
(not rendered)       1   2.2%

Compat Jamo
Value            Count   Frequency (%)
(not rendered)       8   100.0%

데이터 (data)
Real number (ℝ)

Alerts: Missing, Skewed, Zeros

Distinct: 771
Distinct (%): 10.7%
Missing: 2813
Missing (%): 28.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 589167.15
Minimum: 0
Maximum: 4.2342342 × 10^9
Zeros: 624
Zeros (%): 6.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.3
Median: 1
Q3: 10
95th percentile: 96
Maximum: 4.2342342 × 10^9
Range: 4.2342342 × 10^9
Interquartile range (IQR): 9.7

Descriptive statistics

Standard deviation: 49946039
Coefficient of variation (CV): 84.773971
Kurtosis: 7187
Mean: 589167.15
Median Absolute Deviation (MAD): 1
Skewness: 84.776176
Sum: 4.2343443 × 10^9
Variance: 2.4946068 × 10^15
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
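
The mean, standard deviation, and skewness of 데이터 are dominated by one extreme value. The sketch below uses only figures from this report to estimate the mean with and without the maximum. Because the published sum is rounded, the result is approximate. When a single value dominates a sample of n observations, the bias-adjusted sample skewness tends toward √n, which is consistent with the skewness reported here.

```python
import math

# Figures copied from the report for the 데이터 column.
n_valid = 10000 - 2813       # non-missing observations
reported_sum = 4.2343443e9   # rounded in the report, so results are approximate
reported_max = 4234234234.0  # largest value listed for this column

mean_all = reported_sum / n_valid
mean_without_max = (reported_sum - reported_max) / (n_valid - 1)

print(n_valid)                        # 7187
print(round(mean_all))                # 589167
print(round(mean_without_max, 1))     # 15.3
print(round(math.sqrt(n_valid), 4))   # 84.7762
```

Whether the maximum is a data-entry error is not something the report states; the calculation only shows how strongly that single value drives the summary statistics.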
Most frequent values
Value                Count   Frequency (%)
1.0                   1070   10.7%
0.0                    624   6.2%
2.0                    293   2.9%
50.0                   287   2.9%
100.0                  262   2.6%
0.5                    243   2.4%
5.0                    213   2.1%
10.0                   210   2.1%
0.2                    202   2.0%
0.1                    158   1.6%
Other values (761)    3625   36.2%
(Missing)             2813   28.1%

10 smallest values
Value     Count   Frequency (%)
0.0         624   6.2%
3.5e-06       1   < 0.1%
0.0005        1   < 0.1%
0.001         3   < 0.1%
0.0018        1   < 0.1%
0.002         3   < 0.1%
0.0025        6   0.1%
0.004         5   0.1%
0.005        29   0.3%
0.0055        1   < 0.1%

10 largest values
Value          Count   Frequency (%)
4234234234.0       1   < 0.1%
500.0              2   < 0.1%
400.0              1   < 0.1%
200.0              6   0.1%
150.0              1   < 0.1%
100.0            262   2.6%
99.99              2   < 0.1%
99.95              2   < 0.1%
99.91              3   < 0.1%
99.9               6   0.1%

단위 (unit)
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.6322
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: F01002
4th row: F01001
5th row: F01001

Common Values

Value    Count   Frequency (%)
F01002    3113   31.1%
F01001    2605   26.1%
F01003    2297   23.0%
<NA>      1839   18.4%
F01004      74   0.7%
F01006      72   0.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]
Value    Count   Frequency (%)
f01002    3113   31.1%
f01001    2605   26.1%
f01003    2297   23.0%
na        1839   18.4%
f01004      74   0.7%
f01006      72   0.7%

조성구분 (composition class)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1000
2nd row: 1000
3rd row: 3000
4th row: 3000
5th row: 3000

Common Values

Value   Count   Frequency (%)
1000     6822   68.2%
3000     1682   16.8%
7000     1496   15.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Interactions

[Figures: four pairwise interaction plots; images not included]

Correlations

[Figure: correlation heatmap]
            순번      데이터    단위      조성구분
순번        1.000     0.000     0.304     0.389
데이터      0.000     1.000     0.000     0.000
단위        0.304     0.000     1.000     0.457
조성구분    0.389     0.000     0.457     1.000

[Figure: correlation heatmap]
            조성구분  단위
조성구분    1.000     0.389
단위        0.389     1.000

[Figure: correlation heatmap]
            순번      데이터    단위      조성구분
순번        1.000     -0.211    0.133     0.264
데이터      -0.211    1.000     0.000     0.000
단위        0.133     0.000     1.000     0.389
조성구분    0.264     0.000     0.389     1.000

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
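
One common way to compute nullity correlation is to correlate per-column missing-value indicators. The sketch below does this on a hypothetical toy table with plain Python, and it is an illustration of the idea rather than the exact procedure behind the heatmap. The rows are invented; only the column names come from the report.

```python
from math import sqrt

# Hypothetical toy rows (원소, 데이터); None marks a missing cell.
rows = [
    ("TiO2", 1.0),
    ("PbO", None),
    (None, None),
    ("ZrO2", 0.5),
    (None, None),
    ("CuO", 0.1),
]

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# 1 where the cell is missing, 0 where it is present.
null_elem = [int(r[0] is None) for r in rows]
null_data = [int(r[1] is None) for r in rows]

print(sum(null_elem), sum(null_data))           # 2 3
print(round(pearson(null_elem, null_data), 3))  # 0.707
```

A value near 1 means the two columns tend to be missing in the same rows; a value near 0 means their missingness is unrelated.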

Sample

(index)  소재시퀀스       순번  원소                          데이터   단위     조성구분
10759    MAT-1000008848   6     K2CO3                         <NA>     <NA>     1000
15003    MAT-1000008583   5     YAG                           10.0     <NA>     1000
12865    MAT-1000006056   4     CdO                           1.2      F01002   3000
1146     MAT-1000002115   3     YO3/2                         2.0      F01001   3000
1097     MAT-1000002186   9     CuO                           0.1      F01001   3000
15940    MAT-1000006000   9     NiO, WO3, ZrO2, TiO2          50.0     F01001   1000
6565     MAT-1000003106   1     B4C                           98.0     F01002   1000
13177    MAT-1000008173   7     Ta2O5                         <NA>     <NA>     1000
2756     MAT-1000002441   4     Nb2O5                         1.0      F01003   1000
15797    MAT-1000006920   13    PbO                           1.0      F01002   3000

(index)  소재시퀀스       순번  원소                          데이터   단위     조성구분
3059     MAT-1000002626   4     ZrO2                          0.0      F01001   1000
16895    MAT-1000008464   7     SnCl4.5H2O                    0.005    F01001   1000
7777     MAT-1000003836   10    F127 ((EO)106(PO)70(EO)106)   0.02     <NA>     1000
15912    MAT-1000005936   7     NiO, WO3, ZrO2, TiO2          50.0     F01001   1000
6772     MAT-1000003189   3     Eu2+                          2.0      F01001   3000
16354    MAT-1000007327   4     ZrO2                          <NA>     F01003   1000
10222    MAT-1000004508   4     Nb2O5                         1.0      F01003   1000
1702     MAT-1000002273   9     CuO                           0.1      F01001   3000
4000     MAT-1000002768   8     C01010000                     83.77    F01002   7000
5538     MAT-1000003016   4     Y2O3                          5.0      F01002   1000