Overview

Dataset statistics

Number of variables: 7
Number of observations: 9978
Missing cells: 410
Missing cells (%): 0.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 545.8 KiB
Average record size in memory: 56.0 B
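
The two derived figures in this block follow from the raw counts. The sketch below recomputes them as a sanity check; the variable names are illustrative and not part of the report.

```python
# Raw counts copied from the dataset statistics
n_variables = 7
n_observations = 9978
missing_cells = 410
total_size_kib = 545.8

# Missing cells as a share of all cells (rows x columns)
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average record size: total bytes divided by the number of rows
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures reported here (0.6% and 56.0 B).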

Variable types

Text: 3
Categorical: 3
Boolean: 1

Dataset

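Description: Detail items for the exhibits held by the National Science Museum (국립중앙과학관), as listed in the science learning content on its website.
Author: Ministry of Science and ICT (과학기술정보통신부), National Science Museum (국립중앙과학관)
URL: https://www.data.go.kr/data/15067823/fileData.do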

Alerts

공개여부 has constant value "" (Constant)
이름 is highly overall correlated with 전시타입 (High correlation)
전시타입 is highly overall correlated with 이름 and 1 other field (High correlation)
등록자 아이디 is highly overall correlated with 전시타입 (High correlation)
전시타입 is highly imbalanced (99.7%) (Imbalance)
등록자 아이디 is highly imbalanced (67.5%) (Imbalance)
공개여부 has 410 (4.1%) missing values (Missing)
상세 아이디 has unique values (Unique)
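
The imbalance percentages above can be read as one minus the normalized Shannon entropy of the class counts. That reading is an assumption about how the profiler derives the figure, not something the report states, but it does reproduce the 99.7% for 전시타입 from the counts given later in the report (E: 9976, L: 2). A minimal sketch, with `imbalance_score` as a hypothetical helper name:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the counts.

    Assumed definition: 0.0 means perfectly balanced classes,
    values close to 1.0 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Class counts of 전시타입 as reported: E = 9976, L = 2
score = imbalance_score([9976, 2])
print(f"{100 * score:.1f}%")
```

The 67.5% for 등록자 아이디 cannot be checked the same way from this text, because the counts of the three least frequent values are only given as a combined total.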

Reproduction

Analysis started: 2023-12-12 00:54:54.303321
Analysis finished: 2023-12-12 00:54:55.297800
Duration: 0.99 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
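
The reported duration is the difference between the two timestamps. A short check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block
started = datetime.fromisoformat("2023-12-12 00:54:54.303321")
finished = datetime.fromisoformat("2023-12-12 00:54:55.297800")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```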

Variables

상세 아이디 (detail ID)
Text

UNIQUE

Distinct: 9978
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 78.1 KiB

Length

Max length: 6
Median length: 5
Mean length: 5.1871116
Min length: 1

Characters and Unicode

Total characters: 51757
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
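
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to look up the general category of each character in a few values of this column. The sample strings are taken from the value table of this variable; the category codes `Nd` and `Po` correspond to the "Decimal Number" and "Other Punctuation" categories listed in the report.

```python
import unicodedata
from collections import Counter

# A few values of this column, copied from the report
samples = ["14", "7,041", "7,029"]

# Count the Unicode general category of every character
categories = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in categories.most_common():
    print(code, count)
```

Digits fall into `Nd` and the comma into `Po`, which is why the column shows exactly two distinct categories.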

Unique

Unique: 9978
Unique (%): 100.0%

Sample

1st row: 14
2nd row: 15
3rd row: 24
4th row: 25
5th row: 28

Value                 Count  Frequency (%)
14                    1      < 0.1%
7,041                 1      < 0.1%
7,029                 1      < 0.1%
7,030                 1      < 0.1%
7,080                 1      < 0.1%
7,037                 1      < 0.1%
7,038                 1      < 0.1%
7,039                 1      < 0.1%
7,040                 1      < 0.1%
7,042                 1      < 0.1%
Other values (9968)   9968   99.9%

Most occurring characters

Value  Count  Frequency (%)
,      9150   17.7%
1      7518   14.5%
4      4971   9.6%
5      4839   9.3%
3      4588   8.9%
6      3989   7.7%
2      3948   7.6%
8      3336   6.4%
7      3270   6.3%
9      3104   6.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     42607  82.3%
Other Punctuation  9150   17.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      7518   17.6%
4      4971   11.7%
5      4839   11.4%
3      4588   10.8%
6      3989   9.4%
2      3948   9.3%
8      3336   7.8%
7      3270   7.7%
9      3104   7.3%
0      3044   7.1%
Other Punctuation
Value  Count  Frequency (%)
,      9150   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  51757  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
,      9150   17.7%
1      7518   14.5%
4      4971   9.6%
5      4839   9.3%
3      4588   8.9%
6      3989   7.7%
2      3948   7.6%
8      3336   6.4%
7      3270   6.3%
9      3104   6.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  51757  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
,      9150   17.7%
1      7518   14.5%
4      4971   9.6%
5      4839   9.3%
3      4588   8.9%
6      3989   7.7%
2      3948   7.6%
8      3336   6.4%
7      3270   6.3%
9      3104   6.0%
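
The character tables above show that the IDs consist only of digits and commas, which is consistent with numbers formatted with thousands separators (for example `7,041`) and explains why the column is typed as Text. A minimal sketch of converting such values to integers, assuming the commas really are thousands separators; `parse_id` is a hypothetical helper name:

```python
def parse_id(text: str) -> int:
    """Convert an ID such as '7,041' to the integer 7041.

    Assumes commas are thousands separators and carry no other meaning.
    """
    return int(text.replace(",", ""))


# Values copied from the report
ids = [parse_id(value) for value in ["14", "7,041", "17,503"]]
print(ids)
```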

전시품 아이디 (exhibit ID)
Text

Distinct: 2564
Distinct (%): 25.7%
Missing: 0
Missing (%): 0.0%
Memory size: 78.1 KiB

Length

Max length: 5
Median length: 5
Mean length: 4.19092
Min length: 1

Characters and Unicode

Total characters: 41817
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 302
Unique (%): 3.0%

Sample

1st row: 5
2nd row: 5
3rd row: 8
4th row: 8
5th row: 10

Value                 Count  Frequency (%)
1,583                 10     0.1%
1,582                 10     0.1%
1,588                 9      0.1%
1,587                 9      0.1%
428                   8      0.1%
429                   8      0.1%
1,497                 7      0.1%
588                   7      0.1%
1,507                 7      0.1%
1,493                 7      0.1%
Other values (2554)   9896   99.2%

Most occurring characters

Value  Count  Frequency (%)
1      6848   16.4%
,      6059   14.5%
2      4101   9.8%
4      3736   8.9%
3      3549   8.5%
5      3435   8.2%
8      3046   7.3%
9      2841   6.8%
7      2815   6.7%
0      2744   6.6%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     35758  85.5%
Other Punctuation  6059   14.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      6848   19.2%
2      4101   11.5%
4      3736   10.4%
3      3549   9.9%
5      3435   9.6%
8      3046   8.5%
9      2841   7.9%
7      2815   7.9%
0      2744   7.7%
6      2643   7.4%
Other Punctuation
Value  Count  Frequency (%)
,      6059   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  41817  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      6848   16.4%
,      6059   14.5%
2      4101   9.8%
4      3736   8.9%
3      3549   8.5%
5      3435   8.2%
8      3046   7.3%
9      2841   6.8%
7      2815   6.7%
0      2744   6.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  41817  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      6848   16.4%
,      6059   14.5%
2      4101   9.8%
4      3736   8.9%
3      3549   8.5%
5      3435   8.2%
8      3046   7.3%
9      2841   6.8%
7      2815   6.7%
0      2744   6.6%

전시타입 (exhibit type)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.1 KiB

E: 9976
L: 2

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: E
2nd row: E
3rd row: E
4th row: E
5th row: E

Common Values

Value  Count  Frequency (%)
E      9976   > 99.9%
L      2      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
e      9976   > 99.9%
l      2      < 0.1%

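이름 (attribute name)
Categorical

HIGH CORRELATION

Distinct: 42
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 78.1 KiB

크기 (size): 1338
국적 (country of origin): 1177
제조연대 (year of manufacture): 1127
학명 (scientific name): 932
제조사 (manufacturer): 927
Other values (37): 4477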

Length

Max length: 7
Median length: 2
Mean length: 2.6309882
Min length: 1

Unique

Unique: 20
Unique (%): 0.2%

Sample

1st row: 국적
2nd row: 재질
3rd row: 국적
4th row: 재질
5th row: 국적

Common Values

Value              Count  Frequency (%)
크기               1338   13.4%
국적               1177   11.8%
제조연대           1127   11.3%
학명               932    9.3%
제조사             927    9.3%
재질               855    8.6%
용도/기능          733    7.3%
과명               681    6.8%
목명               653    6.5%
영명               579    5.8%
Other values (32)  976    9.8%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
크기               1338   13.4%
국적               1177   11.8%
제조연대           1127   11.3%
학명               932    9.3%
제조사             927    9.3%
재질               855    8.6%
용도/기능          733    7.3%
과명               681    6.8%
목명               653    6.5%
영명               579    5.8%
Other values (31)  976    9.8%

항목값 (item value)
Text

Distinct: 3799
Distinct (%): 38.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.1 KiB

Length

Max length: 111
Median length: 100
Mean length: 8.4368611
Min length: 1

Characters and Unicode

Total characters: 84183
Distinct characters: 782
Distinct categories: 12
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3193
Unique (%): 32.0%

Sample

1st row: 한국
2nd row: 기타
3rd row: 한국
4th row: 기타
5th row: 이집트

Value                 Count  Frequency (%)
한국                  976    6.2%
2011년                206    1.3%
141230                203    1.3%
아미랜드              171    1.1%
㈜전시과학            148    0.9%
(not preserved)       146    0.9%
콩과                  141    0.9%
콩목                  141    0.9%
2009년                138    0.9%
0.2kg                 121    0.8%
Other values (6097)   13467  84.9%

Most occurring characters

Value               Count  Frequency (%)
(space)             6226   7.4%
0                   4099   4.9%
1                   2758   3.3%
2                   2737   3.3%
a                   2587   3.1%
e                   2127   2.5%
i                   1918   2.3%
s                   1747   2.1%
n                   1530   1.8%
o                   1510   1.8%
Other values (772)  56944  67.6%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       31548  37.5%
Lowercase Letter   23023  27.3%
Decimal Number     14785  17.6%
Space Separator    6226   7.4%
Uppercase Letter   2313   2.7%
Other Symbol       2105   2.5%
Other Punctuation  2049   2.4%
Math Symbol        1457   1.7%
Close Punctuation  299    0.4%
Open Punctuation   298    0.4%
Other values (2)   80     0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(not preserved)     1299   4.1%
(not preserved)     1093   3.5%
(not preserved)     1072   3.4%
(not preserved)     884    2.8%
(not preserved)     676    2.1%
(not preserved)     625    2.0%
(not preserved)     606    1.9%
(not preserved)     519    1.6%
(not preserved)     513    1.6%
(not preserved)     488    1.5%
Other values (686)  23773  75.4%
Uppercase Letter
Value              Count  Frequency (%)
M                  201    8.7%
C                  199    8.6%
G                  197    8.5%
T                  185    8.0%
S                  181    7.8%
A                  178    7.7%
P                  166    7.2%
D                  130    5.6%
B                  114    4.9%
L                  102    4.4%
Other values (17)  660    28.5%
Lowercase Letter
Value              Count  Frequency (%)
a                  2587   11.2%
e                  2127   9.2%
i                  1918   8.3%
s                  1747   7.6%
n                  1530   6.6%
o                  1510   6.6%
r                  1466   6.4%
l                  1220   5.3%
m                  1188   5.2%
t                  1164   5.1%
Other values (16)  6566   28.5%
Decimal Number
Value  Count  Frequency (%)
0      4099   27.7%
1      2758   18.7%
2      2737   18.5%
3      1073   7.3%
4      983    6.6%
5      873    5.9%
8      650    4.4%
9      634    4.3%
6      495    3.3%
7      483    3.3%
Other Punctuation
Value  Count  Frequency (%)
,      1055   51.5%
.      659    32.2%
/      160    7.8%
*      127    6.2%
:      21     1.0%
·      14     0.7%
&      7      0.3%
'      6      0.3%
Math Symbol
Value  Count  Frequency (%)
×      1072   73.6%
~      327    22.4%
+      28     1.9%
>      22     1.5%
=      5      0.3%
<      3      0.2%
Other Symbol
Value            Count  Frequency (%)
(not preserved)  1068   50.7%
(not preserved)  761    36.2%
(not preserved)  275    13.1%
(not preserved)  1      < 0.1%
Space Separator
Value    Count  Frequency (%)
(space)  6226   100.0%
Close Punctuation
Value  Count  Frequency (%)
)      299    100.0%
Open Punctuation
Value  Count  Frequency (%)
(      298    100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      77     100.0%
Connector Punctuation
Value  Count  Frequency (%)
_      3      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  31823  37.8%
Common  27024  32.1%
Latin   25334  30.1%
Greek   2      < 0.1%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(not preserved)     1299   4.1%
(not preserved)     1093   3.4%
(not preserved)     1072   3.4%
(not preserved)     884    2.8%
(not preserved)     676    2.1%
(not preserved)     625    2.0%
(not preserved)     606    1.9%
(not preserved)     519    1.6%
(not preserved)     513    1.6%
(not preserved)     488    1.5%
Other values (687)  24048  75.6%
Latin
Value              Count  Frequency (%)
a                  2587   10.2%
e                  2127   8.4%
i                  1918   7.6%
s                  1747   6.9%
n                  1530   6.0%
o                  1510   6.0%
r                  1466   5.8%
l                  1220   4.8%
m                  1188   4.7%
t                  1164   4.6%
Other values (42)  8877   35.0%
Common
Value              Count  Frequency (%)
(space)            6226   23.0%
0                  4099   15.2%
1                  2758   10.2%
2                  2737   10.1%
3                  1073   4.0%
×                  1072   4.0%
(not preserved)    1068   4.0%
,                  1055   3.9%
4                  983    3.6%
5                  873    3.2%
Other values (22)  5080   18.8%
Greek
Value  Count  Frequency (%)
Φ      2      100.0%

Most occurring blocks

Value        Count  Frequency (%)
ASCII        49442  58.7%
Hangul       31542  37.5%
CJK Compat   1830   2.2%
None         1363   1.6%
Compat Jamo  6      < 0.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            6226   12.6%
0                  4099   8.3%
1                  2758   5.6%
2                  2737   5.5%
a                  2587   5.2%
e                  2127   4.3%
i                  1918   3.9%
s                  1747   3.5%
n                  1530   3.1%
o                  1510   3.1%
Other values (69)  22203  44.9%
Hangul
Value               Count  Frequency (%)
(not preserved)     1299   4.1%
(not preserved)     1093   3.5%
(not preserved)     1072   3.4%
(not preserved)     884    2.8%
(not preserved)     676    2.1%
(not preserved)     625    2.0%
(not preserved)     606    1.9%
(not preserved)     519    1.6%
(not preserved)     513    1.6%
(not preserved)     488    1.5%
Other values (682)  23767  75.4%
None
Value            Count  Frequency (%)
×                1072   78.7%
(not preserved)  275    20.2%
·                14     1.0%
Φ                2      0.1%
CJK Compat
Value            Count  Frequency (%)
(not preserved)  1068   58.4%
(not preserved)  761    41.6%
(not preserved)  1      0.1%
Compat Jamo
Value            Count  Frequency (%)
(not preserved)  3      50.0%
(not preserved)  1      16.7%
(not preserved)  1      16.7%
(not preserved)  1      16.7%

공개여부 (public status)
Boolean

CONSTANT  MISSING

Distinct: 1
Distinct (%): < 0.1%
Missing: 410
Missing (%): 4.1%
Memory size: 19.6 KiB

True: 9568
(Missing): 410

Value      Count  Frequency (%)
True       9568   95.9%
(Missing)  410    4.1%

등록자 아이디 (registrant ID)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.1 KiB

superadmin: 6261
scicenter: 3366
jnse: 156
child: 75
gisegen: 49
Other values (8): 71

Length

Max length: 11
Median length: 10
Mean length: 9.5026057
Min length: 4

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: superadmin
2nd row: superadmin
3rd row: superadmin
4th row: superadmin
5th row: superadmin

Common Values

Value             Count  Frequency (%)
superadmin        6261   62.7%
scicenter         3366   33.7%
jnse              156    1.6%
child             75     0.8%
gisegen           49     0.5%
cheongyang        27     0.3%
gogostar          18     0.2%
jejusi            8      0.1%
biocp             7      0.1%
cheonan           5      0.1%
Other values (3)  6      0.1%

Length

Histogram of lengths of the category

Correlations

               전시타입  이름   등록자 아이디
전시타입       1.000     1.000  0.740
이름           1.000     1.000  0.869
등록자 아이디  0.740     0.869  1.000

               등록자 아이디  이름   전시타입
등록자 아이디  1.000          0.481  0.706
이름           0.481          1.000  0.998
전시타입       0.706          0.998  1.000

               전시타입  이름   등록자 아이디
전시타입       1.000     0.998  0.706
이름           0.998     1.000  0.481
등록자 아이디  0.706     0.481  1.000
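
The High correlation alerts near the top of the report are consistent with applying a cutoff of 0.5 to the matrix whose off-diagonal values are 0.998, 0.706 and 0.481. Both the cutoff and the choice of matrix are assumptions made for this sketch; the report itself does not state them.

```python
# Pairwise values copied from the matrix with 0.998 / 0.706 / 0.481
correlations = {
    ("이름", "전시타입"): 0.998,
    ("등록자 아이디", "전시타입"): 0.706,
    ("등록자 아이디", "이름"): 0.481,
}

THRESHOLD = 0.5  # assumed cutoff, not stated in the report

# For each variable, collect the variables it is highly correlated with
flagged = {}
for (left, right), value in correlations.items():
    if value >= THRESHOLD:
        flagged.setdefault(left, set()).add(right)
        flagged.setdefault(right, set()).add(left)

for variable, partners in flagged.items():
    print(variable, "->", sorted(partners))
```

Under these assumptions 전시타입 is flagged with two partners while 이름 and 등록자 아이디 are each flagged only with 전시타입, matching the three alerts.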

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

      상세 아이디  전시품 아이디  전시타입  이름       항목값                                    공개여부  등록자 아이디
0     14           5              E         국적       한국                                      Y         superadmin
1     15           5              E         재질       기타                                      Y         superadmin
2     24           8              E         국적       한국                                      Y         superadmin
3     25           8              E         재질       기타                                      Y         superadmin
4     28           10             E         국적       이집트                                    Y         superadmin
5     29           10             E         재질       목제                                      Y         superadmin
6     34           13             E         재질       목제                                      Y         superadmin
7     36           15             E         재질       목제                                      Y         superadmin
8     39           18             E         재질       목제                                      Y         superadmin
9     44           20             E         국적       스웨덴                                    Y         superadmin

      상세 아이디  전시품 아이디  전시타입  이름       항목값                                    공개여부  등록자 아이디
9968  17,503       2,551          E         방언       감생이                                    Y         scicenter
9969  17,512       2,549          E         분포       우리 나라 전 연안, 일본, 대만, 동중국해   Y         scicenter
9970  17,525       2,546          E         학명       Sebastes schlegeli Hilgendorf             Y         scicenter
9971  17,526       2,546          E         영명       Black rockfish                            Y         scicenter
9972  17,527       2,546          E         방언       우럭, 감펭이                              Y         scicenter
9973  17,528       2,546          E         분포       국내 전 연해, 동중국해, 일본              Y         scicenter
9974  17,282       2,512          E         국적       한국                                      Y         scicenter
9975  17,283       2,512          E         크기       1100*4100*310                             Y         scicenter
9976  17,284       2,512          E         용도/기능  어린이체험                                Y         scicenter
9977  17,285       2,512          E         제조연대   2011                                      Y         scicenter
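
The sample rows show a long (entity, attribute, value) layout: each row carries one attribute (이름) and its value (항목값) for one exhibit (전시품 아이디). The sketch below groups a handful of the rows above into one dictionary per exhibit. It is an illustration of how the layout can be read, not part of the report, and the tuple structure is a simplification chosen for the example.

```python
from collections import defaultdict

# (전시품 아이디, 이름, 항목값) triples copied from the last sample rows
rows = [
    ("2,546", "학명", "Sebastes schlegeli Hilgendorf"),
    ("2,546", "영명", "Black rockfish"),
    ("2,546", "방언", "우럭, 감펭이"),
    ("2,512", "국적", "한국"),
    ("2,512", "제조연대", "2011"),
]

# Collect all attributes of the same exhibit into one dictionary
exhibits = defaultdict(dict)
for exhibit_id, attribute, value in rows:
    exhibits[exhibit_id][attribute] = value

for exhibit_id, attributes in exhibits.items():
    print(exhibit_id, attributes)
```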