Overview

Dataset statistics

Number of variables: 3
Number of observations: 1608
Missing cells: 388
Missing cells (%): 8.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 37.8 KiB
Average record size in memory: 24.1 B
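
The percentage and per-record figures above follow from the raw counts. The short check below is a sketch that assumes the report counts one cell per variable per observation and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 3
n_observations = 1608
missing_cells = 388
total_size_kib = 37.8

# Assumption: every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report (8.0% and 24.1 B).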

Variable types

Text: 3

Dataset

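Description: A list of related sites provided in the science learning content section of the National Science Museum (국립중앙과학관) website.
Author: Ministry of Science and ICT (과학기술정보통신부), National Science Museum (국립중앙과학관)
URL: https://www.data.go.kr/data/15067815/fileData.do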

Alerts

사이트명 (site name) has 388 (24.1%) missing values [Missing]
고유 아이디 (unique ID) has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 05:44:48.339787
Analysis finished: 2023-12-12 05:44:48.803881
Duration: 0.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
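
The reported duration can be cross-checked against the two timestamps. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 05:44:48.339787")
finished = datetime.fromisoformat("2023-12-12 05:44:48.803881")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```

The difference is 0.464094 seconds, which rounds to the 0.46 seconds shown above.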

Variables

고유 아이디
Text

UNIQUE 

Distinct: 1608
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.8152985
Min length: 2

Characters and Unicode

Total characters: 6135
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
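
As an illustration of the category analysis, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation. The helper name `category_counts` is hypothetical, and the input is a handful of values taken from this column's value counts. The two-letter codes map to the names used in this report: `Nd` is Decimal Number and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters per Unicode general category (hypothetical helper)."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# A few values from this column; the commas are part of the stored text.
sample = ["435", "5,028", "5,000", "2,880"]
counts = category_counts(sample)
print(counts)
```

For these four values the tally is 15 decimal digits and 3 punctuation characters, the same two categories the report finds for the whole column.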

Unique

Unique: 1608
Unique (%): 100.0%

Sample

1st row: 435
2nd row: 442
3rd row: 447
4th row: 449
5th row: 459
Value                 Count   Frequency (%)
435                   1       0.1%
5,028                 1       0.1%
5,000                 1       0.1%
2,880                 1       0.1%
2,879                 1       0.1%
2,878                 1       0.1%
2,877                 1       0.1%
2,876                 1       0.1%
2,873                 1       0.1%
1,575                 1       0.1%
Other values (1598)   1598    99.4%

Most occurring characters

Value   Count   Frequency (%)
1       994     16.2%
,       692     11.3%
2       646     10.5%
5       596     9.7%
3       525     8.6%
4       521     8.5%
9       466     7.6%
8       448     7.3%
7       426     6.9%
6       418     6.8%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      5443    88.7%
Other Punctuation   692     11.3%
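
The 692 punctuation characters are all commas, which suggests that the identifiers are stored with thousands separators (for example `5,028`). That may be why the column is typed as Text rather than as a number, but this is an inference from the character counts and not something the report states. Under that assumption, the values could be converted to integers as sketched below; `parse_id` is a hypothetical helper.

```python
def parse_id(text: str) -> int:
    """Convert an ID such as '5,028' to 5028 (hypothetical helper).

    Assumes the comma is a thousands separator and nothing else.
    """
    return int(text.replace(",", ""))

# Values taken from this column's value counts.
ids = [parse_id(v) for v in ["435", "5,028", "2,880", "1,575"]]
print(ids)
```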

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
1       994     18.3%
2       646     11.9%
5       596     10.9%
3       525     9.6%
4       521     9.6%
9       466     8.6%
8       448     8.2%
7       426     7.8%
6       418     7.7%
0       403     7.4%
Other Punctuation
Value   Count   Frequency (%)
,       692     100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   6135    100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
1       994     16.2%
,       692     11.3%
2       646     10.5%
5       596     9.7%
3       525     8.6%
4       521     8.5%
9       466     7.6%
8       448     7.3%
7       426     6.9%
6       418     6.8%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   6135    100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
1       994     16.2%
,       692     11.3%
2       646     10.5%
5       596     9.7%
3       525     8.6%
4       521     8.5%
9       466     7.6%
8       448     7.3%
7       426     6.9%
6       418     6.8%

고유 아이디 2
Text

Distinct: 723
Distinct (%): 45.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.7002488
Min length: 1

Characters and Unicode

Total characters: 5950
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 420
Unique (%): 26.1%
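
The Distinct and Unique figures differ for this variable (723 versus 420). A plausible reading, stated here as an assumption about the profiler's terminology, is that Distinct counts the different values present while Unique counts the values that occur exactly once. The sketch below applies that reading to the second-column values of the first ten sample rows of this report.

```python
from collections import Counter

# Second-column values from the first ten sample rows of the report.
values = ["189", "188", "186", "185", "183", "181", "181", "181", "177", "256"]

frequency = Counter(values)

# Assumed definitions: distinct = different values, unique = values seen once.
n_distinct = len(frequency)
n_unique = sum(1 for count in frequency.values() if count == 1)

print(n_distinct, n_unique)
```

Here `181` appears three times, so it counts as distinct but not as unique.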

Sample

1st row: 189
2nd row: 188
3rd row: 186
4th row: 185
5th row: 183
Value                Count   Frequency (%)
181                  15      0.9%
1,217                12      0.7%
388                  11      0.7%
314                  10      0.6%
188                  10      0.6%
313                  10      0.6%
378                  9       0.6%
309                  9       0.6%
387                  8       0.5%
1,078                8       0.5%
Other values (713)   1506    93.7%

Most occurring characters

Value   Count   Frequency (%)
1       1249    21.0%
3       760     12.8%
2       711     11.9%
0       665     11.2%
,       617     10.4%
8       362     6.1%
4       354     5.9%
9       339     5.7%
7       315     5.3%
5       298     5.0%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      5333    89.6%
Other Punctuation   617     10.4%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
1       1249    23.4%
3       760     14.3%
2       711     13.3%
0       665     12.5%
8       362     6.8%
4       354     6.6%
9       339     6.4%
7       315     5.9%
5       298     5.6%
6       280     5.3%
Other Punctuation
Value   Count   Frequency (%)
,       617     100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   5950    100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
1       1249    21.0%
3       760     12.8%
2       711     11.9%
0       665     11.2%
,       617     10.4%
8       362     6.1%
4       354     5.9%
9       339     5.7%
7       315     5.3%
5       298     5.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   5950    100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
1       1249    21.0%
3       760     12.8%
2       711     11.9%
0       665     11.2%
,       617     10.4%
8       362     6.1%
4       354     5.9%
9       339     5.7%
7       315     5.3%
5       298     5.0%

사이트명
Text

MISSING 

Distinct: 809
Distinct (%): 66.3%
Missing: 388
Missing (%): 24.1%
Memory size: 12.7 KiB

Length

Max length: 124
Median length: 57
Mean length: 14.279508
Min length: 4

Characters and Unicode

Total characters: 17421
Distinct characters: 504
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 734
Unique (%): 60.2%

Sample

1st row: 위키피디아 - RCA Records
2nd row: 위키피디아 - Extended play
3rd row: 위키피디아 - Edison Records
4th row: 위키피디아 - Edison Records
5th row: 위키피디아 - Theremin
Value                 Count   Frequency (%)
                      411     13.3%
위키피디아              234     7.5%
두산백과               138     4.5%
한국위키피디아           80      2.6%
문화재청               42      1.4%
네이버지식백과           41      1.3%
문화콘텐츠닷컴           31      1.0%
한국민족문화대백과        30      1.0%
향토문화대전            28      0.9%
youtube               28      0.9%
Other values (1221)   2038    65.7%

Most occurring characters

Value                Count   Frequency (%)
(space)              1882    10.8%
-                    874     5.0%
                     446     2.6%
                     425     2.4%
                     419     2.4%
                     390     2.2%
                     375     2.2%
                     375     2.2%
                     342     2.0%
e                    340     2.0%
Other values (494)   11553   66.3%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        10137   58.2%
Lowercase Letter    3105    17.8%
Space Separator     1882    10.8%
Uppercase Letter    1208    6.9%
Dash Punctuation    874     5.0%
Decimal Number      122     0.7%
Open Punctuation    35      0.2%
Close Punctuation   35      0.2%
Other Punctuation   23      0.1%

Most frequent character per category

Other Letter
Value                Count   Frequency (%)
                     446     4.4%
                     425     4.2%
                     419     4.1%
                     390     3.8%
                     375     3.7%
                     375     3.7%
                     342     3.4%
                     274     2.7%
                     268     2.6%
                     235     2.3%
Other values (421)   6588    65.0%
Uppercase Letter
Value               Count   Frequency (%)
D                   149     12.3%
S                   139     11.5%
L                   132     10.9%
N                   115     9.5%
M                   86      7.1%
P                   68      5.6%
C                   65      5.4%
I                   60      5.0%
R                   53      4.4%
B                   48      4.0%
Other values (16)   293     24.3%
Lowercase Letter
Value               Count   Frequency (%)
e                   340     11.0%
o                   300     9.7%
a                   272     8.8%
n                   250     8.1%
r                   249     8.0%
i                   213     6.9%
t                   186     6.0%
c                   163     5.2%
u                   160     5.2%
l                   138     4.4%
Other values (14)   834     26.9%
Decimal Number
Value   Count   Frequency (%)
0       41      33.6%
1       28      23.0%
3       11      9.0%
2       11      9.0%
8       9       7.4%
5       8       6.6%
6       7       5.7%
7       3       2.5%
4       3       2.5%
9       1       0.8%
Other Punctuation
Value   Count   Frequency (%)
:       7       30.4%
'       5       21.7%
,       3       13.0%
/       3       13.0%
&       1       4.3%
·       1       4.3%
        1       4.3%
?       1       4.3%
.       1       4.3%
Space Separator
Value     Count   Frequency (%)
(space)   1882    100.0%
Dash Punctuation
Value   Count   Frequency (%)
-       874     100.0%
Open Punctuation
Value   Count   Frequency (%)
(       35      100.0%
Close Punctuation
Value   Count   Frequency (%)
)       35      100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   10137   58.2%
Latin    4313    24.8%
Common   2971    17.1%

Most frequent character per script

Hangul
Value                Count   Frequency (%)
                     446     4.4%
                     425     4.2%
                     419     4.1%
                     390     3.8%
                     375     3.7%
                     375     3.7%
                     342     3.4%
                     274     2.7%
                     268     2.6%
                     235     2.3%
Other values (421)   6588    65.0%
Latin
Value               Count   Frequency (%)
e                   340     7.9%
o                   300     7.0%
a                   272     6.3%
n                   250     5.8%
r                   249     5.8%
i                   213     4.9%
t                   186     4.3%
c                   163     3.8%
u                   160     3.7%
D                   149     3.5%
Other values (40)   2031    47.1%
Common
Value               Count   Frequency (%)
(space)             1882    63.3%
-                   874     29.4%
0                   41      1.4%
(                   35      1.2%
)                   35      1.2%
1                   28      0.9%
3                   11      0.4%
2                   11      0.4%
8                   9       0.3%
5                   8       0.3%
Other values (13)   37      1.2%

Most occurring blocks

Value    Count   Frequency (%)
Hangul   10137   58.2%
ASCII    7282    41.8%
None     2       < 0.1%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
(space)             1882    25.8%
-                   874     12.0%
e                   340     4.7%
o                   300     4.1%
a                   272     3.7%
n                   250     3.4%
r                   249     3.4%
i                   213     2.9%
t                   186     2.6%
c                   163     2.2%
Other values (61)   2553    35.1%
Hangul
Value                Count   Frequency (%)
                     446     4.4%
                     425     4.2%
                     419     4.1%
                     390     3.8%
                     375     3.7%
                     375     3.7%
                     342     3.4%
                     274     2.7%
                     268     2.6%
                     235     2.3%
Other values (421)   6588    65.0%
None
Value   Count   Frequency (%)
·       1       50.0%
        1       50.0%

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
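
Both views can be derived from a per-cell presence test. The sketch below uses only the standard library. It takes four rows from this report's sample (two from the start, two from the end), represents `<NA>` as `None`, and builds the per-column missing counts and a boolean nullity matrix. It illustrates the idea and is not the profiler's plotting code.

```python
# Column names as they appear in the report's sample table.
columns = ["고유 아이디", "고유 아이디 2", "사이트명"]

# Four rows from the report's sample; <NA> is represented as None.
rows = [
    ("435", "189", "위키피디아 - RCA Records"),
    ("501", "256", "두산백과 - 컴퓨터"),
    ("2,934", "1,375", None),
    ("2,994", "1,435", None),
]

# Nullity by column: how many cells are missing in each column.
missing = {
    name: sum(row[i] is None for row in rows)
    for i, name in enumerate(columns)
}

# Nullity matrix: True where a value is present, False where it is missing.
nullity = [[cell is not None for cell in row] for row in rows]

print(missing)
for line in nullity:
    print(line)
```

In this subset only 사이트명 has missing cells, which matches the alert raised for that column.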

Sample

        고유 아이디   고유 아이디 2   사이트명
0       435         189           위키피디아 - RCA Records
1       442         188           위키피디아 - Extended play
2       447         186           위키피디아 - Edison Records
3       449         185           위키피디아 - Edison Records
4       459         183           위키피디아 - Theremin
5       463         181           위키피디아 - RCA Records
6       471         181           위키피디아 - Extended play
7       473         181           한국위키피디아 - 자기 테이프
8       480         177           위키피디아 - Gramophone Company
9       501         256           두산백과 - 컴퓨터

        고유 아이디   고유 아이디 2   사이트명
1598    2,934       1,375         <NA>
1599    2,958       1,399         <NA>
1600    2,965       1,405         <NA>
1601    2,968       1,408         <NA>
1602    2,972       1,413         <NA>
1603    2,973       1,414         <NA>
1604    2,974       1,415         <NA>
1605    2,976       1,417         <NA>
1606    2,989       1,430         <NA>
1607    2,994       1,435         <NA>