Overview

Dataset statistics

Number of variables: 3
Number of observations: 1000
Missing cells: 1001
Missing cells (%): 33.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 25.5 KiB
Average record size in memory: 26.1 B
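
The percentage and per-record figures above follow from the raw counts. The sketch below reproduces the arithmetic, assuming 1 KiB = 1024 bytes and rounding to one decimal place; the variable names are illustrative and are not part of ydata-profiling.

```python
# Raw figures copied from the dataset statistics.
n_variables = 3
n_observations = 1000
missing_cells = 1001
total_size_kib = 25.5

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

The 1001 missing cells are the 1000 empty cells of Unnamed: 2 plus the single missing value in 연도.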

Variable types

Numeric: 1
Text: 1
Unsupported: 1

Dataset

Description: Chronology information related to the oral history materials of 현대한국구술자료관
Author: 한국학중앙연구원 (The Academy of Korean Studies)
URL: https://www.data.go.kr/data/15049074/fileData.do

Alerts

[Missing] Unnamed: 2 has 1000 (100.0%) missing values
[Unique] 번호 has unique values
[Unsupported] Unnamed: 2 is an unsupported type, check if it needs cleaning or further analysis
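
A column that is missing in every row, such as Unnamed: 2, carries no information and is a natural candidate for the cleaning the alert suggests. The sketch below drops such columns in plain Python. It assumes each row is a dict with None marking a missing cell, and `drop_empty_columns` is a hypothetical helper, not a ydata-profiling or pandas function.

```python
def drop_empty_columns(rows):
    """Return copies of the rows without columns that are None in every row."""
    if not rows:
        return []
    # Preserve the column order of the first row.
    columns = list(rows[0].keys())
    keep = [c for c in columns if any(r.get(c) is not None for r in rows)]
    return [{c: r.get(c) for c in keep} for r in rows]


# First two rows of the dataset, as shown in the report's sample.
rows = [
    {"번호": 1639, "연도": "1972~1979", "Unnamed: 2": None},
    {"번호": 1640, "연도": "1983. 10~1989. 7", "Unnamed: 2": None},
]

cleaned = drop_empty_columns(rows)
print(list(cleaned[0].keys()))
```

A column is kept as soon as one row has a value for it, so a mostly complete column such as 연도 is not affected.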

Reproduction

Analysis started: 2023-12-12 15:33:17.551482
Analysis finished: 2023-12-12 15:33:17.935238
Duration: 0.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

번호 ("number")
Real number (ℝ)

UNIQUE 

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1103.335
Minimum: 277
Maximum: 1760
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 277
5-th percentile: 593.85
Q1: 799.75
Median: 1141.5
Q3: 1423.25
95-th percentile: 1623.05
Maximum: 1760
Range: 1483
Interquartile range (IQR): 623.5
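
The last two entries are derived from the others: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A short check of that arithmetic, using the values listed above (the variable names are illustrative):

```python
# Quantiles copied from the report.
minimum, q1, q3, maximum = 277, 799.75, 1423.25, 1760

value_range = maximum - minimum  # spread of the whole column
iqr = q3 - q1                    # spread of the middle 50% of values

print(f"Range: {value_range}")
print(f"Interquartile range (IQR): {iqr}")
```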

Descriptive statistics

Standard deviation: 365.84425
Coefficient of variation (CV): 0.33158039
Kurtosis: -0.96915083
Mean: 1103.335
Median Absolute Deviation (MAD): 312
Skewness: -0.15897445
Sum: 1103335
Variance: 133842.01
Monotonicity: Not monotonic
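
Several of these statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of observations. The sketch below checks them against the reported values. Because the inputs are already rounded, the results agree only up to rounding.

```python
# Values copied from the report (already rounded there).
mean = 1103.335
std = 365.84425
n_observations = 1000

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
total = mean * n_observations  # sum of all values

print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.2f}")
print(f"Sum: {total:.0f}")
```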
Histogram with fixed size bins (bins=50)
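
Fixed-size binning splits the interval from the minimum to the maximum into equally wide bins; with the 번호 range of 277 to 1760 and 50 bins, each bin is 29.66 wide. The following is a minimal sketch of that idea, not the plotting code the report itself uses. `fixed_width_histogram` is a hypothetical helper, and the four input values are picked only to exercise the first, a middle, and the last bin.

```python
def fixed_width_histogram(values, bins):
    """Return (bin_width, counts) for equally wide bins spanning min..max."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return width, counts
    for v in values:
        # The maximum falls on the right edge, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return width, counts


width, counts = fixed_width_histogram([277, 278, 1000, 1760], bins=50)
print(f"bin width: {width:.2f}")
print(counts)
```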
Common values

Value                 Count   Frequency (%)
1639                  1       0.1%
904                   1       0.1%
891                   1       0.1%
892                   1       0.1%
893                   1       0.1%
894                   1       0.1%
895                   1       0.1%
896                   1       0.1%
897                   1       0.1%
898                   1       0.1%
Other values (990)    990     99.0%

Minimum 10 values

Value                 Count   Frequency (%)
277                   1       0.1%
278                   1       0.1%
279                   1       0.1%
280                   1       0.1%
281                   1       0.1%
282                   1       0.1%
283                   1       0.1%
284                   1       0.1%
285                   1       0.1%
286                   1       0.1%

Maximum 10 values

Value                 Count   Frequency (%)
1760                  1       0.1%
1759                  1       0.1%
1758                  1       0.1%
1757                  1       0.1%
1756                  1       0.1%
1755                  1       0.1%
1754                  1       0.1%
1753                  1       0.1%
1752                  1       0.1%
1751                  1       0.1%

연도 ("year")
Text

Distinct: 591
Distinct (%): 59.2%
Missing: 1
Missing (%): 0.1%
Memory size: 7.9 KiB

Length

Max length: 32
Median length: 22
Mean length: 7.4574575
Min length: 1

Characters and Unicode

Total characters: 7450
Distinct characters: 40
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
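
One such property is the general category, which Python's standard `unicodedata` module exposes as a two-letter code. The sketch below counts characters per category for three values taken from this column. `category_counts` and the `CATEGORY_NAMES` mapping are illustrative helpers; the mapping lists only the codes that occur in these values. The standard library does not expose script or block properties, so those are not covered here.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in the sample below.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Lo": "Other Letter",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the mapping.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["1972~1979", "1983. 10~1989. 7", "1995-현재"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```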

Unique

Unique: 446
Unique (%): 44.6%

Sample

1st row: 1972~1979
2nd row: 1983. 10~1989. 7
3rd row: 1989.10~
4th row: 1997~
5th row: 1997~
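
The sample rows show why this column is typed as text rather than numeric: the values mix single years, closed ranges, and open-ended ranges, with both ~ and - as separators. The sketch below shows one way such values could be normalised into a (start, end) pair. `parse_year_range` is a hypothetical helper, it assumes four-digit years between 1800 and 2099, and it treats a missing end year or the word 현재 ("present") as an open end. It discards month information and does not handle the other formats in the column, such as values containing "/".

```python
import re

# Assumption: a year is a standalone four-digit number from 1800 to 2099.
YEAR = re.compile(r"(?<!\d)(?:1[89]|20)\d{2}(?!\d)")


def parse_year_range(text):
    """Return (start_year, end_year), or None if no start year is found.

    end_year is None for open-ended ranges such as "1997~" or "1995-현재".
    """
    # Split once on the first range separator, "~" or "-".
    parts = re.split(r"[~-]", text, maxsplit=1)
    start = YEAR.search(parts[0])
    if start is None:
        return None
    start_year = int(start.group())
    if len(parts) == 1:
        # A single year such as "2002" starts and ends in the same year.
        return (start_year, start_year)
    end = YEAR.search(parts[1])
    return (start_year, int(end.group()) if end else None)


for value in ["1972~1979", "1983. 10~1989. 7", "1989.10~", "1992-1995", "2002", "1995-현재"]:
    print(repr(value), parse_year_range(value))
```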
Value                 Count   Frequency (%)
(not recovered)       368     23.8%
1973                  23      1.5%
1988                  22      1.4%
2008                  22      1.4%
1991                  20      1.3%
1997                  20      1.3%
1990                  19      1.2%
1974                  19      1.2%
1956                  18      1.2%
1979                  17      1.1%
Other values (413)    1000    64.6%

Most occurring characters

Value                 Count   Frequency (%)
9                     1249    16.8%
1                     1114    15.0%
0                     991     13.3%
(space)               597     8.0%
2                     570     7.7%
-                     472     6.3%
.                     449     6.0%
8                     360     4.8%
7                     337     4.5%
6                     298     4.0%
Other values (30)     1013    13.6%

Most occurring categories

Value                 Count   Frequency (%)
Decimal Number        5520    74.1%
Space Separator       671     9.0%
Other Punctuation     523     7.0%
Dash Punctuation      472     6.3%
Other Letter          148     2.0%
Math Symbol           89      1.2%
Lowercase Letter      20      0.3%
Open Punctuation      2       < 0.1%
Close Punctuation     2       < 0.1%
Uppercase Letter      2       < 0.1%

Most frequent character per category

Decimal Number
Value                 Count   Frequency (%)
9                     1249    22.6%
1                     1114    20.2%
0                     991     18.0%
2                     570     10.3%
8                     360     6.5%
7                     337     6.1%
6                     298     5.4%
5                     227     4.1%
3                     220     4.0%
4                     154     2.8%

Other Letter
Value                 Count   Frequency (%)
(not recovered)       66      44.6%
(not recovered)       39      26.4%
(not recovered)       37      25.0%
(not recovered)       2       1.4%
(not recovered)       1       0.7%
(not recovered)       1       0.7%
(not recovered)       1       0.7%
(not recovered)       1       0.7%

Lowercase Letter
Value                 Count   Frequency (%)
n                     5       25.0%
p                     4       20.0%
s                     4       20.0%
b                     4       20.0%
a                     2       10.0%
y                     1       5.0%

Other Punctuation
Value                 Count   Frequency (%)
.                     449     85.9%
/                     65      12.4%
;                     4       0.8%
&                     4       0.8%
,                     1       0.2%

Math Symbol
Value                 Count   Frequency (%)
~                     75      84.3%
(not recovered)       10      11.2%
(not recovered)       4       4.5%

Space Separator
Value                     Count   Frequency (%)
(space)                   597     89.0%
(other space separator)   74      11.0%

Uppercase Letter
Value                 Count   Frequency (%)
J                     1       50.0%
M                     1       50.0%

Dash Punctuation
Value                 Count   Frequency (%)
-                     472     100.0%

Open Punctuation
Value                 Count   Frequency (%)
(                     2       100.0%

Close Punctuation
Value                 Count   Frequency (%)
)                     2       100.0%

Control
Value                 Count   Frequency (%)
(not recovered)       1       100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Common                7280    97.7%
Hangul                148     2.0%
Latin                 22      0.3%

Most frequent character per script

Common
Value                 Count   Frequency (%)
9                     1249    17.2%
1                     1114    15.3%
0                     991     13.6%
(space)               597     8.2%
2                     570     7.8%
-                     472     6.5%
.                     449     6.2%
8                     360     4.9%
7                     337     4.6%
6                     298     4.1%
Other values (14)     843     11.6%

Hangul
Value                 Count   Frequency (%)
(not recovered)       66      44.6%
(not recovered)       39      26.4%
(not recovered)       37      25.0%
(not recovered)       2       1.4%
(not recovered)       1       0.7%
(not recovered)       1       0.7%
(not recovered)       1       0.7%
(not recovered)       1       0.7%

Latin
Value                 Count   Frequency (%)
n                     5       22.7%
p                     4       18.2%
s                     4       18.2%
b                     4       18.2%
a                     2       9.1%
J                     1       4.5%
M                     1       4.5%
y                     1       4.5%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 7214    96.8%
Hangul                148     2.0%
None                  78      1.0%
Math Operators        10      0.1%

Most frequent character per block

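ASCII
Value                 Count   Frequency (%)
9                     1249    17.3%
1                     1114    15.4%
0                     991     13.7%
(space)               597     8.3%
2                     570     7.9%
-                     472     6.5%
.                     449     6.2%
8                     360     5.0%
7                     337     4.7%
6                     298     4.1%
Other values (19)     777     10.8%

None
Value                 Count   Frequency (%)
(not recovered)       74      94.9%
(not recovered)       4       5.1%

Hangul
Value                 Count   Frequency (%)
(not recovered)       66      44.6%
(not recovered)       39      26.4%
(not recovered)       37      25.0%
(not recovered)       2       1.4%
(not recovered)       1       0.7%
(not recovered)       1       0.7%
(not recovered)       1       0.7%
(not recovered)       1       0.7%

Math Operators
Value                 Count   Frequency (%)
(not recovered)       10      100.0%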

Unnamed: 2
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1000
Missing (%): 100.0%
Memory size: 8.9 KiB

Interactions


Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
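
Both views are built from the same boolean data: one flag per cell saying whether it is missing. The sketch below builds such a nullity matrix in plain Python and derives the per-column counts from it. The first three rows come from this dataset; the fourth row is invented purely to show a missing 연도 cell, and none of this is the plotting code the report itself uses.

```python
columns = ["번호", "연도", "Unnamed: 2"]
rows = [
    (1639, "1972~1979", None),
    (1640, "1983. 10~1989. 7", None),
    (1641, "1989.10~", None),
    (9999, None, None),  # hypothetical row, invented to show a missing 연도 cell
]

# Nullity matrix: True where a cell is missing, one inner list per row.
nullity_matrix = [[value is None for value in row] for row in rows]

# Per-column missing counts: transpose the matrix and sum each column's flags.
missing_per_column = {
    name: sum(flags) for name, flags in zip(columns, zip(*nullity_matrix))
}

# Text rendering of the matrix: "#" for a present cell, "." for a missing one.
for flags in nullity_matrix:
    print("".join("." if is_missing else "#" for is_missing in flags))
print(missing_per_column)
```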

Sample

First rows

       번호     연도                  Unnamed: 2
0      1639     1972~1979             <NA>
1      1640     1983. 10~1989. 7      <NA>
2      1641     1989.10~              <NA>
3      1642     1997~                 <NA>
4      1643     1997~                 <NA>
5      1750     1992-1995             <NA>
6      1751     1996-1997             <NA>
7      1752     1998-2000             <NA>
8      1753     1999-2003             <NA>
9      1754     2002                  <NA>

Last rows

       번호     연도                  Unnamed: 2
990    599      1959-1964             <NA>
991    600      1962-1968             <NA>
992    601      1962-1966             <NA>
993    602      1965-1990             <NA>
994    603      1976-1979             <NA>
995    604      1982-현재             <NA>
996    588      1988                  <NA>
997    589      1991                  <NA>
998    590      1995-현재             <NA>
999    591      2008-현재             <NA>