Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 2986
Missing cells (%): 6.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 478.5 KiB
Average record size in memory: 49.0 B
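
As a quick sanity check, the two derived figures in this block can be reproduced from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (rows times columns) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 5
n_observations = 10_000
missing_cells = 2_986
total_size_kib = 478.5

# Share of missing cells = missing cells / (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size = total bytes / number of rows (assuming 1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report, which suggests the percentage is computed per cell rather than per row.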

Variable types

Text: 1
Categorical: 3
Numeric: 1

Dataset

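Description: Population aged 6 and over by level of education (elementary school, middle school, high school, college (2- or 3-year), university (4 years or more), graduate school (master's and doctoral programs), and no schooling (including children not yet enrolled)). * Population and Housing Census data (produced on a 5-year cycle)
Author: Incheon Metropolitan City (인천광역시)
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15055008&srcSe=7661IVAWM27C61E190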

Alerts

2020 년 has 2986 (29.9%) missing values (alert type: Missing)

Reproduction

Analysis started: 2024-04-20 15:46:51.156456
Analysis finished: 2024-04-20 15:46:52.057849
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
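
The reported duration follows from the two timestamps. A minimal sketch, assuming both timestamps are in the same time zone and use the format shown in this block:

```python
from datetime import datetime

# Timestamp format used in the "Reproduction" block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-04-20 15:46:51.156456", FMT)
finished = datetime.strptime("2024-04-20 15:46:52.057849", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.1f} seconds")
```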

Variables

행정구역별(동읍면) [administrative district (dong, eup, myeon)]
Text

Distinct: 169
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 4
Mean length: 3.7727
Min length: 2

Characters and Unicode

Total characters: 37727
Distinct characters: 117
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
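
To illustrate what such character properties look like, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. It is not the profiler's own code. The input uses the five sample values of this variable plus one hypothetical string containing a middle dot (U+00B7), which is not taken from the dataset.

```python
import unicodedata
from collections import Counter

# Five sample values from the report, plus one HYPOTHETICAL value with a
# middle dot (U+00B7) to show the "Other Punctuation" category.
samples = ["가정1동", "계산4동", "옥련1동", "논현1동", "부평구", "가\u00b7나동"]

# Human-readable names for the three categories that appear in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters by Unicode general category (e.g. Lo, Nd, Po)."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(samples)
for code, n in counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}: {n}")
```

Hangul syllables fall under `Lo`, ASCII digits under `Nd`, and the middle dot under `Po`, matching the three categories listed for this variable.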

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가정1동
2nd row: 계산4동
3rd row: 옥련1동
4th row: 논현1동
5th row: 부평구

Value | Count | Frequency (%)
삼산2동 | 70 | 0.7%
청라3동 | 70 | 0.7%
삼산면 | 70 | 0.7%
영흥면 | 69 | 0.7%
만석동 | 69 | 0.7%
강화군 | 68 | 0.7%
만수5동 | 68 | 0.7%
연안동 | 68 | 0.7%
원당동 | 67 | 0.7%
계산1동 | 67 | 0.7%
Other values (159) | 9314 | 93.1%

Most occurring characters

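Value | Count | Frequency (%)
[character lost] | 8463 | 22.4%
2 | 1908 | 5.1%
1 | 1887 | 5.0%
3 | 1261 | 3.3%
[character lost] | 1208 | 3.2%
[character lost] | 798 | 2.1%
[character lost] | 788 | 2.1%
[character lost] | 783 | 2.1%
[character lost] | 764 | 2.0%
[character lost] | 757 | 2.0%
Other values (107) | 19110 | 50.7%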

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 30931 | 82.0%
Decimal Number | 6437 | 17.1%
Other Punctuation | 359 | 1.0%

Most frequent character per category

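Other Letter
Value | Count | Frequency (%)
[character lost] | 8463 | 27.4%
[character lost] | 1208 | 3.9%
[character lost] | 798 | 2.6%
[character lost] | 788 | 2.5%
[character lost] | 783 | 2.5%
[character lost] | 764 | 2.5%
[character lost] | 757 | 2.4%
[character lost] | 687 | 2.2%
[character lost] | 580 | 1.9%
[character lost] | 558 | 1.8%
Other values (98) | 15545 | 50.3%

Decimal Number
Value | Count | Frequency (%)
2 | 1908 | 29.6%
1 | 1887 | 29.3%
3 | 1261 | 19.6%
4 | 683 | 10.6%
5 | 346 | 5.4%
6 | 242 | 3.8%
8 | 57 | 0.9%
7 | 53 | 0.8%

Other Punctuation
Value | Count | Frequency (%)
· | 359 | 100.0%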

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 30931 | 82.0%
Common | 6796 | 18.0%

Most frequent character per script

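Hangul
Value | Count | Frequency (%)
[character lost] | 8463 | 27.4%
[character lost] | 1208 | 3.9%
[character lost] | 798 | 2.6%
[character lost] | 788 | 2.5%
[character lost] | 783 | 2.5%
[character lost] | 764 | 2.5%
[character lost] | 757 | 2.4%
[character lost] | 687 | 2.2%
[character lost] | 580 | 1.9%
[character lost] | 558 | 1.8%
Other values (98) | 15545 | 50.3%

Common
Value | Count | Frequency (%)
2 | 1908 | 28.1%
1 | 1887 | 27.8%
3 | 1261 | 18.6%
4 | 683 | 10.1%
· | 359 | 5.3%
5 | 346 | 5.1%
6 | 242 | 3.6%
8 | 57 | 0.8%
7 | 53 | 0.8%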

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 30931 | 82.0%
ASCII | 6437 | 17.1%
None | 359 | 1.0%

Most frequent character per block

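Hangul
Value | Count | Frequency (%)
[character lost] | 8463 | 27.4%
[character lost] | 1208 | 3.9%
[character lost] | 798 | 2.6%
[character lost] | 788 | 2.5%
[character lost] | 783 | 2.5%
[character lost] | 764 | 2.5%
[character lost] | 757 | 2.4%
[character lost] | 687 | 2.2%
[character lost] | 580 | 1.9%
[character lost] | 558 | 1.8%
Other values (98) | 15545 | 50.3%

ASCII
Value | Count | Frequency (%)
2 | 1908 | 29.6%
1 | 1887 | 29.3%
3 | 1261 | 19.6%
4 | 683 | 10.6%
5 | 346 | 5.4%
6 | 242 | 3.8%
8 | 57 | 0.9%
7 | 53 | 0.8%

None
Value | Count | Frequency (%)
· | 359 | 100.0%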

성별 [sex]
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

남자: 5003
여자: 4997

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남자
2nd row: 여자
3rd row: 여자
4th row: 남자
5th row: 남자

Common Values

Value | Count | Frequency (%)
남자 [male] | 5003 | 50.0%
여자 [female] | 4997 | 50.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
남자 | 5003 | 50.0%
여자 | 4997 | 50.0%

연령별 [age group]
Categorical

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

70세 이상: 1284
10-19세: 1256
40-49세: 1249
50-59세: 1246
30-39세: 1245
Other values (3): 3720

Length

Max length: 6
Median length: 6
Mean length: 5.7522
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30-39세
2nd row: 6-9세
3rd row: 30-39세
4th row: 60-69세
5th row: 6-9세

Common Values

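Value | Count | Frequency (%)
70세 이상 [age 70 and over] | 1284 | 12.8%
10-19세 [ages 10-19] | 1256 | 12.6%
40-49세 [ages 40-49] | 1249 | 12.5%
50-59세 [ages 50-59] | 1246 | 12.5%
30-39세 [ages 30-39] | 1245 | 12.4%
20-29세 [ages 20-29] | 1242 | 12.4%
6-9세 [ages 6-9] | 1239 | 12.4%
60-69세 [ages 60-69] | 1239 | 12.4%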

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
70세 | 1284 | 11.4%
이상 | 1284 | 11.4%
10-19세 | 1256 | 11.1%
40-49세 | 1249 | 11.1%
50-59세 | 1246 | 11.0%
30-39세 | 1245 | 11.0%
20-29세 | 1242 | 11.0%
6-9세 | 1239 | 11.0%
60-69세 | 1239 | 11.0%
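
The percentages in this last table differ from those under Common Values, and 70세 and 이상 appear as separate entries. One reading that reproduces the numbers is that the table counts whitespace-separated tokens rather than whole values. The sketch below checks that reading; tokenizing with `str.split` is an assumption, not a confirmed description of the profiler's behavior.

```python
from collections import Counter

# Row counts per category, copied from the "Common Values" table.
value_counts = {
    "70세 이상": 1284, "10-19세": 1256, "40-49세": 1249, "50-59세": 1246,
    "30-39세": 1245, "20-29세": 1242, "6-9세": 1239, "60-69세": 1239,
}

# Each row contributes one count per whitespace-separated token.
token_counts = Counter()
for value, n_rows in value_counts.items():
    for token in value.split():
        token_counts[token] += n_rows

# Percentages are taken over all tokens, not over the 10000 rows.
total_tokens = sum(token_counts.values())
for token, n in token_counts.most_common():
    print(f"{token}: {n} ({100 * n / total_tokens:.1f}%)")
```

Under this assumption the denominator is 11284 tokens (10000 rows plus 1284 extra tokens from the two-word value), which yields 11.4% for 70세 and 이상 and about 11.0% to 11.1% for the rest.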

교육정도별 [educational attainment]
Categorical

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

대학교(2,3년제): 1460
대학교(4년제 이상): 1458
대학원(석박사 과정): 1438
고등학교: 1428
초등학교: 1420
Other values (2): 2796

Length

Max length: 14
Median length: 11
Mean length: 8.1515
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대학원(석박사 과정)
2nd row: 초등학교
3rd row: 받지 않았음(미취학 포함)
4th row: 대학교(4년제 이상)
5th row: 초등학교

Common Values

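Value | Count | Frequency (%)
대학교(2,3년제) [college, 2- or 3-year] | 1460 | 14.6%
대학교(4년제 이상) [university, 4 years or more] | 1458 | 14.6%
대학원(석박사 과정) [graduate school, master's and doctoral programs] | 1438 | 14.4%
고등학교 [high school] | 1428 | 14.3%
초등학교 [elementary school] | 1420 | 14.2%
중학교 [middle school] | 1407 | 14.1%
받지 않았음(미취학 포함) [no schooling, including children not yet enrolled] | 1389 | 13.9%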

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
대학교(2,3년제 | 1460 | 9.3%
대학교(4년제 | 1458 | 9.3%
이상 | 1458 | 9.3%
대학원(석박사 | 1438 | 9.2%
과정 | 1438 | 9.2%
고등학교 | 1428 | 9.1%
초등학교 | 1420 | 9.1%
중학교 | 1407 | 9.0%
받지 | 1389 | 8.9%
않았음(미취학 | 1389 | 8.9%

2020 년 [year 2020]
Real number (ℝ)

Alert: Missing

Distinct: 1340
Distinct (%): 19.1%
Missing: 2986
Missing (%): 29.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 770.96393
Minimum: 1
Maximum: 154893
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 29
Median: 117
Q3: 359
95-th percentile: 1373.1
Maximum: 154893
Range: 154892
Interquartile range (IQR): 330

Descriptive statistics

Standard deviation: 5040.2266
Coefficient of variation (CV): 6.5375647
Kurtosis: 327.715
Mean: 770.96393
Median Absolute Deviation (MAD): 104
Skewness: 16.162914
Sum: 5407541
Variance: 25403884
Monotonicity: Not monotonic
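
Several of these statistics are linked by simple identities, which gives a quick consistency check on the reported values. The sketch below assumes the mean is taken over non-missing observations only; the variable names are illustrative.

```python
# Figures copied from the report for the 2020 년 variable.
n_observations = 10_000
n_missing = 2_986
total = 5_407_541              # Sum
std = 5040.2266                # Standard deviation
q1, q3 = 29, 359               # Quartiles
minimum, maximum = 1, 154_893

# Mean over non-missing values only (assumption).
n_valid = n_observations - n_missing
mean = total / n_valid

variance = std ** 2            # Variance is the squared standard deviation.
cv = std / mean                # Coefficient of variation.
iqr = q3 - q1                  # Interquartile range.
value_range = maximum - minimum

print(f"Valid observations: {n_valid}")
print(f"Mean: {mean:.5f}")
print(f"Variance: {variance:.0f}")
print(f"CV: {cv:.4f}")
print(f"IQR: {iqr}, Range: {value_range}")
```

A CV above 6 together with a median of 117 against a maximum of 154893 points to a heavily right-skewed distribution, consistent with the reported skewness and kurtosis.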

[Figure: histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
7 | 111 | 1.1%
8 | 88 | 0.9%
6 | 82 | 0.8%
12 | 82 | 0.8%
9 | 80 | 0.8%
4 | 78 | 0.8%
5 | 76 | 0.8%
2 | 76 | 0.8%
14 | 69 | 0.7%
11 | 63 | 0.6%
Other values (1330) | 6209 | 62.1%
(Missing) | 2986 | 29.9%

Smallest 10 values
Value | Count | Frequency (%)
1 | 56 | 0.6%
2 | 76 | 0.8%
3 | 63 | 0.6%
4 | 78 | 0.8%
5 | 76 | 0.8%
6 | 82 | 0.8%
7 | 111 | 1.1%
8 | 88 | 0.9%
9 | 80 | 0.8%
10 | 52 | 0.5%

Largest 10 values
Value | Count | Frequency (%)
154893 | 1 | < 0.1%
128364 | 1 | < 0.1%
97154 | 1 | < 0.1%
94379 | 1 | < 0.1%
94088 | 1 | < 0.1%
90342 | 1 | < 0.1%
86988 | 1 | < 0.1%
86622 | 1 | < 0.1%
85060 | 1 | < 0.1%
73890 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]
Variable | 성별 | 연령별 | 교육정도별 | 2020 년
성별 | 1.000 | 0.000 | 0.000 | 0.000
연령별 | 0.000 | 1.000 | 0.000 | 0.067
교육정도별 | 0.000 | 0.000 | 1.000 | 0.060
2020 년 | 0.000 | 0.067 | 0.060 | 1.000

[Figure: correlation heatmap]
Variable | 연령별 | 성별 | 교육정도별
연령별 | 1.000 | 0.000 | 0.000
성별 | 0.000 | 1.000 | 0.000
교육정도별 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]
Variable | 2020 년 | 성별 | 연령별 | 교육정도별
2020 년 | 1.000 | 0.000 | 0.033 | 0.031
성별 | 0.000 | 1.000 | 0.000 | 0.000
연령별 | 0.033 | 0.000 | 1.000 | 0.000
교육정도별 | 0.031 | 0.000 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

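Row index | 행정구역별(동읍면) | 성별 | 연령별 | 교육정도별 | 2020 년
11786 | 가정1동 | 남자 | 30-39세 | 대학원(석박사 과정) | 136
11144 | 계산4동 | 여자 | 6-9세 | 초등학교 | 348
3443 | 옥련1동 | 여자 | 30-39세 | 받지 않았음(미취학 포함) | <NA>
6878 | 논현1동 | 남자 | 60-69세 | 대학교(4년제 이상) | 256
7392 | 부평구 | 남자 | 6-9세 | 초등학교 | 5942
1611 | 운서동 | 남자 | 60-69세 | 중학교 | 102
17476 | 양사면 | 남자 | 6-9세 | 대학교(4년제 이상) | <NA>
10709 | 작전1동 | 여자 | 10-19세 | 받지 않았음(미취학 포함) | <NA>
10537 | 계산3동 | 남자 | 10-19세 | 고등학교 | 363
4438 | 옥련2동 | 여자 | 20-29세 | 초등학교 | <NA>

Row index | 행정구역별(동읍면) | 성별 | 연령별 | 교육정도별 | 2020 년
5855 | 간석3동 | 남자 | 40-49세 | 대학교(2,3년제) | 372
8848 | 갈산1동 | 남자 | 6-9세 | 초등학교 | 157
16496 | 강화군 | 남자 | 40-49세 | 대학교(4년제 이상) | 866
12861 | 가좌4동 | 여자 | 50-59세 | 고등학교 | 609
14148 | 숭의2동 | 남자 | 50-59세 | 중학교 | 150
17736 | 교동면 | 남자 | 50-59세 | 대학원(석박사 과정) | 11
1088 | 율목동 | 여자 | 30-39세 | 대학교(2,3년제) | 42
8023 | 부평5동 | 여자 | 20-29세 | 중학교 | 22
13505 | 불로대곡동 | 여자 | 10-19세 | 고등학교 | 456
1567 | 용유동 | 여자 | 70세 이상 | 받지 않았음(미취학 포함) | 60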