Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 8354
Missing cells (%): 11.9%
Duplicate rows: 173
Duplicate rows (%): 1.7%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
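
The percentages and the average record size in this table follow from the raw counts. A minimal Python sketch of that arithmetic, using only figures from the table (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 7
n_observations = 10_000
missing_cells = 8_354
duplicate_rows = 173
total_size_kib = 654.3

# Share of missing cells over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of duplicate rows over all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```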

Variable types

Text: 3
Numeric: 3
DateTime: 1

Dataset

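Description: Map information drawn from the metadata of the National Geographic Information Institute's digital maps (digital topographic maps): map sheet number, map sheet name, survey year, production year, notice number, and so on.
Author: 국토교통부 국토지리정보원 (Ministry of Land, Infrastructure and Transport, National Geographic Information Institute)
URL: https://www.data.go.kr/data/15067685/fileData.do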

Alerts

Dataset has 173 (1.7%) duplicate rows [Duplicates]
사업_번호 is highly correlated overall with 조사_연도 and 제작_연도 [High correlation]
조사_연도 is highly correlated overall with 사업_번호 and 제작_연도 [High correlation]
제작_연도 is highly correlated overall with 사업_번호 and 조사_연도 [High correlation]
도엽_명 has 115 (1.1%) missing values [Missing]
사업_번호 has 150 (1.5%) missing values [Missing]
조사_연도 has 2685 (26.9%) missing values [Missing]
고시_번호 has 2694 (26.9%) missing values [Missing]
고시_일자 has 2694 (26.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 10:06:46.278411
Analysis finished: 2023-12-12 10:06:48.849697
Duration: 2.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
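
The reported duration is the difference between the two timestamps. A small sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-12-12 10:06:46.278411")
finished = datetime.fromisoformat("2023-12-12 10:06:48.849697")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```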

Variables

도엽_번호
Text

Distinct: 9007
Distinct (%): 90.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 8
Mean length: 8.3038
Min length: 2

Characters and Unicode

Total characters: 83038
Distinct characters: 24
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
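
As a rough illustration of the category counts reported in this section, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the example string is one of the sample values shown elsewhere in this report. `unicodedata` exposes the general category (`Nd` for Decimal Number, `Lo` for Other Letter, `Lu` for Uppercase Letter) but not the script or block properties, so those are not covered here.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# Two Hangul syllables followed by four digits.
counts = category_counts("광주1299")
print(counts)
```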

Unique

Unique: 8065
Unique (%): 80.7%

Sample

1st row: 37801032
2nd row: 356161299
3rd row: 37711064
4th row: 368051490
5th row: 36712075
Value                 Count  Frequency (%)
35616067                  4  < 0.1%
37705038                  3  < 0.1%
37913033                  3  < 0.1%
36701083                  3  < 0.1%
34703035                  3  < 0.1%
35611041                  3  < 0.1%
37701070                  3  < 0.1%
37712046                  3  < 0.1%
37610047                  3  < 0.1%
37603088                  3  < 0.1%
Other values (8997)    9969  99.7%

Most occurring characters

Value                 Count  Frequency (%)
0                     15932  19.2%
3                     14266  17.2%
1                      9698  11.7%
7                      8932  10.8%
6                      8421  10.1%
5                      6803  8.2%
8                      5848  7.0%
4                      4553  5.5%
2                      4483  5.4%
9                      4088  4.9%
Other values (14)        14  < 0.1%

Most occurring categories

Value                 Count  Frequency (%)
Decimal Number        83024  > 99.9%
Other Letter             10  < 0.1%
Uppercase Letter          4  < 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 15932
19.2%
3 14266
17.2%
1 9698
11.7%
7 8932
10.8%
6 8421
10.1%
5 6803
8.2%
8 5848
 
7.0%
4 4553
 
5.5%
2 4483
 
5.4%
9 4088
 
4.9%
Other Letter
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Uppercase Letter
ValueCountFrequency (%)
N 1
25.0%
E 1
25.0%
J 1
25.0%
I 1
25.0%

Most occurring scripts

Value                 Count  Frequency (%)
Common                83024  > 99.9%
Hangul                   10  < 0.1%
Latin                     4  < 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 15932
19.2%
3 14266
17.2%
1 9698
11.7%
7 8932
10.8%
6 8421
10.1%
5 6803
8.2%
8 5848
 
7.0%
4 4553
 
5.5%
2 4483
 
5.4%
9 4088
 
4.9%
Hangul
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Latin
ValueCountFrequency (%)
N 1
25.0%
E 1
25.0%
J 1
25.0%
I 1
25.0%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 83028  > 99.9%
Hangul                   10  < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 15932
19.2%
3 14266
17.2%
1 9698
11.7%
7 8932
10.8%
6 8421
10.1%
5 6803
8.2%
8 5848
 
7.0%
4 4553
 
5.5%
2 4483
 
5.4%
9 4088
 
4.9%
Other values (4) 4
 
< 0.1%
Hangul
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%

도엽_명
Text

MISSING 

Distinct: 8288
Distinct (%): 83.8%
Missing: 115
Missing (%): 1.1%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 5
Mean length: 5.0311583
Min length: 2

Characters and Unicode

Total characters: 49733
Distinct characters: 200
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
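
The mean length reported for 도엽_명 is consistent with dividing the total character count by the number of non-missing values. The sketch below checks this with the figures from this section. That the report averages over non-missing rows only is an inference from the numbers, not something the report states.

```python
# Figures copied from the 도엽_명 section and the dataset statistics.
n_observations = 10_000
missing = 115
total_characters = 49_733

# Assumption: lengths are averaged over non-missing values only.
non_missing = n_observations - missing
mean_length = total_characters / non_missing

print(non_missing)
print(round(mean_length, 7))
```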

Unique

Unique: 7425
Unique (%): 75.1%

Sample

1st row: 어론
2nd row: 광주1299
3rd row: 여주064
4th row: 문경1490
5th row: 관기075
Value                 Count  Frequency (%)
창원                     26  0.3%
부산                     20  0.2%
구정                     17  0.2%
운봉                     16  0.2%
마산                     15  0.2%
정읍                     14  0.1%
평창                     14  0.1%
왜관                     14  0.1%
진주                     14  0.1%
대구                     13  0.1%
Other values (8278)    9723  98.4%

Most occurring characters

Value                 Count  Frequency (%)
0                      8355  16.8%
1                      3664  7.4%
2                      2999  6.0%
3                      2352  4.7%
4                      2215  4.5%
5                      2063  4.1%
8                      2047  4.1%
6                      2003  4.0%
7                      1986  4.0%
9                      1964  3.9%
Other values (190)    20085  40.4%

Most occurring categories

Value                 Count  Frequency (%)
Decimal Number        29648  59.6%
Other Letter          20083  40.4%
Space Separator           2  < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1185
 
5.9%
975
 
4.9%
958
 
4.8%
898
 
4.5%
628
 
3.1%
560
 
2.8%
506
 
2.5%
484
 
2.4%
454
 
2.3%
424
 
2.1%
Other values (179) 13011
64.8%
Decimal Number
ValueCountFrequency (%)
0 8355
28.2%
1 3664
12.4%
2 2999
 
10.1%
3 2352
 
7.9%
4 2215
 
7.5%
5 2063
 
7.0%
8 2047
 
6.9%
6 2003
 
6.8%
7 1986
 
6.7%
9 1964
 
6.6%
Space Separator
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

Value                 Count  Frequency (%)
Common                29650  59.6%
Hangul                20083  40.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1185
 
5.9%
975
 
4.9%
958
 
4.8%
898
 
4.5%
628
 
3.1%
560
 
2.8%
506
 
2.5%
484
 
2.4%
454
 
2.3%
424
 
2.1%
Other values (179) 13011
64.8%
Common
ValueCountFrequency (%)
0 8355
28.2%
1 3664
12.4%
2 2999
 
10.1%
3 2352
 
7.9%
4 2215
 
7.5%
5 2063
 
7.0%
8 2047
 
6.9%
6 2003
 
6.8%
7 1986
 
6.7%
9 1964
 
6.6%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 29650  59.6%
Hangul                20083  40.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 8355
28.2%
1 3664
12.4%
2 2999
 
10.1%
3 2352
 
7.9%
4 2215
 
7.5%
5 2063
 
7.0%
8 2047
 
6.9%
6 2003
 
6.8%
7 1986
 
6.7%
9 1964
 
6.6%
Hangul
ValueCountFrequency (%)
1185
 
5.9%
975
 
4.9%
958
 
4.8%
898
 
4.5%
628
 
3.1%
560
 
2.8%
506
 
2.5%
484
 
2.4%
454
 
2.3%
424
 
2.1%
Other values (179) 13011
64.8%

사업_번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 248
Distinct (%): 2.5%
Missing: 150
Missing (%): 1.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2923163 × 10^12
Minimum: 1.99806 × 10^9
Maximum: 2.02206 × 10^13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1.99806 × 10^9
5-th percentile: 2.00012 × 10^9
Q1: 2.00406 × 10^9
Median: 2.00803 × 10^9
Q3: 2.01603 × 10^9
95-th percentile: 2.02205 × 10^13
Maximum: 2.02206 × 10^13
Range: 2.0218602 × 10^13
Interquartile range (IQR): 11970002

Descriptive statistics

Standard deviation: 7.4634549 × 10^12
Coefficient of variation (CV): 2.2669313
Kurtosis: 1.340392
Mean: 3.2923163 × 10^12
Median Absolute Deviation (MAD): 4910004
Skewness: 1.8275995
Sum: 3.2429316 × 10^16
Variance: 5.5703158 × 10^25
Monotonicity: Not monotonic
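
Several of the 사업_번호 statistics are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range is the maximum minus the minimum. The sketch below recomputes them from the rounded figures printed in this section, so the results agree with the report only up to rounding.

```python
# Rounded figures copied from the 사업_번호 section.
mean = 3.2923163e12
std = 7.4634549e12
minimum = 1.99806e9
maximum = 2.02206e13

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance as squared standard deviation
value_range = maximum - minimum  # range

print(f"CV: {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Range: {value_range:.7e}")
```

The mean lies several orders of magnitude above the median because the column mixes 10-digit values (for example 2013020001) with 14-digit values (for example 20220500000000), as the frequency tables in this section show.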
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
20220500000000         1007  10.1%
2013020001              598  6.0%
2003120001              596  6.0%
2000120001              530  5.3%
20220600000000          480  4.8%
2008050001              389  3.9%
2012020001              294  2.9%
2004110005              287  2.9%
2001010004              213  2.1%
2004120007              185  1.8%
Other values (238)     5271  52.7%

Value                 Count  Frequency (%)
1998060002                1  < 0.1%
2000120001              530  5.3%
2000120003               83  0.8%
2001010003                7  0.1%
2001010004              213  2.1%
2001090002                3  < 0.1%
2002040004              102  1.0%
2002080001              152  1.5%
2003030001                6  0.1%
2003030002               30  0.3%

Value                 Count  Frequency (%)
20220600000000          480  4.8%
20220500000000         1007  10.1%
20220200000000           45  0.4%
20210100000000           71  0.7%
2022050001               23  0.2%
2022040001               21  0.2%
2022030001               38  0.4%
2022020001                9  0.1%
2022010002               26  0.3%
2022010001                1  < 0.1%

조사_연도
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 22
Distinct (%): 0.3%
Missing: 2685
Missing (%): 26.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2012.4427
Minimum: 1998
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1998
5-th percentile: 2004
Q1: 2006
Median: 2011
Q3: 2020
95-th percentile: 2022
Maximum: 2022
Range: 24
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 6.6999309
Coefficient of variation (CV): 0.0033292531
Kurtosis: -1.4331048
Mean: 2012.4427
Median Absolute Deviation (MAD): 5
Skewness: 0.29056498
Sum: 14721018
Variance: 44.889075
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=22)
Value                 Count  Frequency (%)
2022                   1532  15.3%
2008                    908  9.1%
2006                    734  7.3%
2012                    714  7.1%
2005                    555  5.5%
2007                    433  4.3%
2004                    398  4.0%
2011                    334  3.3%
2015                    273  2.7%
2021                    255  2.5%
Other values (12)      1179  11.8%
(Missing)              2685  26.9%

Value                 Count  Frequency (%)
1998                      3  < 0.1%
2002                    102  1.0%
2003                     97  1.0%
2004                    398  4.0%
2005                    555  5.5%
2006                    734  7.3%
2007                    433  4.3%
2008                    908  9.1%
2009                     66  0.7%
2010                     39  0.4%

Value                 Count  Frequency (%)
2022                   1532  15.3%
2021                    255  2.5%
2020                    180  1.8%
2019                    159  1.6%
2018                    126  1.3%
2017                    156  1.6%
2016                     72  0.7%
2015                    273  2.7%
2014                     63  0.6%
2013                    116  1.2%
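
As a consistency check on the first 조사_연도 frequency table (the one that includes the "Other values" and "(Missing)" rows), the counts should add up to the 10000 observations, and each percentage is the count divided by that total. A short sketch with the figures from that table:

```python
# Counts copied from the first 조사_연도 frequency table.
top_counts = {
    2022: 1532, 2008: 908, 2006: 734, 2012: 714, 2005: 555,
    2007: 433, 2004: 398, 2011: 334, 2015: 273, 2021: 255,
}
other_values = 1179
missing = 2685

# All rows together should cover every observation.
total = sum(top_counts.values()) + other_values + missing

# Frequency of the most common value as a percentage of all rows.
share_2022 = 100 * top_counts[2022] / total

print(total)
print(f"{share_2022:.1f}%")
```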

제작_연도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 39
Distinct (%): 0.4%
Missing: 16
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.8254
Minimum: 1984
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1984
5-th percentile: 2000
Q1: 2004
Median: 2008
Q3: 2015
95-th percentile: 2022
Maximum: 2022
Range: 38
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 7.2634545
Coefficient of variation (CV): 0.0036139729
Kurtosis: -1.012571
Mean: 2009.8254
Median Absolute Deviation (MAD): 4
Skewness: 0.54058204
Sum: 20066097
Variance: 52.757771
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=39)
Value                 Count  Frequency (%)
2022.0                 1532  15.3%
2004.0                 1176  11.8%
2008.0                  946  9.5%
2000.0                  826  8.3%
2006.0                  793  7.9%
2003.0                  693  6.9%
2012.0                  665  6.7%
2007.0                  635  6.3%
2005.0                  568  5.7%
2011.0                  313  3.1%
Other values (29)      1837  18.4%

Value                 Count  Frequency (%)
1984.0                    2  < 0.1%
1991.0                    2  < 0.1%
1996.0                    1  < 0.1%
1999.0                    4  < 0.1%
2000.0                  826  8.3%
2001.0                   12  0.1%
2002.0                  254  2.5%
2003.0                  693  6.9%
2004.0                 1176  11.8%
2005.0                  568  5.7%

Value                 Count  Frequency (%)
2022.0                 1532  15.3%
2021.0                  246  2.5%
2020.0                  180  1.8%
2019.0                  159  1.6%
2018.0                  126  1.3%
2017.0                  156  1.6%
2016.0                   72  0.7%
2015.0                  273  2.7%
2014.0                   63  0.6%
2013.0                  116  1.2%

고시_번호
Text

MISSING 

Distinct: 138
Distinct (%): 1.9%
Missing: 2694
Missing (%): 26.9%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 8
Mean length: 8.3794142
Min length: 6

Characters and Unicode

Total characters: 61220
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.1%

Sample

1st row: 2012-1633
2nd row: 2008-111
3rd row: 2005-653
4th row: 2008-875
5th row: 2023-691
Value                 Count  Frequency (%)
2012-1633               598  8.2%
2009-115                389  5.3%
2007-37                 386  5.3%
2011-1080               294  4.0%
2008-875                276  3.8%
2006-755                263  3.6%
2023-0902               245  3.4%
2008-111                244  3.3%
2004-740                228  3.1%
2005-840                225  3.1%
Other values (128)     4158  56.9%

Most occurring characters

Value                 Count  Frequency (%)
0                     14038  22.9%
2                     12490  20.4%
-                      7306  11.9%
1                      6933  11.3%
3                      4828  7.9%
5                      3238  5.3%
7                      3150  5.1%
8                      2562  4.2%
9                      2239  3.7%
4                      2218  3.6%
Other values (4)       2218  3.6%

Most occurring categories

Value                 Count  Frequency (%)
Decimal Number        53902  88.0%
Dash Punctuation       7306  11.9%
Lowercase Letter          8  < 0.1%
Uppercase Letter          4  < 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 14038
26.0%
2 12490
23.2%
1 6933
12.9%
3 4828
 
9.0%
5 3238
 
6.0%
7 3150
 
5.8%
8 2562
 
4.8%
9 2239
 
4.2%
4 2218
 
4.1%
6 2206
 
4.1%
Lowercase Letter
ValueCountFrequency (%)
u 4
50.0%
n 4
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 7306
100.0%
Uppercase Letter
ValueCountFrequency (%)
J 4
100.0%

Most occurring scripts

Value                 Count  Frequency (%)
Common                61208  > 99.9%
Latin                    12  < 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 14038
22.9%
2 12490
20.4%
- 7306
11.9%
1 6933
11.3%
3 4828
 
7.9%
5 3238
 
5.3%
7 3150
 
5.1%
8 2562
 
4.2%
9 2239
 
3.7%
4 2218
 
3.6%
Latin
ValueCountFrequency (%)
J 4
33.3%
u 4
33.3%
n 4
33.3%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 61220  100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 14038
22.9%
2 12490
20.4%
- 7306
11.9%
1 6933
11.3%
3 4828
 
7.9%
5 3238
 
5.3%
7 3150
 
5.1%
8 2562
 
4.2%
9 2239
 
3.7%
4 2218
 
3.6%
Other values (4) 2218
 
3.6%

고시_일자
Date

MISSING 

Distinct: 133
Distinct (%): 1.8%
Missing: 2694
Missing (%): 26.9%
Memory size: 156.2 KiB
Minimum: 2003-03-13 00:00:00
Maximum: 2023-03-27 00:00:00
Histogram with fixed size bins (bins=50)

Interactions

[Figure: interaction plots, 9 panels]

Correlations

            사업_번호  조사_연도  제작_연도
사업_번호      1.000      0.849      0.971
조사_연도      0.849      1.000      0.913
제작_연도      0.971      0.913      1.000

            사업_번호  조사_연도  제작_연도
사업_번호      1.000      0.995      0.996
조사_연도      0.995      1.000      1.000
제작_연도      0.996      1.000      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
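
To make the idea of nullity correlation concrete, the sketch below computes a Pearson correlation between 0/1 missingness indicators for two columns. It assumes the heatmap is based on Pearson correlation of such indicators, which the report does not state. The function name and the small indicator lists are illustrative; the lists mirror the pattern of `<NA>` values in the first ten sample rows of this report, where 조사_연도 and 고시_번호 are missing in the same rows.

```python
from math import sqrt


def nullity_correlation(a, b):
    """Pearson correlation between two equal-length 0/1 missingness indicators.

    Returns None when either column is never (or always) missing,
    because the correlation is undefined for a constant indicator.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# 1 marks a missing value, 0 a present value (illustrative data).
survey_year_missing = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
notice_number_missing = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
sheet_number_missing = [0] * 10

r_same = nullity_correlation(survey_year_missing, notice_number_missing)
r_constant = nullity_correlation(survey_year_missing, sheet_number_missing)
print(r_same, r_constant)
```

A value near 1 means the two columns tend to be missing in the same rows, which matches the near-identical missing counts reported for 조사_연도 (2685), 고시_번호 (2694) and 고시_일자 (2694).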

Sample

(index)  도엽_번호   도엽_명    사업_번호        조사_연도  제작_연도  고시_번호   고시_일자
3313     37801032    어론       2013020001       2012       2012.0     2012-1633   2012-12-27
89812    356161299   광주1299   2001010004       <NA>       2000.0     <NA>        <NA>
53094    37711064    여주064    2007030011       2007       2007.0     2008-111    2008-01-31
57639    368051490   문경1490   2005030011       2005       2005.0     2005-653    2005-10-20
17355    36712075    관기075    2003120001       <NA>       2003.0     <NA>        <NA>
77817    359130984   부산0984   2008030006       2008       2008.0     2008-875    2008-12-30
45428    37913073    장성073    20220500000000   2022       2022.0     2023-691    2023-02-07
6346     34601030    자은       2013020001       2012       2012.0     2012-1633   2012-12-27
66314    37802035    현리       2013020001       2012       2012.0     2012-1633   2012-12-27
60063    358021560   왜관1560   2017120006       2017       2017.0     2017-4002   2017-12-21

(index)  도엽_번호   도엽_명    사업_번호        조사_연도  제작_연도  고시_번호   고시_일자
18003    36612079    청양079    2003120001       <NA>       2003.0     <NA>        <NA>
59101    359091388   양산1388   2005100001       2006       2006.0     2006-353    2006-06-07
69044    36803015    영주015    2006030008       2006       2006.0     2007-37     2007-01-16
29595    35806059    창녕059    20220500000000   2022       2022.0     2023-0719   2023-02-08
7077     35905072    언양       2013020001       2012       2012.0     2012-1633   2012-12-27
8696     36702007    진천007    2004040021       <NA>       2004.0     <NA>        <NA>
18421    35809016    삼가016    2003120001       <NA>       2003.0     <NA>        <NA>
81652    356040199   익산0199   2006040011       2006       2006.0     2006-755    2006-12-29
42200    347030772   광양0772   2006040008       2006       2006.0     2006-755    2006-12-29
9517     35707009    함양       2012020001       2011       2011.0     2011-1080   2011-12-26

Duplicate rows

Most frequently occurring

(index)  도엽_번호   도엽_명    사업_번호    조사_연도  제작_연도  고시_번호   고시_일자    # duplicates
0        336062073   한림2073   2016030003   2015       2015.0     2015-2844   2015-12-21   2
1        336071238   제주1238   2016030003   2015       2015.0     2015-2844   2015-12-21   2
2        336071240   제주1240   2016030003   2015       2015.0     2015-2844   2015-12-21   2
3        336071264   제주1264   2016030003   2015       2015.0     2015-2844   2015-12-21   2
4        336071627   제주1627   2013120007   2013       2013.0     2013-2233   2013-12-31   2
5        336080696   성산0696   2017030001   2016       2016.0     2017-743    2017-02-28   2
6        336111171   서귀1171   2013120007   2013       2013.0     2013-2233   2013-12-31   2
7        336111385   서귀1385   2017120006   2017       2017.0     2017-4002   2017-12-21   2
8        336111561   서귀1561   2017120006   2017       2017.0     2017-4002   2017-12-21   2
9        336111602   서귀1602   2013120007   2013       2013.0     2013-2233   2013-12-31   2
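
A table like the one above can be produced by counting identical rows. The sketch below does this with the standard library on a three-row toy list built from values shown in this report. It is only an illustration of the idea, not the profiler's own implementation, and whether the report's figure of 173 counts duplicated groups or surplus copies is not stated here, so both are computed.

```python
from collections import Counter

# Toy data: the first two rows are identical, the third is distinct.
rows = [
    ("336062073", "한림2073", 2016030003, 2015, 2015.0, "2015-2844", "2015-12-21"),
    ("336062073", "한림2073", 2016030003, 2015, 2015.0, "2015-2844", "2015-12-21"),
    ("37801032", "어론", 2013020001, 2012, 2012.0, "2012-1633", "2012-12-27"),
]

# Count how often each full row occurs.
occurrences = Counter(rows)

# Rows that occur more than once, with their "# duplicates" count.
duplicated = {row: n for row, n in occurrences.items() if n > 1}

n_groups = len(duplicated)                           # distinct duplicated rows
n_surplus = sum(n - 1 for n in duplicated.values())  # copies beyond the first

for row, n in duplicated.items():
    print(row[0], row[1], n)
print(n_groups, n_surplus)
```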