Overview

Dataset statistics

Number of variables: 9
Number of observations: 10000
Missing cells: 3019
Missing cells (%): 3.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 820.3 KiB
Average record size in memory: 84.0 B
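The summary figures can be cross-checked with simple arithmetic. The sketch below assumes the missing percentage is missing cells divided by total cells (variables × observations), and that the average record size is total memory divided by the number of observations. The per-column missing counts are the ones this report lists for each variable.

```python
# Cross-check of the dataset statistics, using only figures from this report.
n_variables = 9
n_observations = 10_000
missing_cells = 3019
total_memory_kib = 820.3

# Per-column missing counts as reported for each variable.
missing_by_column = {
    "도엽명": 876,
    "고시일자": 517,
    "조사연도": 521,
    "제작연도": 584,
    "고시번호": 521,
}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
print("Per-column counts sum to total:",
      sum(missing_by_column.values()) == missing_cells)
```

Under these assumptions the recomputed values agree with the reported 3.4% and 84.0 B, and the five per-column counts add up to the 3019 missing cells.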

Variable types

Text: 3
Categorical: 3
DateTime: 1
Numeric: 2

Dataset

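Description: Notification (public notice) information drawn from the metadata for the digital maps (digital topographic maps) of the National Geographic Information Institute. Fields include map sheet number (도엽번호), map sheet name (도엽명), scale (축척), notification date (고시일자), and map type (지도종류), among others.
Author: National Geographic Information Institute, Ministry of Land, Infrastructure and Transport (국토교통부 국토지리정보원)
URL: https://www.data.go.kr/data/15067684/fileData.do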

Alerts

조사연도 is highly overall correlated with 제작연도 and 1 other field [High correlation]
제작연도 is highly overall correlated with 조사연도 and 1 other field [High correlation]
촬영연도 is highly overall correlated with 조사연도 and 1 other field [High correlation]
축척 is highly imbalanced (54.4%) [Imbalance]
도엽명 has 876 (8.8%) missing values [Missing]
고시일자 has 517 (5.2%) missing values [Missing]
조사연도 has 521 (5.2%) missing values [Missing]
제작연도 has 584 (5.8%) missing values [Missing]
고시번호 has 521 (5.2%) missing values [Missing]
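The 54.4% imbalance figure for 축척 can be reproduced from the category counts given in that variable's section. The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalised by the maximum entropy for that number of classes. The report does not state this formula; it is an assumption, and the helper name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """1 - H / H_max, where H is the Shannon entropy (base 2) of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))

# Category counts for 축척 and 지도종류 as listed in this report.
scale_counts = {"1000": 6025, "5000": 3858, "2500": 70, "25000": 41, "250000": 6}
map_type_counts = {"3": 5178, "0": 4822}

score = imbalance_score(list(scale_counts.values()))
print(f"축척 imbalance: {score:.1%}")
print(f"지도종류 imbalance: {imbalance_score(list(map_type_counts.values())):.1%}")
```

With this definition the 축척 counts give roughly 54.4%, in line with the alert, while the near-even split of 지도종류 scores close to zero, which fits the absence of an alert for it.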

Reproduction

Analysis started: 2023-12-12 19:08:37.195661
Analysis finished: 2023-12-12 19:08:39.458066
Duration: 2.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

도엽번호
Text

Distinct: 9296
Distinct (%): 93.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 8.5931
Min length: 6

Characters and Unicode

Total characters: 85931
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
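As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories over three values taken from the Sample sections of this report. It is not the profiler's own implementation, and the helper name `category_counts` is hypothetical. Script and block properties are not exposed by the standard library, so only categories are shown.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Lo') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Values copied from the Sample sections of this report
# (도엽번호, 도엽명, and 고시번호 respectively).
samples = ["34610049", "진도049", "2010-52"]
counts = category_counts(samples)

# Nd = Decimal Number, Lo = Other Letter, Pd = Dash Punctuation
for code, n in counts.most_common():
    print(code, n)
```

The two-letter codes correspond to the category names used in the tables: `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Lo` is Other Letter (which covers Hangul syllables), `Zs` is Space Separator, and `Pd` is Dash Punctuation.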

Unique

Unique: 8592
Unique (%): 85.9%

Sample

1st row: 34610049
2nd row: 37703072
3rd row: 359100699
4th row: 367101886
5th row: 34611079

Value                Count  Frequency (%)
377121765            2      < 0.1%
379140789            2      < 0.1%
37715043             2      < 0.1%
359100348            2      < 0.1%
368140110            2      < 0.1%
347030800            2      < 0.1%
377102575            2      < 0.1%
368151170            2      < 0.1%
367101968            2      < 0.1%
34711024             2      < 0.1%
Other values (9286)  9980   99.8%

Most occurring characters

Value              Count  Frequency (%)
3                  15073  17.5%
0                  14360  16.7%
1                  11559  13.5%
6                  8963   10.4%
7                  8468   9.9%
5                  6866   8.0%
8                  5724   6.7%
2                  5382   6.3%
9                  4753   5.5%
4                  4701   5.5%
Other values (12)  82     0.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    85849  99.9%
Uppercase Letter  82     0.1%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
I                 13     15.9%
H                 13     15.9%
G                 11     13.4%
E                 8      9.8%
J                 7      8.5%
N                 6      7.3%
D                 5      6.1%
C                 5      6.1%
A                 5      6.1%
B                 4      4.9%
Other values (2)  5      6.1%

Decimal Number
Value  Count  Frequency (%)
3      15073  17.6%
0      14360  16.7%
1      11559  13.5%
6      8963   10.4%
7      8468   9.9%
5      6866   8.0%
8      5724   6.7%
2      5382   6.3%
9      4753   5.5%
4      4701   5.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  85849  99.9%
Latin   82     0.1%

Most frequent character per script

Latin
Value             Count  Frequency (%)
I                 13     15.9%
H                 13     15.9%
G                 11     13.4%
E                 8      9.8%
J                 7      8.5%
N                 6      7.3%
D                 5      6.1%
C                 5      6.1%
A                 5      6.1%
B                 4      4.9%
Other values (2)  5      6.1%

Common
Value  Count  Frequency (%)
3      15073  17.6%
0      14360  16.7%
1      11559  13.5%
6      8963   10.4%
7      8468   9.9%
5      6866   8.0%
8      5724   6.7%
2      5382   6.3%
9      4753   5.5%
4      4701   5.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  85931  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
3                  15073  17.5%
0                  14360  16.7%
1                  11559  13.5%
6                  8963   10.4%
7                  8468   9.9%
5                  6866   8.0%
8                  5724   6.7%
2                  5382   6.3%
9                  4753   5.5%
4                  4701   5.5%
Other values (12)  82     0.1%

도엽명
Text

MISSING 

Distinct: 7287
Distinct (%): 79.9%
Missing: 876
Missing (%): 8.8%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 6
Mean length: 5.1794169
Min length: 2

Characters and Unicode

Total characters: 47257
Distinct characters: 191
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6673
Unique (%): 73.1%

Sample

1st row: 진도049
2nd row: 춘천
3rd row: 방어진0699
4th row: 대전1886
5th row: 봉평077

Value                Count  Frequency (%)
마산                 53     0.6%
남원                 42     0.5%
담양                 38     0.4%
장호원               38     0.4%
금산                 38     0.4%
고창                 37     0.4%
익산                 36     0.4%
거창                 36     0.4%
논산                 35     0.4%
무풍                 35     0.4%
Other values (7280)  8807   95.8%

Most occurring characters

Value               Count  Frequency (%)
0                   6146   13.0%
1                   4593   9.7%
2                   3363   7.1%
3                   2324   4.9%
7                   2129   4.5%
4                   2125   4.5%
5                   1979   4.2%
8                   1974   4.2%
9                   1972   4.2%
6                   1819   3.8%
Other values (181)  18833  39.9%

Most occurring categories

Value            Count  Frequency (%)
Decimal Number   28424  60.1%
Other Letter     18661  39.5%
Space Separator  172    0.4%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(not captured)      1471   7.9%
(not captured)      1139   6.1%
(not captured)      935    5.0%
(not captured)      751    4.0%
(not captured)      585    3.1%
(not captured)      525    2.8%
(not captured)      497    2.7%
(not captured)      485    2.6%
(not captured)      468    2.5%
(not captured)      455    2.4%
Other values (170)  11350  60.8%

Decimal Number
Value  Count  Frequency (%)
0      6146   21.6%
1      4593   16.2%
2      3363   11.8%
3      2324   8.2%
7      2129   7.5%
4      2125   7.5%
5      1979   7.0%
8      1974   6.9%
9      1972   6.9%
6      1819   6.4%

Space Separator
Value    Count  Frequency (%)
(space)  172    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  28596  60.5%
Hangul  18661  39.5%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(not captured)      1471   7.9%
(not captured)      1139   6.1%
(not captured)      935    5.0%
(not captured)      751    4.0%
(not captured)      585    3.1%
(not captured)      525    2.8%
(not captured)      497    2.7%
(not captured)      485    2.6%
(not captured)      468    2.5%
(not captured)      455    2.4%
Other values (170)  11350  60.8%

Common
Value  Count  Frequency (%)
0      6146   21.5%
1      4593   16.1%
2      3363   11.8%
3      2324   8.1%
7      2129   7.4%
4      2125   7.4%
5      1979   6.9%
8      1974   6.9%
9      1972   6.9%
6      1819   6.4%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   28596  60.5%
Hangul  18661  39.5%

Most frequent character per block

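ASCII
Value  Count  Frequency (%)
0      6146   21.5%
1      4593   16.1%
2      3363   11.8%
3      2324   8.1%
7      2129   7.4%
4      2125   7.4%
5      1979   6.9%
8      1974   6.9%
9      1972   6.9%
6      1819   6.4%

Hangul
Value               Count  Frequency (%)
(not captured)      1471   7.9%
(not captured)      1139   6.1%
(not captured)      935    5.0%
(not captured)      751    4.0%
(not captured)      585    3.1%
(not captured)      525    2.8%
(not captured)      497    2.7%
(not captured)      485    2.6%
(not captured)      468    2.5%
(not captured)      455    2.4%
Other values (170)  11350  60.8%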

축척
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.0053
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5000
2nd row: 5000
3rd row: 1000
4th row: 1000
5th row: 5000

Common Values

Value   Count  Frequency (%)
1000    6025   60.2%
5000    3858   38.6%
2500    70     0.7%
25000   41     0.4%
250000  6      0.1%

Length

[Figure: Histogram of lengths of the category]


고시일자
Date

MISSING 

Distinct: 58
Distinct (%): 0.6%
Missing: 517
Missing (%): 5.2%
Memory size: 156.2 KiB
Minimum: 1899-12-30 00:00:00
Maximum: 2012-02-27 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]
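The reported minimum of 1899-12-30 lies far from the 2004 to 2012 dates seen in the sample rows. That date is the day-zero of spreadsheet-style serial dates, so one possible explanation (an assumption, not something the report confirms) is that empty or zero cells were converted to dates during export. The sketch below shows the serial-date conversion and a simple plausibility filter. The bounds and the helper names `from_serial` and `is_plausible` are hypothetical.

```python
from datetime import date, timedelta

# Assumption: spreadsheet-style serial dates count days from 1899-12-30.
SERIAL_EPOCH = date(1899, 12, 30)

def from_serial(days):
    """Convert a serial day count to a calendar date."""
    return SERIAL_EPOCH + timedelta(days=days)

def is_plausible(d, earliest=date(2000, 1, 1), latest=date(2012, 12, 31)):
    """Flag dates outside a hypothetical plausible window for this dataset."""
    return earliest <= d <= latest

reported_min = date(1899, 12, 30)
reported_max = date(2012, 2, 27)

print(from_serial(0))              # a zero serial maps onto the reported minimum
print(is_plausible(reported_min))  # False: worth inspecting before analysis
print(is_plausible(reported_max))  # True
```

If the assumption holds, rows carrying the epoch date are better treated as missing than as real notification dates.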

지도종류
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
3      5178   51.8%
0      4822   48.2%

Length

[Figure: Histogram of lengths of the category]


촬영연도
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.9994
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2008
2nd row: 2010
3rd row: 2003
4th row: 2011
5th row: 2008

Common Values

Value             Count  Frequency (%)
2010              2137   21.4%
2008              1926   19.3%
2009              1518   15.2%
2005              1000   10.0%
2006              871    8.7%
2007              858    8.6%
<NA>              522    5.2%
2003              342    3.4%
2011              336    3.4%
2004              279    2.8%
Other values (2)  211    2.1%

Length

[Figure: Histogram of lengths of the category]

조사연도
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 10
Distinct (%): 0.1%
Missing: 521
Missing (%): 5.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.2298
Minimum: 2002
Maximum: 2011
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2002
5-th percentile: 2004
Q1: 2006
median: 2009
Q3: 2010
95-th percentile: 2011
Maximum: 2011
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.4329272
Coefficient of variation (CV): 0.0012114785
Kurtosis: -0.76722312
Mean: 2008.2298
Median Absolute Deviation (MAD): 2
Skewness: -0.60014076
Sum: 19036010
Variance: 5.9191347
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=10)]
Value      Count  Frequency (%)
2010       2066   20.7%
2011       1970   19.7%
2009       1120   11.2%
2006       943    9.4%
2007       914    9.1%
2005       834    8.3%
2008       832    8.3%
2004       397    4.0%
2003       324    3.2%
2002       79     0.8%
(Missing)  521    5.2%

Value  Count  Frequency (%)
2002   79     0.8%
2003   324    3.2%
2004   397    4.0%
2005   834    8.3%
2006   943    9.4%
2007   914    9.1%
2008   832    8.3%
2009   1120   11.2%
2010   2066   20.7%
2011   1970   19.7%

Value  Count  Frequency (%)
2011   1970   19.7%
2010   2066   20.7%
2009   1120   11.2%
2008   832    8.3%
2007   914    9.1%
2006   943    9.4%
2005   834    8.3%
2004   397    4.0%
2003   324    3.2%
2002   79     0.8%
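The descriptive statistics for 조사연도 can be recomputed from its frequency table alone. The sketch below assumes the reported variance is the sample variance (divisor n - 1) and that the coefficient of variation is the standard deviation divided by the mean. Both are assumptions about the profiler's conventions.

```python
import math

# Value counts for 조사연도 as listed in this report (missing rows excluded).
freq = {
    2002: 79, 2003: 324, 2004: 397, 2005: 834, 2006: 943,
    2007: 914, 2008: 832, 2009: 1120, 2010: 2066, 2011: 1970,
}

n = sum(freq.values())                        # non-missing observations
total = sum(v * c for v, c in freq.items())   # reported as "Sum"
mean = total / n

# Assumption: sample variance with divisor n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean

print(f"n={n} sum={total} mean={mean:.4f}")
print(f"variance={variance:.7f} std={std:.7f} cv={cv:.10f}")
```

The count comes to 9479, which is the 10000 observations minus the 521 missing values, and the sum, mean, and variance agree with the reported figures under these assumptions.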

제작연도
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 10
Distinct (%): 0.1%
Missing: 584
Missing (%): 5.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.2004
Minimum: 2002
Maximum: 2011
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2002
5-th percentile: 2004
Q1: 2006
median: 2009
Q3: 2010
95-th percentile: 2011
Maximum: 2011
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.4267678
Coefficient of variation (CV): 0.0012084291
Kurtosis: -0.77343126
Mean: 2008.2004
Median Absolute Deviation (MAD): 2
Skewness: -0.58622743
Sum: 18909215
Variance: 5.8892022
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=10)]
Value      Count  Frequency (%)
2010       2036   20.4%
2011       1892   18.9%
2009       1144   11.4%
2006       943    9.4%
2007       935    9.3%
2005       834    8.3%
2008       832    8.3%
2004       397    4.0%
2003       324    3.2%
2002       79     0.8%
(Missing)  584    5.8%

Value  Count  Frequency (%)
2002   79     0.8%
2003   324    3.2%
2004   397    4.0%
2005   834    8.3%
2006   943    9.4%
2007   935    9.3%
2008   832    8.3%
2009   1144   11.4%
2010   2036   20.4%
2011   1892   18.9%

Value  Count  Frequency (%)
2011   1892   18.9%
2010   2036   20.4%
2009   1144   11.4%
2008   832    8.3%
2007   935    9.3%
2006   943    9.4%
2005   834    8.3%
2004   397    4.0%
2003   324    3.2%
2002   79     0.8%

고시번호
Text

MISSING 

Distinct: 58
Distinct (%): 0.6%
Missing: 521
Missing (%): 5.2%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 8
Mean length: 8.0488448
Min length: 7

Characters and Unicode

Total characters: 76295
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2010-52
2nd row: 2011-1080
3rd row: 2005-124
4th row: 2012-260
5th row: 2010-52

Value              Count  Frequency (%)
2011-1080          1470   15.5%
2010-52            790    8.3%
2006-755           716    7.6%
2010-777           704    7.4%
2008-875           655    6.9%
2010-953           592    6.2%
2012-260           468    4.9%
2007-675           397    4.2%
2010-907           345    3.6%
2005-124           292    3.1%
Other values (48)  3050   32.2%

Most occurring characters

Value  Count  Frequency (%)
0      21111  27.7%
2      12210  16.0%
-      9479   12.4%
1      8810   11.5%
7      6283   8.2%
5      6064   7.9%
8      3941   5.2%
6      3150   4.1%
3      2616   3.4%
9      1488   2.0%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    66816  87.6%
Dash Punctuation  9479   12.4%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      21111  31.6%
2      12210  18.3%
1      8810   13.2%
7      6283   9.4%
5      6064   9.1%
8      3941   5.9%
6      3150   4.7%
3      2616   3.9%
9      1488   2.2%
4      1143   1.7%

Dash Punctuation
Value  Count  Frequency (%)
-      9479   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  76295  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      21111  27.7%
2      12210  16.0%
-      9479   12.4%
1      8810   11.5%
7      6283   8.2%
5      6064   7.9%
8      3941   5.2%
6      3150   4.1%
3      2616   3.4%
9      1488   2.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  76295  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      21111  27.7%
2      12210  16.0%
-      9479   12.4%
1      8810   11.5%
7      6283   8.2%
5      6064   7.9%
8      3941   5.2%
6      3150   4.1%
3      2616   3.4%
9      1488   2.0%

Interactions

[Figures: four interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap]

          축척   고시일자  지도종류  촬영연도  조사연도  제작연도  고시번호
축척      1.000  0.890     0.064     0.543     0.540     0.538     0.937
고시일자  0.890  1.000     0.289     0.987     0.999     0.999     1.000
지도종류  0.064  0.289     1.000     0.173     0.059     0.059     0.288
촬영연도  0.543  0.987     0.173     1.000     0.943     0.943     0.987
조사연도  0.540  0.999     0.059     0.943     1.000     1.000     1.000
제작연도  0.538  0.999     0.059     0.943     1.000     1.000     1.000
고시번호  0.937  1.000     0.288     0.987     1.000     1.000     1.000

[Figure: correlation heatmap]

          축척   촬영연도  지도종류
축척      1.000  0.337     0.078
촬영연도  0.337  1.000     0.166
지도종류  0.078  0.166     1.000

[Figure: correlation heatmap]

          조사연도  제작연도  축척   지도종류  촬영연도
조사연도  1.000     0.999     0.350  0.105     0.787
제작연도  0.999     1.000     0.348  0.106     0.787
축척      0.350     0.348     1.000  0.078     0.337
지도종류  0.105     0.106     0.078  1.000     0.166
촬영연도  0.787     0.787     0.337  0.166     1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

index  도엽번호   도엽명      축척  고시일자    지도종류  촬영연도  조사연도  제작연도  고시번호
6499   34610049   진도049     5000  2010-01-19  3         2008      2009      2009      2010-52
47680  37703072   춘천        5000  2011-12-26  0         2010      2011      2011      2011-1080
10690  359100699  방어진0699  1000  2005-02-04  0         2003      2004      2004      2005-124
8024   367101886  대전1886    1000  2012-02-27  0         2011      2011      2011      2012-260
965    34611079   <NA>        5000  2010-01-19  0         2008      2009      2009      2010-52
14604  37806077   봉평077     5000  2010-12-24  3         2009      2010      2010      2010-953
12834  376071978  김포1978    1000  2004-01-05  0         2002      2003      2003      2004-001
37127  356041886  익산1886    1000  2005-10-17  3         2005      2005      2005      2005-643
11603  359130382  부산0382    1000  2006-12-29  0         2005      2006      2006      2006-755
48025  35705073   갈담        5000  2011-12-26  3         2010      2011      2011      2011-1080

index  도엽번호   도엽명      축척  고시일자    지도종류  촬영연도  조사연도  제작연도  고시번호
23222  336061968  한림1968    1000  2012-02-27  0         2011      2011      2011      2012-260
12625  34612074   <NA>        5000  2010-01-19  0         2008      2009      2009      2010-52
29179  336102017  모슬포2017  1000  2006-12-29  0         2006      2006      2006      2006-755
25439  347030966  광양0966    1000  2010-11-09  0         2009      2010      2010      2010-777
19565  36702071   진천        5000  2011-12-26  0         2010      2011      2011      2011-1080
38200  368141835  구미1835    1000  2008-12-30  3         2008      2008      2008      2008-875
30839  359130176  부산0176    1000  2006-06-07  3         2005      2006      2006      2006-353
51357  35704037   무풍        5000  2011-12-26  3         2010      2011      2011      2011-1080
11101  367101310  대전1310    1000  2005-12-14  0         2005      2005      2005      2005-498
49465  36713090   논산        5000  2011-12-26  3         2010      2011      2011      2011-1080