Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 285
Duplicate rows (%): 2.9%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
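The duplicate-row and memory figures can be approximated with pandas. The sketch below uses a small hypothetical frame `df` (made-up rows that borrow this dataset's column names). `DataFrame.duplicated()` flags every repeat after the first occurrence, and dividing total memory by the row count gives a per-record average. The report does not state its exact duplicate-counting convention, so treat this as an approximation.

```python
import pandas as pd

# Hypothetical toy rows using this dataset's column names (not real data).
df = pd.DataFrame(
    {
        "대분류명": ["산업", "산업", "시설물", "산업"],
        "중분류명": ["서비스산업", "서비스산업", "도로시설", "서비스산업"],
        "위도": [126.6331, 126.6331, 126.629575, 126.6331],
        "경도": [36.888617, 36.888617, 36.891334, 36.888617],
    }
)

# True for each row that repeats an earlier row exactly (first occurrence is False).
dup_mask = df.duplicated()
n_duplicates = int(dup_mask.sum())
pct_duplicates = 100 * n_duplicates / len(df)

# deep=True includes the Python string objects held in object columns.
total_bytes = int(df.memory_usage(deep=True).sum())
avg_record_bytes = total_bytes / len(df)

print(n_duplicates, f"{pct_duplicates:.1f}%", f"{avg_record_bytes:.1f} B")
```

As a check on the reported figures: 566.4 × 1024 / 10000 ≈ 58.0 bytes per record, and 285 / 10000 = 2.85%, shown as 2.9%.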

Variable types

Categorical: 2
Text: 2
Numeric: 2

Dataset

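Description: Building information managed by the Dangjin City spatial information utilization system (당진시 공간정보활용시스템). The dataset provides fields such as major category, middle category, minor category, former lot-number address (구지번), latitude, and longitude.
Author: Dangjin City, Chungcheongnam-do (충청남도 당진시)
URL: https://www.data.go.kr/data/15091587/fileData.do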

Alerts

Dataset has 285 (2.9%) duplicate rows [Duplicates]
중분류명 is highly overall correlated with 대분류명 [High correlation]
대분류명 is highly overall correlated with 중분류명 [High correlation]
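Both correlation alerts are what one would expect if the categories form a hierarchy in which each 중분류명 belongs to exactly one 대분류명. A quick way to test that assumption with pandas is to count distinct parents per child. The rows below are hypothetical, built from values that appear in this report.

```python
import pandas as pd

# Hypothetical sample; column names follow this dataset.
df = pd.DataFrame(
    {
        "대분류명": ["산업", "산업", "시설물", "숙박및음식", "산업"],
        "중분류명": ["서비스산업", "제조산업", "도로시설", "음식점", "서비스산업"],
    }
)

# Number of distinct 대분류명 values seen for each 중분류명.
parents_per_child = df.groupby("중분류명")["대분류명"].nunique()

# If every child maps to a single parent, 중분류명 fully determines 대분류명,
# which pushes categorical association measures toward 1.
is_strict_hierarchy = bool((parents_per_child == 1).all())
print(is_strict_hierarchy)
```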

Reproduction

Analysis started: 2023-12-12 06:18:48.194149
Analysis finished: 2023-12-12 06:18:49.801089
Duration: 1.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
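To regenerate a comparable report, ydata-profiling's `ProfileReport` can be pointed at the same data. This is a sketch under assumptions: the CSV path and the `cp949` encoding are hypothetical (the download format from data.go.kr is not described here), `build_profile` is an illustrative helper name, and the downloaded config.json is not applied, so settings may differ from the run above.

```python
import pandas as pd


def build_profile(csv_path: str, out_html: str = "report.html") -> None:
    """Profile a CSV with ydata-profiling and write an HTML report.

    csv_path is a hypothetical local copy of the dataset; cp949 is an
    assumed encoding that is common for Korean public-data CSV files.
    """
    # Imported lazily so this module loads even where ydata-profiling
    # is not installed.
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding="cp949")
    report = ProfileReport(df, title="당진시 건물 정보")
    report.to_file(out_html)
```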

Variables

대분류명 (major category name)
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Top values: 시설물 (4138), 산업 (2582), 숙박및음식 (2016), 레저및관광및예술 (593), 교육및보건 (437)

Length

Max length: 8
Median length: 5
Mean length: 3.5757
Min length: 2
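These length statistics can be checked against the counts in the Common Values table. Weighting each category's length by its count reproduces the mean of 3.5757, while the reported median of 5 matches the median over the six distinct values rather than over all 10000 rows (which would be 3). The sketch below computes both; Python's `len` counts code points, so each Hangul syllable counts as one character.

```python
from statistics import median

# Counts taken from the Common Values table for 대분류명.
counts = {
    "시설물": 4138,
    "산업": 2582,
    "숙박및음식": 2016,
    "레저및관광및예술": 593,
    "교육및보건": 437,
    "공공및환경": 234,
}
n_rows = sum(counts.values())

# Row-weighted mean length: each value's length counted once per row.
mean_length = sum(len(value) * count for value, count in counts.items()) / n_rows

# Median over the distinct values only.
median_distinct = median(len(value) for value in counts)

# Median over every row, expanding each value by its count.
median_rows = median(len(value) for value, count in counts.items() for _ in range(count))

print(n_rows, round(mean_length, 4), median_distinct, median_rows)
```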

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 산업
2nd row: 산업
3rd row: 시설물
4th row: 시설물
5th row: 산업

Common Values

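| Value | Count | Frequency (%) |
| 시설물 (facilities) | 4138 | 41.4% |
| 산업 (industry) | 2582 | 25.8% |
| 숙박및음식 (accommodation and food) | 2016 | 20.2% |
| 레저및관광및예술 (leisure, tourism and arts) | 593 | 5.9% |
| 교육및보건 (education and health) | 437 | 4.4% |
| 공공및환경 (public and environment) | 234 | 2.3% |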

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed in the table above]

중분류명 (middle category name)
Categorical

HIGH CORRELATION 

Distinct: 21
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Top values: 도로시설 (2672), 음식점 (1948), 서비스산업 (1570), 안전시설 (951), 원시산업 (538); Other values (16): 2321

Length

Max length: 8
Median length: 4
Mean length: 3.9619
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 제조산업
2nd row: 서비스산업
3rd row: 안전시설
4th row: 도로시설
5th row: 서비스산업

Common Values

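| Value | Count | Frequency (%) |
| 도로시설 (road facilities) | 2672 | 26.7% |
| 음식점 (restaurants) | 1948 | 19.5% |
| 서비스산업 (service industry) | 1570 | 15.7% |
| 안전시설 (safety facilities) | 951 | 9.5% |
| 원시산업 (primary industry) | 538 | 5.4% |
| 관광지 (tourist attractions) | 506 | 5.1% |
| 제조산업 (manufacturing industry) | 464 | 4.6% |
| 편의시설 (amenities) | 281 | 2.8% |
| 교육시설 (educational facilities) | 235 | 2.4% |
| 보건시설 (health facilities) | 202 | 2.0% |
| Other values (11) | 633 | 6.3% |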
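The Common Values table keeps the ten most frequent categories and folds the rest into an "Other values (n)" row. A minimal pandas sketch of that layout follows; the helper name `top_k_with_other` is hypothetical, and the toy series stands in for the real column.

```python
import pandas as pd


def top_k_with_other(series: pd.Series, k: int = 10) -> pd.DataFrame:
    """Return the k most frequent values plus an 'Other values (n)' row."""
    counts = series.value_counts()  # sorted by count, descending
    top = counts.iloc[:k]
    rest = counts.iloc[k:]
    rows = list(top.items())
    if len(rest) > 0:
        rows.append((f"Other values ({len(rest)})", int(rest.sum())))
    table = pd.DataFrame(rows, columns=["Value", "Count"])
    # Percentage of all rows, rounded to one decimal like the report.
    table["Frequency (%)"] = (100 * table["Count"] / len(series)).round(1)
    return table


# Toy data, not the real column.
s = pd.Series(["도로시설"] * 5 + ["음식점"] * 3 + ["관광지", "교육시설"])
t = top_k_with_other(s, k=2)
print(t)
```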

Length

[Figure: Histogram of lengths of the category]

소분류명 (minor category name)
Text

Distinct: 94
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 5
Mean length: 4.9045
Min length: 1

Characters and Unicode

Total characters: 49045
Distinct characters: 156
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: 건설업
2nd row: 전문도매업
3rd row: 기타안전시설
4th row: 진출입시설
5th row: 자동차산업
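
Words

| Value | Count | Frequency (%) |
| 진출입시설 (entry/exit facilities) | 2147 | 21.5% |
| 일반음식점 (general restaurants) | 1300 | 13.0% |
| 가로등 (streetlights) | 575 | 5.8% |
| 농업및축업 (agriculture and livestock) | 529 | 5.3% |
| 전문소매업 (specialty retail) | 454 | 4.5% |
| 기타도로시설물 (other road facilities) | 392 | 3.9% |
| 기타안전시설 (other safety facilities) | 330 | 3.3% |
| 제조업 (manufacturing) | 312 | 3.1% |
| 골짜기및고개 (valleys and passes) | 301 | 3.0% |
| 기타서비스산업 (other service industries) | 288 | 2.9% |
| Other values (84) | 3372 | 33.7% |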

Most occurring characters

| Value | Count | Frequency (%) |
|  | 3839 | 7.8% |
|  | 3719 | 7.6% |
|  | 2908 | 5.9% |
|  | 2147 | 4.4% |
|  | 2147 | 4.4% |
|  | 2147 | 4.4% |
|  | 1758 | 3.6% |
|  | 1671 | 3.4% |
|  | 1431 | 2.9% |
|  | 1384 | 2.8% |
| Other values (146) | 25894 | 52.8% |
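Character counts like those above can be produced by concatenating every value and counting code points. A short sketch with `collections.Counter`, on a hypothetical handful of values taken from the Sample and Words tables:

```python
from collections import Counter

# Hypothetical subset of 소분류명 values.
values = ["진출입시설", "진출입시설", "일반음식점", "가로등"]

char_counts = Counter("".join(values))
total = sum(char_counts.values())

# Ten most common characters with their share of all characters.
for char, count in char_counts.most_common(10):
    print(char, count, f"{100 * count / total:.1f}%")
```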

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 49045 | 100.0% |

Most frequent character per category

Other Letter: same as the Most occurring characters table above; all 49045 characters fall in this category.

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 49045 | 100.0% |

Most frequent character per script

Hangul: same as the Most occurring characters table above; all 49045 characters are in the Hangul script.

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 49045 | 100.0% |

Most frequent character per block

Hangul: same as the Most occurring characters table above; all 49045 characters are in the Hangul block.

구지번 (former lot-number address)
Text

Distinct: 5827
Distinct (%): 58.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 35
Median length: 29
Mean length: 18.1357
Min length: 1

Characters and Unicode

Total characters: 181357
Distinct characters: 152
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
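The five Unicode general categories reported for 구지번 can be checked with Python's standard `unicodedata` module. The address below is the 2nd sample row; the mapping from `unicodedata.category` codes (Lo, Zs, Nd, Pd, Po) to the names used in this report is written out by hand. Script and block names need additional Unicode data that this sketch does not cover.

```python
import unicodedata
from collections import Counter

# General-category codes mapped to the names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}

address = "충청남도 당진시 고대면 항곡리 321-14"

# Unknown codes fall back to the raw two-letter code.
category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for ch in address
)
print(category_counts)
```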

Unique

Unique: 4405
Unique (%): 44.0%

Sample

1st row: 충청남도 당진시 대덕동 1689번지
2nd row: 충청남도 당진시 고대면 항곡리 321-14
3rd row: (blank)
4th row: 충청남도 당진시 읍내동 641-7
5th row: 충청남도 당진시 원당동 461-2
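The 3rd sample row shows no address even though Missing is 0, and the minimum length is 1; together these suggest that some values are whitespace strings rather than nulls (an inference, not something the report states). A pandas check that counts such blank strings separately from true missing values, on hypothetical data:

```python
import pandas as pd

# Hypothetical values: one real address, one whitespace-only string, one null.
addresses = pd.Series(["충청남도 당진시 읍내동 641-7", " ", None], dtype="object")

n_missing = int(addresses.isna().sum())
# Strip surrounding whitespace; blank strings become "" while nulls stay NaN.
n_blank = int(addresses.str.strip().eq("").sum())

print(n_missing, n_blank)
```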
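
Words

| Value | Count | Frequency (%) |
| 충청남도 (Chungcheongnam-do) | 9094 | 22.1% |
| 당진시 (Dangjin-si) | 9093 | 22.1% |
| 읍내동 | 1656 | 4.0% |
| 송악읍 | 1417 | 3.4% |
| 신평면 | 910 | 2.2% |
| 합덕읍 | 888 | 2.2% |
| 석문면 | 604 | 1.5% |
| 운산리 | 592 | 1.4% |
| 송산면 | 472 | 1.1% |
| 복운리 | 408 | 1.0% |
| Other values (5075) | 16073 | 39.0% |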
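The word table counts whitespace-separated tokens, which is why 충청남도 and 당진시 appear in most rows. A sketch assuming plain whitespace tokenization (the report may also normalize tokens), using the Sample rows above:

```python
from collections import Counter

# Addresses from the Sample rows; the empty string mimics the blank 3rd row.
addresses = [
    "충청남도 당진시 대덕동 1689번지",
    "충청남도 당진시 고대면 항곡리 321-14",
    "",
    "충청남도 당진시 읍내동 641-7",
    "충청남도 당진시 원당동 461-2",
]

# str.split() with no argument splits on runs of whitespace and drops empty tokens.
word_counts = Counter(word for address in addresses for word in address.split())
print(word_counts.most_common(3))
```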

Most occurring characters

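| Value | Count | Frequency (%) |
| (space) | 33017 | 18.2% |
|  | 9523 | 5.3% |
|  | 9441 | 5.2% |
|  | 9385 | 5.2% |
|  | 9316 | 5.1% |
|  | 9285 | 5.1% |
|  | 9126 | 5.0% |
|  | 9096 | 5.0% |
| 1 | 6333 | 3.5% |
| - | 5942 | 3.3% |
| Other values (142) | 70893 | 39.1% |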

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 110286 | 60.8% |
| Space Separator | 33017 | 18.2% |
| Decimal Number | 32107 | 17.7% |
| Dash Punctuation | 5942 | 3.3% |
| Other Punctuation | 5 | < 0.1% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 9523 | 8.6% |
|  | 9441 | 8.6% |
|  | 9385 | 8.5% |
|  | 9316 | 8.4% |
|  | 9285 | 8.4% |
|  | 9126 | 8.3% |
|  | 9096 | 8.2% |
|  | 5869 | 5.3% |
|  | 3962 | 3.6% |
|  | 3856 | 3.5% |
| Other values (129) | 31427 | 28.5% |

Decimal Number
| Value | Count | Frequency (%) |
| 1 | 6333 | 19.7% |
| 2 | 4522 | 14.1% |
| 3 | 3537 | 11.0% |
| 5 | 3181 | 9.9% |
| 6 | 3113 | 9.7% |
| 4 | 3024 | 9.4% |
| 9 | 2363 | 7.4% |
| 8 | 2187 | 6.8% |
| 0 | 1987 | 6.2% |
| 7 | 1860 | 5.8% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 33017 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5942 | 100.0% |

Other Punctuation
| Value | Count | Frequency (%) |
| . | 5 | 100.0% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 110286 | 60.8% |
| Common | 71071 | 39.2% |

Most frequent character per script

Hangul: same as the Other Letter table under Most frequent character per category above.
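
Common
| Value | Count | Frequency (%) |
| (space) | 33017 | 46.5% |
| 1 | 6333 | 8.9% |
| - | 5942 | 8.4% |
| 2 | 4522 | 6.4% |
| 3 | 3537 | 5.0% |
| 5 | 3181 | 4.5% |
| 6 | 3113 | 4.4% |
| 4 | 3024 | 4.3% |
| 9 | 2363 | 3.3% |
| 8 | 2187 | 3.1% |
| Other values (3) | 3852 | 5.4% |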

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 110286 | 60.8% |
| ASCII | 71071 | 39.2% |

Most frequent character per block

ASCII: same as the Common table under Most frequent character per script above.
Hangul: same as the Other Letter table under Most frequent character per category above.

위도 (latitude)
Real number (ℝ)

Distinct: 9173
Distinct (%): 91.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 126.67567
Minimum: 126.40408
Maximum: 126.85808
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 126.40408
5th percentile: 126.54035
Q1: 126.62891
Median: 126.64677
Q3: 126.75323
95th percentile: 126.78734
Maximum: 126.85808
Range: 0.4540014
Interquartile range (IQR): 0.1243166

Descriptive statistics

Standard deviation: 0.077777784
Coefficient of variation (CV): 0.0006139915
Kurtosis: -0.5916222
Mean: 126.67567
Median Absolute Deviation (MAD): 0.0470785
Skewness: -0.01608475
Sum: 1266756.7
Variance: 0.0060493837
Monotonicity: Not monotonic
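Several of these statistics follow directly from others, so they can be cross-checked by hand. Using only the figures reported above for 위도: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the range is maximum minus minimum, the IQR is Q3 minus Q1, and Sum is the mean times the 10000 observations. Small differences in the last digits come from the rounding of the inputs.

```python
# Figures copied from the 위도 statistics above.
mean = 126.67567
std = 0.077777784
minimum, maximum = 126.40408, 126.85808
q1, q3 = 126.62891, 126.75323
n = 10000

cv = std / mean              # coefficient of variation
variance = std ** 2
value_range = maximum - minimum
iqr = q3 - q1
total = mean * n             # the reported Sum

print(f"{cv:.7g} {variance:.8g} {value_range:.5f} {iqr:.5f} {total:.1f}")
```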
[Figure: Histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
| 126.646029 | 20 | 0.2% |
| 126.5922214 | 11 | 0.1% |
| 126.6841659 | 11 | 0.1% |
| 126.647021 | 8 | 0.1% |
| 126.62962 | 8 | 0.1% |
| 126.660444 | 8 | 0.1% |
| 126.6331 | 7 | 0.1% |
| 126.6462379 | 7 | 0.1% |
| 126.645219 | 6 | 0.1% |
| 126.643611 | 6 | 0.1% |
| Other values (9163) | 9908 | 99.1% |

Minimum 10 values

| Value | Count | Frequency (%) |
| 126.4040833 | 1 | < 0.1% |
| 126.4132222 | 1 | < 0.1% |
| 126.4134167 | 1 | < 0.1% |
| 126.4233799 | 1 | < 0.1% |
| 126.4236389 | 1 | < 0.1% |
| 126.4241466 | 1 | < 0.1% |
| 126.4242798 | 2 | < 0.1% |
| 126.4247552 | 1 | < 0.1% |
| 126.4249957 | 1 | < 0.1% |
| 126.4250746 | 1 | < 0.1% |

Maximum 10 values

| Value | Count | Frequency (%) |
| 126.8580847 | 1 | < 0.1% |
| 126.8579152 | 1 | < 0.1% |
| 126.8577788 | 1 | < 0.1% |
| 126.8574749 | 1 | < 0.1% |
| 126.8568655 | 1 | < 0.1% |
| 126.8562563 | 1 | < 0.1% |
| 126.8559511 | 1 | < 0.1% |
| 126.8501443 | 1 | < 0.1% |
| 126.8429798 | 1 | < 0.1% |
| 126.8417293 | 1 | < 0.1% |

경도 (longitude)
Real number (ℝ)

Distinct: 9194
Distinct (%): 91.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.894926
Minimum: 36.760354
Maximum: 37.056929
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 36.760354
5th percentile: 36.806876
Q1: 36.879476
Median: 36.893683
Q3: 36.913078
95th percentile: 36.99051
Maximum: 37.056929
Range: 0.29657505
Interquartile range (IQR): 0.033602293

Descriptive statistics

Standard deviation: 0.053201326
Coefficient of variation (CV): 0.0014419687
Kurtosis: 0.54583341
Mean: 36.894926
Median Absolute Deviation (MAD): 0.0153216
Skewness: 0.27744149
Sum: 368949.26
Variance: 0.0028303811
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
| 36.889997 | 20 | 0.2% |
| 36.90429815 | 11 | 0.1% |
| 36.89046 | 11 | 0.1% |
| 36.921711 | 8 | 0.1% |
| 36.900725 | 8 | 0.1% |
| 36.888617 | 7 | 0.1% |
| 36.9010192 | 7 | 0.1% |
| 36.90574 | 7 | 0.1% |
| 36.8523536 | 6 | 0.1% |
| 36.8907122 | 6 | 0.1% |
| Other values (9184) | 9909 | 99.1% |

Minimum 10 values

| Value | Count | Frequency (%) |
| 36.76035395 | 1 | < 0.1% |
| 36.76038884 | 1 | < 0.1% |
| 36.76162659 | 1 | < 0.1% |
| 36.762662 | 1 | < 0.1% |
| 36.76277 | 1 | < 0.1% |
| 36.76364774 | 1 | < 0.1% |
| 36.76367662 | 1 | < 0.1% |
| 36.76419051 | 1 | < 0.1% |
| 36.76442 | 1 | < 0.1% |
| 36.7645284 | 1 | < 0.1% |

Maximum 10 values

| Value | Count | Frequency (%) |
| 37.056929 | 1 | < 0.1% |
| 37.05376973 | 1 | < 0.1% |
| 37.05375279 | 1 | < 0.1% |
| 37.05361866 | 1 | < 0.1% |
| 37.05359755 | 1 | < 0.1% |
| 37.05268611 | 1 | < 0.1% |
| 37.05267821 | 1 | < 0.1% |
| 37.052248 | 1 | < 0.1% |
| 37.05222655 | 1 | < 0.1% |
| 37.05161412 | 1 | < 0.1% |
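The column names and the values disagree. Latitude must lie between -90 and 90 degrees, yet every 위도 (latitude) value falls between 126.40 and 126.86, while every 경도 (longitude) value falls between 36.76 and 37.06. Those ranges fit the longitude and latitude of Dangjin (around 126.6°E, 36.9°N), so the two columns appear to be swapped in the published data. Below is a sketch of a range check that swaps them back only when the labels are clearly inverted; `fix_swapped_coordinates` is a hypothetical helper, not part of any library.

```python
import pandas as pd


def fix_swapped_coordinates(df: pd.DataFrame, lat_col: str = "위도", lon_col: str = "경도") -> pd.DataFrame:
    """Swap latitude/longitude columns if the latitude column is out of range.

    Latitude must lie in [-90, 90]. If every value in lat_col is outside that
    range while every value in lon_col is inside it, the labels are assumed
    to be inverted and the two columns are exchanged.
    """
    lat_valid = df[lat_col].between(-90, 90)
    lon_as_lat_valid = df[lon_col].between(-90, 90)
    out = df.copy()
    if (~lat_valid).all() and lon_as_lat_valid.all():
        out[[lat_col, lon_col]] = df[[lon_col, lat_col]].to_numpy()
    return out


# Two rows from the Sample table, with the values as published.
sample = pd.DataFrame({"위도": [126.63804, 126.60466], "경도": [36.885889, 36.889541]})
fixed = fix_swapped_coordinates(sample)
print(fixed)
```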

Interactions


Correlations

| | 대분류명 | 중분류명 | 소분류명 | 위도 | 경도 |
| 대분류명 | 1.000 | 1.000 | 1.000 | 0.191 | 0.233 |
| 중분류명 | 1.000 | 1.000 | 1.000 | 0.360 | 0.412 |
| 소분류명 | 1.000 | 1.000 | 1.000 | 0.572 | 0.535 |
| 위도 | 0.191 | 0.360 | 0.572 | 1.000 | 0.770 |
| 경도 | 0.233 | 0.412 | 0.535 | 0.770 | 1.000 |

| | 중분류명 | 대분류명 |
| 중분류명 | 1.000 | 0.999 |
| 대분류명 | 0.999 | 1.000 |
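For a pair of categorical variables such as 중분류명 and 대분류명, one standard association measure is Cramér's V, which rescales the chi-squared statistic of a contingency table to the range 0 to 1. The report does not name the method behind this matrix, so the sketch below illustrates the measure itself and is not a reproduction of the report's computation (which may, for instance, apply a bias correction). It assumes every row and column of the table has a nonzero total.

```python
import numpy as np


def cramers_v(table) -> float:
    """Cramér's V for a contingency table of observed counts (no bias correction)."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Each row category occurs with only one column category: perfect association.
print(cramers_v([[30, 0], [0, 20]]))
# Identical row profiles: no association.
print(cramers_v([[10, 10], [5, 5]]))
```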

| | 위도 | 경도 | 대분류명 | 중분류명 |
| 위도 | 1.000 | -0.261 | 0.101 | 0.141 |
| 경도 | -0.261 | 1.000 | 0.124 | 0.165 |
| 대분류명 | 0.101 | 0.124 | 1.000 | 0.999 |
| 중분류명 | 0.141 | 0.165 | 0.999 | 1.000 |
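Unlike the other two matrices, this one contains a negative entry (-0.261 between 위도 and 경도), so its numeric pair must use a signed coefficient. A rank correlation such as Spearman's is one such coefficient; the sketch below computes it as the Pearson correlation of ranks and compares the result with pandas' `DataFrame.corr(method="spearman")`. Which method the report used for this matrix is not stated, and the data here is made up.

```python
import pandas as pd

# Made-up coordinates: as x rises, y mostly falls.
x = pd.Series([126.60, 126.62, 126.64, 126.66, 126.68])
y = pd.Series([36.95, 36.90, 36.92, 36.85, 36.80])

# Spearman's rho is the Pearson correlation of the ranks (ties get average ranks).
rho_manual = x.rank().corr(y.rank())

# pandas' DataFrame.corr also supports method="spearman".
rho_pandas = pd.DataFrame({"x": x, "y": y}).corr(method="spearman").loc["x", "y"]

print(round(rho_manual, 3), round(rho_pandas, 3))
```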

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

| | 대분류명 | 중분류명 | 소분류명 | 구지번 | 위도 | 경도 |
| 20373 | 산업 | 제조산업 | 건설업 | 충청남도 당진시 대덕동 1689번지 | 126.63804 | 36.885889 |
| 26302 | 산업 | 서비스산업 | 전문도매업 | 충청남도 당진시 고대면 항곡리 321-14 | 126.60466 | 36.889541 |
| 5864 | 시설물 | 안전시설 | 기타안전시설 |  | 126.609825 | 36.832508 |
| 10579 | 시설물 | 도로시설 | 진출입시설 | 충청남도 당진시 읍내동 641-7 | 126.629575 | 36.891334 |
| 25194 | 산업 | 서비스산업 | 자동차산업 | 충청남도 당진시 원당동 461-2 | 126.649755 | 36.911512 |
| 23595 | 산업 | 서비스산업 | 임대업 | 충청남도 당진시 송산면 가곡리 386-25 | 126.675103 | 36.971526 |
| 21851 | 산업 | 원시산업 | 농업및축업 | 충청남도 당진시 신평면 초대리 772 | 126.736047 | 36.896784 |
| 25248 | 산업 | 서비스산업 | 종합상품판매업 | 충청남도 당진시 채운동 1105 | 126.631573 | 36.886172 |
| 798 | 레저및관광및예술 | 관광지 | 골짜기및고개 | 충청남도 당진시 대호지면 두산리 | 126.546466 | 36.891889 |
| 2227 | 교육및보건 | 교육시설 | 중등교육기관 | 충청남도 당진시 원당동 1224 | 126.64092 | 36.900059 |

Last rows

| | 대분류명 | 중분류명 | 소분류명 | 구지번 | 위도 | 경도 |
| 3092 | 공공및환경 | 정치및사회및외교 | 사회복지시설 | 충청남도 당진시 석문면 삼봉리 868-1 | 126.534526 | 37.013943 |
| 16101 | 숙박및음식 | 음식점 | 기타음식점 | 충청남도 당진시 고대면 용두리 647-6 | 126.602247 | 36.925604 |
| 21977 | 산업 | 원시산업 | 농업및축업 | 충청남도 당진시 고대면 항곡리 80 | 126.597606 | 36.907961 |
| 17513 | 숙박및음식 | 음식점 | 주점 | 충청남도 당진시 우강면 송산리 404-43번지 | 126.775415 | 36.810653 |
| 12382 | 시설물 | 도로시설 | 진출입시설 | 충청남도 당진시 송악읍 복운리 1642-8 | 126.783682 | 36.94127 |
| 25649 | 산업 | 서비스산업 | 뷰티서비스 | 충청남도 당진시 석문면 교로리 906-15 | 126.512375 | 37.048985 |
| 15370 | 숙박및음식 | 음식점 | 일반음식점 | 충청남도 당진시 송악읍 기지시리 324-2 | 126.69118 | 36.903084 |
| 17660 | 숙박및음식 | 음식점 | 디저트 | 충청남도 당진시 합덕읍 운산리 300-50 | 126.77063 | 36.80991 |
| 3198 | 공공및환경 | 정치및사회및외교 | 사회복지시설 | 충청남도 당진시 채운동 310-3 | 126.624025 | 36.893563 |
| 293 | 레저및관광및예술 | 관광지 | 골짜기및고개 | 충청남도 당진시 정미면 봉성리 | 126.541583 | 36.842684 |

Duplicate rows

Most frequently occurring

| | 대분류명 | 중분류명 | 소분류명 | 구지번 | 위도 | 경도 | # duplicates |
| 242 | 시설물 | 기반시설 | 유통및공급시설 | 충청남도 당진시 시곡동 57-1 | 126.684166 | 36.904298 | 11 |
| 41 | 산업 | 서비스산업 | 기타서비스산업 | 충청남도 당진시 읍내동 153-69 | 126.6331 | 36.888617 | 7 |
| 10 | 교육및보건 | 보건시설 | 기타보건시설 | 충청남도 당진시 수청동 1002 | 126.646029 | 36.889997 | 6 |
| 91 | 산업 | 서비스산업 | 전문소매업 | 충청남도 당진시 수청동 1002 | 126.646029 | 36.889997 | 6 |
| 99 | 산업 | 서비스산업 | 전문소매업 | 충청남도 당진시 읍내동 145-12 | 126.62962 | 36.89046 | 6 |
| 196 | 숙박및음식 | 음식점 | 일반음식점 | 충청남도 당진시 신평면 운정리 960-4 | 126.82256 | 36.890712 | 6 |
| 240 | 시설물 | 기반시설 | 유통및공급시설 | 충청남도 당진시 수청동 1005 | 126.643611 | 36.891667 | 6 |
| 248 | 시설물 | 기반시설 | 유통및공급시설 | 충청남도 당진시 정미면 덕마리 230-2임 | 126.592221 | 36.853465 | 6 |
| 264 | 시설물 | 편의시설 | 보행시설 | 충청남도 당진시 송악읍 한진리 318-4 | 126.764453 | 36.968864 | 6 |
| 7 | 교육및보건 | 교육시설 | 학원 | 충청남도 당진시 수청동 980 | 126.645213 | 36.901479 | 5 |
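The "# duplicates" column gives how many times each repeated row occurs. For example, the 시곡동 57-1 row's count of 11 is consistent with the 11 occurrences listed for 위도 126.6841659 and 경도 36.90429815 in the Common values tables. A pandas sketch that groups fully identical rows and counts them, on hypothetical data:

```python
import pandas as pd

# Hypothetical rows: three identical copies of one row and one unique row.
df = pd.DataFrame(
    {
        "소분류명": ["학원", "학원", "학원", "가로등"],
        "구지번": ["충청남도 당진시 수청동 980"] * 3 + ["충청남도 당진시 읍내동 641-7"],
    }
)

# keep=False marks every member of a duplicated group, including the first.
dup_rows = df[df.duplicated(keep=False)]

# Size of each group of identical rows, largest first.
dup_counts = (
    dup_rows.groupby(list(df.columns))
    .size()
    .reset_index(name="# duplicates")
    .sort_values("# duplicates", ascending=False, ignore_index=True)
)
print(dup_counts)
```

Note that `groupby` drops groups whose key contains a null by default, so rows with missing values would need `dropna=False` to be counted.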