Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 296
Duplicate rows (%): 3.0%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
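
The two derived figures in this block follow from the raw ones by simple arithmetic. The short check below is an illustrative sketch, not part of the original report; it assumes 1 KiB = 1024 bytes and rounding to one decimal place, and the variable names are made up for the example.

```python
# Raw figures copied from the dataset statistics.
n_rows = 10_000
duplicate_rows = 296
total_kib = 566.4

# Share of duplicate rows, as a percentage with one decimal.
duplicate_pct = round(100 * duplicate_rows / n_rows, 1)

# Average record size in bytes (assumes 1 KiB = 1024 bytes).
avg_record_bytes = round(total_kib * 1024 / n_rows, 1)

print(f"Duplicate rows (%): {duplicate_pct}%")
print(f"Average record size: {avg_record_bytes} B")
```

Both results agree with the reported 3.0% and 58.0 B.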

Variable types

Categorical: 2
Text: 2
Numeric: 2

Dataset

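Description: Building data managed by the Dangjin City (당진시) Spatial Information Utilization System, providing fields such as major category (대분류), middle category (중분류), minor category (소분류), old lot-number address (구지번), latitude (위도), and longitude (경도).
Author: Chungcheongnam-do (충청남도)
URL: https://alldam.chungnam.go.kr/index.chungnam?menuCd=DOM_000000201001001001&st=&cds=&orgCd=&apiType=&isOpen=Y&pageIndex=323&beforeMenuCd=DOM_000000201001001000&publicdatapk=15091587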

Alerts

Dataset has 296 (3.0%) duplicate rows (Duplicates)
대분류명 is highly overall correlated with 중분류명 (High correlation)
중분류명 is highly overall correlated with 대분류명 (High correlation)

Reproduction

Analysis started: 2024-01-09 22:25:06.178729
Analysis finished: 2024-01-09 22:25:07.075999
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
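
For readers who want to generate a comparable report, the following is a minimal sketch using the `ProfileReport` class from ydata-profiling together with pandas. The function name `build_report`, its parameters, and the default title are hypothetical; the settings of the original run live in the downloadable config.json and are not reproduced here.

```python
def build_report(csv_path, html_path, title="Profiling report"):
    """Profile a CSV file and write the resulting HTML report to html_path."""
    # Imported inside the function so that it can be defined even when the
    # third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source data is a CSV file readable with default settings.
    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return report
```

A call such as `build_report("buildings.csv", "buildings_report.html")` would write the report next to the script; both file names are placeholders.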

Variables

대분류명 (major category name)
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

시설물: 4170
산업: 2596
숙박및음식: 1978
레저및관광및예술: 580
교육및보건: 460

Length

Max length: 8
Median length: 5
Mean length: 3.5612
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 산업
2nd row: 시설물
3rd row: 산업
4th row: 시설물
5th row: 시설물

Common Values

Value | Count | Frequency (%)
시설물 | 4170 | 41.7%
산업 | 2596 | 26.0%
숙박및음식 | 1978 | 19.8%
레저및관광및예술 | 580 | 5.8%
교육및보건 | 460 | 4.6%
공공및환경 | 216 | 2.2%
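
The length statistics reported for 대분류명 can be cross-checked against the counts in this table. The sketch below is an illustrative check, not part of the report; it assumes a label's length is its number of Unicode code points, which is what Python's `len` returns. Under that assumption the count-weighted mean reproduces the reported 3.5612, while the reported median of 5 matches the median over the six distinct labels rather than over all 10,000 rows, where it would be 3. Why the report shows that value is not stated in the source.

```python
from statistics import median

# Counts copied from the Common Values table for 대분류명.
counts = {
    "시설물": 4170,
    "산업": 2596,
    "숙박및음식": 1978,
    "레저및관광및예술": 580,
    "교육및보건": 460,
    "공공및환경": 216,
}

total_rows = sum(counts.values())

# Mean label length, weighted by how often each label occurs.
mean_length = sum(len(label) * n for label, n in counts.items()) / total_rows

# Median over every row: each label length repeated by its count.
row_lengths = [len(label) for label, n in counts.items() for _ in range(n)]
median_over_rows = median(row_lengths)

# Median over the distinct labels only, ignoring the counts.
median_over_labels = median(len(label) for label in counts)

print(total_rows, mean_length, median_over_rows, median_over_labels)
```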

Length

Histogram of lengths of the category


중분류명 (middle category name)
Categorical

HIGH CORRELATION

Distinct: 21
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

도로시설: 2694
음식점: 1909
서비스산업: 1549
안전시설: 958
원시산업: 554
Other values (16): 2336

Length

Max length: 8
Median length: 4
Mean length: 3.9621
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서비스산업
2nd row: 도로시설
3rd row: 서비스산업
4th row: 도로시설
5th row: 도로시설

Common Values

Value | Count | Frequency (%)
도로시설 | 2694 | 26.9%
음식점 | 1909 | 19.1%
서비스산업 | 1549 | 15.5%
안전시설 | 958 | 9.6%
원시산업 | 554 | 5.5%
관광지 | 491 | 4.9%
제조산업 | 482 | 4.8%
편의시설 | 283 | 2.8%
교육시설 | 258 | 2.6%
보건시설 | 202 | 2.0%
Other values (11) | 620 | 6.2%

Length

Histogram of lengths of the category

소분류명 (minor category name)
Text

Distinct: 91
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 5
Mean length: 4.9147
Min length: 1

Characters and Unicode

Total characters: 49147
Distinct characters: 154
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 0.1%

Sample

1st row: 임대업
2nd row: 기타도로시설물
3rd row: 전문소매업
4th row: 진출입시설
5th row: 진출입시설

Value | Count | Frequency (%)
진출입시설 | 2173 | 21.7%
일반음식점 | 1261 | 12.6%
가로등 | 566 | 5.7%
농업및축업 | 548 | 5.5%
전문소매업 | 449 | 4.5%
기타도로시설물 | 386 | 3.9%
기타안전시설 | 349 | 3.5%
제조업 | 315 | 3.1%
골짜기및고개 | 306 | 3.1%
주점 | 302 | 3.0%
Other values (81) | 3345 | 33.5%

Most occurring characters

Value | Count | Frequency (%)
 | 3900 | 7.9%
 | 3772 | 7.7%
 | 2963 | 6.0%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 1788 | 3.6%
 | 1649 | 3.4%
 | 1478 | 3.0%
 | 1347 | 2.7%
Other values (144) | 25731 | 52.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 49147 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 3900 | 7.9%
 | 3772 | 7.7%
 | 2963 | 6.0%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 1788 | 3.6%
 | 1649 | 3.4%
 | 1478 | 3.0%
 | 1347 | 2.7%
Other values (144) | 25731 | 52.4%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 49147 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 3900 | 7.9%
 | 3772 | 7.7%
 | 2963 | 6.0%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 1788 | 3.6%
 | 1649 | 3.4%
 | 1478 | 3.0%
 | 1347 | 2.7%
Other values (144) | 25731 | 52.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 49147 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 3900 | 7.9%
 | 3772 | 7.7%
 | 2963 | 6.0%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 2173 | 4.4%
 | 1788 | 3.6%
 | 1649 | 3.4%
 | 1478 | 3.0%
 | 1347 | 2.7%
Other values (144) | 25731 | 52.4%

구지번 (old lot-number address)
Text

Distinct: 5790
Distinct (%): 57.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 30
Median length: 27
Mean length: 18.1126
Min length: 1

Characters and Unicode

Total characters: 181126
Distinct characters: 141
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4333
Unique (%): 43.3%

Sample

1st row: 충청남도 당진시 읍내동 25-23
2nd row: 충청남도 당진시 읍내동
3rd row: 충청남도 당진시 읍내동 531-12
4th row: 충청남도 당진시 합덕읍 운산리 288-3
5th row: 충청남도 당진시 송산면 상거리 326-6

Value | Count | Frequency (%)
충청남도 | 9083 | 22.1%
당진시 | 9082 | 22.1%
읍내동 | 1660 | 4.0%
송악읍 | 1417 | 3.4%
신평면 | 914 | 2.2%
합덕읍 | 886 | 2.2%
운산리 | 590 | 1.4%
석문면 | 588 | 1.4%
송산면 | 460 | 1.1%
복운리 | 415 | 1.0%
Other values (5075) | 16079 | 39.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 33005 | 18.2%
 | 9533 | 5.3%
 | 9416 | 5.2%
 | 9365 | 5.2%
 | 9342 | 5.2%
 | 9268 | 5.1%
 | 9113 | 5.0%
 | 9086 | 5.0%
1 | 6370 | 3.5%
- | 5909 | 3.3%
Other values (131) | 70719 | 39.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 110075 | 60.8%
Space Separator | 33005 | 18.2%
Decimal Number | 32136 | 17.7%
Dash Punctuation | 5909 | 3.3%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 9533 | 8.7%
 | 9416 | 8.6%
 | 9365 | 8.5%
 | 9342 | 8.5%
 | 9268 | 8.4%
 | 9113 | 8.3%
 | 9086 | 8.3%
 | 5854 | 5.3%
 | 3964 | 3.6%
 | 3820 | 3.5%
Other values (118) | 31314 | 28.4%

Decimal Number
Value | Count | Frequency (%)
1 | 6370 | 19.8%
2 | 4492 | 14.0%
3 | 3578 | 11.1%
5 | 3121 | 9.7%
6 | 3121 | 9.7%
4 | 3023 | 9.4%
9 | 2410 | 7.5%
8 | 2138 | 6.7%
7 | 1954 | 6.1%
0 | 1929 | 6.0%

Space Separator
Value | Count | Frequency (%)
(space) | 33005 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5909 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 110075 | 60.8%
Common | 71051 | 39.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 9533 | 8.7%
 | 9416 | 8.6%
 | 9365 | 8.5%
 | 9342 | 8.5%
 | 9268 | 8.4%
 | 9113 | 8.3%
 | 9086 | 8.3%
 | 5854 | 5.3%
 | 3964 | 3.6%
 | 3820 | 3.5%
Other values (118) | 31314 | 28.4%

Common
Value | Count | Frequency (%)
(space) | 33005 | 46.5%
1 | 6370 | 9.0%
- | 5909 | 8.3%
2 | 4492 | 6.3%
3 | 3578 | 5.0%
5 | 3121 | 4.4%
6 | 3121 | 4.4%
4 | 3023 | 4.3%
9 | 2410 | 3.4%
8 | 2138 | 3.0%
Other values (3) | 3884 | 5.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 110075 | 60.8%
ASCII | 71051 | 39.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 33005 | 46.5%
1 | 6370 | 9.0%
- | 5909 | 8.3%
2 | 4492 | 6.3%
3 | 3578 | 5.0%
5 | 3121 | 4.4%
6 | 3121 | 4.4%
4 | 3023 | 4.3%
9 | 2410 | 3.4%
8 | 2138 | 3.0%
Other values (3) | 3884 | 5.5%

Hangul
Value | Count | Frequency (%)
 | 9533 | 8.7%
 | 9416 | 8.6%
 | 9365 | 8.5%
 | 9342 | 8.5%
 | 9268 | 8.4%
 | 9113 | 8.3%
 | 9086 | 8.3%
 | 5854 | 5.3%
 | 3964 | 3.6%
 | 3820 | 3.5%
Other values (118) | 31314 | 28.4%

위도 (latitude)
Real number (ℝ)

Distinct: 9115
Distinct (%): 91.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 126.67585
Minimum: 126.41322
Maximum: 126.85792
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 126.41322
5-th percentile: 126.54033
Q1: 126.6288
median: 126.64626
Q3: 126.75357
95-th percentile: 126.7881
Maximum: 126.85792
Range: 0.444693
Interquartile range (IQR): 0.12476925

Descriptive statistics

Standard deviation: 0.077876645
Coefficient of variation (CV): 0.00061477104
Kurtosis: -0.65319383
Mean: 126.67585
Median Absolute Deviation (MAD): 0.0471726
Skewness: 0.0047947023
Sum: 1266758.5
Variance: 0.0060647718
Monotonicity: Not monotonic
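
Several of the figures for 위도 are derived from others in the same section: the range from the minimum and maximum, the IQR from the quartiles, the coefficient of variation and variance from the standard deviation and mean, and the sum from the mean and the row count. The sketch below is an illustrative consistency check, not part of the report; because the displayed inputs are rounded, it compares with a relative tolerance, and the tolerance of 1e-4 is an arbitrary choice.

```python
import math

# Values copied from the 위도 section of this report.
minimum, maximum = 126.41322, 126.85792
q1, q3 = 126.6288, 126.75357
mean, std = 126.67585, 0.077876645
n_rows = 10_000

# Quantities recomputed from the values above.
derived = {
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,
    "variance": std ** 2,
    "sum": mean * n_rows,
}

# The corresponding figures as printed in the report.
reported = {
    "range": 0.444693,
    "iqr": 0.12476925,
    "cv": 0.00061477104,
    "variance": 0.0060647718,
    "sum": 1266758.5,
}

# Displayed inputs are rounded, so compare with a relative tolerance.
matches = {
    name: math.isclose(derived[name], reported[name], rel_tol=1e-4)
    for name in reported
}
print(matches)
```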
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
126.646029 | 19 | 0.2%
126.6841659 | 12 | 0.1%
126.646985 | 11 | 0.1%
126.6462379 | 9 | 0.1%
126.660444 | 9 | 0.1%
126.8229491 | 8 | 0.1%
126.647021 | 8 | 0.1%
126.6296073 | 8 | 0.1%
126.5919436 | 7 | 0.1%
126.645751 | 7 | 0.1%
Other values (9105) | 9902 | 99.0%

Value | Count | Frequency (%)
126.4132222 | 1 | < 0.1%
126.4134167 | 1 | < 0.1%
126.4236389 | 1 | < 0.1%
126.4241466 | 1 | < 0.1%
126.4242798 | 1 | < 0.1%
126.4243321 | 1 | < 0.1%
126.4302899 | 1 | < 0.1%
126.4333733 | 1 | < 0.1%
126.4342164 | 1 | < 0.1%
126.4348346 | 1 | < 0.1%

Value | Count | Frequency (%)
126.8579152 | 1 | < 0.1%
126.8574749 | 1 | < 0.1%
126.8568655 | 1 | < 0.1%
126.8562563 | 1 | < 0.1%
126.8554162 | 1 | < 0.1%
126.8535416 | 1 | < 0.1%
126.8508507 | 1 | < 0.1%
126.8507084 | 1 | < 0.1%
126.8501413 | 1 | < 0.1%
126.8470143 | 1 | < 0.1%

경도 (longitude)
Real number (ℝ)

Distinct: 9141
Distinct (%): 91.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.895045
Minimum: 36.753511
Maximum: 37.056929
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 36.753511
5-th percentile: 36.806882
Q1: 36.879818
median: 36.893754
Q3: 36.913415
95-th percentile: 36.991902
Maximum: 37.056929
Range: 0.30341753
Interquartile range (IQR): 0.033597275

Descriptive statistics

Standard deviation: 0.052984424
Coefficient of variation (CV): 0.0014360851
Kurtosis: 0.57771248
Mean: 36.895045
Median Absolute Deviation (MAD): 0.0154227
Skewness: 0.27289554
Sum: 368950.45
Variance: 0.0028073491
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
36.889997 | 19 | 0.2%
36.90429815 | 12 | 0.1%
36.90149 | 11 | 0.1%
36.9010192 | 9 | 0.1%
36.921711 | 9 | 0.1%
36.89046 | 8 | 0.1%
36.8911063 | 8 | 0.1%
36.900725 | 8 | 0.1%
36.8899311 | 8 | 0.1%
36.901485 | 7 | 0.1%
Other values (9131) | 9901 | 99.0%

Value | Count | Frequency (%)
36.75351147 | 1 | < 0.1%
36.7587866 | 1 | < 0.1%
36.76162659 | 1 | < 0.1%
36.762662 | 1 | < 0.1%
36.76364705 | 1 | < 0.1%
36.76364774 | 2 | < 0.1%
36.76367662 | 1 | < 0.1%
36.76400831 | 1 | < 0.1%
36.76419051 | 1 | < 0.1%
36.7645284 | 1 | < 0.1%

Value | Count | Frequency (%)
37.056929 | 1 | < 0.1%
37.05387532 | 1 | < 0.1%
37.05376973 | 1 | < 0.1%
37.05361866 | 1 | < 0.1%
37.05267821 | 1 | < 0.1%
37.0526229 | 1 | < 0.1%
37.0522783 | 1 | < 0.1%
37.05225202 | 1 | < 0.1%
37.0516845 | 1 | < 0.1%
37.05162077 | 1 | < 0.1%
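
The column labelled 위도 (latitude) spans 126.41 to 126.86, which lies outside the valid latitude range of -90 to 90, while 경도 (longitude) spans 36.75 to 37.06. One possible reading is that the two labels are swapped relative to their contents; this is an inference from the reported ranges, not something the report states. The sketch below shows a simple range check of that kind; the function name `looks_swapped` is hypothetical.

```python
def looks_swapped(lat_range, lon_range):
    """Return True if the 'latitude' values are invalid as latitudes
    but the two columns would both be valid with their roles exchanged."""
    lat_valid = all(-90.0 <= v <= 90.0 for v in lat_range)
    lon_valid_as_lat = all(-90.0 <= v <= 90.0 for v in lon_range)
    lat_valid_as_lon = all(-180.0 <= v <= 180.0 for v in lat_range)
    return (not lat_valid) and lon_valid_as_lat and lat_valid_as_lon


# (minimum, maximum) pairs copied from the two numeric sections.
reported_wido = (126.41322, 126.85792)        # column labelled 위도
reported_gyeongdo = (36.753511, 37.056929)    # column labelled 경도

swapped = looks_swapped(reported_wido, reported_gyeongdo)
print("Labels look swapped:", swapped)
```

A check like this only flags the mismatch; deciding whether to rename the columns still requires confirmation from the data provider.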


Correlations

 | 대분류명 | 중분류명 | 소분류명 | 위도 | 경도
대분류명 | 1.000 | 1.000 | 1.000 | 0.194 | 0.231
중분류명 | 1.000 | 1.000 | 1.000 | 0.353 | 0.400
소분류명 | 1.000 | 1.000 | 1.000 | 0.592 | 0.515
위도 | 0.194 | 0.353 | 0.592 | 1.000 | 0.768
경도 | 0.231 | 0.400 | 0.515 | 0.768 | 1.000

 | 대분류명 | 중분류명
대분류명 | 1.000 | 0.999
중분류명 | 0.999 | 1.000

 | 위도 | 경도 | 대분류명 | 중분류명
위도 | 1.000 | -0.263 | 0.103 | 0.138
경도 | -0.263 | 1.000 | 0.123 | 0.159
대분류명 | 0.103 | 0.123 | 1.000 | 0.999
중분류명 | 0.138 | 0.159 | 0.999 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 대분류명 | 중분류명 | 소분류명 | 구지번 | 위도 | 경도
23645 | 산업 | 서비스산업 | 임대업 | 충청남도 당진시 읍내동 25-23 | 126.63487 | 36.896324
12978 | 시설물 | 도로시설 | 기타도로시설물 | 충청남도 당진시 읍내동 | 126.638598 | 36.890789
22807 | 산업 | 서비스산업 | 전문소매업 | 충청남도 당진시 읍내동 531-12 | 126.6301 | 36.89332
10953 | 시설물 | 도로시설 | 진출입시설 | 충청남도 당진시 합덕읍 운산리 288-3 | 126.773499 | 36.808851
10106 | 시설물 | 도로시설 | 진출입시설 | 충청남도 당진시 송산면 상거리 326-6 | 126.679793 | 36.933128
3624 | 시설물 | 편의시설 | 보행시설 | 충청남도 당진시 석문면 교로리 944-21 | 126.504954 | 37.049732
17953 | 숙박및음식 | 음식점 | 디저트 | 충청남도 당진시 읍내동 528-19 | 126.63036 | 36.893604
2100 | 교육및보건 | 교육시설 | 유아교육기관 | 충청남도 당진시 대덕동 947-2 | 126.637331 | 36.87612
12241 | 시설물 | 도로시설 | 진출입시설 | 충청남도 당진시 신평면 거산리 443 | 126.741825 | 36.886264
26694 | 레저및관광및예술 | 스포츠시설 | 생활스포츠시설 | 충청남도 당진시 합덕읍 석우리 1147 | 126.746815 | 36.80956

Index | 대분류명 | 중분류명 | 소분류명 | 구지번 | 위도 | 경도
14201 | 시설물 | 교통시설 | 버스터미널및정류장 | 충청남도 당진시 석문면 교로리 | 126.526049 | 37.019033
354 | 레저및관광및예술 | 관광지 | 골짜기및고개 | 충청남도 당진시 면천면 성상리 | 126.678302 | 36.81534
1179 | 레저및관광및예술 | 관광지 | 들및평야 | 충청남도 당진시 합덕읍 신흥리 | 126.850851 | 36.798516
16968 | 숙박및음식 | 음식점 | 일반음식점 | 충청남도 당진시 면천면 문봉리 186-34번지 | 126.711957 | 36.806874
6122 | 시설물 | 안전시설 | 기타안전시설 |  | 126.777987 | 36.809305
7935 | 시설물 | 도로시설 | 진출입시설 | 충청남도 당진시 합덕읍 대전리 134-44 | 126.761771 | 36.765458
13583 | 시설물 | 도로시설 | 기타도로시설물 | 충청남도 당진시 합덕읍 운산리 | 126.780007 | 36.801958
13975 | 시설물 | 기반시설 | 유통및공급시설 | 충청남도 당진시 채운동 238-21 | 126.626822 | 36.890484
25529 | 산업 | 서비스산업 | 언론및미디어 | 충청남도 당진시 읍내동 160-1 | 126.632029 | 36.890936
893 | 레저및관광및예술 | 관광지 | 들및평야 | 충청남도 당진시 면천면 송학리 | 126.644723 | 36.82822

Duplicate rows

Most frequently occurring

Index | 대분류명 | 중분류명 | 소분류명 | 구지번 | 위도 | 경도 | # duplicates
255 | 시설물 | 기반시설 | 유통및공급시설 | 충청남도 당진시 시곡동 57-1 | 126.684166 | 36.904298 | 12
211 | 숙박및음식 | 음식점 | 일반음식점 | 충청남도 당진시 신평면 운정리 961-3 | 126.822949 | 36.891106 | 8
259 | 시설물 | 기반시설 | 유통및공급시설 | 충청남도 당진시 읍내동 232-85 | 126.631666 | 36.893743 | 7
268 | 시설물 | 기반시설 | 유통및공급시설 | 충청남도 당진시 정미면 덕마리 379-3 임 | 126.591944 | 36.849854 | 7
15 | 교육및보건 | 보건시설 | 기타보건시설 | 충청남도 당진시 수청동 1002 | 126.646029 | 36.889997 | 6
188 | 숙박및음식 | 음식점 | 일반음식점 | 충청남도 당진시 수청동 988 | 126.646238 | 36.901019 | 6
62 | 산업 | 서비스산업 | 임대업 | 충청남도 당진시 대덕동 1643 | 126.636863 | 36.887074 | 5
89 | 산업 | 서비스산업 | 전문도매업 | 충청남도 당진시 읍내동 145-13 | 126.629607 | 36.889931 | 5
190 | 숙박및음식 | 음식점 | 일반음식점 | 충청남도 당진시 수청동 988 | 126.646249 | 36.901057 | 5
244 | 숙박및음식 | 음식점 | 주점 | 충청남도 당진시 수청동 997 | 126.646985 | 36.90149 | 5
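
A table of this kind can be produced by counting identical rows. The sketch below is a minimal standard-library illustration using made-up toy rows, not data from this dataset, and the function name `summarize_duplicates` is hypothetical. It reports three different quantities because "number of duplicate rows" can mean any of them, and the source does not say which definition the report's figure of 296 follows.

```python
from collections import Counter


def summarize_duplicates(rows):
    """Count identical rows and summarize the groups that occur more than once."""
    counts = Counter(rows)
    groups = {row: n for row, n in counts.items() if n > 1}
    return {
        # Number of distinct row patterns that appear at least twice.
        "duplicate_groups": len(groups),
        # Total number of rows belonging to such patterns.
        "rows_in_groups": sum(groups.values()),
        # Rows that would be dropped when keeping one copy per pattern.
        "surplus_copies": sum(n - 1 for n in groups.values()),
        # Patterns ordered by how often they occur, most frequent first.
        "most_common": sorted(groups.items(), key=lambda kv: kv[1], reverse=True),
    }


# Toy rows for illustration only; each row must be hashable (here, a tuple).
rows = [("a", 1), ("a", 1), ("b", 2), ("a", 1), ("c", 3), ("c", 3)]
summary = summarize_duplicates(rows)
print(summary)
```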