Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 2959
Missing cells (%): 4.2%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 625.0 KiB
Average record size in memory: 64.0 B
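The derived figures in this table follow from the raw counts. The sketch below recomputes them, assuming the missing-cell percentage is taken over all cells (rows times columns) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the Dataset statistics table.
n_variables = 7
n_observations = 10_000
missing_cells = 2_959
total_size_kib = 625.0

# Share of missing cells over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average in-memory record size in bytes (assuming 1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```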

Variable types

Categorical: 5
Text: 2

Dataset

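Description: Data on road-name addresses (building numbers) in Boryeong-si (보령시), Chungcheongnam-do. It comprises the city/county/district name (시군구명), town/township/neighborhood name (읍면동명), road-name address (도로명주소), building use (건축물용도), form (형태), and building-sign use (용도).
Author: 충청남도
URL: https://alldam.chungnam.go.kr/index.chungnam?menuCd=DOM_000000201001001001&st=&cds=&orgCd=&apiType=&isOpen=Y&pageIndex=400&beforeMenuCd=DOM_000000201001001000&publicdatapk=15041819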

Alerts

시군구명 has constant value "보령시" (Constant)
Dataset has 1 (< 0.1%) duplicate row (Duplicates)
형태 is highly overall correlated with 용도 (High correlation)
용도 is highly overall correlated with 형태 (High correlation)
건축물용도 is highly imbalanced (55.3%) (Imbalance)
형태 is highly imbalanced (97.5%) (Imbalance)
용도 is highly imbalanced (96.3%) (Imbalance)
리명 has 2959 (29.6%) missing values (Missing)
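The report does not define the imbalance score. One reading that reproduces the reported 97.5% and 96.3% is one minus the Shannon entropy of the class frequencies, normalized by the logarithm of the number of classes. The sketch below is written under that assumption; `imbalance_score` is an illustrative name, and the class counts are the ones this report lists for 형태 and 용도.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): H is the Shannon entropy of the class
    frequencies, k the number of classes (0 = balanced, near 1 = skewed)."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

form_counts = [9975, 25]       # 형태: 표준형, 비표준형
use_counts = [9940, 35, 25]    # 용도: 일반용(오각형), 관공서용, 자율형

print(f"형태: {100 * imbalance_score(form_counts):.1f}%")
print(f"용도: {100 * imbalance_score(use_counts):.1f}%")
```

The score for 건축물용도 is not recomputed here because the report lists only its most frequent classes.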

Reproduction

Analysis started: 2024-01-14 06:39:45.054164
Analysis finished: 2024-01-14 06:39:46.334828
Duration: 1.28 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
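The reported duration can be checked against the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption about how the timestamps are laid out, based on how they are printed here.

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction table (assumed format).
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-01-14 06:39:45.054164", FMT)
finished = datetime.strptime("2024-01-14 06:39:46.334828", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```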

Variables

시군구명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
보령시: 10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 보령시
2nd row: 보령시
3rd row: 보령시
4th row: 보령시
5th row: 보령시

Common Values

Value | Count | Frequency (%)
보령시 | 10000 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]
Value | Count | Frequency (%)
보령시 | 10000 | 100.0%

읍면동명
Categorical

Distinct: 21
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
웅천읍: 1106
대천동: 1012
남포면: 929
오천면: 811
주교면: 724
Other values (16): 5418

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동대동
2nd row: 청라면
3rd row: 성주면
4th row: 청라면
5th row: 주교면

Common Values

Value | Count | Frequency (%)
웅천읍 | 1106 | 11.1%
대천동 | 1012 | 10.1%
남포면 | 929 | 9.3%
오천면 | 811 | 8.1%
주교면 | 724 | 7.2%
청라면 | 709 | 7.1%
천북면 | 701 | 7.0%
동대동 | 534 | 5.3%
신흑동 | 508 | 5.1%
청소면 | 506 | 5.1%
Other values (11) | 2460 | 24.6%

Length

[Figure: histogram of lengths of the category]

리명
Text

MISSING 

Distinct: 101
Distinct (%): 1.4%
Missing: 2959
Missing (%): 29.6%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.0622071
Min length: 2
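The percentages and the mean length reported for 리명 are consistent with being computed over the non-missing values only. The sketch below checks this under that assumption, using the counts this section reports (10000 rows, 2959 missing, 101 distinct values, 21561 total characters); the variable names are illustrative.

```python
# Counts reported for 리명 in this report.
n_rows = 10_000
n_missing = 2_959
n_distinct = 101
total_chars = 21_561

# Values actually present in the column.
n_present = n_rows - n_missing

missing_pct = 100 * n_missing / n_rows        # share of all rows
distinct_pct = 100 * n_distinct / n_present   # assumed: share of non-missing rows
mean_length = total_chars / n_present         # characters per non-missing value

print(f"Missing: {missing_pct:.1f}%")
print(f"Distinct: {distinct_pct:.1f}%")
print(f"Mean length: {mean_length:.7f}")
```

Dividing the distinct count by all 10000 rows would give about 1.0% rather than the reported 1.4%, which is why the non-missing denominator is assumed.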

Characters and Unicode

Total characters: 21561
Distinct characters: 96
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 나원리
2nd row: 성주리
3rd row: 나원리
4th row: 신대리
5th row: 삼현리
Value | Count | Frequency (%)
대창리 | 261 | 3.7%
성주리 | 248 | 3.5%
원산도리 | 174 | 2.5%
주교리 | 161 | 2.3%
신대리 | 157 | 2.2%
관당리 | 151 | 2.1%
진죽리 | 147 | 2.1%
삽시도리 | 139 | 2.0%
하만리 | 138 | 2.0%
나원리 | 137 | 1.9%
Other values (91) | 5328 | 75.7%

Most occurring characters

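Value | Count | Frequency (%)
[glyph lost] | 7041 | 32.7%
[glyph lost] | 710 | 3.3%
[glyph lost] | 585 | 2.7%
[glyph lost] | 570 | 2.6%
[glyph lost] | 545 | 2.5%
[glyph lost] | 526 | 2.4%
[glyph lost] | 460 | 2.1%
[glyph lost] | 435 | 2.0%
[glyph lost] | 385 | 1.8%
[glyph lost] | 363 | 1.7%
Other values (86) | 9941 | 46.1%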

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 21561 | 100.0%

Most frequent character per category

Other Letter
Every character falls in this category, so the table is identical to the one under "Most occurring characters".

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 21561 | 100.0%

Most frequent character per script

Hangul
Every character is in this script, so the table is identical to the one under "Most occurring characters".

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 21561 | 100.0%

Most frequent character per block

Hangul
Every character is in this block, so the table is identical to the one under "Most occurring characters".

도로명주소
Text

Distinct: 9999
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 8.5429
Min length: 4

Characters and Unicode

Total characters: 85429
Distinct characters: 310
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
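As an illustration of these character properties, the sketch below counts the Unicode general categories in one of the sample addresses, using Python's standard `unicodedata` module. The helper name `category_counts` is illustrative. The two-letter codes correspond to the category names used in this report: Lo is Other Letter, Nd is Decimal Number, Zs is Space Separator, and Pd is Dash Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "심원계곡로 152-2"  # 3rd sample row of 도로명주소
counts = category_counts(sample)

for code, n in sorted(counts.items()):
    print(code, n)
```

The standard library does not expose the script or block of a character, so those two breakdowns are not reproduced here.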

Unique

Unique: 9998
Unique (%): > 99.9%

Sample

1st row: 평원길 52
2nd row: 원모루길 194
3rd row: 심원계곡로 152-2
4th row: 당안길 3
5th row: 신대1길 90-3
Value | Count | Frequency (%)
충서로 | 182 | 0.9%
토정로 | 128 | 0.6%
홍보로 | 99 | 0.5%
죽성로 | 94 | 0.5%
만수로 | 85 | 0.4%
성주산로 | 84 | 0.4%
중앙로 | 84 | 0.4%
대해로 | 79 | 0.4%
보령남로 | 71 | 0.4%
10 | 66 | 0.3%
Other values (5062) | 19028 | 95.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 10000 | 11.7%
[glyph lost] | 7570 | 8.9%
1 | 7132 | 8.3%
2 | 5037 | 5.9%
- | 4563 | 5.3%
3 | 4095 | 4.8%
4 | 3316 | 3.9%
5 | 2885 | 3.4%
6 | 2601 | 3.0%
[glyph lost] | 2475 | 2.9%
Other values (300) | 35755 | 41.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 36921 | 43.2%
Decimal Number | 33945 | 39.7%
Space Separator | 10000 | 11.7%
Dash Punctuation | 4563 | 5.3%

Most frequent character per category

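Other Letter
Value | Count | Frequency (%)
[glyph lost] | 7570 | 20.5%
[glyph lost] | 2475 | 6.7%
[glyph lost] | 836 | 2.3%
[glyph lost] | 784 | 2.1%
[glyph lost] | 699 | 1.9%
[glyph lost] | 603 | 1.6%
[glyph lost] | 602 | 1.6%
[glyph lost] | 564 | 1.5%
[glyph lost] | 506 | 1.4%
[glyph lost] | 477 | 1.3%
Other values (288) | 21805 | 59.1%
Decimal Number
Value | Count | Frequency (%)
1 | 7132 | 21.0%
2 | 5037 | 14.8%
3 | 4095 | 12.1%
4 | 3316 | 9.8%
5 | 2885 | 8.5%
6 | 2601 | 7.7%
7 | 2447 | 7.2%
8 | 2183 | 6.4%
0 | 2150 | 6.3%
9 | 2099 | 6.2%
Space Separator
Value | Count | Frequency (%)
(space) | 10000 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 4563 | 100.0%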

Most occurring scripts

Value | Count | Frequency (%)
Common | 48508 | 56.8%
Hangul | 36921 | 43.2%

Most frequent character per script

Hangul
The table is identical to the Other Letter table under "Most frequent character per category".
Common
Value | Count | Frequency (%)
(space) | 10000 | 20.6%
1 | 7132 | 14.7%
2 | 5037 | 10.4%
- | 4563 | 9.4%
3 | 4095 | 8.4%
4 | 3316 | 6.8%
5 | 2885 | 5.9%
6 | 2601 | 5.4%
7 | 2447 | 5.0%
8 | 2183 | 4.5%
Other values (2) | 4249 | 8.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 48508 | 56.8%
Hangul | 36921 | 43.2%

Most frequent character per block

ASCII
The table is identical to the Common table under "Most frequent character per script".
Hangul
The table is identical to the Other Letter table under "Most frequent character per category".

건축물용도
Categorical

IMBALANCE 

Distinct: 22
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
단독주택: 6106
창고시설: 1969
제2종근린생활시설: 479
제1종근린생활시설: 315
판매 및 영업시설: 232
Other values (17): 899

Length

Max length: 11
Median length: 4
Mean length: 4.6671
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 창고시설
2nd row: 단독주택
3rd row: 제2종근린생활시설
4th row: 단독주택
5th row: 단독주택

Common Values

Value | Count | Frequency (%)
단독주택 | 6106 | 61.1%
창고시설 | 1969 | 19.7%
제2종근린생활시설 | 479 | 4.8%
제1종근린생활시설 | 315 | 3.1%
판매 및 영업시설 | 232 | 2.3%
동식물관련시설 | 203 | 2.0%
숙박시설 | 166 | 1.7%
업무시설 | 119 | 1.2%
공장 | 90 | 0.9%
교육연구 및 복지시설 | 74 | 0.7%
Other values (12) | 247 | 2.5%

Length

[Figure: histogram of lengths of the category]
Value | Count | Frequency (%)
단독주택 | 6106 | 57.1%
창고시설 | 1969 | 18.4%
제2종근린생활시설 | 479 | 4.5%
[glyph lost] | 350 | 3.3%
제1종근린생활시설 | 315 | 2.9%
판매 | 232 | 2.2%
영업시설 | 232 | 2.2%
동식물관련시설 | 203 | 1.9%
숙박시설 | 166 | 1.6%
업무시설 | 119 | 1.1%
Other values (16) | 529 | 4.9%

형태
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
표준형: 9975
비표준형: 25

Length

Max length: 4
Median length: 3
Mean length: 3.0025
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 표준형
2nd row: 표준형
3rd row: 표준형
4th row: 표준형
5th row: 표준형

Common Values

Value | Count | Frequency (%)
표준형 | 9975 | 99.8%
비표준형 | 25 | 0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]
Value | Count | Frequency (%)
표준형 | 9975 | 99.8%
비표준형 | 25 | 0.2%

용도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
일반용(오각형): 9940
관공서용: 35
자율형: 25

Length

Max length: 8
Median length: 8
Mean length: 7.9735
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반용(오각형)
2nd row: 일반용(오각형)
3rd row: 일반용(오각형)
4th row: 일반용(오각형)
5th row: 일반용(오각형)

Common Values

Value | Count | Frequency (%)
일반용(오각형) | 9940 | 99.4%
관공서용 | 35 | 0.4%
자율형 | 25 | 0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]
Value | Count | Frequency (%)
일반용(오각형) | 9940 | 99.4%
관공서용 | 35 | 0.4%
자율형 | 25 | 0.2%

Correlations

Variable | 읍면동명 | 건축물용도 | 형태 | 용도
읍면동명 | 1.000 | 0.529 | 0.030 | 0.095
건축물용도 | 0.529 | 1.000 | 0.000 | 0.220
형태 | 0.030 | 0.000 | 1.000 | 1.000
용도 | 0.095 | 0.220 | 1.000 | 1.000

Variable | 건축물용도 | 읍면동명 | 형태 | 용도
건축물용도 | 1.000 | 0.171 | 0.000 | 0.116
읍면동명 | 0.171 | 1.000 | 0.026 | 0.043
형태 | 0.000 | 0.026 | 1.000 | 1.000
용도 | 0.116 | 0.043 | 1.000 | 1.000

Variable | 읍면동명 | 건축물용도 | 형태 | 용도
읍면동명 | 1.000 | 0.171 | 0.026 | 0.043
건축물용도 | 0.171 | 1.000 | 0.000 | 0.116
형태 | 0.026 | 0.000 | 1.000 | 1.000
용도 | 0.043 | 0.116 | 1.000 | 1.000
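The report does not name the association measure behind these matrices. For pairs of categorical variables a common choice is Cramér's V, and the sketch below shows how a value of 1.000 between 형태 and 용도 could arise under that assumption. The joint counts are HYPOTHETICAL: only the marginal totals (9975 and 25 for 형태; 9940, 35, and 25 for 용도) come from this report, and the table assumes that every 비표준형 row is also a 자율형 row. The function computes the plain, uncorrected statistic; a bias-corrected variant would give a slightly different figure.

```python
import math

def cramers_v(table):
    """Plain (uncorrected) Cramér's V for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# HYPOTHETICAL joint counts (rows: 형태, columns: 용도); only the margins
# are taken from the report.
#          일반용(오각형)  관공서용  자율형
table = [
    [9940, 35, 0],   # 표준형
    [0, 0, 25],      # 비표준형
]

v = cramers_v(table)
print(f"Cramér's V: {v:.3f}")
```

When one variable is fully determined by the other, as in this assumed table, the statistic reaches its maximum.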

Missing values

[Figure: nullity count by column] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

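First rows

Index | 시군구명 | 읍면동명 | 리명 | 도로명주소 | 건축물용도 | 형태 | 용도
3565 | 보령시 | 동대동 | <NA> | 평원길 52 | 창고시설 | 표준형 | 일반용(오각형)
11679 | 보령시 | 청라면 | 나원리 | 원모루길 194 | 단독주택 | 표준형 | 일반용(오각형)
14918 | 보령시 | 성주면 | 성주리 | 심원계곡로 152-2 | 제2종근린생활시설 | 표준형 | 일반용(오각형)
26956 | 보령시 | 청라면 | 나원리 | 당안길 3 | 단독주택 | 표준형 | 일반용(오각형)
15683 | 보령시 | 주교면 | 신대리 | 신대1길 90-3 | 단독주택 | 표준형 | 일반용(오각형)
19007 | 보령시 | 남포면 | 삼현리 | 삼현3길 8-70 | 단독주택 | 표준형 | 일반용(오각형)
17215 | 보령시 | 대천동 | <NA> | 소미1길 68 | 단독주택 | 표준형 | 일반용(오각형)
4117 | 보령시 | 주교면 | 은포리 | 토정로 486-26 | 창고시설 | 표준형 | 일반용(오각형)
23184 | 보령시 | 동대동 | <NA> | 매방아1길 12-15 | 단독주택 | 표준형 | 일반용(오각형)
27237 | 보령시 | 내항동 | <NA> | 녹문3길 39-16 | 단독주택 | 표준형 | 일반용(오각형)

Last rows

Index | 시군구명 | 읍면동명 | 리명 | 도로명주소 | 건축물용도 | 형태 | 용도
12833 | 보령시 | 주산면 | 황율리 | 옹동말길 59 | 단독주택 | 표준형 | 일반용(오각형)
23434 | 보령시 | 웅천읍 | 수부리 | 만수로 582-25 | 단독주택 | 표준형 | 일반용(오각형)
11221 | 보령시 | 오천면 | 원산도리 | 원산도5길 19-7 | 단독주택 | 표준형 | 일반용(오각형)
21459 | 보령시 | 대천동 | <NA> | 벼루길 11-12 | 단독주택 | 표준형 | 일반용(오각형)
27649 | 보령시 | 청라면 | 의평리 | 냉풍욕장길 45-6 | 제2종근린생활시설 | 표준형 | 일반용(오각형)
4629 | 보령시 | 동대동 | <NA> | 큰오랏3길 30 | 제2종근린생활시설 | 표준형 | 일반용(오각형)
15565 | 보령시 | 주교면 | 신대리 | 신대동길 177 | 단독주택 | 표준형 | 일반용(오각형)
23460 | 보령시 | 웅천읍 | 수부리 | 만수로 454-12 | 창고시설 | 표준형 | 일반용(오각형)
26789 | 보령시 | 미산면 | 늑전리 | 대늑길 33 | 창고시설 | 표준형 | 일반용(오각형)
1608 | 보령시 | 오천면 | 효자도리 | 허육도길 18 | 단독주택 | 표준형 | 일반용(오각형)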

Duplicate rows

Most frequently occurring

Index | 시군구명 | 읍면동명 | 리명 | 도로명주소 | 건축물용도 | 형태 | 용도 | # duplicates
0 | 보령시 | 주포면 | 관산리 | 대학길 106 | 교육연구 및 복지시설 | 표준형 | 일반용(오각형) | 2
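A duplicate-row count like the one above can be obtained by counting identical records. The sketch below does so with the standard library on a tiny illustrative list: the repeated record is the one reported in this table, the other is one of the sample rows, and the list itself is not the real dataset.

```python
from collections import Counter

# Illustrative records only: columns are 시군구명, 읍면동명, 리명, 도로명주소,
# 건축물용도, 형태, 용도.
rows = [
    ("보령시", "주포면", "관산리", "대학길 106", "교육연구 및 복지시설", "표준형", "일반용(오각형)"),
    ("보령시", "청라면", "나원리", "당안길 3", "단독주택", "표준형", "일반용(오각형)"),
    ("보령시", "주포면", "관산리", "대학길 106", "교육연구 및 복지시설", "표준형", "일반용(오각형)"),
]

# Count each distinct record and keep those that occur more than once.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicates.items():
    print(n, row)
```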