Overview

Dataset statistics

Number of variables: 14
Number of observations: 10000
Missing cells: 35498
Missing cells (%): 25.4%
Duplicate rows: 479
Duplicate rows (%): 4.8%
Total size in memory: 1.2 MiB
Average record size in memory: 124.0 B

Variable types

Text: 4
Categorical: 7
Numeric: 1
Unsupported: 2

Dataset

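Description: Dangjin-si no-smoking facility information (no-smoking zone type, violation report phone number, no-smoking facility name, fine, no-smoking facility address, managing agency name)
Author: Chungcheongnam-do (충청남도)
URL: https://alldam.chungnam.go.kr/index.chungnam?menuCd=DOM_000000201001001001&st=&cds=&orgCd=&apiType=&isOpen=Y&pageIndex=398&beforeMenuCd=DOM_000000201001001000&publicdatapk=15042403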

Alerts

Dataset has 479 (4.8%) duplicate rows [Duplicates]
위반신고전화번호 is highly overall correlated with 금연구역면적 and 6 other fields [High correlation]
금연구역범위상세 is highly overall correlated with 금연구역면적 and 6 other fields [High correlation]
시군구명 is highly overall correlated with 금연구역면적 and 5 other fields [High correlation]
데이터기준일자 is highly overall correlated with 금연구역면적 and 6 other fields [High correlation]
금연구역지정근거명 is highly overall correlated with 금연구역면적 and 6 other fields [High correlation]
관리기관명 is highly overall correlated with 금연구역면적 and 6 other fields [High correlation]
시도명 is highly overall correlated with 금연구역면적 and 5 other fields [High correlation]
금연구역면적 is highly overall correlated with 금연구역범위상세 and 6 other fields [High correlation]
금연구역범위상세 is highly imbalanced (99.9%) [Imbalance]
시도명 is highly imbalanced (99.9%) [Imbalance]
시군구명 is highly imbalanced (99.9%) [Imbalance]
금연구역지정근거명 is highly imbalanced (99.9%) [Imbalance]
위반신고전화번호 is highly imbalanced (99.9%) [Imbalance]
관리기관명 is highly imbalanced (99.6%) [Imbalance]
데이터기준일자 is highly imbalanced (99.6%) [Imbalance]
금연구역면적 has 7501 (75.0%) missing values [Missing]
위반과태료 has 10000 (100.0%) missing values [Missing]
소재지도로명주소 has 851 (8.5%) missing values [Missing]
소재지지번주소 has 7145 (71.5%) missing values [Missing]
Unnamed: 13 has 10000 (100.0%) missing values [Missing]
위반과태료 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 13 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
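
The overview percentages can be cross-checked against the per-column missing counts with a little arithmetic. The sketch below copies the figures from this report. The single missing value attributed to 금연구역구분 is an assumption: it is taken from the Text variable with 55 distinct values, whose name is not printed next to its statistics.

```python
# Figures copied from this report; nothing here reads the original data file.
n_rows = 10_000
n_cols = 14

missing_by_column = {
    "금연구역면적": 7501,
    "위반과태료": 10000,
    "소재지도로명주소": 851,
    "소재지지번주소": 7145,
    "Unnamed: 13": 10000,
    "금연구역구분": 1,  # assumed column name for the 55-distinct Text variable
}

# Total missing cells is the sum of the per-column missing counts.
missing_cells = sum(missing_by_column.values())

# Percentages are taken over all cells (rows x columns) and over all rows.
missing_pct = round(100 * missing_cells / (n_rows * n_cols), 1)
duplicate_pct = round(100 * 479 / n_rows, 1)

print(missing_cells, missing_pct, duplicate_pct)  # 35498 25.4 4.8
```

Two fully empty columns (위반과태료 and Unnamed: 13) account for 20000 of the 35498 missing cells, so dropping them before further analysis would lower the overall missing rate considerably.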

Reproduction

Analysis started: 2024-01-09 21:36:15.782916
Analysis finished: 2024-01-09 21:36:17.735560
Duration: 1.95 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

금연구역명
Text

Distinct: 7249
Distinct (%): 72.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 39
Median length: 24
Mean length: 7.4282
Min length: 1

Characters and Unicode

Total characters: 74282
Distinct characters: 911
Distinct categories: 13
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
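
As a rough illustration of how such per-category counts can be obtained, the sketch below uses Python's standard `unicodedata` module. It is not taken from the profiling tool itself; the helper name `category_counts` and the three sample strings are chosen here for illustration. `unicodedata.category` returns two-letter abbreviations, for example `Lo` for Other Letter, `Nd` for Decimal Number, `Ps` and `Pe` for Open and Close Punctuation, and `Zs` for Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Three facility names that appear in this report, used as a tiny sample.
sample = ["원당마을1단지(시내방면)", "gs25", "1.2.3 당구장"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category tables in this report.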

Unique

Unique: 4502
Unique (%): 45.0%

Sample

1st row: 원당마을1단지(시내방면)
2nd row: 명산리(외나무다리)
3rd row: 농민주유소버스정류소
4th row: 차리인동네
5th row: 예일어린이집(70소7076)

Value | Count | Frequency (%)
놀이터 | 83 | 0.7%
gs25 | 80 | 0.7%
씨유 | 68 | 0.6%
놀이시설 | 59 | 0.5%
1 | 45 | 0.4%
담배소매업소 | 43 | 0.4%
미니스톱 | 37 | 0.3%
잡화상 | 35 | 0.3%
어린이집 | 34 | 0.3%
세븐일레븐 | 34 | 0.3%
Other values (7377) | 11010 | 95.5%

Most occurring characters

ValueCountFrequency (%)
1914
 
2.6%
1772
 
2.4%
( 1620
 
2.2%
) 1619
 
2.2%
1545
 
2.1%
1511
 
2.0%
1206
 
1.6%
1190
 
1.6%
1128
 
1.5%
1016
 
1.4%
Other values (901) 59761
80.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 62963
84.8%
Decimal Number 4252
 
5.7%
Open Punctuation 1620
 
2.2%
Close Punctuation 1619
 
2.2%
Space Separator 1545
 
2.1%
Uppercase Letter 1281
 
1.7%
Lowercase Letter 525
 
0.7%
Connector Punctuation 246
 
0.3%
Other Punctuation 119
 
0.2%
Dash Punctuation 74
 
0.1%
Other values (3) 38
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1914
 
3.0%
1772
 
2.8%
1511
 
2.4%
1206
 
1.9%
1190
 
1.9%
1128
 
1.8%
1016
 
1.6%
988
 
1.6%
949
 
1.5%
929
 
1.5%
Other values (820) 50360
80.0%
Uppercase Letter
ValueCountFrequency (%)
G 214
16.7%
C 213
16.6%
S 204
15.9%
P 112
8.7%
O 53
 
4.1%
B 50
 
3.9%
A 47
 
3.7%
U 42
 
3.3%
I 42
 
3.3%
E 32
 
2.5%
Other values (16) 272
21.2%
Lowercase Letter
ValueCountFrequency (%)
m 218
41.5%
e 51
 
9.7%
o 34
 
6.5%
n 28
 
5.3%
c 28
 
5.3%
a 24
 
4.6%
r 18
 
3.4%
f 14
 
2.7%
i 14
 
2.7%
h 11
 
2.1%
Other values (14) 85
 
16.2%
Other Punctuation
ValueCountFrequency (%)
. 67
56.3%
& 25
 
21.0%
· 7
 
5.9%
? 5
 
4.2%
# 4
 
3.4%
! 4
 
3.4%
% 2
 
1.7%
2
 
1.7%
/ 1
 
0.8%
* 1
 
0.8%
Decimal Number
ValueCountFrequency (%)
1 786
18.5%
2 677
15.9%
7 624
14.7%
0 556
13.1%
5 414
9.7%
3 319
7.5%
4 267
 
6.3%
9 239
 
5.6%
6 197
 
4.6%
8 173
 
4.1%
Other Symbol
ValueCountFrequency (%)
30
96.8%
° 1
 
3.2%
Math Symbol
ValueCountFrequency (%)
+ 4
66.7%
~ 2
33.3%
Open Punctuation
ValueCountFrequency (%)
( 1620
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1619
100.0%
Space Separator
ValueCountFrequency (%)
1545
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 246
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 74
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 62989
84.8%
Common 9483
 
12.8%
Latin 1806
 
2.4%
Han 4
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1914
 
3.0%
1772
 
2.8%
1511
 
2.4%
1206
 
1.9%
1190
 
1.9%
1128
 
1.8%
1016
 
1.6%
988
 
1.6%
949
 
1.5%
929
 
1.5%
Other values (817) 50386
80.0%
Latin
ValueCountFrequency (%)
m 218
 
12.1%
G 214
 
11.8%
C 213
 
11.8%
S 204
 
11.3%
P 112
 
6.2%
O 53
 
2.9%
e 51
 
2.8%
B 50
 
2.8%
A 47
 
2.6%
U 42
 
2.3%
Other values (40) 602
33.3%
Common
ValueCountFrequency (%)
( 1620
17.1%
) 1619
17.1%
1545
16.3%
1 786
8.3%
2 677
7.1%
7 624
 
6.6%
0 556
 
5.9%
5 414
 
4.4%
3 319
 
3.4%
4 267
 
2.8%
Other values (20) 1056
11.1%
Han
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 62959
84.8%
ASCII 11279
 
15.2%
None 40
 
0.1%
CJK 4
 
< 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1914
 
3.0%
1772
 
2.8%
1511
 
2.4%
1206
 
1.9%
1190
 
1.9%
1128
 
1.8%
1016
 
1.6%
988
 
1.6%
949
 
1.5%
929
 
1.5%
Other values (816) 50356
80.0%
ASCII
ValueCountFrequency (%)
( 1620
14.4%
) 1619
14.4%
1545
13.7%
1 786
 
7.0%
2 677
 
6.0%
7 624
 
5.5%
0 556
 
4.9%
5 414
 
3.7%
3 319
 
2.8%
4 267
 
2.4%
Other values (67) 2852
25.3%
None
ValueCountFrequency (%)
30
75.0%
· 7
 
17.5%
2
 
5.0%
° 1
 
2.5%
CJK
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

금연구역범위상세
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
건물 및 영엄장
9999 
<NA>
 
1

Length

Max length8
Median length8
Mean length7.9996
Min length4

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row건물 및 영엄장
2nd row건물 및 영엄장
3rd row건물 및 영엄장
4th row건물 및 영엄장
5th row건물 및 영엄장

Common Values

ValueCountFrequency (%)
건물 및 영엄장 9999
> 99.9%
<NA> 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
건물 9999
33.3%
9999
33.3%
영엄장 9999
33.3%
na 1
 
< 0.1%

시도명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
충청남도
9999 
당진시보건소
 
1

Length

Max length6
Median length4
Mean length4.0002
Min length4

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row충청남도
2nd row충청남도
3rd row충청남도
4th row충청남도
5th row충청남도

Common Values

ValueCountFrequency (%)
충청남도 9999
> 99.9%
당진시보건소 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
충청남도 9999
> 99.9%
당진시보건소 1
 
< 0.1%

시군구명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
당진시
9999 
43808.0
 
1

Length

Max length7
Median length3
Mean length3.0004
Min length3

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row당진시
2nd row당진시
3rd row당진시
4th row당진시
5th row당진시

Common Values

ValueCountFrequency (%)
당진시 9999
> 99.9%
43808.0 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
당진시 9999
> 99.9%
43808.0 1
 
< 0.1%

금연구역구분
Text

Distinct: 55
Distinct (%): 0.6%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length16
Median length3
Mean length4.3035304
Min length2

Characters and Unicode

Total characters43031
Distinct characters113
Distinct categories5 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4 ?
Unique (%)< 0.1%

Sample

1st row: 버스정류장
2nd row: 버스정류장
3rd row: 버스정류소
4th row: 음식점
5th row: 어린이운송용 승합차

Value | Count | Frequency (%)
음식점 | 5305 | 51.9%
담배소매업소 | 1315 | 12.9%
버스정류장 | 577 | 5.6%
버스정류소 | 545 | 5.3%
어린이놀이시설 | 284 | 2.8%
어린이집 | 274 | 2.7%
어린이운송용 | 228 | 2.2%
승합차 | 228 | 2.2%
주유소 | 198 | 1.9%
실내체육시설 | 182 | 1.8%
Other values (49) | 1094 | 10.7%

Most occurring characters

ValueCountFrequency (%)
5305
 
12.3%
5305
 
12.3%
5305
 
12.3%
3540
 
8.2%
1477
 
3.4%
1342
 
3.1%
1327
 
3.1%
1327
 
3.1%
1300
 
3.0%
1223
 
2.8%
Other values (103) 15580
36.2%

Most occurring categories

ValueCountFrequency (%)
Other Letter 42203
98.1%
Decimal Number 326
 
0.8%
Space Separator 231
 
0.5%
Lowercase Letter 163
 
0.4%
Uppercase Letter 108
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5305
 
12.6%
5305
 
12.6%
5305
 
12.6%
3540
 
8.4%
1477
 
3.5%
1342
 
3.2%
1327
 
3.1%
1327
 
3.1%
1300
 
3.1%
1223
 
2.9%
Other values (96) 14752
35.0%
Uppercase Letter
ValueCountFrequency (%)
L 36
33.3%
P 36
33.3%
G 36
33.3%
Decimal Number
ValueCountFrequency (%)
0 163
50.0%
1 163
50.0%
Space Separator
ValueCountFrequency (%)
231
100.0%
Lowercase Letter
ValueCountFrequency (%)
m 163
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 42203
98.1%
Common 557
 
1.3%
Latin 271
 
0.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5305
 
12.6%
5305
 
12.6%
5305
 
12.6%
3540
 
8.4%
1477
 
3.5%
1342
 
3.2%
1327
 
3.1%
1327
 
3.1%
1300
 
3.1%
1223
 
2.9%
Other values (96) 14752
35.0%
Latin
ValueCountFrequency (%)
m 163
60.1%
L 36
 
13.3%
P 36
 
13.3%
G 36
 
13.3%
Common
ValueCountFrequency (%)
231
41.5%
0 163
29.3%
1 163
29.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 42203
98.1%
ASCII 828
 
1.9%

Most frequent character per block

Hangul
ValueCountFrequency (%)
5305
 
12.6%
5305
 
12.6%
5305
 
12.6%
3540
 
8.4%
1477
 
3.5%
1342
 
3.2%
1327
 
3.1%
1327
 
3.1%
1300
 
3.1%
1223
 
2.9%
Other values (96) 14752
35.0%
ASCII
ValueCountFrequency (%)
231
27.9%
0 163
19.7%
m 163
19.7%
1 163
19.7%
L 36
 
4.3%
P 36
 
4.3%
G 36
 
4.3%

금연구역지정근거명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
국민건강증진법 제9조
9999 
<NA>
 
1

Length

Max length11
Median length11
Mean length10.9993
Min length4

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row국민건강증진법 제9조
2nd row국민건강증진법 제9조
3rd row국민건강증진법 제9조
4th row국민건강증진법 제9조
5th row국민건강증진법 제9조

Common Values

ValueCountFrequency (%)
국민건강증진법 제9조 9999
> 99.9%
<NA> 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
국민건강증진법 9999
50.0%
제9조 9999
50.0%
na 1
 
< 0.1%

금연구역면적
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 1943
Distinct (%): 77.8%
Missing: 7501
Missing (%): 75.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.49605
Minimum: 0
Maximum: 3950
Zeros: 99
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 15.626
Q1: 45
Median: 76.6
Q3: 125.685
95-th percentile: 238.487
Maximum: 3950
Range: 3950
Interquartile range (IQR): 80.685

Descriptive statistics

Standard deviation: 178.42596
Coefficient of variation (CV): 1.6754233
Kurtosis: 215.39093
Mean: 106.49605
Median Absolute Deviation (MAD): 37.35
Skewness: 12.627398
Sum: 266133.62
Variance: 31835.824
Monotonicity: Not monotonic
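
Several of these statistics follow directly from others in the same listing. The short check below recomputes the interquartile range, coefficient of variation, variance and mean from the reported figures. It is a consistency sketch using only the rounded values printed in this report, not a recomputation from the underlying data.

```python
# Values copied from the quantile and descriptive statistics of 금연구역면적.
q1, q3 = 45.0, 125.685
mean, std = 106.49605, 178.42596
total = 266133.62
n_present = 10_000 - 7501  # observations minus missing values

iqr = q3 - q1                    # interquartile range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
mean_from_sum = total / n_present

print(round(iqr, 3), round(cv, 5), round(variance, 2), round(mean_from_sum, 3))
```

The recomputed values agree with the reported ones up to rounding. With a skewness of about 12.6 and a maximum of 3950 against a median of 76.6, the mean and standard deviation are driven by a few very large areas, so the median and IQR are likely the more representative summaries.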
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0.0 | 99 | 1.0%
45.0 | 8 | 0.1%
100.0 | 8 | 0.1%
33.0 | 8 | 0.1%
39.6 | 7 | 0.1%
66.0 | 7 | 0.1%
50.0 | 6 | 0.1%
52.0 | 6 | 0.1%
72.0 | 6 | 0.1%
34.0 | 6 | 0.1%
Other values (1933) | 2338 | 23.4%
(Missing) | 7501 | 75.0%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 99 | 1.0%
6.6 | 1 | < 0.1%
7.68 | 1 | < 0.1%
7.9 | 1 | < 0.1%
8.55 | 1 | < 0.1%
8.8 | 1 | < 0.1%
10.15 | 1 | < 0.1%
10.2 | 1 | < 0.1%
11.68 | 1 | < 0.1%
11.78 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
3950.0 | 1 | < 0.1%
3700.0 | 1 | < 0.1%
2946.0 | 1 | < 0.1%
2496.25 | 1 | < 0.1%
2472.21 | 1 | < 0.1%
2061.9 | 1 | < 0.1%
1934.8 | 1 | < 0.1%
1444.65 | 1 | < 0.1%
1275.22 | 1 | < 0.1%
1271.78 | 1 | < 0.1%

위반과태료
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing10000
Missing (%)100.0%
Memory size166.0 KiB

위반신고전화번호
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
041-360-6053
9999 
<NA>
 
1

Length

Max length12
Median length12
Mean length11.9992
Min length4

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row041-360-6053
2nd row041-360-6053
3rd row041-360-6053
4th row041-360-6053
5th row041-360-6053

Common Values

ValueCountFrequency (%)
041-360-6053 9999
> 99.9%
<NA> 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
041-360-6053 9999
> 99.9%
na 1
 
< 0.1%

소재지도로명주소
Text

MISSING 

Distinct: 4914
Distinct (%): 53.7%
Missing: 851
Missing (%): 8.5%
Memory size: 156.2 KiB

Length

Max length52
Median length49
Mean length23.468248
Min length9

Characters and Unicode

Total characters214711
Distinct characters422
Distinct categories10 ?
Distinct scripts3 ?
Distinct blocks3 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2313 ?
Unique (%)25.3%

Sample

1st row: 충청남도 당진시 송산면 금암리 96-1
2nd row: 충청남도 당진시 구봉로 101-3 (원당동)
3rd row: 충청남도 당진시 수청로 300-22
4th row: 충청남도 당진시 합덕읍 합덕산단4로 12-23
5th row: 충청남도 당진시 원당로 163 (원당동)

Value | Count | Frequency (%)
충청남도 | 9149 | 19.6%
당진시 | 9149 | 19.6%
송악읍 | 1602 | 3.4%
읍내동 | 1402 | 3.0%
신평면 | 1039 | 2.2%
합덕읍 | 777 | 1.7%
석문면 | 740 | 1.6%
송산면 | 583 | 1.3%
당진중앙2로 | 355 | 0.8%
채운동 | 268 | 0.6%
Other values (4156) | 21533 | 46.2%

Most occurring characters

ValueCountFrequency (%)
37449
 
17.4%
10390
 
4.8%
10106
 
4.7%
9833
 
4.6%
9557
 
4.5%
9464
 
4.4%
9313
 
4.3%
9152
 
4.3%
1 8220
 
3.8%
5366
 
2.5%
Other values (412) 95861
44.6%

Most occurring categories

ValueCountFrequency (%)
Other Letter 131424
61.2%
Space Separator 37449
 
17.4%
Decimal Number 34695
 
16.2%
Dash Punctuation 3626
 
1.7%
Open Punctuation 3315
 
1.5%
Close Punctuation 3311
 
1.5%
Other Punctuation 730
 
0.3%
Uppercase Letter 128
 
0.1%
Lowercase Letter 18
 
< 0.1%
Math Symbol 15
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
10390
 
7.9%
10106
 
7.7%
9833
 
7.5%
9557
 
7.3%
9464
 
7.2%
9313
 
7.1%
9152
 
7.0%
5366
 
4.1%
3967
 
3.0%
3868
 
2.9%
Other values (368) 50408
38.4%
Uppercase Letter
ValueCountFrequency (%)
B 32
25.0%
A 28
21.9%
L 12
 
9.4%
C 10
 
7.8%
F 9
 
7.0%
P 8
 
6.2%
D 5
 
3.9%
T 5
 
3.9%
H 4
 
3.1%
E 4
 
3.1%
Other values (6) 11
 
8.6%
Decimal Number
ValueCountFrequency (%)
1 8220
23.7%
2 5123
14.8%
3 4227
12.2%
4 2883
 
8.3%
0 2725
 
7.9%
7 2665
 
7.7%
5 2511
 
7.2%
6 2372
 
6.8%
8 2091
 
6.0%
9 1878
 
5.4%
Other Punctuation
ValueCountFrequency (%)
. 502
68.8%
? 209
28.6%
10
 
1.4%
/ 5
 
0.7%
@ 4
 
0.5%
Lowercase Letter
ValueCountFrequency (%)
a 9
50.0%
e 6
33.3%
c 2
 
11.1%
b 1
 
5.6%
Math Symbol
ValueCountFrequency (%)
~ 11
73.3%
> 2
 
13.3%
< 2
 
13.3%
Open Punctuation
ValueCountFrequency (%)
( 3313
99.9%
[ 2
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 3309
99.9%
] 2
 
0.1%
Space Separator
ValueCountFrequency (%)
37449
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3626
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 131424
61.2%
Common 83141
38.7%
Latin 146
 
0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
10390
 
7.9%
10106
 
7.7%
9833
 
7.5%
9557
 
7.3%
9464
 
7.2%
9313
 
7.1%
9152
 
7.0%
5366
 
4.1%
3967
 
3.0%
3868
 
2.9%
Other values (368) 50408
38.4%
Common
ValueCountFrequency (%)
37449
45.0%
1 8220
 
9.9%
2 5123
 
6.2%
3 4227
 
5.1%
- 3626
 
4.4%
( 3313
 
4.0%
) 3309
 
4.0%
4 2883
 
3.5%
0 2725
 
3.3%
7 2665
 
3.2%
Other values (14) 9601
 
11.5%
Latin
ValueCountFrequency (%)
B 32
21.9%
A 28
19.2%
L 12
 
8.2%
C 10
 
6.8%
F 9
 
6.2%
a 9
 
6.2%
P 8
 
5.5%
e 6
 
4.1%
D 5
 
3.4%
T 5
 
3.4%
Other values (10) 22
15.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 131424
61.2%
ASCII 83277
38.8%
None 10
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
37449
45.0%
1 8220
 
9.9%
2 5123
 
6.2%
3 4227
 
5.1%
- 3626
 
4.4%
( 3313
 
4.0%
) 3309
 
4.0%
4 2883
 
3.5%
0 2725
 
3.3%
7 2665
 
3.2%
Other values (33) 9737
 
11.7%
Hangul
ValueCountFrequency (%)
10390
 
7.9%
10106
 
7.7%
9833
 
7.5%
9557
 
7.3%
9464
 
7.2%
9313
 
7.1%
9152
 
7.0%
5366
 
4.1%
3967
 
3.0%
3868
 
2.9%
Other values (368) 50408
38.4%
None
ValueCountFrequency (%)
10
100.0%

소재지지번주소
Text

MISSING 

Distinct2447
Distinct (%)85.7%
Missing7145
Missing (%)71.5%
Memory size156.2 KiB

Length

Max length48
Median length40
Mean length24.64063
Min length13

Characters and Unicode

Total characters70349
Distinct characters280
Distinct categories10 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2161 ?
Unique (%)75.7%

Sample

1st row: 충청남도 당진시 읍내동 1254-10도
2nd row: 충청남도 당진시 송산면 명산리 34-198도
3rd row: 충청남도 당진시 원당동 508번지 3호
4th row: 충청남도 당진시 합덕읍 석우리 1150번지
5th row: 충청남도 당진시 원당동 661번지 1호

Value | Count | Frequency (%)
충청남도 | 2855 | 18.9%
당진시 | 2855 | 18.9%
송악읍 | 495 | 3.3%
읍내동 | 470 | 3.1%
신평면 | 325 | 2.2%
석문면 | 271 | 1.8%
합덕읍 | 257 | 1.7%
1호 | 207 | 1.4%
복운리 | 187 | 1.2%
운산리 | 185 | 1.2%
Other values (1831) | 6959 | 46.2%

Most occurring characters

ValueCountFrequency (%)
16470
23.4%
3228
 
4.6%
2949
 
4.2%
2948
 
4.2%
2944
 
4.2%
2904
 
4.1%
2863
 
4.1%
2856
 
4.1%
1 2350
 
3.3%
2303
 
3.3%
Other values (270) 28534
40.6%

Most occurring categories

ValueCountFrequency (%)
Other Letter 41852
59.5%
Space Separator 16470
 
23.4%
Decimal Number 11525
 
16.4%
Dash Punctuation 438
 
0.6%
Uppercase Letter 21
 
< 0.1%
Open Punctuation 14
 
< 0.1%
Close Punctuation 14
 
< 0.1%
Lowercase Letter 7
 
< 0.1%
Other Punctuation 6
 
< 0.1%
Math Symbol 2
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3228
 
7.7%
2949
 
7.0%
2948
 
7.0%
2944
 
7.0%
2904
 
6.9%
2863
 
6.8%
2856
 
6.8%
2303
 
5.5%
2168
 
5.2%
1958
 
4.7%
Other values (237) 14731
35.2%
Uppercase Letter
ValueCountFrequency (%)
A 5
23.8%
B 3
14.3%
C 2
 
9.5%
T 2
 
9.5%
D 1
 
4.8%
P 1
 
4.8%
I 1
 
4.8%
F 1
 
4.8%
K 1
 
4.8%
G 1
 
4.8%
Other values (3) 3
14.3%
Decimal Number
ValueCountFrequency (%)
1 2350
20.4%
2 1532
13.3%
3 1275
11.1%
6 1175
10.2%
5 1172
10.2%
4 1088
9.4%
9 843
 
7.3%
8 738
 
6.4%
0 700
 
6.1%
7 652
 
5.7%
Lowercase Letter
ValueCountFrequency (%)
e 5
71.4%
o 1
 
14.3%
n 1
 
14.3%
Other Punctuation
ValueCountFrequency (%)
. 5
83.3%
/ 1
 
16.7%
Space Separator
ValueCountFrequency (%)
16470
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 438
100.0%
Open Punctuation
ValueCountFrequency (%)
( 14
100.0%
Close Punctuation
ValueCountFrequency (%)
) 14
100.0%
Math Symbol
ValueCountFrequency (%)
~ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 41852
59.5%
Common 28469
40.5%
Latin 28
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3228
 
7.7%
2949
 
7.0%
2948
 
7.0%
2944
 
7.0%
2904
 
6.9%
2863
 
6.8%
2856
 
6.8%
2303
 
5.5%
2168
 
5.2%
1958
 
4.7%
Other values (237) 14731
35.2%
Common
ValueCountFrequency (%)
16470
57.9%
1 2350
 
8.3%
2 1532
 
5.4%
3 1275
 
4.5%
6 1175
 
4.1%
5 1172
 
4.1%
4 1088
 
3.8%
9 843
 
3.0%
8 738
 
2.6%
0 700
 
2.5%
Other values (7) 1126
 
4.0%
Latin
ValueCountFrequency (%)
A 5
17.9%
e 5
17.9%
B 3
10.7%
C 2
 
7.1%
T 2
 
7.1%
D 1
 
3.6%
P 1
 
3.6%
I 1
 
3.6%
F 1
 
3.6%
o 1
 
3.6%
Other values (6) 6
21.4%

Most occurring blocks

ValueCountFrequency (%)
Hangul 41852
59.5%
ASCII 28497
40.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
16470
57.8%
1 2350
 
8.2%
2 1532
 
5.4%
3 1275
 
4.5%
6 1175
 
4.1%
5 1172
 
4.1%
4 1088
 
3.8%
9 843
 
3.0%
8 738
 
2.6%
0 700
 
2.5%
Other values (23) 1154
 
4.0%
Hangul
ValueCountFrequency (%)
3228
 
7.7%
2949
 
7.0%
2948
 
7.0%
2944
 
7.0%
2904
 
6.9%
2863
 
6.8%
2856
 
6.8%
2303
 
5.5%
2168
 
5.2%
1958
 
4.7%
Other values (237) 14731
35.2%

관리기관명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
당진시보건소
9997 
<NA>
 
3

Length

Max length6
Median length6
Mean length5.9994
Min length4

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row당진시보건소
2nd row당진시보건소
3rd row당진시보건소
4th row당진시보건소
5th row당진시보건소

Common Values

ValueCountFrequency (%)
당진시보건소 9997
> 99.9%
<NA> 3
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
당진시보건소 9997
> 99.9%
na 3
 
< 0.1%

데이터기준일자
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
43808
9997 
<NA>
 
3

Length

Max length5
Median length5
Mean length4.9997
Min length4

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row43808
2nd row43808
3rd row43808
4th row43808
5th row43808
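
The value 43808 looks like a spreadsheet serial date rather than a calendar date. This is an assumption: the report does not say how the column was exported. Under the common 1900 date system, where day 0 is effectively 1899-12-30 for modern dates, the serial can be converted as sketched below. The helper name `excel_serial_to_date` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: the column holds Excel-style serial dates (1900 date system).
# For dates after February 1900, the effective day-zero is 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert a serial such as 43808 or '43808.0' to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(float(serial)))


print(excel_serial_to_date(43808))      # 2019-12-09
print(excel_serial_to_date("43808.0"))  # 2019-12-09
```

If the assumption holds, the data reference date is 9 December 2019. The same conversion covers the stray value 43808.0 reported under 시군구명, which suggests that one row has its columns shifted.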

Common Values

ValueCountFrequency (%)
43808 9997
> 99.9%
<NA> 3
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
43808 9997
> 99.9%
na 3
 
< 0.1%

Unnamed: 13
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing10000
Missing (%)100.0%
Memory size166.0 KiB

Interactions


Correlations

Variable | 시도명 | 시군구명 | 금연구역구분 | 금연구역면적
시도명 | 1.000 | 0.707 | NaN | NaN
시군구명 | 0.707 | 1.000 | NaN | NaN
금연구역구분 | NaN | NaN | 1.000 | 0.907
금연구역면적 | NaN | NaN | 0.907 | 1.000

Variable | 위반신고전화번호 | 금연구역범위상세 | 시군구명 | 데이터기준일자 | 금연구역지정근거명 | 관리기관명 | 시도명
위반신고전화번호 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
금연구역범위상세 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시군구명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.500
데이터기준일자 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
금연구역지정근거명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
관리기관명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시도명 | 1.000 | 1.000 | 0.500 | 1.000 | 1.000 | 1.000 | 1.000

Variable | 금연구역면적 | 금연구역범위상세 | 시도명 | 시군구명 | 금연구역지정근거명 | 위반신고전화번호 | 관리기관명 | 데이터기준일자
금연구역면적 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
금연구역범위상세 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시도명 | 1.000 | 1.000 | 1.000 | 0.500 | 1.000 | 1.000 | 1.000 | 1.000
시군구명 | 1.000 | 1.000 | 0.500 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
금연구역지정근거명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
위반신고전화번호 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
관리기관명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
데이터기준일자 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
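
To make the idea of nullity correlation concrete, the sketch below turns two columns into present/absent indicators and computes their Pearson correlation. It uses only the standard library. The helper names and the two four-value columns are hypothetical, chosen so that one address is present exactly when the other is missing.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def nullity_correlation(col_a, col_b):
    """Correlate presence (1) versus absence (0) of values in two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    return pearson(a, b)


# Hypothetical columns: a road-name address and a lot-number address.
road_address = ["A", None, "B", None]
lot_address = [None, "C", None, "D"]

print(nullity_correlation(road_address, lot_address))  # -1.0
```

A value near -1 means one column tends to be filled when the other is empty, a value near +1 means they are missing together, and a value near 0 means missingness in one says little about the other. Columns that are never or always missing have no variance and would need to be excluded before calling `pearson`.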

Sample

Index | 금연구역명 | 금연구역범위상세 | 시도명 | 시군구명 | 금연구역구분 | 금연구역지정근거명 | 금연구역면적 | 위반과태료 | 위반신고전화번호 | 소재지도로명주소 | 소재지지번주소 | 관리기관명 | 데이터기준일자 | Unnamed: 13
205 | 원당마을1단지(시내방면) | 건물 및 영엄장 | 충청남도 | 당진시 | 버스정류장 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | <NA> | 충청남도 당진시 읍내동 1254-10도 | 당진시보건소 | 43808 | <NA>
750 | 명산리(외나무다리) | 건물 및 영엄장 | 충청남도 | 당진시 | 버스정류장 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | <NA> | 충청남도 당진시 송산면 명산리 34-198도 | 당진시보건소 | 43808 | <NA>
9714 | 농민주유소버스정류소 | 건물 및 영엄장 | 충청남도 | 당진시 | 버스정류소 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 송산면 금암리 96-1 | <NA> | 당진시보건소 | 43808 | <NA>
6581 | 차리인동네 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 구봉로 101-3 (원당동) | 충청남도 당진시 원당동 508번지 3호 | 당진시보건소 | 43808 | <NA>
5715 | 예일어린이집(70소7076) | 건물 및 영엄장 | 충청남도 | 당진시 | 어린이운송용 승합차 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 수청로 300-22 | <NA> | 당진시보건소 | 43808 | <NA>
8021 | 산단밥집 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 합덕읍 합덕산단4로 12-23 | 충청남도 당진시 합덕읍 석우리 1150번지 | 당진시보건소 | 43808 | <NA>
6765 | 해저도시 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 원당로 163 (원당동) | 충청남도 당진시 원당동 661번지 1호 | 당진시보건소 | 43808 | <NA>
6510 | 은지네식당 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 면천면 면천로 678 | 충청남도 당진시 면천면 성상리 334번지 6호 | 당진시보건소 | 43808 | <NA>
7783 | 태화루 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 시청2로 6-5 (수청동) | 충청남도 당진시 수청동 1021번지 | 당진시보건소 | 43808 | <NA>
10234 | 씨유 당진송산빌리지점 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 송산면 송산로 883-45 ((1층101호)) | <NA> | 당진시보건소 | 43808 | <NA>

Index | 금연구역명 | 금연구역범위상세 | 시도명 | 시군구명 | 금연구역구분 | 금연구역지정근거명 | 금연구역면적 | 위반과태료 | 위반신고전화번호 | 소재지도로명주소 | 소재지지번주소 | 관리기관명 | 데이터기준일자 | Unnamed: 13
7725 | 퓨전부페와생찌게방 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 합덕읍 운산로 115 | 충청남도 당진시 합덕읍 운산리 927번지 8호 | 당진시보건소 | 43808 | <NA>
5153 | 출입국관리사무소 당진출장소 | 건물 및 영엄장 | 충청남도 | 당진시 | 공공기관 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 고대공단2길 79-33 항만지원센터2 | <NA> | 당진시보건소 | 43808 | <NA>
9098 | 석문면사무소 | 건물 및 영엄장 | 충청남도 | 당진시 | 지방자치단체 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 석문면 통정3길 393-2 | <NA> | 당진시보건소 | 43808 | <NA>
4408 | 로또 복권방 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 월곡로 25-5.B동 401호 (대신빌라) | <NA> | 당진시보건소 | 43808 | <NA>
1343 | 롯데캐슬어린이집 | 건물 및 영엄장 | 충청남도 | 당진시 | 어린이집 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 반촌로 108 롯데캐슬아파트 관리동 | <NA> | 당진시보건소 | 43808 | <NA>
5825 | 아이누리어린이집 | 건물 및 영엄장 | 충청남도 | 당진시 | 어린이집 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 원당로 51-10 109동 101호 | <NA> | 당진시보건소 | 43808 | <NA>
6942 | 차칸호프 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 신복운로7길 13 | 충청남도 당진시 송악읍 복운리 1638번지 8호 | 당진시보건소 | 43808 | <NA>
4407 | 위드미 당진송악점 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | <NA> | <NA> | 당진시보건소 | 43808 | <NA>
9178 | 시청후문(터미널방면)버스정류소 | 건물 및 영엄장 | 충청남도 | 당진시 | 버스정류소 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 수청동 1162 | <NA> | 당진시보건소 | 43808 | <NA>
8631 | 중앙다방 | 건물 및 영엄장 | 충청남도 | 당진시 | 음식점 | 국민건강증진법 제9조 | <NA> | <NA> | 041-360-6053 | 충청남도 당진시 합덕읍 합우로 161 | <NA> | 당진시보건소 | 43808 | <NA>

Duplicate rows

Most frequently occurring

Index | 금연구역명 | 금연구역범위상세 | 시도명 | 시군구명 | 금연구역구분 | 금연구역지정근거명 | 금연구역면적 | 위반신고전화번호 | 소재지도로명주소 | 소재지지번주소 | 관리기관명 | 데이터기준일자 | # duplicates
0 | (주)대호휴하우징타운 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 석문면 대호만로 2358 | <NA> | 당진시보건소 | 43808 | 2
1 | (주)서우리테일 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 계성3길 59 (읍내동) | <NA> | 당진시보건소 | 43808 | 2
2 | (주)아산개발 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 서해로 6622 | <NA> | 당진시보건소 | 43808 | 2
3 | (주)용주유통 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 반촌로 29-8 (시곡동) | <NA> | 당진시보건소 | 43808 | 2
4 | (주)코리아세븐 당진복운점 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 신복운로6길 16-1 | <NA> | 당진시보건소 | 43808 | 2
5 | (주)코리아세븐 당진한진점 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 부곡공단로 350-5 | <NA> | 당진시보건소 | 43808 | 2
6 | (주)코리아세븐 신성대인성관점 | 건물 및 영엄장 | 충청남도 | 당진시 | 담배소매업소 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 정미면 대학로 1 | <NA> | 당진시보건소 | 43808 | 2
7 | 1.2.3 당구장 | 건물 및 영엄장 | 충청남도 | 당진시 | 실내체육시설 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 송악읍 신복운로6길 16-6 | <NA> | 당진시보건소 | 43808 | 2
8 | 11호 천사어린이공원 | 건물 및 영엄장 | 충청남도 | 당진시 | 어린이놀이시설 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 우강면 송산리 | <NA> | 당진시보건소 | 43808 | 2
9 | 365 골프존 | 건물 및 영엄장 | 충청남도 | 당진시 | 실내체육시설 | 국민건강증진법 제9조 | <NA> | 041-360-6053 | 충청남도 당진시 신평면 신평길 100 | <NA> | 당진시보건소 | 43808 | 2
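
A table like this can be approximated by counting identical rows. The sketch below does so with `collections.Counter` on a handful of hypothetical (name, type) pairs loosely based on the rows above. It is an illustration only: exactly how the report arrives at its figure of 479 duplicate rows is not stated here, so the sketch prints both the number of duplicated groups and the number of surplus copies.

```python
from collections import Counter

# Hypothetical mini-dataset: each row reduced to (facility name, zone type).
rows = [
    ("365 골프존", "실내체육시설"),
    ("365 골프존", "실내체육시설"),
    ("석문면사무소", "지방자치단체"),
    ("(주)아산개발", "담배소매업소"),
    ("(주)아산개발", "담배소매업소"),
]

# Count how often each distinct row occurs, then keep those seen more than once.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

n_duplicate_groups = len(duplicates)                     # distinct repeated rows
n_surplus_copies = sum(n - 1 for n in duplicates.values())  # rows to drop

for row, n in duplicates.items():
    print(row, n)
print(n_duplicate_groups, n_surplus_copies)  # 2 2
```

Every group listed above occurs exactly twice, which would be consistent with the same source records having been appended or exported a second time, although the report itself does not confirm the cause.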