Overview

Dataset statistics

Number of variables: 14
Number of observations: 10000
Missing cells: 16736
Missing cells (%): 12.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 126.0 B
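These overview figures are internally consistent. The 16736 missing cells are exactly the sum of the per-variable missing counts reported further down (30 for 시군구명, 304 for 대표자, 30 for 주소, 10000 for 개업일, 6372 for 폐업일), and 16736 / (14 × 10000) ≈ 11.95%, shown as 12.0%. Below is a minimal sketch of how such overview counts can be computed when rows are held as plain Python dicts. It assumes that None marks a missing value. The function name `overview` is hypothetical, and this is not ydata-profiling's own implementation.

```python
from typing import Any


def overview(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Overview counts for a table stored as a list of dicts (None = missing)."""
    columns = sorted({key for row in rows for key in row})
    n_cells = len(rows) * len(columns)
    # A cell counts as missing if its key is absent or its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # A duplicate row has the same tuple of values as an earlier row.
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "n_variables": len(columns),
        "n_observations": len(rows),
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
    }


# Cross-check the reported totals against the per-variable missing counts.
per_column_missing = {"시군구명": 30, "대표자": 304, "주소": 30, "개업일": 10000, "폐업일": 6372}
total_missing = sum(per_column_missing.values())
print(total_missing)                                 # 16736
print(round(100 * total_missing / (14 * 10000), 1))  # 12.0
```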

Variable types

Numeric: 5
Categorical: 3
Text: 4
Unsupported: 1
DateTime: 1

Dataset

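Description: Operating and closure status, regional information, and business addresses of publishers and printing companies across the country, surveyed and compiled by the Ministry of Culture, Sports and Tourism (문화체육관광부).
URL: https://www.data.go.kr/data/15060743/fileData.do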

Alerts

순번 is highly overall correlated with 등록일 and 1 other field [High correlation]
시군구코드 is highly overall correlated with 지역코드2 and 1 other field [High correlation]
지역코드2 is highly overall correlated with 시군구코드 and 1 other field [High correlation]
등록일 is highly overall correlated with 순번 and 1 other field [High correlation]
폐업일 is highly overall correlated with 순번 and 1 other field [High correlation]
시도명 is highly overall correlated with 시군구코드 and 1 other field [High correlation]
영업구분 is highly imbalanced (50.4%) [Imbalance]
대표자 has 304 (3.0%) missing values [Missing]
개업일 has 10000 (100.0%) missing values [Missing]
폐업일 has 6372 (63.7%) missing values [Missing]
폐업일 is highly skewed (γ1 = 21.21892354) [Skewed]
순번 has unique values [Unique]
개업일 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2023-12-12 04:53:43.680681
Analysis finished: 2023-12-12 04:53:50.561154
Duration: 6.88 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

순번 (serial number)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39444.93
Minimum: 15
Maximum: 78908
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 15
5th percentile: 3757.75
Q1: 19476.5
Median: 39596.5
Q3: 58977
95th percentile: 75245.7
Maximum: 78908
Range: 78893
Interquartile range (IQR): 39500.5

Descriptive statistics

Standard deviation: 22874.967
Coefficient of variation (CV): 0.57992161
Kurtosis: -1.1954137
Mean: 39444.93
Median absolute deviation (MAD): 19690
Skewness: -0.0056800273
Sum: 3.944493 × 10^8
Variance: 5.2326413 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
38943  1  < 0.1%
37406  1  < 0.1%
73146  1  < 0.1%
77765  1  < 0.1%
48904  1  < 0.1%
59192  1  < 0.1%
55149  1  < 0.1%
76401  1  < 0.1%
51245  1  < 0.1%
71783  1  < 0.1%
Other values (9990)  9990  99.9%

Minimum 10 values
Value  Count  Frequency (%)
15  1  < 0.1%
16  1  < 0.1%
29  1  < 0.1%
34  1  < 0.1%
50  1  < 0.1%
62  1  < 0.1%
63  1  < 0.1%
72  1  < 0.1%
74  1  < 0.1%
79  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
78908  1  < 0.1%
78907  1  < 0.1%
78896  1  < 0.1%
78890  1  < 0.1%
78878  1  < 0.1%
78870  1  < 0.1%
78843  1  < 0.1%
78835  1  < 0.1%
78817  1  < 0.1%
78810  1  < 0.1%
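The quantile and descriptive statistics above follow standard definitions. For example, IQR = Q3 - Q1 = 58977 - 19476.5 = 39500.5, and CV = standard deviation / mean = 22874.967 / 39444.93 ≈ 0.580. The stdlib sketch below rests on two assumptions. First, quantiles use linear interpolation between order statistics (`statistics.quantiles(..., method="inclusive")`, which agrees with NumPy's default percentile method). Second, MAD is the unscaled median of absolute deviations from the median. We have not confirmed that these are exactly the conventions ydata-profiling uses.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Quantile and dispersion summary of a numeric sample (needs >= 2 values)."""
    # method="inclusive" interpolates linearly between order statistics;
    # cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    p5, q1, median, q3, p95 = cuts[4], cuts[24], cuts[49], cuts[74], cuts[94]
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)  # unscaled MAD
    return {
        "min": min(values), "p5": p5, "q1": q1, "median": median,
        "q3": q3, "p95": p95, "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "mean": mean,
        "std": stdev,
        "cv": stdev / mean,
        "mad": mad,
    }


# Cross-check the reported 순번 figures: IQR and CV follow from the others.
print(58977 - 19476.5)                 # 39500.5
print(round(22874.967 / 39444.93, 4))  # 0.5799
```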

시군구코드 (municipality code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6192117
Minimum: 5690000
Maximum: 6500000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 5690000
5th percentile: 6110000
Q1: 6110000
Median: 6110000
Q3: 6280000
95th percentile: 6460000
Maximum: 6500000
Range: 810000
Interquartile range (IQR): 170000

Descriptive statistics

Standard deviation: 142827.62
Coefficient of variation (CV): 0.02306604
Kurtosis: 0.34421488
Mean: 6192117
Median absolute deviation (MAD): 0
Skewness: 0.67454654
Sum: 6.192117 × 10^10
Variance: 2.0399728 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Common values
Value  Count  Frequency (%)
6110000  6870  68.7%
6410000  735  7.3%
6280000  349  3.5%
6450000  339  3.4%
6440000  302  3.0%
6300000  191  1.9%
6460000  188  1.9%
6500000  182  1.8%
6260000  131  1.3%
6310000  123  1.2%
Other values (7)  590  5.9%

Minimum 10 values
Value  Count  Frequency (%)
5690000  80  0.8%
6110000  6870  68.7%
6260000  131  1.3%
6270000  103  1.0%
6280000  349  3.5%
6290000  74  0.7%
6300000  191  1.9%
6310000  123  1.2%
6410000  735  7.3%
6420000  62  0.6%

Maximum 10 values
Value  Count  Frequency (%)
6500000  182  1.8%
6480000  68  0.7%
6470000  89  0.9%
6460000  188  1.9%
6450000  339  3.4%
6440000  302  3.0%
6430000  114  1.1%
6420000  62  0.6%
6410000  735  7.3%
6310000  123  1.2%

시도명 (province / metropolitan city name)
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

서울특별시: 6619
경기도: 934
인천광역시: 343
전라북도: 343
충청남도: 300
Other values (13): 1461

Length

Max length: 7
Median length: 5
Mean length: 4.7399
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울특별시
2nd row: 서울특별시
3rd row: 서울특별시
4th row: 서울특별시
5th row: 서울특별시

Common Values

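Value  Count  Frequency (%)
서울특별시 (Seoul)  6619  66.2%
경기도 (Gyeonggi Province)  934  9.3%
인천광역시 (Incheon)  343  3.4%
전라북도 (North Jeolla Province)  343  3.4%
충청남도 (South Chungcheong Province)  300  3.0%
대전광역시 (Daejeon)  191  1.9%
제주특별자치도 (Jeju Special Self-Governing Province)  189  1.9%
전라남도 (South Jeolla Province)  187  1.9%
부산광역시 (Busan)  134  1.3%
울산광역시 (Ulsan)  123  1.2%
Other values (8)  637  6.4%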

Length

Histogram of lengths of the category
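시군구코드 and 시도명 are reported as highly correlated (0.970), yet the codes have 17 distinct values and the names have 18. Code 6110000 also occurs 6870 times, against 6619 rows for 서울특별시. A quick way to check whether a code column maps cleanly onto a name column is to collect the set of names seen with each code. The minimal sketch below runs on (시군구코드, 시도명) pairs copied from the Sample table at the end of this report. The helper names `names_per_code` and `ambiguous_codes` are hypothetical.

```python
from collections import defaultdict


def names_per_code(pairs: list[tuple[int, str]]) -> dict[int, list[str]]:
    """Map each code to the sorted list of distinct names it co-occurs with."""
    mapping: dict[int, set[str]] = defaultdict(set)
    for code, name in pairs:
        mapping[code].add(name)
    return {code: sorted(names) for code, names in mapping.items()}


def ambiguous_codes(pairs: list[tuple[int, str]]) -> dict[int, list[str]]:
    """Codes seen with more than one name, i.e. not a clean one-to-one mapping."""
    return {c: n for c, n in names_per_code(pairs).items() if len(n) > 1}


# (시군구코드, 시도명) pairs taken from rows of the Sample table.
sample = [
    (6110000, "서울특별시"), (6410000, "경기도"), (6460000, "전라남도"),
    (6420000, "강원도"), (6110000, "서울특별시"),
]
print(names_per_code(sample)[6410000])  # ['경기도']
print(ambiguous_codes(sample))          # {}
```

On the full dataset, any entry returned by `ambiguous_codes` would point to the rows behind the 17-versus-18 mismatch.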

지역코드2 (region code 2)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 206
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3492019.1
Minimum: 3000000
Maximum: 6520000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 3000000
5th percentile: 3000000
Q1: 3020000
Median: 3180000
Q3: 3610000
95th percentile: 5070000
Maximum: 6520000
Range: 3520000
Interquartile range (IQR): 590000

Descriptive statistics

Standard deviation: 746081.83
Coefficient of variation (CV): 0.21365342
Kurtosis: 4.4594207
Mean: 3492019.1
Median absolute deviation (MAD): 170000
Skewness: 2.1595171
Sum: 3.4920192 × 10^10
Variance: 5.5663809 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
3010000  1240  12.4%
3210000  978  9.8%
3000000  866  8.7%
3180000  647  6.5%
3020000  467  4.7%
3230000  414  4.1%
3120000  333  3.3%
3030000  277  2.8%
3110000  241  2.4%
3070000  227  2.3%
Other values (196)  4310  43.1%

Minimum 10 values
Value  Count  Frequency (%)
3000000  866  8.7%
3010000  1240  12.4%
3020000  467  4.7%
3030000  277  2.8%
3040000  45  0.4%
3050000  30  0.3%
3060000  103  1.0%
3070000  227  2.3%
3080000  24  0.2%
3090000  23  0.2%

Maximum 10 values
Value  Count  Frequency (%)
6520000  34  0.3%
6510000  148  1.5%
5710000  33  0.3%
5700000  3  < 0.1%
5690000  80  0.8%
5680000  16  0.2%
5670000  22  0.2%
5600000  1  < 0.1%
5590000  7  0.1%
5580000  8  0.1%

시군구명 (municipality name)
Text

Distinct: 233
Distinct (%): 2.3%
Missing: 30
Missing (%): 0.3%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 9
Mean length: 8.9950853
Min length: 7

Characters and Unicode

Total characters: 89681
Distinct characters: 141
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 0.2%

Sample

1st row: 서울특별시 중구
2nd row: 서울특별시 마포구
3rd row: 서울특별시 서초구
4th row: 서울특별시 종로구
5th row: 서울특별시 마포구
Most frequent words
Value  Count  Frequency (%)
서울특별시  6619  32.0%
중구  1228  5.9%
경기도  934  4.5%
서초구  871  4.2%
종로구  734  3.6%
영등포구  571  2.8%
용산구  418  2.0%
송파구  377  1.8%
전라북도  343  1.7%
인천광역시  343  1.7%
Other values (225)  8230  39.8%
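The table above counts whitespace-separated words, not whole values. A value such as "서울특별시 중구" adds one count to 서울특별시 and one to 중구. That is why 서울특별시 appears 6619 times yet accounts for only 32.0% of the 20668 words counted (the sum of the Count column). The same total equals the 10698 spaces plus the 9970 non-missing values. Below is a minimal sketch of this counting with `collections.Counter`. It assumes plain whitespace splitting, and the tokenizer ydata-profiling actually applies may differ, for example in how it handles punctuation.

```python
from collections import Counter
from typing import Iterable, Optional


def word_counts(values: Iterable[Optional[str]]) -> Counter:
    """Count whitespace-separated words across all non-missing values."""
    counts: Counter = Counter()
    for value in values:
        if value is None:
            continue  # missing cells contribute no words
        counts.update(value.split())
    return counts


sample = ["서울특별시 중구", "서울특별시 마포구", "서울특별시 서초구", None, "경기도 고양시 일산동구"]
counts = word_counts(sample)
total = sum(counts.values())
print(counts["서울특별시"], total)  # 3 9
```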

Most occurring characters

ValueCountFrequency (%)
10698
 
11.9%
9684
 
10.8%
8553
 
9.5%
8114
 
9.0%
6890
 
7.7%
6890
 
7.7%
6756
 
7.5%
2346
 
2.6%
1326
 
1.5%
1244
 
1.4%
Other values (131) 27180
30.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 78983
88.1%
Space Separator 10698
 
11.9%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9684
 
12.3%
8553
 
10.8%
8114
 
10.3%
6890
 
8.7%
6890
 
8.7%
6756
 
8.6%
2346
 
3.0%
1326
 
1.7%
1244
 
1.6%
1166
 
1.5%
Other values (130) 26014
32.9%
Space Separator
ValueCountFrequency (%)
10698
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 78983
88.1%
Common 10698
 
11.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9684
 
12.3%
8553
 
10.8%
8114
 
10.3%
6890
 
8.7%
6890
 
8.7%
6756
 
8.6%
2346
 
3.0%
1326
 
1.7%
1244
 
1.6%
1166
 
1.5%
Other values (130) 26014
32.9%
Common
ValueCountFrequency (%)
10698
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 78983
88.1%
ASCII 10698
 
11.9%

Most frequent character per block

ASCII
ValueCountFrequency (%)
10698
100.0%
Hangul
ValueCountFrequency (%)
9684
 
12.3%
8553
 
10.8%
8114
 
10.3%
6890
 
8.7%
6890
 
8.7%
6756
 
8.6%
2346
 
3.0%
1326
 
1.7%
1244
 
1.6%
1166
 
1.5%
Other values (130) 26014
32.9%

회사명 (company name)
Text

Distinct: 9683
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 67
Median length: 39
Mean length: 7.0873
Min length: 1

Characters and Unicode

Total characters: 70873
Distinct characters: 1121
Distinct categories: 13
Distinct scripts: 4
Distinct blocks: 7

Unique

Unique: 9385
Unique (%): 93.8%

Sample

1st row: 디와이프린팅
2nd row: 트래블북스
3rd row: 파랑돌
4th row: 책공장더불어
5th row: 정원출판사
ValueCountFrequency (%)
도서출판 766
 
5.8%
주식회사 623
 
4.7%
출판사 74
 
0.6%
디자인 57
 
0.4%
books 42
 
0.3%
사단법인 39
 
0.3%
33
 
0.3%
미디어 32
 
0.2%
연구소 25
 
0.2%
유한회사 23
 
0.2%
Other values (10383) 11471
87.0%

Most occurring characters

ValueCountFrequency (%)
3191
 
4.5%
2236
 
3.2%
2223
 
3.1%
) 2018
 
2.8%
( 1996
 
2.8%
1643
 
2.3%
1631
 
2.3%
1474
 
2.1%
1446
 
2.0%
1314
 
1.9%
Other values (1111) 51701
72.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 56877
80.3%
Space Separator 3191
 
4.5%
Lowercase Letter 3153
 
4.4%
Uppercase Letter 3050
 
4.3%
Close Punctuation 2020
 
2.9%
Open Punctuation 1998
 
2.8%
Decimal Number 302
 
0.4%
Other Punctuation 214
 
0.3%
Dash Punctuation 58
 
0.1%
Other Symbol 4
 
< 0.1%
Other values (3) 6
 
< 0.1%
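The category names above (Other Letter, Space Separator, Lowercase Letter, and so on) are the long names of Unicode general-category codes such as Lo, Zs and Ll. Python's `unicodedata.category` returns those codes directly, so a breakdown like the one above can be approximated as shown below. The short-code-to-name table is written out by hand and covers only the 13 categories that appear in this report. Any other code falls back to its raw two-letter form.

```python
import unicodedata
from collections import Counter

# Hand-written names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Zs": "Space Separator", "Nd": "Decimal Number",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation", "Po": "Other Punctuation",
    "Sm": "Math Symbol", "Sk": "Modifier Symbol", "So": "Other Symbol",
}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category name."""
    return Counter(
        CATEGORY_NAMES.get(code, code)  # fall back to the raw code if unnamed
        for code in map(unicodedata.category, text)
    )


# A company name from the Sample table, mixing Hangul, Latin and punctuation.
print(category_counts("레몬컬쳐(Lemon culture)"))
```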

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2236
 
3.9%
2223
 
3.9%
1643
 
2.9%
1631
 
2.9%
1474
 
2.6%
1446
 
2.5%
1314
 
2.3%
1246
 
2.2%
1186
 
2.1%
977
 
1.7%
Other values (1027) 41501
73.0%
Uppercase Letter
ValueCountFrequency (%)
S 257
 
8.4%
A 227
 
7.4%
O 215
 
7.0%
E 202
 
6.6%
C 191
 
6.3%
I 177
 
5.8%
B 172
 
5.6%
M 164
 
5.4%
T 156
 
5.1%
N 156
 
5.1%
Other values (16) 1133
37.1%
Lowercase Letter
ValueCountFrequency (%)
e 373
11.8%
o 368
11.7%
a 268
 
8.5%
i 268
 
8.5%
n 253
 
8.0%
s 218
 
6.9%
r 204
 
6.5%
t 165
 
5.2%
l 127
 
4.0%
u 106
 
3.4%
Other values (15) 803
25.5%
Other Punctuation
ValueCountFrequency (%)
. 104
48.6%
& 69
32.2%
, 24
 
11.2%
3
 
1.4%
! 3
 
1.4%
· 3
 
1.4%
/ 3
 
1.4%
: 2
 
0.9%
@ 2
 
0.9%
1
 
0.5%
Decimal Number
ValueCountFrequency (%)
1 97
32.1%
2 66
21.9%
0 35
 
11.6%
3 21
 
7.0%
9 20
 
6.6%
5 16
 
5.3%
6 15
 
5.0%
4 14
 
4.6%
8 9
 
3.0%
7 9
 
3.0%
Math Symbol
ValueCountFrequency (%)
> 2
50.0%
< 1
25.0%
+ 1
25.0%
Close Punctuation
ValueCountFrequency (%)
) 2018
99.9%
] 2
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 1996
99.9%
[ 2
 
0.1%
Other Symbol
ValueCountFrequency (%)
3
75.0%
1
 
25.0%
Space Separator
ValueCountFrequency (%)
3191
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 58
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 56756
80.1%
Common 7790
 
11.0%
Latin 6203
 
8.8%
Han 124
 
0.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2236
 
3.9%
2223
 
3.9%
1643
 
2.9%
1631
 
2.9%
1474
 
2.6%
1446
 
2.5%
1314
 
2.3%
1246
 
2.2%
1186
 
2.1%
977
 
1.7%
Other values (935) 41380
72.9%
Han
ValueCountFrequency (%)
5
 
4.0%
4
 
3.2%
3
 
2.4%
3
 
2.4%
3
 
2.4%
3
 
2.4%
3
 
2.4%
2
 
1.6%
2
 
1.6%
2
 
1.6%
Other values (83) 94
75.8%
Latin
ValueCountFrequency (%)
e 373
 
6.0%
o 368
 
5.9%
a 268
 
4.3%
i 268
 
4.3%
S 257
 
4.1%
n 253
 
4.1%
A 227
 
3.7%
s 218
 
3.5%
O 215
 
3.5%
r 204
 
3.3%
Other values (41) 3552
57.3%
Common
ValueCountFrequency (%)
3191
41.0%
) 2018
25.9%
( 1996
25.6%
. 104
 
1.3%
1 97
 
1.2%
& 69
 
0.9%
2 66
 
0.8%
- 58
 
0.7%
0 35
 
0.4%
, 24
 
0.3%
Other values (22) 132
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
Hangul 56752
80.1%
ASCII 13985
 
19.7%
CJK 120
 
0.2%
None 10
 
< 0.1%
CJK Compat Ideographs 4
 
< 0.1%
Compat Jamo 1
 
< 0.1%
Box Drawing 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3191
22.8%
) 2018
14.4%
( 1996
14.3%
e 373
 
2.7%
o 368
 
2.6%
a 268
 
1.9%
i 268
 
1.9%
S 257
 
1.8%
n 253
 
1.8%
A 227
 
1.6%
Other values (69) 4766
34.1%
Hangul
ValueCountFrequency (%)
2236
 
3.9%
2223
 
3.9%
1643
 
2.9%
1631
 
2.9%
1474
 
2.6%
1446
 
2.5%
1314
 
2.3%
1246
 
2.2%
1186
 
2.1%
977
 
1.7%
Other values (933) 41376
72.9%
CJK
ValueCountFrequency (%)
5
 
4.2%
4
 
3.3%
3
 
2.5%
3
 
2.5%
3
 
2.5%
3
 
2.5%
3
 
2.5%
2
 
1.7%
2
 
1.7%
2
 
1.7%
Other values (80) 90
75.0%
None
ValueCountFrequency (%)
3
30.0%
· 3
30.0%
3
30.0%
1
 
10.0%
CJK Compat Ideographs
ValueCountFrequency (%)
2
50.0%
1
25.0%
1
25.0%
Compat Jamo
ValueCountFrequency (%)
1
100.0%
Box Drawing
ValueCountFrequency (%)
1
100.0%

대표자 (representative)
Text

MISSING 

Distinct: 7949
Distinct (%): 82.0%
Missing: 304
Missing (%): 3.0%
Memory size: 156.2 KiB

Length

Max length: 27
Median length: 3
Mean length: 3.0422855
Min length: 2

Characters and Unicode

Total characters: 29498
Distinct characters: 413
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 6764
Unique (%): 69.8%

Sample

1st row: 양승표
2nd row: 서병용
3rd row: 김현진
4th row: 김보경
5th row: 김준규
Most frequent words
Value  Count  Frequency (%)
주식회사  12  0.1%
김정수  10  0.1%
김정호  9  0.1%
이지현  9  0.1%
김영숙  8  0.1%
김현수  8  0.1%
이상훈  8  0.1%
김정희  7  0.1%
이재환  7  0.1%
김성태  7  0.1%
Other values (7970)  9656  99.1%

Most occurring characters

ValueCountFrequency (%)
2021
 
6.9%
1466
 
5.0%
1023
 
3.5%
905
 
3.1%
771
 
2.6%
599
 
2.0%
571
 
1.9%
566
 
1.9%
546
 
1.9%
518
 
1.8%
Other values (403) 20512
69.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 29131
98.8%
Uppercase Letter 237
 
0.8%
Space Separator 45
 
0.2%
Lowercase Letter 37
 
0.1%
Close Punctuation 17
 
0.1%
Open Punctuation 17
 
0.1%
Decimal Number 7
 
< 0.1%
Other Punctuation 6
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2021
 
6.9%
1466
 
5.0%
1023
 
3.5%
905
 
3.1%
771
 
2.6%
599
 
2.1%
571
 
2.0%
566
 
1.9%
546
 
1.9%
518
 
1.8%
Other values (356) 20145
69.2%
Uppercase Letter
ValueCountFrequency (%)
N 28
 
11.8%
E 23
 
9.7%
A 21
 
8.9%
I 17
 
7.2%
O 15
 
6.3%
S 14
 
5.9%
G 13
 
5.5%
H 12
 
5.1%
U 12
 
5.1%
L 11
 
4.6%
Other values (14) 71
30.0%
Lowercase Letter
ValueCountFrequency (%)
n 5
13.5%
e 5
13.5%
a 5
13.5%
o 3
8.1%
g 3
8.1%
s 3
8.1%
m 2
 
5.4%
y 2
 
5.4%
r 2
 
5.4%
d 1
 
2.7%
Other values (6) 6
16.2%
Other Punctuation
ValueCountFrequency (%)
, 4
66.7%
. 2
33.3%
Space Separator
ValueCountFrequency (%)
45
100.0%
Close Punctuation
ValueCountFrequency (%)
) 17
100.0%
Open Punctuation
ValueCountFrequency (%)
( 17
100.0%
Decimal Number
ValueCountFrequency (%)
1 7
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 29131
98.8%
Latin 274
 
0.9%
Common 93
 
0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2021
 
6.9%
1466
 
5.0%
1023
 
3.5%
905
 
3.1%
771
 
2.6%
599
 
2.1%
571
 
2.0%
566
 
1.9%
546
 
1.9%
518
 
1.8%
Other values (356) 20145
69.2%
Latin
ValueCountFrequency (%)
N 28
 
10.2%
E 23
 
8.4%
A 21
 
7.7%
I 17
 
6.2%
O 15
 
5.5%
S 14
 
5.1%
G 13
 
4.7%
H 12
 
4.4%
U 12
 
4.4%
L 11
 
4.0%
Other values (30) 108
39.4%
Common
ValueCountFrequency (%)
45
48.4%
) 17
 
18.3%
( 17
 
18.3%
1 7
 
7.5%
, 4
 
4.3%
. 2
 
2.2%
- 1
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 29131
98.8%
ASCII 367
 
1.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2021
 
6.9%
1466
 
5.0%
1023
 
3.5%
905
 
3.1%
771
 
2.6%
599
 
2.1%
571
 
2.0%
566
 
1.9%
546
 
1.9%
518
 
1.8%
Other values (356) 20145
69.2%
ASCII
ValueCountFrequency (%)
45
 
12.3%
N 28
 
7.6%
E 23
 
6.3%
A 21
 
5.7%
) 17
 
4.6%
( 17
 
4.6%
I 17
 
4.6%
O 15
 
4.1%
S 14
 
3.8%
G 13
 
3.5%
Other values (37) 157
42.8%

주소 (address)
Text

Distinct: 233
Distinct (%): 2.3%
Missing: 30
Missing (%): 0.3%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 9
Mean length: 8.9950853
Min length: 7

Characters and Unicode

Total characters: 89681
Distinct characters: 141
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 17
Unique (%): 0.2%

Sample

1st row: 서울특별시 중구
2nd row: 서울특별시 마포구
3rd row: 서울특별시 서초구
4th row: 서울특별시 종로구
5th row: 서울특별시 마포구
Most frequent words
Value  Count  Frequency (%)
서울특별시  6619  32.0%
중구  1228  5.9%
경기도  934  4.5%
서초구  871  4.2%
종로구  734  3.6%
영등포구  571  2.8%
용산구  418  2.0%
송파구  377  1.8%
전라북도  343  1.7%
인천광역시  343  1.7%
Other values (225)  8230  39.8%

Most occurring characters

ValueCountFrequency (%)
10698
 
11.9%
9684
 
10.8%
8553
 
9.5%
8114
 
9.0%
6890
 
7.7%
6890
 
7.7%
6756
 
7.5%
2346
 
2.6%
1326
 
1.5%
1244
 
1.4%
Other values (131) 27180
30.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 78983
88.1%
Space Separator 10698
 
11.9%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9684
 
12.3%
8553
 
10.8%
8114
 
10.3%
6890
 
8.7%
6890
 
8.7%
6756
 
8.6%
2346
 
3.0%
1326
 
1.7%
1244
 
1.6%
1166
 
1.5%
Other values (130) 26014
32.9%
Space Separator
ValueCountFrequency (%)
10698
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 78983
88.1%
Common 10698
 
11.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9684
 
12.3%
8553
 
10.8%
8114
 
10.3%
6890
 
8.7%
6890
 
8.7%
6756
 
8.6%
2346
 
3.0%
1326
 
1.7%
1244
 
1.6%
1166
 
1.5%
Other values (130) 26014
32.9%
Common
ValueCountFrequency (%)
10698
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 78983
88.1%
ASCII 10698
 
11.9%

Most frequent character per block

ASCII
ValueCountFrequency (%)
10698
100.0%
Hangul
ValueCountFrequency (%)
9684
 
12.3%
8553
 
10.8%
8114
 
10.3%
6890
 
8.7%
6890
 
8.7%
6756
 
8.6%
2346
 
3.0%
1326
 
1.7%
1244
 
1.6%
1166
 
1.5%
Other values (130) 26014
32.9%

구분 (business type: publisher or printer)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

출판사: 8713
인쇄사: 1287

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 인쇄사
2nd row: 출판사
3rd row: 출판사
4th row: 출판사
5th row: 출판사

Common Values

Value  Count  Frequency (%)
출판사 (publisher)  8713  87.1%
인쇄사 (printer)  1287  12.9%

Length

Histogram of lengths of the category


영업구분 (business status)
Categorical

IMBALANCE 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

영업: 6118
폐업: 2016
전출: 1655
직권말소: 81
등록취소: 70
Other values (3): 60

Length

Max length: 4
Median length: 2
Mean length: 2.0406
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 영업
2nd row: 영업
3rd row: 영업
4th row: 영업
5th row: 영업

Common Values

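Value  Count  Frequency (%)
영업 (operating)  6118  61.2%
폐업 (closed)  2016  20.2%
전출 (transferred out)  1655  16.6%
직권말소 (cancelled ex officio)  81  0.8%
등록취소 (registration cancelled)  70  0.7%
신고취소 (notification cancelled)  32  0.3%
허가취소 (permit cancelled)  20  0.2%
전입 (transferred in)  8  0.1%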

Length

Histogram of lengths of the category

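The 50.4% imbalance reported for 영업구분 can be reproduced from the category counts with an entropy-based score, 1 - H / ln(k). Here H is the Shannon entropy (natural log) of the category frequencies and k = 8 is the number of categories. The formula matches the reported value. Treat it as our reading of how the alert is computed, not as a statement about ydata-profiling's code.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - normalized Shannon entropy: 0 for uniform counts, near 1 for one dominant class."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # undefined for fewer than two categories; 0 by convention here
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Counts of 영업구분 from its Common Values table.
status_counts = [6118, 2016, 1655, 81, 70, 32, 20, 8]
print(f"{imbalance_score(status_counts):.1%}")  # 50.4%
```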

등록일 (registration date)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 5557
Distinct (%): 55.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20097047
Minimum: 19051218
Maximum: 20230411
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 19051218
5th percentile: 19910702
Q1: 20030512
Median: 20111108
Q3: 20190202
95th percentile: 20220823
Maximum: 20230411
Range: 1179193
Interquartile range (IQR): 159690.5

Descriptive statistics

Standard deviation: 105474.79
Coefficient of variation (CV): 0.0052482731
Kurtosis: 2.1884707
Mean: 20097047
Median absolute deviation (MAD): 79594.5
Skewness: -1.0719829
Sum: 2.0097047 × 10^11
Variance: 1.1124932 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
20230109  9  0.1%
20030422  8  0.1%
20211221  8  0.1%
20210608  8  0.1%
20210105  8  0.1%
20220524  8  0.1%
20230201  8  0.1%
20190104  8  0.1%
20220616  8  0.1%
20210329  7  0.1%
Other values (5547)  9920  99.2%

Minimum 10 values
Value  Count  Frequency (%)
19051218  1  < 0.1%
19511127  1  < 0.1%
19520709  1  < 0.1%
19530212  1  < 0.1%
19550514  1  < 0.1%
19550913  1  < 0.1%
19551128  1  < 0.1%
19570111  1  < 0.1%
19570303  1  < 0.1%
19580807  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
20230411  3  < 0.1%
20230410  1  < 0.1%
20230407  4  < 0.1%
20230406  3  < 0.1%
20230405  5  0.1%
20230404  2  < 0.1%
20230403  2  < 0.1%
20230331  3  < 0.1%
20230330  4  < 0.1%
20230329  6  0.1%
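등록일 is stored as an integer of the form YYYYMMDD, so the numeric statistics above are not time spans. The reported range of 1179193 and the IQR of 159690.5 are differences between such integers, not numbers of days. Converting the values to real dates first gives meaningful differences. Below is a minimal sketch using the standard library. The helper name `parse_yyyymmdd` is hypothetical, and it returns None for integers that are not valid calendar dates instead of raising an error.

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(value: Optional[int]) -> Optional[date]:
    """Convert an integer such as 20040311 to a date; None if missing or invalid."""
    if value is None:
        return None
    try:
        return datetime.strptime(f"{value:08d}", "%Y%m%d").date()
    except ValueError:
        return None  # e.g. 20231340 is not a real calendar date


first = parse_yyyymmdd(19051218)  # minimum of 등록일
last = parse_yyyymmdd(20230411)   # maximum of 등록일
print(first, last)                # 1905-12-18 2023-04-11
print((last - first).days)        # span in days, unlike the integer "range"
```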

개업일 (opening date)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB
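개업일 carries no information in this extract (all 10000 values are missing), so a natural cleaning step is to drop entirely empty columns before further analysis. Below is a minimal sketch for rows stored as dicts, assuming None is the missing marker. The helper name `drop_all_missing_columns` is hypothetical. In a pandas workflow, `DataFrame.dropna(axis="columns", how="all")` does the same job.

```python
from typing import Any


def drop_all_missing_columns(
    rows: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Remove columns whose value is None (or absent) in every row."""
    columns = {key for row in rows for key in row}
    empty = sorted(c for c in columns if all(row.get(c) is None for row in rows))
    empty_set = set(empty)
    cleaned = [{k: v for k, v in row.items() if k not in empty_set} for row in rows]
    return cleaned, empty


rows = [
    {"회사명": "디와이프린팅", "개업일": None, "폐업일": None},
    {"회사명": "정수출판", "개업일": None, "폐업일": 20111121},
]
cleaned, dropped = drop_all_missing_columns(rows)
print(dropped)  # ['개업일']
```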

폐업일 (closure date)
Real number (ℝ)

HIGH CORRELATION  MISSING  SKEWED 

Distinct: 2502
Distinct (%): 69.0%
Missing: 6372
Missing (%): 63.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 20301947
Minimum: 19830905
Maximum: 99991231
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 19830905
5th percentile: 19990215
Q1: 20060810
Median: 20140412
Q3: 20200703
95th percentile: 20221223
Maximum: 99991231
Range: 80160326
Interquartile range (IQR): 139893.25

Descriptive statistics

Standard deviation: 3747585.8
Coefficient of variation (CV): 0.18459243
Kurtosis: 448.70183
Mean: 20301947
Median absolute deviation (MAD): 69799
Skewness: 21.218924
Sum: 7.3655465 × 10^10
Variance: 1.4044399 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value  Count  Frequency (%)
20030207  22  0.2%
20030418  22  0.2%
20010809  22  0.2%
20030506  15  0.1%
20000221  14  0.1%
20100611  13  0.1%
19981104  11  0.1%
20020402  8  0.1%
20030206  8  0.1%
20230130  8  0.1%
Other values (2492)  3485  34.8%
(Missing)  6372  63.7%

Minimum 10 values
Value  Count  Frequency (%)
19830905  1  < 0.1%
19831020  1  < 0.1%
19840507  1  < 0.1%
19840714  1  < 0.1%
19841012  1  < 0.1%
19850208  1  < 0.1%
19850517  1  < 0.1%
19851003  1  < 0.1%
19851231  1  < 0.1%
19860208  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
99991231  8  0.1%
20230410  1  < 0.1%
20230407  2  < 0.1%
20230406  1  < 0.1%
20230405  1  < 0.1%
20230404  1  < 0.1%
20230403  3  < 0.1%
20230331  1  < 0.1%
20230330  2  < 0.1%
20230329  1  < 0.1%
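The extreme skewness of 폐업일 (γ1 ≈ 21.2) and its maximum of 99991231 come from 8 rows that carry the value 99991231. That value lies far outside the otherwise 1983 to 2023 range. We read it as a placeholder meaning "no end date" rather than a real closure date. This is an assumption: the dataset description does not say so. The sketch below treats the value as missing and compares skewness with and without it. It uses the adjusted Fisher-Pearson coefficient G1, which is, as we understand it, the definition behind pandas' `Series.skew`. The handful of dates in the example are copied from the tables above and do not form a representative sample.

```python
import math
from typing import Optional

SENTINEL = 99991231  # assumed placeholder meaning "no closure date"


def skewness(values: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness G1 (needs at least 3 values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def clean_closures(values: list[Optional[int]]) -> list[int]:
    """Drop missing values and the sentinel date."""
    return [v for v in values if v is not None and v != SENTINEL]


closures = [19830905, 20030207, 20060810, 20140412, 20200703, 20230410, SENTINEL, None]
raw = [v for v in closures if v is not None]
print(round(skewness(raw), 2), round(skewness(clean_closures(closures)), 2))
```

Even in this small example, the single sentinel turns a mildly left-skewed set of dates into a strongly right-skewed one.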

최종수정일 (last modified date)
DateTime

Distinct: 520
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2021-04-01 00:00:00
Maximum: 2023-04-11 00:00:00
Histogram with fixed size bins (bins=50)

Interactions

(25 pairwise interaction plots of the numeric variables)

Correlations

Variable | 순번 | 시군구코드 | 시도명 | 지역코드2 | 구분 | 영업구분 | 등록일 | 폐업일
순번 | 1.000 | 0.650 | 0.672 | 0.567 | 0.455 | 0.291 | 0.412 | 0.021
시군구코드 | 0.650 | 1.000 | 0.970 | 0.957 | 0.373 | 0.304 | 0.270 | 0.000
시도명 | 0.672 | 0.970 | 1.000 | 0.957 | 0.300 | 0.211 | 0.274 | 0.000
지역코드2 | 0.567 | 0.957 | 0.957 | 1.000 | 0.238 | 0.182 | 0.243 | 0.000
구분 | 0.455 | 0.373 | 0.300 | 0.238 | 1.000 | 0.164 | 0.114 | 0.000
영업구분 | 0.291 | 0.304 | 0.211 | 0.182 | 0.164 | 1.000 | 0.282 | 0.064
등록일 | 0.412 | 0.270 | 0.274 | 0.243 | 0.114 | 0.282 | 1.000 | 0.024
폐업일 | 0.021 | 0.000 | 0.000 | 0.000 | 0.000 | 0.064 | 0.024 | 1.000

Variable | 구분 | 시도명 | 영업구분
구분 | 1.000 | 0.269 | 0.123
시도명 | 0.269 | 1.000 | 0.090
영업구분 | 0.123 | 0.090 | 1.000

Variable | 순번 | 시군구코드 | 지역코드2 | 등록일 | 폐업일 | 시도명 | 구분 | 영업구분
순번 | 1.000 | -0.489 | -0.338 | -0.545 | -0.569 | 0.337 | 0.349 | 0.143
시군구코드 | -0.489 | 1.000 | 0.766 | 0.314 | 0.332 | 0.918 | 0.250 | 0.122
지역코드2 | -0.338 | 0.766 | 1.000 | 0.372 | 0.330 | 0.806 | 0.248 | 0.091
등록일 | -0.545 | 0.314 | 0.372 | 1.000 | 0.777 | 0.117 | 0.121 | 0.143
폐업일 | -0.569 | 0.332 | 0.330 | 0.777 | 1.000 | 0.000 | 0.000 | 0.043
시도명 | 0.337 | 0.918 | 0.806 | 0.117 | 0.000 | 1.000 | 0.269 | 0.090
구분 | 0.349 | 0.250 | 0.248 | 0.121 | 0.000 | 0.269 | 1.000 | 0.123
영업구분 | 0.143 | 0.122 | 0.091 | 0.143 | 0.043 | 0.090 | 0.123 | 1.000
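For pairs of categorical variables such as 구분, 시도명 and 영업구분, a common association measure is Cramér's V, V = sqrt(χ² / (n · (min(r, c) - 1))). Here χ² is Pearson's chi-squared statistic of the r × c contingency table. The report does not name the coefficient behind each matrix, so the sketch below illustrates the measure itself. It makes no claim about how these particular values were computed, and it omits the bias correction that some implementations apply.

```python
import math
from collections import Counter


def cramers_v(pairs: list[tuple[str, str]]) -> float:
    """Uncorrected Cramér's V between two categorical variables given as (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    # Pearson chi-squared: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # undefined when either variable is constant; 0 by convention here
    return math.sqrt(chi2 / (n * (k - 1)))


# (구분, 영업구분) pairs from a few rows of the Sample table.
pairs = [("인쇄사", "영업"), ("출판사", "영업"), ("출판사", "폐업"), ("출판사", "전출")]
print(round(cramers_v(pairs), 3))
```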

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

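Index | 순번 | 시군구코드 | 시도명 | 지역코드2 | 시군구명 | 회사명 | 대표자 | 주소 | 구분 | 영업구분 | 등록일 | 개업일 | 폐업일 | 최종수정일
38942 | 38943 | 6110000 | 서울특별시 | 3010000 | 서울특별시 중구 | 디와이프린팅 | 양승표 | 서울특별시 중구 | 인쇄사 | 영업 | 20040311 | <NA> | <NA> | 2021-04-01
3179 | 3180 | 6110000 | 서울특별시 | 3130000 | 서울특별시 마포구 | 트래블북스 | 서병용 | 서울특별시 마포구 | 출판사 | 영업 | 20220103 | <NA> | <NA> | 2023-02-06
6407 | 6408 | 6110000 | 서울특별시 | 3210000 | 서울특별시 서초구 | 파랑돌 | 김현진 | 서울특별시 서초구 | 출판사 | 영업 | 20221116 | <NA> | <NA> | 2022-11-24
49183 | 49184 | 6110000 | 서울특별시 | 3000000 | 서울특별시 종로구 | 책공장더불어 | 김보경 | 서울특별시 종로구 | 출판사 | 영업 | 20040826 | <NA> | <NA> | 2021-04-01
56619 | 56620 | 6110000 | 서울특별시 | 3180000 | 서울특별시 마포구 | 정원출판사 | 김준규 | 서울특별시 마포구 | 출판사 | 영업 | 20041223 | <NA> | <NA> | 2021-04-01
11674 | 11675 | 6410000 | 경기도 | 3940000 | 경기도 고양시 일산동구 | SOYUZ | 노국희 | 경기도 고양시 일산동구 | 출판사 | 영업 | 20220623 | <NA> | <NA> | 2022-06-23
54405 | 54406 | 6110000 | 서울특별시 | 3180000 | 서울특별시 영등포구 | 허그북 | 박보영 | 서울특별시 영등포구 | 출판사 | 영업 | 20210219 | <NA> | <NA> | 2021-04-01
41883 | 41884 | 6110000 | 서울특별시 | 3010000 | 서울특별시 중구 | 정수출판 | 김정수 | 서울특별시 중구 | 출판사 | 폐업 | 20090212 | <NA> | 20111121 | 2021-04-01
71543 | 71544 | 6110000 | 서울특별시 | 3210000 | 서울특별시 서초구 | 사슴저널사 | 전제승 | 서울특별시 서초구 | 출판사 | 폐업 | 19990806 | <NA> | 20010419 | 2021-04-01
5608 | 5609 | 6410000 | 경기도 | 5530000 | 경기도 화성시 | 도서출판 문곰 | 신대섭 | 경기도 화성시 | 출판사 | 영업 | 20220823 | <NA> | <NA> | 2022-12-15

Index | 순번 | 시군구코드 | 시도명 | 지역코드2 | 시군구명 | 회사명 | 대표자 | 주소 | 구분 | 영업구분 | 등록일 | 개업일 | 폐업일 | 최종수정일
6168 | 6169 | 6110000 | 서울특별시 | 3000000 | 서울특별시 종로구 | 학선재 | 박수준 | 서울특별시 종로구 | 출판사 | 영업 | 20061129 | <NA> | <NA> | 2022-12-01
33659 | 33660 | 6460000 | 전라남도 | 4930000 | 전라남도 해남군 | 해남인쇄문화사 | 천대진 | 전라남도 해남군 | 인쇄사 | 영업 | 20070123 | <NA> | <NA> | 2021-04-01
65676 | 65677 | 6110000 | 서울특별시 | 3030000 | 서울특별시 성동구 | 태양 | 강경구 | 서울특별시 성동구 | 출판사 | 영업 | 20081112 | <NA> | <NA> | 2021-04-01
19041 | 19042 | 6420000 | 강원도 | 4230000 | 강원도 속초시 | 동우인쇄사 | 한상학 | 강원도 속초시 | 인쇄사 | 폐업 | 19890712 | <NA> | 20120730 | 2021-12-07
55716 | 55717 | 6110000 | 서울특별시 | 3180000 | 서울특별시 영등포구 | JM애드 | 문종수 | 서울특별시 영등포구 | 출판사 | 영업 | 20110609 | <NA> | <NA> | 2021-04-01
20465 | 20466 | 6110000 | 서울특별시 | 3010000 | 서울특별시 중구 | 제이에스컴 | 설세화 | 서울특별시 중구 | 인쇄사 | 영업 | 20211028 | <NA> | <NA> | 2021-10-28
4860 | 4861 | 6410000 | 경기도 | 3940000 | 경기도 고양시 덕양구 | 가온비즈 | 김준연 | 경기도 고양시 덕양구 | 출판사 | 영업 | 20230103 | <NA> | <NA> | 2023-01-03
47514 | 47515 | 6110000 | 서울특별시 | 3000000 | 서울특별시 동대문구 | 레몬컬쳐(Lemon culture) | 이도은 | 서울특별시 동대문구 | 출판사 | 전출 | 20131226 | <NA> | 20150127 | 2021-04-01
63105 | 63106 | 6110000 | 서울특별시 | 3070000 | 서울특별시 마포구 | (주)월천상회 | 이한상 | 서울특별시 마포구 | 출판사 | 전출 | 20150406 | <NA> | 20170612 | 2021-04-01
68563 | 68564 | 6110000 | 서울특별시 | 3210000 | 서울특별시 서초구 | 웰뉴스 | 강신구 | 서울특별시 서초구 | 출판사 | 영업 | 20111202 | <NA> | <NA> | 2021-04-01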