Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
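
The average record size appears to be the total in-memory size divided by the number of observations. A minimal Python check of that relationship, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not taken from the report):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 566.4
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

# Rounded to one decimal place, this agrees with the reported 58.0 B.
print(f"{avg_record_bytes:.1f} B")
```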

Variable types

Numeric: 1
Text: 2
Categorical: 3

Dataset

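Description: CSV dataset of location types for 202,675 small-business establishments in Gyeongsangbuk-do, with the fields 상가업소 번호 (store number), 상호 (business name), 시군명 (city/county name), 주소 (address), 입지유형코드 (location type code), and 입지유형 명 (location type name)
Author: Gyeongsangbuk-do (경상북도)
URL: https://www.data.go.kr/data/15096081/fileData.do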

Alerts

입지유형 코드 is highly overall correlated with 입지유형 명 (High correlation)
입지유형 명 is highly overall correlated with 입지유형 코드 (High correlation)
입지유형 코드 is highly imbalanced (61.9%) (Imbalance)
입지유형 명 is highly imbalanced (61.9%) (Imbalance)
상가업소 번호 has unique values (Unique)
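
The 61.9% imbalance figure is consistent with a normalized-entropy score, 1 - H / log2(k), computed over the four category counts reported for 입지유형 코드 (8524, 1028, 380, 68). The sketch below rests on that assumption; the function name is hypothetical and this is not presented as the tool's own implementation.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class shares and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Category counts copied from the 입지유형 코드 variable in this report.
location_type_counts = [8524, 1028, 380, 68]
score = imbalance_score(location_type_counts)
print(f"{score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, and a variable dominated by a single category approaches 1.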

Reproduction

Analysis started: 2023-12-12 13:39:58.288872
Analysis finished: 2023-12-12 13:39:59.926362
Duration: 1.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
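
The reported duration can be checked from the two timestamps above. A small Python sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 13:39:58.288872")
finished = datetime.fromisoformat("2023-12-12 13:39:59.926362")

duration_seconds = (finished - started).total_seconds()

# Rounded to two decimals, this agrees with the reported 1.64 seconds.
print(f"{duration_seconds:.2f} seconds")
```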

Variables

상가업소 번호
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48324.236
Minimum: 21
Maximum: 96101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 21
5-th percentile: 4858.45
Q1: 24369.25
Median: 48500.5
Q3: 72499.25
95-th percentile: 91366.65
Maximum: 96101
Range: 96080
Interquartile range (IQR): 48130

Descriptive statistics

Standard deviation: 27794.746
Coefficient of variation (CV): 0.57517198
Kurtosis: -1.2007869
Mean: 48324.236
Median Absolute Deviation (MAD): 24060
Skewness: -0.013651015
Sum: 4.8324236 × 10^8
Variance: 7.7254793 × 10^8
Monotonicity: Not monotonic
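
Several of the quantile and descriptive statistics above follow from one another by simple identities (range = max - min, IQR = Q3 - Q1, CV = std / mean, variance = std², sum = mean × n). The Python sketch below recomputes them from the rounded figures in this report, so small rounding differences are expected; the variable names are illustrative.

```python
# Figures copied from the 상가업소 번호 statistics in this report.
n = 10_000
mean = 48324.236
std = 27794.746
minimum, maximum = 21, 96101
q1, q3 = 24369.25, 72499.25

value_range = maximum - minimum   # reported: 96080
iqr = q3 - q1                     # reported: 48130
cv = std / mean                   # reported: 0.57517198
variance = std ** 2               # reported: 7.7254793 × 10^8
total = mean * n                  # reported: 4.8324236 × 10^8

print(value_range, iqr, round(cv, 6), f"{variance:.4e}", f"{total:.4e}")
```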
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
92061 | 1 | < 0.1%
19223 | 1 | < 0.1%
83683 | 1 | < 0.1%
8658 | 1 | < 0.1%
15227 | 1 | < 0.1%
12931 | 1 | < 0.1%
90786 | 1 | < 0.1%
90707 | 1 | < 0.1%
69657 | 1 | < 0.1%
76906 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
21 | 1 | < 0.1%
40 | 1 | < 0.1%
49 | 1 | < 0.1%
57 | 1 | < 0.1%
91 | 1 | < 0.1%
108 | 1 | < 0.1%
121 | 1 | < 0.1%
131 | 1 | < 0.1%
146 | 1 | < 0.1%
149 | 1 | < 0.1%

Value | Count | Frequency (%)
96101 | 1 | < 0.1%
96094 | 1 | < 0.1%
96091 | 1 | < 0.1%
96089 | 1 | < 0.1%
96074 | 1 | < 0.1%
96061 | 1 | < 0.1%
96055 | 1 | < 0.1%
96053 | 1 | < 0.1%
96052 | 1 | < 0.1%
96050 | 1 | < 0.1%

상호
Text

Distinct: 7735
Distinct (%): 77.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 40
Median length: 31
Mean length: 6.6033
Min length: 2

Characters and Unicode

Total characters: 66033
Distinct characters: 924
Distinct categories: 13
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
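
As an illustration of how such per-character properties can be tallied, the sketch below uses Python's standard `unicodedata` module to count general categories for three of the sample 상호 values shown in this report. The helper name `category_counts` is hypothetical, and this is not presented as the profiler's own code. In the two-letter codes, `Lo` corresponds to "Other Letter" and `Po` to "Other Punctuation".

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by two-letter Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)


# Sample business names taken from the 상호 variable in this report.
samples = ["합동*", "신신상회", "곱창명가"]

totals = Counter()
for name in samples:
    totals.update(category_counts(name))

# Hangul syllables fall under "Lo"; the masking asterisk falls under "Po".
print(dict(totals))
```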

Unique

Unique: 6944
Unique (%): 69.4%

Sample

1st row: 합동*
2nd row: 구룡***************
3rd row: 신신상회
4th row: 곱창명가
5th row: 인터**********

Value | Count | Frequency (%)
경북 | 506 | 4.9%
주식 | 175 | 1.7%
15 | 89 | 0.9%
대한 | 80 | 0.8%
88 | 59 | 0.6%
86 | 54 | 0.5%
한국 | 47 | 0.5%
현대 | 46 | 0.4%
포항 | 36 | 0.3%
16 | 33 | 0.3%
Other values (6224) | 9254 | 89.2%

Most occurring characters

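Value | Count | Frequency (%)
* | 35299 | 53.5%
(glyph not shown) | 771 | 1.2%
(glyph not shown) | 691 | 1.0%
(glyph not shown) | 569 | 0.9%
(glyph not shown) | 548 | 0.8%
(glyph not shown) | 530 | 0.8%
(glyph not shown) | 471 | 0.7%
(glyph not shown) | 437 | 0.7%
(glyph not shown) | 427 | 0.6%
(space) | 379 | 0.6%
Other values (914) | 25911 | 39.2%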

Most occurring categories

Value | Count | Frequency (%)
Other Punctuation | 35344 | 53.5%
Other Letter | 28243 | 42.8%
Decimal Number | 764 | 1.2%
Uppercase Letter | 493 | 0.7%
Space Separator | 379 | 0.6%
Open Punctuation | 324 | 0.5%
Close Punctuation | 292 | 0.4%
Lowercase Letter | 182 | 0.3%
Dash Punctuation | 7 | < 0.1%
Other Symbol | 2 | < 0.1%
Other values (3) | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph not shown) | 771 | 2.7%
(glyph not shown) | 691 | 2.4%
(glyph not shown) | 569 | 2.0%
(glyph not shown) | 548 | 1.9%
(glyph not shown) | 530 | 1.9%
(glyph not shown) | 471 | 1.7%
(glyph not shown) | 437 | 1.5%
(glyph not shown) | 427 | 1.5%
(glyph not shown) | 378 | 1.3%
(glyph not shown) | 326 | 1.2%
Other values (838) | 23095 | 81.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 55 | 11.2%
C | 43 | 8.7%
G | 34 | 6.9%
T | 30 | 6.1%
K | 27 | 5.5%
D | 26 | 5.3%
M | 23 | 4.7%
B | 22 | 4.5%
A | 22 | 4.5%
N | 22 | 4.5%
Other values (16) | 189 | 38.3%

Lowercase Letter
Value | Count | Frequency (%)
o | 23 | 12.6%
e | 17 | 9.3%
a | 17 | 9.3%
h | 17 | 9.3%
i | 14 | 7.7%
n | 13 | 7.1%
t | 9 | 4.9%
r | 8 | 4.4%
l | 8 | 4.4%
s | 7 | 3.8%
Other values (13) | 49 | 26.9%

Decimal Number
Value | Count | Frequency (%)
8 | 235 | 30.8%
1 | 154 | 20.2%
5 | 142 | 18.6%
6 | 91 | 11.9%
2 | 46 | 6.0%
9 | 29 | 3.8%
0 | 24 | 3.1%
7 | 17 | 2.2%
3 | 13 | 1.7%
4 | 13 | 1.7%

Other Punctuation
Value | Count | Frequency (%)
* | 35299 | 99.9%
. | 21 | 0.1%
& | 17 | < 0.1%
· | 3 | < 0.1%
, | 2 | < 0.1%
/ | 1 | < 0.1%
! | 1 | < 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 309 | 95.4%
(glyph not shown) | 15 | 4.6%

Close Punctuation
Value | Count | Frequency (%)
) | 291 | 99.7%
(glyph not shown) | 1 | 0.3%

Space Separator
Value | Count | Frequency (%)
(space) | 379 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 7 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(glyph not shown) | 2 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 1 | 100.0%

Letter Number
Value | Count | Frequency (%)
(glyph not shown) | 1 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 37112 | 56.2%
Hangul | 28240 | 42.8%
Latin | 676 | 1.0%
Han | 5 | < 0.1%

Most frequent character per script

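Hangul
Value | Count | Frequency (%)
(glyph not shown) | 771 | 2.7%
(glyph not shown) | 691 | 2.4%
(glyph not shown) | 569 | 2.0%
(glyph not shown) | 548 | 1.9%
(glyph not shown) | 530 | 1.9%
(glyph not shown) | 471 | 1.7%
(glyph not shown) | 437 | 1.5%
(glyph not shown) | 427 | 1.5%
(glyph not shown) | 378 | 1.3%
(glyph not shown) | 326 | 1.2%
Other values (834) | 23092 | 81.8%

Latin
Value | Count | Frequency (%)
S | 55 | 8.1%
C | 43 | 6.4%
G | 34 | 5.0%
T | 30 | 4.4%
K | 27 | 4.0%
D | 26 | 3.8%
o | 23 | 3.4%
M | 23 | 3.4%
B | 22 | 3.3%
A | 22 | 3.3%
Other values (40) | 371 | 54.9%

Common
Value | Count | Frequency (%)
* | 35299 | 95.1%
(space) | 379 | 1.0%
( | 309 | 0.8%
) | 291 | 0.8%
8 | 235 | 0.6%
1 | 154 | 0.4%
5 | 142 | 0.4%
6 | 91 | 0.2%
2 | 46 | 0.1%
9 | 29 | 0.1%
Other values (15) | 137 | 0.4%

Han
Value | Count | Frequency (%)
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%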

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 37768 | 57.2%
Hangul | 28237 | 42.8%
None | 21 | < 0.1%
CJK | 5 | < 0.1%
Number Forms | 1 | < 0.1%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

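ASCII
Value | Count | Frequency (%)
* | 35299 | 93.5%
(space) | 379 | 1.0%
( | 309 | 0.8%
) | 291 | 0.8%
8 | 235 | 0.6%
1 | 154 | 0.4%
5 | 142 | 0.4%
6 | 91 | 0.2%
S | 55 | 0.1%
2 | 46 | 0.1%
Other values (61) | 767 | 2.0%

Hangul
Value | Count | Frequency (%)
(glyph not shown) | 771 | 2.7%
(glyph not shown) | 691 | 2.4%
(glyph not shown) | 569 | 2.0%
(glyph not shown) | 548 | 1.9%
(glyph not shown) | 530 | 1.9%
(glyph not shown) | 471 | 1.7%
(glyph not shown) | 437 | 1.5%
(glyph not shown) | 427 | 1.5%
(glyph not shown) | 378 | 1.3%
(glyph not shown) | 326 | 1.2%
Other values (832) | 23089 | 81.8%

None
Value | Count | Frequency (%)
(glyph not shown) | 15 | 71.4%
· | 3 | 14.3%
(glyph not shown) | 2 | 9.5%
(glyph not shown) | 1 | 4.8%

CJK
Value | Count | Frequency (%)
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%
(glyph not shown) | 1 | 20.0%

Number Forms
Value | Count | Frequency (%)
(glyph not shown) | 1 | 100.0%

Compat Jamo
Value | Count | Frequency (%)
(glyph not shown) | 1 | 100.0%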

시군명
Categorical

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

구미시: 1536
경주시: 1120
포항시 북구: 959
경산시: 841
포항시 남구: 831
Other values (19): 4713

Length

Max length: 6
Median length: 3
Mean length: 3.537
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 포항시 북구
2nd row: 포항시 남구
3rd row: 청도군
4th row: 구미시
5th row: 구미시

Common Values

Value | Count | Frequency (%)
구미시 | 1536 | 15.4%
경주시 | 1120 | 11.2%
포항시 북구 | 959 | 9.6%
경산시 | 841 | 8.4%
포항시 남구 | 831 | 8.3%
안동시 | 623 | 6.2%
칠곡군 | 512 | 5.1%
김천시 | 459 | 4.6%
영주시 | 428 | 4.3%
영천시 | 394 | 3.9%
Other values (14) | 2297 | 23.0%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
포항시 | 1790 | 15.2%
구미시 | 1536 | 13.0%
경주시 | 1120 | 9.5%
북구 | 959 | 8.1%
경산시 | 841 | 7.1%
남구 | 831 | 7.0%
안동시 | 623 | 5.3%
칠곡군 | 512 | 4.3%
김천시 | 459 | 3.9%
영주시 | 428 | 3.6%
Other values (15) | 2691 | 22.8%

도로명 주소
Text

Distinct: 9268
Distinct (%): 92.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 22
Mean length: 12.8689
Min length: 9

Characters and Unicode

Total characters: 128689
Distinct characters: 392
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8724
Unique (%): 87.2%

Sample

1st row: 포항시 북구 중흥로225번길 7-7
2nd row: 포항시 남구 호미로 247
3rd row: 청도군 운문사길 109
4th row: 구미시 임은길 44
5th row: 구미시 첨단기업5로 10-171

Value | Count | Frequency (%)
포항시 | 1790 | 5.6%
구미시 | 1536 | 4.8%
경주시 | 1120 | 3.5%
북구 | 959 | 3.0%
경산시 | 841 | 2.6%
남구 | 831 | 2.6%
안동시 | 623 | 2.0%
칠곡군 | 512 | 1.6%
김천시 | 459 | 1.4%
영주시 | 428 | 1.3%
Other values (6384) | 22691 | 71.4%

Most occurring characters

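Value | Count | Frequency (%)
(space) | 21790 | 16.9%
(glyph not shown) | 8139 | 6.3%
1 | 7636 | 5.9%
(glyph not shown) | 6776 | 5.3%
(glyph not shown) | 5335 | 4.1%
2 | 4872 | 3.8%
3 | 3799 | 3.0%
(glyph not shown) | 3649 | 2.8%
4 | 3015 | 2.3%
- | 2953 | 2.3%
Other values (382) | 60725 | 47.2%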

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 70887 | 55.1%
Decimal Number | 33059 | 25.7%
Space Separator | 21790 | 16.9%
Dash Punctuation | 2953 | 2.3%

Most frequent character per category

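Other Letter
Value | Count | Frequency (%)
(glyph not shown) | 8139 | 11.5%
(glyph not shown) | 6776 | 9.6%
(glyph not shown) | 5335 | 7.5%
(glyph not shown) | 3649 | 5.1%
(glyph not shown) | 2546 | 3.6%
(glyph not shown) | 2333 | 3.3%
(glyph not shown) | 2308 | 3.3%
(glyph not shown) | 1959 | 2.8%
(glyph not shown) | 1828 | 2.6%
(glyph not shown) | 1699 | 2.4%
Other values (370) | 34315 | 48.4%

Decimal Number
Value | Count | Frequency (%)
1 | 7636 | 23.1%
2 | 4872 | 14.7%
3 | 3799 | 11.5%
4 | 3015 | 9.1%
5 | 2736 | 8.3%
6 | 2591 | 7.8%
7 | 2238 | 6.8%
8 | 2142 | 6.5%
0 | 2051 | 6.2%
9 | 1979 | 6.0%

Space Separator
Value | Count | Frequency (%)
(space) | 21790 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2953 | 100.0%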

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 70887 | 55.1%
Common | 57802 | 44.9%

Most frequent character per script

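Hangul
Value | Count | Frequency (%)
(glyph not shown) | 8139 | 11.5%
(glyph not shown) | 6776 | 9.6%
(glyph not shown) | 5335 | 7.5%
(glyph not shown) | 3649 | 5.1%
(glyph not shown) | 2546 | 3.6%
(glyph not shown) | 2333 | 3.3%
(glyph not shown) | 2308 | 3.3%
(glyph not shown) | 1959 | 2.8%
(glyph not shown) | 1828 | 2.6%
(glyph not shown) | 1699 | 2.4%
Other values (370) | 34315 | 48.4%

Common
Value | Count | Frequency (%)
(space) | 21790 | 37.7%
1 | 7636 | 13.2%
2 | 4872 | 8.4%
3 | 3799 | 6.6%
4 | 3015 | 5.2%
- | 2953 | 5.1%
5 | 2736 | 4.7%
6 | 2591 | 4.5%
7 | 2238 | 3.9%
8 | 2142 | 3.7%
Other values (2) | 4030 | 7.0%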

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 70887 | 55.1%
ASCII | 57802 | 44.9%

Most frequent character per block

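ASCII
Value | Count | Frequency (%)
(space) | 21790 | 37.7%
1 | 7636 | 13.2%
2 | 4872 | 8.4%
3 | 3799 | 6.6%
4 | 3015 | 5.2%
- | 2953 | 5.1%
5 | 2736 | 4.7%
6 | 2591 | 4.5%
7 | 2238 | 3.9%
8 | 2142 | 3.7%
Other values (2) | 4030 | 7.0%

Hangul
Value | Count | Frequency (%)
(glyph not shown) | 8139 | 11.5%
(glyph not shown) | 6776 | 9.6%
(glyph not shown) | 5335 | 7.5%
(glyph not shown) | 3649 | 5.1%
(glyph not shown) | 2546 | 3.6%
(glyph not shown) | 2333 | 3.3%
(glyph not shown) | 2308 | 3.3%
(glyph not shown) | 1959 | 2.8%
(glyph not shown) | 1828 | 2.6%
(glyph not shown) | 1699 | 2.4%
Other values (370) | 34315 | 48.4%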

입지유형 코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

0: 8524
1: 1028
3: 380
2: 68

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 8524 | 85.2%
1 | 1028 | 10.3%
3 | 380 | 3.8%
2 | 68 | 0.7%

Length

Histogram of lengths of the category


입지유형 명
Categorical

HIGH CORRELATION  IMBALANCE 

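Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

일반상가 (general shops): 8524
집합상가 (multi-unit commercial buildings): 1028
전통시장 (traditional markets): 380
대규모상가 (large-scale stores): 68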

Length

Max length: 5
Median length: 4
Mean length: 4.0068
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반상가
2nd row: 집합상가
3rd row: 일반상가
4th row: 일반상가
5th row: 일반상가

Common Values

Value | Count | Frequency (%)
일반상가 | 8524 | 85.2%
집합상가 | 1028 | 10.3%
전통시장 | 380 | 3.8%
대규모상가 | 68 | 0.7%

Length

Histogram of lengths of the category


Interactions


Correlations

Variable | 상가업소 번호 | 시군명 | 입지유형 코드 | 입지유형 명
상가업소 번호 | 1.000 | 0.043 | 0.004 | 0.004
시군명 | 0.043 | 1.000 | 0.232 | 0.232
입지유형 코드 | 0.004 | 0.232 | 1.000 | 1.000
입지유형 명 | 0.004 | 0.232 | 1.000 | 1.000

Variable | 입지유형 코드 | 입지유형 명 | 시군명
입지유형 코드 | 1.000 | 1.000 | 0.112
입지유형 명 | 1.000 | 1.000 | 0.112
시군명 | 0.112 | 0.112 | 1.000

Variable | 상가업소 번호 | 시군명 | 입지유형 코드 | 입지유형 명
상가업소 번호 | 1.000 | 0.016 | 0.003 | 0.003
시군명 | 0.016 | 1.000 | 0.112 | 0.112
입지유형 코드 | 0.003 | 0.112 | 1.000 | 1.000
입지유형 명 | 0.003 | 0.112 | 1.000 | 1.000
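
The 1.000 association between 입지유형 코드 and 입지유형 명 is what one would expect if each code maps to exactly one name, as the matching category counts (8524, 1028, 380, 68) suggest. The report does not state which coefficient each matrix uses, so the sketch below assumes an uncorrected Cramér's V and a hypothetical one-to-one contingency table built from those counts.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Assumption: every code pairs with exactly one name, so all observations
# sit on the diagonal of the code-by-name contingency table.
counts = [8524, 1028, 380, 68]
table = [[c if i == j else 0 for j in range(len(counts))]
         for i, c in enumerate(counts)]

v = cramers_v(table)
print(round(v, 3))
```

Because one column fully determines the other, either could be dropped before modelling without losing information.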

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | 상가업소 번호 | 상호 | 시군명 | 도로명 주소 | 입지유형 코드 | 입지유형 명
91287 | 92061 | 합동* | 포항시 북구 | 포항시 북구 중흥로225번길 7-7 | 0 | 일반상가
15728 | 15849 | 구룡*************** | 포항시 남구 | 포항시 남구 호미로 247 | 1 | 집합상가
53368 | 53837 | 신신상회 | 청도군 | 청도군 운문사길 109 | 0 | 일반상가
2085 | 2097 | 곱창명가 | 구미시 | 구미시 임은길 44 | 0 | 일반상가
84531 | 85234 | 인터********** | 구미시 | 구미시 첨단기업5로 10-171 | 0 | 일반상가
77245 | 77898 | 진평***** | 구미시 | 구미시 인동36길 12 | 0 | 일반상가
26924 | 27151 | 한국** | 포항시 북구 | 포항시 북구 장량로32번길 78-1 | 0 | 일반상가
79059 | 79730 | 대로** | 구미시 | 구미시 산호대로 1133 | 0 | 일반상가
78974 | 79644 | 어묵천국원평점 | 구미시 | 구미시 구미중앙로 52 | 1 | 집합상가
54497 | 54968 | 15***** | 구미시 | 구미시 인동54길 25 | 0 | 일반상가

(index) | 상가업소 번호 | 상호 | 시군명 | 도로명 주소 | 입지유형 코드 | 입지유형 명
37514 | 37836 | 경북******* | 경주시 | 경주시 원지길 10 | 0 | 일반상가
21058 | 21218 | 지에******** | 포항시 남구 | 포항시 남구 문덕로37번길 21 | 0 | 일반상가
41814 | 42177 | 칠곡************ | 칠곡군 | 칠곡군 석전로7길 58-1 | 0 | 일반상가
85647 | 86365 | 어모반점 | 김천시 | 김천시 아랫장터5길 17 | 0 | 일반상가
44837 | 45225 | 경산******** | 경산시 | 경산시 낙산길 51 | 0 | 일반상가
31166 | 31429 | 한성*** | 영주시 | 영주시 광복로24번길 3 | 0 | 일반상가
29160 | 29410 | M.******** | 경산시 | 경산시 대학로 280 | 0 | 일반상가
87932 | 88669 | 지에************** | 경산시 | 경산시 봉회1길 26 | 0 | 일반상가
2507 | 2521 | 금오***** | 구미시 | 구미시 구미중앙로13길 41-1 | 0 | 일반상가
53725 | 54194 | 아지터 | 경산시 | 경산시 원효로32길 29-1 | 0 | 일반상가