Overview

Dataset statistics

Number of variables: 7
Number of observations: 199
Missing cells: 1
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.2 KiB
Average record size in memory: 57.7 B
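The missing-cell percentage follows from the counts above. Every variable has one cell per observation, so the table has 7 × 199 = 1393 cells, and one of them is missing. The short sketch below reproduces the rounded 0.1% from those reported figures only.

```python
# Figures taken from "Dataset statistics" above.
n_variables = 7
n_observations = 199
missing_cells = 1

# Each variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(total_cells)            # 1393
print(f"{missing_pct:.1f}%")  # 0.1%
```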

Variable types

Categorical: 6
Numeric: 1

Dataset

Description: Sample
Author: 소상공인연합회 (Korea Federation of Micro Enterprise)
URL: https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KFMECMS009

Alerts

사업자 has constant value "전체사업자" [Constant]
년도 has constant value "2019년" [Constant]
사업자수 is highly overall correlated with 성별 [High correlation]
성별 is highly overall correlated with 사업자수 [High correlation]
성별 is highly imbalanced (78.0%) [Imbalance]
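The 78.0% imbalance reported for 성별 is consistent with defining imbalance as one minus the Shannon entropy of the value counts, normalized by its maximum possible value log(k) for k categories. The sketch below assumes that definition (it is not copied from the ydata-profiling source) and recovers the reported figure from the 성별 counts (남성 192, 남녀합계 7). The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the category
    frequencies and k the number of categories.

    0 means perfectly balanced; values near 1 mean one category dominates.
    (Assumed definition, chosen because it matches the reported 78.0%.)
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

score = imbalance_score([192, 7])  # 성별: 남성 192, 남녀합계 7
print(f"{score:.1%}")  # 78.0%
```

A perfectly even split such as `[10, 10]` scores 0, which is why a 96.5% / 3.5% split scores so close to 1.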

Reproduction

Analysis started: 2023-12-10 06:20:15.862175
Analysis finished: 2023-12-10 06:20:18.996772
Duration: 3.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
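The reported duration is the difference between the two timestamps above. This sketch parses them with the standard library and rounds to two decimals as the report does.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # format of the timestamps shown above
started = datetime.strptime("2023-12-10 06:20:15.862175", FMT)
finished = datetime.strptime("2023-12-10 06:20:18.996772", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 3.13 seconds
```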

Variables

사업자 (business operator)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Top values: 전체사업자 (all business operators) 199

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전체사업자
2nd row: 전체사업자
3rd row: 전체사업자
4th row: 전체사업자
5th row: 전체사업자

Common Values

Value | Count | Frequency (%)
전체사업자 (all business operators) | 199 | 100.0%

[Figure: histogram of category value lengths]


성별 (sex)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Top values: 남성 (male) 192, 남녀합계 (male and female combined) 7

Length

Max length: 4
Median length: 2
Mean length: 2.0703518
Min length: 2
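The length statistics are computed over all 199 values, weighted by how often each distinct value occurs. For 성별 that gives (192 × 2 + 7 × 4) / 199 = 412 / 199 ≈ 2.0703518 for the mean, while the median stays at 2 because the two-character value 남성 dominates. The sketch below assumes lengths are counted in Unicode characters (Python's `len` on `str`), which matches the reported values; `length_stats` is a hypothetical helper.

```python
import statistics

def length_stats(counts):
    """Max, median, mean and min string length over all occurrences.

    `counts` maps each distinct value to its number of occurrences;
    length is measured in Unicode characters (code points).
    """
    lengths = [len(value) for value, count in counts.items() for _ in range(count)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }

stats = length_stats({"남성": 192, "남녀합계": 7})
print(stats["max"], stats["median"], round(stats["mean"], 7), stats["min"])  # 4 2 2.0703518 2
```

The same helper applied to the 연령 value counts gives a mean of about 4.839196, which matches that variable's reported mean length.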

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남녀합계
2nd row: 남녀합계
3rd row: 남녀합계
4th row: 남녀합계
5th row: 남녀합계

Common Values

Value | Count | Frequency (%)
남성 (male) | 192 | 96.5%
남녀합계 (male and female combined) | 7 | 3.5%

[Figure: histogram of category value lengths]


지역 (region)
Categorical

Distinct: 6
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Top values: 전국 (nationwide) 112, 서울 (Seoul) 21, 인천 (Incheon) 21, 경기 (Gyeonggi) 21, 강원 (Gangwon) 21

Length

Max length: 3
Median length: 2
Mean length: 2.1055276
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전국
2nd row: 전국
3rd row: 전국
4th row: 전국
5th row: 전국

Common Values

Value | Count | Frequency (%)
전국 (nationwide) | 112 | 56.3%
서울 (Seoul) | 21 | 10.6%
인천 (Incheon) | 21 | 10.6%
경기 (Gyeonggi) | 21 | 10.6%
강원 (Gangwon) | 21 | 10.6%
대전 (Daejeon) | 3 | 1.5%

[Figure: histogram of category value lengths]


업종 (industry)
Categorical

Distinct: 15
Distinct (%): 7.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Top values: 업종전체 (all industries) 45, 농·임·어업 11, 광업 11, 제조업 11, 전기·가스·수도업 11, other values (10) 110

Length

Max length: 10
Median length: 9
Mean length: 4.7738693
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 업종전체
2nd row: 업종전체
3rd row: 업종전체
4th row: 업종전체
5th row: 업종전체

Common Values

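Value | Count | Frequency (%)
업종전체 (all industries) | 45 | 22.6%
농·임·어업 (agriculture, forestry and fishing) | 11 | 5.5%
광업 (mining) | 11 | 5.5%
제조업 (manufacturing) | 11 | 5.5%
전기·가스·수도업 (electricity, gas and water supply) | 11 | 5.5%
도매업 (wholesale trade) | 11 | 5.5%
소매업 (retail trade) | 11 | 5.5%
부동산매매업 (real estate sales) | 11 | 5.5%
건설업 (construction) | 11 | 5.5%
음식업 (food service) | 11 | 5.5%
Other values (5) | 55 | 27.6%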

[Figure: histogram of category value lengths]

연령 (age group)
Categorical

Distinct: 7
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Top values: 전연령 (all ages) 77, 30세 미만 21, 30세 이상 21, 40세 이상 20, 50세 이상 20, other values (2) 40
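The top-values summaries in this report list the five most frequent values and fold the remainder into a single "other values (k)" entry, where k is the number of folded distinct values and the count is their total. For 연령 the two remaining values (60세 이상 and 70세 이상, 20 each) collapse into "other values (2) 40". A minimal sketch of that grouping, assuming ties keep their original order; `top_values_with_other` is a hypothetical name.

```python
def top_values_with_other(counts, n=5):
    """Return the n most frequent (value, count) pairs plus an
    aggregated ('Other values (k)', total) row for the rest."""
    # sorted() is stable even with reverse=True, so tied counts keep input order
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    rows = list(top)
    if rest:
        rows.append((f"Other values ({len(rest)})", sum(c for _, c in rest)))
    return rows

age_counts = {
    "전연령": 77, "30세 미만": 21, "30세 이상": 21, "40세 이상": 20,
    "50세 이상": 20, "60세 이상": 20, "70세 이상": 20,
}
for value, count in top_values_with_other(age_counts):
    print(value, count)
```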

Length

Max length: 6
Median length: 6
Mean length: 4.839196
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전연령
2nd row: 30세 미만
3rd row: 30세 이상
4th row: 40세 이상
5th row: 50세 이상

Common Values

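Value | Count | Frequency (%)
전연령 (all ages) | 77 | 38.7%
30세 미만 (under 30) | 21 | 10.6%
30세 이상 (lit. 30 and over) | 21 | 10.6%
40세 이상 (lit. 40 and over) | 20 | 10.1%
50세 이상 (lit. 50 and over) | 20 | 10.1%
60세 이상 (lit. 60 and over) | 20 | 10.1%
70세 이상 (lit. 70 and over) | 20 | 10.1%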

[Figure: histogram of category value lengths]

Word counts (values split on whitespace)

Word | Count | Frequency (%)
이상 (and over) | 101 | 31.5%
전연령 (all ages) | 77 | 24.0%
30세 (age 30) | 42 | 13.1%
미만 (under) | 21 | 6.5%
40세 (age 40) | 20 | 6.2%
50세 (age 50) | 20 | 6.2%
60세 (age 60) | 20 | 6.2%
70세 (age 70) | 20 | 6.2%
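The word-level table is obtained by splitting each 연령 value on whitespace and adding the value's count to every word it contains. For example, 30세 appears in both 30세 미만 and 30세 이상, so it gets 21 + 21 = 42, and the percentages are taken over all 321 words rather than the 199 rows. A minimal sketch of that tally from the value counts above:

```python
from collections import Counter

# 연령 value counts from the table above
age_counts = {
    "전연령": 77, "30세 미만": 21, "30세 이상": 21, "40세 이상": 20,
    "50세 이상": 20, "60세 이상": 20, "70세 이상": 20,
}

word_counts = Counter()
for value, count in age_counts.items():
    for word in value.split():  # split on whitespace
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common(3):
    print(word, count, f"{100 * count / total_words:.1f}%")
```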

년도 (year)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Top values: 2019년 (year 2019) 199

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019년
2nd row: 2019년
3rd row: 2019년
4th row: 2019년
5th row: 2019년

Common Values

Value | Count | Frequency (%)
2019년 (year 2019) | 199 | 100.0%

[Figure: histogram of category value lengths]


사업자수 (number of business operators)
Real number (ℝ)

HIGH CORRELATION

Distinct: 194
Distinct (%): 98.0%
Missing: 1
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 23596.934
Minimum: 2
Maximum: 921299
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 2
5th percentile: 47.65
Q1: 869.75
Median: 4448
Q3: 13278.75
95th percentile: 106207.75
Maximum: 921299
Range: 921297
Interquartile range (IQR): 12409
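The quantiles above are consistent with linear interpolation between order statistics, the default method of pandas `Series.quantile` (an assumption about how the report computed them, not something the report states). With n = 198 non-missing values, the 5th percentile sits at the 0-based position h = (n − 1) × 0.05 = 9.85. It therefore lies 85% of the way from the 10th smallest value (40) to the 11th (49): 40 + 0.85 × 9 = 47.65. Only the smallest observations are needed, and they can be read off the minimum-values table further down in this section. The helper name `quantile_from_sorted_prefix` is hypothetical.

```python
import math

def quantile_from_sorted_prefix(sorted_prefix, n, q):
    """Linear-interpolation quantile of a sample of size n, given only its
    smallest values in ascending order.

    Uses the 0-based fractional position h = (n - 1) * q, so only the two
    order statistics on either side of h are required.
    """
    h = (n - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    if hi >= len(sorted_prefix):
        raise ValueError("sorted_prefix is too short for this quantile")
    return sorted_prefix[lo] + (h - lo) * (sorted_prefix[hi] - sorted_prefix[lo])

# The 12 smallest 사업자수 observations (9 and 33 each occur twice)
smallest = [2, 4, 6, 9, 9, 15, 20, 33, 33, 40, 49, 55]
p5 = quantile_from_sorted_prefix(smallest, n=198, q=0.05)
print(round(p5, 2))  # 47.65
```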

Descriptive statistics

Standard deviation: 80840.226
Coefficient of variation (CV): 3.4258783
Kurtosis: 83.292297
Mean: 23596.934
Median Absolute Deviation (MAD): 4154
Skewness: 8.3331337
Sum: 4672193
Variance: 6.5351422 × 10^9
Monotonicity: Not monotonic
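Several of these statistics are derived from others, which gives a quick consistency check. The mean is Sum divided by the 198 non-missing values, Range and IQR are differences of the quantiles above, CV is the standard deviation divided by the mean, and variance is the squared standard deviation. The sketch below recomputes them from the reported, already-rounded figures, so the last digits may differ slightly from the report.

```python
# Reported summary values for 사업자수 (198 non-missing observations)
count = 198
total = 4672193
std = 80840.226
q1, q3 = 869.75, 13278.75
minimum, maximum = 2, 921299

mean = total / count
value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean       # coefficient of variation
variance = std ** 2

print(round(mean, 3))     # 23596.934
print(value_range)        # 921297
print(iqr)                # 12409.0
print(round(cv, 4))       # 3.4259
print(f"{variance:.6e}")  # 6.535142e+09
```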
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
33 | 2 | 1.0%
9 | 2 | 1.0%
1484 | 2 | 1.0%
707 | 2 | 1.0%
921299 | 1 | 0.5%
7593 | 1 | 0.5%
7962 | 1 | 0.5%
13282 | 1 | 0.5%
1699 | 1 | 0.5%
29809 | 1 | 0.5%
Other values (184) | 184 | 92.5%
Minimum 10 values

Value | Count | Frequency (%)
2 | 1 | 0.5%
4 | 1 | 0.5%
6 | 1 | 0.5%
9 | 2 | 1.0%
15 | 1 | 0.5%
20 | 1 | 0.5%
33 | 2 | 1.0%
40 | 1 | 0.5%
49 | 1 | 0.5%
55 | 1 | 0.5%
Maximum 10 values

Value | Count | Frequency (%)
921299 | 1 | 0.5%
494772 | 1 | 0.5%
245128 | 1 | 0.5%
241317 | 1 | 0.5%
176895 | 1 | 0.5%
134850 | 1 | 0.5%
130232 | 1 | 0.5%
128475 | 1 | 0.5%
123960 | 1 | 0.5%
114355 | 1 | 0.5%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation matrix]

Variable | 성별 | 지역 | 업종 | 연령 | 사업자수
성별 | 1.000 | 0.080 | 0.265 | 0.000 | 0.570
지역 | 0.080 | 1.000 | 0.000 | 0.332 | 0.000
업종 | 0.265 | 0.000 | 1.000 | 0.000 | 0.000
연령 | 0.000 | 0.332 | 0.000 | 1.000 | 0.000
사업자수 | 0.570 | 0.000 | 0.000 | 0.000 | 1.000

[Figure: correlation matrix]

Variable | 연령 | 업종 | 성별 | 지역
연령 | 1.000 | 0.000 | 0.000 | 0.204
업종 | 0.000 | 1.000 | 0.233 | 0.000
성별 | 0.000 | 0.233 | 1.000 | 0.056
지역 | 0.204 | 0.000 | 0.056 | 1.000

[Figure: correlation matrix]

Variable | 사업자수 | 성별 | 지역 | 업종 | 연령
사업자수 | 1.000 | 0.683 | 0.000 | 0.000 | 0.000
성별 | 0.683 | 1.000 | 0.056 | 0.233 | 0.000
지역 | 0.000 | 0.056 | 1.000 | 0.000 | 0.204
업종 | 0.000 | 0.233 | 0.000 | 1.000 | 0.000
연령 | 0.000 | 0.000 | 0.204 | 0.000 | 1.000
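The two "High correlation" alerts match the only off-diagonal entry above 0.5 in the correlation table where 성별 and 사업자수 correlate at 0.570. The report raises one alert per variable, so the same pair appears twice. The sketch below scans the upper triangle of that table for entries whose absolute value exceeds a threshold. The 0.5 threshold is an assumption chosen to match the alerts, and `high_correlation_pairs` is a hypothetical helper.

```python
# Correlation table containing 성별 and 사업자수 at 0.570 (symmetric matrix)
labels = ["성별", "지역", "업종", "연령", "사업자수"]
matrix = [
    [1.000, 0.080, 0.265, 0.000, 0.570],
    [0.080, 1.000, 0.000, 0.332, 0.000],
    [0.265, 0.000, 1.000, 0.000, 0.000],
    [0.000, 0.332, 0.000, 1.000, 0.000],
    [0.570, 0.000, 0.000, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed alert threshold

def high_correlation_pairs(labels, matrix, threshold):
    """List (a, b, value) for each pair above the threshold, scanning only
    the upper triangle so every pair is reported once."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(matrix[i][j]) > threshold:
                pairs.append((labels[i], labels[j], matrix[i][j]))
    return pairs

print(high_correlation_pairs(labels, matrix, THRESHOLD))  # [('성별', '사업자수', 0.57)]
```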

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out visual patterns in data completion.]

Sample

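Row | 사업자 | 성별 | 지역 | 업종 | 연령 | 년도 | 사업자수
0 | 전체사업자 | 남녀합계 | 전국 | 업종전체 | 전연령 | 2019년 | 921299
1 | 전체사업자 | 남녀합계 | 전국 | 업종전체 | 30세 미만 | 2019년 | 71435
2 | 전체사업자 | 남녀합계 | 전국 | 업종전체 | 30세 이상 | 2019년 | 176895
3 | 전체사업자 | 남녀합계 | 전국 | 업종전체 | 40세 이상 | 2019년 | 241317
4 | 전체사업자 | 남녀합계 | 전국 | 업종전체 | 50세 이상 | 2019년 | 245128
5 | 전체사업자 | 남녀합계 | 전국 | 업종전체 | 60세 이상 | 2019년 | 134850
6 | 전체사업자 | 남녀합계 | 전국 | 업종전체 | 70세 이상 | 2019년 | 51674
7 | 전체사업자 | 남성 | 전국 | 업종전체 | 전연령 | 2019년 | 494772
8 | 전체사업자 | 남성 | 전국 | 업종전체 | 30세 미만 | 2019년 | 37242
9 | 전체사업자 | 남성 | 전국 | 업종전체 | 30세 이상 | 2019년 | 97284
...
189 | 전체사업자 | 남성 | 강원 | 건설업 | 전연령 | 2019년 | 1231
190 | 전체사업자 | 남성 | 강원 | 음식업 | 전연령 | 2019년 | 2551
191 | 전체사업자 | 남성 | 강원 | 숙박업 | 전연령 | 2019년 | 324
192 | 전체사업자 | 남성 | 강원 | 운수·창고·통신업 | 전연령 | 2019년 | 818
193 | 전체사업자 | 남성 | 강원 | 부동산임대업 | 전연령 | 2019년 | 1207
194 | 전체사업자 | 남성 | 강원 | 대리·중개·도급업 | 전연령 | 2019년 | 149
195 | 전체사업자 | 남성 | 강원 | 서비스업 | 전연령 | 2019년 | 2673
196 | 전체사업자 | 남성 | 대전 | 업종전체 | 전연령 | 2019년 | 13269
197 | 전체사업자 | 남성 | 대전 | 업종전체 | 30세 미만 | 2019년 | 1128
198 | 전체사업자 | 남성 | 대전 | 업종전체 | 30세 이상 | 2019년 | 2755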
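In the first sample rows (남녀합계, 전국, 업종전체, 2019년), the six age-group rows (1 to 6) add up exactly to the 전연령 total in row 0. This suggests that, despite the literal wording "N세 이상" ("N and over"), the age labels here denote non-overlapping bands (30s, 40s, and so on), with 70세 이상 as the open-ended top band. The check below covers only these seven rows. It is an observation from the sample, not a definition documented by the data provider.

```python
# Rows 0-6 of the sample: 남녀합계 / 전국 / 업종전체 / 2019년
all_ages_total = 921299  # row 0, 전연령
age_bands = {
    "30세 미만": 71435,
    "30세 이상": 176895,
    "40세 이상": 241317,
    "50세 이상": 245128,
    "60세 이상": 134850,
    "70세 이상": 51674,
}

band_sum = sum(age_bands.values())
print(band_sum)                    # 921299
print(band_sum == all_ages_total)  # True
```

If the labels were cumulative ("30 and over" including everyone in their 40s and beyond), the six rows would add up to far more than the all-ages total.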