Overview

Dataset statistics

Number of variables: 8
Number of observations: 100
Missing cells: 3
Missing cells (%): 0.4%
Duplicate rows: 1
Duplicate rows (%): 1.0%
Total size in memory: 6.7 KiB
Average record size in memory: 68.3 B
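
The two percentages follow from the counts. The sketch below reproduces them, assuming the missing-cell rate is taken over every cell (variables × observations) and the duplicate rate over all rows; the variable names are illustrative and not part of the report.

```python
# Counts copied from the dataset statistics.
n_variables = 8
n_observations = 100
missing_cells = 3
duplicate_rows = 1

# Assumption: the missing-cell rate is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")    # 0.375 is shown as 0.4
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```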

Variable types

Numeric: 3
Categorical: 5

Dataset

Description: Sample
Author: 소상공인연합회 (Korea Federation of Micro Enterprise)
URL: https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KFMECMS001

Alerts

Duplicates: Dataset has 1 (1.0%) duplicate rows
High correlation: telno is highly overall correlated with mber_cn_nm
High correlation: mber_nm is highly overall correlated with mber_cn_nm
High correlation: mber_cn_nm is highly overall correlated with mber_no and 6 other fields
High correlation: mber_group_nm is highly overall correlated with mber_no and 3 other fields
High correlation: adres is highly overall correlated with mber_no and 2 other fields
High correlation: mber_no is highly overall correlated with rgsde and 3 other fields
High correlation: brthdy is highly overall correlated with mber_cn_nm
High correlation: rgsde is highly overall correlated with mber_no and 2 other fields
Imbalance: mber_cn_nm is highly imbalanced (91.9%)
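
The 91.9% imbalance figure is consistent with an entropy-based score: one minus the Shannon entropy of the class frequencies divided by the maximum entropy for that number of classes. The sketch below reconstructs it from the reported counts (99 and 1). The function name and the single-class fallback are assumptions, not a copy of the profiling library's code.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy in bits
    of the class frequencies and k the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is scored as 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# mber_cn_nm has two categories: 99 rows of one value and 1 row of <NA>.
score = imbalance_score([99, 1])
print(f"{score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single class dominates.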

Reproduction

Analysis started: 2023-12-10 06:33:52.286557
Analysis finished: 2023-12-10 06:33:56.045249
Duration: 3.76 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

mber_no
Real number (ℝ)

HIGH CORRELATION 

Distinct: 98
Distinct (%): 99.0%
Missing: 1
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2858.697
Minimum: 240
Maximum: 12185
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 240
5-th percentile: 253.6
Q1: 275.5
Median: 367
Q3: 5662
95-th percentile: 9969.4
Maximum: 12185
Range: 11945
Interquartile range (IQR): 5386.5
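
The range and IQR are simple differences of the quantiles listed above. A short sketch with the reported values (the variable names are illustrative):

```python
# Quantiles of mber_no as reported.
minimum, q1, median, q3, maximum = 240, 275.5, 367, 5662, 12185

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range

# The median sits much closer to Q1 than to Q3, which matches the
# positive skewness reported for this variable.
lower_gap = median - q1
upper_gap = q3 - median

print(value_range, iqr)
print(lower_gap, upper_gap)
```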

Descriptive statistics

Standard deviation: 3626.9888
Coefficient of variation (CV): 1.2687559
Kurtosis: -0.050528938
Mean: 2858.697
Median Absolute Deviation (MAD): 110
Skewness: 1.122078
Sum: 283011
Variance: 13155047
Monotonicity: Not monotonic
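
Several of these figures are derived from one another. The sketch below recomputes the mean, coefficient of variation, and variance from the reported sum and standard deviation, assuming the statistics are taken over the 99 non-missing values (100 observations minus 1 missing).

```python
# Reported values for mber_no.
total = 283011
n_valid = 100 - 1          # assumption: the missing value is excluded
std = 3626.9888

mean = total / n_valid     # Mean = Sum / count
cv = std / mean            # Coefficient of variation = std / mean
variance = std ** 2        # Variance = std squared

print(round(mean, 3), round(cv, 4), round(variance))
```

Small differences in the last digits come from the standard deviation being rounded in the report.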
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
258 | 2 | 2.0%
284 | 1 | 1.0%
282 | 1 | 1.0%
281 | 1 | 1.0%
280 | 1 | 1.0%
279 | 1 | 1.0%
277 | 1 | 1.0%
276 | 1 | 1.0%
275 | 1 | 1.0%
274 | 1 | 1.0%
Other values (88) | 88 | 88.0%

Minimum 10 values

Value | Count | Frequency (%)
240 | 1 | 1.0%
241 | 1 | 1.0%
245 | 1 | 1.0%
247 | 1 | 1.0%
250 | 1 | 1.0%
254 | 1 | 1.0%
255 | 1 | 1.0%
256 | 1 | 1.0%
257 | 1 | 1.0%
258 | 2 | 2.0%

Maximum 10 values

Value | Count | Frequency (%)
12185 | 1 | 1.0%
11863 | 1 | 1.0%
11732 | 1 | 1.0%
11719 | 1 | 1.0%
10180 | 1 | 1.0%
9946 | 1 | 1.0%
9915 | 1 | 1.0%
9897 | 1 | 1.0%
9621 | 1 | 1.0%
9333 | 1 | 1.0%

mber_nm
Categorical

HIGH CORRELATION 

Distinct: 33
Distinct (%): 33.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
이** | 17
김** | 13
박** | 10
조** | 5
강** | 5
Other values (28) | 50

Length

Max length: 4
Median length: 3
Mean length: 3.01
Min length: 3

Unique

Unique: 15
Unique (%): 15.0%

Sample

1st row: 신**
2nd row: 박**
3rd row: 윤**
4th row: 박**
5th row: 한**

Common Values

Value | Count | Frequency (%)
이** | 17 | 17.0%
김** | 13 | 13.0%
박** | 10 | 10.0%
조** | 5 | 5.0%
강** | 5 | 5.0%
신** | 4 | 4.0%
정** | 4 | 4.0%
최** | 4 | 4.0%
윤** | 4 | 4.0%
양** | 3 | 3.0%
Other values (23) | 31 | 31.0%

Length

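Histogram of lengths of the category

Words

Value | Count | Frequency (%)
(not preserved) | 17 | 17.0%
(not preserved) | 13 | 13.0%
(not preserved) | 10 | 10.0%
(not preserved) | 5 | 5.0%
(not preserved) | 5 | 5.0%
(not preserved) | 4 | 4.0%
(not preserved) | 4 | 4.0%
(not preserved) | 4 | 4.0%
(not preserved) | 4 | 4.0%
(not preserved) | 3 | 3.0%
Other values (23) | 31 | 31.0%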

telno
Categorical

HIGH CORRELATION 

Distinct: 23
Distinct (%): 23.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
0105******** | 16
0103******** | 13
0116******** | 13
0108******** | 8
0104******** | 7
Other values (18) | 43

Length

Max length: 12
Median length: 12
Mean length: 11.92
Min length: 4

Unique

Unique: 8
Unique (%): 8.0%

Sample

1st row: 0100********
2nd row: 0115********
3rd row: 0105********
4th row: 0115********
5th row: 0112********

Common Values

Value | Count | Frequency (%)
0105******** | 16 | 16.0%
0103******** | 13 | 13.0%
0116******** | 13 | 13.0%
0108******** | 8 | 8.0%
0104******** | 7 | 7.0%
0109******** | 6 | 6.0%
0107******** | 5 | 5.0%
0102******** | 4 | 4.0%
0106******** | 4 | 4.0%
0112******** | 3 | 3.0%
Other values (13) | 21 | 21.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
0105 | 16 | 16.0%
0103 | 13 | 13.0%
0116 | 13 | 13.0%
0108 | 8 | 8.0%
0104 | 7 | 7.0%
0109 | 6 | 6.0%
0107 | 5 | 5.0%
0102 | 4 | 4.0%
0106 | 4 | 4.0%
0113 | 3 | 3.0%
Other values (13) | 21 | 21.0%

brthdy
Real number (ℝ)

HIGH CORRELATION 

Distinct: 98
Distinct (%): 99.0%
Missing: 1
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2056072.2
Minimum: 441125
Maximum: 8205131
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 441125
5-th percentile: 537434.1
Q1: 601025
Median: 680109
Q3: 870854.5
95-th percentile: 6941319.1
Maximum: 8205131
Range: 7764006
Interquartile range (IQR): 269829.5

Descriptive statistics

Standard deviation: 2540197.5
Coefficient of variation (CV): 1.2354613
Kurtosis: -0.18715062
Mean: 2056072.2
Median Absolute Deviation (MAD): 89580
Skewness: 1.298957
Sum: 2.0355114 × 10^8
Variance: 6.4526036 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
710414 | 2 | 2.0%
6511241 | 1 | 1.0%
640720 | 1 | 1.0%
651102 | 1 | 1.0%
670221 | 1 | 1.0%
650101 | 1 | 1.0%
600127 | 1 | 1.0%
670505 | 1 | 1.0%
6912211 | 1 | 1.0%
6903051 | 1 | 1.0%
Other values (88) | 88 | 88.0%

Minimum 10 values

Value | Count | Frequency (%)
441125 | 1 | 1.0%
470701 | 1 | 1.0%
480312 | 1 | 1.0%
510310 | 1 | 1.0%
510525 | 1 | 1.0%
540424 | 1 | 1.0%
540718 | 1 | 1.0%
551016 | 1 | 1.0%
560118 | 1 | 1.0%
560506 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
8205131 | 1 | 1.0%
7512071 | 1 | 1.0%
7506241 | 1 | 1.0%
7405061 | 1 | 1.0%
7203292 | 1 | 1.0%
6912211 | 1 | 1.0%
6903051 | 1 | 1.0%
6902151 | 1 | 1.0%
6811102 | 1 | 1.0%
6612301 | 1 | 1.0%
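
The brthdy values listed for this variable mix six-digit and seven-digit numbers. One reading, which is an assumption and not stated in the report, is that six-digit values are YYMMDD dates and seven-digit values carry one extra trailing digit. If so, treating the column as a single real number makes the mean, range, and skewness above hard to interpret. The hypothetical helper below sketches how the two shapes could be separated before profiling.

```python
def split_brthdy(value):
    """Split a raw brthdy value into (yymmdd, extra_digit).

    Assumption: 6 digits means YYMMDD only; 7 digits means YYMMDD
    followed by one extra digit. Other lengths are rejected.
    """
    text = str(int(value))
    if len(text) == 6:
        return text, None
    if len(text) == 7:
        return text[:6], text[6]
    raise ValueError(f"unexpected brthdy length: {text!r}")

# Values taken from the tables for this variable.
for raw in (710414, 8205131, 441125):
    print(raw, split_brthdy(raw))
```

Values with a leading zero would lose it when stored as numbers, so a real cleanup would need to read this column as text.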

rgsde
Real number (ℝ)

HIGH CORRELATION 

Distinct: 36
Distinct (%): 36.4%
Missing: 1
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20180890
Minimum: 20171027
Maximum: 20200608
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 20171027
5-th percentile: 20171027
Q1: 20171027
Median: 20171215
Q3: 20190712
95-th percentile: 20200316
Maximum: 20200608
Range: 29581
Interquartile range (IQR): 19684.5
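
The rgsde values have the shape of dates written as YYYYMMDD integers (an assumption based on the values themselves). Under that reading, the numeric range of 29581 is a difference between integers, not a number of days. The sketch below parses the minimum and maximum with the standard library and compares the two; `parse_yyyymmdd` is an illustrative helper.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Interpret an integer such as 20171027 as a calendar date."""
    return datetime.strptime(str(value), "%Y%m%d").date()

first = parse_yyyymmdd(20171027)   # reported minimum
last = parse_yyyymmdd(20200608)    # reported maximum

numeric_range = 20200608 - 20171027   # 29581, as in the report
span_days = (last - first).days       # 955 calendar days

print(numeric_range, span_days)
```

Converting the column to a date type before profiling would make the quantiles and range directly interpretable.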

Descriptive statistics

Standard deviation: 10991.279
Coefficient of variation (CV): 0.00054463795
Kurtosis: -1.3622087
Mean: 20180890
Median Absolute Deviation (MAD): 188
Skewness: 0.47833243
Sum: 1.9979081 × 10^9
Variance: 1.2080821 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=36)
Common values

Value | Count | Frequency (%)
20171027 | 49 | 49.0%
20180213 | 5 | 5.0%
20190621 | 3 | 3.0%
20190715 | 3 | 3.0%
20180109 | 3 | 3.0%
20190712 | 2 | 2.0%
20190521 | 2 | 2.0%
20190409 | 2 | 2.0%
20200313 | 2 | 2.0%
20190830 | 2 | 2.0%
Other values (26) | 26 | 26.0%

Minimum 10 values

Value | Count | Frequency (%)
20171027 | 49 | 49.0%
20171215 | 1 | 1.0%
20180109 | 3 | 3.0%
20180213 | 5 | 5.0%
20180503 | 1 | 1.0%
20190409 | 2 | 2.0%
20190521 | 2 | 2.0%
20190524 | 1 | 1.0%
20190607 | 1 | 1.0%
20190610 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
20200608 | 1 | 1.0%
20200526 | 1 | 1.0%
20200522 | 1 | 1.0%
20200521 | 1 | 1.0%
20200325 | 1 | 1.0%
20200315 | 1 | 1.0%
20200313 | 2 | 2.0%
20200302 | 1 | 1.0%
20200217 | 1 | 1.0%
20200210 | 1 | 1.0%

mber_group_nm
Categorical

HIGH CORRELATION 

Distinct: 47
Distinct (%): 47.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
경기도 동두천시 | 29
경기도 여주시 | 13
경상남도 양산시 | 5
충청남도 당진시 | 2
전라북도 전주시 | 2
Other values (42) | 49

Length

Max length: 12
Median length: 11
Mean length: 8.17
Min length: 2

Unique

Unique: 35
Unique (%): 35.0%

Sample

1st row: 서울특별시 은평구
2nd row: 경상남도 양산시
3rd row: 서울특별시 서초구
4th row: 경상북도 경주시
5th row: 경기도 김포시

Common Values

Value | Count | Frequency (%)
경기도 동두천시 | 29 | 29.0%
경기도 여주시 | 13 | 13.0%
경상남도 양산시 | 5 | 5.0%
충청남도 당진시 | 2 | 2.0%
전라북도 전주시 | 2 | 2.0%
전라남도 함평군 | 2 | 2.0%
경기도 성남시 중원구 | 2 | 2.0%
경상남도 합천군 | 2 | 2.0%
경기도 안양시 | 2 | 2.0%
서울특별시 구로구 | 2 | 2.0%
Other values (37) | 39 | 39.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
경기도 | 54 | 26.9%
동두천시 | 29 | 14.4%
여주시 | 13 | 6.5%
경상남도 | 10 | 5.0%
전라남도 | 10 | 5.0%
서울특별시 | 6 | 3.0%
양산시 | 5 | 2.5%
전라북도 | 5 | 2.5%
안양시 | 2 | 1.0%
강원도 | 2 | 1.0%
Other values (53) | 65 | 32.3%

adres
Categorical

HIGH CORRELATION 

Distinct: 40
Distinct (%): 40.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
X | 61
전라남도 해남군 삼삼녀 | 1
전라남도 여수시 국동 | 1
충청북도 영동군 | 1
서울시 구로구 개봉동 132-45 | 1
Other values (35) | 35

Length

Max length: 37
Median length: 1
Mean length: 7.8
Min length: 1

Unique

Unique: 39
Unique (%): 39.0%

Sample

1st row: X
2nd row: X
3rd row: X
4th row: X
5th row: X

Common Values

Value | Count | Frequency (%)
X | 61 | 61.0%
전라남도 해남군 삼삼녀 | 1 | 1.0%
전라남도 여수시 국동 | 1 | 1.0%
충청북도 영동군 | 1 | 1.0%
서울시 구로구 개봉동 132-45 | 1 | 1.0%
경기 안양시 동안구 귀인로190번길 57 (평촌동) | 1 | 1.0%
인천 미추홀구 용현동 565-11 | 1 | 1.0%
전라북도 군산시 | 1 | 1.0%
경기도 남양주시 오남읍 양지리 | 1 | 1.0%
전라남도 함평군 함평읍 중앙길 106 | 1 | 1.0%
Other values (30) | 30 | 30.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
x | 61 | 26.0%
경남 | 5 | 2.1%
전남 | 5 | 2.1%
전라남도 | 4 | 1.7%
경기 | 4 | 1.7%
동안구 | 2 | 0.9%
강원도 | 2 | 0.9%
합천군 | 2 | 0.9%
전북 | 2 | 0.9%
함평읍 | 2 | 0.9%
Other values (138) | 146 | 62.1%

mber_cn_nm
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value | Count
개인 | 99
<NA> | 1

Length

Max length: 4
Median length: 2
Mean length: 2.02
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 개인
2nd row: 개인
3rd row: 개인
4th row: 개인
5th row: 개인

Common Values

Value | Count | Frequency (%)
개인 | 99 | 99.0%
<NA> | 1 | 1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of common values; image not preserved]

Words

Value | Count | Frequency (%)
개인 | 99 | 99.0%
na | 1 | 1.0%

Interactions

[Nine interaction plots; images not preserved]

Correlations

Variable | mber_no | mber_nm | telno | brthdy | rgsde | mber_group_nm | adres
mber_no | 1.000 | 0.556 | 0.738 | 0.116 | 0.875 | 0.978 | 0.988
mber_nm | 0.556 | 1.000 | 0.654 | 0.000 | 0.817 | 0.000 | 0.000
telno | 0.738 | 0.654 | 1.000 | 0.073 | 0.283 | 0.948 | 0.907
brthdy | 0.116 | 0.000 | 0.073 | 1.000 | 0.000 | 0.895 | 0.790
rgsde | 0.875 | 0.817 | 0.283 | 0.000 | 1.000 | 0.927 | 0.000
mber_group_nm | 0.978 | 0.000 | 0.948 | 0.895 | 0.927 | 1.000 | 0.989
adres | 0.988 | 0.000 | 0.907 | 0.790 | 0.000 | 0.989 | 1.000

Variable | telno | mber_nm | mber_cn_nm | mber_group_nm | adres
telno | 1.000 | 0.192 | 1.000 | 0.468 | 0.407
mber_nm | 0.192 | 1.000 | 1.000 | 0.000 | 0.000
mber_cn_nm | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
mber_group_nm | 0.468 | 0.000 | 1.000 | 1.000 | 0.676
adres | 0.407 | 0.000 | 1.000 | 0.676 | 1.000

Variable | mber_no | brthdy | rgsde | mber_nm | telno | mber_group_nm | adres | mber_cn_nm
mber_no | 1.000 | -0.025 | 0.937 | 0.192 | 0.355 | 0.636 | 0.736 | 1.000
brthdy | -0.025 | 1.000 | 0.062 | 0.000 | 0.009 | 0.489 | 0.388 | 1.000
rgsde | 0.937 | 0.062 | 1.000 | 0.000 | 0.415 | 0.589 | 0.484 | 1.000
mber_nm | 0.192 | 0.000 | 0.000 | 1.000 | 0.192 | 0.000 | 0.000 | 1.000
telno | 0.355 | 0.009 | 0.415 | 0.192 | 1.000 | 0.468 | 0.407 | 1.000
mber_group_nm | 0.636 | 0.489 | 0.589 | 0.000 | 0.468 | 1.000 | 0.676 | 1.000
adres | 0.736 | 0.388 | 0.484 | 0.000 | 0.407 | 0.676 | 1.000 | 1.000
mber_cn_nm | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
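
The high-correlation alerts in this report can be cross-checked against the last (eight-variable) matrix. The sketch below lists, for each variable, the other variables whose absolute coefficient reaches a cutoff. The cutoff of 0.5 is an assumption chosen because it reproduces the alert counts; the report itself does not state it, and `high_correlations` is an illustrative helper.

```python
names = ["mber_no", "brthdy", "rgsde", "mber_nm",
         "telno", "mber_group_nm", "adres", "mber_cn_nm"]

# Rows copied from the eight-variable correlation matrix.
matrix = [
    [1.000, -0.025, 0.937, 0.192, 0.355, 0.636, 0.736, 1.000],
    [-0.025, 1.000, 0.062, 0.000, 0.009, 0.489, 0.388, 1.000],
    [0.937, 0.062, 1.000, 0.000, 0.415, 0.589, 0.484, 1.000],
    [0.192, 0.000, 0.000, 1.000, 0.192, 0.000, 0.000, 1.000],
    [0.355, 0.009, 0.415, 0.192, 1.000, 0.468, 0.407, 1.000],
    [0.636, 0.489, 0.589, 0.000, 0.468, 1.000, 0.676, 1.000],
    [0.736, 0.388, 0.484, 0.000, 0.407, 0.676, 1.000, 1.000],
    [1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000],
]

THRESHOLD = 0.5  # assumption: not stated in the report

def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables with |r| >= threshold."""
    result = {}
    for i, name in enumerate(names):
        partners = [names[j] for j, value in enumerate(matrix[i])
                    if j != i and abs(value) >= threshold]
        if partners:
            result[name] = partners
    return result

high = high_correlations(names, matrix, THRESHOLD)
for name, partners in high.items():
    print(name, "->", ", ".join(partners))
```

With this cutoff, telno, mber_nm, and brthdy are each flagged only against mber_cn_nm, which is the single variable every alert has in common.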

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

Index | mber_no | mber_nm | telno | brthdy | rgsde | mber_group_nm | adres | mber_cn_nm
0 | 258 | 신** | 0100******** | 710414 | 20171027 | 서울특별시 은평구 | X | 개인
1 | 407 | 박** | 0115******** | 5910022 | 20180213 | 경상남도 양산시 | X | 개인
2 | 816 | 윤** | 0105******** | 671012 | 20180503 | 서울특별시 서초구 | X | 개인
3 | 1407 | 박** | 0115******** | 7405061 | 20190409 | 경상북도 경주시 | X | 개인
4 | 1536 | 한** | 0112******** | 610918 | 20190409 | 경기도 김포시 | X | 개인
5 | 3689 | 한** | 0116******** | 540424 | 20190607 | 전라남도 해남군 | 전라남도 해남군 삼삼녀 | 개인
6 | 3836 | 이** | 0114******** | 5706101 | 20190610 | 충청북도 영동군 | 충청북도 영동군 | 개인
7 | 2365 | 조** | 0105******** | 510525 | 20190521 | 서울특별시 구로구 | 서울시 구로구 개봉동 132-45 | 개인
8 | 2476 | 강** | 0113******** | 6508072 | 20190521 | 경기도 안양시 | 경기 안양시 동안구 귀인로190번길 57 (평촌동) | 개인
9 | 4566 | 임** | 0112******** | 6308062 | 20190617 | 인천광역시 미추홀구 | 인천 미추홀구 용현동 565-11 | 개인

Last rows

Index | mber_no | mber_nm | telno | brthdy | rgsde | mber_group_nm | adres | mber_cn_nm
90 | 304 | 정** | 0102******** | 700912 | 20171027 | 경기도 여주시 | X | 개인
91 | 367 | 송** | 0107******** | 6612301 | 20171215 | 경기도 양주시 | X | 개인
92 | 369 | 김** | 0103******** | 700316 | 20180109 | 경기도 여주시 | X | 개인
93 | 370 | 강** | 0105******** | 890506 | 20180109 | 경기도 여주시 | X | 개인
94 | 371 | 석** | 0107******** | 8205131 | 20180109 | 경기도 여주시 | X | 개인
95 | 395 | 이** | 0107******** | 630108 | 20180213 | 경상남도 양산시 | 경남 양산시 물금읍 새실로 11 (양산 대방노블랜드 7차 메가시티) | 개인
96 | 396 | 이** | 0104******** | 7506241 | 20180213 | 경상남도 양산시 | X | 개인
97 | 397 | 김** | 0103******** | 630707 | 20180213 | 경상남도 양산시 | 경남 양산시 덕계로 35 (덕계동) | 개인
98 | 398 | 노** | 0105******** | 7203292 | 20180213 | 경상남도 양산시 | X | 개인
99 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

Index | mber_no | mber_nm | telno | brthdy | rgsde | mber_group_nm | adres | mber_cn_nm | # duplicates
0 | 258 | 신** | 0100******** | 710414 | 20171027 | 서울특별시 은평구 | X | 개인 | 2