Overview

Dataset statistics

Number of variables: 14
Number of observations: 200
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 15
Duplicate rows (%): 7.5%
Total size in memory: 23.0 KiB
Average record size in memory: 117.7 B

Variable types

Categorical: 12
Numeric: 2

Alerts

YEAR has constant value "2020" [Constant]
MONTH has constant value "2" [Constant]
XLAT has constant value "**.**********" [Constant]
YLON has constant value "***.**********" [Constant]
DATE has constant value "4" [Constant]
DAYS has constant value "TUE" [Constant]
TEL_NO has constant value "**********" [Constant]
Dataset has 15 (7.5%) duplicate rows [Duplicates]
BUSI_NM is highly overall correlated with TIMES and 5 other fields [High correlation]
ADDR2 is highly overall correlated with BUSI_CD and 4 other fields [High correlation]
ADDR3 is highly overall correlated with BUSI_CD and 4 other fields [High correlation]
ADDR1 is highly overall correlated with ADDR3 and 3 other fields [High correlation]
ADDR4 is highly overall correlated with TIMES and 5 other fields [High correlation]
TIMES is highly overall correlated with ADDR4 and 1 other field [High correlation]
BUSI_CD is highly overall correlated with ADDR3 and 3 other fields [High correlation]
ADDR1 is highly imbalanced (81.2%) [Imbalance]
ADDR2 is highly imbalanced (83.8%) [Imbalance]
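
The two imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below is an assumption about how such a score can be computed, not a quotation of the profiler's source; the function name `imbalance_score` is hypothetical, and the counts are copied from the ADDR1 and ADDR2 frequency tables in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Assumption: a single-class column is treated as not imbalanced.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Category counts taken from the ADDR1 and ADDR2 sections of the report.
addr1_counts = [189, 7, 3, 1]
addr2_counts = [189, 5, 2, 2, 1, 1]

addr1_pct = round(imbalance_score(addr1_counts) * 100, 1)
addr2_pct = round(imbalance_score(addr2_counts) * 100, 1)
print(addr1_pct, addr2_pct)
```

With these counts the sketch gives values that round to the 81.2% and 83.8% shown in the alerts.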

Reproduction

Analysis started: 2023-12-10 06:38:56.307585
Analysis finished: 2023-12-10 06:38:58.710108
Duration: 2.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

YEAR
Categorical

CONSTANT 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
2020
200 

Length

Max length4
Median length4
Mean length4
Min length4

Unique

Unique0
Unique (%)0.0%

Sample

1st row2020
2nd row2020
3rd row2020
4th row2020
5th row2020

Common Values

ValueCountFrequency (%)
2020 200
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2020 200
100.0%

MONTH
Categorical

CONSTANT 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
2
200 

Length

Max length1
Median length1
Mean length1
Min length1

Unique

Unique0
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

ValueCountFrequency (%)
2 200
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2 200
100.0%

ADDR3
Categorical

HIGH CORRELATION 

Distinct19
Distinct (%)9.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
광진구
44 
마포구
43 
도봉구
27 
성동구
17 
송파구
13 
Other values (14)
56 

Length

Max length4
Median length3
Mean length2.975
Min length2

Unique

Unique1
Unique (%)0.5%

Sample

1st row영통구
2nd row영통구
3rd row백곡면
4th row의창구
5th row의창구

Common Values

ValueCountFrequency (%)
광진구 44
22.0%
마포구 43
21.5%
도봉구 27
13.5%
성동구 17
 
8.5%
송파구 13
 
6.5%
중구 12
 
6.0%
강서구 7
 
3.5%
의창구 5
 
2.5%
양천구 4
 
2.0%
종로구 4
 
2.0%
Other values (9) 24
12.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
광진구 44
22.0%
마포구 43
21.5%
도봉구 27
13.5%
성동구 17
 
8.5%
송파구 13
 
6.5%
중구 12
 
6.0%
강서구 7
 
3.5%
의창구 5
 
2.5%
구로구 4
 
2.0%
동대문구 4
 
2.0%
Other values (9) 24
12.0%

ADDR4
Categorical

HIGH CORRELATION 

Distinct37
Distinct (%)18.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
광장동
43 
서교동
43 
방학1동
24 
하왕십리동
13 
문정동
9 
Other values (32)
68 

Length

Max length5
Median length3
Mean length3.36
Min length2

Unique

Unique16
Unique (%)8.0%

Sample

1st row영통동
2nd row영통동
3rd row양백리
4th row팔용동
5th row팔용동

Common Values

ValueCountFrequency (%)
광장동 43
21.5%
서교동 43
21.5%
방학1동 24
12.0%
하왕십리동 13
 
6.5%
문정동 9
 
4.5%
팔용동 5
 
2.5%
등촌3동 5
 
2.5%
신당동 5
 
2.5%
개봉동 4
 
2.0%
마장동 4
 
2.0%
Other values (27) 45
22.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
광장동 43
21.5%
서교동 43
21.5%
방학1동 24
12.0%
하왕십리동 13
 
6.5%
문정동 9
 
4.5%
팔용동 5
 
2.5%
등촌3동 5
 
2.5%
신당동 5
 
2.5%
마장동 4
 
2.0%
개봉동 4
 
2.0%
Other values (27) 45
22.5%

XLAT
Categorical

CONSTANT 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
**.**********
200 

Length

Max length13
Median length13
Mean length13
Min length13

Unique

Unique0
Unique (%)0.0%

Sample

1st row**.**********
2nd row**.**********
3rd row**.**********
4th row**.**********
5th row**.**********

Common Values

ValueCountFrequency (%)
**.********** 200
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
200
100.0%

YLON
Categorical

CONSTANT 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
***.**********
200 

Length

Max length14
Median length14
Mean length14
Min length14

Unique

Unique0
Unique (%)0.0%

Sample

1st row***.**********
2nd row***.**********
3rd row***.**********
4th row***.**********
5th row***.**********

Common Values

ValueCountFrequency (%)
***.********** 200
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
200
100.0%

DATE
Categorical

CONSTANT 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
4
200 

Length

Max length1
Median length1
Mean length1
Min length1

Unique

Unique0
Unique (%)0.0%

Sample

1st row4
2nd row4
3rd row4
4th row4
5th row4

Common Values

ValueCountFrequency (%)
4 200
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
4 200
100.0%

TIMES
Real number (ℝ)

HIGH CORRELATION 

Distinct: 183
Distinct (%): 91.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 118580.35
Minimum: 2244
Maximum: 195239
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 2244
5-th percentile: 90320
Q1: 100127.5
median: 111917
Q3: 133962
95-th percentile: 172660.4
Maximum: 195239
Range: 192995
Interquartile range (IQR): 33834.5

Descriptive statistics

Standard deviation: 28111.705
Coefficient of variation (CV): 0.23706883
Kurtosis: 1.216507
Mean: 118580.35
Median Absolute Deviation (MAD): 11871
Skewness: 0.54665803
Sum: 23716070
Variance: 7.9026797 × 10^8
Monotonicity: Not monotonic
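
Several of the TIMES statistics follow from one another by simple arithmetic. The sketch below recomputes the derived values from the reported ones as a consistency check; the variable names are illustrative, and the inputs are the rounded figures printed in this report, so agreement is only expected to within rounding.

```python
# Reported values for TIMES, copied from this section.
n = 200
total = 23716070
minimum, maximum = 2244, 195239
q1, q3 = 100127.5, 133962
std = 28111.705

mean = total / n              # Sum divided by number of observations
value_range = maximum - minimum
iqr = q3 - q1                 # Interquartile range
cv = std / mean               # Coefficient of variation
variance = std ** 2           # Variance is the squared standard deviation

print(mean, value_range, iqr, round(cv, 8), f"{variance:.7e}")
```

The recomputed mean, range and IQR match the report exactly, and the CV and variance agree with the reported 0.23706883 and 7.9026797 × 10^8 to within the rounding of the standard deviation.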
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
100145 3
 
1.5%
90320 3
 
1.5%
100157 2
 
1.0%
163913 2
 
1.0%
111911 2
 
1.0%
112116 2
 
1.0%
112115 2
 
1.0%
112009 2
 
1.0%
111917 2
 
1.0%
112018 2
 
1.0%
Other values (173) 178
89.0%
ValueCountFrequency (%)
2244 1
 
0.5%
70414 1
 
0.5%
75934 1
 
0.5%
80003 1
 
0.5%
80033 1
 
0.5%
82304 1
 
0.5%
84735 1
 
0.5%
90319 2
1.0%
90320 3
1.5%
90322 1
 
0.5%
ValueCountFrequency (%)
195239 1
0.5%
193030 1
0.5%
193029 1
0.5%
193028 1
0.5%
190026 1
0.5%
175812 1
0.5%
175032 1
0.5%
174821 1
0.5%
174707 1
0.5%
173637 1
0.5%

DAYS
Categorical

CONSTANT 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
TUE
200 

Length

Max length3
Median length3
Mean length3
Min length3

Unique

Unique0
Unique (%)0.0%

Sample

1st rowTUE
2nd rowTUE
3rd rowTUE
4th rowTUE
5th rowTUE

Common Values

ValueCountFrequency (%)
TUE 200
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
tue 200
100.0%

TEL_NO
Categorical

CONSTANT 

Distinct1
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
**********
200 

Length

Max length10
Median length10
Mean length10
Min length10

Unique

Unique0
Unique (%)0.0%

Sample

1st row**********
2nd row**********
3rd row**********
4th row**********
5th row**********

Common Values

ValueCountFrequency (%)
********** 200
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
200
100.0%

BUSI_NM
Categorical

HIGH CORRELATION 

Distinct36
Distinct (%)18.0%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
도서관
43 
교회
43 
은행
24 
내과
13 
주민센터
9 
Other values (31)
68 

Length

Max length9
Median length8
Mean length3.185
Min length2

Unique

Unique11
Unique (%)5.5%

Sample

1st row중고가전제품
2nd row중고가전제품
3rd row견인운송
4th row화물운송
5th row화물운송

Common Values

ValueCountFrequency (%)
도서관 43
21.5%
교회 43
21.5%
은행 24
12.0%
내과 13
 
6.5%
주민센터 9
 
4.5%
화물운송 6
 
3.0%
관리사무소 5
 
2.5%
초등학교 4
 
2.0%
의류판매(종합) 4
 
2.0%
인터넷쇼핑 3
 
1.5%
Other values (26) 46
23.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
도서관 43
21.5%
교회 43
21.5%
은행 24
12.0%
내과 13
 
6.5%
주민센터 9
 
4.5%
화물운송 6
 
3.0%
관리사무소 5
 
2.5%
초등학교 4
 
2.0%
의류판매(종합 4
 
2.0%
비뇨기과 3
 
1.5%
Other values (26) 46
23.0%

BUSI_CD
Real number (ℝ)

HIGH CORRELATION 

Distinct: 37
Distinct (%): 18.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 813380.41
Minimum: 14300
Maximum: 999999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 14300
5-th percentile: 561885.2
Q1: 761002
median: 872001
Q3: 874106
95-th percentile: 962209
Maximum: 999999
Range: 985699
Interquartile range (IQR): 113104

Descriptive statistics

Standard deviation: 147986.87
Coefficient of variation (CV): 0.18194054
Kurtosis: 11.061202
Mean: 813380.41
Median Absolute Deviation (MAD): 10783
Skewness: -2.9021211
Sum: 1.6267608 × 10^8
Variance: 2.1900115 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)
ValueCountFrequency (%)
872001 43
21.5%
874106 43
21.5%
761002 24
12.0%
861201 13
 
6.5%
962209 9
 
4.5%
654301 5
 
2.5%
803006 5
 
2.5%
561900 4
 
2.0%
871200 4
 
2.0%
881100 3
 
1.5%
Other values (27) 47
23.5%
ValueCountFrequency (%)
14300 2
1.0%
112902 1
 
0.5%
191019 1
 
0.5%
461204 1
 
0.5%
479603 2
1.0%
540304 2
1.0%
561604 1
 
0.5%
561900 4
2.0%
592000 2
1.0%
602003 2
1.0%
ValueCountFrequency (%)
999999 1
 
0.5%
966103 1
 
0.5%
963312 2
 
1.0%
962209 9
 
4.5%
962205 2
 
1.0%
962126 2
 
1.0%
902901 3
 
1.5%
901218 3
 
1.5%
881100 3
 
1.5%
874106 43
21.5%

ADDR1
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct4
Distinct (%)2.0%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
서울
189 
경남
 
7
경기
 
3
충북
 
1

Length

Max length2
Median length2
Mean length2
Min length2

Unique

Unique1
Unique (%)0.5%
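
"Distinct" and "Unique" measure different things: the ADDR1 figures are consistent with Distinct counting how many different values occur and Unique counting how many values occur exactly once. The sketch below illustrates that reading with the ADDR1 counts from this report; the variable names are illustrative, and the interpretation itself is an assumption drawn from the numbers, not from the profiler's documentation.

```python
from collections import Counter

# ADDR1 values rebuilt from the frequency table in this section.
values = ["서울"] * 189 + ["경남"] * 7 + ["경기"] * 3 + ["충북"] * 1

freq = Counter(values)
n = len(values)

distinct = len(freq)                                 # different values present
unique = sum(1 for c in freq.values() if c == 1)     # values seen exactly once

distinct_pct = round(100 * distinct / n, 1)
unique_pct = round(100 * unique / n, 1)
print(distinct, distinct_pct, unique, unique_pct)
```

Under this reading a constant column has Distinct 1 and Unique 0, because its single value occurs 200 times.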

Sample

1st row경기
2nd row경기
3rd row충북
4th row경남
5th row경남

Common Values

ValueCountFrequency (%)
서울 189
94.5%
경남 7
 
3.5%
경기 3
 
1.5%
충북 1
 
0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
서울 189
94.5%
경남 7
 
3.5%
경기 3
 
1.5%
충북 1
 
0.5%

ADDR2
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct6
Distinct (%)3.0%
Missing0
Missing (%)0.0%
Memory size1.7 KiB
<NA>
189 
창원시
 
5
수원시
 
2
양산시
 
2
진천군
 
1

Length

Max length4
Median length4
Mean length3.945
Min length3
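
The length statistics describe the character lengths of the stored strings, so the literal `<NA>` entries count as four characters. The sketch below recomputes them from the ADDR2 frequency table in this report; it assumes lengths are counted in Unicode code points (each precomposed Hangul syllable is one), and the variable names are illustrative.

```python
import statistics

# ADDR2 value counts, copied from the Common Values table.
counts = {"<NA>": 189, "창원시": 5, "수원시": 2, "양산시": 2, "진천군": 1, "광명시": 1}

# One length entry per row of the dataset.
lengths = [len(value) for value, c in counts.items() for _ in range(c)]

max_len = max(lengths)
min_len = min(lengths)
mean_len = sum(lengths) / len(lengths)
median_len = statistics.median(lengths)
print(max_len, median_len, mean_len, min_len)
```

The 189 four-character `<NA>` strings dominate, which is why the mean of 3.945 sits just below the maximum of 4.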

Unique

Unique2
Unique (%)1.0%

Sample

1st row수원시
2nd row수원시
3rd row진천군
4th row창원시
5th row창원시

Common Values

ValueCountFrequency (%)
<NA> 189
94.5%
창원시 5
 
2.5%
수원시 2
 
1.0%
양산시 2
 
1.0%
진천군 1
 
0.5%
광명시 1
 
0.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
na 189
94.5%
창원시 5
 
2.5%
수원시 2
 
1.0%
양산시 2
 
1.0%
진천군 1
 
0.5%
광명시 1
 
0.5%

Interactions

[Figures: 4 interaction plots]

Correlations

         ADDR3  ADDR4  TIMES  BUSI_NM  BUSI_CD  ADDR1  ADDR2
ADDR3    1.000  1.000  0.805  0.997    0.932    1.000  1.000
ADDR4    1.000  1.000  0.938  0.999    0.991    1.000  1.000
TIMES    0.805  0.938  1.000  0.934    0.777    0.184  0.541
BUSI_NM  0.997  0.999  0.934  1.000    1.000    1.000  1.000
BUSI_CD  0.932  0.991  0.777  1.000    1.000    0.552  1.000
ADDR1    1.000  1.000  0.184  1.000    0.552    1.000  1.000
ADDR2    1.000  1.000  0.541  1.000    1.000    1.000  1.000
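
The "High correlation" alerts can be related to this first matrix by counting, for each field, how many other fields exceed a cutoff. The sketch below uses a cutoff of 0.9, which is an assumption inferred from the values (any cutoff above 0.805 and at most 0.932 gives the same counts), not a documented setting; `correlated_fields` is a hypothetical helper.

```python
fields = ["ADDR3", "ADDR4", "TIMES", "BUSI_NM", "BUSI_CD", "ADDR1", "ADDR2"]

# First correlation matrix of this section, row order as in `fields`.
matrix = [
    [1.000, 1.000, 0.805, 0.997, 0.932, 1.000, 1.000],
    [1.000, 1.000, 0.938, 0.999, 0.991, 1.000, 1.000],
    [0.805, 0.938, 1.000, 0.934, 0.777, 0.184, 0.541],
    [0.997, 0.999, 0.934, 1.000, 1.000, 1.000, 1.000],
    [0.932, 0.991, 0.777, 1.000, 1.000, 0.552, 1.000],
    [1.000, 1.000, 0.184, 1.000, 0.552, 1.000, 1.000],
    [1.000, 1.000, 0.541, 1.000, 1.000, 1.000, 1.000],
]

THRESHOLD = 0.9  # assumed cutoff, inferred from the alert counts

def correlated_fields(name):
    """Return the other fields whose |correlation| with `name` meets the cutoff."""
    i = fields.index(name)
    return [f for j, f in enumerate(fields)
            if j != i and abs(matrix[i][j]) >= THRESHOLD]

alert_counts = {f: len(correlated_fields(f)) for f in fields}
print(alert_counts)
```

Under that assumption the counts agree with the alerts: for example TIMES has two correlated fields (ADDR4 and BUSI_NM), matching "ADDR4 and 1 other field".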
         BUSI_NM  ADDR2  ADDR3  ADDR1  ADDR4
BUSI_NM  1.000    1.000  0.917  0.892  0.951
ADDR2    1.000    1.000  1.000  0.866  1.000
ADDR3    0.917    1.000  1.000  0.963  0.951
ADDR1    0.892    0.866  0.963  1.000  0.912
ADDR4    0.951    1.000  0.951  0.912  1.000
         TIMES  BUSI_CD  ADDR3  ADDR4  BUSI_NM  ADDR1  ADDR2
TIMES    1.000  0.027    0.470  0.633  0.627    0.105  0.270
BUSI_CD  0.027  1.000    0.696  0.825  0.924    0.404  0.866
ADDR3    0.470  0.696    1.000  0.951  0.917    0.963  1.000
ADDR4    0.633  0.825    0.951  1.000  0.951    0.912  1.000
BUSI_NM  0.627  0.924    0.917  0.951  1.000    0.892  1.000
ADDR1    0.105  0.404    0.963  0.912  0.892    1.000  0.866
ADDR2    0.270  0.866    1.000  1.000  1.000    0.866  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     YEAR  MONTH  ADDR3   ADDR4    XLAT           YLON            DATE  TIMES   DAYS  TEL_NO      BUSI_NM       BUSI_CD  ADDR1  ADDR2
0    2020  2      영통구  영통동   **.**********  ***.**********  4     153114  TUE   **********  중고가전제품  603002   경기   수원시
1    2020  2      영통구  영통동   **.**********  ***.**********  4     172211  TUE   **********  중고가전제품  603002   경기   수원시
2    2020  2      백곡면  양백리   **.**********  ***.**********  4     122351  TUE   **********  견인운송      654204   충북   진천군
3    2020  2      의창구  팔용동   **.**********  ***.**********  4     103931  TUE   **********  화물운송      654301   경남   창원시
4    2020  2      의창구  팔용동   **.**********  ***.**********  4     80033   TUE   **********  화물운송      654301   경남   창원시
5    2020  2      의창구  팔용동   **.**********  ***.**********  4     114726  TUE   **********  화물운송      654301   경남   창원시
6    2020  2      의창구  팔용동   **.**********  ***.**********  4     75934   TUE   **********  화물운송      654301   경남   창원시
7    2020  2      의창구  팔용동   **.**********  ***.**********  4     80003   TUE   **********  화물운송      654301   경남   창원시
8    2020  2      <NA>    남부동   **.**********  ***.**********  4     162641  TUE   **********  LPG판매       602003   경남   양산시
9    2020  2      <NA>    남부동   **.**********  ***.**********  4     114810  TUE   **********  LPG판매       602003   경남   양산시

     YEAR  MONTH  ADDR3   ADDR4    XLAT           YLON            DATE  TIMES   DAYS  TEL_NO      BUSI_NM  BUSI_CD  ADDR1  ADDR2
190  2020  2      도봉구  방학1동  **.**********  ***.**********  4     90320   TUE   **********  은행     761002   서울   <NA>
191  2020  2      도봉구  방학1동  **.**********  ***.**********  4     90322   TUE   **********  은행     761002   서울   <NA>
192  2020  2      도봉구  방학1동  **.**********  ***.**********  4     90323   TUE   **********  은행     761002   서울   <NA>
193  2020  2      도봉구  방학1동  **.**********  ***.**********  4     92106   TUE   **********  은행     761002   서울   <NA>
194  2020  2      도봉구  방학1동  **.**********  ***.**********  4     92141   TUE   **********  은행     761002   서울   <NA>
195  2020  2      도봉구  방학1동  **.**********  ***.**********  4     92316   TUE   **********  은행     761002   서울   <NA>
196  2020  2      도봉구  방학1동  **.**********  ***.**********  4     103339  TUE   **********  은행     761002   서울   <NA>
197  2020  2      도봉구  방학1동  **.**********  ***.**********  4     110028  TUE   **********  은행     761002   서울   <NA>
198  2020  2      도봉구  방학1동  **.**********  ***.**********  4     110032  TUE   **********  은행     761002   서울   <NA>
199  2020  2      도봉구  방학1동  **.**********  ***.**********  4     140046  TUE   **********  은행     761002   서울   <NA>

Duplicate rows

Most frequently occurring

   YEAR  MONTH  ADDR3   ADDR4    XLAT           YLON            DATE  TIMES   DAYS  TEL_NO      BUSI_NM  BUSI_CD  ADDR1  ADDR2  # duplicates
1  2020  2      광진구  광장동   **.**********  ***.**********  4     100145  TUE   **********  도서관   874106   서울   <NA>   3
5  2020  2      도봉구  방학1동  **.**********  ***.**********  4     90320   TUE   **********  은행     761002   서울   <NA>   3
0  2020  2      광진구  광장동   **.**********  ***.**********  4     100138  TUE   **********  도서관   874106   서울   <NA>   2
2  2020  2      광진구  광장동   **.**********  ***.**********  4     100157  TUE   **********  도서관   874106   서울   <NA>   2
3  2020  2      광진구  광장동   **.**********  ***.**********  4     163913  TUE   **********  도서관   874106   서울   <NA>   2
4  2020  2      도봉구  방학1동  **.**********  ***.**********  4     90319   TUE   **********  은행     761002   서울   <NA>   2
6  2020  2      마포구  서교동   **.**********  ***.**********  4     111811  TUE   **********  교회     872001   서울   <NA>   2
7  2020  2      마포구  서교동   **.**********  ***.**********  4     111813  TUE   **********  교회     872001   서울   <NA>   2
8  2020  2      마포구  서교동   **.**********  ***.**********  4     111814  TUE   **********  교회     872001   서울   <NA>   2
9  2020  2      마포구  서교동   **.**********  ***.**********  4     111911  TUE   **********  교회     872001   서울   <NA>   2
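
A "# duplicates" column of this kind can be obtained by counting identical rows and keeping those that occur more than once. The sketch below shows the idea on a small illustrative subset; the rows are reduced to three columns (ADDR3, ADDR4, TIMES) for brevity, and the list is an assumption built from values visible in this report, not the full dataset.

```python
from collections import Counter

# Illustrative subset: each tuple is (ADDR3, ADDR4, TIMES).
rows = (
    [("광진구", "광장동", 100145)] * 3
    + [("도봉구", "방학1동", 90320)] * 3
    + [("광진구", "광장동", 100138)] * 2
    + [("도봉구", "방학1동", 90322)]  # occurs once, so not a duplicate
)

row_counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = {row: c for row, c in row_counts.items() if c > 1}
ranked = sorted(duplicates.items(), key=lambda item: -item[1])

for row, c in ranked:
    print(row, c)
```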