Overview

Dataset statistics

Number of variables: 10
Number of observations: 100
Missing cells: 28
Missing cells (%): 2.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.5 KiB
Average record size in memory: 87.3 B
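All 28 missing cells belong to dong_dc (see the Missing alert). That is why the same 28 cells are 28.0% of that column but only 2.8% of the table. Here is a minimal sketch of the arithmetic. It assumes the table-level figure is missing cells divided by rows × columns; `missing_cell_pct` is a hypothetical helper.

```python
def missing_cell_pct(n_missing: int, n_rows: int, n_cols: int) -> float:
    """Missing cells as a percentage of every cell in the table (rows x columns)."""
    return 100.0 * n_missing / (n_rows * n_cols)

# Column-level share for dong_dc vs. table-level share (counts from this report).
column_pct = missing_cell_pct(n_missing=28, n_rows=100, n_cols=1)
table_pct = missing_cell_pct(n_missing=28, n_rows=100, n_cols=10)
print(f"{column_pct:.1f}% {table_pct:.1f}%")  # 28.0% 2.8%
```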

Variable types

Numeric: 4
Categorical: 5
Text: 1

Alerts

gugun_dc is highly correlated overall with sido_dc [High correlation]
sido_dc is highly correlated overall with gugun_dc [High correlation]
join_dt is highly correlated overall with base_month and 2 other fields [High correlation]
base_month is highly correlated overall with join_dt and 1 other field [High correlation]
base_day is highly correlated overall with join_dt and 2 other fields [High correlation]
age_dc is highly correlated overall with sex_dc [High correlation]
base_year is highly correlated overall with join_dt and 2 other fields [High correlation]
sex_dc is highly correlated overall with age_dc [High correlation]
sbr_cnt is highly correlated overall with base_day [High correlation]
base_year is highly imbalanced (80.6%) [Imbalance]
sido_dc is highly imbalanced (71.4%) [Imbalance]
sbr_cnt is highly imbalanced (91.9%) [Imbalance]
dong_dc has 28 (28.0%) missing values [Missing]
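The three imbalance percentages match one minus the normalized Shannon entropy of each variable's value counts. Here the entropy is divided by its maximum, log k, for k categories. The sketch below assumes that definition. It was reconstructed from the reported numbers, not copied from the ydata-profiling source. It uses the counts from each variable's Common Values table.

```python
import math

def imbalance(counts):
    """Return 1 - H / log(k): 0 for a uniform split, approaching 1 as one category dominates."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    n = sum(counts)
    # Shannon entropy (natural log) of the category proportions.
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Category counts from the Common Values tables of each variable.
for name, counts in {"base_year": [97, 3], "sido_dc": [95, 5], "sbr_cnt": [99, 1]}.items():
    print(f"{name}: {imbalance(counts):.1%}")
# base_year: 80.6%
# sido_dc: 71.4%
# sbr_cnt: 91.9%
```

The entropy and log k use the same base, so choosing the natural log does not change the result.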

Reproduction

Analysis started: 2023-12-10 10:14:19.091812
Analysis finished: 2023-12-10 10:14:23.495283
Duration: 4.4 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json
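The reported duration is the difference between the two timestamps, rounded to one decimal place. Here is a quick check using Python's standard datetime module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-10 10:14:19.091812")
finished = datetime.fromisoformat("2023-12-10 10:14:23.495283")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.1f} seconds")  # Duration: 4.4 seconds
```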

Variables

join_dt
Real number (ℝ)

HIGH CORRELATION 

Distinct: 22
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20170593
Minimum: 20161212
Maximum: 20170922
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 20161212
5th percentile: 20170103
Q1: 20170810
Median: 20170922
Q3: 20170922
95th percentile: 20170922
Maximum: 20170922
Range: 9710
Interquartile range (IQR): 111.75
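join_dt appears to encode dates as YYYYMMDD integers; this is an assumption based on the values shown. If so, statistics on the raw integers are not calendar quantities. The Range of 9710 is a difference of encoded numbers, not a number of days. The interpolated Q1 is not a real date either: it is shown as 20170810, but the IQR of 111.75 implies 20170810.25. The sketch below compares the integer range with the actual span in days; `parse_yyyymmdd` is a hypothetical helper.

```python
from datetime import date, datetime

def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20170922 as a calendar date (assumed encoding)."""
    return datetime.strptime(str(value), "%Y%m%d").date()

earliest, latest = 20161212, 20170922  # Minimum and Maximum of join_dt

integer_range = latest - earliest  # what the report lists as Range
day_span = (parse_yyyymmdd(latest) - parse_yyyymmdd(earliest)).days

print(integer_range, day_span)  # 9710 284
```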

Descriptive statistics

Standard deviation: 1372.5011
Coefficient of variation (CV): 6.8044658 × 10⁻⁵
Kurtosis: 43.746342
Mean: 20170593
Median Absolute Deviation (MAD): 0
Skewness: -6.5753231
Sum: 2.0170593 × 10⁹
Variance: 1883759.3
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=22)]

Common values
Value               Count  Frequency (%)
20170922            72     72.0%
20170103            4      4.0%
20170119            2      2.0%
20170126            2      2.0%
20170410            2      2.0%
20170629            2      2.0%
20161212            1      1.0%
20170618            1      1.0%
20170816            1      1.0%
20170815            1      1.0%
Other values (12)   12     12.0%

Minimum 10 values
Value               Count  Frequency (%)
20161212            1      1.0%
20161213            1      1.0%
20170103            4      4.0%
20170119            2      2.0%
20170121            1      1.0%
20170126            2      2.0%
20170204            1      1.0%
20170318            1      1.0%
20170331            1      1.0%
20170410            2      2.0%

Maximum 10 values
Value               Count  Frequency (%)
20170922            72     72.0%
20170816            1      1.0%
20170815            1      1.0%
20170812            1      1.0%
20170805            1      1.0%
20170707            1      1.0%
20170629            2      2.0%
20170622            1      1.0%
20170618            1      1.0%
20170519            1      1.0%

base_year
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016
2nd row: 2017
3rd row: 2016
4th row: 2016
5th row: 2017

Common Values

Value               Count  Frequency (%)
2017                97     97.0%
2016                3      3.0%

Length

[Figure: Histogram of lengths of the category]

base_month
Real number (ℝ)

HIGH CORRELATION 

Distinct: 10
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.76
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 8
Median: 9
Q3: 9
95th percentile: 9
Maximum: 12
Range: 11
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.715909
Coefficient of variation (CV): 0.34998827
Kurtosis: 1.4350038
Mean: 7.76
Median Absolute Deviation (MAD): 0
Skewness: -1.5840938
Sum: 776
Variance: 7.3761616
Monotonicity: Not monotonic
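base_month takes only ten distinct values, so its descriptive statistics can be recomputed from the value counts in its frequency tables. The sketch below uses plain Python. It assumes the report uses the sample variance (divisor n - 1), the convention that reproduces the Variance listed for base_month.

```python
import math

# base_month value counts from the Common values table (value -> count).
counts = {9: 71, 1: 9, 4: 4, 6: 4, 8: 4, 12: 3, 3: 2, 2: 1, 5: 1, 7: 1}

n = sum(counts.values())
total = sum(value * c for value, c in counts.items())
mean = total / n
# Sample variance with an n - 1 divisor; this choice reproduces the reported Variance.
variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total, mean)                          # 100 776 7.76
print(f"{variance:.7f} {std:.6f} {cv:.8f}")    # 7.3761616 2.715909 0.34998827
```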
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value               Count  Frequency (%)
9                   71     71.0%
1                   9      9.0%
4                   4      4.0%
6                   4      4.0%
8                   4      4.0%
12                  3      3.0%
3                   2      2.0%
2                   1      1.0%
5                   1      1.0%
7                   1      1.0%

Minimum 10 values
Value               Count  Frequency (%)
1                   9      9.0%
2                   1      1.0%
3                   2      2.0%
4                   4      4.0%
5                   1      1.0%
6                   4      4.0%
7                   1      1.0%
8                   4      4.0%
9                   71     71.0%
12                  3      3.0%

Maximum 10 values
Value               Count  Frequency (%)
12                  3      3.0%
9                   71     71.0%
8                   4      4.0%
7                   1      1.0%
6                   4      4.0%
5                   1      1.0%
4                   4      4.0%
3                   2      2.0%
2                   1      1.0%
1                   9      9.0%

base_day
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.92
Minimum: 3
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 3
5th percentile: 4.95
Q1: 22
Median: 22
Q3: 22
95th percentile: 22.2
Maximum: 31
Range: 28
Interquartile range (IQR): 0
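The quantiles for base_day are consistent with linear interpolation between order statistics, the default method in NumPy and pandas. Under that method, the p-th quantile sits at position (n - 1)·p in the sorted data. Because 72 of the 100 values equal 22, Q1, the median and Q3 all land on 22, which gives an IQR of 0. The sketch below implements that interpolation. Its value counts are assembled from the Minimum 10 and Maximum 10 value tables, which together cover all 17 distinct values.

```python
import math

# base_day value counts (value -> count), assembled from the Minimum 10 and
# Maximum 10 value tables, which together list all 17 distinct values.
counts = {3: 4, 4: 1, 5: 1, 7: 1, 10: 2, 11: 1, 12: 2, 13: 3, 15: 1, 16: 1,
          18: 2, 19: 3, 21: 1, 22: 72, 26: 2, 29: 2, 31: 1}
values = sorted(v for v, c in counts.items() for _ in range(c))

def quantile(sorted_values, p):
    """Linear interpolation at position h = (n - 1) * p of the sorted data."""
    h = (len(sorted_values) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])

for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p, round(quantile(values, p), 2))
```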

Descriptive statistics

Standard deviation: 5.5571957
Coefficient of variation (CV): 0.27897569
Kurtosis: 3.0233134
Mean: 19.92
Median Absolute Deviation (MAD): 0
Skewness: -1.7884213
Sum: 1992
Variance: 30.882424
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=17)]

Common values
Value               Count  Frequency (%)
22                  72     72.0%
3                   4      4.0%
13                  3      3.0%
19                  3      3.0%
12                  2      2.0%
26                  2      2.0%
18                  2      2.0%
10                  2      2.0%
29                  2      2.0%
7                   1      1.0%
Other values (7)    7      7.0%

Minimum 10 values
Value               Count  Frequency (%)
3                   4      4.0%
4                   1      1.0%
5                   1      1.0%
7                   1      1.0%
10                  2      2.0%
11                  1      1.0%
12                  2      2.0%
13                  3      3.0%
15                  1      1.0%
16                  1      1.0%

Maximum 10 values
Value               Count  Frequency (%)
31                  1      1.0%
29                  2      2.0%
26                  2      2.0%
22                  72     72.0%
21                  1      1.0%
19                  3      3.0%
18                  2      2.0%
16                  1      1.0%
15                  1      1.0%
13                  3      3.0%
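In the Common values table, the row Other values (7) with count 7 means 7 further distinct values, covering 7 rows in total. The sketch below shows that lumping with collections.Counter; `top_k_with_other` is a hypothetical helper. Ties are broken by insertion order here, so the single count-1 value that makes the top ten may differ from the report's choice (7). The Other values totals do not depend on that tie.

```python
from collections import Counter

# base_day counts, assembled from the Minimum 10 and Maximum 10 value tables.
counts = Counter({3: 4, 4: 1, 5: 1, 7: 1, 10: 2, 11: 1, 12: 2, 13: 3, 15: 1,
                  16: 1, 18: 2, 19: 3, 21: 1, 22: 72, 26: 2, 29: 2, 31: 1})

def top_k_with_other(counts, k=10):
    """Return the k most common (value, count) pairs plus (n_other_values, n_other_rows)."""
    top = counts.most_common(k)
    shown = {value for value, _ in top}
    rest = [c for value, c in counts.items() if value not in shown]
    return top, (len(rest), sum(rest))

top, (n_other, other_rows) = top_k_with_other(counts)
print(top[0], n_other, other_rows)  # (22, 72) 7 7
```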

sex_dc
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value               Count  Frequency (%)
                    55     55.0%
                    45     45.0%

Length

[Figure: Histogram of lengths of the category]

age_dc
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.75
Minimum: 30
Maximum: 75
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 30
5th percentile: 30
Q1: 35
Median: 45
Q3: 52.5
95th percentile: 70
Maximum: 75
Range: 45
Interquartile range (IQR): 17.5

Descriptive statistics

Standard deviation: 12.151714
Coefficient of variation (CV): 0.2599297
Kurtosis: -0.70413454
Mean: 46.75
Median Absolute Deviation (MAD): 10
Skewness: 0.66253445
Sum: 4675
Variance: 147.66414
Monotonicity: Not monotonic
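For age_dc, the reported Median Absolute Deviation of 10 is consistent with the unscaled median of |x - median|, with no 1.4826 consistency factor. The sketch below uses Python's statistics module and the value counts listed for age_dc. The argument `method="inclusive"` selects the linear-interpolation convention that matches the reported Q1 and Q3.

```python
import statistics

# age_dc value counts from the report (value -> count).
counts = {30: 6, 35: 22, 40: 16, 45: 19, 50: 12, 60: 8, 65: 9, 70: 7, 75: 1}
ages = [v for v, c in sorted(counts.items()) for _ in range(c)]

median = statistics.median(ages)
# Unscaled MAD: the median of absolute deviations from the median.
mad = statistics.median(abs(a - median) for a in ages)
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

print(median, mad)      # 45.0 10.0
print(q1, q3, q3 - q1)  # 35.0 52.5 17.5
```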
[Figure: Histogram with fixed size bins (bins=9)]

Common values
Value               Count  Frequency (%)
35                  22     22.0%
45                  19     19.0%
40                  16     16.0%
50                  12     12.0%
65                  9      9.0%
60                  8      8.0%
70                  7      7.0%
30                  6      6.0%
75                  1      1.0%

Minimum 10 values
Value               Count  Frequency (%)
30                  6      6.0%
35                  22     22.0%
40                  16     16.0%
45                  19     19.0%
50                  12     12.0%
60                  8      8.0%
65                  9      9.0%
70                  7      7.0%
75                  1      1.0%

Maximum 10 values
Value               Count  Frequency (%)
75                  1      1.0%
70                  7      7.0%
65                  9      9.0%
60                  8      8.0%
50                  12     12.0%
45                  19     19.0%
40                  16     16.0%
35                  22     22.0%
30                  6      6.0%

sido_dc
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 5
Median length: 5
Mean length: 4.95
Min length: 4
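The length statistics count characters, not bytes. Each precomposed Hangul syllable is one code point, so 부산광역시 has length 5 and 경상남도 has length 4. The mean length is therefore (95 × 5 + 5 × 4) / 100 = 4.95. The sketch below assumes the strings are in composed (NFC) form; decomposed jamo would give larger lengths.

```python
import statistics

# sido_dc categories and counts from the report.
counts = {"부산광역시": 95, "경상남도": 5}
# len() counts code points; a precomposed Hangul syllable is a single code point.
lengths = [len(name) for name, c in counts.items() for _ in range(c)]

print(max(lengths), statistics.median(lengths), statistics.mean(lengths), min(lengths))
# 5 5.0 4.95 4
```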

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산광역시
2nd row: 부산광역시
3rd row: 부산광역시
4th row: 부산광역시
5th row: 부산광역시

Common Values

Value               Count  Frequency (%)
부산광역시          95     95.0%
경상남도            5      5.0%

Length

[Figure: Histogram of lengths of the category]

gugun_dc
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.01
Min length: 2

Unique

Unique: 2
Unique (%): 2.0%

Sample

1st row: 남구
2nd row: 강서구
3rd row: 서구
4th row: 사하구
5th row: 사하구

Common Values

Value               Count  Frequency (%)
부산진구            15     15.0%
남구                14     14.0%
동래구              13     13.0%
해운대구            8      8.0%
금정구              7      7.0%
사하구              6      6.0%
연제구              6      6.0%
사상구              6      6.0%
수영구              5      5.0%
강서구              4      4.0%
Other values (6)    16     16.0%

Length

[Figure: Histogram of lengths of the category]

dong_dc
Text

MISSING 

Distinct: 59
Distinct (%): 81.9%
Missing: 28
Missing (%): 28.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 5
Mean length: 4.5416667
Min length: 2

Characters and Unicode

Total characters: 327
Distinct characters: 63
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
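Python's standard unicodedata module exposes the same general categories: 'Lo' is Other Letter and 'Nd' is Decimal Number. The sketch below applies it to the five sample values of dong_dc. The Hangul check is a simplified range test on precomposed syllables. It only approximates the script and block classification the report uses.

```python
import unicodedata
from collections import Counter

def category_counts(texts):
    """Count Unicode general categories, e.g. 'Lo' (Other Letter) and 'Nd' (Decimal Number)."""
    return Counter(unicodedata.category(ch) for text in texts for ch in text)

def is_hangul_syllable(ch):
    """True for precomposed Hangul syllables, code points U+AC00 to U+D7A3."""
    return 0xAC00 <= ord(ch) <= 0xD7A3

# The five sample values shown for dong_dc.
samples = ["대연제5동", "명지2동", "서대신제3동", "다대제2동", "부전제1동"]

print(category_counts(samples))                                  # Counter({'Lo': 20, 'Nd': 5})
print(sum(is_hangul_syllable(ch) for s in samples for ch in s))  # 20
```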

Unique

Unique: 47
Unique (%): 65.3%
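Both percentages use the 72 non-missing values as the denominator: 59 / 72 = 81.9% distinct and 47 / 72 = 65.3% unique. A unique value is one that occurs exactly once. Dividing by all 100 rows would instead give 59% and 47%. Here is a sketch; `distinct_unique_pct` is a hypothetical helper.

```python
from collections import Counter

def distinct_unique_pct(values):
    """Distinct and unique counts plus percentages, measured over non-missing values only."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values that occur exactly once
    return distinct, unique, 100 * distinct / n, 100 * unique / n

# Toy input, not from the dataset: "a" repeats, "b" occurs once, one value is missing.
print(distinct_unique_pct(["a", "a", "b", None]))

# The report's dong_dc figures: 59 distinct and 47 unique among 72 non-missing values.
print(f"{100 * 59 / 72:.1f}% {100 * 47 / 72:.1f}%")  # 81.9% 65.3%
```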

Sample

1st row: 대연제5동
2nd row: 명지2동
3rd row: 서대신제3동
4th row: 다대제2동
5th row: 부전제1동

Value               Count  Frequency (%)
온천제1동           3      4.2%
문현제3동           2      2.8%
하단제2동           2      2.8%
민락동              2      2.8%
사직제2동           2      2.8%
우제3동             2      2.8%
문현제1동           2      2.8%
문현제2동           2      2.8%
명장제1동           2      2.8%
범일제2동           2      2.8%
Other values (49)   51     70.8%

Most occurring characters

Value               Count  Frequency (%)
                    71     21.7%
                    57     17.4%
2                   21     6.4%
1                   21     6.4%
3                   12     3.7%
                    8      2.4%
                    8      2.4%
                    7      2.1%
                    6      1.8%
                    6      1.8%
Other values (53)   110    33.6%

Most occurring categories

Value               Count  Frequency (%)
Other Letter        267    81.7%
Decimal Number      60     18.3%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    71     26.6%
                    57     21.3%
                    8      3.0%
                    8      3.0%
                    7      2.6%
                    6      2.2%
                    6      2.2%
                    6      2.2%
                    6      2.2%
                    4      1.5%
Other values (46)   88     33.0%

Decimal Number
Value               Count  Frequency (%)
2                   21     35.0%
1                   21     35.0%
3                   12     20.0%
4                   2      3.3%
6                   2      3.3%
9                   1      1.7%
5                   1      1.7%

Most occurring scripts

Value               Count  Frequency (%)
Hangul              267    81.7%
Common              60     18.3%

Most frequent character per script

Hangul and Common: identical, row for row, to the Other Letter and Decimal Number tables under Most frequent character per category.

Most occurring blocks

Value               Count  Frequency (%)
Hangul              267    81.7%
ASCII               60     18.3%

Most frequent character per block

Hangul and ASCII: identical, row for row, to the Other Letter and Decimal Number tables under Most frequent character per category.

sbr_cnt
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value               Count  Frequency (%)
1                   99     99.0%
2                   1      1.0%

Length

[Figure: Histogram of lengths of the category]

Interactions

[Figure: 16 interaction plots between pairs of numeric variables]

Correlations

           join_dt    base_year  base_month base_day   sex_dc     age_dc     sido_dc    gugun_dc   dong_dc    sbr_cnt
join_dt    1.000      0.497      0.663      0.419      0.000      0.290      0.000      0.416      1.000      0.000
base_year  0.497      1.000      1.000      0.729      0.000      0.000      0.000      0.000      1.000      0.000
base_month 0.663      1.000      1.000      0.973      0.315      0.000      0.000      0.000      0.000      0.420
base_day   0.419      0.729      0.973      1.000      0.245      0.000      0.143      0.000      0.000      0.663
sex_dc     0.000      0.000      0.315      0.245      1.000      0.587      0.000      0.259      0.000      0.000
age_dc     0.290      0.000      0.000      0.000      0.587      1.000      0.162      0.618      0.421      0.000
sido_dc    0.000      0.000      0.000      0.143      0.000      0.162      1.000      1.000      1.000      0.283
gugun_dc   0.416      0.000      0.000      0.000      0.259      0.618      1.000      1.000      1.000      0.419
dong_dc    1.000      1.000      0.000      0.000      0.000      0.421      1.000      1.000      1.000      NaN
sbr_cnt    0.000      0.000      0.420      0.663      0.000      0.000      0.283      0.419      NaN        1.000

           sex_dc     sbr_cnt    gugun_dc   base_year  sido_dc
sex_dc     1.000      0.000      0.184      0.000      0.000
sbr_cnt    0.000      1.000      0.303      0.000      0.182
gugun_dc   0.184      0.303      1.000      0.000      0.926
base_year  0.000      0.000      0.000      1.000      0.000
sido_dc    0.000      0.182      0.926      0.000      1.000

           join_dt    base_month base_day   age_dc     base_year  sex_dc     sido_dc    gugun_dc   sbr_cnt
join_dt    1.000      0.713      0.611      0.146      0.598      0.000      0.000      0.000      0.000
base_month 0.713      1.000      0.403      0.098      0.964      0.302      0.000      0.000      0.404
base_day   0.611      0.403      1.000      0.156      0.716      0.234      0.135      0.000      0.647
age_dc     0.146      0.098      0.156      1.000      0.000      0.570      0.153      0.296      0.000
base_year  0.598      0.964      0.716      0.000      1.000      0.000      0.000      0.000      0.000
sex_dc     0.000      0.302      0.234      0.570      0.000      1.000      0.000      0.184      0.000
sido_dc    0.000      0.000      0.135      0.153      0.000      0.000      1.000      0.926      0.182
gugun_dc   0.000      0.000      0.000      0.296      0.000      0.184      0.926      1.000      0.303
sbr_cnt    0.000      0.404      0.647      0.000      0.000      0.000      0.182      0.303      1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]
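The same per-column nullity can be tabulated in a few lines of Python. This sketch covers the first ten sample rows, using only gugun_dc and dong_dc, with <NA> written as None. `nullity_by_column` and `nullity_matrix` are hypothetical helpers, and the text matrix only imitates the idea of the plot.

```python
# First ten sample rows (gugun_dc and dong_dc only); None stands for <NA>.
rows = [
    {"gugun_dc": "남구", "dong_dc": "대연제5동"},
    {"gugun_dc": "강서구", "dong_dc": "명지2동"},
    {"gugun_dc": "서구", "dong_dc": "서대신제3동"},
    {"gugun_dc": "사하구", "dong_dc": None},
    {"gugun_dc": "사하구", "dong_dc": "다대제2동"},
    {"gugun_dc": "부산진구", "dong_dc": None},
    {"gugun_dc": "부산진구", "dong_dc": "부전제1동"},
    {"gugun_dc": "연제구", "dong_dc": None},
    {"gugun_dc": "남구", "dong_dc": None},
    {"gugun_dc": "남구", "dong_dc": "대연제3동"},
]

def nullity_by_column(records):
    """Count missing (None) entries per column."""
    columns = list(records[0])
    return {col: sum(r[col] is None for r in records) for col in columns}

def nullity_matrix(records, columns):
    """One string per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("." if r[c] is None else "#" for c in columns) for r in records]

print(nullity_by_column(rows))  # {'gugun_dc': 0, 'dong_dc': 4}
for line in nullity_matrix(rows, ["gugun_dc", "dong_dc"]):
    print(line)
```

Rows 3, 5, 7 and 8 of the sample lack dong_dc, which matches the four <NA> entries in the first part of the sample table.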

Sample

    join_dt   base_year  base_month  base_day  sex_dc  age_dc  sido_dc     gugun_dc  dong_dc       sbr_cnt
0   20161212  2016       12          12                50      부산광역시  남구      대연제5동     1
1   20170922  2017       9           22                65      부산광역시  강서구    명지2동       1
2   20161213  2016       12          13                30      부산광역시  서구      서대신제3동   1
3   20170103  2016       12          13                40      부산광역시  사하구    <NA>          1
4   20170103  2017       1           3                 40      부산광역시  사하구    다대제2동     1
5   20170103  2017       1           3                 35      부산광역시  부산진구  <NA>          1
6   20170103  2017       1           3                 35      부산광역시  부산진구  부전제1동     1
7   20170922  2017       9           22                65      부산광역시  연제구    <NA>          1
8   20170119  2017       1           3                 45      부산광역시  남구      <NA>          1
9   20170119  2017       1           19                45      부산광역시  남구      대연제3동     1

    join_dt   base_year  base_month  base_day  sex_dc  age_dc  sido_dc     gugun_dc  dong_dc       sbr_cnt
90  20170922  2017       9           22                35      부산광역시  서구      <NA>          1
91  20170922  2017       9           22                35      부산광역시  서구      부민동        1
92  20170922  2017       9           22                35      부산광역시  서구      충무동        1
93  20170922  2017       9           22                35      부산광역시  수영구    광안제4동     1
94  20170922  2017       9           22                35      부산광역시  연제구    연산제9동     1
95  20170922  2017       9           22                35      부산광역시  해운대구  <NA>          1
96  20170922  2017       9           22                35      부산광역시  해운대구  송정동        1
97  20170922  2017       9           22                40      경상남도    김해시    <NA>          1
98  20170922  2017       9           22                40      경상남도    김해시    장유3동       1
99  20170922  2017       9           22                40      경상남도    양산시    동면          1