Overview

Dataset statistics

Number of variables: 7
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.0 KiB
Average record size in memory: 61.3 B

Variable types

Numeric: 3
Categorical: 2
Text: 2

Alerts

High correlation: ctprvn_cd is highly overall correlated with signgu_cd and 2 other fields
High correlation: signgu_cd is highly overall correlated with ctprvn_cd and 2 other fields
High correlation: adstrd_cd is highly overall correlated with ctprvn_cd and 2 other fields
High correlation: ctprvn_nm is highly overall correlated with ctprvn_cd and 2 other fields
Imbalance: co is highly imbalanced (75.8%)
Unique: adstrd_cd has unique values
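
The 75.8% imbalance reported for co is consistent with one minus the normalized Shannon entropy of its class counts (96 and 4, as listed in the co section). The sketch below reproduces that figure under the assumption that the score is defined this way; the function name imbalance_score is illustrative and is not part of ydata-profiling.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares."""
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    max_entropy = math.log2(len(counts))  # entropy of a perfectly balanced column
    return 1.0 - entropy / max_entropy


# Class counts for co taken from the report: value 1 -> 96 rows, value 2 -> 4 rows.
score = imbalance_score([96, 4])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and a column dominated by one class approaches 1.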

Reproduction

Analysis started: 2023-12-10 10:13:22.728954
Analysis finished: 2023-12-10 10:13:25.773987
Duration: 3.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
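
As a quick check, the reported duration can be recomputed from the two timestamps above. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-10 10:13:22.728954")
finished = datetime.fromisoformat("2023-12-10 10:13:25.773987")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```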

Variables

ctprvn_cd
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.76
Minimum: 11
Maximum: 39
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 11
5-th percentile: 11
Q1: 23.75
Median: 32
Q3: 36
95-th percentile: 38
Maximum: 39
Range: 28
Interquartile range (IQR): 12.25

Descriptive statistics

Standard deviation: 8.1415756
Coefficient of variation (CV): 0.27357445
Kurtosis: 0.32903147
Mean: 29.76
Median Absolute Deviation (MAD): 5
Skewness: -1.1166454
Sum: 2976
Variance: 66.285253
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=17)

Common values

Value             Count  Frequency (%)
31                16     16.0%
35                10     10.0%
11                10     10.0%
37                9      9.0%
38                9      9.0%
36                8      8.0%
32                7      7.0%
23                7      7.0%
22                5      5.0%
21                3      3.0%
Other values (7)  16     16.0%

Minimum 10 values

Value  Count  Frequency (%)
11     10     10.0%
21     3      3.0%
22     5      5.0%
23     7      7.0%
24     1      1.0%
25     2      2.0%
26     3      3.0%
29     1      1.0%
31     16     16.0%
32     7      7.0%

Maximum 10 values

Value  Count  Frequency (%)
39     3      3.0%
38     9      9.0%
37     9      9.0%
36     8      8.0%
35     10     10.0%
34     3      3.0%
33     3      3.0%
32     7      7.0%
31     16     16.0%
29     1      1.0%
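
The ctprvn_cd frequency tables list all 17 distinct values, so the summary statistics can be recomputed from them. The sketch below rebuilds the 100-row column from the value counts and recomputes several figures with Python's statistics module. It assumes the report uses the sample variance (n - 1) and linear interpolation for quartiles, which matches the reported numbers; the variable names are illustrative.

```python
import statistics

# Value -> count, transcribed from the ctprvn_cd frequency tables.
counts = {11: 10, 21: 3, 22: 5, 23: 7, 24: 1, 25: 2, 26: 3, 29: 1, 31: 16,
          32: 7, 33: 3, 34: 3, 35: 10, 36: 8, 37: 9, 38: 9, 39: 3}

# Expand the counts back into a sorted list of 100 observations.
values = sorted(v for v, n in counts.items() for _ in range(n))

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

# "inclusive" interpolates linearly between order statistics.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(len(values), sum(values), mean, median, q1, q3, mad)
print(round(variance, 6), round(cv, 6))
```

Rebuilding a column from its frequency table only works here because every distinct value appears in one of the tables; for signgu_cd and adstrd_cd the "Other values" rows hide most of the data.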

ctprvn_nm
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value              Count
경기도              16
전라북도            10
서울특별시          10
경상북도            9
경상남도            9
Other values (12)  46

Length

Max length: 7
Median length: 5
Mean length: 4.2
Min length: 3

Unique

Unique: 2
Unique (%): 2.0%

Sample

1st row: 경상북도
2nd row: 인천광역시
3rd row: 전라북도
4th row: 인천광역시
5th row: 서울특별시

Common Values

Value             Count  Frequency (%)
경기도             16     16.0%
전라북도           10     10.0%
서울특별시         10     10.0%
경상북도           9      9.0%
경상남도           9      9.0%
전라남도           8      8.0%
강원도             7      7.0%
인천광역시         7      7.0%
대구광역시         5      5.0%
부산광역시         3      3.0%
Other values (7)  16     16.0%

Length

Histogram of lengths of the category

signgu_cd
Real number (ℝ)

HIGH CORRELATION 

Distinct: 94
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29910.37
Minimum: 11010
Maximum: 39020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 11010
5-th percentile: 11077
Q1: 23787.5
Median: 32045
Q3: 36382.5
95-th percentile: 38322
Maximum: 39020
Range: 28010
Interquartile range (IQR): 12595

Descriptive statistics

Standard deviation: 8191.2265
Coefficient of variation (CV): 0.27385908
Kurtosis: 0.30031795
Mean: 29910.37
Median Absolute Deviation (MAD): 5030
Skewness: -1.1069241
Sum: 2991037
Variance: 67096192
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
11010              4      4.0%
39010              2      2.0%
35310              2      2.0%
31070              2      2.0%
37100              1      1.0%
23050              1      1.0%
24020              1      1.0%
31160              1      1.0%
34310              1      1.0%
36460              1      1.0%
Other values (84)  84     84.0%

Minimum 10 values

Value  Count  Frequency (%)
11010  4      4.0%
11020  1      1.0%
11080  1      1.0%
11140  1      1.0%
11170  1      1.0%
11190  1      1.0%
11220  1      1.0%
21070  1      1.0%
21080  1      1.0%
21090  1      1.0%

Maximum 10 values

Value  Count  Frequency (%)
39020  1      1.0%
39010  2      2.0%
38390  1      1.0%
38360  1      1.0%
38320  1      1.0%
38310  1      1.0%
38114  1      1.0%
38111  1      1.0%
38100  1      1.0%
38090  1      1.0%

signgu_nm
Text

Distinct: 87
Distinct (%): 87.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 9
Median length: 3
Mean length: 3.32
Min length: 2

Characters and Unicode

Total characters: 332
Distinct characters: 92
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
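
To illustrate what these character properties look like, the sketch below uses Python's standard unicodedata module to count the general categories in one signgu_nm sample value. The report's "Other Letter" and "Space Separator" labels correspond to the category codes Lo and Zs. The helper name category_counts is illustrative and is not part of ydata-profiling.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category (e.g. Lo, Zs, Nd)."""
    return Counter(unicodedata.category(ch) for ch in text)


# "전주시 덕진구" is the 3rd sample row of signgu_nm: six Hangul syllables and one space.
sample_counts = category_counts("전주시 덕진구")
print(sample_counts)
```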

Unique

Unique: 79
Unique (%): 79.0%

Sample

1st row: 경산시
2nd row: 계양구
3rd row: 전주시 덕진구
4th row: 연수구
5th row: 서초구

Value              Count  Frequency (%)
종로구              4      3.7%
북구                3      2.8%
서구                3      2.8%
중구                3      2.8%
안양시              2      1.9%
제주시              2      1.9%
남구                2      1.9%
완주군              2      1.9%
평택시              2      1.9%
전주시              2      1.9%
Other values (82)  83     76.9%

Most occurring characters

ValueCountFrequency (%)
44
 
13.3%
39
 
11.7%
29
 
8.7%
12
 
3.6%
9
 
2.7%
8
 
2.4%
8
 
2.4%
7
 
2.1%
7
 
2.1%
7
 
2.1%
Other values (82) 162
48.8%

Most occurring categories

Value            Count  Frequency (%)
Other Letter     324    97.6%
Space Separator  8      2.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
44
 
13.6%
39
 
12.0%
29
 
9.0%
12
 
3.7%
9
 
2.8%
8
 
2.5%
7
 
2.2%
7
 
2.2%
7
 
2.2%
7
 
2.2%
Other values (81) 155
47.8%
Space Separator
ValueCountFrequency (%)
8
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  324    97.6%
Common  8      2.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
44
 
13.6%
39
 
12.0%
29
 
9.0%
12
 
3.7%
9
 
2.8%
8
 
2.5%
7
 
2.2%
7
 
2.2%
7
 
2.2%
7
 
2.2%
Other values (81) 155
47.8%
Common
ValueCountFrequency (%)
8
100.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  324    97.6%
ASCII   8      2.4%

Most frequent character per block

Hangul
ValueCountFrequency (%)
44
 
13.6%
39
 
12.0%
29
 
9.0%
12
 
3.7%
9
 
2.8%
8
 
2.5%
7
 
2.2%
7
 
2.2%
7
 
2.2%
7
 
2.2%
Other values (81) 155
47.8%
ASCII
ValueCountFrequency (%)
8
100.0%

adstrd_cd
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2991082.3
Minimum: 1101063
Maximum: 3902054
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1101063
5-th percentile: 1107781.6
Q1: 2378819.8
Median: 3204556.5
Q3: 3638261
95-th percentile: 3832211
Maximum: 3902054
Range: 2800991
Interquartile range (IQR): 1259441.2

Descriptive statistics

Standard deviation: 819112
Coefficient of variation (CV): 0.27385138
Kurtosis: 0.3003565
Mean: 2991082.3
Median Absolute Deviation (MAD): 502995.5
Skewness: -1.1069377
Sum: 2.9910823 × 10^8
Variance: 6.7094448 × 10^11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
3710055            1      1.0%
3105089            1      1.0%
2402068            1      1.0%
3116054            1      1.0%
3431011            1      1.0%
3646011            1      1.0%
3123053            1      1.0%
2205077            1      1.0%
2505053            1      1.0%
3110356            1      1.0%
Other values (90)  90     90.0%

Minimum 10 values

Value    Count  Frequency (%)
1101063  1      1.0%
1101064  1      1.0%
1101072  1      1.0%
1101073  1      1.0%
1102055  1      1.0%
1108083  1      1.0%
1114060  1      1.0%
1117052  1      1.0%
1119055  1      1.0%
1122051  1      1.0%

Maximum 10 values

Value    Count  Frequency (%)
3902054  1      1.0%
3901064  1      1.0%
3901052  1      1.0%
3839011  1      1.0%
3836011  1      1.0%
3832011  1      1.0%
3831011  1      1.0%
3811456  1      1.0%
3811155  1      1.0%
3810058  1      1.0%

adstrd_nm
Text

Distinct: 97
Distinct (%): 97.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 7
Median length: 3
Mean length: 3.38
Min length: 2

Characters and Unicode

Total characters: 338
Distinct characters: 115
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 96
Unique (%): 96.0%

Sample

1st row: 북부동
2nd row: 계산1동
3rd row: 덕진동
4th row: 송도2동
5th row: 서초1동

Value              Count  Frequency (%)
중앙동              4      4.0%
북부동              1      1.0%
당산1동             1      1.0%
산본1동             1      1.0%
금산읍              1      1.0%
완도읍              1      1.0%
사우동              1      1.0%
관음동              1      1.0%
회덕동              1      1.0%
마두1동             1      1.0%
Other values (87)  87     87.0%

Most occurring characters

ValueCountFrequency (%)
71
 
21.0%
29
 
8.6%
1 23
 
6.8%
7
 
2.1%
6
 
1.8%
6
 
1.8%
5
 
1.5%
5
 
1.5%
4
 
1.2%
4
 
1.2%
Other values (105) 178
52.7%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       304    89.9%
Decimal Number     31     9.2%
Other Punctuation  3      0.9%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
71
23.4%
29
 
9.5%
7
 
2.3%
6
 
2.0%
6
 
2.0%
5
 
1.6%
5
 
1.6%
4
 
1.3%
4
 
1.3%
4
 
1.3%
Other values (99) 163
53.6%
Decimal Number
Value  Count  Frequency (%)
1      23     74.2%
2      3      9.7%
3      3      9.7%
5      1      3.2%
6      1      3.2%
Other Punctuation
Value  Count  Frequency (%)
·      3      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  304    89.9%
Common  34     10.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
71
23.4%
29
 
9.5%
7
 
2.3%
6
 
2.0%
6
 
2.0%
5
 
1.6%
5
 
1.6%
4
 
1.3%
4
 
1.3%
4
 
1.3%
Other values (99) 163
53.6%
Common
Value  Count  Frequency (%)
1      23     67.6%
·      3      8.8%
2      3      8.8%
3      3      8.8%
5      1      2.9%
6      1      2.9%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  304    89.9%
ASCII   31     9.2%
None    3      0.9%

Most frequent character per block

Hangul
ValueCountFrequency (%)
71
23.4%
29
 
9.5%
7
 
2.3%
6
 
2.0%
6
 
2.0%
5
 
1.6%
5
 
1.6%
4
 
1.3%
4
 
1.3%
4
 
1.3%
Other values (99) 163
53.6%
ASCII
Value  Count  Frequency (%)
1      23     74.2%
2      3      9.7%
3      3      9.7%
5      1      3.2%
6      1      3.2%
None
Value  Count  Frequency (%)
·      3      100.0%

co
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
1      96
2      4

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
1      96     96.0%
2      4      4.0%

Length

Histogram of lengths of the category


Interactions


Correlations

           ctprvn_cd  ctprvn_nm  signgu_cd  signgu_nm  adstrd_cd  adstrd_nm  co
ctprvn_cd  1.000      1.000      0.999      0.854      0.999      0.978      0.000
ctprvn_nm  1.000      1.000      0.994      0.000      0.994      0.986      0.000
signgu_cd  0.999      0.994      1.000      0.562      1.000      0.984      0.000
signgu_nm  0.854      0.000      0.562      1.000      0.820      0.986      1.000
adstrd_cd  0.999      0.994      1.000      0.820      1.000      0.988      0.000
adstrd_nm  0.978      0.986      0.984      0.986      0.988      1.000      1.000
co         0.000      0.000      0.000      1.000      0.000      1.000      1.000

           ctprvn_nm  co
ctprvn_nm  1.000      0.000
co         0.000      1.000

           ctprvn_cd  signgu_cd  adstrd_cd  ctprvn_nm  co
ctprvn_cd  1.000      0.996      0.996      0.950      0.000
signgu_cd  0.996      1.000      1.000      0.925      0.000
adstrd_cd  0.996      1.000      1.000      0.925      0.000
ctprvn_nm  0.950      0.925      0.925      1.000      0.000
co         0.000      0.000      0.000      0.000      1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

    ctprvn_cd  ctprvn_nm   signgu_cd  signgu_nm      adstrd_cd  adstrd_nm  co
0   37         경상북도     37100      경산시          3710055    북부동      2
1   23         인천광역시   23070      계양구          2307053    계산1동     1
2   35         전라북도     35012      전주시 덕진구   3501257    덕진동      2
3   23         인천광역시   23040      연수구          2304066    송도2동     2
4   11         서울특별시   11220      서초구          1122051    서초1동     2
5   34         충청남도     34080      당진시          3408051    당진1동     1
6   35         전라북도     35380      부안군          3538011    부안읍      1
7   35         전라북도     35310      완주군          3531013    용진읍      1
8   23         인천광역시   23060      부평구          2306053    부평3동     1
9   33         충청북도     33370      음성군          3337011    음성읍      1

Last rows

    ctprvn_cd  ctprvn_nm   signgu_cd  signgu_nm      adstrd_cd  adstrd_nm  co
90  21         부산광역시   21080      북구            2108056    덕천1동     1
91  37         경상북도     37070      영천시          3707052    중앙동      1
92  23         인천광역시   23080      서구            2308056    가정3동     1
93  32         강원도       32030      강릉시          3203062    강남동      1
94  32         강원도       32070      삼척시          3207052    성내동      1
95  33         충청북도     33030      제천시          3303060    화산동      1
96  11         서울특별시   11140      마포구          1114060    대흥동      1
97  31         경기도       31270      포천시          3127031    군내면      1
98  35         전라북도     35011      전주시 완산구   3501175    풍남동      1
99  37         경상북도     37330      청송군          3733011    청송읍      1
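
In the sample rows, the three code columns appear to be nested prefixes: the first five digits of adstrd_cd equal signgu_cd, and the first two digits equal ctprvn_cd. That nesting would explain the near-perfect correlations among them. The sketch below checks the relationship on a few sample rows only; it has not been verified against all 100 observations, and the helper name split_adstrd_cd is illustrative.

```python
# (ctprvn_cd, signgu_cd, adstrd_cd) triples copied from the sample rows.
rows = [
    (37, 37100, 3710055),
    (23, 23070, 2307053),
    (35, 35012, 3501257),
    (11, 11220, 1122051),
    (37, 37330, 3733011),
]


def split_adstrd_cd(adstrd_cd):
    """Derive (ctprvn_cd, signgu_cd) from a 7-digit adstrd_cd by dropping trailing digits."""
    signgu_cd = adstrd_cd // 100   # first five digits
    ctprvn_cd = signgu_cd // 1000  # first two digits
    return ctprvn_cd, signgu_cd


for ctprvn_cd, signgu_cd, adstrd_cd in rows:
    assert split_adstrd_cd(adstrd_cd) == (ctprvn_cd, signgu_cd)
print("prefix relationship holds for the sampled rows")
```

If the relationship holds across the full dataset, ctprvn_cd and signgu_cd carry no information beyond adstrd_cd, which is worth keeping in mind before using all three as model features.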