Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 1989
Missing cells (%): 2.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 732.4 KiB
Average record size in memory: 75.0 B
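
The percentage and per-record figures above follow from the raw counts. The sketch below reproduces that arithmetic, assuming the missing-cell share is taken over rows × columns and that 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics block.
n_variables = 8
n_observations = 10_000
missing_cells = 1_989
total_size_kib = 732.4

# Share of missing cells among all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report (2.5% and 75.0 B).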

Variable types

Text: 2
DateTime: 1
Categorical: 3
Numeric: 2

Dataset

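Description: Data that manages and presents information on consumer complaint applicants, organized by gender, age group, region, and other characteristics.
Author: 공정거래위원회 (Korea Fair Trade Commission)
URL: https://www.data.go.kr/data/15098318/fileData.do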

Alerts

High correlation: 성별(GENDER) is highly overall correlated with 성별코드(GENDER_CODE)
High correlation: 성별코드(GENDER_CODE) is highly overall correlated with 성별(GENDER)
High correlation: 연령대코드(AGE_GROUP_CODE) is highly overall correlated with 연령대명(AGE_GROUP_NAME)
High correlation: 연령대명(AGE_GROUP_NAME) is highly overall correlated with 연령대코드(AGE_GROUP_CODE)
Missing: 연령대코드(AGE_GROUP_CODE) has 1989 (19.9%) missing values
Unique: 사건번호(ACCIDENT_NO) has unique values

Reproduction

Analysis started: 2023-12-12 12:57:16.674341
Analysis finished: 2023-12-12 12:57:18.071012
Duration: 1.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

사건번호(ACCIDENT_NO)
Text

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 120000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
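
As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in the five sample IDs listed in the Sample block. It only inspects those five strings, not the full column, and the variable names are illustrative.

```python
import unicodedata
from collections import Counter

# The five sample IDs listed in this variable's Sample block.
sample_ids = [
    "2017-0001942",
    "2016-0944163",
    "2017-0213769",
    "2017-0173807",
    "2016-0162800",
]

# Tally the Unicode general category of every character.
# "Nd" is Decimal Number, "Pd" is Dash Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for text in sample_ids for ch in text
)
total_chars = sum(category_counts.values())

for code, count in category_counts.most_common():
    print(f"{code}: {count} ({100 * count / total_chars:.1f}%)")
```

Each ID contributes eleven digits and one hyphen, so the sample shows the same 11:1 split between Decimal Number and Dash Punctuation that the report gives for the whole column.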

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 2017-0001942
2nd row: 2016-0944163
3rd row: 2017-0213769
4th row: 2017-0173807
5th row: 2016-0162800

Value | Count | Frequency (%)
2017-0001942 | 1 | < 0.1%
2017-0136026 | 1 | < 0.1%
2017-0084414 | 1 | < 0.1%
2017-0077677 | 1 | < 0.1%
2016-0230959 | 1 | < 0.1%
2016-0879880 | 1 | < 0.1%
2017-0121707 | 1 | < 0.1%
2017-0142526 | 1 | < 0.1%
2016-0868853 | 1 | < 0.1%
2016-0536762 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 26430 | 22.0%
2 | 16015 | 13.3%
1 | 15937 | 13.3%
6 | 13864 | 11.6%
- | 10000 | 8.3%
7 | 8131 | 6.8%
3 | 6166 | 5.1%
4 | 6120 | 5.1%
5 | 5943 | 5.0%
8 | 5824 | 4.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 110000 | 91.7%
Dash Punctuation | 10000 | 8.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 26430 | 24.0%
2 | 16015 | 14.6%
1 | 15937 | 14.5%
6 | 13864 | 12.6%
7 | 8131 | 7.4%
3 | 6166 | 5.6%
4 | 6120 | 5.6%
5 | 5943 | 5.4%
8 | 5824 | 5.3%
9 | 5570 | 5.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 120000 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 26430 | 22.0%
2 | 16015 | 13.3%
1 | 15937 | 13.3%
6 | 13864 | 11.6%
- | 10000 | 8.3%
7 | 8131 | 6.8%
3 | 6166 | 5.1%
4 | 6120 | 5.1%
5 | 5943 | 5.0%
8 | 5824 | 4.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 120000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 26430 | 22.0%
2 | 16015 | 13.3%
1 | 15937 | 13.3%
6 | 13864 | 11.6%
- | 10000 | 8.3%
7 | 8131 | 6.8%
3 | 6166 | 5.1%
4 | 6120 | 5.1%
5 | 5943 | 5.0%
8 | 5824 | 4.9%

접수일자(RCPT_YMD)
DateTime

Distinct: 457
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2016-01-01 00:00:00
Maximum: 2017-11-23 00:00:00
Histogram with fixed size bins (bins=50)
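
To put the distinct-date count in context, the sketch below compares the 457 distinct receipt dates with the number of calendar days between the reported minimum and maximum. It uses only the figures shown above; the variable names are illustrative.

```python
from datetime import date

# Minimum, maximum, and distinct count as reported for the date variable.
first_day = date(2016, 1, 1)
last_day = date(2017, 11, 23)
distinct_dates = 457

# Calendar days in the range, counting both endpoints.
calendar_days = (last_day - first_day).days + 1

# Fraction of calendar days on which at least one record was received.
coverage = distinct_dates / calendar_days

print(f"{distinct_dates} of {calendar_days} days have records ({coverage:.1%})")
```

Roughly two thirds of the days in the range appear in the data.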

성별코드(GENDER_CODE)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 6224
2 | 2111
1 | 1665

Length

Max length: 4
Median length: 4
Mean length: 2.8672
Min length: 1
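
The mean length can be reproduced from the value counts only if `<NA>` is counted as a four-character string. The sketch below shows the arithmetic. That missing gender codes were stored as the literal text `<NA>` is an inference from these figures (it would also explain why Missing is 0), not something the report states.

```python
# Value counts from the 성별코드(GENDER_CODE) overview.
value_counts = {"<NA>": 6224, "2": 2111, "1": 1665}

total = sum(value_counts.values())

# Count-weighted mean of the string lengths; "<NA>" contributes length 4.
mean_length = sum(len(value) * count for value, count in value_counts.items()) / total

print(f"Mean length: {mean_length:.4f}")
```

The same calculation with the two-character labels 여성 and 남성 gives the 3.2448 reported for 성별(GENDER).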

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 1
4th row: 2
5th row: 1

Common Values

Value | Count | Frequency (%)
<NA> | 6224 | 62.2%
2 | 2111 | 21.1%
1 | 1665 | 16.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 6224 | 62.2%
2 | 2111 | 21.1%
1 | 1665 | 16.7%

성별(GENDER)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 6224
여성 (female) | 2111
남성 (male) | 1665

Length

Max length: 4
Median length: 4
Mean length: 3.2448
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 여성
2nd row: 여성
3rd row: 남성
4th row: 여성
5th row: 남성

Common Values

Value | Count | Frequency (%)
<NA> | 6224 | 62.2%
여성 | 2111 | 21.1%
남성 | 1665 | 16.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 6224 | 62.2%
여성 | 2111 | 21.1%
남성 | 1665 | 16.7%

연령대코드(AGE_GROUP_CODE)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 12
Distinct (%): 0.1%
Missing: 1989
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.8288603
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 4
Median: 5
Q3: 6
95-th percentile: 7
Maximum: 12
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.5353662
Coefficient of variation (CV): 0.31795622
Kurtosis: 4.0917753
Mean: 4.8288603
Median Absolute Deviation (MAD): 1
Skewness: 1.568505
Sum: 38684
Variance: 2.3573493
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Value | Count | Frequency (%)
4 | 2689 | 26.9%
5 | 2023 | 20.2%
6 | 1320 | 13.2%
3 | 1163 | 11.6%
7 | 372 | 3.7%
8 | 136 | 1.4%
9 | 111 | 1.1%
11 | 90 | 0.9%
2 | 49 | 0.5%
12 | 36 | 0.4%
Other values (2) | 22 | 0.2%
(Missing) | 1989 | 19.9%

Value | Count | Frequency (%)
1 | 3 | < 0.1%
2 | 49 | 0.5%
3 | 1163 | 11.6%
4 | 2689 | 26.9%
5 | 2023 | 20.2%
6 | 1320 | 13.2%
7 | 372 | 3.7%
8 | 136 | 1.4%
9 | 111 | 1.1%
10 | 19 | 0.2%

Value | Count | Frequency (%)
12 | 36 | 0.4%
11 | 90 | 0.9%
10 | 19 | 0.2%
9 | 111 | 1.1%
8 | 136 | 1.4%
7 | 372 | 3.7%
6 | 1320 | 13.2%
5 | 2023 | 20.2%
4 | 2689 | 26.9%
3 | 1163 | 11.6%
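
Because the frequency tables list every one of the 12 distinct codes, the summary statistics for 연령대코드(AGE_GROUP_CODE) can be rebuilt from the counts alone. The sketch below does this in plain Python. It assumes the sample variance (divisor n - 1) and quantiles with linear interpolation between order statistics; these choices match the reported figures, but the report does not state them. The `quantile` helper is illustrative.

```python
import math

# Value -> count, combined from the frequency tables (missing rows excluded).
counts = {1: 3, 2: 49, 3: 1163, 4: 2689, 5: 2023, 6: 1320,
          7: 372, 8: 136, 9: 111, 10: 19, 11: 90, 12: 36}

# Expand the counts into a sorted list of observations.
values = sorted(v for v, c in counts.items() for _ in range(c))
n = len(values)


def quantile(sorted_values, q):
    """Quantile using linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


mean = sum(values) / n
variance = sum((x - mean) ** 2 for x in values) / (n - 1)  # sample variance

print(f"n={n}, sum={sum(values)}, mean={mean:.7f}, variance={variance:.7f}")
print("Q1, median, Q3:", [quantile(values, q) for q in (0.25, 0.5, 0.75)])
```

The non-missing count comes out as 10000 - 1989 = 8011, and the sum, mean, variance, and quartiles agree with the values in the statistics blocks.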

연령대명(AGE_GROUP_NAME)
Categorical

HIGH CORRELATION 

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
30 - 39세 | 2689
40 - 49세 | 2023
<NA> | 1989
50 - 59세 | 1320
20 - 29세 | 1163
Other values (8) | 816

Length

Max length: 11
Median length: 8
Mean length: 7.2431
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 40 - 49세
2nd row: 30 - 39세
3rd row: 불명
4th row: 30 - 39세
5th row: 40 - 49세

Common Values

Value | Count | Frequency (%)
30 - 39세 | 2689 | 26.9%
40 - 49세 | 2023 | 20.2%
<NA> | 1989 | 19.9%
50 - 59세 | 1320 | 13.2%
20 - 29세 | 1163 | 11.6%
(구)60 - 69세 | 372 | 3.7%
70 - 79세 | 136 | 1.4%
불명 (unknown) | 111 | 1.1%
60 - 64세 | 90 | 0.9%
10 - 19세 | 49 | 0.5%
Other values (3) | 58 | 0.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
 | 7878 | 30.6%
30 | 2689 | 10.4%
39세 | 2689 | 10.4%
40 | 2023 | 7.9%
49세 | 2023 | 7.9%
na | 1989 | 7.7%
50 | 1320 | 5.1%
59세 | 1320 | 5.1%
20 | 1163 | 4.5%
29세 | 1163 | 4.5%
Other values (13) | 1502 | 5.8%

지역코드(AREA_CODE)
Real number (ℝ)

Distinct: 243
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 805.7359
Minimum: 100
Maximum: 9907
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100
5-th percentile: 100
Q1: 201
Median: 800
Q3: 1001
95-th percentile: 1506
Maximum: 9907
Range: 9807
Interquartile range (IQR): 800

Descriptive statistics

Standard deviation: 1187.2989
Coefficient of variation (CV): 1.4735584
Kurtosis: 47.000181
Mean: 805.7359
Median Absolute Deviation (MAD): 400
Skewness: 6.4824096
Sum: 8057359
Variance: 1409678.6
Monotonicity: Not monotonic
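
Several of these figures are derived from the others. The sketch below checks the relationships using the standard definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = max - min). All inputs are copied from the statistics above; the variable names are illustrative.

```python
# Figures copied from the 지역코드(AREA_CODE) statistics.
total_sum = 8_057_359
n = 10_000
std = 1187.2989
q1, q3 = 201, 1001
minimum, maximum = 100, 9907

mean = total_sum / n          # 805.7359
cv = std / mean               # coefficient of variation
variance = std ** 2           # close to the reported value, up to rounding of std
iqr = q3 - q1
value_range = maximum - minimum

print(f"mean={mean}, cv={cv:.7f}, variance={variance:.1f}")
print(f"iqr={iqr}, range={value_range}")
```

A CV well above 1 together with the large skewness reflects the small group of codes near 9900, far from the bulk of values below about 1700.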
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
800 | 872 | 8.7%
100 | 647 | 6.5%
1205 | 338 | 3.4%
500 | 213 | 2.1%
810 | 211 | 2.1%
101 | 201 | 2.0%
400 | 181 | 1.8%
801 | 179 | 1.8%
809 | 174 | 1.7%
808 | 162 | 1.6%
Other values (233) | 6822 | 68.2%

Value | Count | Frequency (%)
100 | 647 | 6.5%
101 | 201 | 2.0%
102 | 84 | 0.8%
103 | 53 | 0.5%
104 | 89 | 0.9%
105 | 90 | 0.9%
106 | 65 | 0.7%
107 | 74 | 0.7%
108 | 38 | 0.4%
109 | 68 | 0.7%

Value | Count | Frequency (%)
9907 | 6 | 0.1%
9906 | 1 | < 0.1%
9903 | 4 | < 0.1%
9902 | 1 | < 0.1%
9901 | 9 | 0.1%
9900 | 124 | 1.2%
1700 | 34 | 0.3%
1604 | 108 | 1.1%
1603 | 24 | 0.2%
1600 | 41 | 0.4%

지역명(AREA_NAME)
Text

Distinct: 222
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 3
Mean length: 3.3393
Min length: 2

Characters and Unicode

Total characters: 33393
Distinct characters: 145
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 0.1%

Sample

1st row: 전주시
2nd row: 제주시
3rd row: 강남구
4th row: 경기도
5th row: 화성시

Value | Count | Frequency (%)
경기도 | 872 | 8.5%
서울특별시 | 647 | 6.3%
전주시 | 338 | 3.3%
서구 | 222 | 2.2%
광주광역시 | 213 | 2.1%
수원시 | 211 | 2.1%
강남구 | 201 | 2.0%
인천광역시 | 181 | 1.8%
고양시 | 179 | 1.7%
성남시 | 174 | 1.7%
Other values (213) | 7010 | 68.4%

Most occurring characters

Value | Count | Frequency (%)
 | 5246 | 15.7%
 | 3377 | 10.1%
 | 1261 | 3.8%
 | 1219 | 3.7%
 | 1191 | 3.6%
 | 1152 | 3.4%
 | 1012 | 3.0%
 | 962 | 2.9%
 | 809 | 2.4%
 | 771 | 2.3%
Other values (135) | 16393 | 49.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 33145 | 99.3%
Space Separator | 248 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 5246 | 15.8%
 | 3377 | 10.2%
 | 1261 | 3.8%
 | 1219 | 3.7%
 | 1191 | 3.6%
 | 1152 | 3.5%
 | 1012 | 3.1%
 | 962 | 2.9%
 | 809 | 2.4%
 | 771 | 2.3%
Other values (134) | 16145 | 48.7%

Space Separator
Value | Count | Frequency (%)
 | 248 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 33145 | 99.3%
Common | 248 | 0.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 5246 | 15.8%
 | 3377 | 10.2%
 | 1261 | 3.8%
 | 1219 | 3.7%
 | 1191 | 3.6%
 | 1152 | 3.5%
 | 1012 | 3.1%
 | 962 | 2.9%
 | 809 | 2.4%
 | 771 | 2.3%
Other values (134) | 16145 | 48.7%

Common
Value | Count | Frequency (%)
 | 248 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 33145 | 99.3%
ASCII | 248 | 0.7%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 5246 | 15.8%
 | 3377 | 10.2%
 | 1261 | 3.8%
 | 1219 | 3.7%
 | 1191 | 3.6%
 | 1152 | 3.5%
 | 1012 | 3.1%
 | 962 | 2.9%
 | 809 | 2.4%
 | 771 | 2.3%
Other values (134) | 16145 | 48.7%

ASCII
Value | Count | Frequency (%)
 | 248 | 100.0%

Interactions


Correlations

 | 성별코드(GENDER_CODE) | 성별(GENDER) | 연령대코드(AGE_GROUP_CODE) | 연령대명(AGE_GROUP_NAME) | 지역코드(AREA_CODE)
성별코드(GENDER_CODE) | 1.000 | 1.000 | 0.140 | 0.135 | 0.000
성별(GENDER) | 1.000 | 1.000 | 0.140 | 0.135 | 0.000
연령대코드(AGE_GROUP_CODE) | 0.140 | 0.140 | 1.000 | 1.000 | 0.127
연령대명(AGE_GROUP_NAME) | 0.135 | 0.135 | 1.000 | 1.000 | 0.166
지역코드(AREA_CODE) | 0.000 | 0.000 | 0.127 | 0.166 | 1.000

 | 성별(GENDER) | 성별코드(GENDER_CODE) | 연령대명(AGE_GROUP_NAME)
성별(GENDER) | 1.000 | 0.999 | 0.105
성별코드(GENDER_CODE) | 0.999 | 1.000 | 0.105
연령대명(AGE_GROUP_NAME) | 0.105 | 0.105 | 1.000

 | 연령대코드(AGE_GROUP_CODE) | 지역코드(AREA_CODE) | 성별코드(GENDER_CODE) | 성별(GENDER) | 연령대명(AGE_GROUP_NAME)
연령대코드(AGE_GROUP_CODE) | 1.000 | 0.083 | 0.107 | 0.107 | 1.000
지역코드(AREA_CODE) | 0.083 | 1.000 | 0.000 | 0.000 | 0.075
성별코드(GENDER_CODE) | 0.107 | 0.000 | 1.000 | 0.999 | 0.105
성별(GENDER) | 0.107 | 0.000 | 0.999 | 1.000 | 0.105
연령대명(AGE_GROUP_NAME) | 1.000 | 0.075 | 0.105 | 0.105 | 1.000
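
The near-perfect association between 성별(GENDER) and 성별코드(GENDER_CODE) is what one would expect when one column is just a relabeling of the other. The report does not say which coefficient each matrix uses, so the sketch below is only an illustration: it computes the uncorrected Cramér's V, a common association measure for categorical pairs. The cross-tabulation is hypothetical. It assumes every code maps to exactly one label, with counts taken from the Common Values tables.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical cross-tab: rows are codes (<NA>, 2, 1), columns are labels
# (<NA>, 여성, 남성), assuming a strict one-to-one mapping.
gender_table = [[6224, 0, 0], [0, 2111, 0], [0, 0, 1665]]
v_gender = cramers_v(gender_table)

# A table whose rows are proportional, i.e. no association at all.
v_independent = cramers_v([[20, 30], [40, 60]])

print(f"one-to-one mapping: {v_gender:.3f}, independent: {v_independent:.3f}")
```

A one-to-one mapping gives a value of 1 under this uncorrected formula. The 0.999 shown in two of the matrices may come from a bias correction or a different coefficient; that is an assumption, not something the report confirms.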

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 사건번호(ACCIDENT_NO) | 접수일자(RCPT_YMD) | 성별코드(GENDER_CODE) | 성별(GENDER) | 연령대코드(AGE_GROUP_CODE) | 연령대명(AGE_GROUP_NAME) | 지역코드(AREA_CODE) | 지역명(AREA_NAME)
69962 | 2017-0001942 | 2017-01-02 | 2 | 여성 | 5 | 40 - 49세 | 1205 | 전주시
77015 | 2016-0944163 | 2016-12-26 | 2 | 여성 | 4 | 30 - 39세 | 1604 | 제주시
94559 | 2017-0213769 | 2017-03-23 | 1 | 남성 | 9 | 불명 | 101 | 강남구
83357 | 2017-0173807 | 2017-03-09 | 2 | 여성 | 4 | 30 - 39세 | 800 | 경기도
3761 | 2016-0162800 | 2016-03-09 | 1 | 남성 | 5 | 40 - 49세 | 831 | 화성시
3438 | 2016-0259074 | 2016-04-16 | <NA> | <NA> | 3 | 20 - 29세 | 603 | 서구
37380 | 2016-0640684 | 2016-09-07 | 2 | 여성 | 4 | 30 - 39세 | 119 | 양천구
77324 | 2017-0018755 | 2017-01-09 | <NA> | <NA> | <NA> | <NA> | 821 | 하남시
86972 | 2017-0045850 | 2017-01-18 | <NA> | <NA> | 3 | 20 - 29세 | 814 | 오산시
46367 | 2016-0760632 | 2016-10-24 | <NA> | <NA> | <NA> | <NA> | 831 | 화성시

Index | 사건번호(ACCIDENT_NO) | 접수일자(RCPT_YMD) | 성별코드(GENDER_CODE) | 성별(GENDER) | 연령대코드(AGE_GROUP_CODE) | 연령대명(AGE_GROUP_NAME) | 지역코드(AREA_CODE) | 지역명(AREA_NAME)
11912 | 2016-0411491 | 2016-06-17 | 2 | 여성 | <NA> | <NA> | 100 | 서울특별시
28085 | 2016-0540571 | 2016-08-02 | <NA> | <NA> | <NA> | <NA> | 800 | 경기도
24725 | 2016-0522666 | 2016-07-27 | 1 | 남성 | 8 | 70 - 79세 | 1205 | 전주시
12828 | 2016-0379454 | 2016-06-03 | <NA> | <NA> | 3 | 20 - 29세 | 1700 | 세종특별자치시
13840 | 2016-0349792 | 2016-05-24 | 2 | 여성 | 4 | 30 - 39세 | 1305 | 여수시
9109 | 2016-0307309 | 2016-05-08 | <NA> | <NA> | 5 | 40 - 49세 | 407 | 연수구
56535 | 2016-0745536 | 2016-10-18 | <NA> | <NA> | 4 | 30 - 39세 | 1115 | 홍성군
50309 | 2016-0794230 | 2016-11-04 | 1 | 남성 | 5 | 40 - 49세 | 906 | 춘천시
91707 | 2017-0092768 | 2017-02-07 | 2 | 여성 | 12 | 65 - 69세 | 1205 | 전주시
55373 | 2016-0656576 | 2016-09-12 | 2 | 여성 | <NA> | <NA> | 505 | 서구