Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 40
Duplicate rows (%): 0.4%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
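
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the percentage is duplicates over observations and that 1 KiB = 1024 B (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
n_observations = 10_000
duplicate_rows = 40
total_size_kib = 644.5

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size in bytes, assuming 1 KiB = 1024 B.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{duplicate_pct:.1f}%")      # 0.4%
print(f"{avg_record_bytes:.1f} B")  # 66.0 B
```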

Variable types

Text: 1
DateTime: 2
Categorical: 2
Numeric: 2

Dataset

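Description: Analysis data on water supply and sewerage complaints received by telephone by the Billing Team of the Management Support Division, Chuncheon City, Gangwon-do, from June 16, 2015 to May 21, 2021
Author: Chuncheon City, Gangwon-do
URL: https://www.data.go.kr/data/15097628/fileData.do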

Alerts

Constant: 데이터기준일 has constant value ""
Duplicates: Dataset has 40 (0.4%) duplicate rows
High correlation: 안내수 is highly overall correlated with 안내금액
High correlation: 안내금액 is highly overall correlated with 안내수
Imbalance: 분류 is highly imbalanced (50.1%)
Imbalance: 부과유형 is highly imbalanced (57.8%)
Skewed: 안내금액 is highly skewed (γ1 = 20.56845779)
Zeros: 안내수 has 3254 (32.5%) zeros
Zeros: 안내금액 has 1412 (14.1%) zeros

Reproduction

Analysis started: 2023-12-12 10:33:41.269576
Analysis finished: 2023-12-12 10:33:42.538401
Duration: 1.27 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
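
The reported duration can be checked against the two timestamps above. A small sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2023-12-12 10:33:41.269576")
finished = datetime.fromisoformat("2023-12-12 10:33:42.538401")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 1.27 seconds
```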

Variables

수용가번호
Text

Distinct: 5716
Distinct (%): 57.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 15
Mean length: 15
Min length: 15

Characters and Unicode

Total characters: 150000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
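
As an illustration of those character properties, the sketch below looks up the Unicode general category of each character in the first sample value listed for this variable. It uses Python's standard `unicodedata` module; it is not necessarily how the profiling tool computes its counts.

```python
import unicodedata
from collections import Counter

sample_id = "017-014-1125-70"  # first sample row of this variable

# unicodedata.category returns the two-letter general category:
# "Nd" is Decimal Number, "Pd" is Dash Punctuation.
categories = Counter(unicodedata.category(ch) for ch in sample_id)
print(categories)
```

For this value the split is 12 digits and 3 dashes, the same 80% / 20% proportion the report gives for the whole column.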

Unique

Unique: 3704
Unique (%): 37.0%
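
The report distinguishes Distinct (5716) from Unique (3704). A minimal sketch of that distinction, assuming Distinct counts different values and Unique counts values that occur exactly once; the IDs are hypothetical and the function name is illustrative:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) for an iterable of hashable values."""
    counts = Counter(values)
    distinct = len(counts)                               # different values
    unique = sum(1 for c in counts.values() if c == 1)   # seen exactly once
    return distinct, unique

# Hypothetical IDs, not taken from the dataset.
ids = ["A-1", "A-1", "B-2", "C-3", "C-3", "D-4"]
print(distinct_and_unique(ids))  # (4, 2)
```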

Sample

1st row: 017-014-1125-70
2nd row: 025-501-1250-00
3rd row: 016-072-3100-02
4th row: 013-051-1900-00
5th row: 020-017-0264-00

Value                Count  Frequency (%)
025-501-5920-00      11     0.1%
030-010-0499-00      10     0.1%
012-207-0140-70      10     0.1%
016-100-9420-02      9      0.1%
013-280-2210-70      9      0.1%
015-040-2600-01      9      0.1%
016-071-1800-01      9      0.1%
007-050-0369-00      9      0.1%
013-051-2400-04      9      0.1%
015-013-6600-01      9      0.1%
Other values (5706)  9906   99.1%

Most occurring characters

Value  Count  Frequency (%)
0      56251  37.5%
-      30000  20.0%
1      17599  11.7%
2      10103  6.7%
3      7567   5.0%
5      7183   4.8%
7      6201   4.1%
4      4978   3.3%
6      4201   2.8%
8      3266   2.2%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    120000  80.0%
Dash Punctuation  30000   20.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      56251  46.9%
1      17599  14.7%
2      10103  8.4%
3      7567   6.3%
5      7183   6.0%
7      6201   5.2%
4      4978   4.1%
6      4201   3.5%
8      3266   2.7%
9      2651   2.2%

Dash Punctuation
Value  Count  Frequency (%)
-      30000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  150000  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      56251  37.5%
-      30000  20.0%
1      17599  11.7%
2      10103  6.7%
3      7567   5.0%
5      7183   4.8%
7      6201   4.1%
4      4978   3.3%
6      4201   2.8%
8      3266   2.2%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  150000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      56251  37.5%
-      30000  20.0%
1      17599  11.7%
2      10103  6.7%
3      7567   5.0%
5      7183   4.8%
7      6201   4.1%
4      4978   3.3%
6      4201   2.8%
8      3266   2.2%

날짜
Date

Distinct: 1289
Distinct (%): 12.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2015-06-16 00:00:00
Maximum: 2021-05-12 00:00:00
Histogram with fixed size bins (bins=50)

분류
Categorical

IMBALANCE

Distinct: 21
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
요금안내           3569
자동납부           3521
요금체납           1590
기타               674
고지서             143
Other values (16)  503

Length

Max length: 9
Median length: 4
Mean length: 3.8222
Min length: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 지침
2nd row: 자동납부
3rd row: 요금체납
4th row: 요금안내
5th row: 요금안내

Common Values

Value              Count  Frequency (%)
요금안내           3569   35.7%
자동납부           3521   35.2%
요금체납           1590   15.9%
기타               674    6.7%
고지서             143    1.4%
단수예정           115    1.1%
이사               114    1.1%
지침               104    1.0%
이사정산           63     0.6%
수도검침           50     0.5%
Other values (11)  57     0.6%

Length

Histogram of lengths of the category

안내수
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 41
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7919
Minimum: 0
Maximum: 67
Zeros: 3254
Zeros (%): 32.5%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 1
95-th percentile: 7
Maximum: 67
Range: 67
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 3.3689359
Coefficient of variation (CV): 1.8800915
Kurtosis: 60.780371
Mean: 1.7919
Median Absolute Deviation (MAD): 1
Skewness: 5.8848345
Sum: 17919
Variance: 11.349729
Monotonicity: Not monotonic
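
Two of the descriptive statistics above are simple functions of the others. A short sketch checking them, assuming the usual definitions (CV is standard deviation over mean, variance is standard deviation squared):

```python
# Figures copied from the descriptive statistics of 안내수 above.
mean = 1.7919
std = 3.3689359

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance

print(round(cv, 4))        # 1.8801
print(round(variance, 4))  # 11.3497
```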
Histogram with fixed size bins (bins=41)

Common values

Value              Count  Frequency (%)
1                  4295   43.0%
0                  3254   32.5%
2                  606    6.1%
4                  412    4.1%
3                  402    4.0%
5                  249    2.5%
6                  179    1.8%
7                  105    1.1%
8                  95     0.9%
9                  77     0.8%
Other values (31)  326    3.3%

Minimum 10 values

Value  Count  Frequency (%)
0      3254   32.5%
1      4295   43.0%
2      606    6.1%
3      402    4.0%
4      412    4.1%
5      249    2.5%
6      179    1.8%
7      105    1.1%
8      95     0.9%
9      77     0.8%

Maximum 10 values

Value  Count  Frequency (%)
67     1      < 0.1%
56     2      < 0.1%
52     1      < 0.1%
49     1      < 0.1%
48     1      < 0.1%
45     1      < 0.1%
43     1      < 0.1%
38     1      < 0.1%
37     1      < 0.1%
36     4      < 0.1%

안내금액
Real number (ℝ)

HIGH CORRELATION, SKEWED, ZEROS

Distinct: 4650
Distinct (%): 46.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 83241.073
Minimum: 0
Maximum: 18438240
Zeros: 1412
Zeros (%): 14.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4460
Median: 16010
Q3: 46432.5
95-th percentile: 313802
Maximum: 18438240
Range: 18438240
Interquartile range (IQR): 41972.5
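
The last two quantile statistics are differences of the others. A short sketch of that arithmetic, using the usual definitions of range and IQR:

```python
# Figures copied from the quantile statistics of 안내금액 above.
minimum, maximum = 0, 18438240
q1, q3 = 4460, 46432.5

value_range = maximum - minimum  # full spread of the data
iqr = q3 - q1                    # spread of the middle 50%

print(value_range)  # 18438240
print(iqr)          # 41972.5
```

The gap between the two figures reflects the long right tail: the middle half of the values spans about 42 thousand, while the full range exceeds 18 million.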

Descriptive statistics

Standard deviation: 381723.69
Coefficient of variation (CV): 4.5857613
Kurtosis: 694.57151
Mean: 83241.073
Median Absolute Deviation (MAD): 14600
Skewness: 20.568458
Sum: 8.3241073 × 10^8
Variance: 1.4571298 × 10^11
Monotonicity: Not monotonic
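
A skewness of about 20.6 indicates a very long right tail. The sketch below computes the bias-adjusted sample skewness on toy data. It assumes, without confirmation from the report, that this is the estimator behind the reported figure; the function name and the data are illustrative.

```python
import math

def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness; needs at least 3 values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # unadjusted skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample adjustment

# Toy data: symmetric values give 0, one large value gives positive skew.
print(sample_skewness([1, 2, 3, 4, 5]))
print(sample_skewness([0, 0, 0, 0, 100]))
```

For the second list the result is the square root of 5, roughly 2.236. A single outlier among five values already produces strong positive skew, which is consistent with the few multi-million amounts in this column driving the large value.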
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0                    1412   14.1%
1410                 130    1.3%
1260                 101    1.0%
1130                 63     0.6%
9210                 28     0.3%
3080                 28     0.3%
6660                 27     0.3%
2750                 27     0.3%
7810                 27     0.3%
4950                 25     0.2%
Other values (4640)  8132   81.3%

Minimum 10 values

Value  Count  Frequency (%)
0      1412   14.1%
90     1      < 0.1%
430    1      < 0.1%
570    1      < 0.1%
650    1      < 0.1%
680    1      < 0.1%
770    1      < 0.1%
900    2      < 0.1%
910    1      < 0.1%
920    6      0.1%

Maximum 10 values

Value     Count  Frequency (%)
18438240  1      < 0.1%
9562040   1      < 0.1%
8684980   1      < 0.1%
7890620   1      < 0.1%
7544720   1      < 0.1%
7032590   1      < 0.1%
6933120   1      < 0.1%
6872830   1      < 0.1%
6496530   1      < 0.1%
6041550   1      < 0.1%

부과유형
Categorical

IMBALANCE

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value             Count
<NA>              5258
가정용            4047
일반용            575
일반혼합용        106
전용공업용        10
Other values (4)  4

Length

Max length: 7
Median length: 4
Mean length: 3.5497
Min length: 3

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: 가정용
3rd row: 일반용
4th row: <NA>
5th row: 일반용

Common Values

Value          Count  Frequency (%)
<NA>           5258   52.6%
가정용         4047   40.5%
일반용         575    5.8%
일반혼합용     106    1.1%
전용공업용     10     0.1%
혼합용         1      < 0.1%
일반용         1      < 0.1%
대중탕용       1      < 0.1%
가정용/일반용  1      < 0.1%
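
The 57.8% imbalance reported for this variable can be reproduced from the nine counts in the Common Values table above. The sketch assumes the score is one minus the Shannon entropy divided by its maximum, log2 of the number of classes. That definition matches the reported figure here, which suggests but does not confirm that it is the measure used. The function name is illustrative.

```python
import math

def imbalance_score(counts):
    """1 - entropy / max entropy: 0 is perfectly balanced, 1 is one class."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # convention chosen for this sketch
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Counts copied from the Common Values table of 부과유형.
counts = [5258, 4047, 575, 106, 10, 1, 1, 1, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")  # 57.8%
```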

Length

Histogram of lengths of the category

Common Values (Plot)

Value          Count  Frequency (%)
na             5258   52.6%
가정용         4047   40.5%
일반용         576    5.8%
일반혼합용     106    1.1%
전용공업용     10     0.1%
혼합용         1      < 0.1%
대중탕용       1      < 0.1%
가정용/일반용  1      < 0.1%

데이터기준일
Date

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2021-12-17 00:00:00
Maximum: 2021-12-17 00:00:00
Histogram with fixed size bins (bins=1)

Interactions

[Four interaction plots; images not preserved]

Correlations

          분류   안내수  안내금액  부과유형
분류      1.000  0.215   0.039     0.240
안내수    0.215  1.000   0.038     0.111
안내금액  0.039  0.038   1.000     0.060
부과유형  0.240  0.111   0.060     1.000

          부과유형  분류
부과유형  1.000     0.116
분류      0.116     1.000

          안내수  안내금액  분류   부과유형
안내수    1.000   0.507     0.081  0.050
안내금액  0.507   1.000     0.015  0.027
분류      0.081   0.015     1.000  0.116
부과유형  0.050   0.027     0.116  1.000
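
To see which pairs in a matrix like the last one would be flagged as highly correlated, one can scan the upper triangle against a cut-off. The sketch below does this; the 0.5 threshold is an assumption made for illustration and is not stated in the report.

```python
from itertools import combinations

# Values copied from the last correlation matrix above.
names = ["안내수", "안내금액", "분류", "부과유형"]
matrix = [
    [1.000, 0.507, 0.081, 0.050],
    [0.507, 1.000, 0.015, 0.027],
    [0.081, 0.015, 1.000, 0.116],
    [0.050, 0.027, 0.116, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off, for illustration only

def high_pairs(names, matrix, threshold):
    """Return (name_a, name_b, value) for each pair at or above threshold."""
    return [
        (names[i], names[j], matrix[i][j])
        for i, j in combinations(range(len(names)), 2)
        if abs(matrix[i][j]) >= threshold
    ]

print(high_pairs(names, matrix, THRESHOLD))
```

With this cut-off the only pair returned is 안내수 and 안내금액 (0.507), the same pair named in the High correlation alerts.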

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
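
A per-column nullity count can be sketched as below. The rows are hypothetical and the function name is illustrative. The example also shows one possible reason, offered only as an assumption, why the report lists 0 missing cells while 부과유형 shows `<NA>` as its most common value: a placeholder stored as text is an ordinary string, not a null.

```python
def missing_per_column(rows, columns):
    """Count cells that are actually null (None) in each column."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

# Hypothetical rows: one text placeholder, one real null.
rows = [
    {"분류": "요금안내", "부과유형": "<NA>"},   # placeholder text, not null
    {"분류": "자동납부", "부과유형": "가정용"},
    {"분류": "기타", "부과유형": None},         # real null
]
print(missing_per_column(rows, ["분류", "부과유형"]))
```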

Sample

First rows

(index)  수용가번호       날짜        분류      안내수+안내금액 (unseparated)  부과유형    데이터기준일
54913    017-014-1125-70  2019-09-25  지침      00                             <NA>        2021-12-17
86186    025-501-1250-00  2020-12-15  자동납부  13120                          가정용      2021-12-17
63817    016-072-3100-02  2020-01-13  요금체납  120950                         일반용      2021-12-17
52796    013-051-1900-00  2019-09-02  요금안내  134860                         <NA>        2021-12-17
35106    020-017-0264-00  2018-12-11  요금안내  123740                         일반용      2021-12-17
15318    015-032-0030-05  2017-05-17  요금안내  00                             <NA>        2021-12-17
37531    030-033-0500-02  2019-01-11  자동납부  08530                          가정용      2021-12-17
61598    025-501-5400-00  2019-12-11  자동납부  111760                         가정용      2021-12-17
6458     026-300-0250-70  2016-04-11  기타      00                             <NA>        2021-12-17
51154    019-010-0056-00  2019-08-09  요금안내  0148940                        일반혼합용  2021-12-17

Last rows

(index)  수용가번호       날짜        분류      안내수+안내금액 (unseparated)  부과유형    데이터기준일
33644    030-417-0300-00  2018-11-09  자동납부  113920                         일반용      2021-12-17
93112    029-100-0607-01  2021-03-09  이사정산  256030                         <NA>        2021-12-17
40799    033-400-5670-00  2019-03-12  요금안내  018860                         가정용      2021-12-17
20921    024-331-2700-00  2018-01-22  요금안내  5204430                        <NA>        2021-12-17
46960    007-090-1029-00  2019-06-12  자동납부  01260                          가정용      2021-12-17
35757    025-501-0460-00  2018-12-11  자동납부  18610                          가정용      2021-12-17
23599    020-016-0125-00  2018-04-30  요금안내  179680                         <NA>        2021-12-17
98710    016-416-0200-02  2021-04-21  기타      00                             <NA>        2021-12-17
64209    021-824-0330-50  2020-01-13  자동납부  16660                          가정용      2021-12-17
68270    029-400-1900-40  2020-03-05  요금안내  198830                         <NA>        2021-12-17

Duplicate rows

Most frequently occurring

(index)  수용가번호       날짜        분류      안내수+안내금액 (unseparated)  부과유형    데이터기준일  # duplicates
35       030-020-0267-62  2019-01-31  요금안내  17360                          <NA>        2021-12-17    3
0        002-209-0600-02  2020-02-25  요금체납  218980                         <NA>        2021-12-17    2
1        004-110-0800-00  2018-10-15  자동납부  133240                         일반혼합용  2021-12-17    2
2        005-020-1800-00  2018-03-05  기타      00                             <NA>        2021-12-17    2
3        006-020-0990-70  2017-04-27  요금안내  00                             <NA>        2021-12-17    2
4        007-090-1017-00  2017-12-04  요금안내  1048080                        <NA>        2021-12-17    2
5        007-090-1086-00  2020-01-09  자동납부  00                             <NA>        2021-12-17    2
6        008-035-0400-01  2020-04-17  요금체납  333220                         <NA>        2021-12-17    2
7        009-100-5400-00  2018-07-11  자동납부  12270                          가정용      2021-12-17    2
8        010-096-5600-00  2017-08-30  요금안내  00                             <NA>        2021-12-17    2
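
A table like the one above groups identical rows and reports how often each group occurs. A minimal sketch of that grouping with the standard library follows; the rows are hypothetical tuples and the function name is illustrative, and the profiling tool may count duplicates differently.

```python
from collections import Counter

def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical rows, not taken from the dataset.
rows = [
    ("A-1", "2019-01-31", 1),
    ("A-1", "2019-01-31", 1),
    ("A-1", "2019-01-31", 1),
    ("B-2", "2020-02-25", 2),
    ("B-2", "2020-02-25", 2),
    ("C-3", "2018-03-05", 0),
]
groups = duplicate_groups(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```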