Overview

Dataset statistics

Number of variables: 6
Number of observations: 269
Missing cells: 236
Missing cells (%): 14.6%
Duplicate rows: 19
Duplicate rows (%): 7.1%
Total size in memory: 13.5 KiB
Average record size in memory: 51.5 B
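
The two percentages follow from the counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (observations × variables) and the duplicate percentage over all rows; the variable names are illustrative and not part of the report:

```python
# Counts copied from the dataset statistics.
n_variables = 6
n_observations = 269
missing_cells = 236
duplicate_rows = 19

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

The 236 missing cells are also consistent with the alerts: four columns with 59 missing values each.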

Variable types

Categorical: 2
Numeric: 2
DateTime: 2

Dataset

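Description: Data on selective payments (선택납부) made through Osan City's local tax ARS card payment system, providing fields such as the total amount and the number of selective payments.
URL: https://www.data.go.kr/data/15081647/fileData.do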

Alerts

Dataset has 19 (7.1%) duplicate rows [Duplicates]
과세구분 is highly overall correlated with 총합계금액 and 2 other fields [High correlation]
시구분 is highly overall correlated with 총합계금액 and 2 other fields [High correlation]
총합계금액 is highly overall correlated with 과세구분 and 1 other fields [High correlation]
횟수 is highly overall correlated with 과세구분 and 1 other fields [High correlation]
총합계금액 has 59 (21.9%) missing values [Missing]
횟수 has 59 (21.9%) missing values [Missing]
등록일자 has 59 (21.9%) missing values [Missing]
유지만료일자 has 59 (21.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 12:31:03.538167
Analysis finished: 2023-12-12 12:31:04.626686
Duration: 1.09 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
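
The reported duration can be checked from the two timestamps. A minimal sketch using only the standard library; the variable names are illustrative:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 12:31:03.538167")
finished = datetime.fromisoformat("2023-12-12 12:31:04.626686")

# Elapsed wall-clock time of the analysis, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```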

Variables

과세구분 (tax classification)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.4386617
Min length: 2
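
The length statistics suggest why this column reports 0 missing values while still listing 59 `<NA>` entries. A minimal sketch, assuming `<NA>` was read as a literal 4-character string rather than as a true missing value; the dictionary below is built from the counts in this report:

```python
# Category counts copied from the report; treating "<NA>" as a literal
# string is an assumption about how the column was parsed.
counts = {"통합": 210, "<NA>": 59}

total = sum(counts.values())
# Weighted mean of the string lengths (2 characters vs. 4 characters).
mean_length = sum(len(value) * n for value, n in counts.items()) / total

print(total)
print(round(mean_length, 7))
```

Under the same assumption, 시구분 (values `41370` and `<NA>`) gives (210 × 5 + 59 × 4) / 269, which agrees with its reported mean length of 4.7806691.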

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 통합
2nd row: 통합
3rd row: 통합
4th row: 통합
5th row: 통합

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 통합 | 210 | 78.1% |
| <NA> | 59 | 21.9% |

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; 통합 210 (78.1%), na 59 (21.9%)]

총합계금액 (total amount)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 99
Distinct (%): 47.1%
Missing: 59
Missing (%): 21.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 801470.24
Minimum: 15450
Maximum: 8506860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 15450
5-th percentile: 100000
Q1: 181540
Median: 300000
Q3: 747657.5
95-th percentile: 3000000
Maximum: 8506860
Range: 8491410
Interquartile range (IQR): 566117.5

Descriptive statistics

Standard deviation: 1355436.2
Coefficient of variation (CV): 1.6911872
Kurtosis: 14.242091
Mean: 801470.24
Median Absolute Deviation (MAD): 200000
Skewness: 3.5978903
Sum: 1.6830875 × 10^8
Variance: 1.8372074 × 10^12
Monotonicity: Not monotonic
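
Several of these figures are derived from others in the report, which gives a quick consistency check. A minimal sketch, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × non-missing count); the inputs are the rounded values printed in the report, so the results agree only up to rounding:

```python
import math

# Values copied from the report for 총합계금액.
n = 269 - 59                      # non-missing observations
mean = 801470.24
std = 1355436.2
minimum, maximum = 15450, 8506860
q1, q3 = 181540, 747657.5

value_range = maximum - minimum   # range
iqr = q3 - q1                     # interquartile range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance
total = mean * n                  # sum

for name, value in [("range", value_range), ("iqr", iqr), ("cv", cv),
                    ("variance", variance), ("sum", total)]:
    print(name, value)
```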
[Figure: Histogram with fixed size bins (bins=50)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 100000 | 27 | 10.0% |
| 1000000 | 18 | 6.7% |
| 300000 | 16 | 5.9% |
| 500000 | 12 | 4.5% |
| 2000000 | 11 | 4.1% |
| 200000 | 6 | 2.2% |
| 560310 | 4 | 1.5% |
| 3000000 | 3 | 1.1% |
| 243040 | 3 | 1.1% |
| 4700000 | 3 | 1.1% |
| Other values (89) | 107 | 39.8% |
| (Missing) | 59 | 21.9% |

Minimum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 15450 | 1 | 0.4% |
| 40000 | 1 | 0.4% |
| 42410 | 1 | 0.4% |
| 55760 | 1 | 0.4% |
| 66580 | 1 | 0.4% |
| 71310 | 1 | 0.4% |
| 88290 | 1 | 0.4% |
| 100000 | 27 | 10.0% |
| 104230 | 1 | 0.4% |
| 105810 | 1 | 0.4% |

Maximum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 8506860 | 1 | 0.4% |
| 8000000 | 2 | 0.7% |
| 6000000 | 1 | 0.4% |
| 5628860 | 2 | 0.7% |
| 5000000 | 1 | 0.4% |
| 4700000 | 3 | 1.1% |
| 3000000 | 3 | 1.1% |
| 2889800 | 1 | 0.4% |
| 2135950 | 1 | 0.4% |
| 2000000 | 11 | 4.1% |

횟수 (count)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 12
Distinct (%): 5.7%
Missing: 59
Missing (%): 21.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0047619
Minimum: 1
Maximum: 27
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 6
Maximum: 27
Range: 26
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.7728996
Coefficient of variation (CV): 1.3831566
Kurtosis: 40.31584
Mean: 2.0047619
Median Absolute Deviation (MAD): 0
Skewness: 5.5931787
Sum: 421
Variance: 7.6889724
Monotonicity: Not monotonic
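[Figure: Histogram with fixed size bins (bins=12)]

Common values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 146 | 54.3% |
| 2 | 25 | 9.3% |
| 3 | 15 | 5.6% |
| 4 | 11 | 4.1% |
| 10 | 3 | 1.1% |
| 6 | 3 | 1.1% |
| 9 | 2 | 0.7% |
| 11 | 1 | 0.4% |
| 20 | 1 | 0.4% |
| 7 | 1 | 0.4% |
| Other values (2) | 2 | 0.7% |
| (Missing) | 59 | 21.9% |

Minimum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 146 | 54.3% |
| 2 | 25 | 9.3% |
| 3 | 15 | 5.6% |
| 4 | 11 | 4.1% |
| 5 | 1 | 0.4% |
| 6 | 3 | 1.1% |
| 7 | 1 | 0.4% |
| 9 | 2 | 0.7% |
| 10 | 3 | 1.1% |
| 11 | 1 | 0.4% |

Maximum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 27 | 1 | 0.4% |
| 20 | 1 | 0.4% |
| 11 | 1 | 0.4% |
| 10 | 3 | 1.1% |
| 9 | 2 | 0.7% |
| 7 | 1 | 0.4% |
| 6 | 3 | 1.1% |
| 5 | 1 | 0.4% |
| 4 | 11 | 4.1% |
| 3 | 15 | 5.6% |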
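
With only 12 distinct values, the minimum and maximum tables together list the full distribution of 횟수, so the summary statistics can be recomputed from the counts. A minimal sketch using the standard library, assuming the reported variance is the sample variance (divisor n − 1); the `freq` dictionary is assembled from the tables in this report:

```python
import statistics

# Value -> count, combined from the minimum and maximum tables
# (the two "Other values" in the common-values table are 5 and 27).
freq = {1: 146, 2: 25, 3: 15, 4: 11, 5: 1, 6: 3,
        7: 1, 9: 2, 10: 3, 11: 1, 20: 1, 27: 1}

# Expand the frequency table back into individual observations.
values = [v for v, n in freq.items() for _ in range(n)]

n_obs = len(values)
total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)   # sample variance (n - 1)

print(n_obs, total, mean, median, variance)
```

The recomputed count (210), sum (421), mean, median and variance agree with the values reported for this variable.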

시구분 (city code)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Length

Max length: 5
Median length: 5
Mean length: 4.7806691
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 41370
2nd row: 41370
3rd row: 41370
4th row: 41370
5th row: 41370

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 41370 | 210 | 78.1% |
| <NA> | 59 | 21.9% |

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; 41370 210 (78.1%), na 59 (21.9%)]

등록일자 (registration date)
Date

MISSING 

Distinct: 115
Distinct (%): 54.8%
Missing: 59
Missing (%): 21.9%
Memory size: 2.2 KiB
Minimum: 2022-01-03 00:00:00
Maximum: 2022-12-30 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]

유지만료일자 (retention expiry date)
Date

MISSING 

Distinct: 113
Distinct (%): 53.8%
Missing: 59
Missing (%): 21.9%
Memory size: 2.2 KiB
Minimum: 2022-01-04 00:00:00
Maximum: 2022-12-31 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]

Interactions


Correlations

| | 총합계금액 | 횟수 |
|---|---|---|
| 총합계금액 | 1.000 | 0.411 |
| 횟수 | 0.411 | 1.000 |

| | 과세구분 | 시구분 |
|---|---|---|
| 과세구분 | 1.000 | 1.000 |
| 시구분 | 1.000 | 1.000 |

| | 총합계금액 | 횟수 | 과세구분 | 시구분 |
|---|---|---|---|---|
| 총합계금액 | 1.000 | 0.187 | 1.000 | 1.000 |
| 횟수 | 0.187 | 1.000 | 1.000 | 1.000 |
| 과세구분 | 1.000 | 1.000 | 1.000 | 1.000 |
| 시구분 | 1.000 | 1.000 | 1.000 | 1.000 |
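
The report does not name the association measure behind the 1.000 entries for the categorical fields. One plausible reading, assumed here, is that 과세구분 and 시구분 split the rows into the same two groups (210 labelled rows and 59 `<NA>` rows), so any categorical association measure saturates. A minimal sketch with an uncorrected Cramér's V; the helper `cramers_v` is hypothetical and is not the profiler's own implementation:

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Rows: 과세구분 in {통합, <NA>}; columns: 시구분 in {41370, <NA>}.
# Assumption: the 59 <NA> rows coincide in both columns.
table = [[210, 0],
         [0, 59]]
v = cramers_v(table)
print(round(v, 3))
```

If this reading is right, the perfect correlations reflect the 59 empty rows rather than a relationship between tax classification and city code, since each column holds a single real value.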

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

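First rows

| | 과세구분 | 총합계금액 | 횟수 | 시구분 | 등록일자 | 유지만료일자 |
|---|---|---|---|---|---|---|
| 0 | 통합 | 133750 | 1 | 41370 | 2022-01-03 | 2022-01-04 |
| 1 | 통합 | 577210 | 11 | 41370 | 2022-01-03 | 2022-01-04 |
| 2 | 통합 | 461640 | 1 | 41370 | 2022-01-03 | 2022-01-04 |
| 3 | 통합 | 461640 | 1 | 41370 | 2022-01-03 | 2022-01-04 |
| 4 | 통합 | 133280 | 1 | 41370 | 2022-01-04 | 2022-01-05 |
| 5 | 통합 | 461640 | 1 | 41370 | 2022-01-10 | 2022-01-11 |
| 6 | 통합 | 267120 | 1 | 41370 | 2022-01-11 | 2022-01-12 |
| 7 | 통합 | 294950 | 1 | 41370 | 2022-01-13 | 2022-01-14 |
| 8 | 통합 | 560310 | 1 | 41370 | 2022-01-13 | 2022-01-14 |
| 9 | 통합 | 560310 | 1 | 41370 | 2022-01-13 | 2022-01-14 |

Last rows

| | 과세구분 | 총합계금액 | 횟수 | 시구분 | 등록일자 | 유지만료일자 |
|---|---|---|---|---|---|---|
| 259 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 260 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 261 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 262 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 263 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 264 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 265 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 266 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 267 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |
| 268 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> |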

Duplicate rows

Most frequently occurring

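| | 과세구분 | 총합계금액 | 횟수 | 시구분 | 등록일자 | 유지만료일자 | # duplicates |
|---|---|---|---|---|---|---|---|
| 18 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 59 |
| 13 | 통합 | 2000000 | 4 | 41370 | 2022-01-24 | 2022-01-25 | 5 |
| 5 | 통합 | 560310 | 1 | 41370 | 2022-01-13 | 2022-01-14 | 4 |
| 2 | 통합 | 300000 | 1 | 41370 | 2022-04-01 | 2022-04-02 | 3 |
| 11 | 통합 | 1000000 | 10 | 41370 | 2022-05-03 | 2022-05-04 | 3 |
| 12 | 통합 | 2000000 | 1 | 41370 | 2022-12-20 | 2022-12-21 | 3 |
| 15 | 통합 | 4700000 | 1 | 41370 | 2022-02-17 | 2022-02-17 | 3 |
| 0 | 통합 | 131520 | 1 | 41370 | 2022-01-21 | 2022-01-22 | 2 |
| 1 | 통합 | 249700 | 1 | 41370 | 2022-02-21 | 2022-02-21 | 2 |
| 3 | 통합 | 461640 | 1 | 41370 | 2022-01-03 | 2022-01-04 | 2 |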
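
The "# duplicates" column counts how many times an identical row occurs. A minimal sketch of that counting with `collections.Counter`, applied only to the first ten rows from the Sample section, so the counts are lower than the full-dataset figures in the table:

```python
from collections import Counter

# First ten rows from the Sample section, as
# (과세구분, 총합계금액, 횟수, 시구분, 등록일자, 유지만료일자).
rows = [
    ("통합", 133750, 1, "41370", "2022-01-03", "2022-01-04"),
    ("통합", 577210, 11, "41370", "2022-01-03", "2022-01-04"),
    ("통합", 461640, 1, "41370", "2022-01-03", "2022-01-04"),
    ("통합", 461640, 1, "41370", "2022-01-03", "2022-01-04"),
    ("통합", 133280, 1, "41370", "2022-01-04", "2022-01-05"),
    ("통합", 461640, 1, "41370", "2022-01-10", "2022-01-11"),
    ("통합", 267120, 1, "41370", "2022-01-11", "2022-01-12"),
    ("통합", 294950, 1, "41370", "2022-01-13", "2022-01-14"),
    ("통합", 560310, 1, "41370", "2022-01-13", "2022-01-14"),
    ("통합", 560310, 1, "41370", "2022-01-13", "2022-01-14"),
]

# Count identical rows and keep only those seen more than once.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```

Rows that share an amount but differ in date, such as the 461640 payment registered on 2022-01-10, are not counted as duplicates.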