Overview

Dataset statistics

Number of variables: 11
Number of observations: 30
Missing cells: 38
Missing cells (%): 11.5%
Duplicate rows: 1
Duplicate rows (%): 3.3%
Total size in memory: 2.9 KiB
Average record size in memory: 99.4 B
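
The headline percentages can be cross-checked against the per-variable missing counts listed under Alerts. This is a minimal sketch, assuming the 38 missing cells are the 30 empty cells of Unnamed: 10 plus 2 missing values in each of the four numeric variables. The variable names in the code are illustrative.

```python
# Counts copied from this report.
n_variables = 11
n_observations = 30

# Missing values per variable, as listed under Alerts (assumed to be exhaustive).
missing_per_variable = {
    "체납건수": 2,
    "체납금액": 2,
    "누적체납건수": 2,
    "누적체납금액": 2,
    "Unnamed: 10": 30,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * 1 / n_observations  # 1 duplicate row out of 30

print(missing_cells, f"{missing_pct:.1f}%", f"{duplicate_pct:.1f}%")
```

Under that assumption the totals agree with the reported 38 missing cells, 11.5% and 3.3%.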

Variable types

Categorical: 6
Numeric: 4
Unsupported: 1

Dataset

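Description: Local tax delinquency status, providing the tax year (과세년도), tax item name (세목명), delinquent amount bracket (체납액구간), number of delinquent cases (체납건수), delinquent amount (체납금액), cumulative number of delinquent cases (누적체납건수), cumulative delinquent amount (누적체납금액), and related fields.
URL: https://www.data.go.kr/data/15079309/fileData.do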

Alerts

Dataset has 1 (3.3%) duplicate rows [Duplicates]
시군구명 is highly overall correlated with 체납건수 and 8 other fields [High correlation]
체납액구간 is highly overall correlated with 체납금액 and 4 other fields [High correlation]
과세년도 is highly overall correlated with 체납건수 and 8 other fields [High correlation]
자치단체코드 is highly overall correlated with 체납건수 and 8 other fields [High correlation]
시도명 is highly overall correlated with 체납건수 and 8 other fields [High correlation]
세목명 is highly overall correlated with 시도명 and 3 other fields [High correlation]
체납건수 is highly overall correlated with 누적체납건수 and 4 other fields [High correlation]
체납금액 is highly overall correlated with 누적체납금액 and 5 other fields [High correlation]
누적체납건수 is highly overall correlated with 체납건수 and 4 other fields [High correlation]
누적체납금액 is highly overall correlated with 체납금액 and 4 other fields [High correlation]
시도명 is highly imbalanced (64.7%) [Imbalance]
시군구명 is highly imbalanced (64.7%) [Imbalance]
자치단체코드 is highly imbalanced (64.7%) [Imbalance]
과세년도 is highly imbalanced (64.7%) [Imbalance]
체납건수 has 2 (6.7%) missing values [Missing]
체납금액 has 2 (6.7%) missing values [Missing]
누적체납건수 has 2 (6.7%) missing values [Missing]
누적체납금액 has 2 (6.7%) missing values [Missing]
Unnamed: 10 has 30 (100.0%) missing values [Missing]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
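
The 64.7% imbalance figures are consistent with a score of one minus the normalized Shannon entropy of the category counts. The sketch below works under that assumption and is not a statement of how the profiling tool computes the score. `imbalance_score` is a hypothetical helper, and the 28/2 split is taken from the Common Values tables of the four flagged variables.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for category counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        # Assumption for this sketch: a single category scores 0.
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


# 28 rows share one value and 2 rows are <NA>, as in 시도명.
score = imbalance_score([28, 2])
print(f"{score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single category dominates.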

Reproduction

Analysis started: 2023-12-12 19:32:04.632987
Analysis finished: 2023-12-12 19:32:07.741578
Duration: 3.11 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시도명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Top values: 전라남도 (28), <NA> (2)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전라남도
2nd row: 전라남도
3rd row: 전라남도
4th row: 전라남도
5th row: 전라남도

Common Values

Value | Count | Frequency (%)
전라남도 | 28 | 93.3%
<NA> | 2 | 6.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
전라남도 | 28 | 93.3%
na | 2 | 6.7%

시군구명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Top values: 강진군 (28), <NA> (2)

Length

Max length: 4
Median length: 3
Mean length: 3.0666667
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강진군
2nd row: 강진군
3rd row: 강진군
4th row: 강진군
5th row: 강진군

Common Values

Value | Count | Frequency (%)
강진군 | 28 | 93.3%
<NA> | 2 | 6.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
강진군 | 28 | 93.3%
na | 2 | 6.7%

자치단체코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Top values: 46810 (28), <NA> (2)

Length

Max length: 5
Median length: 5
Mean length: 4.9333333
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 46810
2nd row: 46810
3rd row: 46810
4th row: 46810
5th row: 46810

Common Values

Value | Count | Frequency (%)
46810 | 28 | 93.3%
<NA> | 2 | 6.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
46810 | 28 | 93.3%
na | 2 | 6.7%

과세년도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Top values: 2022 (28), <NA> (2)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022
2nd row: 2022
3rd row: 2022
4th row: 2022
5th row: 2022

Common Values

Value | Count | Frequency (%)
2022 | 28 | 93.3%
<NA> | 2 | 6.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
2022 | 28 | 93.3%
na | 2 | 6.7%

세목명
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 26.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Top values: 재산세, 지방소득세, 취득세, 자동차세, 주민세, Other values (3)

Length

Max length: 7
Median length: 3
Mean length: 3.8333333
Min length: 3

Unique

Unique: 2
Unique (%): 6.7%

Sample

1st row: 등록면허세
2nd row: 자동차세
3rd row: 자동차세
4th row: 자동차세
5th row: 재산세

Common Values

Value | Count | Frequency (%)
재산세 | 7 | 23.3%
지방소득세 | 7 | 23.3%
취득세 | 6 | 20.0%
자동차세 | 3 | 10.0%
주민세 | 3 | 10.0%
<NA> | 2 | 6.7%
등록면허세 | 1 | 3.3%
지역자원시설세 | 1 | 3.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
재산세 | 7 | 23.3%
지방소득세 | 7 | 23.3%
취득세 | 6 | 20.0%
자동차세 | 3 | 10.0%
주민세 | 3 | 10.0%
na | 2 | 6.7%
등록면허세 | 1 | 3.3%
지역자원시설세 | 1 | 3.3%

체납액구간
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 33.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Top values: 10만원 미만, 10만원~30만원미만, 30만원~50만원미만, 1백만원~3백만원미만, 50만원~1백만원미만, Other values (5)

Length

Max length: 11
Median length: 11
Mean length: 9.5666667
Min length: 4

Unique

Unique: 2
Unique (%): 6.7%

Sample

1st row: 10만원 미만
2nd row: 10만원 미만
3rd row: 10만원~30만원미만
4th row: 30만원~50만원미만
5th row: 10만원 미만

Common Values

Value | Count | Frequency (%)
10만원 미만 | 7 | 23.3%
10만원~30만원미만 | 5 | 16.7%
30만원~50만원미만 | 4 | 13.3%
1백만원~3백만원미만 | 3 | 10.0%
50만원~1백만원미만 | 3 | 10.0%
3백만원~5백만원미만 | 2 | 6.7%
5백만원~1천만원미만 | 2 | 6.7%
<NA> | 2 | 6.7%
1천만원~3천만원미만 | 1 | 3.3%
5천만원~1억원미만 | 1 | 3.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
10만원 | 7 | 18.9%
미만 | 7 | 18.9%
10만원~30만원미만 | 5 | 13.5%
30만원~50만원미만 | 4 | 10.8%
1백만원~3백만원미만 | 3 | 8.1%
50만원~1백만원미만 | 3 | 8.1%
3백만원~5백만원미만 | 2 | 5.4%
5백만원~1천만원미만 | 2 | 5.4%
na | 2 | 5.4%
1천만원~3천만원미만 | 1 | 2.7%

체납건수
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 17
Distinct (%): 60.7%
Missing: 2
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 105.5
Minimum: 1
Maximum: 1513
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2.5
median: 6.5
Q3: 30.75
95-th percentile: 527.7
Maximum: 1513
Range: 1512
Interquartile range (IQR): 28.25

Descriptive statistics

Standard deviation: 307.04029
Coefficient of variation (CV): 2.9103345
Kurtosis: 17.624148
Mean: 105.5
Median Absolute Deviation (MAD): 5.5
Skewness: 4.0705953
Sum: 2954
Variance: 94273.741
Monotonicity: Not monotonic
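
Several of the figures above follow from one another by simple arithmetic. The sketch below recomputes the mean, standard deviation, IQR, range and coefficient of variation for 체납건수 from the other reported values. It assumes the mean is taken over the 28 non-missing observations and that CV is the standard deviation divided by the mean. The variable names are illustrative.

```python
import math

# Values copied from the 체납건수 statistics in this report.
n_valid = 30 - 2            # observations minus missing values
total = 2954                # Sum
q1, q3 = 2.5, 30.75         # Q1 and Q3
minimum, maximum = 1, 1513
variance = 94273.741

mean = total / n_valid              # reported Mean: 105.5
std = math.sqrt(variance)           # reported Standard deviation: 307.04029
iqr = q3 - q1                       # reported IQR: 28.25
value_range = maximum - minimum     # reported Range: 1512
cv = std / mean                     # reported CV: 2.9103345

print(mean, iqr, value_range, round(std, 2), round(cv, 4))
```

A CV near 2.9 together with a median of 6.5 against a mean of 105.5 reflects how strongly a few large counts (1513, 674) pull the distribution to the right.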
[Figure: Histogram with fixed size bins (bins=17)]

Common values

Value | Count | Frequency (%)
1 | 7 | 23.3%
3 | 3 | 10.0%
8 | 2 | 6.7%
5 | 2 | 6.7%
9 | 2 | 6.7%
7 | 1 | 3.3%
4 | 1 | 3.3%
22 | 1 | 3.3%
60 | 1 | 3.3%
136 | 1 | 3.3%
Other values (7) | 7 | 23.3%
(Missing) | 2 | 6.7%

Minimum 10 values

Value | Count | Frequency (%)
1 | 7 | 23.3%
3 | 3 | 10.0%
4 | 1 | 3.3%
5 | 2 | 6.7%
6 | 1 | 3.3%
7 | 1 | 3.3%
8 | 2 | 6.7%
9 | 2 | 6.7%
12 | 1 | 3.3%
22 | 1 | 3.3%

Maximum 10 values

Value | Count | Frequency (%)
1513 | 1 | 3.3%
674 | 1 | 3.3%
256 | 1 | 3.3%
147 | 1 | 3.3%
136 | 1 | 3.3%
60 | 1 | 3.3%
57 | 1 | 3.3%
22 | 1 | 3.3%
12 | 1 | 3.3%
9 | 2 | 6.7%

체납금액
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 28
Distinct (%): 100.0%
Missing: 2
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 9856011.1
Minimum: 5410
Maximum: 65859290
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.0 B

Quantile statistics

Minimum: 5410
5-th percentile: 331777
Q1: 1709647.5
median: 4352445
Q3: 9951725
95-th percentile: 32413943
Maximum: 65859290
Range: 65853880
Interquartile range (IQR): 8242077.5

Descriptive statistics

Standard deviation: 14229224
Coefficient of variation (CV): 1.4437102
Kurtosis: 8.5227254
Mean: 9856011.1
Median Absolute Deviation (MAD): 3202915
Skewness: 2.7131647
Sum: 2.7596831 × 10^8
Variance: 2.0247081 × 10^14
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=28)]

Common values

Value | Count | Frequency (%)
3958960 | 1 | 3.3%
65859290 | 1 | 3.3%
1265310 | 1 | 3.3%
24894030 | 1 | 3.3%
3983460 | 1 | 3.3%
1189090 | 1 | 3.3%
280460 | 1 | 3.3%
5410 | 1 | 3.3%
36264150 | 1 | 3.3%
5976220 | 1 | 3.3%
Other values (18) | 18 | 60.0%
(Missing) | 2 | 6.7%

Minimum 10 values

Value | Count | Frequency (%)
5410 | 1 | 3.3%
280460 | 1 | 3.3%
427080 | 1 | 3.3%
1189090 | 1 | 3.3%
1229070 | 1 | 3.3%
1265310 | 1 | 3.3%
1544520 | 1 | 3.3%
1764690 | 1 | 3.3%
1980460 | 1 | 3.3%
2993030 | 1 | 3.3%

Maximum 10 values

Value | Count | Frequency (%)
65859290 | 1 | 3.3%
36264150 | 1 | 3.3%
25263560 | 1 | 3.3%
24894030 | 1 | 3.3%
23699940 | 1 | 3.3%
11793870 | 1 | 3.3%
10833230 | 1 | 3.3%
9657890 | 1 | 3.3%
8647840 | 1 | 3.3%
8507020 | 1 | 3.3%

누적체납건수
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 20
Distinct (%): 71.4%
Missing: 2
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 211.67857
Minimum: 1
Maximum: 3051
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 11
Q3: 46.25
95-th percentile: 1056.25
Maximum: 3051
Range: 3050
Interquartile range (IQR): 43.25

Descriptive statistics

Standard deviation: 620.01094
Coefficient of variation (CV): 2.9290208
Kurtosis: 17.503533
Mean: 211.67857
Median Absolute Deviation (MAD): 9
Skewness: 4.0467013
Sum: 5927
Variance: 384413.56
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=20)]

Common values

Value | Count | Frequency (%)
1 | 4 | 13.3%
3 | 3 | 10.0%
2 | 2 | 6.7%
8 | 2 | 6.7%
17 | 2 | 6.7%
80 | 1 | 3.3%
22 | 1 | 3.3%
14 | 1 | 3.3%
3051 | 1 | 3.3%
15 | 1 | 3.3%
Other values (10) | 10 | 33.3%
(Missing) | 2 | 6.7%

Minimum 10 values

Value | Count | Frequency (%)
1 | 4 | 13.3%
2 | 2 | 6.7%
3 | 3 | 10.0%
5 | 1 | 3.3%
6 | 1 | 3.3%
8 | 2 | 6.7%
9 | 1 | 3.3%
13 | 1 | 3.3%
14 | 1 | 3.3%
15 | 1 | 3.3%

Maximum 10 values

Value | Count | Frequency (%)
3051 | 1 | 3.3%
1338 | 1 | 3.3%
533 | 1 | 3.3%
426 | 1 | 3.3%
230 | 1 | 3.3%
83 | 1 | 3.3%
80 | 1 | 3.3%
35 | 1 | 3.3%
22 | 1 | 3.3%
17 | 2 | 6.7%

누적체납금액
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 28
Distinct (%): 100.0%
Missing: 2
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 16246859
Minimum: 5410
Maximum: 73287640
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.0 B

Quantile statistics

Minimum: 5410
5-th percentile: 509557.5
Q1: 2207882.5
median: 7095880
Q3: 23586870
95-th percentile: 57307593
Maximum: 73287640
Range: 73282230
Interquartile range (IQR): 21378988

Descriptive statistics

Standard deviation: 19651472
Coefficient of variation (CV): 1.2095552
Kurtosis: 2.2456312
Mean: 16246859
Median Absolute Deviation (MAD): 6058080
Skewness: 1.6444485
Sum: 4.5491204 × 10^8
Variance: 3.8618034 × 10^14
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=28)]

Common values

Value | Count | Frequency (%)
6008480 | 1 | 3.3%
65859290 | 1 | 3.3%
1265310 | 1 | 3.3%
37275790 | 1 | 3.3%
3983460 | 1 | 3.3%
1487390 | 1 | 3.3%
662730 | 1 | 3.3%
5410 | 1 | 3.3%
36264150 | 1 | 3.3%
11235940 | 1 | 3.3%
Other values (18) | 18 | 60.0%
(Missing) | 2 | 6.7%

Minimum 10 values

Value | Count | Frequency (%)
5410 | 1 | 3.3%
427080 | 1 | 3.3%
662730 | 1 | 3.3%
1265310 | 1 | 3.3%
1487390 | 1 | 3.3%
1980460 | 1 | 3.3%
2063440 | 1 | 3.3%
2256030 | 1 | 3.3%
2927280 | 1 | 3.3%
3983460 | 1 | 3.3%

Maximum 10 values

Value | Count | Frequency (%)
73287640 | 1 | 3.3%
65859290 | 1 | 3.3%
41425870 | 1 | 3.3%
37275790 | 1 | 3.3%
36264150 | 1 | 3.3%
35972960 | 1 | 3.3%
25622760 | 1 | 3.3%
22908240 | 1 | 3.3%
18569190 | 1 | 3.3%
14550690 | 1 | 3.3%

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 30
Missing (%): 100.0%
Memory size: 402.0 B
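
A fully empty column with an auto-generated name such as Unnamed: 10 often appears when every line of a delimited source file ends with a trailing delimiter, although this report does not confirm the cause. The sketch below shows one possible cleanup before re-profiling: drop columns that are missing in every row, then drop rows that are missing in every remaining column. `drop_empty` is a hypothetical helper, `None` stands in for `<NA>`, and the three-column excerpt is abbreviated from the Sample table.

```python
def drop_empty(rows):
    """Drop all-missing columns, then all-missing rows, from a list of dicts."""
    columns = list(rows[0]) if rows else []
    # Keep a column only if at least one row has a value in it.
    kept = [c for c in columns if any(r[c] is not None for r in rows)]
    trimmed = [{c: r[c] for c in kept} for r in rows]
    # Keep a row only if at least one remaining column has a value.
    return [r for r in trimmed if any(v is not None for v in r.values())]


# Abbreviated excerpt of the sample rows; None stands in for <NA>.
rows = [
    {"세목명": "등록면허세", "체납건수": 136, "Unnamed: 10": None},
    {"세목명": "자동차세", "체납건수": 256, "Unnamed: 10": None},
    {"세목명": None, "체납건수": None, "Unnamed: 10": None},
]

cleaned = drop_empty(rows)
print(cleaned)
```

Applied to the full dataset, the same two steps would remove Unnamed: 10 and the two all-`<NA>` rows, which account for every Missing alert in this report.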

Interactions

[Figures: 16 pairwise interaction plots]

Correlations

[Figure: correlation plot]

 | 세목명 | 체납액구간 | 체납건수 | 체납금액 | 누적체납건수 | 누적체납금액
세목명 | 1.000 | 0.000 | 0.263 | 0.000 | 0.555 | 0.000
체납액구간 | 0.000 | 1.000 | 0.000 | 0.757 | 0.000 | 0.650
체납건수 | 0.263 | 0.000 | 1.000 | 0.282 | 0.994 | 0.958
체납금액 | 0.000 | 0.757 | 0.282 | 1.000 | 0.350 | 0.896
누적체납건수 | 0.555 | 0.000 | 0.994 | 0.350 | 1.000 | 0.978
누적체납금액 | 0.000 | 0.650 | 0.958 | 0.896 | 0.978 | 1.000

[Figure: correlation plot]

 | 시군구명 | 체납액구간 | 과세년도 | 자치단체코드 | 시도명 | 세목명
시군구명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
체납액구간 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000
과세년도 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
자치단체코드 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시도명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
세목명 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000

[Figure: correlation plot]

 | 체납건수 | 체납금액 | 누적체납건수 | 누적체납금액 | 시도명 | 시군구명 | 자치단체코드 | 과세년도 | 세목명 | 체납액구간
체납건수 | 1.000 | 0.242 | 0.965 | 0.275 | 1.000 | 1.000 | 1.000 | 1.000 | 0.144 | 0.000
체납금액 | 0.242 | 1.000 | 0.242 | 0.949 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.511
누적체납건수 | 0.965 | 0.242 | 1.000 | 0.330 | 1.000 | 1.000 | 1.000 | 1.000 | 0.380 | 0.000
누적체납금액 | 0.275 | 0.949 | 0.330 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.371
시도명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시군구명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
자치단체코드 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
과세년도 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
세목명 | 0.144 | 0.000 | 0.380 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000
체납액구간 | 0.000 | 0.511 | 0.000 | 0.371 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

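Index | 시도명 | 시군구명 | 자치단체코드 | 과세년도 | 세목명 | 체납액구간 | 체납건수 | 체납금액 | 누적체납건수 | 누적체납금액 | Unnamed: 10
0 | 전라남도 | 강진군 | 46810 | 2022 | 등록면허세 | 10만원 미만 | 136 | 1764690 | 230 | 2927280 | <NA>
1 | 전라남도 | 강진군 | 46810 | 2022 | 자동차세 | 10만원 미만 | 256 | 10833230 | 533 | 22908240 | <NA>
2 | 전라남도 | 강진군 | 46810 | 2022 | 자동차세 | 10만원~30만원미만 | 147 | 25263560 | 426 | 73287640 | <NA>
3 | 전라남도 | 강진군 | 46810 | 2022 | 자동차세 | 30만원~50만원미만 | 8 | 2993030 | 17 | 6124480 | <NA>
4 | 전라남도 | 강진군 | 46810 | 2022 | 재산세 | 10만원 미만 | 1513 | 23699940 | 3051 | 41425870 | <NA>
5 | 전라남도 | 강진군 | 46810 | 2022 | 재산세 | 10만원~30만원미만 | 57 | 9657890 | 80 | 13381470 | <NA>
6 | 전라남도 | 강진군 | 46810 | 2022 | 재산세 | 1백만원~3백만원미만 | 8 | 11793870 | 22 | 35972960 | <NA>
7 | 전라남도 | 강진군 | 46810 | 2022 | 재산세 | 30만원~50만원미만 | 9 | 3488540 | 14 | 5591730 | <NA>
8 | 전라남도 | 강진군 | 46810 | 2022 | 재산세 | 3백만원~5백만원미만 | 1 | 3109110 | 2 | 8067280 | <NA>
9 | 전라남도 | 강진군 | 46810 | 2022 | 재산세 | 50만원~1백만원미만 | 12 | 8507020 | 15 | 10671760 | <NA>

Index | 시도명 | 시군구명 | 자치단체코드 | 과세년도 | 세목명 | 체납액구간 | 체납건수 | 체납금액 | 누적체납건수 | 누적체납금액 | Unnamed: 10
20 | 전라남도 | 강진군 | 46810 | 2022 | 지방소득세 | 5백만원~1천만원미만 | 5 | 36264150 | 5 | 36264150 | <NA>
21 | 전라남도 | 강진군 | 46810 | 2022 | 지역자원시설세 | 10만원 미만 | 1 | 5410 | 1 | 5410 | <NA>
22 | 전라남도 | 강진군 | 46810 | 2022 | 취득세 | 10만원 미만 | 4 | 280460 | 13 | 662730 | <NA>
23 | 전라남도 | 강진군 | 46810 | 2022 | 취득세 | 10만원~30만원미만 | 7 | 1189090 | 8 | 1487390 | <NA>
24 | 전라남도 | 강진군 | 46810 | 2022 | 취득세 | 1백만원~3백만원미만 | 3 | 3983460 | 3 | 3983460 | <NA>
25 | 전라남도 | 강진군 | 46810 | 2022 | 취득세 | 1천만원~3천만원미만 | 1 | 24894030 | 2 | 37275790 | <NA>
26 | 전라남도 | 강진군 | 46810 | 2022 | 취득세 | 30만원~50만원미만 | 3 | 1265310 | 3 | 1265310 | <NA>
27 | 전라남도 | 강진군 | 46810 | 2022 | 취득세 | 5천만원~1억원미만 | 1 | 65859290 | 1 | 65859290 | <NA>
28 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
29 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>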

Duplicate rows

Most frequently occurring

Index | 시도명 | 시군구명 | 자치단체코드 | 과세년도 | 세목명 | 체납액구간 | 체납건수 | 체납금액 | 누적체납건수 | 누적체납금액 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2
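
The overview reports 1 duplicate row, while the table above shows the all-`<NA>` row occurring 2 times. The two figures agree if a duplicate is counted as each redundant copy beyond the first occurrence. The sketch below illustrates that reading on an abbreviated five-column excerpt; `duplicate_summary` is a hypothetical helper and `None` stands in for `<NA>`.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (repeated rows with their counts, number of redundant copies)."""
    counts = Counter(tuple(r) for r in rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    # Each repeated row contributes (occurrences - 1) redundant copies.
    redundant = sum(n - 1 for n in repeated.values())
    return repeated, redundant


NA = None  # stands in for <NA>
rows = [
    ("전라남도", "강진군", "취득세", 1, 65859290),
    ("전라남도", "강진군", "지역자원시설세", 1, 5410),
    (NA, NA, NA, NA, NA),
    (NA, NA, NA, NA, NA),
]

repeated, redundant = duplicate_summary(rows)
print(repeated, redundant)
```

Here the only repeated row is the all-missing one, so removing empty rows would also clear the Duplicates alert.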