Overview

Dataset statistics

Number of variables: 8
Number of observations: 38
Missing cells: 4
Missing cells (%): 1.3%
Duplicate rows: 1
Duplicate rows (%): 2.6%
Total size in memory: 2.6 KiB
Average record size in memory: 70.5 B
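
The percentages and the total size follow from the counts in this section. A minimal Python sketch of that arithmetic, using only the reported figures (the variable names are illustrative, not part of the report):

```python
# Counts reported in the dataset statistics.
n_variables = 8
n_observations = 38
missing_cells = 4
duplicate_rows = 1

total_cells = n_variables * n_observations              # 304 cells in total
missing_pct = 100 * missing_cells / total_cells         # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations   # share of duplicated rows

# Total size implied by the reported average record size (70.5 B per row).
total_kib = 70.5 * n_observations / 1024

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```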

Variable types

Categorical: 6
Boolean: 1
Numeric: 1

Dataset

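Description: Local tax taxpayer status data. Provides 시도명 (province name), 시군구명 (city/county/district name), 과세년도 (tax year), 세목명 (tax item name), 납세자유형 (taxpayer type), 관내/관외 (inside/outside the jurisdiction), 납세자수 (number of taxpayers), and related fields.
URL: https://www.data.go.kr/data/15079186/fileData.do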

Alerts

Dataset has 1 (2.6%) duplicate rows [Duplicates]
시도명 is highly overall correlated with 납세자수 and 6 other fields [High correlation]
관내_관외 is highly overall correlated with 시도명 and 3 other fields [High correlation]
자치단체코드 is highly overall correlated with 납세자수 and 6 other fields [High correlation]
시군구명 is highly overall correlated with 납세자수 and 6 other fields [High correlation]
과세년도 is highly overall correlated with 납세자수 and 6 other fields [High correlation]
세목명 is highly overall correlated with 시도명 and 3 other fields [High correlation]
납세자유형 is highly overall correlated with 시도명 and 3 other fields [High correlation]
납세자수 is highly overall correlated with 시도명 and 3 other fields [High correlation]
시도명 is highly imbalanced (70.3%) [Imbalance]
시군구명 is highly imbalanced (70.3%) [Imbalance]
자치단체코드 is highly imbalanced (70.3%) [Imbalance]
과세년도 is highly imbalanced (70.3%) [Imbalance]
관내_관외 has 2 (5.3%) missing values [Missing]
납세자수 has 2 (5.3%) missing values [Missing]
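
The 70.3% imbalance figure can be reproduced from the 36/2 value counts reported for 시도명, 시군구명, 자치단체코드 and 과세년도. The sketch below assumes the score is one minus the normalized Shannon entropy of the value counts; the report itself does not state the formula, and the function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """One minus normalized Shannon entropy.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Assumed definition, chosen because it matches the reported 70.3%.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Value counts from the report: 36 rows with the single real value, 2 rows of <NA>.
score = imbalance_score([36, 2])
print(f"{score:.1%}")
```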

Reproduction

Analysis started: 2023-12-12 16:09:27.709493
Analysis finished: 2023-12-12 16:09:28.810923
Duration: 1.1 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
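
A report of this kind is typically generated with the `ProfileReport` class of ydata-profiling. The sketch below is an assumption about how such a run might look, not the configuration used here: the helper `build_report`, the file names, the report title, and the `encoding` default are all hypothetical.

```python
def build_report(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="Local tax taxpayer status")
    profile.to_file(html_path)
    return profile

# Hypothetical usage; the file names are placeholders, not taken from the report:
# build_report("taxpayers.csv", "taxpayers_report.html")
```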

Variables

시도명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B
전라남도: 36
<NA>: 2

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row전라남도
2nd row전라남도
3rd row전라남도
4th row전라남도
5th row전라남도

Common Values

Value  Count  Frequency (%)
전라남도  36  94.7%
<NA>  2  5.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

시군구명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B
강진군: 36
<NA>: 2

Length

Max length: 4
Median length: 3
Mean length: 3.0526316
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row강진군
2nd row강진군
3rd row강진군
4th row강진군
5th row강진군

Common Values

Value  Count  Frequency (%)
강진군  36  94.7%
<NA>  2  5.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

자치단체코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B
46810: 36
<NA>: 2

Length

Max length: 5
Median length: 5
Mean length: 4.9473684
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row46810
2nd row46810
3rd row46810
4th row46810
5th row46810

Common Values

Value  Count  Frequency (%)
46810  36  94.7%
<NA>  2  5.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

과세년도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B
2022: 36
<NA>: 2

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row2022
2nd row2022
3rd row2022
4th row2022
5th row2022

Common Values

Value  Count  Frequency (%)
2022  36  94.7%
<NA>  2  5.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

세목명
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 31.6%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B
Top categories: 등록세, 재산세, 주민세, 취득세, 자동차세
Other values (7): 18

Length

Max length: 7
Median length: 5
Mean length: 4.0526316
Min length: 3

Unique

Unique: 1
Unique (%): 2.6%

Sample

1st row등록세
2nd row등록세
3rd row등록세
4th row등록세
5th row레저세

Common Values

Value  Count  Frequency (%)
등록세  4  10.5%
재산세  4  10.5%
주민세  4  10.5%
취득세  4  10.5%
자동차세  4  10.5%
등록면허세  4  10.5%
지방소득세  4  10.5%
지역자원시설세  3  7.9%
레저세  2  5.3%
담배소비세  2  5.3%
Other values (2)  3  7.9%

Length

[Figure: Histogram of lengths of the category]

납세자유형
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 7.9%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B
개인: 19
법인: 17
<NA>: 2

Length

Max length: 4
Median length: 2
Mean length: 2.1052632
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row개인
2nd row개인
3rd row법인
4th row법인
5th row개인

Common Values

Value  Count  Frequency (%)
개인  19  50.0%
법인  17  44.7%
<NA>  2  5.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

관내_관외
Boolean

HIGH CORRELATION  MISSING 

Distinct: 2
Distinct (%): 5.6%
Missing: 2
Missing (%): 5.3%
Memory size: 208.0 B

Value  Count  Frequency (%)
False  20  52.6%
True  16  42.1%
(Missing)  2  5.3%

납세자수
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 33
Distinct (%): 91.7%
Missing: 2
Missing (%): 5.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2962.4167
Minimum: 1
Maximum: 25397
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 474.0 B

Quantile statistics

Minimum: 1
5th percentile: 2.5
Q1: 33.25
Median: 597.5
Q3: 1659
95th percentile: 17944
Maximum: 25397
Range: 25396
Interquartile range (IQR): 1625.75

Descriptive statistics

Standard deviation: 6276.1117
Coefficient of variation (CV): 2.1185783
Kurtosis: 6.9637688
Mean: 2962.4167
Median Absolute Deviation (MAD): 583
Skewness: 2.7455343
Sum: 106647
Variance: 39389578
Monotonicity: Not monotonic
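
Several of the figures for 납세자수 are derived from one another. As a quick consistency check, the sketch below recomputes the mean, coefficient of variation, interquartile range, range, and variance from the other reported values; the variable names are illustrative.

```python
# Figures reported for 납세자수 in the statistics.
total = 106647                 # Sum
n_valid = 38 - 2               # observations minus missing values
std_dev = 6276.1117
q1, q3 = 33.25, 1659
minimum, maximum = 1, 25397

mean = total / n_valid                 # arithmetic mean of non-missing values
cv = std_dev / mean                    # coefficient of variation
iqr = q3 - q1                          # interquartile range
value_range = maximum - minimum        # range
variance = std_dev ** 2                # variance is the squared standard deviation

print(f"mean={mean:.4f} cv={cv:.4f} iqr={iqr} range={value_range} variance={variance:.0f}")
```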
[Figure: Histogram with fixed size bins (bins=33)]

Common values

Value  Count  Frequency (%)
3  3  7.9%
1  2  5.3%
767  1  2.6%
73  1  2.6%
471  1  2.6%
38  1  2.6%
2085  1  2.6%
6524  1  2.6%
715  1  2.6%
967  1  2.6%
Other values (23)  23  60.5%
(Missing)  2  5.3%

Minimum 10 values

Value  Count  Frequency (%)
1  2  5.3%
3  3  7.9%
5  1  2.6%
8  1  2.6%
10  1  2.6%
19  1  2.6%
38  1  2.6%
73  1  2.6%
94  1  2.6%
202  1  2.6%

Maximum 10 values

Value  Count  Frequency (%)
25397  1  2.6%
23503  1  2.6%
16091  1  2.6%
11676  1  2.6%
6524  1  2.6%
5105  1  2.6%
3946  1  2.6%
2368  1  2.6%
2085  1  2.6%
1517  1  2.6%

Interactions

[Figure: Interactions plot]

Correlations

[Figure: Correlation heatmap]

Variable  세목명  납세자유형  관내_관외  납세자수
세목명  1.000  0.000  0.000  0.000
납세자유형  0.000  1.000  0.000  0.415
관내_관외  0.000  0.000  1.000  0.390
납세자수  0.000  0.415  0.390  1.000

[Figure: Correlation heatmap]

Variable  시도명  관내_관외  자치단체코드  시군구명  과세년도  세목명  납세자유형
시도명  1.000  1.000  1.000  1.000  1.000  1.000  1.000
관내_관외  1.000  1.000  1.000  1.000  1.000  0.000  0.000
자치단체코드  1.000  1.000  1.000  1.000  1.000  1.000  1.000
시군구명  1.000  1.000  1.000  1.000  1.000  1.000  1.000
과세년도  1.000  1.000  1.000  1.000  1.000  1.000  1.000
세목명  1.000  0.000  1.000  1.000  1.000  1.000  0.000
납세자유형  1.000  0.000  1.000  1.000  1.000  0.000  1.000

[Figure: Correlation heatmap]

Variable  납세자수  시도명  시군구명  자치단체코드  과세년도  세목명  납세자유형  관내_관외
납세자수  1.000  1.000  1.000  1.000  1.000  0.000  0.274  0.256
시도명  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
시군구명  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
자치단체코드  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
과세년도  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
세목명  0.000  1.000  1.000  1.000  1.000  1.000  0.000  0.000
납세자유형  0.274  1.000  1.000  1.000  1.000  0.000  1.000  0.000
관내_관외  0.256  1.000  1.000  1.000  1.000  0.000  0.000  1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

Index  시도명  시군구명  자치단체코드  과세년도  세목명  납세자유형  관내_관외  납세자수
0  전라남도  강진군  46810  2022  등록세  개인  N  452
1  전라남도  강진군  46810  2022  등록세  개인  Y  516
2  전라남도  강진군  46810  2022  등록세  법인  N  5
3  전라남도  강진군  46810  2022  등록세  법인  Y  19
4  전라남도  강진군  46810  2022  레저세  개인  N  1
5  전라남도  강진군  46810  2022  레저세  법인  N  3
6  전라남도  강진군  46810  2022  재산세  개인  N  25397
7  전라남도  강진군  46810  2022  재산세  개인  Y  23503
8  전라남도  강진군  46810  2022  재산세  법인  N  719
9  전라남도  강진군  46810  2022  재산세  법인  Y  2368

Last rows

Index  시도명  시군구명  자치단체코드  과세년도  세목명  납세자유형  관내_관외  납세자수
28  전라남도  강진군  46810  2022  지방소득세  개인  N  967
29  전라남도  강진군  46810  2022  지방소득세  개인  Y  5105
30  전라남도  강진군  46810  2022  지방소득세  법인  N  305
31  전라남도  강진군  46810  2022  지방소득세  법인  Y  681
32  전라남도  강진군  46810  2022  지방소비세  개인  N  1
33  전라남도  강진군  46810  2022  지역자원시설세  개인  N  3
34  전라남도  강진군  46810  2022  지역자원시설세  개인  Y  10
35  전라남도  강진군  46810  2022  지역자원시설세  법인  Y  8
36  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
37  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>

Duplicate rows

Most frequently occurring

Index  시도명  시군구명  자치단체코드  과세년도  세목명  납세자유형  관내_관외  납세자수  # duplicates
0  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  2
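
The single duplicate group consists of the two all-<NA> rows at indices 36 and 37 of the sample. The sketch below shows one way such a count can be derived, assuming a row counts as a duplicate when it repeats an earlier row; that reading is consistent with "1 duplicate row" in the overview and "# duplicates 2" here, but the report does not spell it out. The three rows are taken from the sample; the variable names are illustrative.

```python
from collections import Counter

# One ordinary row (index 35) and the two all-<NA> rows (indices 36 and 37).
rows = [
    ("전라남도", "강진군", "46810", "2022", "지역자원시설세", "법인", "Y", "8"),
    ("<NA>",) * 8,
    ("<NA>",) * 8,
]

counts = Counter(rows)
# Groups of identical rows and how often each occurs (the "# duplicates" column).
duplicate_groups = {row: n for row, n in counts.items() if n > 1}
# Rows that repeat an earlier row: occurrences minus one per group.
extra_rows = sum(n - 1 for n in duplicate_groups.values())

print(duplicate_groups)
print(extra_rows)
```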