Overview

Dataset statistics

Number of variables: 9
Number of observations: 66
Missing cells: 18
Missing cells (%): 3.0%
Duplicate rows: 1
Duplicate rows (%): 1.5%
Total size in memory: 5.0 KiB
Average record size in memory: 78.0 B
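
The percentages above can be recomputed from the raw counts. The following is a minimal sketch, not the profiler's own code; the variable names are illustrative, and the byte total uses the rounded 5.0 KiB figure, so it is an approximation.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 9
n_observations = 66
missing_cells = 18
duplicate_rows = 1
total_bytes = 5.0 * 1024  # 5.0 KiB as rounded in the report (assumption)

total_cells = n_variables * n_observations           # 9 * 66 cells
missing_pct = 100 * missing_cells / total_cells      # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_bytes / n_observations

print(f"Missing cells (%):   {missing_pct:.1f}")
print(f"Duplicate rows (%):  {duplicate_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The two percentages match the table (3.0% and 1.5%). The average record size comes out slightly below 78.0 B, which is expected because the total size was already rounded to one decimal place.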

Variable types

Numeric: 2
Categorical: 6
Boolean: 1

Dataset

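Description: Provides, by tax item, the status of local tax payers in Nam-gu, Busan Metropolitan City, for each year from 2017 to 2019, covering taxpayer type, in-district/out-of-district classification, and number of taxpayers.
URL: https://www.data.go.kr/data/15078574/fileData.do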

Alerts

Dataset has 1 (1.5%) duplicate rows [Duplicates]
과세년도 is highly overall correlated with 연번 and 3 other fields [High correlation]
납세자유형 is highly overall correlated with 납세자수 and 3 other fields [High correlation]
시군구명 is highly overall correlated with 연번 and 7 other fields [High correlation]
자치단체코드 is highly overall correlated with 연번 and 7 other fields [High correlation]
시도명 is highly overall correlated with 연번 and 7 other fields [High correlation]
관내_관외 is highly overall correlated with 시도명 and 2 other fields [High correlation]
세목명 is highly overall correlated with 시도명 and 2 other fields [High correlation]
연번 is highly overall correlated with 시도명 and 3 other fields [High correlation]
납세자수 is highly overall correlated with 시도명 and 3 other fields [High correlation]
시도명 is highly imbalanced (56.1%) [Imbalance]
시군구명 is highly imbalanced (56.1%) [Imbalance]
자치단체코드 is highly imbalanced (56.1%) [Imbalance]
연번 has 6 (9.1%) missing values [Missing]
관내_관외 has 6 (9.1%) missing values [Missing]
납세자수 has 6 (9.1%) missing values [Missing]
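
The 56.1% imbalance figure in the alerts is consistent with a normalized-entropy score, that is, one minus the Shannon entropy of the class counts divided by its maximum possible value. The sketch below assumes that definition; `imbalance_score` is a hypothetical helper written for illustration, not a function taken from the profiling library.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class distribution and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is reported as balanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# 시도명 has two categories: 60 rows of 부산광역시 and 6 rows of <NA>.
score = imbalance_score([60, 6])
print(f"{100 * score:.1f}%")
```

With the 60/6 split shared by 시도명, 시군구명, and 자치단체코드, this yields 56.1%, the value shown in the three Imbalance alerts.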

Reproduction

Analysis started: 2023-12-12 06:15:12.905504
Analysis finished: 2023-12-12 06:15:14.224928
Duration: 1.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
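
As a quick consistency check, the reported duration can be derived from the two timestamps. This is a small sketch using only the standard library; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the report's timestamps
started = datetime.strptime("2023-12-12 06:15:12.905504", FMT)
finished = datetime.strptime("2023-12-12 06:15:14.224928", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```

The difference rounds to 1.32 seconds, in line with the Duration entry.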

Variables

연번
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 60
Distinct (%): 100.0%
Missing: 6
Missing (%): 9.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.5
Minimum: 1
Maximum: 60
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 726.0 B

Quantile statistics

Minimum: 1
5-th percentile: 3.95
Q1: 15.75
Median: 30.5
Q3: 45.25
95-th percentile: 57.05
Maximum: 60
Range: 59
Interquartile range (IQR): 29.5

Descriptive statistics

Standard deviation: 17.464249
Coefficient of variation (CV): 0.57259833
Kurtosis: -1.2
Mean: 30.5
Median Absolute Deviation (MAD): 15
Skewness: 0
Sum: 1830
Variance: 305
Monotonicity: Strictly increasing
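
Most of the 연번 figures can be reproduced by hand. The sketch below assumes the 60 non-missing values are exactly the integers 1 through 60, which is consistent with the reported minimum, maximum, distinct count, and sum but is not stated outright in the report. It uses the sample variance (n - 1 denominator) and linearly interpolated quartiles.

```python
import statistics

# Assumption: the 60 non-missing 연번 values are the integers 1..60.
values = list(range(1, 61))

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("Sum:", sum(values))
print("Mean:", mean, "Variance:", variance)
print("Std:", round(std, 6), "CV:", round(cv, 8))
print("MAD:", mad, "Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```

Under that assumption the sum, mean, variance, standard deviation, CV, MAD, and quartiles all agree with the tables for this variable.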
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
32                 1      1.5%
34                 1      1.5%
35                 1      1.5%
36                 1      1.5%
37                 1      1.5%
38                 1      1.5%
39                 1      1.5%
40                 1      1.5%
41                 1      1.5%
42                 1      1.5%
Other values (50)  50     75.8%
(Missing)          6      9.1%

Value  Count  Frequency (%)
1      1      1.5%
2      1      1.5%
3      1      1.5%
4      1      1.5%
5      1      1.5%
6      1      1.5%
7      1      1.5%
8      1      1.5%
9      1      1.5%
10     1      1.5%

Value  Count  Frequency (%)
60     1      1.5%
59     1      1.5%
58     1      1.5%
57     1      1.5%
56     1      1.5%
55     1      1.5%
54     1      1.5%
53     1      1.5%
52     1      1.5%
51     1      1.5%

시도명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 5
Median length: 5
Mean length: 4.9090909
Min length: 4
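
The length statistics are easier to read once the placeholder rows are taken into account. The sketch below assumes the six `<NA>` entries are stored as the literal four-character string `<NA>` and are therefore counted as a category rather than as missing; that reading is an inference from the figures, not something the report states.

```python
# Assumption: six rows hold the literal string "<NA>" (4 characters),
# alongside 60 rows of "부산광역시" (5 characters).
values = ["부산광역시"] * 60 + ["<NA>"] * 6

lengths = sorted(len(v) for v in values)
mean_length = sum(lengths) / len(lengths)

print("Max length:", lengths[-1])
print("Min length:", lengths[0])
print("Mean length:", round(mean_length, 7))
```

Under that assumption the minimum length of 4 and the mean length of 4.9090909 both follow directly, which would also explain why this column reports 0 missing values.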

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산광역시
2nd row: 부산광역시
3rd row: 부산광역시
4th row: 부산광역시
5th row: 부산광역시

Common Values

Value       Count  Frequency (%)
부산광역시  60     90.9%
<NA>        6      9.1%

Length

Histogram of lengths of the category

시군구명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.1818182
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남구
2nd row: 남구
3rd row: 남구
4th row: 남구
5th row: 남구

Common Values

Value  Count  Frequency (%)
남구   60     90.9%
<NA>   6      9.1%

Length

Histogram of lengths of the category

자치단체코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 5
Median length: 5
Mean length: 4.9090909
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 26290
2nd row: 26290
3rd row: 26290
4th row: 26290
5th row: 26290

Common Values

Value  Count  Frequency (%)
26290  60     90.9%
<NA>   6      9.1%

Length

Histogram of lengths of the category

과세년도
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value  Count  Frequency (%)
2020   31     47.0%
2021   29     43.9%
<NA>   6      9.1%

Length

Histogram of lengths of the category

세목명
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 7
Median length: 5
Mean length: 4.2424242
Min length: 3

Unique

Unique: 2
Unique (%): 3.0%

Sample

1st row: 등록면허세
2nd row: 등록면허세
3rd row: 지방소득세
4th row: 지방소득세
5th row: 지방소득세

Common Values

Value           Count  Frequency (%)
등록면허세      8      12.1%
지방소득세      8      12.1%
지역자원시설세  8      12.1%
재산세          8      12.1%
주민세          8      12.1%
취득세          8      12.1%
자동차세        8      12.1%
<NA>            6      9.1%
지방소비세      2      3.0%
등록세          1      1.5%

Length

Histogram of lengths of the category

납세자유형
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.1818182
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 법인
2nd row: 법인
3rd row: 개인
4th row: 개인
5th row: 법인

Common Values

Value  Count  Frequency (%)
법인   31     47.0%
개인   29     43.9%
<NA>   6      9.1%

Length

Histogram of lengths of the category

관내_관외
Boolean

HIGH CORRELATION  MISSING 

Distinct: 2
Distinct (%): 3.3%
Missing: 6
Missing (%): 9.1%
Memory size: 264.0 B

Value      Count  Frequency (%)
True       31     47.0%
False      29     43.9%
(Missing)  6      9.1%

납세자수
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 56
Distinct (%): 93.3%
Missing: 6
Missing (%): 9.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 13545.183
Minimum: 1
Maximum: 95702
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 726.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 298
Median: 1979.5
Q3: 11109.25
95-th percentile: 77369.45
Maximum: 95702
Range: 95701
Interquartile range (IQR): 10811.25

Descriptive statistics

Standard deviation: 24807.658
Coefficient of variation (CV): 1.8314745
Kurtosis: 3.8193505
Mean: 13545.183
Median Absolute Deviation (MAD): 1953
Skewness: 2.1968911
Sum: 812711
Variance: 6.1541992 × 10^8
Monotonicity: Not monotonic
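
The 납세자수 summary figures are tied together by a few simple identities, which makes them easy to sanity-check without the raw data. The sketch below only cross-checks the reported numbers against each other; the tolerance is an arbitrary choice to absorb the rounding in the report.

```python
import math

# Figures copied from the 납세자수 summary.
n_present = 66 - 6            # observations minus missing values
total = 812711                # Sum
mean = 13545.183
std = 24807.658
variance = 6.1541992e8
minimum, maximum = 1, 95702
q1, q3 = 298, 11109.25

checks = {
    "mean == sum / n": math.isclose(total / n_present, mean, rel_tol=1e-6),
    "variance == std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv == std / mean": math.isclose(std / mean, 1.8314745, rel_tol=1e-6),
    "range == max - min": maximum - minimum == 95701,
    "iqr == q3 - q1": q3 - q1 == 10811.25,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

A CV well above 1 together with a skewness above 2 indicates a long right tail: a handful of tax items with tens of thousands of taxpayers dominate the mean, while the median stays near 2,000.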
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
1                  4      6.1%
5                  2      3.0%
302                1      1.5%
659                1      1.5%
8367               1      1.5%
95702              1      1.5%
1121               1      1.5%
2415               1      1.5%
2940               1      1.5%
6201               1      1.5%
Other values (46)  46     69.7%
(Missing)          6      9.1%

Value  Count  Frequency (%)
1      4      6.1%
5      2      3.0%
25     1      1.5%
28     1      1.5%
29     1      1.5%
33     1      1.5%
71     1      1.5%
80     1      1.5%
143    1      1.5%
186    1      1.5%

Value  Count  Frequency (%)
95702  1      1.5%
92072  1      1.5%
80114  1      1.5%
77225  1      1.5%
70461  1      1.5%
66656  1      1.5%
45014  1      1.5%
39636  1      1.5%
32984  1      1.5%
32660  1      1.5%

Interactions


Correlations

            연번   과세년도  세목명  납세자유형  관내_관외  납세자수
연번        1.000  0.999     0.897   0.000       0.000      0.000
과세년도    0.999  1.000     0.000   0.000       0.000      0.000
세목명      0.897  0.000     1.000   0.000       0.000      0.413
납세자유형  0.000  0.000     0.000   1.000       0.000      0.607
관내_관외   0.000  0.000     0.000   0.000       1.000      0.437
납세자수    0.000  0.000     0.413   0.607       0.437      1.000

              과세년도  납세자유형  시군구명  자치단체코드  시도명  관내_관외  세목명
과세년도      1.000     0.000       1.000     1.000         1.000   0.000      0.000
납세자유형    0.000     1.000       1.000     1.000         1.000   0.000      0.000
시군구명      1.000     1.000       1.000     1.000         1.000   1.000      1.000
자치단체코드  1.000     1.000       1.000     1.000         1.000   1.000      1.000
시도명        1.000     1.000       1.000     1.000         1.000   1.000      1.000
관내_관외     0.000     0.000       1.000     1.000         1.000   1.000      0.000
세목명        0.000     0.000       1.000     1.000         1.000   0.000      1.000

              연번   납세자수  시도명  시군구명  자치단체코드  과세년도  세목명  납세자유형  관내_관외
연번          1.000  0.015     1.000   1.000     1.000         0.897     0.491   0.000       0.000
납세자수      0.015  1.000     1.000   1.000     1.000         0.000     0.192   0.573       0.408
시도명        1.000  1.000     1.000   1.000     1.000         1.000     1.000   1.000       1.000
시군구명      1.000  1.000     1.000   1.000     1.000         1.000     1.000   1.000       1.000
자치단체코드  1.000  1.000     1.000   1.000     1.000         1.000     1.000   1.000       1.000
과세년도      0.897  0.000     1.000   1.000     1.000         1.000     0.000   0.000       0.000
세목명        0.491  0.192     1.000   1.000     1.000         0.000     1.000   0.000       0.000
납세자유형    0.000  0.573     1.000   1.000     1.000         0.000     0.000   1.000       0.000
관내_관외     0.000  0.408     1.000   1.000     1.000         0.000     0.000   0.000       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
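
Nullity correlation can be illustrated by correlating presence indicators (1 for a value, 0 for a gap) between columns. The sketch below is a hand-rolled illustration rather than the profiler's implementation; `pearson` is a hypothetical helper, and the indicator vectors assume that 연번, 관내_관외, and 납세자수 are empty in the same six trailing rows, as the tail of the sample suggests.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical indicators (1 = value present, 0 = missing) for 66 rows,
# assuming the same last six rows are empty in each of the three columns.
present = {
    "연번": [1] * 60 + [0] * 6,
    "관내_관외": [1] * 60 + [0] * 6,
    "납세자수": [1] * 60 + [0] * 6,
}

names = list(present)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(pearson(present[a], present[b]), 3))
```

When columns go missing in exactly the same rows, every pairwise nullity correlation is at its maximum of 1, which is the pattern to expect from a block of fully empty trailing rows.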

Sample

(index)  연번  시도명      시군구명  자치단체코드  과세년도  세목명          납세자유형  관내_관외  납세자수
0        1     부산광역시  남구      26290         2020      등록면허세      법인        N          1891
1        2     부산광역시  남구      26290         2020      등록면허세      법인        Y          1945
2        3     부산광역시  남구      26290         2020      지방소득세      개인        N          10121
3        4     부산광역시  남구      26290         2020      지방소득세      개인        Y          39636
4        5     부산광역시  남구      26290         2020      지방소득세      법인        N          1017
5        6     부산광역시  남구      26290         2020      지방소득세      법인        Y          2049
6        7     부산광역시  남구      26290         2020      지방소비세      법인        Y          1
7        8     부산광역시  남구      26290         2020      지역자원시설세  개인        N          33
8        9     부산광역시  남구      26290         2020      지역자원시설세  개인        Y          71
9        10    부산광역시  남구      26290         2020      지역자원시설세  법인        N          5

(index)  연번  시도명      시군구명  자치단체코드  과세년도  세목명          납세자유형  관내_관외  납세자수
56       57    부산광역시  남구      26290         2021      지역자원시설세  개인        N          25
57       58    부산광역시  남구      26290         2021      지역자원시설세  개인        Y          80
58       59    부산광역시  남구      26290         2021      지역자원시설세  법인        N          5
59       60    부산광역시  남구      26290         2021      지역자원시설세  법인        Y          29
60       <NA>  <NA>        <NA>      <NA>          <NA>      <NA>            <NA>        <NA>       <NA>
61       <NA>  <NA>        <NA>      <NA>          <NA>      <NA>            <NA>        <NA>       <NA>
62       <NA>  <NA>        <NA>      <NA>          <NA>      <NA>            <NA>        <NA>       <NA>
63       <NA>  <NA>        <NA>      <NA>          <NA>      <NA>            <NA>        <NA>       <NA>
64       <NA>  <NA>        <NA>      <NA>          <NA>      <NA>            <NA>        <NA>       <NA>
65       <NA>  <NA>        <NA>      <NA>          <NA>      <NA>            <NA>        <NA>       <NA>

Duplicate rows

Most frequently occurring

(index)  연번  시도명  시군구명  자치단체코드  과세년도  세목명  납세자유형  관내_관외  납세자수  # duplicates
0        <NA>  <NA>    <NA>      <NA>          <NA>      <NA>    <NA>        <NA>       <NA>      6