Overview

Dataset statistics

Number of variables: 9
Number of observations: 103
Missing cells: 30
Missing cells (%): 3.2%
Duplicate rows: 1
Duplicate rows (%): 1.0%
Total size in memory: 7.8 KiB
Average record size in memory: 77.3 B
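
The missing-cell figures above can be recomputed with plain pandas as a sanity check. This is a minimal sketch, assuming the data has been loaded into a pandas DataFrame; the helper name `overview_stats` is hypothetical and is not part of ydata-profiling.

```python
import pandas as pd

def overview_stats(df: pd.DataFrame) -> dict:
    """Hypothetical helper: variable, observation and missing-cell counts for a DataFrame."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    # Count every NaN / None / pd.NA cell across all columns
    missing = int(df.isna().sum().sum())
    return {
        "n_variables": n_cols,
        "n_observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
    }

# With this report's counts: 30 missing cells out of 103 rows x 9 columns = 927 cells
print(round(100 * 30 / (103 * 9), 1))  # 3.2
```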

Variable types

Categorical: 6
Text: 1
Numeric: 2

Dataset

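Description: Provides the levy status by type of taxable object that forms the tax base for local taxation. It is used as baseline data when reviewing the fairness of tax-burden levels across object types and when identifying targets of regulatory policy in related fields such as real estate.
URL: https://www.data.go.kr/data/15080261/fileData.do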

Alerts

Dataset has 1 (1.0%) duplicate rows [Duplicates]
데이터기준일 is highly overall correlated with 부과건수 and 6 other fields [High correlation]
세목명 is highly overall correlated with 부과건수 and 4 other fields [High correlation]
과세년도 is highly overall correlated with 시도명 and 3 other fields [High correlation]
시도명 is highly overall correlated with 부과건수 and 6 other fields [High correlation]
시군구명 is highly overall correlated with 부과건수 and 6 other fields [High correlation]
자치단체코드 is highly overall correlated with 부과건수 and 6 other fields [High correlation]
부과건수 is highly overall correlated with 부과금액 and 5 other fields [High correlation]
부과금액 is highly overall correlated with 부과건수 and 4 other fields [High correlation]
시도명 is highly imbalanced (54.0%) [Imbalance]
시군구명 is highly imbalanced (54.0%) [Imbalance]
자치단체코드 is highly imbalanced (54.0%) [Imbalance]
데이터기준일 is highly imbalanced (54.0%) [Imbalance]
세원 유형명 has 10 (9.7%) missing values [Missing]
부과건수 has 10 (9.7%) missing values [Missing]
부과금액 has 10 (9.7%) missing values [Missing]
부과건수 has 25 (24.3%) zeros [Zeros]
부과금액 has 26 (25.2%) zeros [Zeros]
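
The 54.0% imbalance reported for the four near-constant columns is consistent with an entropy-based score: 1 minus the Shannon entropy of the value frequencies divided by its maximum, log(k), for k observed categories. The sketch below is an assumption about how such a score can be computed, not a copy of ydata-profiling's code, and the function name `imbalance` is hypothetical. Applied to the counts 93 and 10 it gives about 0.540, whereas the three-way split of 과세년도 (47 / 46 / 10) scores only about 0.14.

```python
import math

def imbalance(counts):
    """Hypothetical entropy-based imbalance score: 0 = perfectly balanced, 1 = a single class."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    k = len(probs)
    if k <= 1:
        return 1.0  # only one observed category: maximally imbalanced
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(k)

# One value on 93 rows plus 10 <NA> rows, as in 시도명, 시군구명, 자치단체코드 and 데이터기준일
print(f"{imbalance([93, 10]):.1%}")  # 54.0%
```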

Reproduction

Analysis started: 2023-12-12 05:41:23.866343
Analysis finished: 2023-12-12 05:41:24.943461
Duration: 1.08 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시도명 (province / metropolitan city name)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 956.0 B
Top values: 울산광역시 (93), <NA> (10)

Length

Max length: 5
Median length: 5
Mean length: 4.9029126
Min length: 4
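
The mean length can be checked by hand. Assuming the 10 <NA> entries are measured as the four-character string "<NA>" (which matches the minimum length of 4 and the Missing count of 0 for this column), the reported mean follows from the two value counts:

$$
\bar{\ell}_{\text{시도명}} = \frac{93 \cdot 5 + 10 \cdot 4}{103} = \frac{505}{103} \approx 4.9029
$$

The same reasoning reproduces 시군구명 (2-character 북구) as 226/103 ≈ 2.1942 and 데이터기준일 (10-character dates) as 970/103 ≈ 9.4175.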

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 울산광역시
2nd row: 울산광역시
3rd row: 울산광역시
4th row: 울산광역시
5th row: 울산광역시

Common Values

Value: Count (Frequency %)
울산광역시: 93 (90.3%)
<NA>: 10 (9.7%)

[Figure: Histogram of lengths of the category]

시군구명 (city / county / district name)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 956.0 B
Top values: 북구 (93), <NA> (10)

Length

Max length: 4
Median length: 2
Mean length: 2.1941748
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 북구
2nd row: 북구
3rd row: 북구
4th row: 북구
5th row: 북구

Common Values

Value: Count (Frequency %)
북구: 93 (90.3%)
<NA>: 10 (9.7%)

[Figure: Histogram of lengths of the category]

자치단체코드 (local government code)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 956.0 B
Top values: 31200 (93), <NA> (10)

Length

Max length: 5
Median length: 5
Mean length: 4.9029126
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 31200
2nd row: 31200
3rd row: 31200
4th row: 31200
5th row: 31200

Common Values

Value: Count (Frequency %)
31200: 93 (90.3%)
<NA>: 10 (9.7%)

[Figure: Histogram of lengths of the category]

과세년도 (tax year)
Categorical

Alerts: HIGH CORRELATION

Distinct: 3
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 956.0 B
Top values: 2020 (47), 2021 (46), <NA> (10)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value: Count (Frequency %)
2020: 47 (45.6%)
2021: 46 (44.7%)
<NA>: 10 (9.7%)

[Figure: Histogram of lengths of the category]

세목명 (tax item name)
Categorical

Alerts: HIGH CORRELATION

Distinct: 14
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Memory size: 956.0 B
Top values: 취득세 (18), 주민세 (16), 자동차세 (14), 재산세 (10), <NA> (10), 9 other values (35)

Length

Max length: 7
Median length: 3
Mean length: 3.7572816
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 담배소비세
2nd row: 교육세
3rd row: 도시계획세
4th row: 취득세
5th row: 취득세

Common Values

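Value: Count (Frequency %)
취득세 (acquisition tax): 18 (17.5%)
주민세 (resident tax): 16 (15.5%)
자동차세 (automobile tax): 14 (13.6%)
재산세 (property tax): 10 (9.7%)
<NA>: 10 (9.7%)
레저세 (leisure tax): 8 (7.8%)
지방소득세 (local income tax): 8 (7.8%)
지역자원시설세 (local resource and facility tax): 5 (4.9%)
등록면허세 (registration and license tax): 4 (3.9%)
담배소비세 (tobacco consumption tax): 2 (1.9%)
Other values (4): 8 (7.8%)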

[Figure: Histogram of lengths of the category]

세원 유형명 (tax base type name)
Text

Alerts: MISSING

Distinct: 50
Distinct (%): 53.8%
Missing: 10
Missing (%): 9.7%
Memory size: 956.0 B

Length

Max length: 11
Median length: 8
Mean length: 6.0322581
Min length: 2

Characters and Unicode

Total characters: 561
Distinct characters: 74
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
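
The four categories reported for this column (Other Letter, Open Punctuation, Close Punctuation, Decimal Number) correspond to the Unicode general-category codes Lo, Ps, Pe and Nd, which Python's standard `unicodedata.category` returns. The sketch below counts characters per category; the helper name and the small name table are illustrative assumptions, and script and block grouping are left out because they need Unicode data that the standard library does not expose directly.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this column
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
}

def category_counts(values):
    """Hypothetical helper: count characters per Unicode general category."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Hangul syllables are category Lo; "(" is Ps, ")" is Pe and "3" is Nd
print(category_counts(["주택(개별)", "3륜이하"]))
```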

Unique

Unique: 7
Unique (%): 7.5%

Sample

1st row: 담배소비세
2nd row: 교육세
3rd row: 도시계획세
4th row: 건축물
5th row: 주택(개별)

Common Values

Value: Count (Frequency %)
자동차세(주행: 2 (2.2%)
담배소비세: 2 (2.2%)
화물: 2 (2.2%)
3륜이하: 2 (2.2%)
기타승용: 2 (2.2%)
승용: 2 (2.2%)
지방소비세: 2 (2.2%)
등록면허세(면허: 2 (2.2%)
특수: 2 (2.2%)
지역자원시설세(소방: 2 (2.2%)
Other values (40): 73 (78.5%)

Most occurring characters

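Value: Count (Frequency %)
[unrecovered]: 55 (9.8%)
'(': 49 (8.7%)
')': 49 (8.7%)
[unrecovered]: 27 (4.8%)
[unrecovered]: 24 (4.3%)
[unrecovered]: 19 (3.4%)
[unrecovered]: 18 (3.2%)
[unrecovered]: 16 (2.9%)
[unrecovered]: 12 (2.1%)
[unrecovered]: 11 (2.0%)
Other values (64): 281 (50.1%)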

Most occurring categories

Value: Count (Frequency %)
Other Letter: 461 (82.2%)
Open Punctuation: 49 (8.7%)
Close Punctuation: 49 (8.7%)
Decimal Number: 2 (0.4%)

Most frequent character per category

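Other Letter
Value: Count (Frequency %)
[unrecovered]: 55 (11.9%)
[unrecovered]: 27 (5.9%)
[unrecovered]: 24 (5.2%)
[unrecovered]: 19 (4.1%)
[unrecovered]: 18 (3.9%)
[unrecovered]: 16 (3.5%)
[unrecovered]: 12 (2.6%)
[unrecovered]: 11 (2.4%)
[unrecovered]: 11 (2.4%)
[unrecovered]: 10 (2.2%)
Other values (61): 258 (56.0%)

Open Punctuation
Value: Count (Frequency %)
'(': 49 (100.0%)

Close Punctuation
Value: Count (Frequency %)
')': 49 (100.0%)

Decimal Number
Value: Count (Frequency %)
'3': 2 (100.0%)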

Most occurring scripts

Value: Count (Frequency %)
Hangul: 461 (82.2%)
Common: 100 (17.8%)

Most frequent character per script

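Hangul
Value: Count (Frequency %)
[unrecovered]: 55 (11.9%)
[unrecovered]: 27 (5.9%)
[unrecovered]: 24 (5.2%)
[unrecovered]: 19 (4.1%)
[unrecovered]: 18 (3.9%)
[unrecovered]: 16 (3.5%)
[unrecovered]: 12 (2.6%)
[unrecovered]: 11 (2.4%)
[unrecovered]: 11 (2.4%)
[unrecovered]: 10 (2.2%)
Other values (61): 258 (56.0%)

Common
Value: Count (Frequency %)
'(': 49 (49.0%)
')': 49 (49.0%)
'3': 2 (2.0%)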

Most occurring blocks

Value: Count (Frequency %)
Hangul: 461 (82.2%)
ASCII: 100 (17.8%)

Most frequent character per block

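Hangul
Value: Count (Frequency %)
[unrecovered]: 55 (11.9%)
[unrecovered]: 27 (5.9%)
[unrecovered]: 24 (5.2%)
[unrecovered]: 19 (4.1%)
[unrecovered]: 18 (3.9%)
[unrecovered]: 16 (3.5%)
[unrecovered]: 12 (2.6%)
[unrecovered]: 11 (2.4%)
[unrecovered]: 11 (2.4%)
[unrecovered]: 10 (2.2%)
Other values (61): 258 (56.0%)

ASCII
Value: Count (Frequency %)
'(': 49 (49.0%)
')': 49 (49.0%)
'3': 2 (2.0%)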

부과건수 (number of levies)
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING, ZEROS

Distinct: 69
Distinct (%): 74.2%
Missing: 10
Missing (%): 9.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 26501.516
Minimum: 0
Maximum: 437257
Zeros: 25
Zeros (%): 24.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1162
Q3: 12279
95th percentile: 124224.4
Maximum: 437257
Range: 437257
Interquartile range (IQR): 12279

Descriptive statistics

Standard deviation: 71133.073
Coefficient of variation (CV): 2.6841133
Kurtosis: 22.955678
Mean: 26501.516
Median absolute deviation (MAD): 1162
Skewness: 4.488098
Sum: 2464641
Variance: 5.059914 × 10^9
Monotonicity: Not monotonic
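
The quantile and descriptive statistics above are related by simple identities: Mean = Sum / 93 non-missing values (2464641 / 93 ≈ 26501.516), CV = standard deviation / mean (71133.073 / 26501.516 ≈ 2.684), Variance = standard deviation squared, and IQR = Q3 - Q1. A minimal pandas sketch of these quantities follows, assuming the sample (ddof=1) definitions pandas uses by default and MAD meaning the median absolute deviation from the median; the function name `describe_numeric` is hypothetical, and ydata-profiling may differ in details such as quantile interpolation.

```python
import pandas as pd

def describe_numeric(s: pd.Series) -> dict:
    """Hypothetical helper mirroring the quantile and descriptive statistics of a numeric column."""
    s = s.dropna()  # statistics are computed over non-missing values only
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1)
    return {
        "mean": mean,
        "median": median,
        "iqr": q3 - q1,
        "std": std,
        "variance": s.var(),
        "cv": std / mean,
        "mad": (s - median).abs().median(),  # median absolute deviation
        "sum": s.sum(),
    }

print(describe_numeric(pd.Series([1, 2, 3, 4, 5, None])))
```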
[Figure: Histogram with fixed-size bins (bins=50)]

Common Values

Value: Count (Frequency %)
0: 25 (24.3%)
945: 1 (1.0%)
80470: 1 (1.0%)
1003: 1 (1.0%)
1493: 1 (1.0%)
12932: 1 (1.0%)
188: 1 (1.0%)
15: 1 (1.0%)
33947: 1 (1.0%)
3553: 1 (1.0%)
Other values (59): 59 (57.3%)
(Missing): 10 (9.7%)

Minimum 10 values

Value: Count (Frequency %)
0: 25 (24.3%)
6: 1 (1.0%)
7: 1 (1.0%)
10: 1 (1.0%)
15: 1 (1.0%)
25: 1 (1.0%)
31: 1 (1.0%)
33: 1 (1.0%)
41: 1 (1.0%)
43: 1 (1.0%)

Maximum 10 values

Value: Count (Frequency %)
437257: 1 (1.0%)
435672: 1 (1.0%)
159248: 1 (1.0%)
157745: 1 (1.0%)
130795: 1 (1.0%)
119844: 1 (1.0%)
119439: 1 (1.0%)
115815: 1 (1.0%)
80470: 1 (1.0%)
79752: 1 (1.0%)

부과금액 (amount levied)
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING, ZEROS

Distinct: 68
Distinct (%): 73.1%
Missing: 10
Missing (%): 9.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.8419459 × 10^9
Minimum: 0
Maximum: 5.9398151 × 10^10
Zeros: 26
Zeros (%): 25.2%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 3.11072 × 10^8
Q3: 7.532908 × 10^9
95th percentile: 3.3536746 × 10^10
Maximum: 5.9398151 × 10^10
Range: 5.9398151 × 10^10
Interquartile range (IQR): 7.532908 × 10^9

Descriptive statistics

Standard deviation: 1.2381122 × 10^10
Coefficient of variation (CV): 1.8095907
Kurtosis: 5.344308
Mean: 6.8419459 × 10^9
Median absolute deviation (MAD): 3.11072 × 10^8
Skewness: 2.3210964
Sum: 6.3630097 × 10^11
Variance: 1.5329217 × 10^20
Monotonicity: Not monotonic
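
The 부과금액 statistics can be cross-checked against each other, which helps when reading the scientific notation (93 non-missing values):

$$
\begin{aligned}
\text{Sum} &= 93 \times 6.8419459\times10^{9} \approx 6.3630\times10^{11} \\
\text{CV} &= \frac{\sigma}{\mu} = \frac{1.2381122\times10^{10}}{6.8419459\times10^{9}} \approx 1.8096 \\
\text{Variance} &= \sigma^{2} = \left(1.2381122\times10^{10}\right)^{2} \approx 1.5329\times10^{20} \\
\text{IQR} &= Q_3 - Q_1 = 7.532908\times10^{9} - 0 = 7.532908\times10^{9}
\end{aligned}
$$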
[Figure: Histogram with fixed-size bins (bins=50)]

Common Values

Value: Count (Frequency %)
0: 26 (25.2%)
12691772000: 1 (1.0%)
139588000: 1 (1.0%)
19527517000: 1 (1.0%)
311072000: 1 (1.0%)
34720000: 1 (1.0%)
428702000: 1 (1.0%)
12311000: 1 (1.0%)
44495675000: 1 (1.0%)
15529136000: 1 (1.0%)
Other values (58): 58 (56.3%)
(Missing): 10 (9.7%)

Minimum 10 values

Value: Count (Frequency %)
0: 26 (25.2%)
5414000: 1 (1.0%)
5554000: 1 (1.0%)
11579000: 1 (1.0%)
12311000: 1 (1.0%)
13603000: 1 (1.0%)
13676000: 1 (1.0%)
15696000: 1 (1.0%)
16981000: 1 (1.0%)
25476000: 1 (1.0%)

Maximum 10 values

Value: Count (Frequency %)
59398151000: 1 (1.0%)
46239683000: 1 (1.0%)
46206814000: 1 (1.0%)
44495675000: 1 (1.0%)
43532300000: 1 (1.0%)
26873043000: 1 (1.0%)
24759156000: 1 (1.0%)
24570776000: 1 (1.0%)
24257537000: 1 (1.0%)
23734377000: 1 (1.0%)

데이터기준일 (data reference date)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 956.0 B
Top values: 2023-04-04 (93), <NA> (10)

Length

Max length: 10
Median length: 10
Mean length: 9.4174757
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-04-04
2nd row: 2023-04-04
3rd row: 2023-04-04
4th row: 2023-04-04
5th row: 2023-04-04

Common Values

Value: Count (Frequency %)
2023-04-04: 93 (90.3%)
<NA>: 10 (9.7%)

[Figure: Histogram of lengths of the category]

Interactions

[Figures: 4 pairwise interaction plots]

Correlations

Variable | 과세년도 | 세목명 | 세원 유형명 | 부과건수 | 부과금액
과세년도 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
세목명 | 0.000 | 1.000 | 1.000 | 0.840 | 0.580
세원 유형명 | 0.000 | 1.000 | 1.000 | 1.000 | 0.913
부과건수 | 0.000 | 0.840 | 1.000 | 1.000 | 0.630
부과금액 | 0.000 | 0.580 | 0.913 | 0.630 | 1.000

Variable | 데이터기준일 | 세목명 | 과세년도 | 시도명 | 시군구명 | 자치단체코드
데이터기준일 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
세목명 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000
과세년도 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000
시도명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시군구명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
자치단체코드 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Variable | 부과건수 | 부과금액 | 시도명 | 시군구명 | 자치단체코드 | 과세년도 | 세목명 | 데이터기준일
부과건수 | 1.000 | 0.846 | 1.000 | 1.000 | 1.000 | 0.000 | 0.627 | 1.000
부과금액 | 0.846 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.302 | 1.000
시도명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
시군구명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
자치단체코드 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
과세년도 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000
세목명 | 0.627 | 0.302 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000
데이터기준일 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
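
Many entries involving 시도명, 시군구명, 자치단체코드 and 데이터기준일 are exactly 1.000. That is expected here: each of these columns holds a single value on rows 0 to 92 and <NA> on rows 93 to 102, so any two of them determine each other perfectly. A common association measure for categorical pairs is Cramér's V. The sketch below is a plain, bias-uncorrected version written with pandas and NumPy that treats <NA> as its own category; it is an illustration only, and ydata-profiling's own correlation settings (which method feeds which matrix, and whether a bias correction is applied) may differ. The function name `cramers_v` and the toy series are hypothetical.

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Plain (bias-uncorrected) Cramér's V between two categorical series."""
    # Treat missing values as an explicit category so aligned <NA> rows count
    table = pd.crosstab(x.fillna("<NA>"), y.fillna("<NA>")).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the marginals / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    if k == 0:
        return float("nan")  # a constant column carries no association information
    return float(np.sqrt(chi2 / (n * k)))

# Toy version of two aligned columns: one value on the same rows, <NA> on the rest
sido = pd.Series(["울산광역시"] * 3 + [None] * 2)
date = pd.Series(["2023-04-04"] * 3 + [None] * 2)
print(round(cramers_v(sido, date), 3))  # 1.0
```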

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: nullity correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
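
Nullity correlation can be approximated by correlating the columns' is-missing indicators. In this dataset 세원 유형명, 부과건수 and 부과금액 are missing on exactly the same 10 rows, so their pairwise nullity correlation would be 1. This is a minimal sketch, assuming a pandas DataFrame; the helper name `nullity_correlation` is hypothetical, and it drops columns that are never or always missing because their indicators have zero variance.

```python
import pandas as pd

def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson correlation between columns' is-missing indicators."""
    nullity = df.isna()
    # Columns that are never (or always) missing have constant indicators; drop them
    partial = nullity.loc[:, nullity.any() & ~nullity.all()]
    return partial.astype(float).corr()

# Toy frame: two columns missing on the same rows, one column never missing
example = pd.DataFrame({
    "부과건수": [1, None, 3, None],
    "부과금액": [10.0, None, 30.0, None],
    "과세년도": ["2020", "2020", "2021", "2021"],
})
print(nullity_correlation(example))
```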

Sample

시도명시군구명자치단체코드과세년도세목명세원 유형명부과건수부과금액데이터기준일
0울산광역시북구312002020담배소비세담배소비세002023-04-04
1울산광역시북구312002020교육세교육세435672190136600002023-04-04
2울산광역시북구312002020도시계획세도시계획세002023-04-04
3울산광역시북구312002020취득세건축물945126917720002023-04-04
4울산광역시북구312002020취득세주택(개별)46635944400002023-04-04
5울산광역시북구312002020취득세주택(단독)4969130643650002023-04-04
6울산광역시북구312002020취득세기타251633900002023-04-04
7울산광역시북구312002020취득세항공기002023-04-04
8울산광역시북구312002020취득세기계장비33254760002023-04-04
9울산광역시북구312002020취득세차량24564123560002023-04-04
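Row | 시도명 | 시군구명 | 자치단체코드 | 과세년도 | 세목명 | 세원 유형명 | 부과건수 | 부과금액 | 데이터기준일
93 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
94 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
95 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
96 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
97 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
98 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
99 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
100 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
101 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
102 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>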
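
Rows 93 to 102 are empty in every column. They account for the 10 missing values flagged in the alerts, the <NA> category in the categorical columns, and the duplicate-row alert. A minimal cleanup sketch, assuming the file has been read into a pandas DataFrame, drops rows in which every column is missing. Because the categorical columns report a Missing count of 0, they may hold the literal text "<NA>" rather than a true missing value; the sketch therefore converts that string first, which is an assumption about the loaded data. The helper name `drop_empty_rows` is hypothetical.

```python
import pandas as pd

def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows where every column is missing."""
    # Assumption: some columns may contain the literal string "<NA>" instead of a real NA
    cleaned = df.replace("<NA>", pd.NA)
    return cleaned.dropna(how="all").reset_index(drop=True)

raw = pd.DataFrame({
    "시도명": ["울산광역시", "울산광역시", "<NA>"],
    "부과건수": [0, 435672, None],
})
print(drop_empty_rows(raw))
```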

Duplicate rows

Most frequently occurring

Row | 시도명 | 시군구명 | 자치단체코드 | 과세년도 | 세목명 | 세원 유형명 | 부과건수 | 부과금액 | 데이터기준일 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 10
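
The duplicate table groups identical rows and counts each group; here the only duplicated row is the all-<NA> row, which occurs 10 times. pandas' `DataFrame.duplicated()` flags every repeat after the first, so `df.duplicated().sum()` would give 9 for these rows, while the Duplicate rows figure of 1 in Dataset statistics appears to count distinct duplicated rows instead. The sketch below reproduces a table of this shape, assuming a pandas DataFrame and pandas 1.1 or later (for `dropna=False` in `groupby`); the function name `duplicate_groups` and the toy data are hypothetical.

```python
import pandas as pd

def duplicate_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: each distinct duplicated row with its number of occurrences."""
    # keep=False marks every member of a duplicate group, including the first
    dupes = df[df.duplicated(keep=False)]
    return (
        dupes.groupby(list(df.columns), dropna=False)  # keep all-NA rows as a group
        .size()
        .reset_index(name="# duplicates")
        .sort_values("# duplicates", ascending=False, ignore_index=True)
    )

toy = pd.DataFrame({
    "과세년도": ["2020", "2021", None, None, None],
    "부과건수": [0, 945, None, None, None],
})
print(duplicate_groups(toy))
print(toy.duplicated().sum())  # 2: repeats after the first occurrence
```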