Overview

Dataset statistics

Number of variables: 5
Number of observations: 58
Missing cells: 61
Missing cells (%): 21.0%
Duplicate rows: 1
Duplicate rows (%): 1.7%
Total size in memory: 2.4 KiB
Average record size in memory: 42.3 B
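
The percentages above follow directly from the counts. A minimal Python sketch of the arithmetic, using the per-variable missing counts reported in the Alerts and Variables sections; the variable names are illustrative only and are not part of the report:

```python
# Counts taken from the overview and per-variable sections of this report.
n_variables = 5
n_observations = 58
missing_per_variable = [2, 0, 2, 0, 57]  # one entry per column, in column order
duplicate_rows = 1

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_variable)

# Percentages rounded to one decimal place, as in the report.
missing_pct = round(100 * missing_cells / total_cells, 1)
duplicate_pct = round(100 * duplicate_rows / n_observations, 1)

print(missing_cells, missing_pct, duplicate_pct)
```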

Variable types

Text: 3
Categorical: 2

Dataset

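Description: Status of the parcels for which compensation is in progress in connection with the 2016 Daebu-dong southwestern connecting road construction project.
Author: 안산도시공사 (Ansan Urban Corporation)
URL: https://www.data.go.kr/data/15045872/fileData.do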

Alerts

Unnamed: 4 has constant value "" (Constant)
Dataset has 1 (1.7%) duplicate rows (Duplicates)
Unnamed: 1 is highly overall correlated with Unnamed: 3 (High correlation)
Unnamed: 3 is highly overall correlated with Unnamed: 1 (High correlation)
Unnamed: 1 is highly imbalanced (78.5%) (Imbalance)
2016년 대부동 서남부 연결도로 개설공사 보상 필지 현황 has 2 (3.4%) missing values (Missing)
Unnamed: 2 has 2 (3.4%) missing values (Missing)
Unnamed: 4 has 57 (98.3%) missing values (Missing)
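
The 78.5% imbalance figure for Unnamed: 1 is consistent with a normalized-entropy score, 1 - H / log2(k), applied to the value counts reported for that variable (55, 2, 1). The sketch below assumes that definition; the function name is hypothetical and is not taken from the ydata-profiling API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (base 2)
    of the class frequencies and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Value counts reported for Unnamed: 1 in this report.
score = imbalance_score([55, 2, 1])
print(f"{score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, so a value near 0.8 reflects that a single value accounts for almost all rows.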

Reproduction

Analysis started: 2023-12-12 04:07:02.613590
Analysis finished: 2023-12-12 04:07:03.519362
Duration: 0.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

2016년 대부동 서남부 연결도로 개설공사 보상 필지 현황
Text

Distinct: 56
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.4%
Memory size: 596.0 B

Length

Max length: 10
Median length: 3
Mean length: 2.3392857
Min length: 1

Characters and Unicode

Total characters: 131
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
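
As a rough illustration of the character properties involved, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over the five sample values listed for this variable; the helper name is hypothetical, and this is not necessarily how ydata-profiling computes its own tables.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Lo') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# The five sample values shown for this variable.
sample = ["일련 번호", "1", "1-1", "1-2", "1-3"]
counts = category_counts(sample)

# Nd = Decimal Number, Pd = Dash Punctuation,
# Zs = Space Separator, Lo = Other Letter
print(counts)
```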

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: 일련 번호
2nd row: 1
3rd row: 1-1
4th row: 1-2
5th row: 1-3

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 일련 | 1 | 1.8% |
| 8-1 | 1 | 1.8% |
| 10 | 1 | 1.8% |
| 11 | 1 | 1.8% |
| 12 | 1 | 1.8% |
| 13 | 1 | 1.8% |
| 14 | 1 | 1.8% |
| 15 | 1 | 1.8% |
| 16 | 1 | 1.8% |
| 17 | 1 | 1.8% |
| Other values (47) | 47 | 82.5% |

Most occurring characters

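| Value | Count | Frequency (%) |
| --- | --- | --- |
| 1 | 25 | 19.1% |
| 2 | 24 | 18.3% |
| - | 20 | 15.3% |
| 3 | 18 | 13.7% |
| (space) | 6 | 4.6% |
| 4 | 6 | 4.6% |
| 5 | 6 | 4.6% |
| 6 | 6 | 4.6% |
| 7 | 6 | 4.6% |
| 8 | 4 | 3.1% |
| Other values (6) | 10 | 7.6% |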

Most occurring categories

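| Value | Count | Frequency (%) |
| --- | --- | --- |
| Decimal Number | 101 | 77.1% |
| Dash Punctuation | 20 | 15.3% |
| Space Separator | 6 | 4.6% |
| Other Letter | 4 | 3.1% |

Most frequent character per category

Decimal Number

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 1 | 25 | 24.8% |
| 2 | 24 | 23.8% |
| 3 | 18 | 17.8% |
| 4 | 6 | 5.9% |
| 5 | 6 | 5.9% |
| 6 | 6 | 5.9% |
| 7 | 6 | 5.9% |
| 8 | 4 | 4.0% |
| 9 | 3 | 3.0% |
| 0 | 3 | 3.0% |

Other Letter

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 일 | 1 | 25.0% |
| 련 | 1 | 25.0% |
| 번 | 1 | 25.0% |
| 호 | 1 | 25.0% |

Dash Punctuation

| Value | Count | Frequency (%) |
| --- | --- | --- |
| - | 20 | 100.0% |

Space Separator

| Value | Count | Frequency (%) |
| --- | --- | --- |
| (space) | 6 | 100.0% |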

Most occurring scripts

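| Value | Count | Frequency (%) |
| --- | --- | --- |
| Common | 127 | 96.9% |
| Hangul | 4 | 3.1% |

Most frequent character per script

Common

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 1 | 25 | 19.7% |
| 2 | 24 | 18.9% |
| - | 20 | 15.7% |
| 3 | 18 | 14.2% |
| (space) | 6 | 4.7% |
| 4 | 6 | 4.7% |
| 5 | 6 | 4.7% |
| 6 | 6 | 4.7% |
| 7 | 6 | 4.7% |
| 8 | 4 | 3.1% |
| Other values (2) | 6 | 4.7% |

Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 일 | 1 | 25.0% |
| 련 | 1 | 25.0% |
| 번 | 1 | 25.0% |
| 호 | 1 | 25.0% |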

Most occurring blocks

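| Value | Count | Frequency (%) |
| --- | --- | --- |
| ASCII | 127 | 96.9% |
| Hangul | 4 | 3.1% |

Most frequent character per block

ASCII

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 1 | 25 | 19.7% |
| 2 | 24 | 18.9% |
| - | 20 | 15.7% |
| 3 | 18 | 14.2% |
| (space) | 6 | 4.7% |
| 4 | 6 | 4.7% |
| 5 | 6 | 4.7% |
| 6 | 6 | 4.7% |
| 7 | 6 | 4.7% |
| 8 | 4 | 3.1% |
| Other values (2) | 6 | 4.7% |

Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 일 | 1 | 25.0% |
| 련 | 1 | 25.0% |
| 번 | 1 | 25.0% |
| 호 | 1 | 25.0% |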

Unnamed: 1
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Memory size: 596.0 B

Length

Max length: 12
Median length: 12
Mean length: 11.568966
Min length: 3

Unique

Unique: 1
Unique (%): 1.7%

Sample

1st row: <NA>
2nd row: 소재지
3rd row: <NA>
4th row: 안산시 단원구 대부북동
5th row: 안산시 단원구 대부북동

Common Values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 안산시 단원구 대부북동 | 55 | 94.8% |
| <NA> | 2 | 3.4% |
| 소재지 | 1 | 1.7% |

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 안산시 | 55 | 32.7% |
| 단원구 | 55 | 32.7% |
| 대부북동 | 55 | 32.7% |
| na | 2 | 1.2% |
| 소재지 | 1 | 0.6% |

Unnamed: 2
Text

MISSING 

Distinct: 36
Distinct (%): 64.3%
Missing: 2
Missing (%): 3.4%
Memory size: 596.0 B

Length

Max length: 7
Median length: 7
Mean length: 6.8571429
Min length: 2

Characters and Unicode

Total characters: 384
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 28
Unique (%): 50.0%

Sample

1st row: 지번
2nd row: 642-237
3rd row: 642-237
4th row: 642-237
5th row: 642-237

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 642-237 | 4 | 7.1% |
| 642-143 | 4 | 7.1% |
| 642-146 | 4 | 7.1% |
| 642-243 | 4 | 7.1% |
| 642-238 | 4 | 7.1% |
| 642-231 | 3 | 5.4% |
| 642-233 | 3 | 5.4% |
| 642-230 | 2 | 3.6% |
| 1529-6 | 1 | 1.8% |
| 1529-7 | 1 | 1.8% |
| Other values (26) | 26 | 46.4% |

Most occurring characters

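| Value | Count | Frequency (%) |
| --- | --- | --- |
| 2 | 88 | 22.9% |
| 6 | 59 | 15.4% |
| 3 | 59 | 15.4% |
| - | 55 | 14.3% |
| 4 | 48 | 12.5% |
| 9 | 27 | 7.0% |
| 1 | 20 | 5.2% |
| 7 | 8 | 2.1% |
| 8 | 8 | 2.1% |
| 0 | 5 | 1.3% |
| Other values (3) | 7 | 1.8% |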

Most occurring categories

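| Value | Count | Frequency (%) |
| --- | --- | --- |
| Decimal Number | 327 | 85.2% |
| Dash Punctuation | 55 | 14.3% |
| Other Letter | 2 | 0.5% |

Most frequent character per category

Decimal Number

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 2 | 88 | 26.9% |
| 6 | 59 | 18.0% |
| 3 | 59 | 18.0% |
| 4 | 48 | 14.7% |
| 9 | 27 | 8.3% |
| 1 | 20 | 6.1% |
| 7 | 8 | 2.4% |
| 8 | 8 | 2.4% |
| 0 | 5 | 1.5% |
| 5 | 5 | 1.5% |

Other Letter

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 지 | 1 | 50.0% |
| 번 | 1 | 50.0% |

Dash Punctuation

| Value | Count | Frequency (%) |
| --- | --- | --- |
| - | 55 | 100.0% |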

Most occurring scripts

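| Value | Count | Frequency (%) |
| --- | --- | --- |
| Common | 382 | 99.5% |
| Hangul | 2 | 0.5% |

Most frequent character per script

Common

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 2 | 88 | 23.0% |
| 6 | 59 | 15.4% |
| 3 | 59 | 15.4% |
| - | 55 | 14.4% |
| 4 | 48 | 12.6% |
| 9 | 27 | 7.1% |
| 1 | 20 | 5.2% |
| 7 | 8 | 2.1% |
| 8 | 8 | 2.1% |
| 0 | 5 | 1.3% |

Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 지 | 1 | 50.0% |
| 번 | 1 | 50.0% |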

Most occurring blocks

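| Value | Count | Frequency (%) |
| --- | --- | --- |
| ASCII | 382 | 99.5% |
| Hangul | 2 | 0.5% |

Most frequent character per block

ASCII

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 2 | 88 | 23.0% |
| 6 | 59 | 15.4% |
| 3 | 59 | 15.4% |
| - | 55 | 14.4% |
| 4 | 48 | 12.6% |
| 9 | 27 | 7.1% |
| 1 | 20 | 5.2% |
| 7 | 8 | 2.1% |
| 8 | 8 | 2.1% |
| 0 | 5 | 1.3% |

Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 지 | 1 | 50.0% |
| 번 | 1 | 50.0% |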

Unnamed: 3
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 17.2%
Missing: 0
Missing (%): 0.0%
Memory size: 596.0 B

Length

Max length: 4
Median length: 1
Mean length: 1.1206897
Min length: 1

Unique

Unique: 2
Unique (%): 3.4%

Sample

1st row: <NA>
2nd row: 지목
3rd row: <NA>
4th row:
5th row:

Common Values

| Value | Count | Frequency (%) |
| --- | --- | --- |
|  | 16 | 27.6% |
|  | 13 | 22.4% |
|  | 8 | 13.8% |
|  | 6 | 10.3% |
|  | 5 | 8.6% |
|  | 4 | 6.9% |
| <NA> | 2 | 3.4% |
|  | 2 | 3.4% |
| 지목 | 1 | 1.7% |
|  | 1 | 1.7% |

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| --- | --- | --- |
|  | 16 | 27.6% |
|  | 13 | 22.4% |
|  | 8 | 13.8% |
|  | 6 | 10.3% |
|  | 5 | 8.6% |
|  | 4 | 6.9% |
| na | 2 | 3.4% |
|  | 2 | 3.4% |
| 지목 | 1 | 1.7% |
|  | 1 | 1.7% |

Unnamed: 4
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 100.0%
Missing: 57
Missing (%): 98.3%
Memory size: 596.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 2
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: 비고

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 비고 | 1 | 100.0% |

Most occurring characters

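| Value | Count | Frequency (%) |
| --- | --- | --- |
| 비 | 1 | 50.0% |
| 고 | 1 | 50.0% |

Most occurring categories

| Value | Count | Frequency (%) |
| --- | --- | --- |
| Other Letter | 2 | 100.0% |

Most frequent character per category

Other Letter

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 비 | 1 | 50.0% |
| 고 | 1 | 50.0% |

Most occurring scripts

| Value | Count | Frequency (%) |
| --- | --- | --- |
| Hangul | 2 | 100.0% |

Most frequent character per script

Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 비 | 1 | 50.0% |
| 고 | 1 | 50.0% |

Most occurring blocks

| Value | Count | Frequency (%) |
| --- | --- | --- |
| Hangul | 2 | 100.0% |

Most frequent character per block

Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 비 | 1 | 50.0% |
| 고 | 1 | 50.0% |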

Correlations

|  | 2016년 대부동 서남부 연결도로 개설공사 보상 필지 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 |
| --- | --- | --- | --- | --- |
| 2016년 대부동 서남부 연결도로 개설공사 보상 필지 현황 | 1.000 | 1.000 | 1.000 | 1.000 |
| Unnamed: 1 | 1.000 | 1.000 | 1.000 | 1.000 |
| Unnamed: 2 | 1.000 | 1.000 | 1.000 | 1.000 |
| Unnamed: 3 | 1.000 | 1.000 | 1.000 | 1.000 |

|  | Unnamed: 1 | Unnamed: 3 |
| --- | --- | --- |
| Unnamed: 1 | 1.000 | 0.933 |
| Unnamed: 3 | 0.933 | 1.000 |

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

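First rows

|  | 2016년 대부동 서남부 연결도로 개설공사 보상 필지 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| --- | --- | --- | --- | --- | --- |
| 0 | <NA> | <NA> | <NA> | <NA> | <NA> |
| 1 | 일련 번호 | 소재지 | 지번 | 지목 | 비고 |
| 2 | <NA> | <NA> | <NA> | <NA> | <NA> |
| 3 | 1 | 안산시 단원구 대부북동 | 642-237 |  | <NA> |
| 4 | 1-1 | 안산시 단원구 대부북동 | 642-237 |  | <NA> |
| 5 | 1-2 | 안산시 단원구 대부북동 | 642-237 |  | <NA> |
| 6 | 1-3 | 안산시 단원구 대부북동 | 642-237 |  | <NA> |
| 7 | 2 | 안산시 단원구 대부북동 | 642-243 |  | <NA> |
| 8 | 2-1 | 안산시 단원구 대부북동 | 642-243 |  | <NA> |
| 9 | 2-2 | 안산시 단원구 대부북동 | 642-243 |  | <NA> |

Last rows

|  | 2016년 대부동 서남부 연결도로 개설공사 보상 필지 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| --- | --- | --- | --- | --- | --- |
| 48 | 26 | 안산시 단원구 대부북동 | 639-234 |  | <NA> |
| 49 | 27 | 안산시 단원구 대부북동 | 639-227 |  | <NA> |
| 50 | 28 | 안산시 단원구 대부북동 | 639-152 |  | <NA> |
| 51 | 29 | 안산시 단원구 대부북동 | 639-237 |  | <NA> |
| 52 | 30 | 안산시 단원구 대부북동 | 639-238 |  | <NA> |
| 53 | 31 | 안산시 단원구 대부북동 | 639-151 |  | <NA> |
| 54 | 32 | 안산시 단원구 대부북동 | 639-236 |  | <NA> |
| 55 | 33 | 안산시 단원구 대부북동 | 1529-6 |  | <NA> |
| 56 | 34 | 안산시 단원구 대부북동 | 1529-7 |  | <NA> |
| 57 | 35 | 안산시 단원구 대부북동 | 1898-2 |  | <NA> |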
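
The sample suggests that the real column names (일련 번호, 소재지, 지번, 지목, 비고) sit in row 1, between two all-missing rows. That would explain the `Unnamed: N` column names, the duplicated all-missing row, and the constant value in Unnamed: 4. A minimal sketch of one way to clean such input before profiling, assuming the rows are available as lists of strings with empty strings for missing cells; the function name is hypothetical, and `"X"` is only a placeholder because the 지목 values did not survive in this report:

```python
def promote_header(rows):
    """Drop all-empty rows, then treat the first remaining row as the header.

    Returns (header, records), where each record maps header names to cells.
    """
    non_empty = [row for row in rows if any(cell.strip() for cell in row)]
    if not non_empty:
        return [], []
    header, *data = non_empty
    records = [dict(zip(header, row)) for row in data]
    return header, records

# Rows shaped like the sample above; "X" is a placeholder for the 지목 value.
raw_rows = [
    ["", "", "", "", ""],
    ["일련 번호", "소재지", "지번", "지목", "비고"],
    ["", "", "", "", ""],
    ["1", "안산시 단원구 대부북동", "642-237", "X", ""],
    ["1-1", "안산시 단원구 대부북동", "642-237", "X", ""],
]

header, records = promote_header(raw_rows)
print(header)
print(records[0]["지번"])
```

Profiling the cleaned records instead of the raw rows would likely remove the header-related alerts, although the high share of missing values in 비고 would remain.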

Duplicate rows

Most frequently occurring

|  | 2016년 대부동 서남부 연결도로 개설공사 보상 필지 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | # duplicates |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | <NA> | <NA> | <NA> | <NA> | <NA> | 2 |