Overview

Dataset statistics

Number of variables: 5
Number of observations: 28
Missing cells: 45
Missing cells (%): 32.1%
Duplicate rows: 10
Duplicate rows (%): 35.7%
Total size in memory: 1.2 KiB
Average record size in memory: 45.7 B
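The two percentages follow directly from the counts above. Missing cells are taken over all 5 × 28 = 140 cells, and duplicate rows over the 28 observations. Here is a quick check in Python, using only figures from this report (the `percent` helper is for illustration only):

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal, as displayed in the report."""
    return round(100 * part / whole, 1)

n_variables = 5
n_observations = 28
missing_cells = 45
duplicate_rows = 10

total_cells = n_variables * n_observations  # 140 cells in the 5 x 28 grid
missing_pct = percent(missing_cells, total_cells)
duplicate_pct = percent(duplicate_rows, n_observations)

print(f"Missing cells: {missing_pct}%")     # Missing cells: 32.1%
print(f"Duplicate rows: {duplicate_pct}%")  # Duplicate rows: 35.7%
```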

Variable types

Text: 2
Unsupported: 2
Categorical: 1

Dataset

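Description: Urban control point (도시기준점) data from the integrated spatial information system (공간정보통합시스템). For each point it provides the management number, X (종좌표) and Y (횡좌표) coordinates, elevation, and the monument survey-result record (매설점성과기록부).
Author: 공공데이터포털 (Public Data Portal)
URL: https://www.data.go.kr/data/15117667/fileData.do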

Alerts

Dataset has 10 (35.7%) duplicate rows [Duplicates]
Unnamed: 2 is highly imbalanced (62.2%) [Imbalance]
매 설 점 의 조 서 has 7 (25.0%) missing values [Missing]
Unnamed: 1 has 12 (42.9%) missing values [Missing]
Unnamed: 3 has 12 (42.9%) missing values [Missing]
Unnamed: 4 has 14 (50.0%) missing values [Missing]
Unnamed: 1 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
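Each missing-value alert is the column's missing count divided by the 28 observations. The four counts also add up to the 45 missing cells in the dataset statistics, since Unnamed: 2 has none. Here is a small Python check, with the counts copied from the alerts:

```python
N_ROWS = 28

# Missing-value counts per column, as listed in the alerts above
missing_counts = {
    "매 설 점 의 조 서": 7,
    "Unnamed: 1": 12,
    "Unnamed: 3": 12,
    "Unnamed: 4": 14,
}

# Share of the 28 rows that are missing, rounded like the report
missing_pct = {col: round(100 * n / N_ROWS, 1) for col, n in missing_counts.items()}

for col, pct in missing_pct.items():
    print(f"{col} has {missing_counts[col]} ({pct}%) missing values")

print(sum(missing_counts.values()))  # 45, the dataset-wide missing-cell count
```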

Reproduction

Analysis started: 2024-04-17 11:19:48.200311
Analysis finished: 2024-04-17 11:19:48.569503
Duration: 0.37 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
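The reported duration is simply the difference between the two timestamps above. Here is a short check with the standard-library `datetime` module. The timestamps are copied verbatim, and both are treated as naive local times in the same zone:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-04-17 11:19:48.200311")
finished = datetime.fromisoformat("2024-04-17 11:19:48.569503")

# timedelta -> float seconds (0.369192 s here)
duration = (finished - started).total_seconds()
print(f"Duration {duration:.2f} seconds")  # Duration 0.37 seconds
```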

Variables

매 설 점 의 조 서
Text

Distinct: 11
Distinct (%): 52.4%
Missing: 7
Missing (%): 25.0%
Memory size: 356.0 B

Length

Max length: 17
Median length: 15
Mean length: 7.8571429
Min length: 4
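The mean length is the total number of characters divided by the number of non-missing values, and the figures in this report bear that out. Here it is 165 characters over 28 - 7 = 21 values. For Unnamed: 3 further below, it is 108 characters over 28 - 12 = 16 values. `mean_length` is a hypothetical helper for illustration:

```python
def mean_length(total_characters: int, n_rows: int, n_missing: int) -> float:
    """Mean string length over the non-missing values only."""
    return total_characters / (n_rows - n_missing)

print(round(mean_length(165, 28, 7), 7))   # 7.8571429 (매 설 점 의 조 서)
print(round(mean_length(108, 28, 12), 2))  # 6.75 (Unnamed: 3)
```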

Characters and Unicode

Total characters: 165
Distinct characters: 37
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
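Python's standard `unicodedata` module exposes the same general-category property as two-letter codes such as `Lo` and `Zs`, though it does not expose scripts or blocks. The sketch below maps those codes to the long names used in this report. It counts categories for the value 성 과 (GRS 80), which appears in this column in rows 6 and 21 of the sample. Two copies of that value are consistent with the 6 Uppercase Letters, 4 Decimal Numbers and 2 + 2 parentheses tallied below. The mapping table and `category_counts` are illustrative, not the profiler's own code:

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general categories that occur in this report
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Cc": "Control",
}

def category_counts(text: str) -> Counter:
    """Count characters per Unicode general category, using long names where known."""
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in text
    )

print(category_counts("성 과 (GRS 80)"))
```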

Unique

Unique: 1
Unique (%): 4.8%
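"Distinct" counts how many different values occur. "Unique" counts only the values that occur exactly once. For this column both percentages use the 21 non-missing values as the base: 11 of 21 is 52.4% and 1 of 21 is 4.8%. Here is a sketch with a hypothetical helper and a made-up input list:

```python
from collections import Counter
from typing import List, Optional, Tuple

def distinct_and_unique(values: List[Optional[str]]) -> Tuple[int, int]:
    """Return (distinct, unique) over non-missing values (None = missing)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)                                  # different values
    unique = sum(1 for c in counts.values() if c == 1)      # values seen once
    return distinct, unique

example = ["점의명칭", "도엽번호", "점의명칭", None, "소 재 지"]
print(distinct_and_unique(example))  # (3, 2)

# The report's percentages for this column, over 21 non-missing values
print(round(100 * 11 / 21, 1), round(100 * 1 / 21, 1))  # 52.4 4.8
```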

Sample

1st row: 점의명칭 (point name)
2nd row: 도엽번호 (map sheet number)
3rd row: 소 재 지 (location)
4th row: 계획기관 (planning agency)
5th row: 매설연도 (year installed)
| Value | Count | Frequency (%) |
|  | 3 | 6.5% |
|  | 3 | 6.5% |
| 점의명칭 | 2 | 4.3% |
| 80 | 2 | 4.3% |
|  | 2 | 4.3% |
|  | 2 | 4.3% |
|  | 2 | 4.3% |
|  | 2 | 4.3% |
|  | 2 | 4.3% |
| 도엽번호 | 2 | 4.3% |
| Other values (14) | 24 | 52.2% |
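The word table counts whitespace-separated words. Its percentages are relative to 46 words in total (3 + 3 + 8 × 2 + 24). The table lists 80 rather than 80), so the sketch below assumes punctuation is trimmed from word edges. The profiler's exact tokenization rules, such as case folding, are not shown in this report, and `word_frequencies` is a hypothetical helper:

```python
import string
from collections import Counter
from typing import List, Optional, Tuple

def word_frequencies(values: List[Optional[str]]) -> List[Tuple[str, int, float]]:
    """Count whitespace-separated words in non-missing values, trimming ASCII
    punctuation from both ends of each word.
    Returns (word, count, percent of all words), most frequent first."""
    words = []
    for value in values:
        if value is None:
            continue  # missing cells contribute no words
        for token in value.split():
            token = token.strip(string.punctuation)
            if token:
                words.append(token)
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, c, round(100 * c / total, 1)) for w, c in counts.most_common()]

sample = ["점의명칭", "성 과 (GRS 80)", None, "점의명칭", "성 과 (GRS 80)"]
for word, count, pct in word_frequencies(sample):
    print(word, count, f"{pct}%")

# The report's own percentages: 2 of 46 words and 24 of 46 words
print(round(100 * 2 / 46, 1), round(100 * 24 / 46, 1))  # 4.3 52.2
```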

Most occurring characters

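| Value | Count | Frequency (%) |
| (space) | 77 | 46.7% |
|  | 8 | 4.8% |
|  | 5 | 3.0% |
|  | 5 | 3.0% |
|  | 4 | 2.4% |
|  | 4 | 2.4% |
|  | 3 | 1.8% |
|  | 3 | 1.8% |
|  | 2 | 1.2% |
|  | 2 | 1.2% |
| Other values (27) | 52 | 31.5% |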

Most occurring categories

| Value | Count | Frequency (%) |
| Space Separator | 77 | 46.7% |
| Other Letter | 72 | 43.6% |
| Uppercase Letter | 6 | 3.6% |
| Decimal Number | 4 | 2.4% |
| Close Punctuation | 2 | 1.2% |
| Open Punctuation | 2 | 1.2% |
| Control | 2 | 1.2% |

Most frequent character per category

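Other Letter
| Value | Count | Frequency (%) |
|  | 8 | 11.1% |
|  | 5 | 6.9% |
|  | 5 | 6.9% |
|  | 4 | 5.6% |
|  | 4 | 5.6% |
|  | 3 | 4.2% |
|  | 3 | 4.2% |
|  | 2 | 2.8% |
|  | 2 | 2.8% |
|  | 2 | 2.8% |
| Other values (18) | 34 | 47.2% |

Uppercase Letter
| Value | Count | Frequency (%) |
| S | 2 | 33.3% |
| R | 2 | 33.3% |
| G | 2 | 33.3% |

Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2 | 50.0% |
| 8 | 2 | 50.0% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 77 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 2 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 2 | 100.0% |

Control
| Value | Count | Frequency (%) |
| (control character) | 2 | 100.0% |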

Most occurring scripts

| Value | Count | Frequency (%) |
| Common | 87 | 52.7% |
| Hangul | 72 | 43.6% |
| Latin | 6 | 3.6% |

Most frequent character per script

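Hangul
| Value | Count | Frequency (%) |
|  | 8 | 11.1% |
|  | 5 | 6.9% |
|  | 5 | 6.9% |
|  | 4 | 5.6% |
|  | 4 | 5.6% |
|  | 3 | 4.2% |
|  | 3 | 4.2% |
|  | 2 | 2.8% |
|  | 2 | 2.8% |
|  | 2 | 2.8% |
| Other values (18) | 34 | 47.2% |

Common
| Value | Count | Frequency (%) |
| (space) | 77 | 88.5% |
| ) | 2 | 2.3% |
| 0 | 2 | 2.3% |
| 8 | 2 | 2.3% |
| ( | 2 | 2.3% |
| (control character) | 2 | 2.3% |

Latin
| Value | Count | Frequency (%) |
| S | 2 | 33.3% |
| R | 2 | 33.3% |
| G | 2 | 33.3% |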

Most occurring blocks

| Value | Count | Frequency (%) |
| ASCII | 93 | 56.4% |
| Hangul | 72 | 43.6% |
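Python's standard library has no Unicode block lookup, so the sketch below hard-codes the two ranges relevant here. It assumes the report's "ASCII" label corresponds to the Basic Latin block (U+0000 to U+007F) and "Hangul" to the Hangul Syllables block (U+AC00 to U+D7AF). The two control characters counted in this column also fall in the ASCII range: 77 spaces + 7 × 2 other ASCII characters + 2 control characters = 93. `block_of` is a hypothetical helper, not a complete block table:

```python
from collections import Counter

def block_of(ch: str) -> str:
    """Coarse block lookup covering only the two blocks seen in this column."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"    # Basic Latin, U+0000..U+007F (includes control characters)
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"   # Hangul Syllables, U+AC00..U+D7AF
    return "Other"        # anything else is outside this sketch's scope

print(Counter(block_of(ch) for ch in "성 과 (GRS 80)"))
```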

Most frequent character per block

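ASCII
| Value | Count | Frequency (%) |
| (space) | 77 | 82.8% |
| ) | 2 | 2.2% |
| 0 | 2 | 2.2% |
| 8 | 2 | 2.2% |
| S | 2 | 2.2% |
| R | 2 | 2.2% |
| G | 2 | 2.2% |
| ( | 2 | 2.2% |
| (control character) | 2 | 2.2% |

Hangul
| Value | Count | Frequency (%) |
|  | 8 | 11.1% |
|  | 5 | 6.9% |
|  | 5 | 6.9% |
|  | 4 | 5.6% |
|  | 4 | 5.6% |
|  | 3 | 4.2% |
|  | 3 | 4.2% |
|  | 2 | 2.8% |
|  | 2 | 2.8% |
|  | 2 | 2.8% |
| Other values (18) | 34 | 47.2% |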

Unnamed: 1
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 12
Missing (%): 42.9%
Memory size: 356.0 B

Unnamed: 2
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 17.9%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

| Value | Count |
| <NA> | 24 |
| 239236.0481 | 1 |
| 261286.4507 | 1 |
| 239867.8075 | 1 |
| 261851.6631 | 1 |

Length

Max length: 11
Median length: 4
Mean length: 5
Min length: 4
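These length figures, together with Missing = 0, are reproduced exactly if the 24 `<NA>` entries are counted as the literal 4-character string "<NA>" rather than as missing values. Here is a sketch with a hypothetical `length_stats` helper, using the five values listed under Common Values:

```python
from statistics import mean, median
from typing import Dict, List, Optional

def length_stats(values: List[Optional[str]]) -> Dict[str, float]:
    """Max, median, mean and min length of the non-missing strings (None = missing)."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# Unnamed: 2 with "<NA>" kept as a literal string, as the Missing count of 0 suggests
unnamed_2 = ["<NA>"] * 24 + ["239236.0481", "261286.4507", "239867.8075", "261851.6631"]
print(length_stats(unnamed_2))  # matches the report: max 11, median 4, mean 5, min 4
```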

Unique

Unique: 4
Unique (%): 14.3%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

| Value | Count | Frequency (%) |
| <NA> | 24 | 85.7% |
| 239236.0481 | 1 | 3.6% |
| 261286.4507 | 1 | 3.6% |
| 239867.8075 | 1 | 3.6% |
| 261851.6631 | 1 | 3.6% |
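The 62.2% imbalance alert is reproduced by a normalized-entropy measure: 1 minus the Shannon entropy of the value distribution divided by the log of the number of distinct values. We assume this is how the score is defined because it yields exactly the reported figure for the 24/1/1/1/1 counts above. `imbalance_score` is a hypothetical name:

```python
import math
from collections import Counter

def imbalance_score(values) -> float:
    """1 - entropy / log(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    counts = list(Counter(values).values())
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two distinct values")
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts)  # natural log
    return 1 - entropy / math.log(k)                           # normalize by max entropy

unnamed_2 = ["<NA>"] * 24 + ["239236.0481", "261286.4507", "239867.8075", "261851.6631"]
print(f"{imbalance_score(unnamed_2):.1%}")  # 62.2%
```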

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| na | 24 | 85.7% |
| 239236.0481 | 1 | 3.6% |
| 261286.4507 | 1 | 3.6% |
| 239867.8075 | 1 | 3.6% |
| 261851.6631 | 1 | 3.6% |

Unnamed: 3
Text

MISSING 

Distinct: 8
Distinct (%): 50.0%
Missing: 12
Missing (%): 42.9%
Memory size: 356.0 B

Length

Max length: 15
Median length: 9
Mean length: 6.75
Min length: 4

Characters and Unicode

Total characters: 108
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 번 호 (number)
2nd row: 도 엽 명 (map sheet name)
3rd row: 표석상황 (marker condition)
4th row: 매 설 자 (installed by)
5th row: 관 측 자 (observed by)
| Value | Count | Frequency (%) |
|  | 4 | 11.1% |
|  | 4 | 11.1% |
|  | 4 | 11.1% |
|  | 2 | 5.6% |
|  | 2 | 5.6% |
|  | 2 | 5.6% |
|  | 2 | 5.6% |
|  | 2 | 5.6% |
| 표석상황 | 2 | 5.6% |
|  | 2 | 5.6% |
| Other values (5) | 10 | 27.8% |

Most occurring characters

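| Value | Count | Frequency (%) |
| (space) | 44 | 40.7% |
|  | 6 | 5.6% |
|  | 4 | 3.7% |
|  | 4 | 3.7% |
|  | 4 | 3.7% |
|  | 4 | 3.7% |
|  | 4 | 3.7% |
|  | 2 | 1.9% |
|  | 2 | 1.9% |
|  | 2 | 1.9% |
| Other values (16) | 32 | 29.6% |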

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 60 | 55.6% |
| Space Separator | 44 | 40.7% |
| Close Punctuation | 2 | 1.9% |
| Open Punctuation | 2 | 1.9% |

Most frequent character per category

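Other Letter
| Value | Count | Frequency (%) |
|  | 6 | 10.0% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
| Other values (13) | 26 | 43.3% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 44 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 2 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 2 | 100.0% |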

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 60 | 55.6% |
| Common | 48 | 44.4% |

Most frequent character per script

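Hangul
| Value | Count | Frequency (%) |
|  | 6 | 10.0% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
| Other values (13) | 26 | 43.3% |

Common
| Value | Count | Frequency (%) |
| (space) | 44 | 91.7% |
| ) | 2 | 4.2% |
| ( | 2 | 4.2% |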

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 60 | 55.6% |
| ASCII | 48 | 44.4% |

Most frequent character per block

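ASCII
| Value | Count | Frequency (%) |
| (space) | 44 | 91.7% |
| ) | 2 | 4.2% |
| ( | 2 | 4.2% |

Hangul
| Value | Count | Frequency (%) |
|  | 6 | 10.0% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 4 | 6.7% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
|  | 2 | 3.3% |
| Other values (13) | 26 | 43.3% |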

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 14
Missing (%): 50.0%
Memory size: 356.0 B

Correlations

|  | 매 설 점 의 조 서 | Unnamed: 2 | Unnamed: 3 |
| 매 설 점 의 조 서 | 1.000 | NaN | 1.000 |
| Unnamed: 2 | NaN | 1.000 | 1.000 |
| Unnamed: 3 | 1.000 | 1.000 | 1.000 |

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
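Nullity correlation can be read as the Pearson correlation between two columns' is-missing indicators: +1 when they are always missing together, -1 when exactly one of them is missing in every row. The plain-Python sketch below illustrates that idea. It is not the plotting library's own code, `nullity_correlation` is a hypothetical name, and the inputs are made up:

```python
import math
from typing import Optional, Sequence

def nullity_correlation(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> float:
    """Pearson correlation of the 0/1 missingness indicators of two equal-length columns."""
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a column that is never (or always) missing has no defined correlation
    return cov / (sx * sy)

print(nullity_correlation(["x", None, "y", None], ["p", None, "q", None]))  # 1.0
print(nullity_correlation(["x", None, "y", None], [None, "p", None, "q"]))  # -1.0
```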

Sample

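First rows
|  | 매 설 점 의 조 서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| 0 | 점의명칭 | 공공(삼각,수준) | <NA> | 번 호 | NO.2-1 |
| 1 | 도엽번호 | NI 52-6-10 | <NA> | 도 엽 명 | 여 수 |
| 2 | 소 재 지 | 전라남도 여수시 웅천동 | <NA> | <NA> | NaN |
| 3 | 계획기관 | NaN | <NA> | 표석상황 | 동 판 |
| 4 | 매설연도 | 2013.06 | <NA> | 매 설 자 | 유 제 성 |
| 5 | 관측연도 | 2013.06 | <NA> | 관 측 자 | 김 병 관 |
| 6 | 성 과 (GRS 80) | X(m) | 239236.0481 | 해발고도(정표고) | 4.0604 |
| 7 | <NA> | Y(m) | 261286.4507 | 좌표원점 | 중 부 |
| 8 | 경 로 | 여수 웅천지구 캠핑장내 야외공연장 뒷편 화단에 위치함. | <NA> | <NA> | NaN |
| 9 | <NA> | NaN | <NA> | <NA> | NaN |

Last rows
|  | 매 설 점 의 조 서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| 18 | 계획기관 | NaN | <NA> | 표석상황 | 동 판 |
| 19 | 매설연도 | 2013.06 | <NA> | 매 설 자 | 유 제 성 |
| 20 | 관측연도 | 2013.06 | <NA> | 관 측 자 | 김 병 관 |
| 21 | 성 과 (GRS 80) | X(m) | 239867.8075 | 해발고도(정표고) | 52.894 |
| 22 | <NA> | Y(m) | 261851.6631 | 좌표원점 | 중 부 |
| 23 | 경 로 | 여수세관 입구 화단 우측 끝지점에 위치함. | <NA> | <NA> | NaN |
| 24 | <NA> | NaN | <NA> | <NA> | NaN |
| 25 | 약 도 | NaN | <NA> | <NA> | NaN |
| 26 | <NA> | NaN | <NA> | <NA> | NaN |
| 27 | 매 설 상 황 | NaN | <NA> | 관 측 사 진 | NaN |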
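The sample suggests that each record is a printed form (조서) spread across the five columns. Labels and values sit side by side, for example X(m) followed by its value and 해발고도(정표고) (orthometric height above sea level) followed by its value. This would explain the "Unnamed" headers and the many missing and repeated cells. The sketch below assumes that layout and pulls X, Y and elevation out of rows shaped like rows 6 and 7. `extract_coordinates` and its output keys are hypothetical, and the elevation unit is not stated in the sheet:

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]  # one sheet row: [매 설 점 의 조 서, Unnamed: 1, ..., Unnamed: 4]

# Assumed mapping from form labels to output keys
LABELS = {"X(m)": "x_m", "Y(m)": "y_m", "해발고도(정표고)": "elevation"}

def extract_coordinates(rows: List[Row]) -> Dict[str, float]:
    """Scan each row for a known label and read the cell immediately to its right."""
    result: Dict[str, float] = {}
    for row in rows:
        for i in range(len(row) - 1):
            key = LABELS.get(row[i] or "")
            value = row[i + 1]
            if key is not None and value is not None:
                result[key] = float(value)
    return result

rows = [
    ["성 과 (GRS 80)", "X(m)", "239236.0481", "해발고도(정표고)", "4.0604"],
    [None, "Y(m)", "261286.4507", "좌표원점", "중 부"],
]
print(extract_coordinates(rows))
```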

Duplicate rows

Most frequently occurring

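|  | 매 설 점 의 조 서 | Unnamed: 2 | Unnamed: 3 | # duplicates |
| 9 | <NA> | <NA> | <NA> | 5 |
| 0 | 경 로 | <NA> | <NA> | 2 |
| 1 | 계획기관 | <NA> | 표석상황 | 2 |
| 2 | 관측연도 | <NA> | 관 측 자 | 2 |
| 3 | 도엽번호 | <NA> | 도 엽 명 | 2 |
| 4 | 매 설 상 황 | <NA> | 관 측 사 진 | 2 |
| 5 | 매설연도 | <NA> | 매 설 자 | 2 |
| 6 | 소 재 지 | <NA> | <NA> | 2 |
| 7 | 약 도 | <NA> | <NA> | 2 |
| 8 | 점의명칭 | <NA> | 번 호 | 2 |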
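The table lists 10 distinct duplicated rows over the three supported columns, matching the "10 duplicate rows" alert. Counting surplus copies instead would give at least (5 - 1) + 9 × (2 - 1) = 13, so the alert appears to count distinct duplicated rows. The sketch below computes both quantities on a small made-up input, and `duplicate_groups` is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Tuple[Optional[str], ...]

def duplicate_groups(rows: List[Row]) -> Dict[Row, int]:
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

rows = [
    ("점의명칭", None, "번 호"),
    ("도엽번호", None, "도 엽 명"),
    ("점의명칭", None, "번 호"),
    (None, None, None),
    (None, None, None),
    (None, None, None),
]
groups = duplicate_groups(rows)
print(len(groups))                          # 2 distinct duplicated rows
print(sum(n - 1 for n in groups.values()))  # 3 surplus copies
```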