Overview

Dataset statistics

Number of variables: 7
Number of observations: 29
Missing cells: 83
Missing cells (%): 40.9%
Duplicate rows: 1
Duplicate rows (%): 3.4%
Total size in memory: 1.7 KiB
Average record size in memory: 60.4 B
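
The two percentages follow from the counts. A minimal sketch of the arithmetic, assuming the missing-cell share is taken over all cells (variables × observations) and the duplicate share over all rows; the variable names are illustrative:

```python
# Counts copied from the dataset statistics above.
n_variables = 7
n_observations = 29
missing_cells = 83
duplicate_rows = 1

total_cells = n_variables * n_observations              # 7 * 29 = 203 cells
missing_pct = 100 * missing_cells / total_cells         # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations   # share of duplicated rows

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```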

Variable types

Unsupported: 2
Text: 3
Categorical: 2

Dataset

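Description: Life Safety Map, health and safety, school environment: school environmental sanitation cleanup zone information (생활안전지도 보건안전 학교환경 학교환경위생정화구역정보)
Author: Ministry of the Interior and Safety (행정안전부)
URL: https://www.vworld.kr/dtmk/dtmk_ntads_s002.do?dsId=30158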

Alerts

Dataset has 1 (3.4%) duplicate rows [Duplicates]
Unnamed: 3 is highly overall correlated with Unnamed: 6 [High correlation]
Unnamed: 6 is highly overall correlated with Unnamed: 3 [High correlation]
Unnamed: 1 has 19 (65.5%) missing values [Missing]
Unnamed: 2 has 18 (62.1%) missing values [Missing]
Unnamed: 4 has 19 (65.5%) missing values [Missing]
Unnamed: 5 has 27 (93.1%) missing values [Missing]
테이블정의서 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-04-21 00:58:05.511781
Analysis finished: 2024-04-21 00:58:06.414197
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

테이블정의서
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 360.0 B

Unnamed: 1
Text

MISSING 

Distinct: 10
Distinct (%): 100.0%
Missing: 19
Missing (%): 65.5%
Memory size: 360.0 B

Length

Max length: 40
Median length: 8.5
Mean length: 10.5
Min length: 4

Characters and Unicode

Total characters: 105
Distinct characters: 49
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
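
As a rough illustration of the per-character properties summarized in this section, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own code; `category_counts` is an illustrative name, and the two inputs are values from this variable's sample rows. `unicodedata` returns two-letter codes (`Lu`, `Lo`, `Pc`) where the report prints long names (Uppercase Letter, Other Letter, Connector Punctuation), and it does not expose scripts or blocks.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two values taken from the sample rows of this variable.
sample = ["OBJT_ID", "컬럼ID"]
counts = category_counts(sample)
print(counts)
```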

Unique

Unique: 10
Unique (%): 100.0%

Sample

1st row: 컬럼ID (column ID)
2nd row: OBJT_ID
3rd row: PRPOS_CD
4th row: NTFC_YEAR
5th row: NTFC_NO
Value | Count | Frequency (%)
※ | 2 | 10.5%
objt_id | 1 | 5.3%
통합코드 | 1 | 5.3%
구별없이 | 1 | 5.3%
일반시구 | 1 | 5.3%
제공기관기준 | 1 | 5.3%
: | 1 | 5.3%
관리번호 | 1 | 5.3%
정화구역 | 1 | 5.3%
컬럼id | 1 | 5.3%
Other values (8) | 8 | 42.1%
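
The lowercased entries in this table (objt_id, 컬럼id) suggest that values are lowercased and split on whitespace before counting. A minimal sketch under that assumption, using only values visible in the report's sample; `token_counts` is an illustrative name, not a profiler function:

```python
from collections import Counter

def token_counts(values):
    """Lowercase each value, split it on whitespace, and count the tokens."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts

# Values copied from the sample rows of this variable.
values = [
    "컬럼ID",
    "OBJT_ID",
    "※ 정화구역 관리번호 : 제공기관기준 ※ 일반시구 구별없이 통합코드 사용",
]
counts = token_counts(values)
for token, n in counts.most_common(3):
    print(token, n)
```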

Most occurring characters

Value | Count | Frequency (%)
_ | 8 | 7.6%
(space) | 8 | 7.6%
C | 7 | 6.7%
N | 7 | 6.7%
D | 5 | 4.8%
T | 5 | 4.8%
G | 4 | 3.8%
R | 4 | 3.8%
P | 4 | 3.8%
S | 3 | 2.9%
Other values (39) | 50 | 47.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 55 | 52.4%
Other Letter | 30 | 28.6%
Connector Punctuation | 8 | 7.6%
Space Separator | 8 | 7.6%
Other Punctuation | 3 | 2.9%
Control | 1 | 1.0%

Most frequent character per category

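Other Letter
Value | Count | Frequency (%)
(character lost) | 3 | 10.0%
(character lost) | 2 | 6.7%
(character lost) | 2 | 6.7%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
Other values (16) | 16 | 53.3%

Uppercase Letter
Value | Count | Frequency (%)
C | 7 | 12.7%
N | 7 | 12.7%
D | 5 | 9.1%
T | 5 | 9.1%
G | 4 | 7.3%
R | 4 | 7.3%
P | 4 | 7.3%
S | 3 | 5.5%
O | 3 | 5.5%
M | 2 | 3.6%
Other values (8) | 11 | 20.0%

Other Punctuation
Value | Count | Frequency (%)
※ | 2 | 66.7%
: | 1 | 33.3%

Connector Punctuation
Value | Count | Frequency (%)
_ | 8 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 8 | 100.0%

Control
Value | Count | Frequency (%)
(control character) | 1 | 100.0%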

Most occurring scripts

Value | Count | Frequency (%)
Latin | 55 | 52.4%
Hangul | 30 | 28.6%
Common | 20 | 19.0%

Most frequent character per script

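Hangul
Value | Count | Frequency (%)
(character lost) | 3 | 10.0%
(character lost) | 2 | 6.7%
(character lost) | 2 | 6.7%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
Other values (16) | 16 | 53.3%

Latin
Value | Count | Frequency (%)
C | 7 | 12.7%
N | 7 | 12.7%
D | 5 | 9.1%
T | 5 | 9.1%
G | 4 | 7.3%
R | 4 | 7.3%
P | 4 | 7.3%
S | 3 | 5.5%
O | 3 | 5.5%
M | 2 | 3.6%
Other values (8) | 11 | 20.0%

Common
Value | Count | Frequency (%)
_ | 8 | 40.0%
(space) | 8 | 40.0%
※ | 2 | 10.0%
(control character) | 1 | 5.0%
: | 1 | 5.0%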

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 73 | 69.5%
Hangul | 30 | 28.6%
Punctuation | 2 | 1.9%

Most frequent character per block

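ASCII
Value | Count | Frequency (%)
_ | 8 | 11.0%
(space) | 8 | 11.0%
C | 7 | 9.6%
N | 7 | 9.6%
D | 5 | 6.8%
T | 5 | 6.8%
G | 4 | 5.5%
R | 4 | 5.5%
P | 4 | 5.5%
S | 3 | 4.1%
Other values (12) | 18 | 24.7%

Hangul
Value | Count | Frequency (%)
(character lost) | 3 | 10.0%
(character lost) | 2 | 6.7%
(character lost) | 2 | 6.7%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
(character lost) | 1 | 3.3%
Other values (16) | 16 | 53.3%

Punctuation
Value | Count | Frequency (%)
※ | 2 | 100.0%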

Unnamed: 2
Text

MISSING 

Distinct: 11
Distinct (%): 100.0%
Missing: 18
Missing (%): 62.1%
Memory size: 360.0 B

Length

Max length: 31
Median length: 4
Mean length: 8.6363636
Min length: 3

Characters and Unicode

Total characters: 95
Distinct characters: 53
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 11
Unique (%): 100.0%

Sample

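1st row: A2SM_SchulEnvrnSnitatPrfctnZone
2nd row: 생활안전지도 보건안전 학교환경 학교환경위생정화구역정보 (Life Safety Map, health and safety, school environment: school environmental sanitation cleanup zone information)
3rd row: 컬럼명 (column name)
4th row: 일련번호 (serial number)
5th row: 용도코드 (usage code)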
Value | Count | Frequency (%)
a2sm_schulenvrnsnitatprfctnzone | 1 | 7.1%
생활안전지도 | 1 | 7.1%
보건안전 | 1 | 7.1%
학교환경 | 1 | 7.1%
학교환경위생정화구역정보 | 1 | 7.1%
컬럼명 | 1 | 7.1%
일련번호 | 1 | 7.1%
용도코드 | 1 | 7.1%
고시년도 | 1 | 7.1%
고시번호 | 1 | 7.1%
Other values (4) | 4 | 28.6%

Most occurring characters

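Value | Count | Frequency (%)
(character lost) | 6 | 6.3%
(character lost) | 5 | 5.3%
n | 5 | 5.3%
S | 3 | 3.2%
t | 3 | 3.2%
(character lost) | 3 | 3.2%
(character lost) | 3 | 3.2%
(character lost) | 3 | 3.2%
(character lost) | 3 | 3.2%
(character lost) | 3 | 3.2%
Other values (43) | 58 | 61.1%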

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 61 | 64.2%
Lowercase Letter | 21 | 22.1%
Uppercase Letter | 8 | 8.4%
Space Separator | 3 | 3.2%
Connector Punctuation | 1 | 1.1%
Decimal Number | 1 | 1.1%

Most frequent character per category

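Other Letter
Value | Count | Frequency (%)
(character lost) | 6 | 9.8%
(character lost) | 5 | 8.2%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
Other values (21) | 30 | 49.2%

Lowercase Letter
Value | Count | Frequency (%)
n | 5 | 23.8%
t | 3 | 14.3%
c | 2 | 9.5%
r | 2 | 9.5%
h | 1 | 4.8%
u | 1 | 4.8%
l | 1 | 4.8%
v | 1 | 4.8%
i | 1 | 4.8%
a | 1 | 4.8%
Other values (3) | 3 | 14.3%

Uppercase Letter
Value | Count | Frequency (%)
S | 3 | 37.5%
M | 1 | 12.5%
E | 1 | 12.5%
P | 1 | 12.5%
Z | 1 | 12.5%
A | 1 | 12.5%

Space Separator
Value | Count | Frequency (%)
(space) | 3 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Decimal Number
Value | Count | Frequency (%)
2 | 1 | 100.0%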

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 61 | 64.2%
Latin | 29 | 30.5%
Common | 5 | 5.3%

Most frequent character per script

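Hangul
Value | Count | Frequency (%)
(character lost) | 6 | 9.8%
(character lost) | 5 | 8.2%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
Other values (21) | 30 | 49.2%

Latin
Value | Count | Frequency (%)
n | 5 | 17.2%
S | 3 | 10.3%
t | 3 | 10.3%
c | 2 | 6.9%
r | 2 | 6.9%
M | 1 | 3.4%
h | 1 | 3.4%
u | 1 | 3.4%
l | 1 | 3.4%
E | 1 | 3.4%
Other values (9) | 9 | 31.0%

Common
Value | Count | Frequency (%)
(space) | 3 | 60.0%
_ | 1 | 20.0%
2 | 1 | 20.0%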

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 61 | 64.2%
ASCII | 34 | 35.8%

Most frequent character per block

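Hangul
Value | Count | Frequency (%)
(character lost) | 6 | 9.8%
(character lost) | 5 | 8.2%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 3 | 4.9%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
(character lost) | 2 | 3.3%
Other values (21) | 30 | 49.2%

ASCII
Value | Count | Frequency (%)
n | 5 | 14.7%
S | 3 | 8.8%
t | 3 | 8.8%
(space) | 3 | 8.8%
c | 2 | 5.9%
r | 2 | 5.9%
M | 1 | 2.9%
_ | 1 | 2.9%
h | 1 | 2.9%
u | 1 | 2.9%
Other values (12) | 12 | 35.3%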

Unnamed: 3
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 17.2%
Missing: 0
Missing (%): 0.0%
Memory size: 360.0 B

Value | Count
<NA> | 19
VARCHAR2 | 7
테이블명 | 1
데이터 타입 | 1
NUMBER | 1

Length

Max length: 8
Median length: 4
Mean length: 5.1034483
Min length: 4

Unique

Unique: 3
Unique (%): 10.3%

Sample

1st row: 테이블명 (table name)
2nd row: <NA>
3rd row: 데이터 타입 (data type)
4th row: NUMBER
5th row: VARCHAR2

Common Values

Value | Count | Frequency (%)
<NA> | 19 | 65.5%
VARCHAR2 | 7 | 24.1%
테이블명 | 1 | 3.4%
데이터 타입 | 1 | 3.4%
NUMBER | 1 | 3.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
na | 19 | 63.3%
varchar2 | 7 | 23.3%
테이블명 | 1 | 3.3%
데이터 | 1 | 3.3%
타입 | 1 | 3.3%
number | 1 | 3.3%

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 19
Missing (%): 65.5%
Memory size: 360.0 B

Unnamed: 5
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 27
Missing (%): 93.1%
Memory size: 360.0 B

Length

Max length: 3
Median length: 2.5
Mean length: 2.5
Min length: 2

Characters and Unicode

Total characters: 5
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: Key
2nd row: PK
Value | Count | Frequency (%)
key | 1 | 50.0%
pk | 1 | 50.0%

Most occurring characters

Value | Count | Frequency (%)
K | 2 | 40.0%
e | 1 | 20.0%
y | 1 | 20.0%
P | 1 | 20.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 3 | 60.0%
Lowercase Letter | 2 | 40.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
K | 2 | 66.7%
P | 1 | 33.3%

Lowercase Letter
Value | Count | Frequency (%)
e | 1 | 50.0%
y | 1 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
K | 2 | 40.0%
e | 1 | 20.0%
y | 1 | 20.0%
P | 1 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
K | 2 | 40.0%
e | 1 | 20.0%
y | 1 | 20.0%
P | 1 | 20.0%

Unnamed: 6
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 10.3%
Missing: 0
Missing (%): 0.0%
Memory size: 360.0 B

Value | Count
<NA> | 23
NOT NULL | 5
NULL여부 | 1

Length

Max length: 8
Median length: 4
Mean length: 4.7586207
Min length: 4

Unique

Unique: 1
Unique (%): 3.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: NULL여부 (nullability)
4th row: NOT NULL
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 23 | 79.3%
NOT NULL | 5 | 17.2%
NULL여부 | 1 | 3.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
na | 23 | 67.6%
not | 5 | 14.7%
null | 5 | 14.7%
null여부 | 1 | 2.9%

Correlations

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 5 | Unnamed: 6
Unnamed: 1 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000
Unnamed: 2 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000
Unnamed: 3 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000
Unnamed: 5 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
Unnamed: 6 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000

Variable | Unnamed: 6 | Unnamed: 3
Unnamed: 6 | 1.000 | 0.866
Unnamed: 3 | 0.866 | 1.000

Variable | Unnamed: 3 | Unnamed: 6
Unnamed: 3 | 1.000 | 0.866
Unnamed: 6 | 0.866 | 1.000
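
The report does not state which measure produced the 0.866 coefficient between Unnamed: 3 and Unnamed: 6. For two categorical variables a common choice is Cramér's V, derived from the chi-squared statistic of their contingency table. The sketch below is the plain, uncorrected form and is not claimed to reproduce the value above; `cramers_v` and the toy inputs are illustrative.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy inputs: one perfectly associated pair, one independent pair.
types = ["NUMBER", "NUMBER", "VARCHAR2", "VARCHAR2"]
print(cramers_v(types, ["NOT NULL", "NOT NULL", "<NA>", "<NA>"]))
print(cramers_v(types, ["NOT NULL", "<NA>", "NOT NULL", "<NA>"]))
```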

Missing values

[Figure: bar chart of nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
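
One way to read nullity correlation is as the Pearson correlation between the presence indicators of two columns. The sketch below assumes that reading and uses `None` as the missing marker; `nullity_correlation` and the short toy columns are illustrative, not the profiler's implementation.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation of the 'value is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a fully present or fully missing column has no variance
    return cov / sqrt(var_a * var_b)

# Toy columns: ids and names are missing together; keys is missing more often.
ids = ["OBJT_ID", "PRPOS_CD", None, None]
names = ["일련번호", "용도코드", None, None]
keys = ["PK", None, None, None]
print(nullity_correlation(ids, names))
print(nullity_correlation(ids, keys))
```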

Sample

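First rows

(index) | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
0 | 테이블ID | <NA> | A2SM_SchulEnvrnSnitatPrfctnZone | 테이블명 | 교육환경보호구역 | <NA> | <NA>
1 | 테이블설명 | <NA> | 생활안전지도 보건안전 학교환경 학교환경위생정화구역정보 | <NA> | NaN | <NA> | <NA>
2 | No. | 컬럼ID | 컬럼명 | 데이터 타입 | 길이 | Key | NULL여부
3 | 1 | OBJT_ID | 일련번호 | NUMBER | 10 | PK | NOT NULL
4 | 2 | PRPOS_CD | 용도코드 | VARCHAR2 | 6 | <NA> | <NA>
5 | 3 | NTFC_YEAR | 고시년도 | VARCHAR2 | 4 | <NA> | <NA>
6 | 4 | NTFC_NO | 고시번호 | VARCHAR2 | 4 | <NA> | <NA>
7 | 5 | CTPRVN_NM | 시도명 | VARCHAR2 | 20 | <NA> | NOT NULL
8 | 6 | SGG_NM | 시군구명 | VARCHAR2 | 20 | <NA> | NOT NULL
9 | 7 | CTPRVN_CD | 시도코드 | VARCHAR2 | 2 | <NA> | NOT NULL

Last rows

(index) | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
19 | 17 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
20 | 18 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
21 | 19 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
22 | 20 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
23 | 21 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
24 | 22 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
25 | 23 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
26 | 24 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
27 | 25 | <NA> | <NA> | <NA> | NaN | <NA> | <NA>
28 | 기타 | ※ 정화구역 관리번호 : 제공기관기준 ※ 일반시구 구별없이 통합코드 사용 | <NA> | <NA> | NaN | <NA> | <NA>

Korean terms in the sample: 테이블정의서 = table definition document; 테이블ID = table ID; 테이블명 = table name; 교육환경보호구역 = educational environment protection zone; 테이블설명 = table description; 컬럼ID = column ID; 컬럼명 = column name; 데이터 타입 = data type; 길이 = length; NULL여부 = nullability; 일련번호 = serial number; 용도코드 = usage code; 고시년도 = notice year; 고시번호 = notice number; 시도명 = province/city name; 시군구명 = city/county/district name; 시도코드 = province/city code; 기타 = other. The note in row 28 reads: "※ Cleanup zone management number: per the providing agency's standard ※ A unified code is used without distinguishing ordinary cities and districts".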
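
The sample shows that the profiled sheet is a table-definition document whose real header sits in row 2 (No., 컬럼ID, 컬럼명, ...), which is why the profiler saw generic names such as Unnamed: 1. A minimal sketch of re-keying the data rows by that embedded header before any further analysis, assuming the rows are available as plain lists. The values are copied from rows 2 to 5 of the sample; `rows` and `records` are illustrative names, and the numeric typing of No. and 길이 is an assumption.

```python
# Rows 2-5 of the sample as plain lists; None stands for <NA>.
rows = [
    ["No.", "컬럼ID", "컬럼명", "데이터 타입", "길이", "Key", "NULL여부"],
    [1, "OBJT_ID", "일련번호", "NUMBER", 10, "PK", "NOT NULL"],
    [2, "PRPOS_CD", "용도코드", "VARCHAR2", 6, None, None],
    [3, "NTFC_YEAR", "고시년도", "VARCHAR2", 4, None, None],
]

header, *data = rows                                   # first row is the real header
records = [dict(zip(header, row)) for row in data]     # one dict per column definition

for rec in records:
    print(rec["컬럼ID"], rec["데이터 타입"], rec["길이"])
```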

Duplicate rows

Most frequently occurring

(index) | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 5 | Unnamed: 6 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 17
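
The table lists a single duplicated pattern (all five supported columns empty) that occurs 17 times, which appears to be why the overview reports 1 duplicate row. A minimal sketch of grouping repeated rows, with `None` standing for <NA>; `duplicate_groups` and the toy rows are illustrative:

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Toy rows over the five supported columns; None stands for <NA>.
rows = [
    ("OBJT_ID", "일련번호", "NUMBER", "PK", "NOT NULL"),
    (None, None, None, None, None),
    (None, None, None, None, None),
    (None, None, None, None, None),
]
groups = duplicate_groups(rows)
print(groups)
```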