Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 39422
Missing cells (%): 78.8%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 468.8 KiB
Average record size in memory: 48.0 B
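The two derived figures above can be checked by hand. The sketch below assumes that the missing-cell percentage is missing cells divided by (variables × observations) and that the average record size is the total memory divided by the number of observations; the variable names are illustrative and are not taken from the profiling tool.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 5
n_observations = 10_000
missing_cells = 39_422
total_size_kib = 468.8

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, averaged over all observations.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under those assumptions the results round to the 78.8% and 48.0 B reported above.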

Variable types

DateTime: 1
Text: 3
Categorical: 1

Dataset

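Description: Provides information such as records of serious elevator accidents.
※ Serious accidents as defined by Article 48 of the Elevator Safety Management Act and Article 37(1) of its Enforcement Decree:
1. An accident in which a death occurred.
2. An accident in which a person was injured and, according to a doctor's first diagnosis made within 7 days of the accident, requires hospitalization for 1 week or more.
3. An accident in which a person was injured and, according to a doctor's first diagnosis made within 7 days of the accident, requires treatment for 3 weeks or more.
URL: https://www.data.go.kr/data/15048078/fileData.do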

Alerts

Duplicates: Dataset has 1 (< 0.1%) duplicate rows
Imbalance: 사고구분 is highly imbalanced (89.0%)
Missing: 발생일시 has 9854 (98.5%) missing values
Missing: 건물명 has 9854 (98.5%) missing values
Missing: 승강기고유번호 has 9860 (98.6%) missing values
Missing: 주소 has 9854 (98.5%) missing values
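The 89.0% imbalance figure for 사고구분 is consistent with a score of 1 minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition (the report itself does not state the formula) and uses the two counts the report lists for 사고구분: 9854 and 146. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the base-2 Shannon entropy
    of the class shares and k is the number of classes.
    0.0 means perfectly balanced; values near 1.0 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts reported for 사고구분: "<NA>" and 중대사고.
counts = [9854, 146]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

With these counts the assumed formula gives a value that rounds to 89.0%, matching the alert.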

Reproduction

Analysis started: 2023-12-12 14:57:02.841546
Analysis finished: 2023-12-12 14:57:03.585797
Duration: 0.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

발생일시 (date and time of occurrence)
Date

MISSING

Distinct: 144
Distinct (%): 98.6%
Missing: 9854
Missing (%): 98.5%
Memory size: 156.2 KiB
Minimum: 2007-01-11 00:00:00
Maximum: 2023-07-13 00:00:00
Histogram with fixed size bins (bins=50)
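A histogram with fixed-size bins divides the range from the minimum to the maximum into equally wide intervals. The sketch below illustrates that idea for the date range reported above with 50 bins. It is not the plotting code used to produce the report, and the helper names `fixed_width_edges` and `bin_index` are hypothetical.

```python
from datetime import datetime

def fixed_width_edges(start, end, bins):
    """Return bins + 1 equally spaced edges from start to end inclusive."""
    span = end - start
    return [start + span * i / bins for i in range(bins + 1)]

def bin_index(value, start, end, bins):
    """Map a value to its bin number; the maximum falls in the last bin."""
    if value == end:
        return bins - 1
    return int((value - start) / (end - start) * bins)

# Minimum and maximum copied from the report.
minimum = datetime(2007, 1, 11)
maximum = datetime(2023, 7, 13)

edges = fixed_width_edges(minimum, maximum, bins=50)
bin_width = edges[1] - edges[0]
print("bin width:", bin_width)
```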

건물명 (building name)
Text

MISSING

Distinct: 144
Distinct (%): 98.6%
Missing: 9854
Missing (%): 98.5%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 14.5
Mean length: 9.4452055
Min length: 3

Characters and Unicode

Total characters: 1379
Distinct characters: 237
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
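As an illustration of how such character properties can be read, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module, applied to the five sample 건물명 values listed in this report. It is an assumption that the report's categories correspond to these two-letter codes in the usual way (`Lo` = Other Letter, `Lu` = Uppercase Letter, `Zs` = Space Separator); the helper `category_counts` is hypothetical and is not the profiler's own code.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category code (e.g. 'Lo', 'Zs')."""
    counts = Counter()
    for text in values:
        # Normalize to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", text):
            counts[unicodedata.category(ch)] += 1
    return counts

# The five sample rows shown for 건물명 in this report.
sample = [
    "한국공항공사 김해공항",
    "이마트 트레이더스 연산점",
    "J빌딩",
    "리맥스관리단",
    "롯데월드",
]
counts = category_counts(sample)
for code, n in counts.most_common():
    print(code, n)
```

Scripts and blocks are not exposed by `unicodedata`, so this sketch covers the category breakdown only.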

Unique

Unique: 142
Unique (%): 97.3%

Sample

1st row: 한국공항공사 김해공항
2nd row: 이마트 트레이더스 연산점
3rd row: J빌딩
4th row: 리맥스관리단
5th row: 롯데월드

Value | Count | Frequency (%)
홈플러스 | 7 | 2.7%
한국철도공사 | 6 | 2.3%
서울도시철도공사 | 6 | 2.3%
이마트 | 6 | 2.3%
도시철도공사 | 6 | 2.3%
신세계이마트 | 5 | 1.9%
대구도시철도공사 | 4 | 1.6%
7호선 | 4 | 1.6%
서울 | 4 | 1.6%
부산교통공사 | 4 | 1.6%
Other values (185) | 205 | 79.8%

Most occurring characters

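Value | Count | Frequency (%)
(space) | 111 | 8.0%
(not preserved) | 48 | 3.5%
(not preserved) | 43 | 3.1%
(not preserved) | 42 | 3.0%
(not preserved) | 41 | 3.0%
(not preserved) | 36 | 2.6%
(not preserved) | 33 | 2.4%
(not preserved) | 32 | 2.3%
(not preserved) | 26 | 1.9%
(not preserved) | 26 | 1.9%
Other values (227) | 941 | 68.2%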

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1186 | 86.0%
Space Separator | 111 | 8.0%
Uppercase Letter | 30 | 2.2%
Decimal Number | 27 | 2.0%
Close Punctuation | 8 | 0.6%
Open Punctuation | 8 | 0.6%
Other Symbol | 5 | 0.4%
Dash Punctuation | 3 | 0.2%
Lowercase Letter | 1 | 0.1%

Most frequent character per category

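Other Letter
Value | Count | Frequency (%)
(not preserved) | 48 | 4.0%
(not preserved) | 43 | 3.6%
(not preserved) | 42 | 3.5%
(not preserved) | 41 | 3.5%
(not preserved) | 36 | 3.0%
(not preserved) | 33 | 2.8%
(not preserved) | 32 | 2.7%
(not preserved) | 26 | 2.2%
(not preserved) | 26 | 2.2%
(not preserved) | 26 | 2.2%
Other values (197) | 833 | 70.2%

Uppercase Letter
Value | Count | Frequency (%)
K | 4 | 13.3%
A | 3 | 10.0%
S | 3 | 10.0%
T | 3 | 10.0%
H | 2 | 6.7%
M | 2 | 6.7%
L | 2 | 6.7%
R | 2 | 6.7%
O | 1 | 3.3%
I | 1 | 3.3%
Other values (7) | 7 | 23.3%

Decimal Number
Value | Count | Frequency (%)
2 | 6 | 22.2%
7 | 5 | 18.5%
6 | 4 | 14.8%
1 | 4 | 14.8%
3 | 4 | 14.8%
5 | 3 | 11.1%
8 | 1 | 3.7%

Space Separator
Value | Count | Frequency (%)
(space) | 111 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 8 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 8 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not preserved) | 5 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 3 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
a | 1 | 100.0%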

Most occurring scripts

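Value | Count | Frequency (%)
Hangul | 1191 | 86.4%
Common | 157 | 11.4%
Latin | 31 | 2.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 48 | 4.0%
(not preserved) | 43 | 3.6%
(not preserved) | 42 | 3.5%
(not preserved) | 41 | 3.4%
(not preserved) | 36 | 3.0%
(not preserved) | 33 | 2.8%
(not preserved) | 32 | 2.7%
(not preserved) | 26 | 2.2%
(not preserved) | 26 | 2.2%
(not preserved) | 26 | 2.2%
Other values (198) | 838 | 70.4%

Latin
Value | Count | Frequency (%)
K | 4 | 12.9%
A | 3 | 9.7%
S | 3 | 9.7%
T | 3 | 9.7%
H | 2 | 6.5%
M | 2 | 6.5%
L | 2 | 6.5%
R | 2 | 6.5%
O | 1 | 3.2%
I | 1 | 3.2%
Other values (8) | 8 | 25.8%

Common
Value | Count | Frequency (%)
(space) | 111 | 70.7%
) | 8 | 5.1%
( | 8 | 5.1%
2 | 6 | 3.8%
7 | 5 | 3.2%
6 | 4 | 2.5%
1 | 4 | 2.5%
3 | 4 | 2.5%
5 | 3 | 1.9%
- | 3 | 1.9%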

Most occurring blocks

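Value | Count | Frequency (%)
Hangul | 1186 | 86.0%
ASCII | 188 | 13.6%
None | 5 | 0.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 111 | 59.0%
) | 8 | 4.3%
( | 8 | 4.3%
2 | 6 | 3.2%
7 | 5 | 2.7%
6 | 4 | 2.1%
1 | 4 | 2.1%
3 | 4 | 2.1%
K | 4 | 2.1%
5 | 3 | 1.6%
Other values (19) | 31 | 16.5%

Hangul
Value | Count | Frequency (%)
(not preserved) | 48 | 4.0%
(not preserved) | 43 | 3.6%
(not preserved) | 42 | 3.5%
(not preserved) | 41 | 3.5%
(not preserved) | 36 | 3.0%
(not preserved) | 33 | 2.8%
(not preserved) | 32 | 2.7%
(not preserved) | 26 | 2.2%
(not preserved) | 26 | 2.2%
(not preserved) | 26 | 2.2%
Other values (197) | 833 | 70.2%

None
Value | Count | Frequency (%)
(not preserved) | 5 | 100.0%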

승강기고유번호 (elevator unique number)
Text

MISSING

Distinct: 138
Distinct (%): 98.6%
Missing: 9860
Missing (%): 98.6%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 1120
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 136
Unique (%): 97.1%

Sample

1st row: 8803-420
2nd row: 1811-762
3rd row: 2130-136
4th row: 1805-773
5th row: 1801-949

Value | Count | Frequency (%)
5802-916 | 2 | 1.4%
8804-508 | 2 | 1.4%
2204-600 | 1 | 0.7%
8803-073 | 1 | 0.7%
8075-826 | 1 | 0.7%
5012-093 | 1 | 0.7%
2013-382 | 1 | 0.7%
0020-943 | 1 | 0.7%
6800-867 | 1 | 0.7%
5802-130 | 1 | 0.7%
Other values (128) | 128 | 91.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 238 | 21.2%
8 | 173 | 15.4%
- | 140 | 12.5%
1 | 103 | 9.2%
2 | 91 | 8.1%
3 | 84 | 7.5%
6 | 62 | 5.5%
5 | 61 | 5.4%
7 | 58 | 5.2%
4 | 57 | 5.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 980 | 87.5%
Dash Punctuation | 140 | 12.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 238 | 24.3%
8 | 173 | 17.7%
1 | 103 | 10.5%
2 | 91 | 9.3%
3 | 84 | 8.6%
6 | 62 | 6.3%
5 | 61 | 6.2%
7 | 58 | 5.9%
4 | 57 | 5.8%
9 | 53 | 5.4%

Dash Punctuation
Value | Count | Frequency (%)
- | 140 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1120 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 238 | 21.2%
8 | 173 | 15.4%
- | 140 | 12.5%
1 | 103 | 9.2%
2 | 91 | 8.1%
3 | 84 | 7.5%
6 | 62 | 5.5%
5 | 61 | 5.4%
7 | 58 | 5.2%
4 | 57 | 5.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1120 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 238 | 21.2%
8 | 173 | 15.4%
- | 140 | 12.5%
1 | 103 | 9.2%
2 | 91 | 8.1%
3 | 84 | 7.5%
6 | 62 | 5.5%
5 | 61 | 5.4%
7 | 58 | 5.2%
4 | 57 | 5.1%

주소 (address)
Text

MISSING

Distinct: 143
Distinct (%): 97.9%
Missing: 9854
Missing (%): 98.5%
Memory size: 156.2 KiB

Length

Max length: 29
Median length: 23
Mean length: 17.39726
Min length: 12

Characters and Unicode

Total characters: 2540
Distinct characters: 196
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 140
Unique (%): 95.9%

Sample

1st row: 부산 강서구 대저2동 2350
2nd row: 부산 연제구 좌수영로 241
3rd row: 경기 구리시 검배로 46 (수택동)
4th row: 경기 구리시 경춘로 239 (인창동)
5th row: 서울 송파구 올림픽로 240

Value | Count | Frequency (%)
서울 | 42 | 6.5%
경기 | 30 | 4.7%
부산 | 24 | 3.7%
대구 | 17 | 2.6%
중구 | 10 | 1.6%
북구 | 7 | 1.1%
충남 | 6 | 0.9%
대전 | 6 | 0.9%
부산진구 | 6 | 0.9%
강남구 | 5 | 0.8%
Other values (396) | 491 | 76.2%

Most occurring characters

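Value | Count | Frequency (%)
(space) | 502 | 19.8%
(not preserved) | 144 | 5.7%
(not preserved) | 139 | 5.5%
1 | 113 | 4.4%
2 | 74 | 2.9%
3 | 67 | 2.6%
(not preserved) | 65 | 2.6%
(not preserved) | 64 | 2.5%
(not preserved) | 56 | 2.2%
- | 53 | 2.1%
Other values (186) | 1263 | 49.7%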

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1379 | 54.3%
Decimal Number | 535 | 21.1%
Space Separator | 502 | 19.8%
Dash Punctuation | 53 | 2.1%
Open Punctuation | 35 | 1.4%
Close Punctuation | 35 | 1.4%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

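Other Letter
Value | Count | Frequency (%)
(not preserved) | 144 | 10.4%
(not preserved) | 139 | 10.1%
(not preserved) | 65 | 4.7%
(not preserved) | 64 | 4.6%
(not preserved) | 56 | 4.1%
(not preserved) | 46 | 3.3%
(not preserved) | 44 | 3.2%
(not preserved) | 40 | 2.9%
(not preserved) | 39 | 2.8%
(not preserved) | 39 | 2.8%
Other values (171) | 703 | 51.0%

Decimal Number
Value | Count | Frequency (%)
1 | 113 | 21.1%
2 | 74 | 13.8%
3 | 67 | 12.5%
5 | 51 | 9.5%
0 | 47 | 8.8%
4 | 45 | 8.4%
8 | 38 | 7.1%
9 | 37 | 6.9%
6 | 35 | 6.5%
7 | 28 | 5.2%

Space Separator
Value | Count | Frequency (%)
(space) | 502 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 53 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 35 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 35 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 1 | 100.0%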

Most occurring scripts

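Value | Count | Frequency (%)
Hangul | 1379 | 54.3%
Common | 1161 | 45.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 144 | 10.4%
(not preserved) | 139 | 10.1%
(not preserved) | 65 | 4.7%
(not preserved) | 64 | 4.6%
(not preserved) | 56 | 4.1%
(not preserved) | 46 | 3.3%
(not preserved) | 44 | 3.2%
(not preserved) | 40 | 2.9%
(not preserved) | 39 | 2.8%
(not preserved) | 39 | 2.8%
Other values (171) | 703 | 51.0%

Common
Value | Count | Frequency (%)
(space) | 502 | 43.2%
1 | 113 | 9.7%
2 | 74 | 6.4%
3 | 67 | 5.8%
- | 53 | 4.6%
5 | 51 | 4.4%
0 | 47 | 4.0%
4 | 45 | 3.9%
8 | 38 | 3.3%
9 | 37 | 3.2%
Other values (5) | 134 | 11.5%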

Most occurring blocks

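Value | Count | Frequency (%)
Hangul | 1379 | 54.3%
ASCII | 1161 | 45.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 502 | 43.2%
1 | 113 | 9.7%
2 | 74 | 6.4%
3 | 67 | 5.8%
- | 53 | 4.6%
5 | 51 | 4.4%
0 | 47 | 4.0%
4 | 45 | 3.9%
8 | 38 | 3.3%
9 | 37 | 3.2%
Other values (5) | 134 | 11.5%

Hangul
Value | Count | Frequency (%)
(not preserved) | 144 | 10.4%
(not preserved) | 139 | 10.1%
(not preserved) | 65 | 4.7%
(not preserved) | 64 | 4.6%
(not preserved) | 56 | 4.1%
(not preserved) | 46 | 3.3%
(not preserved) | 44 | 3.2%
(not preserved) | 40 | 2.9%
(not preserved) | 39 | 2.8%
(not preserved) | 39 | 2.8%
Other values (171) | 703 | 51.0%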

사고구분 (accident type)
Categorical

IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9854
중대사고 (serious accident) | 146

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9854 | 98.5%
중대사고 | 146 | 1.5%
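The table above counts `<NA>` as an ordinary category value (9854 occurrences), while the same variable reports 0 missing values. That pattern suggests the placeholder is stored as a literal string rather than as a true null, although the report does not confirm this. Under that assumption, a minimal sketch of mapping placeholder strings to real missing values before analysis could look like the following; the set `PLACEHOLDERS` and the helper `normalize_missing` are hypothetical.

```python
# Assumed spellings of "no value"; adjust to whatever the source file uses.
PLACEHOLDERS = {"<NA>", "na", ""}

def normalize_missing(values):
    """Replace placeholder strings with None so they count as missing."""
    return [
        None if v is None or v.strip() in PLACEHOLDERS else v
        for v in values
    ]

# Rebuild the column from the counts reported above.
column = ["<NA>"] * 9854 + ["중대사고"] * 146
cleaned = normalize_missing(column)

missing = sum(v is None for v in cleaned)
distinct = {v for v in cleaned if v is not None}
print(f"missing: {missing}, distinct values: {len(distinct)}")
```

After this step the variable has a single distinct value and 9854 missing entries, the same missing count reported for 발생일시, 건물명 and 주소.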

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 9854 | 98.5%
중대사고 | 146 | 1.5%

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
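One common way to quantify nullity correlation is the Pearson correlation between the present/absent indicators of two columns. The sketch below assumes that definition, which the report does not spell out, and runs on small made-up columns rather than the real dataset. The function name `nullity_correlation` is hypothetical.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'value is present' indicators
    of two equally long columns. Returns NaN if either column is
    entirely present or entirely missing (zero variance)."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Toy columns (hypothetical values, not taken from the dataset).
col_x = ["a", None, None, "b"]
col_y = ["c", None, None, "d"]   # missing in exactly the same rows as col_x
col_z = ["e", None, None, None]  # missing in one additional row

same_pattern = nullity_correlation(col_x, col_y)
partial_overlap = nullity_correlation(col_x, col_z)
print(same_pattern, round(partial_overlap, 3))
```

A value of 1 means the two columns are always present or missing together. In this report, four columns are missing in nearly the same number of rows (9854 or 9860), so values close to 1 would be expected if the missing rows coincide.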

Sample

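Index | 발생일시 | 건물명 | 승강기고유번호 | 주소 | 사고구분
40835 | <NA> | <NA> | <NA> | <NA> | <NA>
8927 | <NA> | <NA> | <NA> | <NA> | <NA>
37400 | <NA> | <NA> | <NA> | <NA> | <NA>
25903 | <NA> | <NA> | <NA> | <NA> | <NA>
21970 | <NA> | <NA> | <NA> | <NA> | <NA>
18939 | <NA> | <NA> | <NA> | <NA> | <NA>
45793 | <NA> | <NA> | <NA> | <NA> | <NA>
97498 | <NA> | <NA> | <NA> | <NA> | <NA>
12968 | <NA> | <NA> | <NA> | <NA> | <NA>
20340 | <NA> | <NA> | <NA> | <NA> | <NA>

Index | 발생일시 | 건물명 | 승강기고유번호 | 주소 | 사고구분
49755 | <NA> | <NA> | <NA> | <NA> | <NA>
70041 | <NA> | <NA> | <NA> | <NA> | <NA>
35417 | <NA> | <NA> | <NA> | <NA> | <NA>
67408 | <NA> | <NA> | <NA> | <NA> | <NA>
2235 | <NA> | <NA> | <NA> | <NA> | <NA>
2201 | <NA> | <NA> | <NA> | <NA> | <NA>
15677 | <NA> | <NA> | <NA> | <NA> | <NA>
36684 | <NA> | <NA> | <NA> | <NA> | <NA>
43324 | <NA> | <NA> | <NA> | <NA> | <NA>
61669 | <NA> | <NA> | <NA> | <NA> | <NA>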

Duplicate rows

Most frequently occurring

Index | 발생일시 | 건물명 | 승강기고유번호 | 주소 | 사고구분 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 9854
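
The overview lists 1 duplicate row while the table above shows 9854 occurrences. One reading, offered here as an assumption, is that the first figure counts distinct rows that repeat and the second counts how often that row occurs. The sketch below shows the distinction on synthetic rows: 9854 empty rows plus 146 made-up, mutually distinct rows. The helper `duplicate_rows` is hypothetical.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

EMPTY = (None, None, None, None, None)

# Synthetic data: the empty row repeated 9854 times, plus 146 distinct
# filler rows whose values are invented for illustration only.
rows = [EMPTY] * 9854 + [
    (f"date-{i}", f"building-{i}", f"id-{i}", f"address-{i}", "중대사고")
    for i in range(146)
]

dups = duplicate_rows(rows)
print("distinct duplicated rows:", len(dups))
print("occurrences of the empty row:", dups[EMPTY])
```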