gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	39422
Missing cells (%)	78.8%
Duplicate rows	1
Duplicate rows (%)	< 0.1%
Total size in memory	468.8 KiB
Average record size in memory	48.0 B
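
The summary figures above can be cross-checked with simple arithmetic. The sketch below takes the per-column missing counts listed in the variable sections of this report and recomputes the missing-cell total, the missing-cell percentage, and the average record size. The variable names are illustrative only.

```python
# Per-column missing counts as reported in the variable sections of this report.
missing_per_column = {
    "발생일시": 9854,
    "건물명": 9854,
    "승강기고유번호": 9860,
    "주소": 9854,
    "사고구분": 0,
}
n_rows = 10_000
n_cols = len(missing_per_column)

# Total missing cells and their share of all cells (rows x columns).
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# 468.8 KiB total, converted to bytes and divided by the number of rows.
avg_record_bytes = 468.8 * 1024 / n_rows

print(missing_cells, round(missing_pct, 1), round(avg_record_bytes, 1))
```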

Variable types

DateTime	1
Text	3
Categorical	1

Dataset

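Description	Provides information such as records of serious elevator accidents. ※ Serious accidents as defined in Article 48 of the Elevator Safety Management Act (승강기 안전관리법) and Article 37(1) of its Enforcement Decree: 1. An accident resulting in a death. 2. An accident resulting in an injury that, according to a physician's first diagnosis made within 7 days of the accident, requires hospitalization for 1 week or more. 3. An accident resulting in an injury that, according to a physician's first diagnosis made within 7 days of the accident, requires treatment for 3 weeks or more.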
URL	https://www.data.go.kr/data/15048078/fileData.do

Alerts

Dataset has 1 (< 0.1%) duplicate rows	Duplicates
`사고구분` is highly imbalanced (89.0%)	Imbalance
`발생일시` has 9854 (98.5%) missing values	Missing
`건물명` has 9854 (98.5%) missing values	Missing
`승강기고유번호` has 9860 (98.6%) missing values	Missing
`주소` has 9854 (98.5%) missing values	Missing
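
The 89.0% imbalance figure for `사고구분` is consistent with one minus the normalized Shannon entropy of the category counts. The sketch below assumes that definition, which is not stated in this report, and uses the counts 9854 and 146 from the `사고구분` section. `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the class counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no meaningful imbalance under this definition
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Counts of "<NA>" and "중대사고" taken from the 사고구분 section.
score = imbalance_score([9854, 146])
print(f"{score:.1%}")
```

A score of 0 corresponds to perfectly balanced classes and a score near 1 to a single dominant class.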

Reproduction

Analysis started	2023-12-12 14:57:02.841546
Analysis finished	2023-12-12 14:57:03.585797
Duration	0.74 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

발생일시
Date

MISSING

Distinct	144
Distinct (%)	98.6%
Missing	9854
Missing (%)	98.5%
Memory size	156.2 KiB

Minimum	2007-01-11 00:00:00
Maximum	2023-07-13 00:00:00

Histogram

Histogram with fixed size bins (bins=50)

건물명
Text

MISSING

Distinct	144
Distinct (%)	98.6%
Missing	9854
Missing (%)	98.5%
Memory size	156.2 KiB

Length

Max length	19
Median length	14.5
Mean length	9.4452055
Min length	3

Characters and Unicode

Total characters	1379
Distinct characters	237
Distinct categories	9
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
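
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories in a few sample values copied from this section. The category codes map to the names used in this report as follows: `Lo` is Other Letter, `Zs` is Space Separator, `Lu` is Uppercase Letter. This is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

# Sample values copied from the 건물명 section of this report.
samples = ["한국공항공사 김해공항", "이마트 트레이더스 연산점", "J빌딩"]

# Count the Unicode general category of every character in the samples.
category_counts = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)
print(category_counts)
```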

Unique

Unique	142
Unique (%)	97.3%

Sample

1st row	한국공항공사 김해공항
2nd row	이마트 트레이더스 연산점
3rd row	J빌딩
4th row	리맥스관리단
5th row	롯데월드

Value	Count	Frequency (%)
홈플러스	7	2.7%
한국철도공사	6	2.3%
서울도시철도공사	6	2.3%
이마트	6	2.3%
도시철도공사	6	2.3%
신세계이마트	5	1.9%
대구도시철도공사	4	1.6%
7호선	4	1.6%
서울	4	1.6%
부산교통공사	4	1.6%
Other values (185)	205	79.8%

Most occurring characters

Value	Count	Frequency (%)
(space)	111	8.0%
점	48	3.5%
도	43	3.1%
사	42	3.0%
역	41	3.0%
공	36	2.6%
이	33	2.4%
트	32	2.3%
마	26	1.9%
대	26	1.9%
Other values (227)	941	68.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1186	86.0%
Space Separator	111	8.0%
Uppercase Letter	30	2.2%
Decimal Number	27	2.0%
Close Punctuation	8	0.6%
Open Punctuation	8	0.6%
Other Symbol	5	0.4%
Dash Punctuation	3	0.2%
Lowercase Letter	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
점	48	4.0%
도	43	3.6%
사	42	3.5%
역	41	3.5%
공	36	3.0%
이	33	2.8%
트	32	2.7%
마	26	2.2%
대	26	2.2%
철	26	2.2%
Other values (197)	833	70.2%

Uppercase Letter

Value	Count	Frequency (%)
K	4	13.3%
A	3	10.0%
S	3	10.0%
T	3	10.0%
H	2	6.7%
M	2	6.7%
L	2	6.7%
R	2	6.7%
O	1	3.3%
I	1	3.3%
Other values (7)	7	23.3%

Decimal Number

Value	Count	Frequency (%)
2	6	22.2%
7	5	18.5%
6	4	14.8%
1	4	14.8%
3	4	14.8%
5	3	11.1%
8	1	3.7%

Space Separator

Value	Count	Frequency (%)
(space)	111	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	8	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	8	100.0%

Other Symbol

Value	Count	Frequency (%)
㈜	5	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	3	100.0%

Lowercase Letter

Value	Count	Frequency (%)
a	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1191	86.4%
Common	157	11.4%
Latin	31	2.2%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
점	48	4.0%
도	43	3.6%
사	42	3.5%
역	41	3.4%
공	36	3.0%
이	33	2.8%
트	32	2.7%
마	26	2.2%
대	26	2.2%
철	26	2.2%
Other values (198)	838	70.4%

Latin

Value	Count	Frequency (%)
K	4	12.9%
A	3	9.7%
S	3	9.7%
T	3	9.7%
H	2	6.5%
M	2	6.5%
L	2	6.5%
R	2	6.5%
O	1	3.2%
I	1	3.2%
Other values (8)	8	25.8%

Common

Value	Count	Frequency (%)
(space)	111	70.7%
)	8	5.1%
(	8	5.1%
2	6	3.8%
7	5	3.2%
6	4	2.5%
1	4	2.5%
3	4	2.5%
5	3	1.9%
-	3	1.9%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1186	86.0%
ASCII	188	13.6%
None	5	0.4%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	111	59.0%
)	8	4.3%
(	8	4.3%
2	6	3.2%
7	5	2.7%
6	4	2.1%
1	4	2.1%
3	4	2.1%
K	4	2.1%
5	3	1.6%
Other values (19)	31	16.5%

Hangul

Value	Count	Frequency (%)
점	48	4.0%
도	43	3.6%
사	42	3.5%
역	41	3.5%
공	36	3.0%
이	33	2.8%
트	32	2.7%
마	26	2.2%
대	26	2.2%
철	26	2.2%
Other values (197)	833	70.2%

None

Value	Count	Frequency (%)
㈜	5	100.0%

승강기고유번호
Text

MISSING

Distinct	138
Distinct (%)	98.6%
Missing	9860
Missing (%)	98.6%
Memory size	156.2 KiB

Length

Max length	8
Median length	8
Mean length	8
Min length	8

Characters and Unicode

Total characters	1120
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	136
Unique (%)	97.1%

Sample

1st row	8803-420
2nd row	1811-762
3rd row	2130-136
4th row	1805-773
5th row	1801-949

Value	Count	Frequency (%)
5802-916	2	1.4%
8804-508	2	1.4%
2204-600	1	0.7%
8803-073	1	0.7%
8075-826	1	0.7%
5012-093	1	0.7%
2013-382	1	0.7%
0020-943	1	0.7%
6800-867	1	0.7%
5802-130	1	0.7%
Other values (128)	128	91.4%

Most occurring characters

Value	Count	Frequency (%)
0	238	21.2%
8	173	15.4%
-	140	12.5%
1	103	9.2%
2	91	8.1%
3	84	7.5%
6	62	5.5%
5	61	5.4%
7	58	5.2%
4	57	5.1%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	980	87.5%
Dash Punctuation	140	12.5%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	238	24.3%
8	173	17.7%
1	103	10.5%
2	91	9.3%
3	84	8.6%
6	62	6.3%
5	61	6.2%
7	58	5.9%
4	57	5.8%
9	53	5.4%

Dash Punctuation

Value	Count	Frequency (%)
-	140	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1120	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	238	21.2%
8	173	15.4%
-	140	12.5%
1	103	9.2%
2	91	8.1%
3	84	7.5%
6	62	5.5%
5	61	5.4%
7	58	5.2%
4	57	5.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1120	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	238	21.2%
8	173	15.4%
-	140	12.5%
1	103	9.2%
2	91	8.1%
3	84	7.5%
6	62	5.5%
5	61	5.4%
7	58	5.2%
4	57	5.1%

주소
Text

MISSING

Distinct	143
Distinct (%)	97.9%
Missing	9854
Missing (%)	98.5%
Memory size	156.2 KiB

Length

Max length	29
Median length	23
Mean length	17.39726
Min length	12

Characters and Unicode

Total characters	2540
Distinct characters	196
Distinct categories	7
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	140
Unique (%)	95.9%

Sample

1st row	부산 강서구 대저2동 2350
2nd row	부산 연제구 좌수영로 241
3rd row	경기 구리시 검배로 46 (수택동)
4th row	경기 구리시 경춘로 239 (인창동)
5th row	서울 송파구 올림픽로 240

Value	Count	Frequency (%)
서울	42	6.5%
경기	30	4.7%
부산	24	3.7%
대구	17	2.6%
중구	10	1.6%
북구	7	1.1%
충남	6	0.9%
대전	6	0.9%
부산진구	6	0.9%
강남구	5	0.8%
Other values (396)	491	76.2%

Most occurring characters

Value	Count	Frequency (%)
(space)	502	19.8%
구	144	5.7%
동	139	5.5%
1	113	4.4%
2	74	2.9%
3	67	2.6%
서	65	2.6%
로	64	2.5%
산	56	2.2%
-	53	2.1%
Other values (186)	1263	49.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1379	54.3%
Decimal Number	535	21.1%
Space Separator	502	19.8%
Dash Punctuation	53	2.1%
Open Punctuation	35	1.4%
Close Punctuation	35	1.4%
Other Punctuation	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
구	144	10.4%
동	139	10.1%
서	65	4.7%
로	64	4.6%
산	56	4.1%
시	46	3.3%
울	44	3.2%
대	40	2.9%
경	39	2.8%
부	39	2.8%
Other values (171)	703	51.0%

Decimal Number

Value	Count	Frequency (%)
1	113	21.1%
2	74	13.8%
3	67	12.5%
5	51	9.5%
0	47	8.8%
4	45	8.4%
8	38	7.1%
9	37	6.9%
6	35	6.5%
7	28	5.2%

Space Separator

Value	Count	Frequency (%)
(space)	502	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	53	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	35	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	35	100.0%

Other Punctuation

Value	Count	Frequency (%)
,	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1379	54.3%
Common	1161	45.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
구	144	10.4%
동	139	10.1%
서	65	4.7%
로	64	4.6%
산	56	4.1%
시	46	3.3%
울	44	3.2%
대	40	2.9%
경	39	2.8%
부	39	2.8%
Other values (171)	703	51.0%

Common

Value	Count	Frequency (%)
(space)	502	43.2%
1	113	9.7%
2	74	6.4%
3	67	5.8%
-	53	4.6%
5	51	4.4%
0	47	4.0%
4	45	3.9%
8	38	3.3%
9	37	3.2%
Other values (5)	134	11.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1379	54.3%
ASCII	1161	45.7%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	502	43.2%
1	113	9.7%
2	74	6.4%
3	67	5.8%
-	53	4.6%
5	51	4.4%
0	47	4.0%
4	45	3.9%
8	38	3.3%
9	37	3.2%
Other values (5)	134	11.5%

Hangul

Value	Count	Frequency (%)
구	144	10.4%
동	139	10.1%
서	65	4.7%
로	64	4.6%
산	56	4.1%
시	46	3.3%
울	44	3.2%
대	40	2.9%
경	39	2.8%
부	39	2.8%
Other values (171)	703	51.0%

사고구분
Categorical

IMBALANCE

Distinct	2
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

<NA>	9854
중대사고	146

Length

Max length	4
Median length	4
Mean length	4
Min length	4

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	<NA>
2nd row	<NA>
3rd row	<NA>
4th row	<NA>
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	9854	98.5%
중대사고	146	1.5%

Length

Histogram of lengths of the category

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
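
A minimal sketch of nullity correlation, assuming it is the Pearson correlation between per-column missing-value indicators. The helper `nullity_correlation` and the toy columns are hypothetical and do not come from the dataset.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is always or never missing
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy columns; None stands in for <NA>.
col_x = ["v", None, None, "v"]
col_y = ["w", None, None, "w"]   # missing in exactly the same rows as col_x
col_z = [None, None, "u", "u"]   # missingness unrelated to col_x

same_pattern = nullity_correlation(col_x, col_y)
unrelated = nullity_correlation(col_x, col_z)
print(same_pattern, unrelated)
```

Columns that are missing in exactly the same rows score 1, which is the pattern one would expect here given that four columns report nearly identical missing counts.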

Sample

First rows

	발생일시	건물명	승강기고유번호	주소	사고구분
40835	<NA>	<NA>	<NA>	<NA>	<NA>
8927	<NA>	<NA>	<NA>	<NA>	<NA>
37400	<NA>	<NA>	<NA>	<NA>	<NA>
25903	<NA>	<NA>	<NA>	<NA>	<NA>
21970	<NA>	<NA>	<NA>	<NA>	<NA>
18939	<NA>	<NA>	<NA>	<NA>	<NA>
45793	<NA>	<NA>	<NA>	<NA>	<NA>
97498	<NA>	<NA>	<NA>	<NA>	<NA>
12968	<NA>	<NA>	<NA>	<NA>	<NA>
20340	<NA>	<NA>	<NA>	<NA>	<NA>

Last rows

	발생일시	건물명	승강기고유번호	주소	사고구분
49755	<NA>	<NA>	<NA>	<NA>	<NA>
70041	<NA>	<NA>	<NA>	<NA>	<NA>
35417	<NA>	<NA>	<NA>	<NA>	<NA>
67408	<NA>	<NA>	<NA>	<NA>	<NA>
2235	<NA>	<NA>	<NA>	<NA>	<NA>
2201	<NA>	<NA>	<NA>	<NA>	<NA>
15677	<NA>	<NA>	<NA>	<NA>	<NA>
36684	<NA>	<NA>	<NA>	<NA>	<NA>
43324	<NA>	<NA>	<NA>	<NA>	<NA>
61669	<NA>	<NA>	<NA>	<NA>	<NA>

Duplicate rows

Most frequently occurring

	발생일시	건물명	승강기고유번호	주소	사고구분	# duplicates
0	<NA>	<NA>	<NA>	<NA>	<NA>	9854
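
The summary reports 1 duplicate row while this table shows a count of 9854. One plausible reading, stated here as an assumption, is that the summary counts distinct duplicated row patterns and the table counts how often each pattern occurs. The sketch below illustrates that distinction on hypothetical toy rows.

```python
from collections import Counter

# Hypothetical toy rows; None stands in for <NA>.
rows = [
    (None, None, None),
    ("2023-07-13", "A", "x"),
    (None, None, None),
    (None, None, None),
]

# Count how often each full row occurs, then keep only rows seen more than once.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

# One distinct duplicated pattern, occurring three times.
print(len(duplicated), duplicated)
```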
