gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	657
Missing cells	445
Missing cells (%)	16.9%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	20.7 KiB
Average record size in memory	32.2 B
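The missing-cell percentage can be checked from the counts above. The following is a minimal sketch of that arithmetic, assuming the percentage is taken over every cell (observations times variables); the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 4
n_observations = 657
missing_cells = 445

# Assumption: the percentage is taken over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(total_cells)              # 2628
print(f"{missing_pct:.1f}%")    # 16.9%
```

Under that assumption the result agrees with the reported 16.9%.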

Variable types

Categorical	1
Text	3

Dataset

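Description	System-related code data for the Incheon Metropolitan City Integrated Traffic Administration Management System (인천광역시 교통행정종합관리시스템), consisting of the fields 상세코드 (detail code), 상세코드명1 (detail code name 1), and 상세코드명2 (detail code name 2).
Author	Incheon Metropolitan City (인천광역시)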
URL	https://data.incheon.go.kr/findData/publicDataDetail?dataId=15049214&srcSe=7661IVAWM27C61E190

Alerts

상세코드명2 has 445 (67.7%) missing values

Reproduction

Analysis started	2024-01-28 09:42:53.176100
Analysis finished	2024-01-28 09:42:53.591813
Duration	0.42 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

마스터코드
Categorical

Distinct	47
Distinct (%)	7.2%
Missing	0
Missing (%)	0.0%
Memory size	5.3 KiB

TFA005	47
COM021	44
TFA028	44
TFA022	44
TFA023	38
Other values (42)	440

Length

Max length	6
Median length	6
Mean length	5.9939117
Min length	5

Unique

Unique	0
Unique (%)	0.0%
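The difference between Distinct and Unique is easy to miss. The sketch below assumes that Distinct counts the different values present and that Unique counts the values occurring exactly once; the five-element list is a hypothetical miniature column, not the real 마스터코드 data.

```python
from collections import Counter

# Hypothetical miniature column; not the real 마스터코드 data.
values = ["TFA005", "TFA005", "COM021", "COM021", "NIS002"]

counts = Counter(values)

# Distinct: how many different values appear at all.
n_distinct = len(counts)

# Unique (assumed meaning): how many values appear exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(n_distinct, n_unique)
```

Read this way, Distinct 47 with Unique 0 would mean that each of the 47 master codes occurs at least twice.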

Sample

1st row	TFA026
2nd row	TFA006
3rd row	COM015
4th row	COM015
5th row	TFA005

Common Values

Value	Count	Frequency (%)
TFA005	47	7.2%
COM021	44	6.7%
TFA028	44	6.7%
TFA022	44	6.7%
TFA023	38	5.8%
TFA031	30	4.6%
TFA006	28	4.3%
COM027	21	3.2%
NIS002	20	3.0%
TFA024	20	3.0%
Other values (37)	321	48.9%

Length

[Figure: Histogram of lengths of the category]

Words

Value	Count	Frequency (%)
tfa005	47	7.2%
tfa022	44	6.7%
com021	44	6.7%
tfa028	44	6.7%
tfa023	38	5.8%
tfa031	30	4.6%
tfa006	28	4.3%
com027	21	3.2%
nis002	20	3.0%
tfa024	20	3.0%
Other values (37)	321	48.9%

상세코드
Text

Distinct	202
Distinct (%)	30.7%
Missing	0
Missing (%)	0.0%
Memory size	5.3 KiB

Length

Max length	11
Median length	2
Mean length	2.1933029
Min length	1

Characters and Unicode

Total characters	1441
Distinct characters	39
Distinct categories	4
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
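As an illustration of such character properties, the sketch below tallies the Unicode general category of each character with Python's standard `unicodedata` module. The sample values are hypothetical strings that resemble the 상세코드 column, and this is not necessarily how the report computes its tables. The standard module exposes general categories but not scripts or blocks, so only the category view is shown.

```python
import unicodedata
from collections import Counter

# Hypothetical sample values resembling the 상세코드 column.
samples = ["5", "21", "S", "A_1", "코드"]

# Tally the two-letter Unicode general category of every character.
# Nd = Decimal Number, Lu = Uppercase Letter,
# Lo = Other Letter,   Pc = Connector Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

print(dict(category_counts))
```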

Unique

Unique	119
Unique (%)	18.1%

Sample

1st row	5
2nd row	21
3rd row	1
4th row	2
5th row	1

Words

Value	Count	Frequency (%)
2	24	3.7%
1	23	3.5%
5	22	3.3%
3	20	3.0%
4	20	3.0%
6	20	3.0%
11	17	2.6%
12	16	2.4%
7	16	2.4%
8	14	2.1%
Other values (192)	465	70.8%

Most occurring characters

Value	Count	Frequency (%)
0	277	19.2%
1	235	16.3%
2	218	15.1%
3	131	9.1%
5	108	7.5%
4	95	6.6%
9	88	6.1%
6	81	5.6%
8	80	5.6%
7	61	4.2%
Other values (29)	67	4.6%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	1374	95.4%
Uppercase Letter	55	3.8%
Other Letter	8	0.6%
Connector Punctuation	4	0.3%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
A	8	14.5%
C	5	9.1%
F	4	7.3%
P	3	5.5%
N	3	5.5%
I	3	5.5%
T	3	5.5%
B	3	5.5%
V	3	5.5%
M	3	5.5%
Other values (10)	17	30.9%

Decimal Number

Value	Count	Frequency (%)
0	277	20.2%
1	235	17.1%
2	218	15.9%
3	131	9.5%
5	108	7.9%
4	95	6.9%
9	88	6.4%
6	81	5.9%
8	80	5.8%
7	61	4.4%

Other Letter

Value	Count	Frequency (%)
체	1	12.5%
코	1	12.5%
분	1	12.5%
구	1	12.5%
자	1	12.5%
단	1	12.5%
치	1	12.5%
드	1	12.5%

Connector Punctuation

Value	Count	Frequency (%)
_	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1378	95.6%
Latin	55	3.8%
Hangul	8	0.6%

Most frequent character per script

Latin

Value	Count	Frequency (%)
A	8	14.5%
C	5	9.1%
F	4	7.3%
P	3	5.5%
N	3	5.5%
I	3	5.5%
T	3	5.5%
B	3	5.5%
V	3	5.5%
M	3	5.5%
Other values (10)	17	30.9%

Common

Value	Count	Frequency (%)
0	277	20.1%
1	235	17.1%
2	218	15.8%
3	131	9.5%
5	108	7.8%
4	95	6.9%
9	88	6.4%
6	81	5.9%
8	80	5.8%
7	61	4.4%

Hangul

Value	Count	Frequency (%)
체	1	12.5%
코	1	12.5%
분	1	12.5%
구	1	12.5%
자	1	12.5%
단	1	12.5%
치	1	12.5%
드	1	12.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1433	99.4%
Hangul	8	0.6%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	277	19.3%
1	235	16.4%
2	218	15.2%
3	131	9.1%
5	108	7.5%
4	95	6.6%
9	88	6.1%
6	81	5.7%
8	80	5.6%
7	61	4.3%
Other values (21)	59	4.1%
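The digit 0 has the same count (277) in every table above, yet its percentage changes from table to table. A short sketch of why, assuming each table divides by the total of its own group; the counts are copied from the 상세코드 tables and the labels are illustrative.

```python
# Counts copied from the 상세코드 tables above.
zero_count = 277
group_totals = {
    "all characters": 1441,
    "Decimal Number category": 1374,
    "Common script": 1378,
    "ASCII block": 1433,
}

# Same numerator, different denominators.
shares = {
    name: f"{100 * zero_count / total:.1f}"
    for name, total in group_totals.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

The four results (19.2, 20.2, 20.1, 19.3) match the percentages printed in the corresponding tables.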

Hangul

Value	Count	Frequency (%)
체	1	12.5%
코	1	12.5%
분	1	12.5%
구	1	12.5%
자	1	12.5%
단	1	12.5%
치	1	12.5%
드	1	12.5%

상세코드명1
Text

Distinct	500
Distinct (%)	76.1%
Missing	0
Missing (%)	0.0%
Memory size	5.3 KiB

Length

Max length	100
Median length	25
Mean length	6.8980213
Min length	1

Characters and Unicode

Total characters	4532
Distinct characters	372
Distinct categories	11
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
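The mean length and the character total reported above for 상세코드명1 are related. A minimal sketch, assuming the mean length is the total character count divided by the number of non-missing values (all 657 here); the variable names are illustrative.

```python
# Figures copied from the 상세코드명1 section above.
total_characters = 4532
n_values = 657  # 657 observations, none missing

# Assumption: mean length = total characters / non-missing values.
mean_length = total_characters / n_values

print(f"{mean_length:.7f}")
```

Under that assumption the quotient reproduces the reported mean length of 6.8980213.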

Unique

Unique	416
Unique (%)	63.3%

Sample

1st row	의견진술가결
2nd row	36인승이상승합
3rd row	FORM
4th row	JSP
5th row	횡단보도

Words

Value	Count	Frequency (%)
인천광역시	28	3.4%
기타	14	1.7%
9	12	1.5%
총무과	12	1.5%
세무과	12	1.5%
과오납	11	1.4%
교통행정과	10	1.2%
부과취소	6	0.7%
수납(이중수납포함	5	0.6%
완납	5	0.6%
Other values (545)	699	85.9%

Most occurring characters

Value	Count	Frequency (%)
(space)	161	3.6%
과	125	2.8%
납	111	2.4%
수	104	2.3%
(	92	2.0%
)	92	2.0%
인	67	1.5%
이	62	1.4%
소	61	1.3%
차	61	1.3%
Other values (362)	3596	79.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	3399	75.0%
Uppercase Letter	278	6.1%
Lowercase Letter	241	5.3%
Decimal Number	213	4.7%
Space Separator	161	3.6%
Open Punctuation	92	2.0%
Close Punctuation	92	2.0%
Other Punctuation	41	0.9%
Dash Punctuation	8	0.2%
Math Symbol	4	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
과	125	3.7%
납	111	3.3%
수	104	3.1%
인	67	2.0%
이	62	1.8%
소	61	1.8%
차	61	1.8%
자	61	1.8%
부	59	1.7%
시	56	1.6%
Other values (290)	2632	77.4%

Uppercase Letter

Value	Count	Frequency (%)
B	28	10.1%
D	23	8.3%
C	21	7.6%
T	16	5.8%
E	15	5.4%
F	14	5.0%
X	13	4.7%
P	13	4.7%
Y	11	4.0%
L	11	4.0%
Other values (16)	113	40.6%

Lowercase Letter

Value	Count	Frequency (%)
e	27	11.2%
w	14	5.8%
y	11	4.6%
k	11	4.6%
g	11	4.6%
r	11	4.6%
t	10	4.1%
f	10	4.1%
i	10	4.1%
q	9	3.7%
Other values (16)	117	48.5%

Decimal Number

Value	Count	Frequency (%)
2	43	20.2%
3	31	14.6%
1	30	14.1%
9	28	13.1%
4	22	10.3%
5	16	7.5%
0	15	7.0%
6	13	6.1%
7	8	3.8%
8	7	3.3%

Other Punctuation

Value	Count	Frequency (%)
%	30	73.2%
,	9	22.0%
/	2	4.9%

Math Symbol

Value	Count	Frequency (%)
>	2	50.0%
<	2	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	161	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	92	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	92	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	8	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	3399	75.0%
Common	614	13.5%
Latin	519	11.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
과	125	3.7%
납	111	3.3%
수	104	3.1%
인	67	2.0%
이	62	1.8%
소	61	1.8%
차	61	1.8%
자	61	1.8%
부	59	1.7%
시	56	1.6%
Other values (290)	2632	77.4%

Latin

Value	Count	Frequency (%)
B	28	5.4%
e	27	5.2%
D	23	4.4%
C	21	4.0%
T	16	3.1%
E	15	2.9%
F	14	2.7%
w	14	2.7%
X	13	2.5%
P	13	2.5%
Other values (42)	335	64.5%

Common

Value	Count	Frequency (%)
(space)	161	26.2%
(	92	15.0%
)	92	15.0%
2	43	7.0%
3	31	5.0%
%	30	4.9%
1	30	4.9%
9	28	4.6%
4	22	3.6%
5	16	2.6%
Other values (10)	69	11.2%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	3399	75.0%
ASCII	1133	25.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	161	14.2%
(	92	8.1%
)	92	8.1%
2	43	3.8%
3	31	2.7%
%	30	2.6%
1	30	2.6%
B	28	2.5%
9	28	2.5%
e	27	2.4%
Other values (62)	571	50.4%

Hangul

Value	Count	Frequency (%)
과	125	3.7%
납	111	3.3%
수	104	3.1%
인	67	2.0%
이	62	1.8%
소	61	1.8%
차	61	1.8%
자	61	1.8%
부	59	1.7%
시	56	1.6%
Other values (290)	2632	77.4%

상세코드명2
Text

Distinct	84
Distinct (%)	39.6%
Missing	445
Missing (%)	67.7%
Memory size	5.3 KiB
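For a column with missing values, the two percentages above use different bases. The sketch below assumes that Missing (%) is taken over all observations while Distinct (%) is taken over the non-missing values only; the figures are copied from this section and the variable names are illustrative.

```python
# Figures copied from the 상세코드명2 section above.
n_observations = 657
n_missing = 445
n_distinct = 84

# Values actually present in the column.
n_present = n_observations - n_missing

# Assumption: Missing (%) is over all rows, Distinct (%) over present values.
missing_pct = 100 * n_missing / n_observations
distinct_pct = 100 * n_distinct / n_present

print(n_present)
print(f"{missing_pct:.1f}% missing, {distinct_pct:.1f}% distinct")
```

With 212 values present, the sketch gives 67.7% missing and 39.6% distinct, in line with the table.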

Length

Max length	36
Median length	28
Mean length	5.5518868
Min length	1

Characters and Unicode

Total characters	1177
Distinct characters	158
Distinct categories	9
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	58
Unique (%)	27.4%

Sample

1st row	2
2nd row	0
3rd row	20091111
4th row	20091111
5th row	20091111

Words

Value	Count	Frequency (%)
cr	28	10.7%
dr	16	6.1%
기타사유	10	3.8%
인천광역시	10	3.8%
20091111	10	3.8%
0	6	2.3%
300010	5	1.9%
300003	5	1.9%
300006	5	1.9%
300002	5	1.9%
Other values (97)	161	61.7%

Most occurring characters

Value	Count	Frequency (%)
0	348	29.6%
3	79	6.7%
1	77	6.5%
(space)	49	4.2%
R	44	3.7%
2	32	2.7%
C	28	2.4%
9	24	2.0%
시	18	1.5%
D	17	1.4%
Other values (148)	461	39.2%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	626	53.2%
Other Letter	398	33.8%
Uppercase Letter	93	7.9%
Space Separator	49	4.2%
Dash Punctuation	4	0.3%
Other Punctuation	3	0.3%
Math Symbol	2	0.2%
Open Punctuation	1	0.1%
Close Punctuation	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
시	18	4.5%
기	14	3.5%
타	13	3.3%
통	12	3.0%
과	12	3.0%
사	12	3.0%
교	11	2.8%
유	10	2.5%
인	10	2.5%
천	10	2.5%
Other values (125)	276	69.3%

Decimal Number

Value	Count	Frequency (%)
0	348	55.6%
3	79	12.6%
1	77	12.3%
2	32	5.1%
9	24	3.8%
6	15	2.4%
7	15	2.4%
4	15	2.4%
5	14	2.2%
8	7	1.1%

Uppercase Letter

Value	Count	Frequency (%)
R	44	47.3%
C	28	30.1%
D	17	18.3%
E	2	2.2%
A	1	1.1%
P	1	1.1%

Other Punctuation

Value	Count	Frequency (%)
.	2	66.7%
,	1	33.3%

Space Separator

Value	Count	Frequency (%)
(space)	49	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	4	100.0%

Math Symbol

Value	Count	Frequency (%)
+	2	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	1	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	686	58.3%
Hangul	398	33.8%
Latin	93	7.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
시	18	4.5%
기	14	3.5%
타	13	3.3%
통	12	3.0%
과	12	3.0%
사	12	3.0%
교	11	2.8%
유	10	2.5%
인	10	2.5%
천	10	2.5%
Other values (125)	276	69.3%

Common

Value	Count	Frequency (%)
0	348	50.7%
3	79	11.5%
1	77	11.2%
(space)	49	7.1%
2	32	4.7%
9	24	3.5%
6	15	2.2%
7	15	2.2%
4	15	2.2%
5	14	2.0%
Other values (7)	18	2.6%

Latin

Value	Count	Frequency (%)
R	44	47.3%
C	28	30.1%
D	17	18.3%
E	2	2.2%
A	1	1.1%
P	1	1.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	779	66.2%
Hangul	398	33.8%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	348	44.7%
3	79	10.1%
1	77	9.9%
(space)	49	6.3%
R	44	5.6%
2	32	4.1%
C	28	3.6%
9	24	3.1%
D	17	2.2%
6	15	1.9%
Other values (13)	66	8.5%

Hangul

Value	Count	Frequency (%)
시	18	4.5%
기	14	3.5%
타	13	3.3%
통	12	3.0%
과	12	3.0%
사	12	3.0%
교	11	2.8%
유	10	2.5%
인	10	2.5%
천	10	2.5%
Other values (125)	276	69.3%

Correlations

Phik (φk)

	마스터코드	상세코드명2
마스터코드	1.000	0.991
상세코드명2	0.991	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
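Since the plots themselves are not reproduced here, the idea can be sketched in text form. The example below is a rough stand-in under stated assumptions, not the report's own rendering: it uses four rows copied from the last-rows sample later in this report, with `None` standing for `<NA>`, and the `#`/`.` symbols are an arbitrary choice.

```python
# Rows copied from the last-rows sample of this report; None stands for <NA>.
columns = ["마스터코드", "상세코드", "상세코드명1", "상세코드명2"]
rows = [
    ("TFA005", "40", "적색노면표시 소화전", None),
    ("COM000", "S", "안전신문고(24시)", None),
    ("TFA003", "8", "이륜차", "40000"),
    ("TFA025", "12", "상호금융압류", None),
]

# Nullity matrix: one character per cell, '#' = present, '.' = missing.
matrix = ["".join("." if cell is None else "#" for cell in row) for row in rows]
for line in matrix:
    print(line)

# Per-column missing counts, the quantity a nullity count chart summarizes.
missing_per_column = {
    name: sum(row[i] is None for row in rows) for i, name in enumerate(columns)
}
print(missing_per_column)
```

In this small excerpt only the last column has gaps, which mirrors the alert that 상세코드명2 is the only variable with missing values.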

Sample

First rows

	마스터코드	상세코드	상세코드명1	상세코드명2
0	TFA026	5	의견진술가결	<NA>
1	TFA006	21	36인승이상승합	<NA>
2	COM015	1	FORM	<NA>
3	COM015	2	JSP	<NA>
4	TFA005	1	횡단보도	<NA>
5	TFA005	2	보도주차	<NA>
6	TFA005	3	모퉁이	<NA>
7	TFA005	4	버스정류장	<NA>
8	TFA005	5	이중주차	<NA>
9	TFA005	6	소화전	<NA>

Last rows

	마스터코드	상세코드	상세코드명1	상세코드명2
647	TFA005	40	적색노면표시 소화전	<NA>
648	COM000	S	안전신문고(24시)	<NA>
649	TFA003	8	이륜차	40000
650	TFA025	12	상호금융압류	<NA>
651	TFA025	13	증권압류	<NA>
652	TFA031	13	분납대상자	<NA>
653	TFA037	W	소화전	<NA>
654	TFA036	28000	신한은행	1.00E+11
655	TFA036	28260	신한은행	1.00E+11
656	NDG001	28245	W%2BBIWe%2BULr1WSKTZEbYotxptG7Sks4ktjlWedke6YBufwokVyHMxPe9wq4Ys0%2BX%2BBDrDFSkcQpyCE1qbihEjIA%3D%3D	<NA>
