gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	448
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	14.6 KiB
Average record size in memory	33.3 B

Variable types

Numeric	1
Categorical	2
Text	1

Dataset

Description	IDX, 코드 (code), 자치구 (autonomous district), 회사 (company)
Author	서울특별시 (Seoul Metropolitan Government)
URL	https://data.seoul.go.kr/dataList/OA-15491/S/1/datasetView.do

Alerts

`IDX` is highly overall correlated with `코드`	High correlation
`코드` is highly overall correlated with `IDX` and 1 other field	High correlation
`자치구` is highly overall correlated with `코드`	High correlation

Reproduction

Analysis started	2024-05-10 22:43:46.511390
Analysis finished	2024-05-10 22:43:47.757367
Duration	1.25 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
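
A report of this kind could be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used here: the function name `build_report`, the CSV path, and the output file name are hypothetical, and the settings stored in config.json are not reproduced.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV export of the dataset and write an HTML report (sketch)."""
    # Imports are deferred so that this module loads even where the
    # third-party packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # csv_path is a hypothetical local export of the dataset listed above.
    df = pd.read_csv(csv_path)

    # Default settings are assumed; the original run used its own config.json.
    profile = ProfileReport(df, title="gimi9 Pandas Profiling")
    profile.to_file(html_path)
    return html_path
```

Calling `build_report("bus_companies.csv")` (file name assumed) would write `report.html` next to the script.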

IDX
Real number (ℝ)

HIGH CORRELATION

Distinct	441
Distinct (%)	98.4%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	486.01116

Minimum	0
Maximum	1004
Zeros	1
Zeros (%)	0.2%
Negative	0
Negative (%)	0.0%
Memory size	4.1 KiB

Quantile statistics

Minimum	0
5-th percentile	21.35
Q1	118.75
median	576.5
Q3	710.25
95-th percentile	821.65
Maximum	1004
Range	1004
Interquartile range (IQR)	591.5
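
Range and IQR follow directly from the quantiles listed above. The sketch below recomputes both and adds a small percentile helper that uses linear interpolation, which is the pandas default. The helper name `percentile` and the toy list are illustrative assumptions, since the full IDX column is not part of this report.

```python
def percentile(values, q):
    """Percentile with linear interpolation between closest ranks (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q          # fractional position in the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

# Values copied from the quantile statistics above.
minimum, maximum = 0, 1004
q1, q3 = 118.75, 710.25

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range

# Toy data (hypothetical) to show the interpolation step.
toy = [0, 10, 20, 30]
toy_q1 = percentile(toy, 0.25)

print(value_range, iqr, toy_q1)
```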

Descriptive statistics

Standard deviation	282.60121
Coefficient of variation (CV)	0.58147061
Kurtosis	-1.1417554
Mean	486.01116
Median Absolute Deviation (MAD)	150
Skewness	-0.57366509
Sum	217733
Variance	79863.443
Monotonicity	Decreasing
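
Several of the descriptive statistics above are tied together by simple identities: the mean is Sum divided by the number of observations, the variance is the square of the standard deviation, and the CV is the standard deviation divided by the mean. A short consistency check, using only figures printed in this report:

```python
import math

n = 448            # Number of observations
total = 217733     # Sum
std = 282.60121    # Standard deviation

mean = total / n          # expected to match the reported Mean
variance = std ** 2       # expected to match the reported Variance
cv = std / mean           # expected to match the reported CV

# Compare against the reported values, allowing for rounding in the report.
print(math.isclose(mean, 486.01116, abs_tol=1e-5))
print(math.isclose(variance, 79863.443, abs_tol=1e-2))
print(math.isclose(cv, 0.58147061, abs_tol=1e-6))
```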

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
826	3	0.7%
827	3	0.7%
828	2	0.4%
1	2	0.4%
825	2	0.4%
1004	1	0.2%
469	1	0.2%
468	1	0.2%
467	1	0.2%
466	1	0.2%
Other values (431)	431	96.2%

Minimum 10 values

Value	Count	Frequency (%)
0	1	0.2%
1	2	0.4%
2	1	0.2%
3	1	0.2%
4	1	0.2%
5	1	0.2%
6	1	0.2%
7	1	0.2%
8	1	0.2%
9	1	0.2%

Maximum 10 values

Value	Count	Frequency (%)
1004	1	0.2%
1003	1	0.2%
1002	1	0.2%
1001	1	0.2%
838	1	0.2%
837	1	0.2%
836	1	0.2%
835	1	0.2%
834	1	0.2%
830	1	0.2%

코드
Categorical

HIGH CORRELATION

Distinct	5
Distinct (%)	1.1%
Missing	0
Missing (%)	0.0%
Memory size	3.6 KiB

t1	258
b2	123
b1	65
t2	1
CODE	1

Length

Max length	4
Median length	2
Mean length	2.0044643
Min length	2

Unique

Unique	2
Unique (%)	0.4%
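
"Distinct" counts the different values in the column, while "Unique" counts only the values that occur exactly once. The sketch below recomputes both, together with the mean length, from the category counts listed above for `코드`. It is an illustration of the definitions, not the profiler's own code.

```python
# Category counts copied from the report for the `코드` column.
counts = {"t1": 258, "b2": 123, "b1": 65, "t2": 1, "CODE": 1}

n = sum(counts.values())                              # number of observations
distinct = len(counts)                                # different values
unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once

distinct_pct = round(100 * distinct / n, 1)
unique_pct = round(100 * unique / n, 1)

# Mean string length, weighting each value's length by its count.
mean_length = sum(len(value) * c for value, c in counts.items()) / n

print(n, distinct, unique, distinct_pct, unique_pct, round(mean_length, 7))
```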

Sample

1st row	b1
2nd row	b1
3rd row	b1
4th row	b1
5th row	t1

Common Values

Value	Count	Frequency (%)
t1	258	57.6%
b2	123	27.5%
b1	65	14.5%
t2	1	0.2%
CODE	1	0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
t1	258	57.6%
b2	123	27.5%
b1	65	14.5%
t2	1	0.2%
code	1	0.2%

자치구
Categorical

HIGH CORRELATION

Distinct	26
Distinct (%)	5.8%
Missing	0
Missing (%)	0.0%
Memory size	3.6 KiB

강서구	48
도봉구	41
강동구	25
노원구	24
은평구	24
Other values (21)	286

Length

Max length	6
Median length	3
Mean length	3.1227679
Min length	3

Unique

Unique	2
Unique (%)	0.4%

Sample

1st row	동대문구
2nd row	은평구
3rd row	강서구
4th row	동작구
5th row	구로구

Common Values

Value	Count	Frequency (%)
강서구	48	10.7%
도봉구	41	9.2%
강동구	25	5.6%
노원구	24	5.4%
은평구	24	5.4%
중랑구	23	5.1%
서초구	22	4.9%
금천구	19	4.2%
송파구	19	4.2%
양천구	19	4.2%
Other values (16)	184	41.1%
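
The "Other values" row is what remains after the ten most common values are listed. A small sketch of that arithmetic, using the counts from the table above and the totals reported for `자치구` (448 rows, 26 distinct values):

```python
n_rows = 448       # Number of observations
n_distinct = 26    # Distinct values of `자치구`

# Top-10 counts copied from the Common Values table.
top10 = {
    "강서구": 48, "도봉구": 41, "강동구": 25, "노원구": 24, "은평구": 24,
    "중랑구": 23, "서초구": 22, "금천구": 19, "송파구": 19, "양천구": 19,
}

# Frequency (%) column: count divided by the number of rows.
pct = {name: round(100 * count / n_rows, 1) for name, count in top10.items()}

# Whatever the top 10 do not cover ends up in the "Other values" row.
other_count = n_rows - sum(top10.values())
other_distinct = n_distinct - len(top10)
other_pct = round(100 * other_count / n_rows, 1)

print(other_distinct, other_count, other_pct)
```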

Length

Histogram of lengths of the category

Value	Count	Frequency (%)
강서구	48	10.7%
도봉구	41	9.2%
강동구	25	5.6%
노원구	24	5.4%
은평구	24	5.4%
중랑구	23	5.1%
서초구	22	4.9%
금천구	19	4.2%
송파구	19	4.2%
양천구	19	4.2%
Other values (16)	184	41.1%

회사
Text

Distinct	441
Distinct (%)	98.4%
Missing	0
Missing (%)	0.0%
Memory size	3.6 KiB

Length

Max length	8
Median length	4
Mean length	4.1741071
Min length	2

Characters and Unicode

Total characters	1870
Distinct characters	208
Distinct categories	6
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
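
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module for a few values of `회사` that appear in this report's sample rows. The `block` helper is a simplified assumption: it only separates the Hangul Syllables range from ASCII and is not the profiler's own block lookup.

```python
import unicodedata
from collections import Counter

# Values taken from the sample rows of this report.
samples = ["보광운수", "신수교통", "양천운수", "서울매일버스", "우종기업", "COMPANY"]
text = "".join(samples)

# General category per character: "Lo" = Other Letter, "Lu" = Uppercase Letter.
categories = Counter(unicodedata.category(ch) for ch in text)

def block(ch):
    """Simplified block lookup (assumption): Hangul Syllables vs ASCII."""
    if "\uac00" <= ch <= "\ud7a3":
        return "Hangul"
    if ord(ch) < 128:
        return "ASCII"
    return "Other"

blocks = Counter(block(ch) for ch in text)

print(categories)
print(blocks)
```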

Unique

Unique	434
Unique (%)	96.9%

Sample

1st row	보광운수
2nd row	신수교통
3rd row	양천운수
4th row	서울매일버스
5th row	우종기업

Value	Count	Frequency (%)
시온교통	2	0.4%
대진여객	2	0.4%
범일운수	2	0.4%
선진운수	2	0.4%
청록운수	2	0.4%
대영마을버스	2	0.4%
대종상운	2	0.4%
한성운수	1	0.2%
메트로버스	1	0.2%
진아교통	1	0.2%
Other values (431)	431	96.2%

Most occurring characters

Value	Count	Frequency (%)
운	222	11.9%
수	181	9.7%
통	108	5.8%
교	95	5.1%
시	56	3.0%
택	53	2.8%
상	52	2.8%
동	43	2.3%
성	40	2.1%
업	38	2.0%
Other values (198)	982	52.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1854	99.1%
Uppercase Letter	9	0.5%
Lowercase Letter	3	0.2%
Decimal Number	2	0.1%
Close Punctuation	1	0.1%
Open Punctuation	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
운	222	12.0%
수	181	9.8%
통	108	5.8%
교	95	5.1%
시	56	3.0%
택	53	2.9%
상	52	2.8%
동	43	2.3%
성	40	2.2%
업	38	2.0%
Other values (184)	966	52.1%

Uppercase Letter

Value	Count	Frequency (%)
O	2	22.2%
N	1	11.1%
P	1	11.1%
M	1	11.1%
C	1	11.1%
A	1	11.1%
K	1	11.1%
Y	1	11.1%

Lowercase Letter

Value	Count	Frequency (%)
t	1	33.3%
r	1	33.3%
b	1	33.3%

Decimal Number

Value	Count	Frequency (%)
3	2	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1854	99.1%
Latin	12	0.6%
Common	4	0.2%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
운	222	12.0%
수	181	9.8%
통	108	5.8%
교	95	5.1%
시	56	3.0%
택	53	2.9%
상	52	2.8%
동	43	2.3%
성	40	2.2%
업	38	2.0%
Other values (184)	966	52.1%

Latin

Value	Count	Frequency (%)
O	2	16.7%
N	1	8.3%
P	1	8.3%
M	1	8.3%
C	1	8.3%
A	1	8.3%
K	1	8.3%
t	1	8.3%
r	1	8.3%
b	1	8.3%

Common

Value	Count	Frequency (%)
3	2	50.0%
)	1	25.0%
(	1	25.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1854	99.1%
ASCII	16	0.9%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
운	222	12.0%
수	181	9.8%
통	108	5.8%
교	95	5.1%
시	56	3.0%
택	53	2.9%
상	52	2.8%
동	43	2.3%
성	40	2.2%
업	38	2.0%
Other values (184)	966	52.1%

ASCII

Value	Count	Frequency (%)
O	2	12.5%
3	2	12.5%
N	1	6.2%
P	1	6.2%
M	1	6.2%
C	1	6.2%
A	1	6.2%
)	1	6.2%
(	1	6.2%
K	1	6.2%
Other values (4)	4	25.0%

Interactions

IDX vs. IDX

Correlations
Heatmap
Table

	IDX	코드	자치구
IDX	1.000	0.778	0.768
코드	0.778	1.000	0.841
자치구	0.768	0.841	1.000

Heatmap
Table

	자치구	코드
자치구	1.000	0.582
코드	0.582	1.000

Heatmap
Table

	IDX	코드	자치구
IDX	1.000	0.625	0.428
코드	0.625	1.000	0.582
자치구	0.428	0.582	1.000

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	IDX	코드	자치구	회사
0	1004	b1	동대문구	보광운수
1	1003	b1	은평구	신수교통
2	1002	b1	강서구	양천운수
3	1001	b1	동작구	서울매일버스
4	838	t1	구로구	우종기업
5	837	t1	강서구	소망기업
6	836	t1	강동구	천마교통
7	835	t1	노원구	복흥기업
8	834	t1	동대문구	대덕운수
9	830	t1	금천구	강북운수

Last rows

	IDX	코드	자치구	회사
438	8	b2	강북구	화계운수
439	7	b2	강북구	인수운수
440	6	b2	강북구	수유운수
441	5	b2	강동구	신명운수
442	4	b2	강동구	강동교통
443	3	b2	강남구	포이운수
444	2	b2	강남구	일원교통
445	1	t1	택시운송조합	택시운송조합
446	1	b2	강남구	개포운수
447	0	CODE	GIYUG	COMPANY
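
The sample rows are consistent with the "Decreasing" monotonicity reported for IDX. The check below covers only the 20 rows shown here, not all 448, so it is a partial illustration rather than a verification of the flag. Because the value 1 appears twice, the sequence is non-increasing rather than strictly decreasing.

```python
# IDX values from the first and last rows shown above.
first_rows = [1004, 1003, 1002, 1001, 838, 837, 836, 835, 834, 830]
last_rows = [8, 7, 6, 5, 4, 3, 2, 1, 1, 0]
idx = first_rows + last_rows

pairs = list(zip(idx, idx[1:]))   # consecutive (current, next) pairs

non_increasing = all(a >= b for a, b in pairs)       # ties allowed
strictly_decreasing = all(a > b for a, b in pairs)   # ties not allowed

print(non_increasing, strictly_decreasing)
```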
