gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	2720
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	682
Duplicate rows (%)	25.1%
Total size in memory	45.3 KiB
Average record size in memory	17.0 B
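
The duplicate-row figure appears to count distinct rows that occur more than once, not the surplus copies: the 690 distinct and 8 unique values reported for 노선명 further down give 690 - 8 = 682. A minimal sketch of that counting, assuming the row-level distinct and unique counts match the column-level ones; the `rows` list is a hypothetical miniature, not the real data:

```python
from collections import Counter

# Hypothetical miniature of the (노선명, 노선) table; not the real data.
rows = [
    ("0017", 100100124), ("0017", 100100124),
    ("01A", 100100001), ("01A", 100100001), ("01A", 100100001),
    ("100", 100100549),
]

counts = Counter(rows)
n_rows = len(rows)
n_distinct = len(counts)                                # different rows
n_unique = sum(1 for c in counts.values() if c == 1)    # rows seen exactly once
n_duplicate_groups = n_distinct - n_unique              # rows seen more than once

print(n_distinct, n_unique, n_duplicate_groups)
print(f"{100 * n_duplicate_groups / n_rows:.1f}%")

# Same arithmetic on the reported figures (assumption: 690 distinct rows,
# 8 of them unique, out of 2720 observations).
reported_pct = round(100 * (690 - 8) / 2720, 1)
print(reported_pct)
```

Under that reading, roughly three quarters of the observations are repeats of an earlier row, so deduplicating before analysis may be worth considering.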

Variable types

Text	1
Numeric	1

Dataset

Description	노선명 (route name), 노선 (route)
Author	서울특별시 (Seoul Metropolitan Government)
URL	https://data.seoul.go.kr/dataList/OA-15262/S/1/datasetView.do

Alerts

Dataset has 682 (25.1%) duplicate rows	Duplicates

Reproduction

Analysis started	2024-05-11 06:16:06.410877
Analysis finished	2024-05-11 06:16:06.957274
Duration	0.55 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

노선명
Text

Distinct	690
Distinct (%)	25.4%
Missing	0
Missing (%)	0.0%
Memory size	21.4 KiB

Length

Max length	8
Median length	4
Mean length	3.9466912
Min length	3
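
The length statistics can be sketched with the standard library. The example below uses the five sample rows listed in this report, so its results differ from the full-column figures above; the last line checks that the reported mean length is consistent with the reported total of 10735 characters over 2720 observations.

```python
from statistics import mean, median

# The five sample rows shown in this report, not the full column.
sample = ["0017", "01A", "01B", "0411", "100"]
lengths = [len(s) for s in sample]  # len() counts Unicode code points

summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)

# Consistency check on the reported figures: total characters / observations.
reported_mean = 10735 / 2720
print(round(reported_mean, 7))
```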

Characters and Unicode

Total characters	10735
Distinct characters	70
Distinct categories	4
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
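
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. This is only a sketch on three values that appear in this report, not the profiler's own implementation; script lookup is not available in the standard library and is left out.

```python
import unicodedata
from collections import Counter

# Three 노선명 values that appear in this report.
values = ["0017", "01A", "강서05-1"]

# Two-letter general categories: Nd = Decimal Number, Lo = Other Letter,
# Lu = Uppercase Letter, Pd = Dash Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)
print(category_counts)
```

The four abbreviations correspond to the four categories listed under "Most occurring categories".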

Unique

Unique	8
Unique (%)	0.3%

Sample

1st row	0017
2nd row	01A
3rd row	01B
4th row	0411
5th row	100

Value	Count	Frequency (%)
0017	4	0.1%
강서05-1	4	0.1%
관악08	4	0.1%
강북11	4	0.1%
강북12	4	0.1%
강서01	4	0.1%
강서02	4	0.1%
강서03	4	0.1%
강서04	4	0.1%
강서05	4	0.1%
Other values (680)	2680	98.5%

Most occurring characters

Value	Count	Frequency (%)
1	1796	16.7%
0	1515	14.1%
2	986	9.2%
6	868	8.1%
3	735	6.8%
7	674	6.3%
5	649	6.0%
4	612	5.7%
8	241	2.2%
서	208	1.9%
Other values (60)	2451	22.8%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	8234	76.7%
Other Letter	2240	20.9%
Uppercase Letter	207	1.9%
Dash Punctuation	54	0.5%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
서	208	9.3%
동	160	7.1%
강	128	5.7%
성	120	5.4%
포	120	5.4%
북	116	5.2%
로	100	4.5%
대	100	4.5%
문	84	3.8%
초	84	3.8%
Other values (42)	1020	45.5%

Decimal Number

Value	Count	Frequency (%)
1	1796	21.8%
0	1515	18.4%
2	986	12.0%
6	868	10.5%
3	735	8.9%
7	674	8.2%
5	649	7.9%
4	612	7.4%
8	241	2.9%
9	158	1.9%

Uppercase Letter

Value	Count	Frequency (%)
N	76	36.7%
A	36	17.4%
B	31	15.0%
U	16	7.7%
O	16	7.7%
R	16	7.7%
T	16	7.7%

Dash Punctuation

Value	Count	Frequency (%)
-	54	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	8288	77.2%
Hangul	2240	20.9%
Latin	207	1.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
서	208	9.3%
동	160	7.1%
강	128	5.7%
성	120	5.4%
포	120	5.4%
북	116	5.2%
로	100	4.5%
대	100	4.5%
문	84	3.8%
초	84	3.8%
Other values (42)	1020	45.5%

Common

Value	Count	Frequency (%)
1	1796	21.7%
0	1515	18.3%
2	986	11.9%
6	868	10.5%
3	735	8.9%
7	674	8.1%
5	649	7.8%
4	612	7.4%
8	241	2.9%
9	158	1.9%

Latin

Value	Count	Frequency (%)
N	76	36.7%
A	36	17.4%
B	31	15.0%
U	16	7.7%
O	16	7.7%
R	16	7.7%
T	16	7.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	8495	79.1%
Hangul	2240	20.9%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	1796	21.1%
0	1515	17.8%
2	986	11.6%
6	868	10.2%
3	735	8.7%
7	674	7.9%
5	649	7.6%
4	612	7.2%
8	241	2.8%
9	158	1.9%
Other values (8)	261	3.1%

Hangul

Value	Count	Frequency (%)
서	208	9.3%
동	160	7.1%
강	128	5.7%
성	120	5.4%
포	120	5.4%
북	116	5.2%
로	100	4.5%
대	100	4.5%
문	84	3.8%
초	84	3.8%
Other values (42)	1020	45.5%

노선
Real number (ℝ)

Distinct	690
Distinct (%)	25.4%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	1.0656566 × 10⁸

Minimum	1.0000002 × 10⁸
Maximum	1.249 × 10⁸
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	24.0 KiB

Quantile statistics

Minimum	1.0000002 × 10⁸
5-th percentile	1.0010004 × 10⁸
Q1	1.0010025 × 10⁸
median	1.0010059 × 10⁸
Q3	1.13 × 10⁸
95-th percentile	1.22 × 10⁸
Maximum	1.249 × 10⁸
Range	24899986
Interquartile range (IQR)	12899753

Descriptive statistics

Standard deviation	8154293.5
Coefficient of variation (CV)	0.07651896
Kurtosis	-0.85545463
Mean	1.0656566 × 10⁸
Median Absolute Deviation (MAD)	573
Skewness	0.81842412
Sum	2.898586 × 10¹¹
Variance	6.6492503 × 10¹³
Monotonicity	Not monotonic
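
Several of these figures are related by simple arithmetic, which gives a quick consistency check on the report. The sketch below uses only numbers printed in this report (the mean as rounded in the report, and the minimum and maximum as listed in the extreme-value tables), so small rounding differences are expected.

```python
import math

n = 2720                      # number of observations
mean = 1.0656566e8            # as rounded in the report
std = 8154293.5
minimum, maximum = 100000017, 124900003  # from the extreme-value tables

cv = std / mean               # coefficient of variation
variance = std ** 2
total = mean * n              # sum of the column
value_range = maximum - minimum

print(f"CV       {cv:.8f}")
print(f"Variance {variance:.7e}")
print(f"Sum      {total:.6e}")
print(f"Range    {value_range}")
```

The median absolute deviation of 573 is tiny next to the standard deviation of about 8.2 million, which suggests that most values sit in a tight cluster near the median while a smaller group of much larger identifiers stretches the spread.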

Histogram with fixed size bins (bins=50)
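
Fixed-size binning splits the range from minimum to maximum into equal-width intervals and counts the values in each. A minimal sketch with 50 bins follows; the helper `fixed_width_histogram` is a hypothetical name, and the profiler's actual implementation may differ.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    return counts

# A few 노선 values that appear in this report; the full column has 2720 rows.
sample = [100000017, 100100124, 104000012, 106000004, 115900002, 124900003]
counts = fixed_width_histogram(sample, bins=50)
print(counts)
```

With the reported range of 24899986, each of the 50 bins is about 498000 wide.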

Value	Count	Frequency (%)
100100124	4	0.1%
115900002	4	0.1%
108900001	4	0.1%
108900012	4	0.1%
115900006	4	0.1%
115900003	4	0.1%
115900004	4	0.1%
115900001	4	0.1%
115900005	4	0.1%
115900008	4	0.1%
Other values (680)	2680	98.5%

Minimum 10 values

Value	Count	Frequency (%)
100000017	4	0.1%
100000018	4	0.1%
100100001	4	0.1%
100100006	4	0.1%
100100007	4	0.1%
100100008	4	0.1%
100100009	4	0.1%
100100010	4	0.1%
100100011	4	0.1%
100100012	4	0.1%

Maximum 10 values

Value	Count	Frequency (%)
124900003	4	0.1%
124900002	4	0.1%
124900001	4	0.1%
124000040	1	< 0.1%
124000039	4	0.1%
124000038	4	0.1%
124000036	4	0.1%
124000016	4	0.1%
124000015	4	0.1%
124000014	4	0.1%

Interactions

[Plot: 노선 against 노선]

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	노선명	노선
0	0017	100100124
1	01A	100100001
2	01B	106000004
3	0411	104000012
4	100	100100549
5	101	100100006
6	1014	100100129
7	1017	100100130
8	102	100100007
9	1020	100100131

Last rows

	노선명	노선
2710	종로03	100900010
2711	종로05	100900011
2712	종로07	100900004
2713	종로08	100900005
2714	종로09	100900003
2715	종로11	100900007
2716	종로12	100900009
2717	종로13	100900002
2718	중랑01	106900001
2719	중랑02	106900002

Most frequently occurring

	노선명	노선	# duplicates
0	0017	100100124	4
1	01A	100100001	4
2	01B	106000004	4
3	0411	104000012	4
4	100	100100549	4
5	101	100100006	4
6	1014	100100129	4
7	1017	100100130	4
8	102	100100007	4
9	1020	100100131	4
