Overview

Dataset statistics

Number of variables: 6
Number of observations: 736
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 36.8 KiB
Average record size in memory: 51.2 B
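
The average record size follows from the two figures above it. A minimal sketch of that arithmetic, assuming the report uses binary kibibytes (1 KiB = 1024 bytes); the variable names are illustrative only:

```python
# Values copied from the "Dataset statistics" block above.
total_size_kib = 36.8      # "Total size in memory", in KiB
n_observations = 736       # "Number of observations"

# Assumption: 1 KiB = 1024 bytes, so convert before dividing by the row count.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```

Under that assumption the result agrees with the reported 51.2 B.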

Variable types

Numeric: 3
Categorical: 2
Text: 1

Dataset

Description: Sample data
Author: Seoul Metropolitan Government (smart card company)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=13

Alerts

High correlation: 교통운영기관명(CORP_NM) is highly overall correlated with 교통운영기관ID(CORP_ID) and 3 other fields
High correlation: 호선명(LINE_NM) is highly overall correlated with 교통운영기관ID(CORP_ID) and 3 other fields
High correlation: 교통운영기관ID(CORP_ID) is highly overall correlated with 호선코드(LINE_CD) and 3 other fields
High correlation: 호선코드(LINE_CD) is highly overall correlated with 교통운영기관ID(CORP_ID) and 3 other fields
High correlation: 역ID(STATION_ID) is highly overall correlated with 교통운영기관ID(CORP_ID) and 3 other fields
Unique: 역ID(STATION_ID) has unique values

Reproduction

Analysis started: 2024-04-17 19:24:57.823440
Analysis finished: 2024-04-17 19:24:58.933294
Duration: 1.11 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

교통운영기관ID(CORP_ID)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 14
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2147.3152
Minimum: 2110
Maximum: 2413
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 2110
5-th percentile: 2110
Q1: 2110
Median: 2111
Q3: 2120
95-th percentile: 2411
Maximum: 2413
Range: 303
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 87.111336
Coefficient of variation (CV): 0.040567559
Kurtosis: 3.6313372
Mean: 2147.3152
Median Absolute Deviation (MAD): 1
Skewness: 2.261919
Sum: 1580424
Variance: 7588.3849
Monotonicity: Not monotonic
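
Because 교통운영기관ID(CORP_ID) has only 14 distinct values, the whole column can be rebuilt from the frequency tables reported for this variable and the statistics above recomputed from it. The following is a sketch of that check using only the Python standard library. It assumes the report uses the sample (n - 1) standard deviation and linear-interpolation quartiles; the names `corp_id_counts` and `summary` are illustrative.

```python
import statistics

# Value -> count pairs copied from the CORP_ID frequency tables in this report.
corp_id_counts = {
    2110: 283, 2111: 264, 2120: 25, 2121: 14, 2123: 6, 2125: 13, 2126: 7,
    2127: 12, 2283: 30, 2284: 29, 2410: 15, 2411: 15, 2412: 13, 2413: 10,
}

# Expand the table back into a sorted list of observations.
values = sorted(v for v, n in corp_id_counts.items() for _ in range(n))

mean = statistics.fmean(values)
median = statistics.median(values)
std = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
# "inclusive" interpolates linearly between data points.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
mad = statistics.median(abs(v - median) for v in values)

summary = {
    "n": len(values), "sum": sum(values), "mean": mean, "median": median,
    "std": std, "variance": std ** 2, "cv": std / mean,
    "range": values[-1] - values[0], "iqr": q3 - q1, "mad": mad,
}
for name, value in summary.items():
    print(f"{name}: {value:.8g}")
```

The count, sum, median, range, IQR, and MAD come out as exact integers, so they can be compared directly with the report; the mean, standard deviation, and CV are best compared to a few decimal places.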
Histogram with fixed size bins (bins=14)

Common values

Value  Count  Frequency (%)
2110  283  38.5%
2111  264  35.9%
2283  30  4.1%
2284  29  3.9%
2120  25  3.4%
2410  15  2.0%
2411  15  2.0%
2121  14  1.9%
2125  13  1.8%
2412  13  1.8%
Other values (4)  35  4.8%

Minimum 10 values

Value  Count  Frequency (%)
2110  283  38.5%
2111  264  35.9%
2120  25  3.4%
2121  14  1.9%
2123  6  0.8%
2125  13  1.8%
2126  7  1.0%
2127  12  1.6%
2283  30  4.1%
2284  29  3.9%

Maximum 10 values

Value  Count  Frequency (%)
2413  10  1.4%
2412  13  1.8%
2411  15  2.0%
2410  15  2.0%
2284  29  3.9%
2283  30  4.1%
2127  12  1.6%
2126  7  1.0%
2125  13  1.8%
2123  6  0.8%

교통운영기관명(CORP_NM)
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value  Count
서울교통공사  283
한국철도공사  264
인천교통공사  30
인천도시철도2호선  29
서울메트로9호선  25
Other values (9)  105

Length

Max length: 17
Median length: 6
Mean length: 6.4116848
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울교통공사
2nd row: 서울교통공사
3rd row: 서울교통공사
4th row: 서울교통공사
5th row: 서울교통공사

Common Values

Value  Count  Frequency (%)
서울교통공사  283  38.5%
한국철도공사  264  35.9%
인천교통공사  30  4.1%
인천도시철도2호선  29  3.9%
서울메트로9호선  25  3.4%
용인경량전철  15  2.0%
의정부경전철  15  2.0%
공항철도  14  1.9%
서울교통공사(9호선 2~3단계)  13  1.8%
우이신설경전철  13  1.8%
Other values (4)  35  4.8%

Length

Histogram of lengths of the category

Words

Value  Count  Frequency (%)
서울교통공사  283  37.8%
한국철도공사  264  35.2%
인천교통공사  30  4.0%
인천도시철도2호선  29  3.9%
서울메트로9호선  25  3.3%
용인경량전철  15  2.0%
의정부경전철  15  2.0%
공항철도  14  1.9%
서울교통공사(9호선  13  1.7%
2~3단계  13  1.7%
Other values (5)  48  6.4%

호선코드(LINE_CD)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 33
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 175.10598
Minimum: 1
Maximum: 409
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 103
Median: 112
Q3: 208
95-th percentile: 406
Maximum: 409
Range: 408
Interquartile range (IQR): 105

Descriptive statistics

Standard deviation: 128.7186
Coefficient of variation (CV): 0.73508971
Kurtosis: -0.74065505
Mean: 175.10598
Median Absolute Deviation (MAD): 94
Skewness: 0.53977491
Sum: 128878
Variance: 16568.479
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=33)

Common values

Value  Count  Frequency (%)
205  56  7.6%
207  51  6.9%
2  50  6.8%
101  39  5.3%
206  39  5.3%
106  35  4.8%
3  34  4.6%
110  33  4.5%
103  30  4.1%
301  30  4.1%
Other values (23)  339  46.1%

Minimum 10 values

Value  Count  Frequency (%)
1  10  1.4%
2  50  6.8%
3  34  4.6%
4  26  3.5%
101  39  5.3%
102  20  2.7%
103  30  4.1%
104  13  1.8%
105  8  1.1%
106  35  4.8%

Maximum 10 values

Value  Count  Frequency (%)
409  10  1.4%
408  13  1.8%
407  7  1.0%
406  15  2.0%
405  15  2.0%
404  13  1.8%
403  6  0.8%
402  14  1.9%
401  25  3.4%
302  29  3.9%

호선명(LINE_NM)
Categorical

HIGH CORRELATION 

Distinct: 33
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value  Count
5호선  56
7호선  51
2호선  50
경부선  39
6호선  39
Other values (28)  501

Length

Max length: 8
Median length: 3
Mean length: 3.5597826
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value  Count  Frequency (%)
5호선  56  7.6%
7호선  51  6.9%
2호선  50  6.8%
경부선  39  5.3%
6호선  39  5.3%
분당선  35  4.8%
3호선  34  4.6%
경의선  33  4.5%
인천1호선  30  4.1%
경원선  30  4.1%
Other values (23)  339  46.1%

Length

Histogram of lengths of the category

Words

Value  Count  Frequency (%)
5호선  56  7.5%
7호선  51  6.8%
2호선  50  6.7%
경부선  39  5.2%
6호선  39  5.2%
분당선  35  4.7%
3호선  34  4.5%
경의선  33  4.4%
경원선  30  4.0%
인천1호선  30  4.0%
Other values (23)  353  47.1%

역ID(STATION_ID)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 736
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2227.4293
Minimum: 150
Maximum: 4929
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 150
5-th percentile: 227.75
Q1: 1271.75
Median: 1911.5
Q3: 3114.25
95-th percentile: 4613.25
Maximum: 4929
Range: 4779
Interquartile range (IQR): 1842.5

Descriptive statistics

Standard deviation: 1344.9106
Coefficient of variation (CV): 0.60379493
Kurtosis: -0.78834098
Mean: 2227.4293
Median Absolute Deviation (MAD): 839
Skewness: 0.29038684
Sum: 1639388
Variance: 1808784.4
Monotonicity: Strictly increasing
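
A strictly increasing column is necessarily unique, which is consistent with the Unique alert for 역ID(STATION_ID). The sketch below shows one way such a label could be derived from consecutive pairs. `monotonicity` is a hypothetical helper; only the labels "Strictly increasing" and "Not monotonic" appear in this report, and the other three labels are assumptions.

```python
def monotonicity(values):
    """Classify a sequence by comparing each element with its successor."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"   # assumed label
    if all(a <= b for a, b in pairs):
        return "Increasing"            # assumed label
    if all(a >= b for a, b in pairs):
        return "Decreasing"            # assumed label
    return "Not monotonic"


# STATION_ID values from the first and last sample rows of this report.
station_ids = list(range(150, 160)) + list(range(4920, 4930))

label = monotonicity(station_ids)
is_unique = len(set(station_ids)) == len(station_ids)
print(label, is_unique)
```

Only the 20 sampled IDs are checked here; the full column of 736 values is not reproduced in the report.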
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
150  1  0.1%
2727  1  0.1%
2718  1  0.1%
2719  1  0.1%
2720  1  0.1%
2721  1  0.1%
2722  1  0.1%
2723  1  0.1%
2724  1  0.1%
2725  1  0.1%
Other values (726)  726  98.6%

Minimum 10 values

Value  Count  Frequency (%)
150  1  0.1%
151  1  0.1%
152  1  0.1%
153  1  0.1%
154  1  0.1%
155  1  0.1%
156  1  0.1%
157  1  0.1%
158  1  0.1%
159  1  0.1%

Maximum 10 values

Value  Count  Frequency (%)
4929  1  0.1%
4928  1  0.1%
4927  1  0.1%
4926  1  0.1%
4925  1  0.1%
4924  1  0.1%
4923  1  0.1%
4922  1  0.1%
4921  1  0.1%
4920  1  0.1%

역명(STATION_NM)
Text

Distinct: 626
Distinct (%): 85.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 16
Median length: 2
Mean length: 3.6589674
Min length: 2

Characters and Unicode

Total characters: 2693
Distinct characters: 317
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
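
As an illustration of those character properties, the sketch below counts Unicode general categories for a few station names taken from the Sample section, using Python's standard `unicodedata` module. The helper name `category_counts` and the `LONG_NAMES` mapping are illustrative; the long names follow the labels used in this report.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the labels used in this report.
LONG_NAMES = {
    "Lo": "Other Letter", "Nd": "Decimal Number", "Lu": "Uppercase Letter",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Po": "Other Punctuation",
}

# Station names copied from the Sample section of this report.
names = ["서울역", "종로3가", "청량리(서울시립대입구)", "사우(김포시청)"]


def category_counts(strings):
    """Count Unicode general categories over all characters of all strings."""
    counts = Counter()
    for s in strings:
        # Normalise to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", s):
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(names)
for code, n in counts.most_common():
    print(LONG_NAMES.get(code, code), n)
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables in this section.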

Unique

Unique: 537
Unique (%): 73.0%

Sample

1st row: 서울역
2nd row: 시청
3rd row: 종각
4th row: 종로3가
5th row: 종로5가

Value  Count  Frequency (%)
서울역  6  0.8%
김포공항  5  0.7%
공덕  5  0.7%
디지털미디어시티  4  0.5%
홍대입구  4  0.5%
신설동  3  0.4%
동대문역사문화공원(ddp  3  0.4%
고속터미널  3  0.4%
검암  3  0.4%
계양  3  0.4%
Other values (616)  697  94.7%

Most occurring characters

Value  Count  Frequency (%)
)  89  3.3%
(  89  3.3%
(missing)  87  3.2%
(missing)  80  3.0%
(missing)  59  2.2%
(missing)  56  2.1%
(missing)  55  2.0%
(missing)  50  1.9%
(missing)  50  1.9%
(missing)  45  1.7%
Other values (307)  2033  75.5%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  2485  92.3%
Close Punctuation  89  3.3%
Open Punctuation  89  3.3%
Decimal Number  13  0.5%
Uppercase Letter  9  0.3%
Other Punctuation  8  0.3%

Most frequent character per category

Other Letter

Value  Count  Frequency (%)
(missing)  87  3.5%
(missing)  80  3.2%
(missing)  59  2.4%
(missing)  56  2.3%
(missing)  55  2.2%
(missing)  50  2.0%
(missing)  50  2.0%
(missing)  45  1.8%
(missing)  43  1.7%
(missing)  40  1.6%
Other values (295)  1920  77.3%

Decimal Number

Value  Count  Frequency (%)
3  5  38.5%
4  3  23.1%
1  2  15.4%
9  1  7.7%
2  1  7.7%
5  1  7.7%

Other Punctuation

Value  Count  Frequency (%)
.  7  87.5%
·  1  12.5%

Uppercase Letter

Value  Count  Frequency (%)
D  6  66.7%
P  3  33.3%

Close Punctuation

Value  Count  Frequency (%)
)  89  100.0%

Open Punctuation

Value  Count  Frequency (%)
(  89  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  2485  92.3%
Common  199  7.4%
Latin  9  0.3%

Most frequent character per script

Hangul

Value  Count  Frequency (%)
(missing)  87  3.5%
(missing)  80  3.2%
(missing)  59  2.4%
(missing)  56  2.3%
(missing)  55  2.2%
(missing)  50  2.0%
(missing)  50  2.0%
(missing)  45  1.8%
(missing)  43  1.7%
(missing)  40  1.6%
Other values (295)  1920  77.3%

Common

Value  Count  Frequency (%)
)  89  44.7%
(  89  44.7%
.  7  3.5%
3  5  2.5%
4  3  1.5%
1  2  1.0%
9  1  0.5%
2  1  0.5%
·  1  0.5%
5  1  0.5%

Latin

Value  Count  Frequency (%)
D  6  66.7%
P  3  33.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  2485  92.3%
ASCII  207  7.7%
None  1  < 0.1%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
)  89  43.0%
(  89  43.0%
.  7  3.4%
D  6  2.9%
3  5  2.4%
P  3  1.4%
4  3  1.4%
1  2  1.0%
9  1  0.5%
2  1  0.5%

Hangul

Value  Count  Frequency (%)
(missing)  87  3.5%
(missing)  80  3.2%
(missing)  59  2.4%
(missing)  56  2.3%
(missing)  55  2.2%
(missing)  50  2.0%
(missing)  50  2.0%
(missing)  45  1.8%
(missing)  43  1.7%
(missing)  40  1.6%
Other values (295)  1920  77.3%

None

Value  Count  Frequency (%)
·  1  100.0%

Interactions

[Figures: nine pairwise interaction plots; images not preserved in this text]

Correlations

Variable  교통운영기관ID(CORP_ID)  교통운영기관명(CORP_NM)  호선코드(LINE_CD)  호선명(LINE_NM)  역ID(STATION_ID)
교통운영기관ID(CORP_ID)  1.000  1.000  0.980  1.000  0.929
교통운영기관명(CORP_NM)  1.000  1.000  0.919  1.000  0.901
호선코드(LINE_CD)  0.980  0.919  1.000  1.000  0.973
호선명(LINE_NM)  1.000  1.000  1.000  1.000  0.977
역ID(STATION_ID)  0.929  0.901  0.973  0.977  1.000

Variable  교통운영기관명(CORP_NM)  호선명(LINE_NM)
교통운영기관명(CORP_NM)  1.000  0.987
호선명(LINE_NM)  0.987  1.000
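
The measure behind each matrix is not named in this text. The two-variable matrix covers only the categorical columns, so one plausible reading is that it reports Cramér's V. The sketch below computes the plain (uncorrected) Cramér's V from a contingency table, under that assumption. `cramers_v` is a hypothetical helper, and a profiler may apply a bias correction, so the result need not match the reported 0.987.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return float("nan")  # undefined when a variable has a single level
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Operator and line names from the 20 rows shown in the Sample section.
corp_nm = ["서울교통공사"] * 10 + ["김포시청"] * 10
line_nm = ["1호선"] * 10 + ["김포골드라인"] * 10

v_sample = cramers_v(corp_nm, line_nm)
print(v_sample)
```

In the sampled rows each line belongs to exactly one operator, so the association is perfect; across all 736 rows the report gives a value slightly below 1.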

Variable  교통운영기관ID(CORP_ID)  호선코드(LINE_CD)  역ID(STATION_ID)  교통운영기관명(CORP_NM)  호선명(LINE_NM)
교통운영기관ID(CORP_ID)  1.000  0.533  0.562  0.992  0.979
호선코드(LINE_CD)  0.533  1.000  0.929  0.766  0.981
역ID(STATION_ID)  0.562  0.929  1.000  0.660  0.837
교통운영기관명(CORP_NM)  0.992  0.766  0.660  1.000  0.987
호선명(LINE_NM)  0.979  0.981  0.837  0.987  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  교통운영기관ID(CORP_ID)  교통운영기관명(CORP_NM)  호선코드(LINE_CD)  호선명(LINE_NM)  역ID(STATION_ID)  역명(STATION_NM)
0  2110  서울교통공사  1  1호선  150  서울역
1  2110  서울교통공사  1  1호선  151  시청
2  2110  서울교통공사  1  1호선  152  종각
3  2110  서울교통공사  1  1호선  153  종로3가
4  2110  서울교통공사  1  1호선  154  종로5가
5  2110  서울교통공사  1  1호선  155  동대문
6  2110  서울교통공사  1  1호선  156  신설동
7  2110  서울교통공사  1  1호선  157  제기동
8  2110  서울교통공사  1  1호선  158  청량리(서울시립대입구)
9  2110  서울교통공사  1  1호선  159  동묘앞

Last rows

Index  교통운영기관ID(CORP_ID)  교통운영기관명(CORP_NM)  호선코드(LINE_CD)  호선명(LINE_NM)  역ID(STATION_ID)  역명(STATION_NM)
726  2413  김포시청  409  김포골드라인  4920  양촌
727  2413  김포시청  409  김포골드라인  4921  구래
728  2413  김포시청  409  김포골드라인  4922  마산
729  2413  김포시청  409  김포골드라인  4923  장기
730  2413  김포시청  409  김포골드라인  4924  운양
731  2413  김포시청  409  김포골드라인  4925  걸포북변
732  2413  김포시청  409  김포골드라인  4926  사우(김포시청)
733  2413  김포시청  409  김포골드라인  4927  풍무
734  2413  김포시청  409  김포골드라인  4928  고촌
735  2413  김포시청  409  김포골드라인  4929  김포공항