Overview

Dataset statistics

Number of variables: 5
Number of observations: 204
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.5 KiB
Average record size in memory: 42.6 B

Variable types

Categorical: 3
Numeric: 2

Dataset

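Description: Number of train tickets issued to Korea Railroad Corporation passengers for five major cities (Gyeongju, Busan, Yeosu, Jeonju, Gangneung), broken down by gender, age group, and region of residence.
Author: 한국철도공사 (Korea Railroad Corporation)
URL: https://www.data.go.kr/data/15106016/fileData.do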

Alerts

시도코드 is highly overall correlated with 거주시도 (High correlation)
거주시도 is highly overall correlated with 시도코드 (High correlation)
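
The two alerts describe one relationship seen from both sides. In the sample rows of this report, every 거주시도 value is paired with exactly one 시도코드 value, which would make one column a relabelling of the other. A minimal Python sketch of that check follows. It uses only the 17 (code, name) pairs visible in the Sample section; the variable names are illustrative, and whether the pairing holds for all 204 rows is an assumption.

```python
# (시도코드, 거주시도) pairs as they appear in the report's sample rows.
pairs = [
    (11, "서울"), (26, "부산"), (27, "대구"), (28, "인천"), (29, "광주"),
    (30, "대전"), (31, "울산"), (36, "세종"), (41, "경기"), (42, "강원"),
    (43, "충북"), (44, "충남"), (45, "전북"), (46, "전남"), (47, "경북"),
    (48, "경남"), (50, "제주"),
]

code_to_name = {}
name_to_code = {}
for code, name in pairs:
    # setdefault keeps the first mapping seen, so any conflicting pair
    # shows up as a mismatch in the check below.
    code_to_name.setdefault(code, name)
    name_to_code.setdefault(name, code)

# One-to-one: each code maps to one name, each name maps back to that code.
is_one_to_one = (
    all(code_to_name[c] == n and name_to_code[n] == c for c, n in pairs)
    and len(code_to_name) == len(name_to_code)
)
print(len(code_to_name), is_one_to_one)
```

If the mapping is one-to-one over the full data, keeping both columns adds no information for modelling, so one of them can usually be dropped.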

Reproduction

Analysis started: 2023-12-12 02:58:35.648157
Analysis finished: 2023-12-12 02:58:36.891920
Duration: 1.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

성별 (gender)
Categorical

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
남성 (male): 102
여성 (female): 102

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남성
2nd row: 남성
3rd row: 남성
4th row: 남성
5th row: 남성

Common Values

Value  Count  Frequency (%)
남성   102    50.0%
여성   102    50.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

연령 (age group)
Categorical

Distinct: 6
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
20대 (20s): 34
30대 (30s): 34
40대 (40s): 34
50대 (50s): 34
60대이상 (60 and over): 34

Length

Max length: 5
Median length: 3
Mean length: 3.1666667
Min length: 2
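
The length statistics follow directly from the six category labels, since each label occurs 34 times and so carries equal weight. A small Python sketch of the arithmetic follows. It assumes lengths are counted in characters (Unicode code points), which is what Python's `len` returns for a `str`; the variable names are illustrative.

```python
from statistics import mean, median

# The six 연령 labels listed under Common Values; each occurs 34 times.
labels = ["20대", "30대", "40대", "50대", "60대이상", "기타"]
lengths = [len(label) for label in labels for _ in range(34)]

max_len = max(lengths)
median_len = median(lengths)
mean_len = mean(lengths)
min_len = min(lengths)
print(max_len, median_len, round(mean_len, 7), min_len)
```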

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20대
2nd row: 20대
3rd row: 20대
4th row: 20대
5th row: 20대

Common Values

Value         Count  Frequency (%)
20대          34     16.7%
30대          34     16.7%
40대          34     16.7%
50대          34     16.7%
60대이상      34     16.7%
기타 (other)  34     16.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

시도코드 (province code)
Real number (ℝ)

HIGH CORRELATION

Distinct: 17
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.705882
Minimum: 11
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 11
5-th percentile: 11
Q1: 29
median: 41
Q3: 45
95-th percentile: 50
Maximum: 50
Range: 39
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 10.279434
Coefficient of variation (CV): 0.28004868
Kurtosis: -0.059455855
Mean: 36.705882
Median Absolute Deviation (MAD): 7
Skewness: -0.7482056
Sum: 7488
Variance: 105.66676
Monotonicity: Not monotonic
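
Because the value tables for 시도코드 list all 17 codes with 12 observations each, most of the figures above can be recomputed by hand. The sketch below does this with Python's standard `statistics` module. It assumes the report uses the sample (n - 1) variance and linearly interpolated quantiles, which is how the numbers work out here; it does not cover skewness or kurtosis.

```python
import statistics

# The 17 distinct codes from the value tables, 12 observations each.
codes = [11, 26, 27, 28, 29, 30, 31, 36, 41, 42, 43, 44, 45, 46, 47, 48, 50]
values = sorted(code for code in codes for _ in range(12))  # 204 values

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

# "inclusive" interpolates linearly between the min and max of the data.
q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
mad = statistics.median(abs(v - med) for v in values)

print(total, round(mean, 6), round(variance, 5), round(std, 6), round(cv, 8))
print(q1, med, q3, iqr, mad)
```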
[Figure: Histogram with fixed size bins (bins=17)]

Common values

Value             Count  Frequency (%)
11                12     5.9%
26                12     5.9%
50                12     5.9%
48                12     5.9%
47                12     5.9%
46                12     5.9%
45                12     5.9%
44                12     5.9%
43                12     5.9%
42                12     5.9%
Other values (7)  84     41.2%

Minimum 10 values

Value  Count  Frequency (%)
11     12     5.9%
26     12     5.9%
27     12     5.9%
28     12     5.9%
29     12     5.9%
30     12     5.9%
31     12     5.9%
36     12     5.9%
41     12     5.9%
42     12     5.9%

Maximum 10 values

Value  Count  Frequency (%)
50     12     5.9%
48     12     5.9%
47     12     5.9%
46     12     5.9%
45     12     5.9%
44     12     5.9%
43     12     5.9%
42     12     5.9%
41     12     5.9%
36     12     5.9%

거주시도 (province of residence)
Categorical

HIGH CORRELATION

Distinct: 17
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
서울: 12
부산: 12
대구: 12
인천: 12
광주: 12
Other values (12): 144

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울
2nd row: 부산
3rd row: 대구
4th row: 인천
5th row: 광주

Common Values

Value             Count  Frequency (%)
서울              12     5.9%
부산              12     5.9%
대구              12     5.9%
인천              12     5.9%
광주              12     5.9%
대전              12     5.9%
울산              12     5.9%
세종              12     5.9%
경기              12     5.9%
강원              12     5.9%
Other values (7)  84     41.2%

Length

[Figure: Histogram of lengths of the category]

Value             Count  Frequency (%)
서울              12     5.9%
강원              12     5.9%
경남              12     5.9%
경북              12     5.9%
전남              12     5.9%
전북              12     5.9%
충남              12     5.9%
충북              12     5.9%
경기              12     5.9%
부산              12     5.9%
Other values (7)  84     41.2%

총발매수 (total tickets issued)
Real number (ℝ)

Distinct: 198
Distinct (%): 97.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 434814.22
Minimum: 3200
Maximum: 3026200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 3200
5-th percentile: 20815
Q1: 107825
median: 249250
Q3: 485525
95-th percentile: 1926090
Maximum: 3026200
Range: 3023000
Interquartile range (IQR): 377700

Descriptive statistics

Standard deviation: 594714.58
Coefficient of variation (CV): 1.3677441
Kurtosis: 7.5456167
Mean: 434814.22
Median Absolute Deviation (MAD): 189150
Skewness: 2.7412641
Sum: 88702100
Variance: 3.5368543 × 10^11
Monotonicity: Not monotonic
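
Several of the 총발매수 figures are derived from others in the same tables, so they can be cross-checked without the raw data. The sketch below redoes that arithmetic in Python from the reported values only. The inputs are rounded as printed in the report, so the results agree only to the printed precision.

```python
# Figures copied from the report for 총발매수.
n = 204
total = 88_702_100
minimum, maximum = 3_200, 3_026_200
q1, median, q3 = 107_825, 249_250, 485_525
std = 594_714.58

mean = total / n                 # reported as 434814.22
value_range = maximum - minimum  # reported as 3023000
iqr = q3 - q1                    # reported as 377700
cv = std / mean                  # reported as 1.3677441
variance = std ** 2              # reported as 3.5368543 x 10^11

# A mean well above the median is consistent with the reported
# positive skewness (a long right tail of very large counts).
right_skewed = mean > median

print(round(mean, 2), value_range, iqr, round(cv, 7), f"{variance:.7e}")
```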
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
39900               2      1.0%
23800               2      1.0%
19300               2      1.0%
13600               2      1.0%
22700               2      1.0%
57500               2      1.0%
1898800             1      0.5%
2452300             1      0.5%
353400              1      0.5%
279300              1      0.5%
Other values (188)  188    92.2%

Minimum 10 values

Value  Count  Frequency (%)
3200   1      0.5%
4200   1      0.5%
4700   1      0.5%
9100   1      0.5%
10300  1      0.5%
13600  2      1.0%
19300  2      1.0%
19800  1      0.5%
20500  1      0.5%
22600  1      0.5%

Maximum 10 values

Value    Count  Frequency (%)
3026200  1      0.5%
2953600  1      0.5%
2901700  1      0.5%
2818300  1      0.5%
2634900  1      0.5%
2528500  1      0.5%
2452300  1      0.5%
2291900  1      0.5%
2085800  1      0.5%
2051900  1      0.5%

Interactions

[Figure: interaction plots]

Correlations

          성별    연령    시도코드  거주시도  총발매수
성별      1.000   0.000   0.000     0.000     0.000
연령      0.000   1.000   0.000     0.000     0.422
시도코드  0.000   0.000   1.000     1.000     0.459
거주시도  0.000   0.000   1.000     1.000     0.540
총발매수  0.000   0.422   0.459     0.540     1.000

          연령    거주시도  성별
연령      1.000   0.000     0.000
거주시도  0.000   1.000     0.000
성별      0.000   0.000     1.000

          시도코드  총발매수  성별    연령    거주시도
시도코드  1.000     -0.296    0.000   0.000   0.977
총발매수  -0.296    1.000     0.000   0.236   0.240
성별      0.000     0.000     1.000   0.000   0.000
연령      0.000     0.236     0.000   1.000   0.000
거주시도  0.977     0.240     0.000   0.000   1.000
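
The zero entries among 성별, 연령 and 거주시도 are what a fully crossed design produces. The category counts (2 × 6 × 17 = 204 rows, with 102, 34 and 12 rows per level) are consistent with every combination occurring exactly once, although the report does not state this. Under that assumption, every observed cell count equals its expected count, so the chi-squared statistic, and any association measure built on it, is zero. The report does not name the measure it used; the sketch below uses the uncorrected Cramér's V purely as an illustration, with a hypothetical helper `cramers_v`.

```python
from collections import Counter
from itertools import product
from math import sqrt

genders = ["남성", "여성"]
ages = ["20대", "30대", "40대", "50대", "60대이상", "기타"]
regions = ["서울", "부산", "대구", "인천", "광주", "대전", "울산", "세종", "경기",
           "강원", "충북", "충남", "전북", "전남", "경북", "경남", "제주"]

# Assumption: one row per (gender, age group, region) combination.
rows = list(product(genders, ages, regions))


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    chi2 = 0.0
    for x, x_n in x_counts.items():
        for y, y_n in y_counts.items():
            expected = x_n * y_n / n  # count expected under independence
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return sqrt(chi2 / (n * k))


gender_col, age_col, region_col = zip(*rows)
print(len(rows))                          # 204
print(cramers_v(gender_col, age_col))     # 0.0
print(cramers_v(age_col, region_col))     # 0.0
print(cramers_v(gender_col, gender_col))  # 1.0
```

A zero here reflects how the table was laid out (one row per combination), not a finding about passengers; any relationship with ticket counts has to be read from the 총발매수 rows and columns.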

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

Index  성별  연령  시도코드  거주시도  총발매수
0      남성  20대  11        서울      1909600
1      남성  20대  26        부산      584400
2      남성  20대  27        대구      701500
3      남성  20대  28        인천      318200
4      남성  20대  29        광주      187600
5      남성  20대  30        대전      691200
6      남성  20대  31        울산      329900
7      남성  20대  36        세종      155400
8      남성  20대  41        경기      2051900
9      남성  20대  42        강원      324900

Last rows

Index  성별  연령  시도코드  거주시도  총발매수
194    여성  기타  36        세종      13600
195    여성  기타  41        경기      177000
196    여성  기타  42        강원      23700
197    여성  기타  43        충북      19300
198    여성  기타  44        충남      57500
199    여성  기타  45        전북      27800
200    여성  기타  46        전남      25200
201    여성  기타  47        경북      73000
202    여성  기타  48        경남      38200
203    여성  기타  50        제주      4700