Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 123
Missing cells (%): 0.2%
Duplicate rows: 390
Duplicate rows (%): 3.9%
Total size in memory: 664.1 KiB
Average record size in memory: 68.0 B
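
The percentages and the average record size follow from the counts listed above. The sketch below reproduces that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and the duplicate percentage over all rows. The variable names are illustrative and are not taken from the profiling tool.

```python
# Counts reported in the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 123
duplicate_rows = 390
total_bytes = 664.1 * 1024  # 664.1 KiB expressed in bytes

# Share of missing cells over all cells (assumed denominator: rows x columns).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Share of duplicate rows over all rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size: total memory divided by the number of rows.
avg_record_bytes = total_bytes / n_observations

print(f"{missing_pct:.1f}%")        # 0.2%
print(f"{duplicate_pct:.1f}%")      # 3.9%
print(f"{avg_record_bytes:.1f} B")  # 68.0 B
```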

Variable types

Categorical: 2
Unsupported: 1
Text: 1
Numeric: 3

Dataset

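Description: Data on boarding and alighting passengers of city buses in Gwangju Metropolitan City. It provides transaction counts by date (일자), route name (노선명), stop name (정류장명), hour (시간), and boarding/alighting type (승하차).
Author: Gwangju Metropolitan City (광주광역시)
URL: https://www.data.go.kr/data/15088456/fileData.do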

Alerts

일자 has the constant value "20240301" (Constant)
Dataset has 390 (3.9%) duplicate rows (Duplicates)
노선명 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
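
The Constant alert can be reproduced with a simple check: a column is constant when it holds exactly one distinct non-missing value. The sketch below uses plain Python on hypothetical toy rows shaped like this dataset. The function name `constant_columns` is illustrative and is not part of ydata-profiling.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

def constant_columns(rows: List[Row]) -> List[str]:
    """Return the columns that hold exactly one distinct non-missing value."""
    if not rows:
        return []
    result = []
    for column in rows[0]:
        distinct = {row[column] for row in rows if row[column] is not None}
        if len(distinct) == 1:
            result.append(column)
    return result

# Hypothetical toy rows; only the column names come from the report.
rows = [
    {"일자": "20240301", "승하차": "하차", "시간": "14"},
    {"일자": "20240301", "승하차": "승차", "시간": "18"},
    {"일자": "20240301", "승하차": "승차", "시간": "21"},
]

print(constant_columns(rows))  # ['일자']
```

A constant column carries no information for modelling or grouping, so it is usually safe to drop it after recording its value.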

Reproduction

Analysis started: 2024-04-13 12:35:17.963765
Analysis finished: 2024-04-13 12:35:24.170909
Duration: 6.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

일자 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20240301
2nd row: 20240301
3rd row: 20240301
4th row: 20240301
5th row: 20240301

Common Values

Value | Count | Frequency (%)
20240301 | 10000 | 100.0%

Length

Histogram of lengths of the category

노선명 (route name)
Unsupported

REJECTED UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

정류장명 (stop name)
Text

Distinct: 922
Distinct (%): 9.3%
Missing: 61
Missing (%): 0.6%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 15
Mean length: 5.8954623
Min length: 2

Characters and Unicode

Total characters: 58595
Distinct characters: 373
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
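
As a rough illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one stop name that appears in the Duplicate rows section of this report. The helper name `category_counts` is illustrative, and this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd', 'Ps')."""
    return Counter(unicodedata.category(ch) for ch in text)

# A stop name taken from the Duplicate rows section of this report.
name = "국립아시아문화전당(구.도청)"
counts = category_counts(name)

# 'Lo' = Other Letter, 'Ps' = Open Punctuation,
# 'Pe' = Close Punctuation, 'Po' = Other Punctuation.
for code, n in sorted(counts.items()):
    print(code, n)
```

The two-letter codes map onto the category names used in this report, for example `Nd` for Decimal Number, `Lu` for Uppercase Letter, `Zs` for Space Separator and `Pd` for Dash Punctuation.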

Unique

Unique: 164
Unique (%): 1.7%

Sample

1st row: 진월대주아파트
2nd row: 대성초교
3rd row: 운암시장
4th row: 금호초교
5th row: 남광주역

Value | Count | Frequency (%)
광주종합버스터미널 | 152 | 1.5%
경신여고 | 108 | 1.1%
국립아시아문화전당(구.도청 | 88 | 0.9%
광천치안센터 | 84 | 0.8%
남광주역 | 75 | 0.7%
도로교통공단 | 71 | 0.7%
대신파크 | 71 | 0.7%
살레시오여고 | 70 | 0.7%
운암3단지 | 68 | 0.7%
진월대주아파트 | 65 | 0.6%
Other values (918) | 9195 | 91.5%

Most occurring characters

ValueCountFrequency (%)
1625
 
2.8%
1567
 
2.7%
1504
 
2.6%
1480
 
2.5%
1402
 
2.4%
1322
 
2.3%
1201
 
2.0%
1159
 
2.0%
1080
 
1.8%
1005
 
1.7%
Other values (363) 45250
77.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 54874 | 93.6%
Decimal Number | 1137 | 1.9%
Close Punctuation | 913 | 1.6%
Open Punctuation | 913 | 1.6%
Other Punctuation | 342 | 0.6%
Uppercase Letter | 278 | 0.5%
Space Separator | 108 | 0.2%
Dash Punctuation | 30 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1625
 
3.0%
1567
 
2.9%
1504
 
2.7%
1480
 
2.7%
1402
 
2.6%
1322
 
2.4%
1201
 
2.2%
1159
 
2.1%
1080
 
2.0%
1005
 
1.8%
Other values (335) 41529
75.7%

Uppercase Letter
Value | Count | Frequency (%)
S | 83 | 29.9%
C | 72 | 25.9%
I | 29 | 10.4%
B | 29 | 10.4%
K | 24 | 8.6%
G | 18 | 6.5%
T | 13 | 4.7%
N | 2 | 0.7%
D | 2 | 0.7%
L | 2 | 0.7%
Other values (3) | 4 | 1.4%

Decimal Number
Value | Count | Frequency (%)
1 | 296 | 26.0%
2 | 284 | 25.0%
3 | 182 | 16.0%
4 | 122 | 10.7%
5 | 101 | 8.9%
9 | 66 | 5.8%
6 | 43 | 3.8%
8 | 43 | 3.8%

Other Punctuation
Value | Count | Frequency (%)
/ | 193 | 56.4%
. | 137 | 40.1%
& | 12 | 3.5%

Close Punctuation
Value | Count | Frequency (%)
) | 913 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 913 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 108 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 30 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 54874 | 93.6%
Common | 3443 | 5.9%
Latin | 278 | 0.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1625
 
3.0%
1567
 
2.9%
1504
 
2.7%
1480
 
2.7%
1402
 
2.6%
1322
 
2.4%
1201
 
2.2%
1159
 
2.1%
1080
 
2.0%
1005
 
1.8%
Other values (335) 41529
75.7%

Common
Value | Count | Frequency (%)
) | 913 | 26.5%
( | 913 | 26.5%
1 | 296 | 8.6%
2 | 284 | 8.2%
/ | 193 | 5.6%
3 | 182 | 5.3%
. | 137 | 4.0%
4 | 122 | 3.5%
(space) | 108 | 3.1%
5 | 101 | 2.9%
Other values (5) | 194 | 5.6%

Latin
Value | Count | Frequency (%)
S | 83 | 29.9%
C | 72 | 25.9%
I | 29 | 10.4%
B | 29 | 10.4%
K | 24 | 8.6%
G | 18 | 6.5%
T | 13 | 4.7%
N | 2 | 0.7%
D | 2 | 0.7%
L | 2 | 0.7%
Other values (3) | 4 | 1.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 54874 | 93.6%
ASCII | 3721 | 6.4%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1625
 
3.0%
1567
 
2.9%
1504
 
2.7%
1480
 
2.7%
1402
 
2.6%
1322
 
2.4%
1201
 
2.2%
1159
 
2.1%
1080
 
2.0%
1005
 
1.8%
Other values (335) 41529
75.7%

ASCII
Value | Count | Frequency (%)
) | 913 | 24.5%
( | 913 | 24.5%
1 | 296 | 8.0%
2 | 284 | 7.6%
/ | 193 | 5.2%
3 | 182 | 4.9%
. | 137 | 3.7%
4 | 122 | 3.3%
(space) | 108 | 2.9%
5 | 101 | 2.7%
Other values (18) | 472 | 12.7%

ARS_ID
Real number (ℝ)

Distinct: 1504
Distinct (%): 15.1%
Missing: 62
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 3480.9936
Minimum: 1002
Maximum: 6634
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1002
5-th percentile: 1071.55
Q1: 2164
Median: 4077
Q3: 4533.75
95-th percentile: 5580.45
Maximum: 6634
Range: 5632
Interquartile range (IQR): 2369.75

Descriptive statistics

Standard deviation: 1508.5589
Coefficient of variation (CV): 0.43337021
Kurtosis: -1.2335517
Mean: 3480.9936
Median Absolute Deviation (MAD): 1185
Skewness: -0.1856791
Sum: 34594114
Variance: 2275749.9
Monotonicity: Not monotonic
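
Several of the figures above are derived from the others, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum from the reported ARS_ID values. The variable names are illustrative, and small differences are expected because the reported mean and standard deviation are rounded.

```python
# Figures reported for ARS_ID above.
n_rows, n_missing = 10_000, 62
minimum, maximum = 1002, 6634
q1, q3 = 2164, 4533.75
mean, std = 3480.9936, 1508.5589

n_valid = n_rows - n_missing     # non-missing observations

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance
total = mean * n_valid           # Sum, up to rounding of the mean

print(value_range)   # 5632
print(iqr)           # 2369.75
print(round(cv, 4))  # 0.4334
```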
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
2002 | 100 | 1.0%
4435 | 55 | 0.5%
4434 | 53 | 0.5%
2001 | 52 | 0.5%
1122 | 51 | 0.5%
1130 | 48 | 0.5%
2003 | 48 | 0.5%
1141 | 42 | 0.4%
1017 | 38 | 0.4%
3236 | 38 | 0.4%
Other values (1494) | 9413 | 94.1%
(Missing) | 62 | 0.6%

Value | Count | Frequency (%)
1002 | 6 | 0.1%
1003 | 16 | 0.2%
1004 | 17 | 0.2%
1005 | 9 | 0.1%
1006 | 12 | 0.1%
1007 | 21 | 0.2%
1008 | 16 | 0.2%
1009 | 10 | 0.1%
1010 | 15 | 0.1%
1011 | 1 | < 0.1%

Value | Count | Frequency (%)
6634 | 1 | < 0.1%
6633 | 1 | < 0.1%
6626 | 1 | < 0.1%
6625 | 4 | < 0.1%
6622 | 2 | < 0.1%
6621 | 2 | < 0.1%
6619 | 3 | < 0.1%
6612 | 1 | < 0.1%
6611 | 1 | < 0.1%
6468 | 1 | < 0.1%

시간 (hour)
Real number (ℝ)

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.1921
Minimum: 5
Maximum: 23
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 5
5-th percentile: 7
Q1: 11
Median: 14
Q3: 18
95-th percentile: 21
Maximum: 23
Range: 18
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.5286034
Coefficient of variation (CV): 0.31909325
Kurtosis: -1.0201796
Mean: 14.1921
Median Absolute Deviation (MAD): 4
Skewness: -0.040097755
Sum: 141921
Variance: 20.508248
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=19)
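
A histogram with fixed-size bins splits the value range into equal-width intervals and counts the values in each. The sketch below does this in plain Python, assuming the last bin is closed on the right so that the maximum is counted. The function name and the toy values are illustrative, and this is not the profiling tool's own code.

```python
from typing import List, Sequence

def fixed_bin_counts(values: Sequence[float], bins: int) -> List[int]:
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate range: everything falls into the first bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum would index one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical hours of day, for illustration only.
hours = [5, 6, 6, 9, 14, 14, 16, 23]
print(fixed_bin_counts(hours, bins=3))  # [4, 3, 1]
```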

Value | Count | Frequency (%)
16 | 735 | 7.3%
17 | 720 | 7.2%
15 | 704 | 7.0%
13 | 691 | 6.9%
18 | 671 | 6.7%
14 | 661 | 6.6%
12 | 648 | 6.5%
11 | 625 | 6.2%
9 | 592 | 5.9%
10 | 578 | 5.8%
Other values (9) | 3375 | 33.8%

Value | Count | Frequency (%)
5 | 46 | 0.5%
6 | 309 | 3.1%
7 | 415 | 4.2%
8 | 559 | 5.6%
9 | 592 | 5.9%
10 | 578 | 5.8%
11 | 625 | 6.2%
12 | 648 | 6.5%
13 | 691 | 6.9%
14 | 661 | 6.6%

Value | Count | Frequency (%)
23 | 54 | 0.5%
22 | 395 | 4.0%
21 | 507 | 5.1%
20 | 537 | 5.4%
19 | 553 | 5.5%
18 | 671 | 6.7%
17 | 720 | 7.2%
16 | 735 | 7.3%
15 | 704 | 7.0%
14 | 661 | 6.6%

승하차 (boarding/alighting)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 하차
2nd row: 승차
3rd row: 승차
4th row: 승차
5th row: 하차

Common Values

Value | Count | Frequency (%)
승차 (boarding) | 4954 | 49.5%
하차 (alighting) | 3540 | 35.4%
환승 (transfer) | 1506 | 15.1%

Length

Histogram of lengths of the category

거래건수 (transaction count)
Real number (ℝ)

Distinct: 45
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5537
Minimum: 1
Maximum: 112
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 3
95-th percentile: 8
Maximum: 112
Range: 111
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.3723662
Coefficient of variation (CV): 1.3205804
Kurtosis: 156.32076
Mean: 2.5537
Median Absolute Deviation (MAD): 0
Skewness: 8.3161765
Sum: 25537
Variance: 11.372854
Monotonicity: Not monotonic
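
The large positive skewness and kurtosis above are what a long right tail produces: most values are 1 while a few are far larger. The sketch below computes moment-based skewness and excess kurtosis with the plain population formulas on hypothetical counts. The report's figures may come from a bias-corrected estimator, so this illustrates the idea and does not try to reproduce the exact values.

```python
from typing import Sequence, Tuple

def skew_and_excess_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis

# Hypothetical counts: mostly 1s with one large value, as in a long right tail.
counts = [1, 1, 1, 1, 1, 1, 2, 2, 3, 40]
skewness, kurt = skew_and_excess_kurtosis(counts)
print(skewness > 0, kurt > 0)  # True True
```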
Histogram with fixed size bins (bins=45)

Value | Count | Frequency (%)
1 | 5092 | 50.9%
2 | 2034 | 20.3%
3 | 1040 | 10.4%
4 | 590 | 5.9%
5 | 349 | 3.5%
6 | 218 | 2.2%
7 | 167 | 1.7%
8 | 115 | 1.1%
9 | 85 | 0.9%
10 | 62 | 0.6%
Other values (35) | 248 | 2.5%

Value | Count | Frequency (%)
1 | 5092 | 50.9%
2 | 2034 | 20.3%
3 | 1040 | 10.4%
4 | 590 | 5.9%
5 | 349 | 3.5%
6 | 218 | 2.2%
7 | 167 | 1.7%
8 | 115 | 1.1%
9 | 85 | 0.9%
10 | 62 | 0.6%

Value | Count | Frequency (%)
112 | 1 | < 0.1%
62 | 1 | < 0.1%
55 | 1 | < 0.1%
48 | 1 | < 0.1%
47 | 1 | < 0.1%
46 | 1 | < 0.1%
45 | 1 | < 0.1%
43 | 1 | < 0.1%
42 | 1 | < 0.1%
40 | 1 | < 0.1%

Correlations

Variable | ARS_ID | 시간 | 승하차 | 거래건수
ARS_ID | 1.000 | 0.033 | 0.109 | 0.061
시간 | 0.033 | 1.000 | 0.065 | 0.051
승하차 | 0.109 | 0.065 | 1.000 | 0.082
거래건수 | 0.061 | 0.051 | 0.082 | 1.000

Variable | ARS_ID | 시간 | 거래건수 | 승하차
ARS_ID | 1.000 | 0.007 | -0.031 | 0.064
시간 | 0.007 | 1.000 | 0.023 | 0.038
거래건수 | -0.031 | 0.023 | 1.000 | 0.054
승하차 | 0.064 | 0.038 | 0.054 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
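
Nullity correlation can be read as an ordinary Pearson correlation computed on present/missing indicators. The sketch below implements that reading in plain Python on hypothetical columns in which the stop name and ARS_ID are missing together. The function name is illustrative, and this is an assumption about the measure, not the tool's exact code.

```python
from math import sqrt
from typing import Optional, Sequence

def nullity_correlation(a: Sequence[Optional[object]],
                        b: Sequence[Optional[object]]) -> float:
    """Pearson correlation between the presence indicators of two columns."""
    x = [0.0 if v is None else 1.0 for v in a]
    y = [0.0 if v is None else 1.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("a column with no variation in nullity has no defined correlation")
    return cov / (sx * sy)

# Hypothetical columns: 정류장명 and ARS_ID are missing in the same rows.
stop_name = ["남광주역", None, "대성초교", None, "운암시장"]
ars_id = [1142, None, 3138, None, 4449]
print(nullity_correlation(stop_name, ars_id))  # close to 1.0
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is missing.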

Sample

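Index | 일자 | 노선명 | 정류장명 | ARS_ID | 시간 | 승하차 | 거래건수
89655 | 20240301 | 진월07 | 진월대주아파트 | 3242 | 14 | 하차 | 10
58225 | 20240301 | 수완12 | 대성초교 | 3138 | 18 | 승차 | 2
32035 | 20240301 | 봉선27 | 운암시장 | 4449 | 21 | 승차 | 7
62130 | 20240301 | 순환01(운천저수지) | 금호초교 | 2076 | 13 | 승차 | 1
33225 | 20240301 | 봉선37 | 남광주역 | 1142 | 22 | 하차 | 2
53172 | 20240301 | 송정98 | 광주여대 | 5274 | 20 | 하차 | 1
25934 | 20240301 | 문흥18 | 신가동 | 5303 | 15 | 승차 | 4
79104 | 20240301 | 좌석02 | 도산역 | 5008 | 9 | 승차 | 3
36878 | 20240301 | 상무64 | 비엔날레전시관 | 4597 | 19 | 승차 | 1
21464 | 20240301 | 매월16 | 현대자동차 | 2008 | 17 | 환승 | 6

Index | 일자 | 노선명 | 정류장명 | ARS_ID | 시간 | 승하차 | 거래건수
7995 | 20240301 | 금호36 | 각화무등파크 | 4173 | 18 | 하차 | 1
67060 | 20240301 | 운림50 | 방림삼거리 | 3084 | 20 | 하차 | 1
7825 | 20240301 | 금남59 | 화정남초교 | 2315 | 21 | 승차 | 1
36693 | 20240301 | 상무64 | 광주종합버스터미널 | 2002 | 16 | 환승 | 2
16258 | 20240301 | 마을760 | 돌고개역(동) | 2164 | 16 | 승차 | 1
85624 | 20240301 | 지원45 | 풍암저수지 | 2264 | 21 | 승차 | 1
4098 | 20240301 | 금남55 | 북구청 | 4553 | 16 | 승차 | 3
55134 | 20240301 | 송정98 | 월곡일신아파트 | 5314 | 15 | 환승 | 1
77559 | 20240301 | 일곡38 | 요한병원 | 4526 | 20 | 승차 | 1
89911 | 20240301 | 진월07 | <NA> | <NA> | 8 | 환승 | 1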

Duplicate rows

Most frequently occurring

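Index | 일자 | 정류장명 | ARS_ID | 시간 | 승하차 | 거래건수 | # duplicates
171 | 20240301 | 문화전당역 | 1130 | 17 | 환승 | 1 | 4
3 | 20240301 | CBS방송국 | 2013 | 18 | 승차 | 2 | 3
13 | 20240301 | 경신여고 | 4434 | 16 | 하차 | 2 | 3
59 | 20240301 | 광주종합버스터미널 | 2002 | 7 | 하차 | 1 | 3
61 | 20240301 | 광주종합버스터미널 | 2002 | 9 | 환승 | 2 | 3
62 | 20240301 | 광주종합버스터미널 | 2002 | 12 | 승차 | 8 | 3
66 | 20240301 | 광주종합버스터미널 | 2002 | 16 | 환승 | 2 | 3
70 | 20240301 | 광주지방기상청 | 4447 | 22 | 하차 | 1 | 3
81 | 20240301 | 국립아시아문화전당(구.도청) | 1123 | 17 | 하차 | 1 | 3
96 | 20240301 | 남광주사거리 | 1139 | 17 | 환승 | 3 | 3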
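
A table like the one above can be produced by counting identical rows and keeping those that occur more than once. The sketch below uses `collections.Counter` on hypothetical tuples with the same columns as the duplicate table. The function name is illustrative, and how the report derives its total of 390 duplicate rows is not stated here, so no attempt is made to reproduce it.

```python
from collections import Counter
from typing import Dict, List, Tuple

def duplicate_groups(rows: List[Tuple]) -> Dict[Tuple, int]:
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical rows: (정류장명, ARS_ID, 시간, 승하차, 거래건수).
rows = [
    ("문화전당역", 1130, 17, "환승", 1),
    ("문화전당역", 1130, 17, "환승", 1),
    ("CBS방송국", 2013, 18, "승차", 2),
    ("문화전당역", 1130, 17, "환승", 1),
]

groups = duplicate_groups(rows)
print(groups)

# Rows that repeat an earlier row, i.e. what a de-duplication step would remove.
print(sum(n - 1 for n in groups.values()))  # 2
```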