Overview

Dataset statistics

Number of variables: 8
Number of observations: 28
Missing cells: 27
Missing cells (%): 12.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.0 KiB
Average record size in memory: 74.7 B
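
The percentages above follow from the raw counts. A minimal sketch of the arithmetic, assuming the per-column missing counts listed under Alerts (8, 6, 7 and 6) account for every missing cell; the variable names are illustrative and not part of the report:

```python
# Counts copied from this report; the names are illustrative only.
n_variables = 8
n_observations = 28
missing_per_column = {"1호선": 8, "2호선": 6, "3호선": 7, "4호선": 6}

total_cells = n_variables * n_observations          # 8 * 28 cells
missing_cells = sum(missing_per_column.values())    # cells flagged as missing
missing_pct = 100 * missing_cells / total_cells

# Reported average record size (bytes) times the row count, expressed in KiB.
avg_record_bytes = 74.7
total_kib = avg_record_bytes * n_observations / 1024

print(f"missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"approx. total size: {total_kib:.1f} KiB")
```

Both results round to the figures shown in the dataset statistics.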

Variable types

Text: 1
Categorical: 2
Numeric: 5

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-13208/F/1/datasetView.do

Alerts

본사 is highly overall correlated with 수량 and 5 other fields (High correlation)
단위 is highly overall correlated with 수량 and 5 other fields (High correlation)
수량 is highly overall correlated with 1호선 and 5 other fields (High correlation)
1호선 is highly overall correlated with 수량 and 5 other fields (High correlation)
2호선 is highly overall correlated with 수량 and 5 other fields (High correlation)
3호선 is highly overall correlated with 수량 and 5 other fields (High correlation)
4호선 is highly overall correlated with 수량 and 5 other fields (High correlation)
단위 is highly imbalanced (77.8%) (Imbalance)
1호선 has 8 (28.6%) missing values (Missing)
2호선 has 6 (21.4%) missing values (Missing)
3호선 has 7 (25.0%) missing values (Missing)
4호선 has 6 (21.4%) missing values (Missing)
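
The 77.8% imbalance figure for 단위 can be reproduced from its two value counts (27 and 1). The sketch below assumes the score is defined as one minus the Shannon entropy of the class frequencies, normalized by the maximum entropy for that number of classes. That definition reproduces the reported value, but it is an assumption here, and `imbalance_score` is an illustrative name rather than a library function:

```python
import math

def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy of the class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean that
    one class dominates. Assumed definition, see the note above.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumption: a single class is reported as not imbalanced
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Value counts of 단위 taken from its Common Values table: 27 and 1.
score = imbalance_score([27, 1])
print(f"{100 * score:.1f}%")
```

A perfectly balanced two-class column would score 0 under the same definition.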

Reproduction

Analysis started: 2024-04-29 16:43:18.903677
Analysis finished: 2024-04-29 16:43:24.249918
Duration: 5.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

설비명
Text

Distinct: 27
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

Length

Max length: 12
Median length: 9.5
Mean length: 6.7142857
Min length: 1

Characters and Unicode

Total characters: 188
Distinct characters: 73
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 26
Unique (%): 92.9%
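
The ratios reported for this text variable are simple quotients over the 28 rows. A short sketch of that arithmetic, with illustrative variable names:

```python
# Figures copied from this variable's summary; the names are illustrative.
n_rows = 28
n_distinct = 27
n_unique = 26          # values that occur exactly once
total_characters = 188

distinct_pct = 100 * n_distinct / n_rows
unique_pct = 100 * n_unique / n_rows
mean_length = total_characters / n_rows

print(f"distinct: {distinct_pct:.1f}%")
print(f"unique: {unique_pct:.1f}%")
print(f"mean length: {mean_length:.7f}")
```

Since the column has no missing values, the denominator is the full row count.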

Sample

1st row: 총계
2nd row: 개집표기 턴스타일게이트
3rd row: 개집표기 슬림게이트
4th row: 개집표기 장애인게이트
5th row: 개집표기 스피드게이트
Value | Count | Frequency (%)
개집표기 | 5 | 15.2%
유인충전기 | 2 | 6.1%
유지보수관리전산기 | 1 | 3.0%
cctv모니터 | 1 | 3.0%
cctv카메라 | 1 | 3.0%
발권기 | 1 | 3.0%
판매기 | 1 | 3.0%
무인정산기 | 1 | 3.0%
환급기 | 1 | 3.0%
발매기 | 1 | 3.0%
Other values (18) | 18 | 54.5%

Most occurring characters

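Value | Count | Frequency (%)
(blank) | 17 | 9.0%
(blank) | 8 | 4.3%
(blank) | 8 | 4.3%
(blank) | 7 | 3.7%
(blank) | 6 | 3.2%
C | 5 | 2.7%
(blank) | 5 | 2.7%
(blank) | 5 | 2.7%
(blank) | 5 | 2.7%
(blank) | 5 | 2.7%
Other values (63) | 117 | 62.2%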

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 170 | 90.4%
Uppercase Letter | 13 | 6.9%
Space Separator | 5 | 2.7%

Most frequent character per category

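Other Letter
Value | Count | Frequency (%)
(blank) | 17 | 10.0%
(blank) | 8 | 4.7%
(blank) | 8 | 4.7%
(blank) | 7 | 4.1%
(blank) | 6 | 3.5%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 4 | 2.4%
Other values (55) | 100 | 58.8%

Uppercase Letter
Value | Count | Frequency (%)
C | 5 | 38.5%
V | 2 | 15.4%
T | 2 | 15.4%
N | 1 | 7.7%
S | 1 | 7.7%
M | 1 | 7.7%
P | 1 | 7.7%

Space Separator
Value | Count | Frequency (%)
(blank) | 5 | 100.0%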

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 170 | 90.4%
Latin | 13 | 6.9%
Common | 5 | 2.7%

Most frequent character per script

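Hangul
Value | Count | Frequency (%)
(blank) | 17 | 10.0%
(blank) | 8 | 4.7%
(blank) | 8 | 4.7%
(blank) | 7 | 4.1%
(blank) | 6 | 3.5%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 4 | 2.4%
Other values (55) | 100 | 58.8%

Latin
Value | Count | Frequency (%)
C | 5 | 38.5%
V | 2 | 15.4%
T | 2 | 15.4%
N | 1 | 7.7%
S | 1 | 7.7%
M | 1 | 7.7%
P | 1 | 7.7%

Common
Value | Count | Frequency (%)
(blank) | 5 | 100.0%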

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 170 | 90.4%
ASCII | 18 | 9.6%

Most frequent character per block

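Hangul
Value | Count | Frequency (%)
(blank) | 17 | 10.0%
(blank) | 8 | 4.7%
(blank) | 8 | 4.7%
(blank) | 7 | 4.1%
(blank) | 6 | 3.5%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 5 | 2.9%
(blank) | 4 | 2.4%
Other values (55) | 100 | 58.8%

ASCII
Value | Count | Frequency (%)
C | 5 | 27.8%
(blank) | 5 | 27.8%
V | 2 | 11.1%
T | 2 | 11.1%
N | 1 | 5.6%
S | 1 | 5.6%
M | 1 | 5.6%
P | 1 | 5.6%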

단위
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 7.1%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

Value | Count
(blank) | 27
<NA> | 1

Length

Max length: 4
Median length: 1
Mean length: 1.1071429
Min length: 1

Unique

Unique: 1
Unique (%): 3.6%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
(blank) | 27 | 96.4%
<NA> | 1 | 3.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
(blank) | 27 | 96.4%
na | 1 | 3.6%

수량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 22
Distinct (%): 78.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 600.60714
Minimum: 1
Maximum: 6524
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 9.25
Median: 159
Q3: 334.25
95-th percentile: 3219.35
Maximum: 6524
Range: 6523
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 1406.8203
Coefficient of variation (CV): 2.3423303
Kurtosis: 12.259621
Mean: 600.60714
Median Absolute Deviation (MAD): 156
Skewness: 3.4157355
Sum: 16817
Variance: 1979143.4
Monotonicity: Not monotonic
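
Several of the statistics reported for 수량 are derived from one another. The sketch below recomputes the derived figures from the reported base values, so small differences in the last digits are expected because the inputs are already rounded. All names are illustrative:

```python
# Base values copied from the 수량 summary; the names are illustrative.
n_rows = 28
total = 16817
minimum, maximum = 1, 6524
q1, q3 = 9.25, 334.25
std = 1406.8203

mean = total / n_rows     # Sum / number of observations
value_range = maximum - minimum
iqr = q3 - q1             # interquartile range
cv = std / mean           # coefficient of variation
variance = std ** 2       # variance is the squared standard deviation

print(f"mean={mean:.5f} range={value_range} iqr={iqr}")
print(f"cv={cv:.4f} variance={variance:.1f}")
```

A CV above 2 together with a mean far above the median (159) is consistent with the strong right skew reported for this column.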
Histogram with fixed size bins (bins=22)
Value | Count | Frequency (%)
1 | 5 | 17.9%
119 | 2 | 7.1%
270 | 2 | 7.1%
39 | 1 | 3.6%
227 | 1 | 3.6%
332 | 1 | 3.6%
17 | 1 | 3.6%
199 | 1 | 3.6%
335 | 1 | 3.6%
432 | 1 | 3.6%
Other values (12) | 12 | 42.9%

Value | Count | Frequency (%)
1 | 5 | 17.9%
5 | 1 | 3.6%
7 | 1 | 3.6%
10 | 1 | 3.6%
11 | 1 | 3.6%
15 | 1 | 3.6%
17 | 1 | 3.6%
39 | 1 | 3.6%
119 | 2 | 7.1%
199 | 1 | 3.6%

Value | Count | Frequency (%)
6524 | 1 | 3.6%
3499 | 1 | 3.6%
2700 | 1 | 3.6%
601 | 1 | 3.6%
455 | 1 | 3.6%
432 | 1 | 3.6%
335 | 1 | 3.6%
334 | 1 | 3.6%
332 | 1 | 3.6%
292 | 1 | 3.6%

본사
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 14.3%
Missing: 0
Missing (%): 0.0%
Memory size: 356.0 B

Value | Count
<NA> | 16
1 | 10
14 | 1
5 | 1

Length

Max length: 4
Median length: 4
Mean length: 2.75
Min length: 1

Unique

Unique: 2
Unique (%): 7.1%

Sample

1st row: 14
2nd row: 1
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 16 | 57.1%
1 | 10 | 35.7%
14 | 1 | 3.6%
5 | 1 | 3.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 16 | 57.1%
1 | 10 | 35.7%
14 | 1 | 3.6%
5 | 1 | 3.6%

1호선
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 18
Distinct (%): 90.0%
Missing: 8
Missing (%): 28.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 93.65
Minimum: 1
Maximum: 724
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1.95
Q1: 10
Median: 28
Q3: 57
95-th percentile: 418.1
Maximum: 724
Range: 723
Interquartile range (IQR): 47

Descriptive statistics

Standard deviation: 179.57179
Coefficient of variation (CV): 1.9174778
Kurtosis: 8.1855439
Mean: 93.65
Median Absolute Deviation (MAD): 20.5
Skewness: 2.8287786
Sum: 1873
Variance: 32246.029
Monotonicity: Not monotonic
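
For a column with missing values, the reported figures imply two different denominators. The missing percentage and the value-table frequencies are taken over all 28 rows, while the distinct percentage and the mean match a denominator of only the 20 non-missing values. A small sketch of that reading for 1호선, with illustrative names:

```python
# Figures copied from the 1호선 summary; the names are illustrative.
n_rows = 28
n_missing = 8
n_distinct = 18
total = 1873

n_valid = n_rows - n_missing                # non-missing observations
missing_pct = 100 * n_missing / n_rows      # over all rows
distinct_pct = 100 * n_distinct / n_valid   # over non-missing values only
mean = total / n_valid                      # missing values are excluded
freq_of_pair = 100 * 2 / n_rows             # a value occurring twice, as in the value tables

print(f"valid={n_valid} missing={missing_pct:.1f}% distinct={distinct_pct:.1f}%")
print(f"mean={mean:.2f} frequency of a value seen twice={freq_of_pair:.1f}%")
```

The same pattern holds for 2호선, 3호선 and 4호선 with their own missing counts.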
Histogram with fixed size bins (bins=18)
Value | Count | Frequency (%)
10 | 2 | 7.1%
23 | 2 | 7.1%
51 | 1 | 3.6%
1 | 1 | 3.6%
25 | 1 | 3.6%
36 | 1 | 3.6%
3 | 1 | 3.6%
18 | 1 | 3.6%
37 | 1 | 3.6%
724 | 1 | 3.6%
Other values (8) | 8 | 28.6%
(Missing) | 8 | 28.6%

Value | Count | Frequency (%)
1 | 1 | 3.6%
2 | 1 | 3.6%
3 | 1 | 3.6%
4 | 1 | 3.6%
10 | 2 | 7.1%
18 | 1 | 3.6%
23 | 2 | 7.1%
25 | 1 | 3.6%
31 | 1 | 3.6%
34 | 1 | 3.6%

Value | Count | Frequency (%)
724 | 1 | 3.6%
402 | 1 | 3.6%
284 | 1 | 3.6%
80 | 1 | 3.6%
75 | 1 | 3.6%
51 | 1 | 3.6%
37 | 1 | 3.6%
36 | 1 | 3.6%
34 | 1 | 3.6%
31 | 1 | 3.6%

2호선
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 19
Distinct (%): 86.4%
Missing: 6
Missing (%): 21.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 351.54545
Minimum: 3
Maximum: 2983
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 3
5-th percentile: 4.05
Q1: 29.75
Median: 118
Q3: 180
95-th percentile: 1630.5
Maximum: 2983
Range: 2980
Interquartile range (IQR): 150.25

Descriptive statistics

Standard deviation: 716.61345
Coefficient of variation (CV): 2.038466
Kurtosis: 8.9241826
Mean: 351.54545
Median Absolute Deviation (MAD): 83
Skewness: 2.9551662
Sum: 7734
Variance: 513534.83
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value | Count | Frequency (%)
7 | 2 | 7.1%
50 | 2 | 7.1%
118 | 2 | 7.1%
153 | 1 | 3.6%
1650 | 1 | 3.6%
3 | 1 | 3.6%
4 | 1 | 3.6%
5 | 1 | 3.6%
233 | 1 | 3.6%
1260 | 1 | 3.6%
Other values (9) | 9 | 32.1%
(Missing) | 6 | 21.4%

Value | Count | Frequency (%)
3 | 1 | 3.6%
4 | 1 | 3.6%
5 | 1 | 3.6%
7 | 2 | 7.1%
23 | 1 | 3.6%
50 | 2 | 7.1%
94 | 1 | 3.6%
103 | 1 | 3.6%
118 | 2 | 7.1%
122 | 1 | 3.6%

Value | Count | Frequency (%)
2983 | 1 | 3.6%
1650 | 1 | 3.6%
1260 | 1 | 3.6%
261 | 1 | 3.6%
233 | 1 | 3.6%
189 | 1 | 3.6%
153 | 1 | 3.6%
151 | 1 | 3.6%
150 | 1 | 3.6%
122 | 1 | 3.6%

3호선
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 15
Distinct (%): 71.4%
Missing: 7
Missing (%): 25.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 170.7619
Minimum: 2
Maximum: 1408
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 33
Median: 71
Q3: 73
95-th percentile: 699
Maximum: 1408
Range: 1406
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 335.12728
Coefficient of variation (CV): 1.9625412
Kurtosis: 9.6122936
Mean: 170.7619
Median Absolute Deviation (MAD): 38
Skewness: 3.0269447
Sum: 3586
Variance: 112310.29
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Value | Count | Frequency (%)
71 | 3 | 10.7%
73 | 2 | 7.1%
2 | 2 | 7.1%
3 | 2 | 7.1%
33 | 2 | 7.1%
1408 | 1 | 3.6%
562 | 1 | 3.6%
64 | 1 | 3.6%
699 | 1 | 3.6%
72 | 1 | 3.6%
Other values (5) | 5 | 17.9%
(Missing) | 7 | 25.0%

Value | Count | Frequency (%)
2 | 2 | 7.1%
3 | 2 | 7.1%
7 | 1 | 3.6%
33 | 2 | 7.1%
43 | 1 | 3.6%
58 | 1 | 3.6%
64 | 1 | 3.6%
71 | 3 | 10.7%
72 | 1 | 3.6%
73 | 2 | 7.1%

Value | Count | Frequency (%)
1408 | 1 | 3.6%
699 | 1 | 3.6%
562 | 1 | 3.6%
139 | 1 | 3.6%
99 | 1 | 3.6%
73 | 2 | 7.1%
72 | 1 | 3.6%
71 | 3 | 10.7%
64 | 1 | 3.6%
58 | 1 | 3.6%

4호선
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 18
Distinct (%): 81.8%
Missing: 6
Missing (%): 21.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 163.81818
Minimum: 1
Maximum: 1395
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 384.0 B

Quantile statistics

Minimum: 1
5-th percentile: 2.05
Q1: 11.75
Median: 62.5
Q3: 77.25
95-th percentile: 739.3
Maximum: 1395
Range: 1394
Interquartile range (IQR): 65.5

Descriptive statistics

Standard deviation: 333.08567
Coefficient of variation (CV): 2.0332643
Kurtosis: 9.154695
Mean: 163.81818
Median Absolute Deviation (MAD): 36.5
Skewness: 2.9820463
Sum: 3604
Variance: 110946.06
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Value | Count | Frequency (%)
74 | 2 | 7.1%
3 | 2 | 7.1%
26 | 2 | 7.1%
67 | 2 | 7.1%
93 | 1 | 3.6%
7 | 1 | 3.6%
41 | 1 | 3.6%
75 | 1 | 3.6%
4 | 1 | 3.6%
44 | 1 | 3.6%
Other values (8) | 8 | 28.6%
(Missing) | 6 | 21.4%

Value | Count | Frequency (%)
1 | 1 | 3.6%
2 | 1 | 3.6%
3 | 2 | 7.1%
4 | 1 | 3.6%
7 | 1 | 3.6%
26 | 2 | 7.1%
41 | 1 | 3.6%
44 | 1 | 3.6%
58 | 1 | 3.6%
67 | 2 | 7.1%

Value | Count | Frequency (%)
1395 | 1 | 3.6%
747 | 1 | 3.6%
593 | 1 | 3.6%
126 | 1 | 3.6%
93 | 1 | 3.6%
78 | 1 | 3.6%
75 | 1 | 3.6%
74 | 2 | 7.1%
67 | 2 | 7.1%
58 | 1 | 3.6%

Interactions


Correlations

 | 설비명 | 수량 | 본사 | 1호선 | 2호선 | 3호선 | 4호선
설비명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
수량 | 1.000 | 1.000 | 0.579 | 1.000 | 1.000 | 1.000 | 1.000
본사 | 1.000 | 0.579 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
1호선 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
2호선 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
3호선 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
4호선 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

 | 본사 | 단위
본사 | 1.000 | 1.000
단위 | 1.000 | 1.000

 | 수량 | 1호선 | 2호선 | 3호선 | 4호선 | 단위 | 본사
수량 | 1.000 | 0.979 | 0.999 | 0.959 | 0.993 | 1.000 | 0.540
1호선 | 0.979 | 1.000 | 0.976 | 0.909 | 0.968 | 1.000 | 1.000
2호선 | 0.999 | 0.976 | 1.000 | 0.959 | 0.992 | 1.000 | 0.707
3호선 | 0.959 | 0.909 | 0.959 | 1.000 | 0.956 | 1.000 | 0.707
4호선 | 0.993 | 0.968 | 0.992 | 0.956 | 1.000 | 1.000 | 0.707
단위 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
본사 | 0.540 | 1.000 | 0.707 | 0.707 | 0.707 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
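
Nullity correlation can be illustrated by correlating 0/1 "is missing" indicators between columns. The sketch below does this with a plain Pearson correlation over the 20 rows shown in the Sample section of this report (rows 0 to 9 and 18 to 27). Because rows 10 to 17 are not shown, the results describe only that excerpt, not the full heatmap. The helper names are illustrative:

```python
import math

# Row labels of the 20 rows visible in the report's Sample section.
rows = list(range(10)) + list(range(18, 28))

# Rows in that excerpt where each column shows <NA>.
missing_rows = {
    "1호선": {6, 7, 8, 9},
    "2호선": {6, 7, 8},
    "3호선": {3, 6, 7, 8},
    "4호선": {6, 7, 8},
}

def nullity_vector(column):
    """1 where the column is missing in a row, 0 otherwise."""
    return [1 if r in missing_rows[column] else 0 for r in rows]

def pearson(x, y):
    """Plain Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

r_12 = pearson(nullity_vector("1호선"), nullity_vector("2호선"))
r_24 = pearson(nullity_vector("2호선"), nullity_vector("4호선"))
print(f"1호선 vs 2호선: {r_12:.3f}")
print(f"2호선 vs 4호선: {r_24:.3f}")
```

A value of 1 means two columns are always missing together in the excerpt; values below 1 indicate rows where only one of the two is missing.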

Sample

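Row | 설비명 | 단위 | 수량 | 본사 | 1호선 | 2호선 | 3호선 | 4호선
0 | 총계 | (blank) | 6524 | 14 | 724 | 2983 | 1408 | 1395
1 | 개집표기 턴스타일게이트 | (blank) | 2700 | 1 | 284 | 1260 | 562 | 593
2 | 개집표기 슬림게이트 | (blank) | 455 | <NA> | 80 | 233 | 64 | 78
3 | 개집표기 장애인게이트 | (blank) | 10 | <NA> | 4 | 4 | <NA> | 2
4 | 개집표기 스피드게이트 | (blank) | 334 | <NA> | 34 | 153 | 73 | 74
5 | (blank) | <NA> | 3499 | 1 | 402 | 1650 | 699 | 747
6 | 센터시스템 | (blank) | 1 | 1 | <NA> | <NA> | <NA> | <NA>
7 | NMS서브메니저 | (blank) | 1 | 1 | <NA> | <NA> | <NA> | <NA>
8 | PC보안원격제어서버 | (blank) | 1 | 1 | <NA> | <NA> | <NA> | <NA>
9 | 원격정비시스템 | (blank) | 7 | 1 | <NA> | 3 | 2 | 1

Row | 설비명 | 단위 | 수량 | 본사 | 1호선 | 2호선 | 3호선 | 4호선
18 | 유인충전기 | (blank) | 270 | <NA> | 23 | 118 | 71 | 67
19 | 휴대용정산기 | (blank) | 292 | <NA> | 31 | 122 | 72 | 67
20 | 발매기 | (blank) | 601 | <NA> | 75 | 261 | 139 | 126
21 | 환급기 | (blank) | 432 | <NA> | 51 | 189 | 99 | 93
22 | 무인정산기 | (blank) | 335 | <NA> | 37 | 151 | 73 | 74
23 | 판매기 | (blank) | 199 | <NA> | 18 | 94 | 43 | 44
24 | 발권기 | (blank) | 17 | <NA> | 3 | 7 | 3 | 4
25 | CCTV카메라 | (blank) | 332 | <NA> | 36 | 150 | 71 | 75
26 | CCTV모니터 | (blank) | 227 | <NA> | 25 | 103 | 58 | 41
27 | 무정전전원장치 | (blank) | 39 | 1 | 1 | 23 | 7 | 7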