Overview

Dataset statistics

Number of variables: 6
Number of observations: 535
Missing cells: 314
Missing cells (%): 9.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.3 KiB
Average record size in memory: 52.2 B
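
The missing-cell percentage follows directly from the counts above: the table holds 535 × 6 cells, of which 314 are missing. A minimal Python check, using only figures stated in this report (the variable names are illustrative):

```python
# Figures copied from the dataset statistics in this report.
n_variables = 6
n_observations = 535
missing_cells = 314

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

The 314 missing cells match the two coordinate columns, which the alerts list with 157 missing values each.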

Variable types

Text: 1
Categorical: 1
Numeric: 4

Alerts

Y좌표값 is highly overall correlated with 노선명 (High correlation)
노선코드 is highly overall correlated with 노선명 (High correlation)
영업소코드 is highly overall correlated with 노선명 (High correlation)
노선명 is highly overall correlated with Y좌표값 and 2 other fields (High correlation)
X좌표값 has 157 (29.3%) missing values (Missing)
Y좌표값 has 157 (29.3%) missing values (Missing)
영업소명 has unique values (Unique)
영업소코드 has unique values (Unique)

Reproduction

Analysis started: 2024-03-13 11:52:09.766600
Analysis finished: 2024-03-13 11:52:12.376243
Duration: 2.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
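
The reported duration is the difference between the two timestamps. A small sketch that recomputes it with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-03-13 11:52:09.766600", FMT)
finished = datetime.strptime("2024-03-13 11:52:12.376243", FMT)

# Subtracting two datetimes yields a timedelta.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```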

Variables

영업소명 (office name)
Text

UNIQUE

Distinct: 535
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.3 KiB

Length

Max length: 9
Median length: 2
Mean length: 2.7495327
Min length: 2

Characters and Unicode

Total characters: 1471
Distinct characters: 223
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
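
As a rough illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below uses the five sample values of 영업소명 shown in this report. It covers categories only, since scripts and blocks are not available in the standard library, and it is not necessarily how the profiling tool computes its counts.

```python
import unicodedata
from collections import Counter

# Sample values taken from the first rows of 영업소명 in this report.
names = ["판교", "대왕판교", "서울", "수원신갈", "기흥"]

# Count the Unicode general category of every character in the sample.
categories = Counter(unicodedata.category(ch) for name in names for ch in name)
print(categories)

# One character from each category the report lists:
# Lo = Other Letter, Lu = Uppercase Letter, Ps = Open Punctuation,
# Pe = Close Punctuation, Nd = Decimal Number.
for ch in ["판", "C", "(", ")", "2"]:
    print(ch, unicodedata.category(ch))
```

All fourteen characters in the five sample names are Hangul syllables, so they fall into the single category `Lo`.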

Unique

Unique: 535
Unique (%): 100.0%

Sample

1st row: 판교
2nd row: 대왕판교
3rd row: 서울
4th row: 수원신갈
5th row: 기흥

Value  Count  Frequency (%)
판교  1  0.2%
신림  1  0.2%
삼척  1  0.2%
속초  1  0.2%
북양양  1  0.2%
하조대  1  0.2%
남강릉  1  0.2%
동해  1  0.2%
망상  1  0.2%
옥계  1  0.2%
Other values (525)  525  98.1%

Most occurring characters

Value  Count  Frequency (%)
[?]  60  4.1%
[?]  59  4.0%
[?]  59  4.0%
[?]  54  3.7%
[?]  51  3.5%
[?]  49  3.3%
[?]  46  3.1%
[?]  37  2.5%
[?]  28  1.9%
[?]  28  1.9%
Other values (213)  1000  68.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  1413  96.1%
Uppercase Letter  45  3.1%
Close Punctuation  6  0.4%
Open Punctuation  6  0.4%
Decimal Number  1  0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
[?]  60  4.2%
[?]  59  4.2%
[?]  59  4.2%
[?]  54  3.8%
[?]  51  3.6%
[?]  49  3.5%
[?]  46  3.3%
[?]  37  2.6%
[?]  28  2.0%
[?]  28  2.0%
Other values (204)  942  66.7%

Uppercase Letter
Value  Count  Frequency (%)
C  21  46.7%
J  19  42.2%
T  2  4.4%
K  1  2.2%
E  1  2.2%
I  1  2.2%

Close Punctuation
Value  Count  Frequency (%)
)  6  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  6  100.0%

Decimal Number
Value  Count  Frequency (%)
2  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  1413  96.1%
Latin  45  3.1%
Common  13  0.9%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
[?]  60  4.2%
[?]  59  4.2%
[?]  59  4.2%
[?]  54  3.8%
[?]  51  3.6%
[?]  49  3.5%
[?]  46  3.3%
[?]  37  2.6%
[?]  28  2.0%
[?]  28  2.0%
Other values (204)  942  66.7%

Latin
Value  Count  Frequency (%)
C  21  46.7%
J  19  42.2%
T  2  4.4%
K  1  2.2%
E  1  2.2%
I  1  2.2%

Common
Value  Count  Frequency (%)
)  6  46.2%
(  6  46.2%
2  1  7.7%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  1413  96.1%
ASCII  58  3.9%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
[?]  60  4.2%
[?]  59  4.2%
[?]  59  4.2%
[?]  54  3.8%
[?]  51  3.6%
[?]  49  3.5%
[?]  46  3.3%
[?]  37  2.6%
[?]  28  2.0%
[?]  28  2.0%
Other values (204)  942  66.7%

ASCII
Value  Count  Frequency (%)
C  21  36.2%
J  19  32.8%
)  6  10.3%
(  6  10.3%
T  2  3.4%
K  1  1.7%
E  1  1.7%
2  1  1.7%
I  1  1.7%

노선명 (route name)
Categorical

HIGH CORRELATION

Distinct: 46
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.3 KiB

경부선: 44
남해선A: 34
호남선A: 31
중부선-대전통영선A: 31
서해안선: 29
Other values (41): 366

Length

Max length: 10
Median length: 8
Mean length: 5.1196262
Min length: 3

Unique

Unique: 3
Unique (%): 0.6%

Sample

1st row: 경부선
2nd row: 경부선
3rd row: 경부선
4th row: 경부선
5th row: 경부선

Common Values

Value  Count  Frequency (%)
경부선  44  8.2%
남해선A  34  6.4%
호남선A  31  5.8%
중부선-대전통영선A  31  5.8%
서해안선  29  5.4%
수도권제2순환선  27  5.0%
영동선  25  4.7%
중부내륙선  25  4.7%
당진상주선  23  4.3%
중앙선  22  4.1%
Other values (36)  244  45.6%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
경부선  44  8.2%
남해선a  34  6.4%
호남선a  31  5.8%
중부선-대전통영선a  31  5.8%
서해안선  29  5.4%
수도권제2순환선  27  5.0%
영동선  25  4.7%
중부내륙선  25  4.7%
당진상주선  23  4.3%
중앙선  22  4.1%
Other values (36)  244  45.6%

X좌표값 (X coordinate)
Real number (ℝ)

MISSING

Distinct: 378
Distinct (%): 100.0%
Missing: 157
Missing (%): 29.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.77785
Minimum: 126.43506
Maximum: 129.43521
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.8 KiB

Quantile statistics

Minimum: 126.43506
5-th percentile: 126.68688
Q1: 127.10253
Median: 127.63796
Q3: 128.47895
95-th percentile: 129.13705
Maximum: 129.43521
Range: 3.000148
Interquartile range (IQR): 1.37642

Descriptive statistics

Standard deviation: 0.79847217
Coefficient of variation (CV): 0.0062489093
Kurtosis: -1.1130404
Mean: 127.77785
Median Absolute Deviation (MAD): 0.659708
Skewness: 0.27127908
Sum: 48300.026
Variance: 0.6375578
Monotonicity: Not monotonic
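
Several of these figures are derived from one another. The sketch below recomputes the coefficient of variation, variance, range and interquartile range for X좌표값 from the rounded values printed in this report, so small rounding differences against the reported figures are expected:

```python
# Values copied from the X좌표값 statistics in this report.
mean = 127.77785
std = 0.79847217
minimum, maximum = 126.43506, 129.43521
q1, q3 = 127.10253, 128.47895

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.7f} variance={variance:.5f} range={value_range:.5f} IQR={iqr:.5f}")
```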
Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
126.780851  1  0.2%
128.577591  1  0.2%
128.563472  1  0.2%
128.535586  1  0.2%
128.514744  1  0.2%
128.539343  1  0.2%
127.855581  1  0.2%
127.775947  1  0.2%
126.84252  1  0.2%
126.792398  1  0.2%
Other values (368)  368  68.8%
(Missing)  157  29.3%

Minimum 10 values
Value  Count  Frequency (%)
126.435065  1  0.2%
126.480819  1  0.2%
126.485422  1  0.2%
126.48607  1  0.2%
126.497717  1  0.2%
126.542929  1  0.2%
126.554466  1  0.2%
126.55674  1  0.2%
126.562018  1  0.2%
126.56535  1  0.2%

Maximum 10 values
Value  Count  Frequency (%)
129.435213  1  0.2%
129.394802  1  0.2%
129.364993  1  0.2%
129.314538  1  0.2%
129.298314  1  0.2%
129.288158  1  0.2%
129.275381  1  0.2%
129.262036  1  0.2%
129.249202  1  0.2%
129.245438  1  0.2%

Y좌표값 (Y coordinate)
Real number (ℝ)

HIGH CORRELATION  MISSING

Distinct: 378
Distinct (%): 100.0%
Missing: 157
Missing (%): 29.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.254568
Minimum: 34.693356
Maximum: 38.202889
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.8 KiB

Quantile statistics

Minimum: 34.693356
5-th percentile: 34.996765
Q1: 35.419931
Median: 36.135414
Q3: 37.102793
95-th percentile: 37.670387
Maximum: 38.202889
Range: 3.509533
Interquartile range (IQR): 1.6828622

Descriptive statistics

Standard deviation: 0.90531614
Coefficient of variation (CV): 0.024971092
Kurtosis: -1.2179285
Mean: 36.254568
Median Absolute Deviation (MAD): 0.8423645
Skewness: 0.19284873
Sum: 13704.227
Variance: 0.81959732
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
37.35841  1  0.2%
36.288086  1  0.2%
36.200288  1  0.2%
36.091865  1  0.2%
36.04843  1  0.2%
35.934437  1  0.2%
37.67533  1  0.2%
37.839835  1  0.2%
37.349762  1  0.2%
37.344963  1  0.2%
Other values (368)  368  68.8%
(Missing)  157  29.3%

Minimum 10 values
Value  Count  Frequency (%)
34.693356  1  0.2%
34.702834  1  0.2%
34.737619  1  0.2%
34.812134  1  0.2%
34.825889  1  0.2%
34.837264  1  0.2%
34.858761  1  0.2%
34.863289  1  0.2%
34.869414  1  0.2%
34.883591  1  0.2%

Maximum 10 values
Value  Count  Frequency (%)
38.202889  1  0.2%
38.155223  1  0.2%
38.071824  1  0.2%
38.029957  1  0.2%
37.990592  1  0.2%
37.921166  1  0.2%
37.917929  1  0.2%
37.839835  1  0.2%
37.835913  1  0.2%
37.785347  1  0.2%

노선코드 (route code)
Real number (ℝ)

HIGH CORRELATION

Distinct: 46
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 129.18131
Minimum: 1
Maximum: 700
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 17
Median: 45
Q3: 110
95-th percentile: 627
Maximum: 700
Range: 699
Interquartile range (IQR): 93

Descriptive statistics

Standard deviation: 192.32185
Coefficient of variation (CV): 1.4887746
Kurtosis: 2.1784105
Mean: 129.18131
Median Absolute Deviation (MAD): 30
Skewness: 1.8590017
Sum: 69112
Variance: 36987.696
Monotonicity: Increasing
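
The monotonicity label describes how the values behave in row order. The function below is one way such a label could be derived. It is a sketch, not the profiling tool's implementation: the helper name `monotonicity` is hypothetical, and of the labels it returns only "Increasing" and "Not monotonic" appear in this report.

```python
from typing import Sequence


def monotonicity(values: Sequence[float]) -> str:
    """Classify a sequence by comparing each value with its successor."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"  # repeated values are allowed
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# Illustrative inputs only, not the full columns.
print(monotonicity([1, 1, 10, 10, 700]))  # repeated codes that never decrease
print(monotonicity([65, 69, 101, 57]))    # codes that go up and then down
```

A column sorted by route code, with each code repeated once per office, would come out as "Increasing" rather than "Strictly increasing".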
Histogram with fixed size bins (bins=46)

Value  Count  Frequency (%)
1  44  8.2%
10  34  6.4%
25  31  5.8%
35  31  5.8%
15  29  5.4%
400  27  5.0%
50  25  4.7%
45  25  4.7%
30  23  4.3%
55  22  4.1%
Other values (36)  244  45.6%

Minimum 10 values
Value  Count  Frequency (%)
1  44  8.2%
10  34  6.4%
12  12  2.2%
14  3  0.6%
15  29  5.4%
16  1  0.2%
17  15  2.8%
20  8  1.5%
25  31  5.8%
29  12  2.2%

Maximum 10 values
Value  Count  Frequency (%)
700  10  1.9%
688  11  2.1%
627  9  1.7%
600  14  2.6%
551  2  0.4%
500  3  0.6%
451  5  0.9%
400  27  5.0%
301  12  2.2%
300  2  0.4%

영업소코드 (office code)
Real number (ℝ)

HIGH CORRELATION  UNIQUE

Distinct: 535
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 448.43925
Minimum: 4
Maximum: 987
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.8 KiB

Quantile statistics

Minimum: 4
5-th percentile: 63.7
Q1: 193.5
Median: 527
Q3: 678.5
95-th percentile: 832.3
Maximum: 987
Range: 983
Interquartile range (IQR): 485

Descriptive statistics

Standard deviation: 266.62386
Coefficient of variation (CV): 0.5945596
Kurtosis: -1.4183981
Mean: 448.43925
Median Absolute Deviation (MAD): 251
Skewness: -0.0052675723
Sum: 239915
Variance: 71088.284
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
65  1  0.2%
765  1  0.2%
703  1  0.2%
702  1  0.2%
701  1  0.2%
585  1  0.2%
584  1  0.2%
583  1  0.2%
582  1  0.2%
581  1  0.2%
Other values (525)  525  98.1%

Minimum 10 values
Value  Count  Frequency (%)
4  1  0.2%
11  1  0.2%
12  1  0.2%
13  1  0.2%
25  1  0.2%
26  1  0.2%
29  1  0.2%
31  1  0.2%
33  1  0.2%
34  1  0.2%

Maximum 10 values
Value  Count  Frequency (%)
987  1  0.2%
986  1  0.2%
985  1  0.2%
984  1  0.2%
983  1  0.2%
982  1  0.2%
981  1  0.2%
879  1  0.2%
876  1  0.2%
870  1  0.2%

Interactions

[Figure: 16 pairwise interaction plots of the four numeric variables X좌표값, Y좌표값, 노선코드 and 영업소코드]

Correlations

Variable  노선명  X좌표값  Y좌표값  노선코드  영업소코드
노선명  1.000  0.814  0.865  1.000  0.898
X좌표값  0.814  1.000  0.585  0.459  0.440
Y좌표값  0.865  0.585  1.000  0.535  0.506
노선코드  1.000  0.459  0.535  1.000  0.624
영업소코드  0.898  0.440  0.506  0.624  1.000

Variable  X좌표값  Y좌표값  노선코드  영업소코드  노선명
X좌표값  1.000  -0.089  0.145  0.071  0.428
Y좌표값  -0.089  1.000  0.225  -0.069  0.503
노선코드  0.145  0.225  1.000  0.259  0.965
영업소코드  0.071  -0.069  0.259  1.000  0.573
노선명  0.428  0.503  0.965  0.573  1.000

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Heatmap: the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.
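
Nullity correlation can be read as an ordinary Pearson correlation between "value is present" indicators. The sketch below shows that idea with the standard library only. The helper name `nullity_correlation` and the two short columns are hypothetical, and the report does not state whether X좌표값 and Y좌표값 are actually missing in the same rows.

```python
from math import sqrt
from typing import Optional, Sequence


def nullity_correlation(a: Sequence[Optional[float]],
                        b: Sequence[Optional[float]]) -> float:
    """Pearson correlation between the presence indicators of two columns.

    Undefined (division by zero) if either column has no missing values
    or is entirely missing, because its indicator then has zero variance.
    """
    pa = [0.0 if v is None else 1.0 for v in a]
    pb = [0.0 if v is None else 1.0 for v in b]
    n = len(pa)
    mean_a, mean_b = sum(pa) / n, sum(pb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(pa, pb))
    var_a = sum((x - mean_a) ** 2 for x in pa)
    var_b = sum((y - mean_b) ** 2 for y in pb)
    return cov / sqrt(var_a * var_b)


# Hypothetical columns in which both coordinates are missing in the same rows.
x = [127.1, None, 128.4, None, 126.9]
y = [37.4, None, 35.8, None, 36.1]
result = nullity_correlation(x, y)
print(result)
```

A value near 1 means the two columns tend to be missing together, and a value near -1 means one tends to be present exactly where the other is missing.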

Sample

First rows

Index  영업소명  노선명  X좌표값  Y좌표값  노선코드  영업소코드
0  판교  경부선  127.1032  37.396399  1  65
1  대왕판교  경부선  127.093456  37.408887  1  69
2  서울  경부선  127.102077  37.365046  1  101
3  수원신갈  경부선  127.102395  37.266835  1  103
4  기흥  경부선  127.102448  37.222267  1  105
5  오산  경부선  127.08013  37.143445  1  106
6  안성  경부선  127.15041  36.995586  1  107
7  천안  경부선  127.166314  36.826143  1  108
8  목천  경부선  127.230608  36.768049  1  110
9  청주  경부선  127.37979  36.625153  1  111

Last rows

Index  영업소명  노선명  X좌표값  Y좌표값  노선코드  영업소코드
525  다사  대구외곽순환선  128.469342  35.849407  700  57
526  지천  대구외곽순환선  128.534811  35.937639  700  58
527  북달성  대구외곽순환선  128.458653  35.871124  700  772
528  북다사  대구외곽순환선  128.468224  35.885329  700  773
529  남칠곡  대구외곽순환선  128.515656  35.916919  700  774
530  동명동호  대구외곽순환선  128.5517  35.970377  700  775
531  연경  대구외곽순환선  128.611328  35.936574  700  776
532  파군재  대구외곽순환선  128.637672  35.934839  700  777
533  둔산  대구외곽순환선  128.689438  35.900013  700  778
534  율암  대구외곽순환선  128.701789  35.888182  700  779