Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 1652
Missing cells (%): 3.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
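
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the total size is expressed in KiB of 1024 bytes and that the per-variable missing counts are the ones listed later in this report (the attribution of the single missing value to 차량번호 is inferred from the sample table):

```python
# Figures copied from the dataset statistics above.
n_variables = 5
n_observations = 10_000
missing_cells = 1_652
total_size_kib = 488.3

# Per-variable missing counts as reported in the variable sections.
missing_per_variable = {"차량ID": 0, "차량번호": 1, "업체ID": 0, "제조사": 0, "차량명": 1651}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
print("Per-variable counts add up:", sum(missing_per_variable.values()) == missing_cells)
```

Note that the missing-cell percentage is relative to all 50000 cells, not to the 10000 rows, which is why it is much smaller than the 16.5% reported for 차량명 alone.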

Variable types

Numeric: 2
Text: 2
Categorical: 1

Alerts

업체ID is highly overall correlated with 제조사 [High correlation]
제조사 is highly overall correlated with 업체ID [High correlation]
제조사 is highly imbalanced (54.6%) [Imbalance]
차량명 has 1651 (16.5%) missing values [Missing]
업체ID is highly skewed (γ1 = 55.47928002) [Skewed]
차량ID has unique values [Unique]

Reproduction

Analysis started: 2023-12-10 22:02:49.368460
Analysis finished: 2023-12-10 22:02:50.790694
Duration: 1.42 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

차량ID (vehicle ID)
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.2294873 × 10^8
Minimum: 2 × 10^8
Maximum: 8 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2 × 10^8
5-th percentile: 2.000011 × 10^8
Q1: 2.0890054 × 10^8
Median: 2.2400013 × 10^8
Q3: 2.3400022 × 10^8
95-th percentile: 2.4901148 × 10^8
Maximum: 8 × 10^8
Range: 6 × 10^8
Interquartile range (IQR): 25099680

Descriptive statistics

Standard deviation: 16107578
Coefficient of variation (CV): 0.072247904
Kurtosis: 163.43766
Mean: 2.2294873 × 10^8
Median Absolute Deviation (MAD): 10002346
Skewness: 4.6660438
Sum: 2.2294873 × 10^12
Variance: 2.5945407 × 10^14
Monotonicity: Not monotonic
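
Several of the statistics above are derived from others, which makes them easy to cross-check. A minimal sketch, assuming the usual definitions (IQR = Q3 − Q1, range = maximum − minimum, CV = standard deviation / mean, variance = standard deviation squared) and using the rounded figures printed in this report, so small rounding differences are expected:

```python
# Rounded figures for 차량ID, copied from the report.
minimum, maximum = 200_000_000, 800_000_000
q1, q3 = 208_900_540, 234_000_220
mean = 222_948_730
std = 16_107_578

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
cv = std / mean                # coefficient of variation
variance = std ** 2

print("IQR:", iqr)
print("Range:", value_range)
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.7e}")
```

The single value at 8 × 10^8 sits far above the 95-th percentile (about 2.49 × 10^8), which is consistent with the large range relative to the IQR and with the high kurtosis.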
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
213000210 | 1 | < 0.1%
228300614 | 1 | < 0.1%
210301382 | 1 | < 0.1%
207000117 | 1 | < 0.1%
220010058 | 1 | < 0.1%
208900562 | 1 | < 0.1%
236000067 | 1 | < 0.1%
233000958 | 1 | < 0.1%
204001356 | 1 | < 0.1%
229900780 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
200000002 | 1 | < 0.1%
200000004 | 1 | < 0.1%
200000005 | 1 | < 0.1%
200000006 | 1 | < 0.1%
200000009 | 1 | < 0.1%
200000010 | 1 | < 0.1%
200000011 | 1 | < 0.1%
200000012 | 1 | < 0.1%
200000013 | 1 | < 0.1%
200000017 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
800000002 | 1 | < 0.1%
289901825 | 1 | < 0.1%
289901817 | 1 | < 0.1%
289901813 | 1 | < 0.1%
289901245 | 1 | < 0.1%
249912246 | 1 | < 0.1%
249912244 | 1 | < 0.1%
249911587 | 1 | < 0.1%
249911531 | 1 | < 0.1%
249911523 | 1 | < 0.1%

차량번호 (license plate number)
Text

Distinct: 9693
Distinct (%): 96.9%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 8.9994999
Min length: 5

Characters and Unicode

Total characters: 89986
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
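
As an illustration of those character properties, the sketch below counts Unicode general categories for the five sample plate numbers listed in this section, using Python's standard `unicodedata` module. It is only an approximation of what the report does: the Hangul check via the character name is a rough stand-in for a real script lookup, which `unicodedata` does not provide.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for this variable.
samples = ["경기72사1198", "경기74아1353", "경기72아3016", "경기77바5913", "경기78아1048"]

# General category per character: "Lo" = Other Letter, "Nd" = Decimal Number.
category_counts = Counter(unicodedata.category(ch) for s in samples for ch in s)

# Rough script proxy: Hangul syllable names start with "HANGUL".
hangul_count = sum(
    1 for s in samples for ch in s if unicodedata.name(ch, "").startswith("HANGUL")
)

print(dict(category_counts))
print("Hangul characters:", hangul_count)
```

Each sample plate has three Hangul letters and six digits, which matches the roughly one-third Other Letter and two-thirds Decimal Number split reported for the whole column.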

Unique

Unique: 9416
Unique (%): 94.2%

Sample

1st row: 경기72사1198
2nd row: 경기74아1353
3rd row: 경기72아3016
4th row: 경기77바5913
5th row: 경기78아1048

Value | Count | Frequency (%)
서울71바3246 | 5 | < 0.1%
서울71바3248 | 4 | < 0.1%
경기78아7095 | 4 | < 0.1%
경기70사1505 | 4 | < 0.1%
경기74자7401 | 3 | < 0.1%
경기76아1002 | 3 | < 0.1%
경기78사7744 | 3 | < 0.1%
경기78아5916 | 3 | < 0.1%
경기72아2048 | 3 | < 0.1%
서울71바3285 | 3 | < 0.1%
Other values (9684) | 9965 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
7 | 14910 | 16.6%
(unreadable) | 9797 | 10.9%
(unreadable) | 9797 | 10.9%
1 | 8587 | 9.5%
0 | 5836 | 6.5%
3 | 5194 | 5.8%
6 | 4974 | 5.5%
(unreadable) | 4858 | 5.4%
2 | 4691 | 5.2%
8 | 4656 | 5.2%
Other values (26) | 16686 | 18.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 59987 | 66.7%
Other Letter | 29997 | 33.3%
Space Separator | 2 | < 0.1%

Most frequent character per category

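Other Letter
Value | Count | Frequency (%)
(unreadable) | 9797 | 32.7%
(unreadable) | 9797 | 32.7%
(unreadable) | 4858 | 16.2%
(unreadable) | 3499 | 11.7%
(unreadable) | 827 | 2.8%
(unreadable) | 812 | 2.7%
(unreadable) | 164 | 0.5%
(unreadable) | 163 | 0.5%
(unreadable) | 15 | 0.1%
(unreadable) | 15 | 0.1%
Other values (15) | 50 | 0.2%

Decimal Number
Value | Count | Frequency (%)
7 | 14910 | 24.9%
1 | 8587 | 14.3%
0 | 5836 | 9.7%
3 | 5194 | 8.7%
6 | 4974 | 8.3%
2 | 4691 | 7.8%
8 | 4656 | 7.8%
5 | 4099 | 6.8%
4 | 3924 | 6.5%
9 | 3116 | 5.2%

Space Separator
Value | Count | Frequency (%)
(space) | 2 | 100.0%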

Most occurring scripts

Value | Count | Frequency (%)
Common | 59989 | 66.7%
Hangul | 29997 | 33.3%

Most frequent character per script

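Hangul
Value | Count | Frequency (%)
(unreadable) | 9797 | 32.7%
(unreadable) | 9797 | 32.7%
(unreadable) | 4858 | 16.2%
(unreadable) | 3499 | 11.7%
(unreadable) | 827 | 2.8%
(unreadable) | 812 | 2.7%
(unreadable) | 164 | 0.5%
(unreadable) | 163 | 0.5%
(unreadable) | 15 | 0.1%
(unreadable) | 15 | 0.1%
Other values (15) | 50 | 0.2%

Common
Value | Count | Frequency (%)
7 | 14910 | 24.9%
1 | 8587 | 14.3%
0 | 5836 | 9.7%
3 | 5194 | 8.7%
6 | 4974 | 8.3%
2 | 4691 | 7.8%
8 | 4656 | 7.8%
5 | 4099 | 6.8%
4 | 3924 | 6.5%
9 | 3116 | 5.2%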

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 59989 | 66.7%
Hangul | 29997 | 33.3%

Most frequent character per block

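ASCII
Value | Count | Frequency (%)
7 | 14910 | 24.9%
1 | 8587 | 14.3%
0 | 5836 | 9.7%
3 | 5194 | 8.7%
6 | 4974 | 8.3%
2 | 4691 | 7.8%
8 | 4656 | 7.8%
5 | 4099 | 6.8%
4 | 3924 | 6.5%
9 | 3116 | 5.2%

Hangul
Value | Count | Frequency (%)
(unreadable) | 9797 | 32.7%
(unreadable) | 9797 | 32.7%
(unreadable) | 4858 | 16.2%
(unreadable) | 3499 | 11.7%
(unreadable) | 827 | 2.8%
(unreadable) | 812 | 2.7%
(unreadable) | 164 | 0.5%
(unreadable) | 163 | 0.5%
(unreadable) | 15 | 0.1%
(unreadable) | 15 | 0.1%
Other values (15) | 50 | 0.2%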

업체ID (company ID)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 192
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4113108.4
Minimum: 4100100
Maximum: 9999999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 4100100
5-th percentile: 4100200
Q1: 4101100
Median: 4103600
Q3: 4109100
95-th percentile: 4150300
Maximum: 9999999
Range: 5899899
Interquartile range (IQR): 8000

Descriptive statistics

Standard deviation: 103341.43
Coefficient of variation (CV): 0.025124899
Kurtosis: 3158.3418
Mean: 4113108.4
Median Absolute Deviation (MAD): 3200
Skewness: 55.47928
Sum: 4.1131084 × 10^10
Variance: 1.0679452 × 10^10
Monotonicity: Not monotonic
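
The very high skewness is consistent with a handful of extreme values: the maximum 9999999 lies far above the 95-th percentile of 4150300. The sketch below shows the effect on toy data. It uses the adjusted Fisher-Pearson estimator, which is one common definition; that the report uses exactly this estimator is an assumption, and the value list is hypothetical, not the real column.

```python
from math import sqrt

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (G1); requires at least 3 values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * sqrt(n * (n - 1)) / (n - 2)

# Hypothetical toy column: two typical IDs plus three extreme entries.
typical = [4100200] * 50 + [4103600] * 47
with_extremes = typical + [9999999] * 3

skew_without = sample_skewness(typical)
skew_with = sample_skewness(with_extremes)

print(f"Skewness without extreme values: {skew_without:.2f}")
print(f"Skewness with extreme values:    {skew_with:.2f}")
```

If 9999999 turns out to be a placeholder code rather than a real ID (an assumption worth checking against the data source), excluding it before computing moments would give a more representative picture.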
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
4100200 | 629 | 6.3%
4100300 | 489 | 4.9%
4103100 | 432 | 4.3%
4150200 | 361 | 3.6%
4150300 | 330 | 3.3%
4103600 | 320 | 3.2%
4100600 | 313 | 3.1%
4100700 | 306 | 3.1%
4100400 | 296 | 3.0%
4100500 | 270 | 2.7%
Other values (182) | 6254 | 62.5%

Minimum 10 values

Value | Count | Frequency (%)
4100100 | 15 | 0.1%
4100200 | 629 | 6.3%
4100300 | 489 | 4.9%
4100400 | 296 | 3.0%
4100500 | 270 | 2.7%
4100600 | 313 | 3.1%
4100700 | 306 | 3.1%
4100800 | 96 | 1.0%
4100900 | 23 | 0.2%
4101100 | 196 | 2.0%

Maximum 10 values

Value | Count | Frequency (%)
9999999 | 3 | < 0.1%
4159800 | 3 | < 0.1%
4159300 | 12 | 0.1%
4158500 | 1 | < 0.1%
4155500 | 1 | < 0.1%
4155200 | 53 | 0.5%
4155100 | 13 | 0.1%
4155000 | 28 | 0.3%
4154500 | 3 | < 0.1%
4154400 | 2 | < 0.1%

제조사 (manufacturer)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
현대 | 4953
대우 | 2485
<NA> | 1622
에디슨모터스 | 122
기아 | 121
Other values (20) | 697

Length

Max length: 8
Median length: 2
Mean length: 2.4874
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대우
2nd row: 대우
3rd row: 현대
4th row: 대우
5th row: 현대

Common Values

Value | Count | Frequency (%)
현대 | 4953 | 49.5%
대우 | 2485 | 24.9%
<NA> | 1622 | 16.2%
에디슨모터스 | 122 | 1.2%
기아 | 121 | 1.2%
이엠코리아 | 102 | 1.0%
MAN | 101 | 1.0%
하이거 | 94 | 0.9%
볼보 | 72 | 0.7%
BLK | 48 | 0.5%
Other values (15) | 280 | 2.8%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
현대 | 4953 | 49.5%
대우 | 2485 | 24.9%
na | 1622 | 16.2%
에디슨모터스 | 122 | 1.2%
기아 | 121 | 1.2%
이엠코리아 | 102 | 1.0%
man | 101 | 1.0%
하이거 | 94 | 0.9%
볼보 | 72 | 0.7%
blk | 48 | 0.5%
Other values (15) | 280 | 2.8%

차량명 (vehicle name)
Text

MISSING 

Distinct: 99
Distinct (%): 1.2%
Missing: 1651
Missing (%): 16.5%
Memory size: 156.2 KiB

Length

Max length: 23
Median length: 20
Mean length: 12.108276
Min length: 6

Characters and Unicode

Total characters: 101092
Distinct characters: 161
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: [대우] BS090
2nd row: [대우] BS106
3rd row: [현대] 일렉시티
4th row: [대우] FX116
5th row: [현대] 뉴슈퍼에어로시티

Value | Count | Frequency (%)
현대 | 4839 | 24.8%
대우 | 2485 | 12.7%
cng | 2480 | 12.7%
유니버스 | 1620 | 8.3%
뉴슈퍼에어로시티 | 1164 | 6.0%
bs106 | 740 | 3.8%
fx116 | 669 | 3.4%
그린시티 | 591 | 3.0%
bs090 | 453 | 2.3%
일렉시티 | 413 | 2.1%
Other values (109) | 4071 | 20.9%
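
The word counts above appear to come from lowercased tokens with the brackets stripped (for example `cng` and `bs106`). A minimal sketch of a comparable tokenization on the five sample rows, assuming a simple "runs of word characters" rule; the `tokenize` helper is hypothetical and the report's actual tokenizer may treat hyphens or other punctuation differently:

```python
import re
from collections import Counter

def tokenize(name):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", name.lower())

# The five sample rows shown for this variable.
samples = [
    "[대우] BS090",
    "[대우] BS106",
    "[현대] 일렉시티",
    "[대우] FX116",
    "[현대] 뉴슈퍼에어로시티",
]

word_counts = Counter(token for s in samples for token in tokenize(s))

for word, count in word_counts.most_common():
    print(word, count)
```

Because the manufacturer is embedded in brackets at the start of each name, manufacturer tokens such as 현대 and 대우 dominate the word frequencies.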

Most occurring characters

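Value | Count | Frequency (%)
(space) | 11176 | 11.1%
[ | 8348 | 8.3%
] | 8348 | 8.3%
(unreadable) | 7409 | 7.3%
(unreadable) | 4924 | 4.9%
(unreadable) | 3183 | 3.1%
1 | 3176 | 3.1%
(unreadable) | 2824 | 2.8%
N | 2694 | 2.7%
(unreadable) | 2678 | 2.6%
Other values (151) | 46332 | 45.8%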

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 49545 | 49.0%
Uppercase Letter | 14543 | 14.4%
Space Separator | 11176 | 11.1%
Open Punctuation | 8367 | 8.3%
Close Punctuation | 8367 | 8.3%
Decimal Number | 7938 | 7.9%
Lowercase Letter | 907 | 0.9%
Dash Punctuation | 244 | 0.2%
Other Punctuation | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unreadable) | 7409 | 15.0%
(unreadable) | 4924 | 9.9%
(unreadable) | 3183 | 6.4%
(unreadable) | 2824 | 5.7%
(unreadable) | 2678 | 5.4%
(unreadable) | 2503 | 5.1%
(unreadable) | 2192 | 4.4%
(unreadable) | 1929 | 3.9%
(unreadable) | 1929 | 3.9%
(unreadable) | 1845 | 3.7%
Other values (95) | 18129 | 36.6%

Uppercase Letter
Value | Count | Frequency (%)
N | 2694 | 18.5%
G | 2511 | 17.3%
C | 2502 | 17.2%
S | 1502 | 10.3%
B | 1497 | 10.3%
F | 1083 | 7.4%
X | 1076 | 7.4%
E | 199 | 1.4%
Y | 198 | 1.4%
A | 190 | 1.3%
Other values (14) | 1091 | 7.5%

Lowercase Letter
Value | Count | Frequency (%)
i | 153 | 16.9%
o | 147 | 16.2%
e | 119 | 13.1%
s | 112 | 12.3%
n | 111 | 12.2%
t | 54 | 6.0%
c | 51 | 5.6%
l | 46 | 5.1%
y | 42 | 4.6%
w | 28 | 3.1%
Other values (6) | 44 | 4.9%

Decimal Number
Value | Count | Frequency (%)
1 | 3176 | 40.0%
0 | 2297 | 28.9%
6 | 1478 | 18.6%
9 | 484 | 6.1%
2 | 478 | 6.0%
8 | 15 | 0.2%
3 | 5 | 0.1%
7 | 3 | < 0.1%
4 | 2 | < 0.1%

Open Punctuation
Value | Count | Frequency (%)
[ | 8348 | 99.8%
( | 19 | 0.2%

Close Punctuation
Value | Count | Frequency (%)
] | 8348 | 99.8%
) | 19 | 0.2%

Space Separator
Value | Count | Frequency (%)
(space) | 11176 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 244 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 49545 | 49.0%
Common | 36097 | 35.7%
Latin | 15450 | 15.3%

Most frequent character per script

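Hangul
Value | Count | Frequency (%)
(unreadable) | 7409 | 15.0%
(unreadable) | 4924 | 9.9%
(unreadable) | 3183 | 6.4%
(unreadable) | 2824 | 5.7%
(unreadable) | 2678 | 5.4%
(unreadable) | 2503 | 5.1%
(unreadable) | 2192 | 4.4%
(unreadable) | 1929 | 3.9%
(unreadable) | 1929 | 3.9%
(unreadable) | 1845 | 3.7%
Other values (95) | 18129 | 36.6%

Latin
Value | Count | Frequency (%)
N | 2694 | 17.4%
G | 2511 | 16.3%
C | 2502 | 16.2%
S | 1502 | 9.7%
B | 1497 | 9.7%
F | 1083 | 7.0%
X | 1076 | 7.0%
E | 199 | 1.3%
Y | 198 | 1.3%
A | 190 | 1.2%
Other values (30) | 1998 | 12.9%

Common
Value | Count | Frequency (%)
(space) | 11176 | 31.0%
[ | 8348 | 23.1%
] | 8348 | 23.1%
1 | 3176 | 8.8%
0 | 2297 | 6.4%
6 | 1478 | 4.1%
9 | 484 | 1.3%
2 | 478 | 1.3%
- | 244 | 0.7%
( | 19 | 0.1%
Other values (6) | 49 | 0.1%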

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 51547 | 51.0%
Hangul | 49545 | 49.0%

Most frequent character per block

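ASCII
Value | Count | Frequency (%)
(space) | 11176 | 21.7%
[ | 8348 | 16.2%
] | 8348 | 16.2%
1 | 3176 | 6.2%
N | 2694 | 5.2%
G | 2511 | 4.9%
C | 2502 | 4.9%
0 | 2297 | 4.5%
S | 1502 | 2.9%
B | 1497 | 2.9%
Other values (46) | 7496 | 14.5%

Hangul
Value | Count | Frequency (%)
(unreadable) | 7409 | 15.0%
(unreadable) | 4924 | 9.9%
(unreadable) | 3183 | 6.4%
(unreadable) | 2824 | 5.7%
(unreadable) | 2678 | 5.4%
(unreadable) | 2503 | 5.1%
(unreadable) | 2192 | 4.4%
(unreadable) | 1929 | 3.9%
(unreadable) | 1929 | 3.9%
(unreadable) | 1845 | 3.7%
Other values (95) | 18129 | 36.6%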

Interactions

[Figures: four interaction plots]

Correlations

Variable | 차량ID | 업체ID | 제조사 | 차량명
차량ID | 1.000 | 0.000 | 0.093 | 1.000
업체ID | 0.000 | 1.000 | NaN | NaN
제조사 | 0.093 | NaN | 1.000 | 1.000
차량명 | 1.000 | NaN | 1.000 | 1.000

Variable | 차량ID | 업체ID | 제조사
차량ID | 1.000 | 0.194 | 0.073
업체ID | 0.194 | 1.000 | 1.000
제조사 | 0.073 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
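
Nullity correlation can be sketched as the Pearson correlation between two presence indicators (1 where a value is present, 0 where it is missing). The example below is a minimal illustration on hypothetical toy columns, with `None` standing in for a missing value; it is not the report's own implementation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

def nullity_correlation(col_a, col_b):
    """Correlate the presence (1.0) or absence (0.0) of values in two columns."""
    present_a = [0.0 if v is None else 1.0 for v in col_a]
    present_b = [0.0 if v is None else 1.0 for v in col_b]
    return pearson(present_a, present_b)

# Hypothetical toy columns, not taken from the dataset.
plates = ["A", None, "C", "D", None, "F"]
names = ["x", None, "z", None, None, "w"]
complete = [1, 2, 3, 4, 5, 6]

r = nullity_correlation(plates, names)
r_complete = nullity_correlation(plates, complete)

print(f"Nullity correlation (plates, names): {r:.4f}")
print("Against a fully populated column:", r_complete)
```

A column with no missing values has a constant presence indicator, so its nullity correlation with any other column is undefined; the sketch returns NaN in that case.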

Sample

First rows

Index | 차량ID | 차량번호 | 업체ID | 제조사 | 차량명
9651 | 213000210 | 경기72사1198 | 4104500 | 대우 | [대우] BS090
15131 | 222000206 | 경기74아1353 | 4107800 | 대우 | [대우] BS106
6038 | 214000017 | 경기72아3016 | 4102200 | 현대 | [현대] 일렉시티
18961 | 249010883 | 경기77바5913 | 4150300 | 대우 | [대우] FX116
16756 | 228000080 | 경기78아1048 | 4100600 | 현대 | [현대] 뉴슈퍼에어로시티
11451 | 222000010 | 경기74아3703 | 4107700 | 이엠코리아 | [이엠코리아] 에픽시티
16084 | 233000765 | 서울70바9240 | 4108600 | 현대 | [현대] 유니버스
8132 | 232000106 | 경기79바6269 | 4110100 | 현대 | [현대] 뉴슈퍼에어로시티 CNG
9646 | 249012181 | 경기78아6120 | 4150600 | 기아 | [기아] 뉴그랜버드
187 | 216000371 | 경기73바1478 | 4100700 | 현대 | [현대] 그린시티

Last rows

Index | 차량ID | 차량번호 | 업체ID | 제조사 | 차량명
19315 | 210000159 | 경기71아8160 | 4106300 | 대우 | [대우] BS090
2216 | 200000169 | 경기70사1262 | 4102700 | 현대 | [현대] 뉴슈퍼에어로시티
17494 | 234001044 | 경기77바2447 | 4100300 | 대우 | [대우] FX116
17220 | 214300382 | 경기72아8075 | 4128300 | 대우 | [대우] 레스타
13307 | 200011019 | 경기70사6605 | 4105000 | 현대 | [현대] 유니버스 CNG
17549 | 200000954 | 경기70바5530 | 4103600 | 현대 | [현대] 에어로시티 CNG
17491 | 234001040 | 경기77바2443 | 4100300 | 대우 | [대우] FX116
16500 | 249011143 | 경기77바6920 | 4150300 | 대우 | [대우] FX116
795 | 238000010 | 경기77사3557 | 4103500 | 현대 | [현대] 뉴카운티
5897 | 234001125 | 경기77바2562 | 4100300 | 대우 | [대우] BS090