Overview

Dataset statistics

Number of variables: 8
Number of observations: 53
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.8 KiB
Average record size in memory: 72.5 B

Variable types

Categorical: 3
Text: 1
Numeric: 4

Dataset

Description: Unlicensed-driving traffic accidents by category (2018) [부문별 무면허 교통사고(2018)]
Author: 도로교통공단 (Korea Road Traffic Authority)
URL: https://www.data.go.kr/data/15094166/fileData.do

Alerts

High correlation: 발생건수 is highly overall correlated with 부상자수 and 3 other fields
High correlation: 부상자수 is highly overall correlated with 발생건수 and 2 other fields
High correlation: 중상 is highly overall correlated with 부상신고
High correlation: 경상 is highly overall correlated with 발생건수 and 2 other fields
High correlation: 시도 is highly overall correlated with 발생건수
High correlation: 부상신고 is highly overall correlated with 발생건수 and 3 other fields
Imbalance: 사망자수 is highly imbalanced (83.0%)
Imbalance: 부상신고 is highly imbalanced (54.2%)
Zeros: 중상 has 38 (71.7%) zeros
Zeros: 경상 has 9 (17.0%) zeros
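
The two Imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition (the report itself does not state it) and plugs in the counts listed under Common Values for 사망자수 and 부상신고. The function name imbalance_score is illustrative only.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    shares and k is the number of classes (assumed definition)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no entropy to normalize
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


death_counts = [51, 1, 1]       # 사망자수: values 0, 2, 1
report_counts = [42, 9, 1, 1]   # 부상신고: values 0, 1, 5, 2

print(f"{imbalance_score(death_counts):.1%}")
print(f"{imbalance_score(report_counts):.1%}")
```

Under this assumption the two lines print 83.0% and 54.2%, which agrees with the alerts.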

Reproduction

Analysis started: 2023-12-12 22:14:34.215202
Analysis finished: 2023-12-12 22:14:36.785432
Duration: 2.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
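
A report of this kind could be regenerated roughly as follows. This is a minimal sketch, not the command actually used: the CSV path, the output path and the cp949 encoding are all assumptions, and the helper name build_report is hypothetical. The metadata dictionary repeats the values shown in the Dataset section.

```python
from pathlib import Path

# Metadata taken from the Dataset section of this report.
DATASET_META = {
    "description": "부문별 무면허 교통사고(2018)",
    "author": "도로교통공단",
    "url": "https://www.data.go.kr/data/15094166/fileData.do",
}


def build_report(csv_path, out_path="report.html", encoding="cp949"):
    """Profile a CSV file and write the result as an HTML report.

    csv_path, out_path and the cp949 encoding are assumptions; adjust
    them to match the file that was actually downloaded.
    """
    # Imported lazily so this module loads even without the packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(Path(csv_path), encoding=encoding)
    report = ProfileReport(df, title="Profiling Report", dataset=DATASET_META)
    report.to_file(out_path)
    return out_path
```

Calling build_report with the path of the downloaded file would write report.html next to the script; timings and memory figures will differ from run to run.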

Variables

시도
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): 24.5%
Missing: 0
Missing (%): 0.0%
Memory size: 556.0 B

Value             Count
경기              11
서울              10
강원              5
대구              5
경남              4
Other values (8)  18

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 2
Unique (%): 3.8%

Sample

1st row: 서울
2nd row: 서울
3rd row: 서울
4th row: 서울
5th row: 서울

Common Values

Value             Count  Frequency (%)
경기              11     20.8%
서울              10     18.9%
강원              5      9.4%
대구              5      9.4%
경남              4      7.5%
충남              3      5.7%
전북              3      5.7%
인천              3      5.7%
대전              3      5.7%
경북              2      3.8%
Other values (3)  4      7.5%

Length

[Figure: Histogram of lengths of the category]

시군구
Text

Distinct: 51
Distinct (%): 96.2%
Missing: 0
Missing (%): 0.0%
Memory size: 556.0 B

Length

Max length: 7
Median length: 3
Mean length: 3
Min length: 2

Characters and Unicode

Total characters: 159
Distinct characters: 59
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 49
Unique (%): 92.5%

Sample

1st row: 중구
2nd row: 용산구
3rd row: 성동구
4th row: 마포구
5th row: 영등포구
Value              Count  Frequency (%)
중구               2      3.8%
서구               2      3.8%
계양구             1      1.9%
달서구             1      1.9%
청주시             1      1.9%
유성구             1      1.9%
거제시             1      1.9%
천안시             1      1.9%
부여군             1      1.9%
예산군             1      1.9%
Other values (41)  41     77.4%

Most occurring characters

Value              Count  Frequency (%)
                   25     15.7%
                   23     14.5%
                   8      5.0%
                   6      3.8%
                   5      3.1%
                   5      3.1%
                   5      3.1%
                   5      3.1%
                   4      2.5%
                   4      2.5%
Other values (49)  69     43.4%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       157    98.7%
Open Punctuation   1      0.6%
Close Punctuation  1      0.6%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                   25     15.9%
                   23     14.6%
                   8      5.1%
                   6      3.8%
                   5      3.2%
                   5      3.2%
                   5      3.2%
                   5      3.2%
                   4      2.5%
                   4      2.5%
Other values (47)  67     42.7%
Open Punctuation
Value  Count  Frequency (%)
(      1      100.0%
Close Punctuation
Value  Count  Frequency (%)
)      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  157    98.7%
Common  2      1.3%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                   25     15.9%
                   23     14.6%
                   8      5.1%
                   6      3.8%
                   5      3.2%
                   5      3.2%
                   5      3.2%
                   5      3.2%
                   4      2.5%
                   4      2.5%
Other values (47)  67     42.7%
Common
Value  Count  Frequency (%)
(      1      50.0%
)      1      50.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  157    98.7%
ASCII   2      1.3%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
                   25     15.9%
                   23     14.6%
                   8      5.1%
                   6      3.8%
                   5      3.2%
                   5      3.2%
                   5      3.2%
                   5      3.2%
                   4      2.5%
                   4      2.5%
Other values (47)  67     42.7%
ASCII
Value  Count  Frequency (%)
(      1      50.0%
)      1      50.0%

발생건수
Real number (ℝ)

HIGH CORRELATION

Distinct: 6
Distinct (%): 11.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9622642
Minimum: 1
Maximum: 25
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.0 B

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 2
95th percentile: 3.4
Maximum: 25
Range: 24
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 3.3510174
Coefficient of variation (CV): 1.70773
Kurtosis: 45.057421
Mean: 1.9622642
Median Absolute Deviation (MAD): 0
Skewness: 6.4995988
Sum: 104
Variance: 11.229318
Monotonicity: Not monotonic
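
Several of the descriptive statistics for 발생건수 can be checked by hand from its value counts. The sketch below rebuilds the 53 observations from those counts and recomputes the figures with the standard library. It assumes the reported variance is the sample variance (divisor n - 1) and that MAD means the median of absolute deviations from the median.

```python
import statistics

# Value -> count pairs from the frequency table for 발생건수.
value_counts = {1: 36, 2: 8, 3: 6, 4: 1, 5: 1, 25: 1}
values = [v for v, n in value_counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, divisor n - 1
std_dev = statistics.stdev(values)
cv = std_dev / mean                      # coefficient of variation
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])

print(f"n={len(values)} sum={sum(values)}")
print(f"mean={mean:.7f} variance={variance:.6f} std={std_dev:.7f}")
print(f"cv={cv:.5f} median={median} mad={mad}")
```

With these assumptions the recomputed sum, mean, variance, standard deviation, CV and MAD agree with the values listed above. The single observation of 25 accounts for most of the variance.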
[Figure: Histogram with fixed size bins (bins=6)]

Values sorted by count:

Value  Count  Frequency (%)
1      36     67.9%
2      8      15.1%
3      6      11.3%
25     1      1.9%
4      1      1.9%
5      1      1.9%

Values in ascending order:

Value  Count  Frequency (%)
1      36     67.9%
2      8      15.1%
3      6      11.3%
4      1      1.9%
5      1      1.9%
25     1      1.9%

Values in descending order:

Value  Count  Frequency (%)
25     1      1.9%
5      1      1.9%
4      1      1.9%
3      6      11.3%
2      8      15.1%
1      36     67.9%

사망자수
Categorical

IMBALANCE

Distinct: 3
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 556.0 B

Value  Count
0      51
2      1
1      1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 2
Unique (%): 3.8%

Sample

1st row: 0
2nd row: 2
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      51     96.2%
2      1      1.9%
1      1      1.9%
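
The report distinguishes Distinct (how many different values occur) from Unique (how many values occur exactly once). A short sketch of that distinction, using the 사망자수 counts from the table above. The variable names are illustrative.

```python
from collections import Counter

# Rebuild the 사망자수 column from the counts in the Common Values table.
column = ["0"] * 51 + ["2"] + ["1"]
freq = Counter(column)

n_rows = len(column)
n_distinct = len(freq)                               # different values present
n_unique = sum(1 for c in freq.values() if c == 1)   # values seen exactly once

print(f"Distinct: {n_distinct} ({n_distinct / n_rows:.1%})")
print(f"Unique: {n_unique} ({n_unique / n_rows:.1%})")
```

Here three values are distinct (0, 1, 2) but only two of them (1 and 2) are unique, since 0 appears 51 times.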

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the value counts listed under Common Values]

부상자수
Real number (ℝ)

HIGH CORRELATION

Distinct: 10
Distinct (%): 18.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.1320755
Minimum: 1
Maximum: 38
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.0 B

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 7.4
Maximum: 38
Range: 37
Interquartile range (IQR): 2
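
The fractional 95th percentile (7.4) suggests the quantiles are linearly interpolated between neighbouring sorted observations. The sketch below assumes that method and rebuilds the 부상자수 column from its value counts. The helper name percentile is illustrative.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] (assumed method)."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)                                # index at or below pos
    upper = min(lower + 1, len(sorted_values) - 1)  # next index, clamped
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


# Value -> count pairs from the frequency table for 부상자수.
value_counts = {1: 24, 2: 12, 3: 4, 4: 6, 5: 1, 6: 2, 7: 1, 8: 1, 12: 1, 38: 1}
values = sorted(v for v, n in value_counts.items() for _ in range(n))

q1, med, q3 = (percentile(values, q) for q in (0.25, 0.5, 0.75))
p95 = percentile(values, 0.95)

print(f"Q1={q1} median={med} Q3={q3} IQR={q3 - q1} p95={p95:.1f}")
```

With 53 observations the 95th percentile falls at position 0.95 * 52 = 49.4, between the sorted values 7 and 8, which gives 7.4.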

Descriptive statistics

Standard deviation: 5.3386751
Coefficient of variation (CV): 1.7045167
Kurtosis: 36.294939
Mean: 3.1320755
Median Absolute Deviation (MAD): 1
Skewness: 5.6568301
Sum: 166
Variance: 28.501451
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=10)]

Values sorted by count:

Value  Count  Frequency (%)
1      24     45.3%
2      12     22.6%
4      6      11.3%
3      4      7.5%
6      2      3.8%
8      1      1.9%
38     1      1.9%
5      1      1.9%
7      1      1.9%
12     1      1.9%

Values in ascending order:

Value  Count  Frequency (%)
1      24     45.3%
2      12     22.6%
3      4      7.5%
4      6      11.3%
5      1      1.9%
6      2      3.8%
7      1      1.9%
8      1      1.9%
12     1      1.9%
38     1      1.9%

Values in descending order:

Value  Count  Frequency (%)
38     1      1.9%
12     1      1.9%
8      1      1.9%
7      1      1.9%
6      2      3.8%
5      1      1.9%
4      6      11.3%
3      4      7.5%
2      12     22.6%
1      24     45.3%

중상
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 6
Distinct (%): 11.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.50943396
Minimum: 0
Maximum: 6
Zeros: 38
Zeros (%): 71.7%
Negative: 0
Negative (%): 0.0%
Memory size: 609.0 B

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 2.4
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.1201415
Coefficient of variation (CV): 2.1987963
Kurtosis: 12.019532
Mean: 0.50943396
Median Absolute Deviation (MAD): 0
Skewness: 3.2175608
Sum: 27
Variance: 1.254717
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]

Values sorted by count:

Value  Count  Frequency (%)
0      38     71.7%
1      10     18.9%
2      2      3.8%
6      1      1.9%
3      1      1.9%
4      1      1.9%

Values in ascending order:

Value  Count  Frequency (%)
0      38     71.7%
1      10     18.9%
2      2      3.8%
3      1      1.9%
4      1      1.9%
6      1      1.9%

Values in descending order:

Value  Count  Frequency (%)
6      1      1.9%
4      1      1.9%
3      1      1.9%
2      2      3.8%
1      10     18.9%
0      38     71.7%

경상
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 10
Distinct (%): 18.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.3207547
Minimum: 0
Maximum: 27
Zeros: 9
Zeros (%): 17.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.0 B

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 1
Q3: 2
95th percentile: 6.4
Maximum: 27
Range: 27
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 3.8619126
Coefficient of variation (CV): 1.6640761
Kurtosis: 33.017264
Mean: 2.3207547
Median Absolute Deviation (MAD): 1
Skewness: 5.290279
Sum: 123
Variance: 14.914369
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=10)]

Values sorted by count:

Value  Count  Frequency (%)
1      18     34.0%
2      14     26.4%
0      9      17.0%
3      4      7.5%
4      3      5.7%
7      1      1.9%
27     1      1.9%
6      1      1.9%
5      1      1.9%
8      1      1.9%

Values in ascending order:

Value  Count  Frequency (%)
0      9      17.0%
1      18     34.0%
2      14     26.4%
3      4      7.5%
4      3      5.7%
5      1      1.9%
6      1      1.9%
7      1      1.9%
8      1      1.9%
27     1      1.9%

Values in descending order:

Value  Count  Frequency (%)
27     1      1.9%
8      1      1.9%
7      1      1.9%
6      1      1.9%
5      1      1.9%
4      3      5.7%
3      4      7.5%
2      14     26.4%
1      18     34.0%
0      9      17.0%

부상신고
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 7.5%
Missing: 0
Missing (%): 0.0%
Memory size: 556.0 B

Value  Count
0      42
1      9
5      1
2      1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 2
Unique (%): 3.8%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      42     79.2%
1      9      17.0%
5      1      1.9%
2      1      1.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the value counts listed under Common Values]

Interactions

[Figures: 16 pairwise interaction plots]

Correlations

[Figure: correlation heatmap]

          시도   시군구  발생건수  사망자수  부상자수  중상   경상   부상신고
시도      1.000  0.845  0.783  0.000  0.349  0.000  0.000  0.413
시군구    0.845  1.000  1.000  1.000  1.000  0.983  0.929  1.000
발생건수  0.783  1.000  1.000  0.000  0.765  0.985  0.723  0.805
사망자수  0.000  1.000  0.000  1.000  0.000  0.781  0.000  0.000
부상자수  0.349  1.000  0.765  0.000  1.000  0.939  0.967  0.914
중상      0.000  0.983  0.985  0.781  0.939  1.000  0.805  0.720
경상      0.000  0.929  0.723  0.000  0.967  0.805  1.000  0.896
부상신고  0.413  1.000  0.805  0.000  0.914  0.720  0.896  1.000

[Figure: correlation heatmap]

          사망자수  시도   부상신고
사망자수  1.000  0.000  0.000
시도      0.000  1.000  0.218
부상신고  0.000  0.218  1.000

[Figure: correlation heatmap]

          발생건수  부상자수  중상   경상   시도   사망자수  부상신고
발생건수  1.000  0.743  0.382  0.579  0.564  0.000  0.860
부상자수  0.743  1.000  0.363  0.872  0.178  0.000  0.614
중상      0.382  0.363  1.000  0.022  0.000  0.445  0.540
경상      0.579  0.872  0.022  1.000  0.000  0.000  0.578
시도      0.564  0.178  0.000  0.000  1.000  0.000  0.218
사망자수  0.000  0.000  0.445  0.000  0.000  1.000  0.000
부상신고  0.860  0.614  0.540  0.578  0.218  0.000  1.000
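
The High correlation alerts appear to follow from the last matrix above: flagging every off-diagonal entry of at least 0.5 gives the same variables and the same number of partner fields per variable. The sketch below assumes that 0.5 cut-off (it is not stated in the report), and the helper name high_correlations is illustrative.

```python
names = ["발생건수", "부상자수", "중상", "경상", "시도", "사망자수", "부상신고"]
matrix = [
    [1.000, 0.743, 0.382, 0.579, 0.564, 0.000, 0.860],
    [0.743, 1.000, 0.363, 0.872, 0.178, 0.000, 0.614],
    [0.382, 0.363, 1.000, 0.022, 0.000, 0.445, 0.540],
    [0.579, 0.872, 0.022, 1.000, 0.000, 0.000, 0.578],
    [0.564, 0.178, 0.000, 0.000, 1.000, 0.000, 0.218],
    [0.000, 0.000, 0.445, 0.000, 0.000, 1.000, 0.000],
    [0.860, 0.614, 0.540, 0.578, 0.218, 0.000, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off, not stated in the report


def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables it correlates with
    at or above the threshold (diagonal excluded)."""
    flagged = {}
    for i, row in enumerate(matrix):
        partners = [names[j] for j, v in enumerate(row)
                    if j != i and abs(v) >= threshold]
        if partners:
            flagged[names[i]] = partners
    return flagged


for variable, partners in high_correlations(names, matrix, THRESHOLD).items():
    print(variable, "->", ", ".join(partners))
```

Under this assumption 사망자수 is the only variable with no partner at or above the cut-off, which matches its absence from the High correlation alerts.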

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

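First rows

    시도  시군구    발생건수  사망자수  부상자수  중상  경상  부상신고
0   서울  중구      1         0         1         0     1     0
1   서울  용산구    3         2         4         2     2     0
2   서울  성동구    1         0         1         0     1     0
3   서울  마포구    1         0         1         0     1     0
4   서울  영등포구  1         0         1         0     1     0
5   서울  강남구    2         0         2         0     2     0
6   서울  강동구    1         0         2         0     2     0
7   서울  송파구    3         0         4         1     2     1
8   서울  서초구    2         0         3         0     2     1
9   서울  중랑구    1         0         4         0     4     0

Last rows

    시도  시군구    발생건수  사망자수  부상자수  중상  경상  부상신고
43  대구  남구      1         0         3         0     3     0
44  대구  북구      3         0         12        4     8     0
45  대구  수성구    2         0         2         1     1     0
46  대구  달서구    1         0         4         0     4     0
47  인천  중구      1         0         4         1     3     0
48  인천  서구      1         0         1         0     1     0
49  인천  계양구    1         0         1         1     0     0
50  대전  서구      1         0         1         0     1     0
51  대전  유성구    2         0         4         0     4     0
52  대전  대덕구    1         0         1         0     0     1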
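
In the rows shown above, 부상자수 equals the sum of 중상, 경상 and 부상신고. Whether that holds for all 53 rows is an assumption, since only 20 rows are visible here. A minimal sketch of such a row-level check, using a few of the sample rows (the helper name injuries_add_up is hypothetical):

```python
# Columns: 시도, 시군구, 발생건수, 사망자수, 부상자수, 중상, 경상, 부상신고
rows = [
    ("서울", "용산구", 3, 2, 4, 2, 2, 0),
    ("서울", "송파구", 3, 0, 4, 1, 2, 1),
    ("대구", "북구", 3, 0, 12, 4, 8, 0),
    ("인천", "계양구", 1, 0, 1, 1, 0, 0),
    ("대전", "대덕구", 1, 0, 1, 0, 0, 1),
]


def injuries_add_up(row):
    """True when 부상자수 equals 중상 + 경상 + 부상신고 for this row."""
    _, _, _, _, injured, serious, minor, reported = row
    return injured == serious + minor + reported


mismatches = [row for row in rows if not injuries_add_up(row)]
print(f"{len(rows) - len(mismatches)} of {len(rows)} rows are consistent")
```

A check like this is also a useful guard when splitting the fused digits of each sample row back into columns: a wrong split, such as reading 12 as 1 and 2, breaks the identity.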