Overview

Dataset statistics

Number of variables: 6
Number of observations: 31
Missing cells: 22
Missing cells (%): 11.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.7 KiB
Average record size in memory: 57.3 B

Variable types

Text: 1
Categorical: 1
Numeric: 4

Dataset

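Description: Arrest statistics by crime type for the five major juvenile crimes committed in Seoul in 2022, maintained by the Seoul Metropolitan Police Agency and broken down by each of the 31 police stations.
URL: https://www.data.go.kr/data/15114278/fileData.do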

Alerts

강간-추행 is highly overall correlated with 절도 and 1 other field (High correlation)
절도 is highly overall correlated with 강간-추행 and 1 other field (High correlation)
폭력 is highly overall correlated with 강간-추행 and 1 other field (High correlation)
살인 is highly imbalanced (79.4%) (Imbalance)
강도 has 19 (61.3%) missing values (Missing)
강간-추행 has 2 (6.5%) missing values (Missing)
폭력 has 1 (3.2%) missing values (Missing)
구분 has unique values (Unique)
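
The missing-value percentages above follow directly from the counts in this report. The sketch below redoes that arithmetic using only figures quoted in the Overview and Alerts; the helper name pct is illustrative and not part of ydata-profiling. 살인 contributes no missing cells because the report lists its Missing count as 0.

```python
# Figures copied from the Overview and Alerts sections of this report.
N_ROWS = 31
N_COLS = 6
MISSING_BY_COLUMN = {"강도": 19, "강간-추행": 2, "폭력": 1}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


missing_cells = sum(MISSING_BY_COLUMN.values())
total_cells = N_ROWS * N_COLS

# Dataset-level share of missing cells (22 of 186 cells).
print(missing_cells, pct(missing_cells, total_cells))

# Per-column share of missing rows.
for column, count in MISSING_BY_COLUMN.items():
    print(column, pct(count, N_ROWS))
```

The dataset-level figure divides by all 186 cells, while the per-column figures divide by the 31 rows.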

Reproduction

Analysis started: 2023-12-12 06:49:04.456341
Analysis finished: 2023-12-12 06:49:06.608111
Duration: 2.15 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
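
A report like this one could be regenerated along the following lines. This is a minimal sketch, not the exact command used for this run: the CSV path, the output file name, the report title, and the cp949 encoding are all assumptions, and the function name build_report is hypothetical. It relies on the ProfileReport class and its to_file method from ydata-profiling.

```python
from pathlib import Path

# Dataset metadata taken from the "Dataset" section of this report.
DATASET_META = {
    "description": "2022 arrest statistics for the five major juvenile crimes, "
                   "by each of the 31 Seoul police stations",
    "url": "https://www.data.go.kr/data/15114278/fileData.do",
}


def build_report(csv_path: str, out_path: str = "report.html",
                 encoding: str = "cp949") -> Path:
    """Profile a CSV file and write an HTML report.

    The path, output name, and encoding are assumptions; adjust them to
    match the file actually downloaded from the URL above.
    """
    # Imported lazily so this module loads even without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(
        df,
        title="Seoul juvenile crime arrests 2022",  # hypothetical title
        dataset=DATASET_META,
    )
    profile.to_file(out_path)
    return Path(out_path)
```

Calling build_report("crime.csv") with a hypothetical local file name would write report.html to the current directory.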

Variables

구분 (category; here, the police station name)
Text

UNIQUE 

Distinct: 31
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 380.0 B

Length

Max length: 3
Median length: 2
Mean length: 2.1290323
Min length: 2

Characters and Unicode

Total characters: 66
Distinct characters: 43
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
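
As a small illustration of those character properties, the sketch below looks up the general category of each character in the first five values of 구분 (the ones listed under Sample in this section) with Python's standard unicodedata module. It covers only those five values, so its totals are smaller than the 66 characters reported for the full column. The standard library does not expose script or block properties, so only the category is shown.

```python
import unicodedata
from collections import Counter

# First five values of 구분, copied from the Sample list in this section.
names = ["중부", "종로", "남대문", "서대문", "혜화"]

chars = [ch for name in names for ch in name]
categories = Counter(unicodedata.category(ch) for ch in chars)

print(len(chars))        # total characters in these five values
print(len(set(chars)))   # distinct characters in these five values
print(dict(categories))  # general category codes; "Lo" means Other Letter
```

Every character falls in the single category Lo, which agrees with the "Other Letter" row reported for the whole column.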

Unique

Unique: 31
Unique (%): 100.0%

Sample

1st row: 중부
2nd row: 종로
3rd row: 남대문
4th row: 서대문
5th row: 혜화

Value  Count  Frequency (%)
중부  1  3.2%
중랑  1  3.2%
도봉  1  3.2%
은평  1  3.2%
방배  1  3.2%
노원  1  3.2%
송파  1  3.2%
양천  1  3.2%
서초  1  3.2%
구로  1  3.2%
Other values (21)  21  67.7%

Most occurring characters

Value              Count  Frequency (%)
                   5      7.6%
                   4      6.1%
                   4      6.1%
                   3      4.5%
                   3      4.5%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
Other values (33)  37     56.1%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  66  100.0%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                   5      7.6%
                   4      6.1%
                   4      6.1%
                   3      4.5%
                   3      4.5%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
Other values (33)  37     56.1%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  66  100.0%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                   5      7.6%
                   4      6.1%
                   4      6.1%
                   3      4.5%
                   3      4.5%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
Other values (33)  37     56.1%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  66  100.0%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
                   5      7.6%
                   4      6.1%
                   4      6.1%
                   3      4.5%
                   3      4.5%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
                   2      3.0%
Other values (33)  37     56.1%

살인 (murder)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Memory size: 380.0 B

Value  Count
<NA>  30
1  1

Length

Max length: 4
Median length: 4
Mean length: 3.9032258
Min length: 1

Unique

Unique: 1
Unique (%): 3.2%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>  30  96.8%
1  1  3.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na  30  96.8%
1  1  3.2%
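
The 79.4% imbalance reported for 살인 can be reproduced from the two value counts with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. Whether ydata-profiling computes the alert in exactly this way is an assumption; the sketch only shows that this formula yields the same figure. The function name imbalance_score is illustrative.

```python
import math

# Value counts for 살인 as listed under "Common Values" in this report.
counts = {"<NA>": 30, "1": 1}


def imbalance_score(value_counts: dict) -> float:
    """Return 1 - H/log2(k): 0 for a uniform split, near 1 for one dominant class."""
    total = sum(value_counts.values())
    k = len(value_counts)
    if k < 2:
        return 0.0
    # Shannon entropy in bits over the observed class frequencies.
    entropy = -sum(
        (c / total) * math.log2(c / total)
        for c in value_counts.values()
        if c > 0
    )
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"{score:.1%}")  # 79.4%
```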

강도 (robbery)
Real number (ℝ)

MISSING 

Distinct: 6
Distinct (%): 50.0%
Missing: 19
Missing (%): 61.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4166667
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1.75
Median: 2.5
Q3: 4.5
95-th percentile: 8
Maximum: 8
Range: 7
Interquartile range (IQR): 2.75

Descriptive statistics

Standard deviation: 2.5746433
Coefficient of variation (CV): 0.75355412
Kurtosis: -0.29767993
Mean: 3.4166667
Median Absolute Deviation (MAD): 1.5
Skewness: 1.0162407
Sum: 41
Variance: 6.6287879
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values
Value  Count  Frequency (%)
2  3  9.7%
1  3  9.7%
3  2  6.5%
8  2  6.5%
4  1  3.2%
6  1  3.2%
(Missing)  19  61.3%

Minimum values
Value  Count  Frequency (%)
1  3  9.7%
2  3  9.7%
3  2  6.5%
4  1  3.2%
6  1  3.2%
8  2  6.5%

Maximum values
Value  Count  Frequency (%)
8  2  6.5%
6  1  3.2%
4  1  3.2%
3  2  6.5%
2  3  9.7%
1  3  9.7%
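
Because the frequency tables for 강도 list every non-missing value, the summary statistics above can be checked by expanding the counts back into the 12 observations. The sketch below uses Python's standard statistics module. Its "inclusive" quantile method happens to reproduce the reported quartiles here, but that the report uses the same interpolation in general is an assumption.

```python
import statistics

# Value -> count for 강도, from the frequency tables above (19 missing rows excluded).
value_counts = {1: 3, 2: 3, 3: 2, 4: 1, 6: 1, 8: 2}
N_ROWS = 31

# Expand the counts into the individual non-missing observations.
data = sorted(v for v, c in value_counts.items() for _ in range(c))

mean = statistics.mean(data)
std = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
mad = statistics.median(abs(x - median) for x in data)

print(len(data), N_ROWS - len(data))   # non-missing and missing row counts
print(round(mean, 7), round(std, 7))   # mean and standard deviation
print(q1, median, q3, q3 - q1)         # quartiles and IQR
print(mad, round(std / mean, 8))       # MAD and coefficient of variation
print(f"{len(value_counts) / len(data):.1%}")  # distinct share of non-missing values
```

The Frequency (%) column in the tables is relative to all 31 rows (for example 3 / 31 = 9.7%), whereas Distinct (%) is relative to the 12 non-missing values (6 / 12 = 50.0%).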

강간-추행 (rape and indecent assault)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 13
Distinct (%): 44.8%
Missing: 2
Missing (%): 6.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.3448276
Minimum: 1
Maximum: 15
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
Median: 6
Q3: 9
95-th percentile: 12.2
Maximum: 15
Range: 14
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.7059398
Coefficient of variation (CV): 0.58408835
Kurtosis: -0.42043917
Mean: 6.3448276
Median Absolute Deviation (MAD): 3
Skewness: 0.35347023
Sum: 184
Variance: 13.73399
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=13)

Common values
Value  Count  Frequency (%)
2  3  9.7%
1  3  9.7%
10  3  9.7%
4  3  9.7%
7  3  9.7%
9  3  9.7%
5  3  9.7%
8  2  6.5%
6  2  6.5%
3  1  3.2%
Other values (3)  3  9.7%
(Missing)  2  6.5%

Minimum values
Value  Count  Frequency (%)
1  3  9.7%
2  3  9.7%
3  1  3.2%
4  3  9.7%
5  3  9.7%
6  2  6.5%
7  3  9.7%
8  2  6.5%
9  3  9.7%
10  3  9.7%

Maximum values
Value  Count  Frequency (%)
15  1  3.2%
13  1  3.2%
11  1  3.2%
10  3  9.7%
9  3  9.7%
8  2  6.5%
7  3  9.7%
6  2  6.5%
5  3  9.7%
4  3  9.7%

절도 (theft)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 27
Distinct (%): 87.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.548387
Minimum: 8
Maximum: 132
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 8
5-th percentile: 12.5
Q1: 26
Median: 58
Q3: 95
95-th percentile: 125.5
Maximum: 132
Range: 124
Interquartile range (IQR): 69

Descriptive statistics

Standard deviation: 37.596843
Coefficient of variation (CV): 0.6209388
Kurtosis: -1.0669205
Mean: 60.548387
Median Absolute Deviation (MAD): 32
Skewness: 0.40547893
Sum: 1877
Variance: 1413.5226
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=27)

Common values
Value  Count  Frequency (%)
26  3  9.7%
34  2  6.5%
95  2  6.5%
19  1  3.2%
128  1  3.2%
105  1  3.2%
98  1  3.2%
44  1  3.2%
16  1  3.2%
111  1  3.2%
Other values (17)  17  54.8%

Minimum values
Value  Count  Frequency (%)
8  1  3.2%
9  1  3.2%
16  1  3.2%
19  1  3.2%
21  1  3.2%
25  1  3.2%
26  3  9.7%
34  2  6.5%
39  1  3.2%
43  1  3.2%

Maximum values
Value  Count  Frequency (%)
132  1  3.2%
128  1  3.2%
123  1  3.2%
111  1  3.2%
105  1  3.2%
98  1  3.2%
97  1  3.2%
95  2  6.5%
87  1  3.2%
76  1  3.2%

폭력 (violence)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 27
Distinct (%): 90.0%
Missing: 1
Missing (%): 3.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.066667
Minimum: 7
Maximum: 207
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 7
5-th percentile: 17.45
Q1: 34.5
Median: 65.5
Q3: 102.25
95-th percentile: 139.3
Maximum: 207
Range: 200
Interquartile range (IQR): 67.75

Descriptive statistics

Standard deviation: 46.056886
Coefficient of variation (CV): 0.61354644
Kurtosis: 0.68722893
Mean: 75.066667
Median Absolute Deviation (MAD): 36
Skewness: 0.75061351
Sum: 2252
Variance: 2121.2368
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=27)

Common values
Value  Count  Frequency (%)
65  3  9.7%
27  2  6.5%
68  1  3.2%
66  1  3.2%
115  1  3.2%
17  1  3.2%
207  1  3.2%
121  1  3.2%
91  1  3.2%
142  1  3.2%
Other values (17)  17  54.8%

Minimum values
Value  Count  Frequency (%)
7  1  3.2%
17  1  3.2%
18  1  3.2%
26  1  3.2%
27  2  6.5%
28  1  3.2%
31  1  3.2%
45  1  3.2%
51  1  3.2%
52  1  3.2%

Maximum values
Value  Count  Frequency (%)
207  1  3.2%
142  1  3.2%
136  1  3.2%
129  1  3.2%
121  1  3.2%
117  1  3.2%
115  1  3.2%
103  1  3.2%
100  1  3.2%
99  1  3.2%

Interactions

[Figures: 16 pairwise interaction plots]

Correlations

[Figure: correlation heatmap]
          구분     강도     강간-추행  절도     폭력
구분       1.000  1.000  1.000     1.000  1.000
강도       1.000  1.000  0.777     0.768  0.930
강간-추행   1.000  0.777  1.000     0.852  0.678
절도       1.000  0.768  0.852     1.000  0.744
폭력       1.000  0.930  0.678     0.744  1.000

[Figure: correlation heatmap]
          강도     강간-추행  절도     폭력     살인
강도       1.000  0.028     0.235  0.399  0.000
강간-추행   0.028  1.000     0.660  0.654  NaN
절도       0.235  0.660     1.000  0.747  NaN
폭력       0.399  0.654     0.747  1.000  NaN
살인       0.000  NaN       NaN    NaN    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
    구분    살인    강도    강간-추행  절도   폭력
0   중부    <NA>  <NA>  2        19   31
1   종로    <NA>  4     <NA>     21   18
2   남대문  <NA>  <NA>  1        9    <NA>
3   서대문  <NA>  2     10       47   88
4   혜화    <NA>  <NA>  1        8    7
5   용산    <NA>  <NA>  4        26   26
6   성북    <NA>  <NA>  8        34   51
7   동대문  <NA>  1     7        43   52
8   마포    <NA>  <NA>  9        95   99
9   영등포  <NA>  6     3        70   84

Last rows
    구분    살인    강도    강간-추행  절도   폭력
21  종암    <NA>  <NA>  5        25   27
22  구로    <NA>  8     13       64   142
23  서초    <NA>  1     7        97   65
24  양천    <NA>  <NA>  6        87   91
25  송파    <NA>  8     8        123  121
26  노원    <NA>  2     15       111  207
27  방배    <NA>  <NA>  2        16   17
28  은평    <NA>  <NA>  5        44   115
29  도봉    <NA>  <NA>  9        98   65
30  수서    <NA>  <NA>  <NA>     105  66