Overview

Dataset statistics

Number of variables: 6
Number of observations: 31
Missing cells: 27
Missing cells (%): 14.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.7 KiB
Average record size in memory: 57.3 B
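
The percentage and size figures above follow from simple arithmetic on the reported counts. A minimal sketch, assuming the 1.7 KiB total is a rounded value (the variable names are illustrative and not part of the report):

```python
# Counts copied from the report; the names are illustrative only.
n_variables = 6
n_observations = 31
missing_cells = 27

total_cells = n_variables * n_observations        # 186 cells in total
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

# The report gives 57.3 B per record; multiplying back by the row count
# should land near the rounded 1.7 KiB total.
avg_record_bytes = 57.3
total_kib = avg_record_bytes * n_observations / 1024

print(f"{missing_pct:.1f}%")    # 14.5%
print(f"{total_kib:.1f} KiB")   # 1.7 KiB
```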

Variable types

Text: 1
Categorical: 1
Numeric: 4

Dataset

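Description: Status of the five major juvenile crimes by police station under the Seoul Metropolitan Police Agency in 2021, providing statistics on murder, robbery, indecent assault, and related offenses.
Author: Korean National Police Agency, Seoul Metropolitan Police Agency
URL: https://www.data.go.kr/data/3075889/fileData.do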

Alerts

High correlation: 강간-추행 is highly overall correlated with 폭력
High correlation: 절도 is highly overall correlated with 폭력
High correlation: 폭력 is highly overall correlated with 강간-추행 and 1 other field
Imbalance: 살인 is highly imbalanced (79.4%)
Missing: 강도 has 22 (71.0%) missing values
Missing: 강간-추행 has 5 (16.1%) missing values
Unique: 구분 has unique values
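
The 79.4% imbalance figure for 살인 can be reproduced with an entropy-based score: one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy. The sketch below assumes that this is the formula behind the alert; the function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of
    the class frequencies and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# 살인 has two distinct values in the report: "<NA>" (30 rows) and "1" (1 row).
score = imbalance_score([30, 1])
print(f"{score:.1%}")  # 79.4%
```

A perfectly balanced column scores 0 and a column dominated by one value approaches 1, which is why a 30-to-1 split triggers the alert.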

Reproduction

Analysis started: 2023-12-12 08:49:35.427610
Analysis finished: 2023-12-12 08:49:38.016502
Duration: 2.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

구분
Text

UNIQUE 

Distinct: 31
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 380.0 B

Length

Max length: 3
Median length: 2
Mean length: 2.1290323
Min length: 2

Characters and Unicode

Total characters: 66
Distinct characters: 43
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 100.0%

Sample

1st row: 중부
2nd row: 종로
3rd row: 남대문
4th row: 서대문
5th row: 혜화

Value               Count  Frequency (%)
중부                1      3.2%
중랑                1      3.2%
도봉                1      3.2%
은평                1      3.2%
방배                1      3.2%
노원                1      3.2%
송파                1      3.2%
양천                1      3.2%
서초                1      3.2%
구로                1      3.2%
Other values (21)   21     67.7%

Most occurring characters

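Value               Count  Frequency (%)
(not preserved)     5      7.6%
(not preserved)     4      6.1%
(not preserved)     4      6.1%
(not preserved)     3      4.5%
(not preserved)     3      4.5%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
Other values (33)   37     56.1%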

Most occurring categories

Value               Count  Frequency (%)
Other Letter        66     100.0%

Most frequent character per category

Other Letter
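Value               Count  Frequency (%)
(not preserved)     5      7.6%
(not preserved)     4      6.1%
(not preserved)     4      6.1%
(not preserved)     3      4.5%
(not preserved)     3      4.5%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
Other values (33)   37     56.1%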

Most occurring scripts

Value               Count  Frequency (%)
Hangul              66     100.0%

Most frequent character per script

Hangul
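Value               Count  Frequency (%)
(not preserved)     5      7.6%
(not preserved)     4      6.1%
(not preserved)     4      6.1%
(not preserved)     3      4.5%
(not preserved)     3      4.5%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
Other values (33)   37     56.1%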

Most occurring blocks

Value               Count  Frequency (%)
Hangul              66     100.0%

Most frequent character per block

Hangul
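Value               Count  Frequency (%)
(not preserved)     5      7.6%
(not preserved)     4      6.1%
(not preserved)     4      6.1%
(not preserved)     3      4.5%
(not preserved)     3      4.5%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
(not preserved)     2      3.0%
Other values (33)   37     56.1%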

살인
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Memory size: 380.0 B
Value counts: <NA> 30, 1 1

Length

Max length: 4
Median length: 4
Mean length: 3.9032258
Min length: 1
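
The length statistics suggest that the `<NA>` marker was profiled as a four-character string rather than as a null, which would also explain why this column reports 0 missing values. A minimal sketch of the arithmetic, assuming that interpretation (the dictionary below is built from the value counts listed for this variable):

```python
# Value counts for 살인 as listed in the report: the literal string "<NA>"
# (4 characters) appears 30 times and "1" (1 character) appears once.
value_counts = {"<NA>": 30, "1": 1}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows

print(round(mean_length, 7))  # 3.9032258
```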

Unique

Unique: 1
Unique (%): 3.2%

Sample

1st row: 1
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value   Count  Frequency (%)
<NA>    30     96.8%
1       1      3.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
na      30     96.8%
1       1      3.2%

강도
Real number (ℝ)

MISSING 

Distinct: 6
Distinct (%): 66.7%
Missing: 22
Missing (%): 71.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4444444
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 5
95-th percentile: 7.6
Maximum: 8
Range: 7
Interquartile range (IQR): 3
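
The quantiles above are consistent with linear interpolation between the closest ranks of the nine non-missing values. The sketch below assumes that method; the `quantile` helper is hypothetical, and the value list is expanded from the value counts given for this variable.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two closest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# The nine non-missing 강도 values (1 twice, 2 three times, then 3, 5, 7, 8).
values = sorted([1, 1, 2, 2, 2, 3, 5, 7, 8])

q1, median, q3 = (quantile(values, q) for q in (0.25, 0.5, 0.75))
p95 = quantile(values, 0.95)
iqr = q3 - q1

print(q1, median, q3, round(p95, 1), iqr)
```

With only nine observations, the 95-th percentile falls between the two largest values (7 and 8), which is why it is a non-integer.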

Descriptive statistics

Standard deviation: 2.6034166
Coefficient of variation (CV): 0.75583061
Kurtosis: -0.60139363
Mean: 3.4444444
Median Absolute Deviation (MAD): 1
Skewness: 0.95555393
Sum: 31
Variance: 6.7777778
Monotonicity: Not monotonic
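
Several of these statistics can be checked with the Python standard library. The sketch below assumes the variance uses the sample (n - 1) denominator, which matches the reported value; the value list is expanded from the value counts given for this variable.

```python
import statistics

# The nine non-missing 강도 values (1 twice, 2 three times, then 3, 5, 7, 8).
values = [1, 1, 2, 2, 2, 3, 5, 7, 8]

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

print(total, round(mean, 7), round(variance, 7), round(std, 7), round(cv, 8), mad)
```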
[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value       Count  Frequency (%)
2           3      9.7%
1           2      6.5%
7           1      3.2%
5           1      3.2%
3           1      3.2%
8           1      3.2%
(Missing)   22     71.0%

Minimum values

Value       Count  Frequency (%)
1           2      6.5%
2           3      9.7%
3           1      3.2%
5           1      3.2%
7           1      3.2%
8           1      3.2%

Maximum values

Value       Count  Frequency (%)
8           1      3.2%
7           1      3.2%
5           1      3.2%
3           1      3.2%
2           3      9.7%
1           2      6.5%

강간-추행
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 14
Distinct (%): 53.8%
Missing: 5
Missing (%): 16.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.5769231
Minimum: 1
Maximum: 15
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1.25
Q1: 3.25
Median: 6
Q3: 9
95-th percentile: 13.5
Maximum: 15
Range: 14
Interquartile range (IQR): 5.75

Descriptive statistics

Standard deviation: 3.8902244
Coefficient of variation (CV): 0.59149611
Kurtosis: -0.44779632
Mean: 6.5769231
Median Absolute Deviation (MAD): 3
Skewness: 0.51903492
Sum: 171
Variance: 15.133846
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=14)]

Common values

Value              Count  Frequency (%)
6                  4      12.9%
3                  3      9.7%
5                  3      9.7%
1                  2      6.5%
7                  2      6.5%
11                 2      6.5%
2                  2      6.5%
9                  2      6.5%
4                  1      3.2%
8                  1      3.2%
Other values (4)   4      12.9%
(Missing)          5      16.1%

Minimum values

Value              Count  Frequency (%)
1                  2      6.5%
2                  2      6.5%
3                  3      9.7%
4                  1      3.2%
5                  3      9.7%
6                  4      12.9%
7                  2      6.5%
8                  1      3.2%
9                  2      6.5%
10                 1      3.2%

Maximum values

Value              Count  Frequency (%)
15                 1      3.2%
14                 1      3.2%
12                 1      3.2%
11                 2      6.5%
10                 1      3.2%
9                  2      6.5%
8                  1      3.2%
7                  2      6.5%
6                  4      12.9%
5                  3      9.7%

절도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 25
Distinct (%): 80.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.83871
Minimum: 4
Maximum: 134
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 4
5-th percentile: 11.5
Q1: 36.5
Median: 52
Q3: 65
95-th percentile: 111
Maximum: 134
Range: 130
Interquartile range (IQR): 28.5

Descriptive statistics

Standard deviation: 30.21379
Coefficient of variation (CV): 0.56119083
Kurtosis: 0.91403273
Mean: 53.83871
Median Absolute Deviation (MAD): 15
Skewness: 0.70357378
Sum: 1669
Variance: 912.87312
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=25)]

Common values

Value               Count  Frequency (%)
48                  3      9.7%
62                  3      9.7%
13                  2      6.5%
60                  2      6.5%
37                  1      3.2%
39                  1      3.2%
101                 1      3.2%
88                  1      3.2%
20                  1      3.2%
121                 1      3.2%
Other values (15)   15     48.4%

Minimum values

Value               Count  Frequency (%)
4                   1      3.2%
10                  1      3.2%
13                  2      6.5%
20                  1      3.2%
22                  1      3.2%
30                  1      3.2%
36                  1      3.2%
37                  1      3.2%
39                  1      3.2%
45                  1      3.2%

Maximum values

Value               Count  Frequency (%)
134                 1      3.2%
121                 1      3.2%
101                 1      3.2%
88                  1      3.2%
77                  1      3.2%
72                  1      3.2%
67                  1      3.2%
66                  1      3.2%
64                  1      3.2%
62                  3      9.7%

폭력
Real number (ℝ)

HIGH CORRELATION 

Distinct: 28
Distinct (%): 90.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 65.935484
Minimum: 2
Maximum: 147
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 411.0 B

Quantile statistics

Minimum: 2
5-th percentile: 9
Q1: 30
Median: 70
Q3: 94.5
95-th percentile: 124.5
Maximum: 147
Range: 145
Interquartile range (IQR): 64.5

Descriptive statistics

Standard deviation: 38.746127
Coefficient of variation (CV): 0.58763696
Kurtosis: -0.61133482
Mean: 65.935484
Median Absolute Deviation (MAD): 27
Skewness: 0.10805597
Sum: 2044
Variance: 1501.2624
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=28)]

Common values

Value               Count  Frequency (%)
102                 2      6.5%
20                  2      6.5%
94                  2      6.5%
24                  1      3.2%
147                 1      3.2%
39                  1      3.2%
90                  1      3.2%
18                  1      3.2%
97                  1      3.2%
74                  1      3.2%
Other values (18)   18     58.1%

Minimum values

Value               Count  Frequency (%)
2                   1      3.2%
4                   1      3.2%
14                  1      3.2%
18                  1      3.2%
20                  2      6.5%
24                  1      3.2%
29                  1      3.2%
31                  1      3.2%
39                  1      3.2%
51                  1      3.2%

Maximum values

Value               Count  Frequency (%)
147                 1      3.2%
144                 1      3.2%
105                 1      3.2%
102                 2      6.5%
97                  1      3.2%
96                  1      3.2%
95                  1      3.2%
94                  2      6.5%
90                  1      3.2%
85                  1      3.2%

Interactions

[Figures: 16 interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap]

            구분     강도     강간-추행  절도     폭력
구분        1.000   1.000   1.000     1.000   1.000
강도        1.000   1.000   0.839     0.457   0.573
강간-추행   1.000   0.839   1.000     0.819   0.296
절도        1.000   0.457   0.819     1.000   0.699
폭력        1.000   0.573   0.296     0.699   1.000

[Figure: correlation heatmap]

            강도     강간-추행  절도     폭력     살인
강도        1.000   -0.057    0.462   0.366   NaN
강간-추행   -0.057  1.000     0.444   0.702   0.000
절도        0.462   0.444     1.000   0.688   NaN
폭력        0.366   0.702     0.688   1.000   NaN
살인        NaN     0.000     NaN     NaN     1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

     구분     살인    강도    강간-추행  절도   폭력
0    중부     1      1      <NA>      37    29
1    종로     <NA>   <NA>   4         4     14
2    남대문   <NA>   <NA>   1         10    2
3    서대문   <NA>   <NA>   6         64    66
4    혜화     <NA>   <NA>   <NA>      13    4
5    용산     <NA>   <NA>   <NA>      52    20
6    성북     <NA>   <NA>   3         22    69
7    동대문   <NA>   7      7         30    70
8    마포     <NA>   2      11        72    85
9    영등포   <NA>   5      2         62    52

Last rows

     구분     살인    강도    강간-추행  절도   폭력
21   종암     <NA>   1      <NA>      20    31
22   구로     <NA>   <NA>   9         48    95
23   서초     <NA>   2      3         60    74
24   양천     <NA>   <NA>   9         88    94
25   송파     <NA>   <NA>   14        101   97
26   노원     <NA>   <NA>   15        62    102
27   방배     <NA>   <NA>   1         13    18
28   은평     <NA>   <NA>   6         62    90
29   도봉     <NA>   <NA>   6         39    39
30   수서     <NA>   <NA>   11        48    147