Overview

Dataset statistics

Number of variables: 9
Number of observations: 8751
Missing cells: 7698
Missing cells (%): 9.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 624.0 KiB
Average record size in memory: 73.0 B
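
The derived figures in the statistics above follow from the raw counts. A minimal Python sketch of that arithmetic, using only the numbers reported in this section; the variable names are illustrative, and the one-decimal rounding is an assumption about how the report displays values:

```python
# Figures copied from the "Dataset statistics" section of this report.
n_variables = 9
n_observations = 8751
missing_cells = 7698
total_size_bytes = 624.0 * 1024  # 624.0 KiB expressed in bytes

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_bytes / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both printed values agree with the reported 9.8% and 73.0 B.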

Variable types

Numeric: 1
Categorical: 7
Text: 1

Dataset

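Description: Demographic data (gender, region, age group, gambling type, etc.) on clients who contacted the prevention and treatment center (예방치유원) in 2022 because of addiction or gambling problems arising from the gambling industry or the illegal gambling industry.
URL: https://www.data.go.kr/data/15116633/fileData.do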

Alerts

성별 is highly overall correlated with 도박자와의 관계 (High correlation)
도박자와의 관계 is highly overall correlated with 성별 (High correlation)
1차 도박 유형 is highly overall correlated with 합불법 여부 (High correlation)
합불법 여부 is highly overall correlated with 1차 도박 유형 (High correlation)
도박자와의 관계 is highly imbalanced (61.3%) (Imbalance)
온오프라인 여부 is highly imbalanced (69.8%) (Imbalance)
기타 has 7698 (88.0%) missing values (Missing)
순번 has unique values (Unique)
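
The imbalance percentages in the alerts can be reproduced from the category counts listed later in this report. The sketch below assumes the score is one minus the normalized Shannon entropy of the class frequencies; that formula is an assumption, chosen because it matches the reported 61.3% and 69.8%. The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    k = len(counts)
    if k <= 1:
        return 0.0  # assumption: a single class is treated as score 0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Counts copied from the "Common Values" tables of the two flagged variables.
relation_counts = [6815, 1193, 529, 120, 54, 32, 8]  # 도박자와의 관계
online_counts = [8013, 585, 153]                      # 온오프라인 여부

print(f"{100 * imbalance_score(relation_counts):.1f}%")  # 61.3%
print(f"{100 * imbalance_score(online_counts):.1f}%")    # 69.8%
```

Because the entropy is divided by its maximum, the choice of logarithm base does not affect the result.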

Reproduction

Analysis started: 2023-12-12 12:43:16.539215
Analysis finished: 2023-12-12 12:43:17.939779
Duration: 1.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

순번
Real number (ℝ)

UNIQUE 

Distinct: 8751
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4376
Minimum: 1
Maximum: 8751
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 438.5
Q1: 2188.5
Median: 4376
Q3: 6563.5
95-th percentile: 8313.5
Maximum: 8751
Range: 8750
Interquartile range (IQR): 4375

Descriptive statistics

Standard deviation: 2526.3404
Coefficient of variation (CV): 0.57731728
Kurtosis: -1.2
Mean: 4376
Median Absolute Deviation (MAD): 2188
Skewness: 0
Sum: 38294376
Variance: 6382396
Monotonicity: Strictly increasing
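
The figures reported for 순번 are what a plain running index would produce. As a cross-check, the sketch below assumes the column holds exactly the consecutive integers 1 to 8751 (an assumption, though it is consistent with the reported minimum, maximum, distinct count and sum) and recomputes several statistics with Python's standard library:

```python
import statistics

# Assumption: 순번 is the running index 1, 2, ..., 8751.
values = list(range(1, 8752))

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)          # sample standard deviation
cv = std / mean                         # coefficient of variation
mad = statistics.median(abs(v - median) for v in values)
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print(total, mean, median, variance, mad)
print(round(std, 4), round(cv, 8))
print(q1, q3, q3 - q1)
```

Under that assumption the sum, mean, median, variance, MAD, quartiles and IQR agree with the values listed above, which suggests the report uses the sample (n - 1) variance.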
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
5838 | 1 | < 0.1%
5832 | 1 | < 0.1%
5833 | 1 | < 0.1%
5834 | 1 | < 0.1%
5835 | 1 | < 0.1%
5836 | 1 | < 0.1%
5837 | 1 | < 0.1%
5839 | 1 | < 0.1%
5881 | 1 | < 0.1%
Other values (8741) | 8741 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8751 | 1 | < 0.1%
8750 | 1 | < 0.1%
8749 | 1 | < 0.1%
8748 | 1 | < 0.1%
8747 | 1 | < 0.1%
8746 | 1 | < 0.1%
8745 | 1 | < 0.1%
8744 | 1 | < 0.1%
8743 | 1 | < 0.1%
8742 | 1 | < 0.1%

성별
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.5 KiB
6889 
1862 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 6889 | 78.7%
 | 1862 | 21.3%

Length

Histogram of lengths of the category

연령대
Categorical

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.5 KiB

30대: 2742
20대: 2164
40대: 1685
50대: 1170
60대: 526
Other values (3): 464

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50대
2nd row: 20대
3rd row: 20대
4th row: 20대
5th row: 30대

Common Values

Value | Count | Frequency (%)
30대 | 2742 | 31.3%
20대 | 2164 | 24.7%
40대 | 1685 | 19.3%
50대 | 1170 | 13.4%
60대 | 526 | 6.0%
10대 | 391 | 4.5%
70대 | 70 | 0.8%
80대 | 3 | < 0.1%

Length

Histogram of lengths of the category

지역
Categorical

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 68.5 KiB

경기: 2209
서울: 1086
부산: 670
인천: 638
충남: 430
Other values (13): 3718

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전북
2nd row: 경남
3rd row: 인천
4th row: 전남
5th row: 전북

Common Values

Value | Count | Frequency (%)
경기 | 2209 | 25.2%
서울 | 1086 | 12.4%
부산 | 670 | 7.7%
인천 | 638 | 7.3%
충남 | 430 | 4.9%
경북 | 424 | 4.8%
경남 | 421 | 4.8%
충북 | 389 | 4.4%
대구 | 389 | 4.4%
대전 | 385 | 4.4%
Other values (8) | 1710 | 19.5%

Length

Histogram of lengths of the category

도박자와의 관계
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.5 KiB

본인: 6815
부모: 1193
배우자: 529
형제자매: 120
자녀: 54
Other values (2): 40

Length

Max length: 4
Median length: 2
Mean length: 2.0887899
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부모
2nd row: 본인
3rd row: 본인
4th row: 본인
5th row: 배우자

Common Values

Value | Count | Frequency (%)
본인 | 6815 | 77.9%
부모 | 1193 | 13.6%
배우자 | 529 | 6.0%
형제자매 | 120 | 1.4%
자녀 | 54 | 0.6%
지인 | 32 | 0.4%
친인척 | 8 | 0.1%

Length

Histogram of lengths of the category

1차 도박 유형
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 68.5 KiB

스포츠도박: 3448
기타: 1232
카드: 1144
주식: 835
미니게임/사다리게임: 823
Other values (9): 1269

Length

Max length: 10
Median length: 5
Mean length: 4.0417095
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 스포츠도박
2nd row: 스포츠도박
3rd row: 미니게임/사다리게임
4th row: 미니게임/사다리게임
5th row: 스포츠도박

Common Values

Value | Count | Frequency (%)
스포츠도박 | 3448 | 39.4%
기타 | 1232 | 14.1%
카드 | 1144 | 13.1%
주식 | 835 | 9.5%
미니게임/사다리게임 | 823 | 9.4%
카지노 | 688 | 7.9%
모름 | 309 | 3.5%
성인오락 | 124 | 1.4%
경마 | 58 | 0.7%
화투 | 37 | 0.4%
Other values (4) | 53 | 0.6%

Length

Histogram of lengths of the category

기타
Text

MISSING 

Distinct: 168
Distinct (%): 16.0%
Missing: 7698
Missing (%): 88.0%
Memory size: 68.5 KiB

Length

Max length: 14
Median length: 4
Mean length: 4.2867996
Min length: 2

Characters and Unicode

Total characters: 4514
Distinct characters: 174
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 107
Unique (%): 10.2%

Sample

1st row: 가상화폐
2nd row: 바카라
3rd row: 가상화폐
4th row: 가상화폐
5th row: 가상화폐

Value | Count | Frequency (%)
가상화폐 | 490 | 43.8%
비트코인 | 82 | 7.3%
파워볼 | 70 | 6.3%
코인 | 51 | 4.6%
바카라 | 28 | 2.5%
해외선물 | 22 | 2.0%
선물 | 15 | 1.3%
가상화폐-선물거래 | 13 | 1.2%
게임 | 13 | 1.2%
선물옵션 | 13 | 1.2%
Other values (141) | 321 | 28.7%

Most occurring characters

Value | Count | Frequency (%)
 | 546 | 12.1%
 | 546 | 12.1%
 | 541 | 12.0%
 | 537 | 11.9%
 | 181 | 4.0%
 | 157 | 3.5%
 | 105 | 2.3%
 | 105 | 2.3%
 | 86 | 1.9%
 | 84 | 1.9%
Other values (164) | 1626 | 36.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4329 | 95.9%
Space Separator | 79 | 1.8%
Dash Punctuation | 27 | 0.6%
Uppercase Letter | 20 | 0.4%
Close Punctuation | 19 | 0.4%
Open Punctuation | 19 | 0.4%
Lowercase Letter | 10 | 0.2%
Other Punctuation | 9 | 0.2%
Decimal Number | 2 | < 0.1%

Most frequent character per category

Other Letter

Value | Count | Frequency (%)
 | 546 | 12.6%
 | 546 | 12.6%
 | 541 | 12.5%
 | 537 | 12.4%
 | 181 | 4.2%
 | 157 | 3.6%
 | 105 | 2.4%
 | 105 | 2.4%
 | 86 | 2.0%
 | 84 | 1.9%
Other values (144) | 1441 | 33.3%

Lowercase Letter

Value | Count | Frequency (%)
x | 3 | 30.0%
f | 2 | 20.0%
g | 1 | 10.0%
r | 1 | 10.0%
a | 1 | 10.0%
p | 1 | 10.0%
h | 1 | 10.0%

Uppercase Letter

Value | Count | Frequency (%)
F | 8 | 40.0%
X | 7 | 35.0%
M | 2 | 10.0%
R | 1 | 5.0%
P | 1 | 5.0%
G | 1 | 5.0%

Other Punctuation

Value | Count | Frequency (%)
, | 7 | 77.8%
' | 2 | 22.2%

Space Separator

Value | Count | Frequency (%)
(space) | 79 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 27 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 19 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 19 | 100.0%

Decimal Number

Value | Count | Frequency (%)
2 | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4329 | 95.9%
Common | 155 | 3.4%
Latin | 30 | 0.7%

Most frequent character per script

Hangul

Value | Count | Frequency (%)
 | 546 | 12.6%
 | 546 | 12.6%
 | 541 | 12.5%
 | 537 | 12.4%
 | 181 | 4.2%
 | 157 | 3.6%
 | 105 | 2.4%
 | 105 | 2.4%
 | 86 | 2.0%
 | 84 | 1.9%
Other values (144) | 1441 | 33.3%

Latin

Value | Count | Frequency (%)
F | 8 | 26.7%
X | 7 | 23.3%
x | 3 | 10.0%
M | 2 | 6.7%
f | 2 | 6.7%
R | 1 | 3.3%
P | 1 | 3.3%
G | 1 | 3.3%
g | 1 | 3.3%
r | 1 | 3.3%
Other values (3) | 3 | 10.0%

Common

Value | Count | Frequency (%)
(space) | 79 | 51.0%
- | 27 | 17.4%
) | 19 | 12.3%
( | 19 | 12.3%
, | 7 | 4.5%
' | 2 | 1.3%
2 | 2 | 1.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4329 | 95.9%
ASCII | 185 | 4.1%

Most frequent character per block

Hangul

Value | Count | Frequency (%)
 | 546 | 12.6%
 | 546 | 12.6%
 | 541 | 12.5%
 | 537 | 12.4%
 | 181 | 4.2%
 | 157 | 3.6%
 | 105 | 2.4%
 | 105 | 2.4%
 | 86 | 2.0%
 | 84 | 1.9%
Other values (144) | 1441 | 33.3%

ASCII

Value | Count | Frequency (%)
(space) | 79 | 42.7%
- | 27 | 14.6%
) | 19 | 10.3%
( | 19 | 10.3%
F | 8 | 4.3%
, | 7 | 3.8%
X | 7 | 3.8%
x | 3 | 1.6%
' | 2 | 1.1%
M | 2 | 1.1%
Other values (10) | 12 | 6.5%

온오프라인 여부
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.5 KiB

온라인: 8013
오프라인: 585
<NA>: 153

Length

Max length: 4
Median length: 3
Mean length: 3.0843332
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 온라인
2nd row: 온라인
3rd row: 온라인
4th row: 온라인
5th row: 온라인

Common Values

Value | Count | Frequency (%)
온라인 | 8013 | 91.6%
오프라인 | 585 | 6.7%
<NA> | 153 | 1.7%

Length

Histogram of lengths of the category

합불법 여부
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.5 KiB

불법: 7034
합법: 1561
<NA>: 156

Length

Max length: 4
Median length: 2
Mean length: 2.0356531
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 불법
2nd row: 불법
3rd row: 불법
4th row: 불법
5th row: 불법

Common Values

Value | Count | Frequency (%)
불법 | 7034 | 80.4%
합법 | 1561 | 17.8%
<NA> | 156 | 1.8%

Length

Histogram of lengths of the category

Interactions


Correlations

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 온오프라인 여부 | 합불법 여부
순번 | 1.000 | 0.051 | 0.074 | 0.092 | 0.050 | 0.059 | 0.033 | 0.050
성별 | 0.051 | 1.000 | 0.558 | 0.150 | 0.719 | 0.334 | 0.049 | 0.024
연령대 | 0.074 | 0.558 | 1.000 | 0.156 | 0.476 | 0.344 | 0.336 | 0.279
지역 | 0.092 | 0.150 | 0.156 | 1.000 | 0.159 | 0.132 | 0.112 | 0.099
도박자와의 관계 | 0.050 | 0.719 | 0.476 | 0.159 | 1.000 | 0.439 | 0.129 | 0.105
1차 도박 유형 | 0.059 | 0.334 | 0.344 | 0.132 | 0.439 | 1.000 | 0.612 | 0.821
온오프라인 여부 | 0.033 | 0.049 | 0.336 | 0.112 | 0.129 | 0.612 | 1.000 | 0.352
합불법 여부 | 0.050 | 0.024 | 0.279 | 0.099 | 0.105 | 0.821 | 0.352 | 1.000

 | 성별 | 지역 | 연령대 | 합불법 여부 | 1차 도박 유형 | 온오프라인 여부 | 도박자와의 관계
성별 | 1.000 | 0.118 | 0.421 | 0.015 | 0.261 | 0.031 | 0.778
지역 | 0.118 | 1.000 | 0.066 | 0.078 | 0.045 | 0.088 | 0.071
연령대 | 0.421 | 0.066 | 1.000 | 0.209 | 0.159 | 0.252 | 0.279
합불법 여부 | 0.015 | 0.078 | 0.209 | 1.000 | 0.668 | 0.229 | 0.112
1차 도박 유형 | 0.261 | 0.045 | 0.159 | 0.668 | 1.000 | 0.483 | 0.177
온오프라인 여부 | 0.031 | 0.088 | 0.252 | 0.229 | 0.483 | 1.000 | 0.138
도박자와의 관계 | 0.778 | 0.071 | 0.279 | 0.112 | 0.177 | 0.138 | 1.000

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 온오프라인 여부 | 합불법 여부
순번 | 1.000 | 0.039 | 0.035 | 0.035 | 0.025 | 0.024 | 0.026 | 0.039
성별 | 0.039 | 1.000 | 0.421 | 0.118 | 0.778 | 0.261 | 0.031 | 0.015
연령대 | 0.035 | 0.421 | 1.000 | 0.066 | 0.279 | 0.159 | 0.252 | 0.209
지역 | 0.035 | 0.118 | 0.066 | 1.000 | 0.071 | 0.045 | 0.088 | 0.078
도박자와의 관계 | 0.025 | 0.778 | 0.279 | 0.071 | 1.000 | 0.177 | 0.138 | 0.112
1차 도박 유형 | 0.024 | 0.261 | 0.159 | 0.045 | 0.177 | 1.000 | 0.483 | 0.668
온오프라인 여부 | 0.026 | 0.031 | 0.252 | 0.088 | 0.138 | 0.483 | 1.000 | 0.229
합불법 여부 | 0.039 | 0.015 | 0.209 | 0.078 | 0.112 | 0.668 | 0.229 | 1.000
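
The extracted text does not say which coefficient each correlation matrix reports. As an illustration only, the sketch below computes Cramér's V, a common measure of association between two categorical variables, from a contingency table. This is the uncorrected textbook form; whether it matches the report's exact computation is an assumption. The function name `cramers_v` and the toy tables are hypothetical and are not taken from this dataset.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows.

    Assumes at least two rows and two columns, each with a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Toy tables (made up for illustration, not from this dataset).
perfectly_associated = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]

print(cramers_v(perfectly_associated))
print(cramers_v(independent))
```

The score ranges from 0 (no association) to 1 (one variable fully determines the other), the same scale as the values in the matrices above.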

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

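First rows

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 기타 | 온오프라인 여부 | 합불법 여부
0 | 1 |  | 50대 | 전북 | 부모 | 스포츠도박 | <NA> | 온라인 | 불법
1 | 2 |  | 20대 | 경남 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
2 | 3 |  | 20대 | 인천 | 본인 | 미니게임/사다리게임 | <NA> | 온라인 | 불법
3 | 4 |  | 20대 | 전남 | 본인 | 미니게임/사다리게임 | <NA> | 온라인 | 불법
4 | 5 |  | 30대 | 전북 | 배우자 | 스포츠도박 | <NA> | 온라인 | 불법
5 | 6 |  | 40대 | 경북 | 배우자 | 기타 | 가상화폐 | 온라인 | 불법
6 | 7 |  | 10대 | 경남 | 본인 | 미니게임/사다리게임 | <NA> | 온라인 | 불법
7 | 8 |  | 20대 | 전남 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8 | 9 |  | 20대 | 충북 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
9 | 10 |  | 50대 | 전북 | 본인 | 카드 | <NA> | 오프라인 | 불법

Last rows

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 기타 | 온오프라인 여부 | 합불법 여부
8741 | 8742 |  | 50대 | 대구 | 본인 | 성인오락 | <NA> | 온라인 | 불법
8742 | 8743 |  | 30대 | 서울 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8743 | 8744 |  | 50대 | 강원 | 본인 | 기타 | 해외선물 | 온라인 | 불법
8744 | 8745 |  | 20대 | 경기 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8745 | 8746 |  | 20대 | 서울 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8746 | 8747 |  | 20대 | 인천 | 자녀 | 스포츠도박 | <NA> | 오프라인 | 합법
8747 | 8748 |  | 20대 | 전남 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8748 | 8749 |  | 30대 | 부산 | 본인 | 스포츠도박 | <NA> | 오프라인 | 불법
8749 | 8750 |  | 60대 | 인천 | 본인 | 카지노 | <NA> | 오프라인 | 합법
8750 | 8751 |  | 40대 | 전북 | 부모 | 카드 | <NA> | 온라인 | 불법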