Overview

Dataset statistics

Number of variables: 9
Number of observations: 8736
Missing cells: 7932
Missing cells (%): 10.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 622.9 KiB
Average record size in memory: 73.0 B
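
The derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch that uses only the numbers reported above; the variable names are illustrative, and 1 KiB is assumed to be 1024 bytes.

```python
# Cross-check the derived dataset statistics from the reported raw counts.
n_variables = 9
n_observations = 8736
missing_cells = 7932
total_size_kib = 622.9  # reported total size in memory (assumed 1 KiB = 1024 B)

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both derived values agree with the report after rounding to one decimal place (10.1% and 73.0 B).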

Variable types

Numeric: 1
Categorical: 7
Text: 1

Dataset

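Description: Demographic data (sex, region, age group, gambling type, etc.) on clients who contacted the prevention and treatment center because of addiction or gambling problems arising from the legal or illegal gambling industry
Author: 한국도박문제관리센터 (Korea Center on Gambling Problems)
URL: https://www.data.go.kr/data/15107961/fileData.do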

Alerts

도박자와의 관계 is highly overall correlated with 성별 and 3 other fields (High correlation)
합불법 여부 is highly overall correlated with 도박자와의 관계 and 1 other field (High correlation)
온오프라인 여부 is highly overall correlated with 도박자와의 관계 (High correlation)
1차 도박 유형 is highly overall correlated with 도박자와의 관계 and 1 other field (High correlation)
성별 is highly overall correlated with 도박자와의 관계 (High correlation)
성별 is highly imbalanced (52.6%) (Imbalance)
도박자와의 관계 is highly imbalanced (65.0%) (Imbalance)
기타 has 7931 (90.8%) missing values (Missing)
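
The report does not say how the imbalance percentages are computed. One score that comes out close to both reported figures is one minus the normalized Shannon entropy of the class counts. The sketch below is an assumption about the metric, not a statement of the tool's implementation; the function name `imbalance_score` is hypothetical, and the counts are copied from the Common Values tables in this report.

```python
from math import log2


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (bits) of k classes."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is scored as 0 in this sketch
    total = sum(counts)
    entropy = -sum((c / total) * log2(c / total) for c in counts)
    return 1 - entropy / log2(k)


# Class counts copied from the Common Values tables of this report.
sex_counts = [6866, 1869, 1]
relationship_counts = [6783, 1180, 525, 142, 49, 41, 13, 2, 1]

sex_score = imbalance_score(sex_counts)
relationship_score = imbalance_score(relationship_counts)

print(f"성별: {sex_score:.3f}")
print(f"도박자와의 관계: {relationship_score:.3f}")
```

A score of 0 means the classes are perfectly balanced and a score near 1 means a single class dominates. Note that the single `<NA>` entry counts as its own class here, which raises the score.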

Reproduction

Analysis started: 2023-12-12 15:38:21.636151
Analysis finished: 2023-12-12 15:38:23.485222
Duration: 1.85 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

순번 (serial number)
Real number (ℝ)

Distinct: 8735
Distinct (%): 100.0%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 4368
Minimum: 1
Maximum: 8735
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 76.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 437.7
Q1: 2184.5
Median: 4368
Q3: 6551.5
95-th percentile: 8298.3
Maximum: 8735
Range: 8734
Interquartile range (IQR): 4367
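
These quantiles are consistent with linear interpolation between neighbouring ranks. The sketch below illustrates that calculation under the assumption that 순번 holds exactly the integers 1 to 8735, which matches the reported minimum, maximum, and distinct count. The helper `percentile` is written for this example only.

```python
import math


def percentile(sorted_values, q):
    """Percentile by linear interpolation between the two nearest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


values = list(range(1, 8736))  # assumption: 순번 is exactly 1..8735

p05 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
median = percentile(values, 0.50)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)
iqr = q3 - q1

print(round(p05, 1), q1, median, q3, round(p95, 1), iqr)
```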

Descriptive statistics

Standard deviation: 2521.7216
Coefficient of variation (CV): 0.57731722
Kurtosis: -1.2
Mean: 4368
Median Absolute Deviation (MAD): 2184
Skewness: 0
Sum: 38154480
Variance: 6359080
Monotonicity: Strictly increasing
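
Because 순번 is a running index, most of these statistics can be reproduced with the standard library. The following sketch assumes the column holds exactly the integers 1 to 8735, and that the reported variance uses the sample (n - 1) denominator; the reported value matches that convention rather than the population one.

```python
import statistics

values = list(range(1, 8736))  # assumption: 순번 is exactly 1..8735

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)      # square root of the sample variance
cv = std_dev / mean                     # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median of the distances from the median.
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, variance, round(std_dev, 4), round(cv, 8), mad)
```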
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
5827 | 1 | < 0.1%
5821 | 1 | < 0.1%
5822 | 1 | < 0.1%
5823 | 1 | < 0.1%
5824 | 1 | < 0.1%
5825 | 1 | < 0.1%
5826 | 1 | < 0.1%
5828 | 1 | < 0.1%
5870 | 1 | < 0.1%
Other values (8725) | 8725 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
8735 | 1 | < 0.1%
8734 | 1 | < 0.1%
8733 | 1 | < 0.1%
8732 | 1 | < 0.1%
8731 | 1 | < 0.1%
8730 | 1 | < 0.1%
8729 | 1 | < 0.1%
8728 | 1 | < 0.1%
8727 | 1 | < 0.1%
8726 | 1 | < 0.1%

성별 (sex)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.4 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0003434
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 6866 | 78.6%
 | 1869 | 21.4%
<NA> | 1 | < 0.1%

Length

Histogram of lengths of the category

연령대 (age group)
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.4 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.0001145
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 60대
2nd row: 40대
3rd row: 60대
4th row: 30대
5th row: 40대

Common Values

Value | Count | Frequency (%)
30대 (30s) | 2811 | 32.2%
20대 (20s) | 2229 | 25.5%
40대 (40s) | 1692 | 19.4%
50대 (50s) | 1099 | 12.6%
60대 (60s) | 475 | 5.4%
10대 (10s) | 352 | 4.0%
70대 (70s) | 74 | 0.8%
80대 (80s) | 3 | < 0.1%
<NA> | 1 | < 0.1%

Length

Histogram of lengths of the category

지역 (region)
Categorical

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 68.4 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.0002289
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 제주
2nd row: 경기
3rd row: 광주
4th row: 광주
5th row: 경북

Common Values

Value | Count | Frequency (%)
경기 (Gyeonggi) | 2032 | 23.3%
서울 (Seoul) | 1305 | 14.9%
부산 (Busan) | 696 | 8.0%
인천 (Incheon) | 641 | 7.3%
경남 (Gyeongnam) | 479 | 5.5%
충남 (Chungnam) | 398 | 4.6%
대구 (Daegu) | 396 | 4.5%
대전 (Daejeon) | 374 | 4.3%
강원 (Gangwon) | 367 | 4.2%
광주 (Gwangju) | 343 | 3.9%
Other values (9) | 1705 | 19.5%

Length

Histogram of lengths of the category

도박자와의 관계 (relationship to the gambler)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.4 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.0943223
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 부모
2nd row: 배우자
3rd row: 부모
4th row: 본인
5th row: 본인

Common Values

Value | Count | Frequency (%)
본인 (self) | 6783 | 77.6%
부모 (parent) | 1180 | 13.5%
배우자 (spouse) | 525 | 6.0%
형제자매 (sibling) | 142 | 1.6%
자녀 (child) | 49 | 0.6%
지인 (acquaintance) | 41 | 0.5%
친인척 (relative) | 13 | 0.1%
기관 (institution) | 2 | < 0.1%
<NA> | 1 | < 0.1%

Length

Histogram of lengths of the category

1차 도박 유형 (primary gambling type)
Categorical

HIGH CORRELATION

Distinct: 15
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 68.4 KiB

Length

Max length: 10
Median length: 5
Mean length: 4.3173077
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 복권
5th row: 카지노

Common Values

Value | Count | Frequency (%)
스포츠도박 (sports betting) | 2929 | 33.5%
<NA> | 1953 | 22.4%
미니게임/사다리게임 (mini-game/ladder game) | 823 | 9.4%
카지노 (casino) | 773 | 8.8%
주식 (stocks) | 700 | 8.0%
기타 (other) | 680 | 7.8%
카드 (cards) | 543 | 6.2%
복권 (lottery) | 155 | 1.8%
성인오락 (adult arcade games) | 91 | 1.0%
화투 (hwatu) | 38 | 0.4%
Other values (5) | 51 | 0.6%

Length

Histogram of lengths of the category

기타 (other)
Text

MISSING

Distinct: 70
Distinct (%): 8.7%
Missing: 7931
Missing (%): 90.8%
Memory size: 68.4 KiB

Length

Max length: 16
Median length: 12
Mean length: 4.5652174
Min length: 2

Characters and Unicode

Total characters: 3675
Distinct characters: 136
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 36
Unique (%): 4.5%

Sample

1st row: 파워볼
2nd row: 바카라
3rd row: 릴게임
4th row: 선물옵션
5th row: 가상화폐

Value | Count | Frequency (%)
가상화폐 (virtual currency) | 232 | 28.0%
파워볼 (Powerball) | 118 | 14.3%
비트코인 (Bitcoin) | 65 | 7.9%
fx마진거래 (FX margin trading) | 48 | 5.8%
바카라 (baccarat) | 47 | 5.7%
게임 (game) | 40 | 4.8%
게임(리니지 | 28 | 3.4%
미니게임 (mini-game) | 27 | 3.3%
선물옵션 (futures and options) | 26 | 3.1%
암호화폐 (cryptocurrency) | 21 | 2.5%
Other values (63) | 176 | 21.3%

Most occurring characters

Value | Count | Frequency (%)
 | 273 | 7.4%
 | 272 | 7.4%
 | 260 | 7.1%
 | 259 | 7.0%
 | 170 | 4.6%
 | 170 | 4.6%
 | 122 | 3.3%
 | 119 | 3.2%
 | 118 | 3.2%
) | 109 | 3.0%
Other values (126) | 1803 | 49.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3295 | 89.7%
Uppercase Letter | 131 | 3.6%
Close Punctuation | 109 | 3.0%
Open Punctuation | 109 | 3.0%
Space Separator | 28 | 0.8%
Lowercase Letter | 3 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 273 | 8.3%
 | 272 | 8.3%
 | 260 | 7.9%
 | 259 | 7.9%
 | 170 | 5.2%
 | 170 | 5.2%
 | 122 | 3.7%
 | 119 | 3.6%
 | 118 | 3.6%
 | 78 | 2.4%
Other values (114) | 1454 | 44.1%

Uppercase Letter
Value | Count | Frequency (%)
F | 61 | 46.6%
X | 61 | 46.6%
G | 3 | 2.3%
P | 2 | 1.5%
M | 2 | 1.5%
R | 2 | 1.5%

Lowercase Letter
Value | Count | Frequency (%)
m | 1 | 33.3%
j | 1 | 33.3%
w | 1 | 33.3%

Close Punctuation
Value | Count | Frequency (%)
) | 109 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 109 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 28 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3295 | 89.7%
Common | 246 | 6.7%
Latin | 134 | 3.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 273 | 8.3%
 | 272 | 8.3%
 | 260 | 7.9%
 | 259 | 7.9%
 | 170 | 5.2%
 | 170 | 5.2%
 | 122 | 3.7%
 | 119 | 3.6%
 | 118 | 3.6%
 | 78 | 2.4%
Other values (114) | 1454 | 44.1%

Latin
Value | Count | Frequency (%)
F | 61 | 45.5%
X | 61 | 45.5%
G | 3 | 2.2%
P | 2 | 1.5%
M | 2 | 1.5%
R | 2 | 1.5%
m | 1 | 0.7%
j | 1 | 0.7%
w | 1 | 0.7%

Common
Value | Count | Frequency (%)
) | 109 | 44.3%
( | 109 | 44.3%
(space) | 28 | 11.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3295 | 89.7%
ASCII | 380 | 10.3%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 273 | 8.3%
 | 272 | 8.3%
 | 260 | 7.9%
 | 259 | 7.9%
 | 170 | 5.2%
 | 170 | 5.2%
 | 122 | 3.7%
 | 119 | 3.6%
 | 118 | 3.6%
 | 78 | 2.4%
Other values (114) | 1454 | 44.1%

ASCII
Value | Count | Frequency (%)
) | 109 | 28.7%
( | 109 | 28.7%
F | 61 | 16.1%
X | 61 | 16.1%
(space) | 28 | 7.4%
G | 3 | 0.8%
P | 2 | 0.5%
M | 2 | 0.5%
R | 2 | 0.5%
m | 1 | 0.3%
Other values (2) | 2 | 0.5%

온오프라인 여부 (online or offline)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.4 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.2810211
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 온라인
5th row: 온라인

Common Values

Value | Count | Frequency (%)
온라인 (online) | 6281 | 71.9%
<NA> | 1963 | 22.5%
오프라인 (offline) | 492 | 5.6%

Length

Histogram of lengths of the category

합불법 여부 (legal or illegal)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.4 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.4494048
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 불법
5th row: 불법

Common Values

Value | Count | Frequency (%)
불법 (illegal) | 5564 | 63.7%
<NA> | 1963 | 22.5%
합법 (legal) | 1209 | 13.8%

Length

Histogram of lengths of the category

Interactions


Correlations

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 기타 | 온오프라인 여부 | 합불법 여부
순번 | 1.000 | 0.049 | 0.039 | 0.139 | 0.000 | 0.112 | 0.405 | 0.064 | 0.089
성별 | 0.049 | 1.000 | 0.579 | 0.144 | 0.938 | 0.275 | 0.275 | 0.191 | 0.164
연령대 | 0.039 | 0.579 | 1.000 | 0.157 | 0.635 | 0.380 | 0.608 | 0.579 | 0.415
지역 | 0.139 | 0.144 | 0.157 | 1.000 | 0.136 | 0.138 | 0.357 | 0.196 | 0.180
도박자와의 관계 | 0.000 | 0.938 | 0.635 | 0.136 | 1.000 | NaN | NaN | NaN | NaN
1차 도박 유형 | 0.112 | 0.275 | 0.380 | 0.138 | NaN | 1.000 | 1.000 | 0.559 | 0.822
기타 | 0.405 | 0.275 | 0.608 | 0.357 | NaN | 1.000 | 1.000 | 0.943 | 0.746
온오프라인 여부 | 0.064 | 0.191 | 0.579 | 0.196 | NaN | 0.559 | 0.943 | 1.000 | 0.419
합불법 여부 | 0.089 | 0.164 | 0.415 | 0.180 | NaN | 0.822 | 0.746 | 0.419 | 1.000

 | 도박자와의 관계 | 합불법 여부 | 성별 | 온오프라인 여부 | 지역 | 1차 도박 유형 | 연령대
도박자와의 관계 | 1.000 | 1.000 | 0.783 | 1.000 | 0.057 | 1.000 | 0.258
합불법 여부 | 1.000 | 1.000 | 0.105 | 0.276 | 0.141 | 0.669 | 0.312
성별 | 0.783 | 0.105 | 1.000 | 0.122 | 0.113 | 0.215 | 0.437
온오프라인 여부 | 1.000 | 0.276 | 0.122 | 1.000 | 0.154 | 0.440 | 0.438
지역 | 0.057 | 0.141 | 0.113 | 0.154 | 1.000 | 0.047 | 0.066
1차 도박 유형 | 1.000 | 0.669 | 0.215 | 0.440 | 0.047 | 1.000 | 0.178
연령대 | 0.258 | 0.312 | 0.437 | 0.438 | 0.066 | 0.178 | 1.000

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 온오프라인 여부 | 합불법 여부
순번 | 1.000 | 0.038 | 0.019 | 0.053 | 0.000 | 0.045 | 0.049 | 0.068
성별 | 0.038 | 1.000 | 0.437 | 0.113 | 0.783 | 0.215 | 0.122 | 0.105
연령대 | 0.019 | 0.437 | 1.000 | 0.066 | 0.258 | 0.178 | 0.438 | 0.312
지역 | 0.053 | 0.113 | 0.066 | 1.000 | 0.057 | 0.047 | 0.154 | 0.141
도박자와의 관계 | 0.000 | 0.783 | 0.258 | 0.057 | 1.000 | 1.000 | 1.000 | 1.000
1차 도박 유형 | 0.045 | 0.215 | 0.178 | 0.047 | 1.000 | 1.000 | 0.440 | 0.669
온오프라인 여부 | 0.049 | 0.122 | 0.438 | 0.154 | 1.000 | 0.440 | 1.000 | 0.276
합불법 여부 | 0.068 | 0.105 | 0.312 | 0.141 | 1.000 | 0.669 | 0.276 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
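
Nullity correlation can be illustrated as the Pearson correlation between per-column presence indicators (1 if a value is present, 0 if it is missing). The sketch below is a toy illustration, not the tool's implementation. It uses three columns from the first five rows of the sample in this report and assumes that `<NA>` entries count as missing; the helpers `pearson` and `presence` are written for this example only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# First five sample rows, three columns; None stands for <NA> (an assumption).
rows = [
    {"1차 도박 유형": None, "기타": None, "온오프라인 여부": None},
    {"1차 도박 유형": None, "기타": None, "온오프라인 여부": None},
    {"1차 도박 유형": None, "기타": None, "온오프라인 여부": None},
    {"1차 도박 유형": "복권", "기타": "파워볼", "온오프라인 여부": "온라인"},
    {"1차 도박 유형": "카지노", "기타": None, "온오프라인 여부": "온라인"},
]


def presence(column):
    """Return 1 where the column has a value and 0 where it is missing."""
    return [0 if row[column] is None else 1 for row in rows]


type_vs_online = pearson(presence("1차 도박 유형"), presence("온오프라인 여부"))
type_vs_other = pearson(presence("1차 도박 유형"), presence("기타"))

print(round(type_vs_online, 3), round(type_vs_other, 3))
```

A value of 1 means two columns are always present or missing together. The correlation is undefined for a column that is entirely present or entirely missing, because its indicator has zero variance.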

Sample

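First rows

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 기타 | 온오프라인 여부 | 합불법 여부
0 | 1 |  | 60대 | 제주 | 부모 | <NA> | <NA> | <NA> | <NA>
1 | 2 |  | 40대 | 경기 | 배우자 | <NA> | <NA> | <NA> | <NA>
2 | 3 |  | 60대 | 광주 | 부모 | <NA> | <NA> | <NA> | <NA>
3 | 4 |  | 30대 | 광주 | 본인 | 복권 | 파워볼 | 온라인 | 불법
4 | 5 |  | 40대 | 경북 | 본인 | 카지노 | <NA> | 온라인 | 불법
5 | 6 |  | 20대 | 경기 | 본인 | 미니게임/사다리게임 | <NA> | 온라인 | 불법
6 | 7 |  | 20대 | 경북 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
7 | 8 |  | 30대 | 경기 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8 | 9 |  | 30대 | 경기 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
9 | 10 |  | 20대 | 경남 | 지인 | <NA> | <NA> | <NA> | <NA>

Last rows

 | 순번 | 성별 | 연령대 | 지역 | 도박자와의 관계 | 1차 도박 유형 | 기타 | 온오프라인 여부 | 합불법 여부
8726 | 8727 |  | 70대 | 강원 | 본인 | 화투 | <NA> | 오프라인 | 불법
8727 | 8728 |  | 20대 | 서울 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8728 | 8729 |  | 30대 | 서울 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8729 | 8730 |  | 30대 | 서울 | 본인 | 미니게임/사다리게임 | <NA> | 온라인 | 불법
8730 | 8731 |  | 40대 | 서울 | 본인 | 기타 | <NA> | 온라인 | 불법
8731 | 8732 |  | 30대 | 서울 | 배우자 | <NA> | <NA> | <NA> | <NA>
8732 | 8733 |  | 30대 | 경기 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8733 | 8734 |  | 30대 | 경기 | 본인 | 카드 | <NA> | 온라인 | 불법
8734 | 8735 |  | 20대 | 경기 | 본인 | 스포츠도박 | <NA> | 온라인 | 불법
8735 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>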