Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 9630
Missing cells (%): 16.1%
Duplicate rows: 1150
Duplicate rows (%): 11.5%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
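
The derived figures above can be reproduced from the raw counts. The following is a minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes. The variable names are illustrative and not part of the report.

```python
# Raw counts copied from the dataset statistics.
n_variables = 6
n_observations = 10_000
missing_cells = 9_630
duplicate_rows = 1_150
total_size_kib = 566.4

# Assumption: missing cells are expressed relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are expressed relative to the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"missing cells:  {missing_pct:.2f}%")
print(f"duplicate rows: {duplicate_pct:.2f}%")
print(f"average record: {avg_record_bytes:.1f} B")
```

Under these assumptions the results agree with the reported 16.1%, 11.5% and 58.0 B after rounding.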

Variable types

Categorical: 2
Numeric: 1
Boolean: 1
DateTime: 2

Dataset

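Description: Provides information on members eligible for discounts in the Integrated Member Management System (통합회원관리시스템), including facility code, member type code, discount eligibility, discount approval status, registration date, and modification date.
Author: 공공데이터포털 (Public Data Portal)
URL: https://www.data.go.kr/data/15121829/fileData.do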

Alerts

Dataset has 1150 (11.5%) duplicate rows [Duplicates]
할인대상자 여부 (discount eligibility) is highly overall correlated with 시설코드 (facility code) [High correlation]
시설코드 is highly overall correlated with 할인대상자 여부 and 1 other field [High correlation]
할인대상 승인여부 (discount approval status) is highly overall correlated with 시설코드 [High correlation]
회원구분코드 (member classification code) is highly imbalanced (91.9%) [Imbalance]
할인대상자 여부 has 7736 (77.4%) missing values [Missing]
할인대상 승인여부 has 1894 (18.9%) missing values [Missing]
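
The 91.9% imbalance figure for 회원구분코드 is consistent with a normalized-entropy score, 1 − H / log(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. The sketch below assumes that definition; the report does not spell it out here, and `imbalance_score` is an illustrative helper, not a library function. The counts are the value counts reported for this variable.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log(k): 0 for a uniform distribution, approaching 1
    as a single class dominates. Assumed definition, see the text above."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch only
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Value counts of 회원구분코드: "2", "<NA>", "1".
counts = [9846, 107, 47]
score = imbalance_score(counts)
print(f"imbalance: {score:.1%}")
```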

Reproduction

Analysis started: 2024-04-17 11:05:02.862041
Analysis finished: 2024-04-17 11:05:03.767343
Duration: 0.91 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
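
The reported duration is the difference between the two timestamps above. A small sketch of that check, using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-04-17 11:05:02.862041")
finished = datetime.fromisoformat("2024-04-17 11:05:03.767343")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```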

Variables

시설코드 (facility code)
Categorical

HIGH CORRELATION

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

I_FMC1: 3231
I_FMC2: 2202
I_FMC4: 2021
I_FMC5: 1354
I_FMC7: 887
Other values (4): 305

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: I_FMC5
2nd row: I_FMC1
3rd row: I_FMC4
4th row: I_FMC1
5th row: I_FMC4

Common Values

Value   Count  Frequency (%)
I_FMC1  3231   32.3%
I_FMC2  2202   22.0%
I_FMC4  2021   20.2%
I_FMC5  1354   13.5%
I_FMC7  887    8.9%
I_FMC9  192    1.9%
I_FMC8  93     0.9%
I_FMC6  18     0.2%
I_FMC3  2      < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
i_fmc1  3231   32.3%
i_fmc2  2202   22.0%
i_fmc4  2021   20.2%
i_fmc5  1354   13.5%
i_fmc7  887    8.9%
i_fmc9  192    1.9%
i_fmc8  93     0.9%
i_fmc6  18     0.2%
i_fmc3  2      < 0.1%

회원구분코드 (member classification code)
Categorical

IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

2: 9846
<NA>: 107
1: 47

Length

Max length: 4
Median length: 1
Mean length: 1.0321
Min length: 1
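
The length statistics can be reproduced from this variable's value counts if "<NA>" is treated as a literal four-character string, which would also explain why the variable reports 0 missing values. The sketch below rests on that assumption; it is an interpretation of the numbers, not something the report states.

```python
# Value counts of 회원구분코드, with "<NA>" kept as an ordinary string.
value_counts = {"2": 9846, "<NA>": 107, "1": 47}

total = sum(value_counts.values())
mean_length = sum(len(value) * count for value, count in value_counts.items()) / total
max_length = max(len(value) for value in value_counts)
min_length = min(len(value) for value in value_counts)

print(mean_length, max_length, min_length)
```

With these counts the mean length is 10321 / 10000, matching the reported 1.0321.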

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
2      9846   98.5%
<NA>   107    1.1%
1      47     0.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
2      9846   98.5%
na     107    1.1%
1      47     0.5%

할인대상자 여부 (discount eligibility)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 13
Distinct (%): 0.6%
Missing: 7736
Missing (%): 77.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.0026502
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 2
median: 2
Q3: 4
95-th percentile: 12
Maximum: 16
Range: 15
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.410492
Coefficient of variation (CV): 0.85205847
Kurtosis: 2.8275165
Mean: 4.0026502
Median Absolute Deviation (MAD): 0
Skewness: 1.8926934
Sum: 9062
Variance: 11.631456
Monotonicity: Not monotonic
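
Several of these statistics are tied together by simple identities: the mean is the sum divided by the number of non-missing values, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short sketch that checks the reported values against each other, with the inputs copied from this report and the names chosen for illustration:

```python
# Inputs copied from the report.
n_observations = 10_000   # dataset statistics
n_missing = 7_736         # missing values of this variable
total_sum = 9_062         # Sum
std_dev = 3.410492        # Standard deviation

n_present = n_observations - n_missing  # values that enter the statistics
mean = total_sum / n_present
cv = std_dev / mean
variance = std_dev ** 2

print(n_present, mean, cv, variance)
```

The results agree with the reported mean, CV and variance up to the rounding of the standard deviation.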
[Figure: Histogram with fixed size bins (bins=13)]

Common values

Value             Count  Frequency (%)
2                 1339   13.4%
4                 310    3.1%
8                 260    2.6%
15                90     0.9%
3                 75     0.8%
12                55     0.5%
5                 55     0.5%
11                26     0.3%
6                 24     0.2%
1                 22     0.2%
Other values (3)  8      0.1%
(Missing)         7736   77.4%

Minimum 10 values

Value  Count  Frequency (%)
1      22     0.2%
2      1339   13.4%
3      75     0.8%
4      310    3.1%
5      55     0.5%
6      24     0.2%
8      260    2.6%
9      2      < 0.1%
10     2      < 0.1%
11     26     0.3%

Maximum 10 values

Value  Count  Frequency (%)
16     4      < 0.1%
15     90     0.9%
12     55     0.5%
11     26     0.3%
10     2      < 0.1%
9      2      < 0.1%
8      260    2.6%
6      24     0.2%
5      55     0.5%
4      310    3.1%

할인대상 승인여부 (discount approval status)
Boolean

HIGH CORRELATION, MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 1894
Missing (%): 18.9%
Memory size: 97.7 KiB

Value      Count  Frequency (%)
False      6282   62.8%
True       1824   18.2%
(Missing)  1894   18.9%

등록일 (registration date)
DateTime

Distinct: 1467
Distinct (%): 14.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2010-04-06 00:00:00
Maximum: 2023-08-25 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]

변경일 (modification date)
DateTime

Distinct: 2135
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2010-04-06 00:00:00
Maximum: 2023-08-31 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]

Interactions


Correlations

                   시설코드  회원구분코드  할인대상자 여부  할인대상 승인여부
시설코드           1.000     0.116         0.818            0.571
회원구분코드       0.116     1.000         0.101            0.049
할인대상자 여부    0.818     0.101         1.000            0.423
할인대상 승인여부  0.571     0.049         0.423            1.000

                   회원구분코드  시설코드  할인대상 승인여부
회원구분코드       1.000         0.087     0.031
시설코드           0.087         1.000     0.574
할인대상 승인여부  0.031         0.574     1.000

                   할인대상자 여부  시설코드  회원구분코드  할인대상 승인여부
할인대상자 여부    1.000            0.572     0.099         0.417
시설코드           0.572            1.000     0.087         0.574
회원구분코드       0.099            0.087     1.000         0.031
할인대상 승인여부  0.417            0.574     0.031         1.000
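
The report text does not say which coefficient each matrix uses. One common association measure for pairs of categorical columns is Cramér's V, and the sketch below shows how such a value could be computed from two label sequences. It is an illustration under that assumption, without bias correction, and `cramers_v` plus the toy data are hypothetical rather than taken from the dataset.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals))
    if k < 2:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical toy data: facility code fully determines approval status.
facility = ["I_FMC1", "I_FMC1", "I_FMC4", "I_FMC4"]
approved = ["N", "N", "Y", "Y"]
v_dependent = cramers_v(facility, approved)

# Hypothetical toy data: approval status is independent of facility code.
v_independent = cramers_v(facility, ["N", "Y", "N", "Y"])
print(v_dependent, v_independent)
```

Values near 1 indicate that one column largely determines the other, which is how entries such as 0.818 between 시설코드 and 할인대상자 여부 would be read under this interpretation.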

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
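
Nullity correlation can be read as the Pearson correlation between the missingness indicators of two columns. The following is a minimal standard-library sketch of that idea; `nullity_correlation` and the toy columns are illustrative assumptions, with `None` standing in for a missing value.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if value is None else 0.0 for value in col_a]
    b = [1.0 if value is None else 0.0 for value in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when a column is never or always missing
    return cov / math.sqrt(var_a * var_b)

# Hypothetical columns: missing together (+1) and missing in alternation (-1).
same = nullity_correlation([None, 1, None, 2], [None, "a", None, "b"])
opposite = nullity_correlation([None, 1, None, 2], ["a", None, "b", None])
print(same, opposite)
```

A value of +1 means two columns are always missing together, and −1 means one is present exactly when the other is missing.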

Sample

(index)  시설코드  회원구분코드  할인대상자 여부  할인대상 승인여부  등록일      변경일
35358    I_FMC5    2             <NA>             N                  2011-08-25  2011-08-25
74581    I_FMC1    2             <NA>             <NA>               2014-03-13  2014-03-13
67842    I_FMC4    2             2                Y                  2013-12-12  2014-06-17
75812    I_FMC1    2             <NA>             N                  2014-02-20  2014-04-15
57719    I_FMC4    2             2                Y                  2012-12-17  2014-10-31
46783    I_FMC7    2             <NA>             N                  2012-02-20  2012-02-20
2301     I_FMC4    2             <NA>             N                  2010-04-21  2014-08-06
37601    I_FMC1    2             <NA>             N                  2011-11-21  2011-11-21
50877    I_FMC1    2             <NA>             N                  2012-06-20  2012-06-20
97058    I_FMC1    2             <NA>             <NA>               2015-02-21  2015-02-21

(index)  시설코드  회원구분코드  할인대상자 여부  할인대상 승인여부  등록일      변경일
81228    I_FMC9    2             <NA>             N                  2014-06-20  2019-09-03
21073    I_FMC4    2             <NA>             N                  2011-03-23  2015-06-15
73316    I_FMC4    2             2                Y                  2014-02-25  2022-09-06
85319    I_FMC9    2             <NA>             N                  2014-12-20  2014-12-20
66926    I_FMC1    2             4                N                  2013-11-20  2013-11-20
17831    I_FMC1    2             <NA>             N                  2010-12-01  2014-04-11
22933    I_FMC1    2             <NA>             N                  2011-06-23  2011-06-23
71795    I_FMC1    2             <NA>             <NA>               2013-10-02  2013-10-02
6420     I_FMC5    2             4                Y                  2010-06-08  2017-08-01
98873    I_FMC2    2             <NA>             N                  2015-05-07  2015-05-07

Duplicate rows

Most frequently occurring

(index)  시설코드  회원구분코드  할인대상자 여부  할인대상 승인여부  등록일      변경일      # duplicates
338      I_FMC1    2             <NA>             <NA>               2014-03-13  2014-03-13  146
1002     I_FMC5    2             <NA>             <NA>               2011-11-11  2011-11-11  91
327      I_FMC1    2             <NA>             <NA>               2013-07-25  2013-07-25  76
1003     I_FMC5    2             <NA>             <NA>               2011-11-14  2011-11-14  73
1012     I_FMC7    2             <NA>             N                  2012-02-15  2012-02-15  57
345      I_FMC1    2             <NA>             <NA>               2014-04-16  2014-04-16  55
342      I_FMC1    2             <NA>             <NA>               2014-04-11  2014-04-11  42
344      I_FMC1    2             <NA>             <NA>               2014-04-15  2014-04-15  41
1017     I_FMC7    2             <NA>             N                  2012-02-16  2012-02-16  39
1000     I_FMC5    2             <NA>             <NA>               2011-11-03  2011-11-03  36
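
A table like the one above can be produced by grouping identical rows and counting the size of each group. The sketch below does this with the standard library on a handful of hypothetical rows shaped like the dataset. How the report itself arrives at its headline duplicate count is not shown here, so this only illustrates the "# duplicates" column.

```python
from collections import Counter

# Hypothetical rows: (시설코드, 회원구분코드, 할인대상자 여부,
# 할인대상 승인여부, 등록일, 변경일); None stands in for <NA>.
rows = [
    ("I_FMC1", "2", None, None, "2014-03-13", "2014-03-13"),
    ("I_FMC1", "2", None, None, "2014-03-13", "2014-03-13"),
    ("I_FMC5", "2", None, None, "2011-11-11", "2011-11-11"),
    ("I_FMC5", "2", None, None, "2011-11-11", "2011-11-11"),
    ("I_FMC5", "2", None, None, "2011-11-11", "2011-11-11"),
    ("I_FMC4", "2", 2, "Y", "2013-12-12", "2014-06-17"),
]

# Count how often each distinct row occurs and keep those seen more than once.
counts = Counter(rows)
duplicates = {row: count for row, count in counts.items() if count > 1}

# Order the duplicate groups by size, largest first.
most_frequent = sorted(duplicates.items(), key=lambda item: item[1], reverse=True)
for row, count in most_frequent:
    print(count, row)
```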