Overview

Dataset statistics

Number of variables: 9
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.4 KiB
Average record size in memory: 75.3 B

Variable types

Numeric: 1
Categorical: 5
Boolean: 3

Alerts

examin_begin_de has constant value "20211108" [Constant]
wt3m_ovsea_tour_plan_at is highly overall correlated with wt6m_ovsea_tour_plan_at [High correlation]
wt6m_ovsea_tour_plan_at is highly overall correlated with wt3m_ovsea_tour_plan_at [High correlation]
wt3m_ovsea_tour_plan_at is highly imbalanced (50.0%) [Imbalance]
respond_id has unique values [Unique]
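
The 50.0% figure in the imbalance alert can be reproduced from the class counts reported for wt3m_ovsea_tour_plan_at (False 89, True 11). The sketch below assumes the score is one minus the normalized Shannon entropy of the class distribution, and that an alert is raised above a threshold of 0.5; both are assumptions about the profiler's behavior, and `imbalance_score` is a hypothetical helper name, not part of the report.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    Assumption: this mirrors how the report derives its imbalance percentage.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Class counts taken from the report.
score_3m = imbalance_score([89, 11])  # wt3m_ovsea_tour_plan_at
score_6m = imbalance_score([72, 28])  # wt6m_ovsea_tour_plan_at

print(f"wt3m_ovsea_tour_plan_at: {score_3m:.1%}")
print(f"wt6m_ovsea_tour_plan_at: {score_6m:.1%}")
```

Under these assumptions the 89/11 split lands just above 0.5 while the 72/28 split stays well below it, which is consistent with only wt3m_ovsea_tour_plan_at being flagged.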

Reproduction

Analysis started: 2023-12-10 09:57:02.395871
Analysis finished: 2023-12-10 09:57:04.053153
Duration: 1.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

respond_id
Real number (ℝ)

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 582726.86
Minimum: 924
Maximum: 3244803
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 924
5-th percentile: 11020.9
Q1: 103337.75
Median: 292789.5
Q3: 410394.5
95-th percentile: 3057500.8
Maximum: 3244803
Range: 3243879
Interquartile range (IQR): 307056.75

Descriptive statistics

Standard deviation: 907900.38
Coefficient of variation (CV): 1.5580205
Kurtosis: 3.5893097
Mean: 582726.86
Median Absolute Deviation (MAD): 135276.5
Skewness: 2.2354705
Sum: 58272686
Variance: 8.2428309 × 10^11
Monotonicity: Not monotonic
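
Several of the figures above are derived from one another, which gives a quick consistency check on the report. The sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the reported values for respond_id; the variable names are illustrative, and small differences are expected because the inputs are already rounded.

```python
# Summary statistics as reported for respond_id.
n = 100
mean = 582726.86
std = 907900.38
minimum, maximum = 924, 3244803
q1, q3 = 103337.75, 410394.5

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared
total = mean * n                 # Sum = mean * number of observations

print(f"Range:    {value_range}")
print(f"IQR:      {iqr}")
print(f"CV:       {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Sum:      {total:.0f}")
```

A coefficient of variation above 1 together with a mean roughly twice the median matches the strong right skew the report gives (skewness 2.2354705).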
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
365540             1      1.0%
105914             1      1.0%
3053256            1      1.0%
313326             1      1.0%
381170             1      1.0%
214611             1      1.0%
410882             1      1.0%
157823             1      1.0%
1718213            1      1.0%
320219             1      1.0%
Other values (90)  90     90.0%

Minimum 10 values

Value  Count  Frequency (%)
924    1      1.0%
1920   1      1.0%
3189   1      1.0%
5651   1      1.0%
10658  1      1.0%
11040  1      1.0%
22333  1      1.0%
23024  1      1.0%
23027  1      1.0%
26650  1      1.0%

Maximum 10 values

Value    Count  Frequency (%)
3244803  1      1.0%
3242158  1      1.0%
3239449  1      1.0%
3210223  1      1.0%
3073020  1      1.0%
3056684  1      1.0%
3053256  1      1.0%
3029255  1      1.0%
3015909  1      1.0%
2965280  1      1.0%

examin_begin_de
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value     Count
20211108  100

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20211108
2nd row: 20211108
3rd row: 20211108
4th row: 20211108
5th row: 20211108

Common Values

Value     Count  Frequency (%)
20211108  100    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value     Count  Frequency (%)
20211108  100    100.0%

sexdstn_flag_cd
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
M      58
F      42

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: F
3rd row: F
4th row: F
5th row: F

Common Values

Value  Count  Frequency (%)
M      58     58.0%
F      42     42.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count  Frequency (%)
m      58     58.0%
f      42     42.0%

agrde_flag_nm
Categorical

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value  Count
30대   48
50대   25
60대   19
40대   8

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 60대
2nd row: 50대
3rd row: 60대
4th row: 60대
5th row: 60대

Common Values

Value  Count  Frequency (%)
30대   48     48.0%
50대   25     25.0%
60대   19     19.0%
40대   8      8.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value  Count  Frequency (%)
30대   48     48.0%
50대   25     25.0%
60대   19     19.0%
40대   8      8.0%

answrr_oc_area_nm
Categorical

Distinct: 11
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value             Count
서울특별시        42
경기도            26
경상남도          7
광주광역시        7
부산광역시        5
Other values (6)  13

Length

Max length: 5
Median length: 5
Mean length: 4.29
Min length: 3

Unique

Unique: 4
Unique (%): 4.0%

Sample

1st row: 서울특별시
2nd row: 강원도
3rd row: 서울특별시
4th row: 서울특별시
5th row: 서울특별시

Common Values

Value       Count  Frequency (%)
서울특별시  42     42.0%
경기도      26     26.0%
경상남도    7      7.0%
광주광역시  7      7.0%
부산광역시  5      5.0%
대구광역시  5      5.0%
강원도      4      4.0%
전라남도    1      1.0%
인천광역시  1      1.0%
제주도      1      1.0%

Length

Histogram of lengths of the category

Words

Value       Count  Frequency (%)
서울특별시  42     42.0%
경기도      26     26.0%
경상남도    7      7.0%
광주광역시  7      7.0%
부산광역시  5      5.0%
대구광역시  5      5.0%
강원도      4      4.0%
전라남도    1      1.0%
인천광역시  1      1.0%
제주도      1      1.0%

hshld_income_dgree_nm
Categorical

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Value                Count
300이상500만원 미만  30
500이상700만원 미만  30
700만원 이상         29
300만원 미만         11

Length

Max length: 13
Median length: 13
Mean length: 11
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 300만원 미만
2nd row: 700만원 이상
3rd row: 700만원 이상
4th row: 300만원 미만
5th row: 300이상500만원 미만

Common Values

Value                Count  Frequency (%)
300이상500만원 미만  30     30.0%
500이상700만원 미만  30     30.0%
700만원 이상         29     29.0%
300만원 미만         11     11.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value           Count  Frequency (%)
미만            71     35.5%
300이상500만원  30     15.0%
500이상700만원  30     15.0%
700만원         29     14.5%
이상            29     14.5%
300만원         11     5.5%
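
The token table above (미만 71, 35.5%, and so on) does not sum to 100 observations because it counts words rather than rows. A minimal sketch of that arithmetic follows, assuming the profiler splits each category value on whitespace and weights every token by the number of rows carrying that value; the splitting rule is an assumption inferred from the counts, not something the report states.

```python
from collections import Counter

# Category counts reported for hshld_income_dgree_nm.
category_counts = {
    "300이상500만원 미만": 30,
    "500이상700만원 미만": 30,
    "700만원 이상": 29,
    "300만원 미만": 11,
}

# Assumption: tokens are obtained by splitting each value on whitespace.
word_counts = Counter()
for value, count in category_counts.items():
    for word in value.split():
        word_counts[word] += count

total_words = sum(word_counts.values())

# Print each token with its count and its share of all tokens.
for word, count in word_counts.most_common():
    print(word, count, f"{count / total_words:.1%}")
```

Each of the 100 rows contributes two tokens, so the percentages are taken over 200 tokens; 미만 appears in three of the four categories (30 + 30 + 11 = 71).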

wt3m_dmstc_tour_plan_at
Boolean

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value  Count
True   71
False  29

Value  Count  Frequency (%)
True   71     71.0%
False  29     29.0%

wt3m_ovsea_tour_plan_at
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value  Count
False  89
True   11

Value  Count  Frequency (%)
False  89     89.0%
True   11     11.0%

wt6m_ovsea_tour_plan_at
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 232.0 B

Value  Count
False  72
True   28

Value  Count  Frequency (%)
False  72     72.0%
True   28     28.0%

Interactions


Correlations

                         respond_id  sexdstn_flag_cd  agrde_flag_nm  answrr_oc_area_nm  hshld_income_dgree_nm  wt3m_dmstc_tour_plan_at  wt3m_ovsea_tour_plan_at  wt6m_ovsea_tour_plan_at
respond_id               1.000       0.139            0.000          0.000              0.000                  0.061                    0.221                    0.357
sexdstn_flag_cd          0.139       1.000            0.522          0.478              0.000                  0.104                    0.000                    0.000
agrde_flag_nm            0.000       0.522            1.000          0.568              0.144                  0.000                    0.000                    0.000
answrr_oc_area_nm        0.000       0.478            0.568          1.000              0.000                  0.199                    0.000                    0.000
hshld_income_dgree_nm    0.000       0.000            0.144          0.000              1.000                  0.205                    0.000                    0.000
wt3m_dmstc_tour_plan_at  0.061       0.104            0.000          0.199              0.205                  1.000                    0.000                    0.127
wt3m_ovsea_tour_plan_at  0.221       0.000            0.000          0.000              0.000                  0.000                    1.000                    0.730
wt6m_ovsea_tour_plan_at  0.357       0.000            0.000          0.000              0.000                  0.127                    0.730                    1.000

                         agrde_flag_nm  sexdstn_flag_cd  hshld_income_dgree_nm  wt3m_dmstc_tour_plan_at  wt3m_ovsea_tour_plan_at  wt6m_ovsea_tour_plan_at  answrr_oc_area_nm
agrde_flag_nm            1.000          0.351            0.054                  0.000                    0.000                    0.000                    0.366
sexdstn_flag_cd          0.351          1.000            0.000                  0.065                    0.000                    0.000                    0.437
hshld_income_dgree_nm    0.054          0.000            1.000                  0.134                    0.000                    0.000                    0.000
wt3m_dmstc_tour_plan_at  0.000          0.065            0.134                  1.000                    0.000                    0.081                    0.179
wt3m_ovsea_tour_plan_at  0.000          0.000            0.000                  0.000                    1.000                    0.521                    0.000
wt6m_ovsea_tour_plan_at  0.000          0.000            0.000                  0.081                    0.521                    1.000                    0.000
answrr_oc_area_nm        0.366          0.437            0.000                  0.179                    0.000                    0.000                    1.000

                         respond_id  sexdstn_flag_cd  agrde_flag_nm  answrr_oc_area_nm  hshld_income_dgree_nm  wt3m_dmstc_tour_plan_at  wt3m_ovsea_tour_plan_at  wt6m_ovsea_tour_plan_at
respond_id               1.000       0.174            0.000          0.000              0.000                  0.064                    0.267                    0.430
sexdstn_flag_cd          0.174       1.000            0.351          0.437              0.000                  0.065                    0.000                    0.000
agrde_flag_nm            0.000       0.351            1.000          0.366              0.054                  0.000                    0.000                    0.000
answrr_oc_area_nm        0.000       0.437            0.366          1.000              0.000                  0.179                    0.000                    0.000
hshld_income_dgree_nm    0.000       0.000            0.054          0.000              1.000                  0.134                    0.000                    0.000
wt3m_dmstc_tour_plan_at  0.064       0.065            0.000          0.179              0.134                  1.000                    0.000                    0.081
wt3m_ovsea_tour_plan_at  0.267       0.000            0.000          0.000              0.000                  0.000                    1.000                    0.521
wt6m_ovsea_tour_plan_at  0.430       0.000            0.000          0.000              0.000                  0.081                    0.521                    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

    respond_id  examin_begin_de  sexdstn_flag_cd  agrde_flag_nm  answrr_oc_area_nm  hshld_income_dgree_nm  wt3m_dmstc_tour_plan_at  wt3m_ovsea_tour_plan_at  wt6m_ovsea_tour_plan_at
0   365540      20211108         F                60대           서울특별시         300만원 미만           N                        N                        N
1   3239449     20211108         F                50대           강원도             700만원 이상           Y                        N                        Y
2   391830      20211108         F                60대           서울특별시         700만원 이상           N                        N                        N
3   382157      20211108         F                60대           서울특별시         300만원 미만           Y                        N                        N
4   398067      20211108         F                60대           서울특별시         300이상500만원 미만    Y                        N                        N
5   272833      20211108         F                60대           서울특별시         300이상500만원 미만    Y                        N                        N
6   1776947     20211108         F                60대           부산광역시         300이상500만원 미만    Y                        N                        N
7   3242158     20211108         F                50대           경기도             700만원 이상           Y                        N                        Y
8   394197      20211108         F                60대           부산광역시         500이상700만원 미만    Y                        N                        N
9   3056684     20211108         F                50대           서울특별시         700만원 이상           Y                        N                        Y

Last rows

    respond_id  examin_begin_de  sexdstn_flag_cd  agrde_flag_nm  answrr_oc_area_nm  hshld_income_dgree_nm  wt3m_dmstc_tour_plan_at  wt3m_ovsea_tour_plan_at  wt6m_ovsea_tour_plan_at
90  35499       20211108         F                60대           대구광역시         700만원 이상           Y                        N                        N
91  35588       20211108         M                40대           제주도             300이상500만원 미만    N                        N                        N
92  36322       20211108         F                40대           경상남도           500이상700만원 미만    Y                        Y                        Y
93  36890       20211108         M                50대           경상북도           700만원 이상           N                        N                        N
94  37390       20211108         M                40대           부산광역시         500이상700만원 미만    Y                        N                        N
95  38622       20211108         M                60대           광주광역시         700만원 이상           N                        N                        N
96  39429       20211108         M                60대           부산광역시         300만원 미만           N                        N                        N
97  40844       20211108         F                30대           경기도             700만원 이상           Y                        N                        N
98  41301       20211108         M                40대           서울특별시         300이상500만원 미만    Y                        N                        N
99  43019       20211108         F                40대           경기도             300이상500만원 미만    Y                        N                        N