Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 959
Duplicate rows (%): 9.6%
Total size in memory: 742.2 KiB
Average record size in memory: 76.0 B
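
The two derived figures in this block (duplicate share and average record size) follow from the raw counts. A minimal arithmetic sketch, with values copied from the statistics above and variable names that are purely illustrative:

```python
# Values copied from the Dataset statistics block.
n_rows = 10_000
n_duplicate_rows = 959
total_size_kib = 742.2

# Share of duplicate rows as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"{duplicate_pct:.1f}%")      # 9.6%
print(f"{avg_record_bytes:.1f} B")  # 76.0 B
```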

Variable types

Numeric: 3
Categorical: 2
Boolean: 1
DateTime: 2

Dataset

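Description: Lodging records for people who attended training sessions, with fields such as training number (연수번호), hotel sequence number (호텔순번), room number (객실번호), and gender (남녀구분).
URL: https://www.data.go.kr/data/15042286/fileData.do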

Alerts

독실여부 has constant value "" (Constant)
Dataset has 959 (9.6%) duplicate rows (Duplicates)
객실번호 is highly overall correlated with 수용인원 (High correlation)
수용인원 is highly overall correlated with 객실번호 (High correlation)
신청인원 has 5418 (54.2%) zeros (Zeros)

Reproduction

Analysis started: 2023-12-12 04:15:55.908788
Analysis finished: 2023-12-12 04:15:58.232994
Duration: 2.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
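
The report names ydata-profiling as the generating software. A minimal sketch of how a comparable report could be regenerated, assuming the CSV behind the URL above has been saved locally. The function name `build_profile`, the file paths, the title, and the encoding default are illustrative assumptions and are not taken from config.json:

```python
def build_profile(csv_path, html_path="report.html",
                  title="Profiling Report", encoding="utf-8"):
    """Read a CSV file and write a ydata-profiling HTML report.

    Third-party imports are kept inside the function so the module can be
    imported even when pandas or ydata-profiling is not installed.
    """
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; Korean public datasets are sometimes
    # distributed as cp949 rather than utf-8.
    df = pd.read_csv(csv_path, encoding=encoding)

    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return html_path
```

A call such as `build_profile("lodging.csv", encoding="cp949")` would then write `report.html`; the file name here is hypothetical.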

Variables

호텔고유번호 (hotel ID)
Real number (ℝ)

Distinct: 28
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 261.6876
Minimum: 40
Maximum: 840
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 40
5-th percentile: 100
Q1: 100
Median: 160
Q3: 480
95-th percentile: 520
Maximum: 840
Range: 800
Interquartile range (IQR): 380

Descriptive statistics

Standard deviation: 179.09853
Coefficient of variation (CV): 0.68439825
Kurtosis: -0.39054226
Mean: 261.6876
Median Absolute Deviation (MAD): 60
Skewness: 0.8067231
Sum: 2616876
Variance: 32076.285
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
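
Several of the statistics for this variable are related to one another. The sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the other reported values as a consistency check. The numbers are copied from this section; small differences are expected because the report prints rounded values.

```python
# Summary values copied from the 호텔고유번호 statistics.
n = 10_000
mean = 261.6876
std = 179.09853
minimum, q1, q3, maximum = 40, 100, 480, 840

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation
total = mean * n                 # Sum recovered from the mean

print(value_range, iqr)          # 800 380
print(round(cv, 6), round(total))
print(variance)
```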
Common values

Value | Count | Frequency (%)
100 | 3236 | 32.4%
500 | 1238 | 12.4%
160 | 1119 | 11.2%
480 | 996 | 10.0%
280 | 769 | 7.7%
120 | 490 | 4.9%
302 | 489 | 4.9%
180 | 351 | 3.5%
240 | 216 | 2.2%
800 | 165 | 1.7%
Other values (18) | 931 | 9.3%

Minimum 10 values

Value | Count | Frequency (%)
40 | 151 | 1.5%
100 | 3236 | 32.4%
120 | 490 | 4.9%
140 | 8 | 0.1%
160 | 1119 | 11.2%
180 | 351 | 3.5%
200 | 10 | 0.1%
220 | 67 | 0.7%
240 | 216 | 2.2%
280 | 769 | 7.7%

Maximum 10 values

Value | Count | Frequency (%)
840 | 4 | < 0.1%
800 | 165 | 1.7%
660 | 21 | 0.2%
640 | 14 | 0.1%
620 | 31 | 0.3%
600 | 59 | 0.6%
540 | 78 | 0.8%
520 | 155 | 1.6%
501 | 18 | 0.2%
500 | 1238 | 12.4%

객실번호 (room number)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 391
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.2837
Minimum: 1
Maximum: 600
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 13
Median: 27
Q3: 50
95-th percentile: 132
Maximum: 600
Range: 599
Interquartile range (IQR): 37

Descriptive statistics

Standard deviation: 62.874097
Coefficient of variation (CV): 1.4526045
Kurtosis: 30.644257
Mean: 43.2837
Median Absolute Deviation (MAD): 17
Skewness: 4.8566618
Sum: 432837
Variance: 3953.1521
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
6 | 218 | 2.2%
5 | 215 | 2.1%
18 | 211 | 2.1%
19 | 208 | 2.1%
10 | 208 | 2.1%
4 | 208 | 2.1%
16 | 208 | 2.1%
9 | 207 | 2.1%
13 | 205 | 2.1%
1 | 205 | 2.1%
Other values (381) | 7907 | 79.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 205 | 2.1%
2 | 200 | 2.0%
3 | 198 | 2.0%
4 | 208 | 2.1%
5 | 215 | 2.1%
6 | 218 | 2.2%
7 | 200 | 2.0%
8 | 201 | 2.0%
9 | 207 | 2.1%
10 | 208 | 2.1%

Maximum 10 values

Value | Count | Frequency (%)
600 | 1 | < 0.1%
597 | 1 | < 0.1%
596 | 1 | < 0.1%
595 | 1 | < 0.1%
593 | 1 | < 0.1%
591 | 1 | < 0.1%
590 | 1 | < 0.1%
589 | 1 | < 0.1%
587 | 1 | < 0.1%
586 | 1 | < 0.1%

수용인원 (room capacity)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 6107 | 61.1%
1 | 3602 | 36.0%
0 | 291 | 2.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2 | 6107 | 61.1%
1 | 3602 | 36.0%
0 | 291 | 2.9%

신청인원 (number of applicants)
Real number (ℝ)

ZEROS 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6344
Minimum: -6
Maximum: 2
Zeros: 5418
Zeros (%): 54.2%
Negative: 35
Negative (%): 0.4%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -6
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 2
Maximum: 2
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.78879583
Coefficient of variation (CV): 1.243373
Kurtosis: -0.016734052
Mean: 0.6344
Median Absolute Deviation (MAD): 0
Skewness: 0.52997402
Sum: 6344
Variance: 0.62219886
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Common values

Value | Count | Frequency (%)
0 | 5418 | 54.2%
1 | 2694 | 26.9%
2 | 1853 | 18.5%
-1 | 25 | 0.2%
-2 | 5 | 0.1%
-3 | 2 | < 0.1%
-6 | 1 | < 0.1%
-5 | 1 | < 0.1%
-4 | 1 | < 0.1%

Minimum 10 values

Value | Count | Frequency (%)
-6 | 1 | < 0.1%
-5 | 1 | < 0.1%
-4 | 1 | < 0.1%
-3 | 2 | < 0.1%
-2 | 5 | 0.1%
-1 | 25 | 0.2%
0 | 5418 | 54.2%
1 | 2694 | 26.9%
2 | 1853 | 18.5%

Maximum 10 values

Value | Count | Frequency (%)
2 | 1853 | 18.5%
1 | 2694 | 26.9%
0 | 5418 | 54.2%
-1 | 25 | 0.2%
-2 | 5 | 0.1%
-3 | 2 | < 0.1%
-4 | 1 | < 0.1%
-5 | 1 | < 0.1%
-6 | 1 | < 0.1%
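
Because 신청인원 takes only nine distinct values, its summary statistics can be rebuilt from the value counts alone. The sketch below does so with counts copied from the frequency table for this variable. Using n - 1 in the variance denominator is an assumption about the profiler's convention, adopted here because it reproduces the reported figure.

```python
import math

# Value -> count, copied from the 신청인원 frequency table.
counts = {0: 5418, 1: 2694, 2: 1853, -1: 25, -2: 5, -3: 2, -4: 1, -5: 1, -6: 1}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

zeros = counts.get(0, 0)
negatives = sum(count for value, count in counts.items() if value < 0)

# Sample variance (n - 1 denominator); an assumed convention, see lead-in.
squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = squared_dev / (n - 1)
std = math.sqrt(variance)

print(n, total, mean)    # 10000 6344 0.6344
print(zeros, negatives)  # 5418 35
print(round(variance, 6), round(std, 6))
```

The 35 negative counts are worth a manual look, since a negative number of applicants is unlikely to be a valid value.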

남녀구분 (gender)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.1842
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: M
4th row: F
5th row: M

Common Values

Value | Count | Frequency (%)
M | 4925 | 49.2%
F | 4461 | 44.6%
<NA> | 614 | 6.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
m | 4925 | 49.2%
f | 4461 | 44.6%
na | 614 | 6.1%
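
The report shows 0 missing values for 남녀구분, yet "<NA>" appears as a category and accounts for the maximum length of 4. This suggests, though the report does not confirm it, that missing entries were stored as the literal string "<NA>". The sketch below reproduces the reported mean length from the category counts and shows what share of rows would be missing under that assumption.

```python
# Category -> count, copied from the 남녀구분 Common Values table.
counts = {"M": 4925, "F": 4461, "<NA>": 614}

n = sum(counts.values())
lengths = {label: len(label) for label in counts}
mean_length = sum(len(label) * count for label, count in counts.items()) / n

# Treating the literal "<NA>" label as missing is an assumption about the data.
missing_like = counts["<NA>"]
missing_like_pct = 100 * missing_like / n

print(mean_length)                                   # 1.1842
print(max(lengths.values()), min(lengths.values()))  # 4 1
print(f"{missing_like_pct:.1f}%")                    # 6.1%
```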

독실여부 (single-room flag)
Boolean

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 87.9 KiB

Value | Count | Frequency (%)
False | 10000 | 100.0%

입력일 (entry date)

Distinct: 454
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2007-02-28 00:00:00
Maximum: 2023-07-03 00:00:00
Histogram with fixed size bins (bins=50)

수정일 (modification date)

Distinct: 620
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2007-02-28 00:00:00
Maximum: 2023-07-03 00:00:00
Histogram with fixed size bins (bins=50)

Interactions

[9 interaction plots; images not included in this text export]

Correlations

Variable | 호텔고유번호 | 객실번호 | 수용인원 | 신청인원 | 남녀구분
호텔고유번호 | 1.000 | 0.329 | 0.598 | 0.423 | 0.015
객실번호 | 0.329 | 1.000 | 0.700 | 0.118 | 0.172
수용인원 | 0.598 | 0.700 | 1.000 | 0.547 | 0.025
신청인원 | 0.423 | 0.118 | 0.547 | 1.000 | 0.168
남녀구분 | 0.015 | 0.172 | 0.025 | 0.168 | 1.000

Variable | 남녀구분 | 수용인원
남녀구분 | 1.000 | 0.016
수용인원 | 0.016 | 1.000

Variable | 호텔고유번호 | 객실번호 | 신청인원 | 수용인원 | 남녀구분
호텔고유번호 | 1.000 | -0.293 | 0.334 | 0.323 | 0.015
객실번호 | -0.293 | 1.000 | -0.095 | 0.557 | 0.114
신청인원 | 0.334 | -0.095 | 1.000 | 0.286 | 0.126
수용인원 | 0.323 | 0.557 | 0.286 | 1.000 | 0.016
남녀구분 | 0.015 | 0.114 | 0.126 | 0.016 | 1.000
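
The High correlation alert for 객실번호 and 수용인원 corresponds to the 0.700 entry in the first matrix. The sketch below shows one way to list the strongest off-diagonal pairs in such a matrix. The helper `strong_pairs` and the cut-off of 0.65 are hypothetical choices for illustration; the threshold the profiler actually applied is not stated in this report.

```python
from itertools import combinations

# First correlation matrix of this section, copied by hand.
columns = ["호텔고유번호", "객실번호", "수용인원", "신청인원", "남녀구분"]
matrix = [
    [1.000, 0.329, 0.598, 0.423, 0.015],
    [0.329, 1.000, 0.700, 0.118, 0.172],
    [0.598, 0.700, 1.000, 0.547, 0.025],
    [0.423, 0.118, 0.547, 1.000, 0.168],
    [0.015, 0.172, 0.025, 0.168, 1.000],
]


def strong_pairs(columns, matrix, threshold):
    """Return (col_a, col_b, value) for off-diagonal pairs with |value| >= threshold."""
    pairs = []
    for i, j in combinations(range(len(columns)), 2):
        value = matrix[i][j]
        if abs(value) >= threshold:
            pairs.append((columns[i], columns[j], value))
    # Strongest pairs first.
    return sorted(pairs, key=lambda pair: -abs(pair[2]))


# THRESHOLD is a hypothetical cut-off chosen for illustration only.
THRESHOLD = 0.65
for col_a, col_b, value in strong_pairs(columns, matrix, THRESHOLD):
    print(col_a, col_b, value)
```

Lowering the cut-off to 0.5 would also return the 호텔고유번호/수용인원 and 수용인원/신청인원 pairs, so the choice of threshold materially changes which pairs are reported.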

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

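(index) | 호텔고유번호 | 객실번호 | 수용인원 | 신청인원 | 남녀구분 | 독실여부 | 입력일 | 수정일
17819 | 520 | 16 | 2 | 1 | M | N | 2016-03-03 | 2016-03-03
13347 | 500 | 23 | 2 | 1 | F | N | 2014-05-13 | 2014-05-13
1735 | 100 | 41 | 2 | 1 | M | N | 2008-01-31 | 2008-01-31
90 | 100 | 86 | 2 | 0 | F | N | 2007-05-10 | 2007-05-10
19864 | 540 | 56 | 2 | 2 | M | N | 2016-04-18 | 2016-04-18
16379 | 480 | 12 | 2 | 2 | M | N | 2019-02-28 | 2019-02-28
6121 | 220 | 19 | 1 | 1 | M | N | 2009-07-22 | 2009-07-22
7365 | 100 | 49 | 2 | 0 | F | N | 2009-09-04 | 2009-09-04
19666 | 480 | 19 | 2 | 2 | M | N | 2014-05-27 | 2014-05-27
17583 | 480 | 7 | 1 | 1 | F | N | 2015-02-10 | 2015-02-10

(index) | 호텔고유번호 | 객실번호 | 수용인원 | 신청인원 | 남녀구분 | 독실여부 | 입력일 | 수정일
15336 | 500 | 6 | 1 | 1 | F | N | 2018-04-27 | 2018-04-27
3717 | 180 | 538 | 0 | 0 | <NA> | N | 2008-05-02 | 2008-05-02
9238 | 100 | 8 | 1 | 0 | M | N | 2010-07-08 | 2010-07-08
5910 | 100 | 11 | 2 | -1 | M | N | 2009-04-20 | 2009-04-20
4993 | 100 | 33 | 1 | 0 | F | N | 2008-11-10 | 2008-11-10
4848 | 100 | 28 | 2 | 1 | M | N | 2008-10-10 | 2008-10-10
16682 | 480 | 13 | 2 | 1 | M | N | 2015-09-08 | 2015-10-08
11404 | 280 | 10 | 1 | 0 | F | N | 2011-12-19 | 2011-12-19
8013 | 240 | 20 | 2 | 1 | M | N | 2010-04-09 | 2010-04-09
5961 | 100 | 54 | 2 | 0 | F | N | 2009-05-11 | 2009-05-11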
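
Most sample rows carry the same 입력일 and 수정일, but at least one differs. A small sketch of how such rows could be picked out, using three rows from the sample. The index values are as read from the flattened sample and are an assumption, and `modified_after_entry` is a hypothetical helper name.

```python
from datetime import date

# (index, 입력일, 수정일) for three sample rows; index values are as parsed
# from the flattened table and may not match the source file.
rows = [
    (4848, "2008-10-10", "2008-10-10"),
    (16682, "2015-09-08", "2015-10-08"),
    (11404, "2011-12-19", "2011-12-19"),
]


def modified_after_entry(rows):
    """Return (index, days) for rows whose 수정일 differs from 입력일."""
    changed = []
    for index, entered, modified in rows:
        delta = date.fromisoformat(modified) - date.fromisoformat(entered)
        if delta.days != 0:
            changed.append((index, delta.days))
    return changed


print(modified_after_entry(rows))  # [(16682, 30)]
```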

Duplicate rows

Most frequently occurring

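(index) | 호텔고유번호 | 객실번호 | 수용인원 | 신청인원 | 남녀구분 | 독실여부 | 입력일 | 수정일 | # duplicates
253 | 100 | 36 | 1 | 0 | F | N | 2010-03-03 | 2010-03-03 | 7
144 | 100 | 19 | 2 | 0 | M | N | 2010-03-03 | 2010-03-03 | 6
277 | 100 | 39 | 1 | 0 | F | N | 2010-03-03 | 2010-03-03 | 6
322 | 100 | 46 | 2 | 0 | F | N | 2010-03-03 | 2010-03-03 | 6
355 | 100 | 50 | 2 | 0 | F | N | 2010-03-03 | 2010-03-03 | 6
35 | 100 | 4 | 1 | 0 | M | N | 2010-03-03 | 2010-03-03 | 5
37 | 100 | 4 | 1 | 0 | M | N | 2010-09-03 | 2010-09-03 | 5
44 | 100 | 5 | 1 | 0 | M | N | 2010-03-03 | 2010-03-03 | 5
55 | 100 | 6 | 1 | 0 | M | N | 2010-03-03 | 2010-03-03 | 5
71 | 100 | 8 | 1 | 0 | M | N | 2010-03-03 | 2010-03-03 | 5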
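
A table like the one above can be produced by grouping identical rows and counting them. The sketch below does this for a handful of hypothetical rows laid out in the same column order. It does not assume how the profiler arrives at its own duplicate-row total, and `duplicate_groups` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical rows in the column order of the table above:
# 호텔고유번호, 객실번호, 수용인원, 신청인원, 남녀구분, 독실여부, 입력일, 수정일
rows = [
    (100, 36, 1, 0, "F", "N", "2010-03-03", "2010-03-03"),
    (100, 36, 1, 0, "F", "N", "2010-03-03", "2010-03-03"),
    (100, 36, 1, 0, "F", "N", "2010-03-03", "2010-03-03"),
    (100, 19, 2, 0, "M", "N", "2010-03-03", "2010-03-03"),
    (100, 19, 2, 0, "M", "N", "2010-03-03", "2010-03-03"),
    (480, 7, 1, 1, "F", "N", "2015-02-10", "2015-02-10"),
]


def duplicate_groups(rows):
    """Return (row, count) for every row that occurs more than once, most frequent first."""
    counts = Counter(rows)
    return [(row, count) for row, count in counts.most_common() if count > 1]


groups = duplicate_groups(rows)
# Copies beyond the first occurrence in each group.
extra_copies = sum(count - 1 for _, count in groups)

for row, count in groups:
    print(count, row)
print(extra_copies)  # 3
```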