Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2071
Duplicate rows (%): 20.7%
Total size in memory: 576.2 KiB
Average record size in memory: 59.0 B
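
The percentage and per-record figures above follow from the raw counts by simple arithmetic. The short sketch below recomputes them from the numbers printed in this table; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 10_000
total_kib = 576.2
duplicate_rows = 2071

# 1 KiB = 1024 bytes, so the average record size is total bytes / rows.
avg_record_bytes = total_kib * 1024 / n_rows
duplicate_pct = 100 * duplicate_rows / n_rows

print(f"{avg_record_bytes:.1f} B per record")
print(f"{duplicate_pct:.1f}% duplicate rows")
```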

Variable types

DateTime: 1
Categorical: 1
Text: 1
Numeric: 3

Dataset

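Description: Data on damaged room supplies at 보흔휴양원, released as open data. It contains the damage date (파손일자), damage category (파손구분), damaged item (파손비품), item unit price (비품단가), item quantity (비품수량), and damage amount (파손금액).
URL: https://www.data.go.kr/data/15117117/fileData.do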

Alerts

Dataset has 2071 (20.7%) duplicate rows (Duplicates)
비품단가 is highly overall correlated with 파손금액 (High correlation)
비품수량 is highly overall correlated with 파손금액 (High correlation)
파손금액 is highly overall correlated with 비품단가 and 1 other field (High correlation)
파손구분 is highly imbalanced (73.6%) (Imbalance)
비품단가 is highly skewed (γ1 = 42.70658099) (Skewed)
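
The 73.6% imbalance figure for 파손구분 can be reproduced with an entropy-based score: one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy. The sketch below assumes that definition (the report itself does not state the formula) and uses the two class counts listed for 파손구분; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/H_max, where H is the Shannon entropy of the class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    This definition is an assumption that happens to match the reported figure.
    """
    n = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts for 파손구분: 추가비품 = 9551, 파손비품 = 449.
score = imbalance_score([9551, 449])
print(f"{score:.1%}")
```

With two equally frequent classes the same function returns 0, which is why a 95.5% / 4.5% split scores so high.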

Reproduction

Analysis started: 2023-12-12 00:44:37.997162
Analysis finished: 2023-12-12 00:44:40.130244
Duration: 2.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

파손일자
DateTime

Distinct: 4409
Distinct (%): 44.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1997-09-08 00:00:00
Maximum: 2023-07-24 00:00:00
Histogram with fixed size bins (bins=50)

파손구분
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

추가비품: 9551
파손비품: 449

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 추가비품
2nd row: 추가비품
3rd row: 파손비품
4th row: 추가비품
5th row: 추가비품

Common Values

Value                            Count  Frequency (%)
추가비품 (additional supplies)   9551   95.5%
파손비품 (damaged supplies)      449    4.5%

Length

Histogram of lengths of the category

Common Values (Plot)


파손비품
Text

Distinct: 53
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 3
Mean length: 3.027
Min length: 2

Characters and Unicode

Total characters: 30270
Distinct characters: 114
Distinct categories: 5
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): 0.2%

Sample

1st row: 침구류
2nd row: 침구류
3rd row: 가위
4th row: 침구류
5th row: 침구류
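
Value                          Count  Frequency (%)
침구류 (bedding)               9551   95.4%
커피잔세트 (coffee cup set)    117    1.2%
물컵 (water cup)               33     0.3%
밥공기 (rice bowl)             33     0.3%
소주잔 (soju glass)            33     0.3%
접시 (plate)                   29     0.3%
찬그릇 (side-dish bowl)        24     0.2%
슬리퍼 (slippers)              20     0.2%
국그릇 (soup bowl)             19     0.2%
냉장고 (refrigerator)          13     0.1%
Other values (46)              143    1.4%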

Most occurring characters

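Value               Count  Frequency (%)
[glyph lost]        9564   31.6%
[glyph lost]        9555   31.6%
[glyph lost]        9551   31.6%
[glyph lost]        150    0.5%
[glyph lost]        119    0.4%
[glyph lost]        119    0.4%
[glyph lost]        118    0.4%
[glyph lost]        117    0.4%
[glyph lost]        45     0.1%
[glyph lost]        43     0.1%
Other values (104)  889    2.9%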

Most occurring categories

Value              Count  Frequency (%)
Other Letter       30235  99.9%
Space Separator    15     < 0.1%
Open Punctuation   8      < 0.1%
Close Punctuation  8      < 0.1%
Lowercase Letter   4      < 0.1%

Most frequent character per category

Other Letter

Value              Count  Frequency (%)
[glyph lost]       9564   31.6%
[glyph lost]       9555   31.6%
[glyph lost]       9551   31.6%
[glyph lost]       150    0.5%
[glyph lost]       119    0.4%
[glyph lost]       119    0.4%
[glyph lost]       118    0.4%
[glyph lost]       117    0.4%
[glyph lost]       45     0.1%
[glyph lost]       43     0.1%
Other values (99)  854    2.8%

Lowercase Letter

Value  Count  Frequency (%)
v      2      50.0%
t      2      50.0%

Space Separator

Value    Count  Frequency (%)
(space)  15     100.0%

Open Punctuation

Value  Count  Frequency (%)
(      8      100.0%

Close Punctuation

Value  Count  Frequency (%)
)      8      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  30231  99.9%
Common  31     0.1%
Han     4      < 0.1%
Latin   4      < 0.1%

Most frequent character per script

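Hangul

Value              Count  Frequency (%)
[glyph lost]       9564   31.6%
[glyph lost]       9555   31.6%
[glyph lost]       9551   31.6%
[glyph lost]       150    0.5%
[glyph lost]       119    0.4%
[glyph lost]       119    0.4%
[glyph lost]       118    0.4%
[glyph lost]       117    0.4%
[glyph lost]       45     0.1%
[glyph lost]       43     0.1%
Other values (98)  850    2.8%

Common

Value    Count  Frequency (%)
(space)  15     48.4%
(        8      25.8%
)        8      25.8%

Latin

Value  Count  Frequency (%)
v      2      50.0%
t      2      50.0%

Han

Value         Count  Frequency (%)
[glyph lost]  4      100.0%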

Most occurring blocks

Value   Count  Frequency (%)
Hangul  30231  99.9%
ASCII   35     0.1%
CJK     4      < 0.1%

Most frequent character per block

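Hangul

Value              Count  Frequency (%)
[glyph lost]       9564   31.6%
[glyph lost]       9555   31.6%
[glyph lost]       9551   31.6%
[glyph lost]       150    0.5%
[glyph lost]       119    0.4%
[glyph lost]       119    0.4%
[glyph lost]       118    0.4%
[glyph lost]       117    0.4%
[glyph lost]       45     0.1%
[glyph lost]       43     0.1%
Other values (98)  850    2.8%

ASCII

Value    Count  Frequency (%)
(space)  15     42.9%
(        8      22.9%
)        8      22.9%
v        2      5.7%
t        2      5.7%

CJK

Value         Count  Frequency (%)
[glyph lost]  4      100.0%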

비품단가
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 31
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3344.183
Minimum: 550
Maximum: 200000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 550
5-th percentile: 2000
Q1: 2000
Median: 3000
Q3: 5000
95-th percentile: 5000
Maximum: 200000
Range: 199450
Interquartile range (IQR): 3000

Descriptive statistics

Standard deviation: 2662.7256
Coefficient of variation (CV): 0.79622603
Kurtosis: 3011.1489
Mean: 3344.183
Median Absolute Deviation (MAD): 1000
Skewness: 42.706581
Sum: 33441830
Variance: 7090107.4
Monotonicity: Not monotonic
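
Several of the 비품단가 statistics above are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks those relationships against the printed values. The inputs are copied from the report and the tolerances are an assumption chosen to absorb rounding in the displayed figures.

```python
import math

# Values copied from the 비품단가 quantile and descriptive statistics.
minimum, maximum = 550, 200_000
q1, q3 = 2000, 5000
mean, std = 3344.183, 2662.7256
total, n_rows = 33_441_830, 10_000

value_range = maximum - minimum   # reported Range
iqr = q3 - q1                     # reported Interquartile range
cv = std / mean                   # reported Coefficient of variation
variance = std ** 2               # reported Variance
mean_from_sum = total / n_rows    # reported Mean

print(value_range, iqr)
print(f"CV={cv:.8f} variance={variance:.1f} mean={mean_from_sum}")
```

The variance recomputed this way differs from the printed one in the last digit because the standard deviation shown in the report is already rounded.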
Histogram with fixed size bins (bins=31)

Common values

Value              Count  Frequency (%)
2000               4311   43.1%
5000               3434   34.3%
3000               2000   20.0%
1000               64     0.6%
4000               46     0.5%
10000              38     0.4%
6000               18     0.2%
1650               14     0.1%
2420               11     0.1%
2200               10     0.1%
Other values (21)  54     0.5%

Minimum 10 values

Value  Count  Frequency (%)
550    1      < 0.1%
660    1      < 0.1%
1000   64     0.6%
1100   5      0.1%
1650   14     0.1%
2000   4311   43.1%
2200   10     0.1%
2420   11     0.1%
2500   3      < 0.1%
3000   2000   20.0%

Maximum 10 values

Value   Count  Frequency (%)
200000  1      < 0.1%
60000   1      < 0.1%
46000   1      < 0.1%
35000   2      < 0.1%
33000   1      < 0.1%
30000   2      < 0.1%
29000   1      < 0.1%
20000   5      0.1%
19000   2      < 0.1%
14000   2      < 0.1%

비품수량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4719
Minimum: 1
Maximum: 40
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 3
Maximum: 40
Range: 39
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.94037164
Coefficient of variation (CV): 0.63888283
Kurtosis: 380.7812
Mean: 1.4719
Median Absolute Deviation (MAD): 0
Skewness: 12.360625
Sum: 14719
Variance: 0.88429882
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)

Common values

Value             Count  Frequency (%)
1                 6464   64.6%
2                 2762   27.6%
3                 593    5.9%
4                 111    1.1%
5                 46     0.5%
6                 6      0.1%
7                 4      < 0.1%
10                3      < 0.1%
8                 3      < 0.1%
17                2      < 0.1%
Other values (6)  6      0.1%

Minimum 10 values

Value  Count  Frequency (%)
1      6464   64.6%
2      2762   27.6%
3      593    5.9%
4      111    1.1%
5      46     0.5%
6      6      0.1%
7      4      < 0.1%
8      3      < 0.1%
10     3      < 0.1%
11     1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
40     1      < 0.1%
28     1      < 0.1%
18     1      < 0.1%
17     2      < 0.1%
15     1      < 0.1%
14     1      < 0.1%
11     1      < 0.1%
10     3      < 0.1%
8      3      < 0.1%
7      4      < 0.1%

파손금액
Real number (ℝ)

HIGH CORRELATION 

Distinct: 43
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4916.133
Minimum: 550
Maximum: 200000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 550
5-th percentile: 2000
Q1: 2000
Median: 4000
Q3: 6000
95-th percentile: 10000
Maximum: 200000
Range: 199450
Interquartile range (IQR): 4000

Descriptive statistics

Standard deviation: 4378.0003
Coefficient of variation (CV): 0.89053739
Kurtosis: 510.31011
Mean: 4916.133
Median Absolute Deviation (MAD): 2000
Skewness: 14.444207
Sum: 49161330
Variance: 19166886
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=43)

Common values

Value              Count  Frequency (%)
2000               2979   29.8%
5000               2033   20.3%
10000              1245   12.4%
3000               1210   12.1%
4000               1009   10.1%
6000               908    9.1%
15000              186    1.9%
9000               139    1.4%
1000               61     0.6%
8000               58     0.6%
Other values (33)  172    1.7%

Minimum 10 values

Value  Count  Frequency (%)
550    1      < 0.1%
660    1      < 0.1%
1000   61     0.6%
1100   5      0.1%
1650   14     0.1%
2000   2979   29.8%
2200   10     0.1%
2420   11     0.1%
2500   2      < 0.1%
3000   1210   12.1%

Maximum 10 values

Value   Count  Frequency (%)
200000  1      < 0.1%
138000  1      < 0.1%
85000   1      < 0.1%
80000   1      < 0.1%
70000   1      < 0.1%
60000   1      < 0.1%
56000   1      < 0.1%
40000   1      < 0.1%
36000   1      < 0.1%
35000   2      < 0.1%

Interactions


Correlations

          파손구분  파손비품  비품단가  비품수량  파손금액
파손구분  1.000     1.000     0.207     0.000     0.078
파손비품  1.000     1.000     0.990     0.000     0.869
비품단가  0.207     0.990     1.000     0.000     0.876
비품수량  0.000     0.000     0.000     1.000     0.906
파손금액  0.078     0.869     0.876     0.906     1.000

          비품단가  비품수량  파손금액  파손구분
비품단가  1.000     0.074     0.750     0.137
비품수량  0.074     1.000     0.694     0.000
파손금액  0.750     0.694     1.000     0.084
파손구분  0.137     0.000     0.084     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  파손일자    파손구분  파손비품  비품단가  비품수량  파손금액
10028    2009-12-25  추가비품  침구류    3000      2         6000
3989     1998-04-13  추가비품  침구류    2000      1         2000
18621    2019-08-18  파손비품  가위      2000      1         2000
9335     2010-11-05  추가비품  침구류    3000      1         3000
4479     1998-08-10  추가비품  침구류    2000      1         2000
9120     2010-10-16  추가비품  침구류    3000      2         6000
16747    2017-02-26  추가비품  침구류    5000      1         5000
10952    2011-08-15  추가비품  침구류    3000      2         6000
6400     2007-09-02  추가비품  침구류    2000      1         2000
15312    2015-08-07  추가비품  침구류    5000      1         5000

(index)  파손일자    파손구분  파손비품  비품단가  비품수량  파손금액
15709    2016-12-30  추가비품  침구류    5000      1         5000
14691    2014-08-30  추가비품  침구류    5000      2         10000
3114     1999-11-06  추가비품  침구류    2000      1         2000
4821     2005-09-26  추가비품  침구류    2000      1         2000
16918    2018-04-17  추가비품  침구류    5000      1         5000
3583     1998-06-11  추가비품  침구류    2000      1         2000
12979    2014-05-24  추가비품  침구류    5000      1         5000
11525    2012-01-13  추가비품  침구류    3000      1         3000
12184    2013-10-26  추가비품  침구류    5000      1         5000
18374    2019-11-08  추가비품  침구류    5000      1         5000
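
In the sample rows above, 파손금액 appears to equal 비품단가 multiplied by 비품수량, which would also explain why 파손금액 is flagged as highly correlated with both columns. The sketch below checks that relationship on the 20 sampled rows only. The tuples are transcribed by hand from the sample, and whether the relationship holds for all 10000 rows is not something this report confirms.

```python
# (비품단가, 비품수량, 파손금액) for the 20 sample rows, in the order shown.
sample_rows = [
    (3000, 2, 6000), (2000, 1, 2000), (2000, 1, 2000), (3000, 1, 3000),
    (2000, 1, 2000), (3000, 2, 6000), (5000, 1, 5000), (3000, 2, 6000),
    (2000, 1, 2000), (5000, 1, 5000),
    (5000, 1, 5000), (5000, 2, 10000), (2000, 1, 2000), (2000, 1, 2000),
    (5000, 1, 5000), (2000, 1, 2000), (5000, 1, 5000), (3000, 1, 3000),
    (5000, 1, 5000), (5000, 1, 5000),
]

# Collect any row where unit price x quantity does not match the amount.
mismatches = [
    (price, quantity, amount)
    for price, quantity, amount in sample_rows
    if price * quantity != amount
]

print(f"{len(sample_rows)} rows checked, {len(mismatches)} mismatches")
```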

Duplicate rows

Most frequently occurring

(index)  파손일자    파손구분  파손비품  비품단가  비품수량  파손금액  # duplicates
65       1998-06-11  추가비품  침구류    2000      1         2000      19
170      1999-04-27  추가비품  침구류    2000      1         2000      17
299      2000-06-20  추가비품  침구류    2000      1         2000      17
54       1998-05-06  추가비품  침구류    2000      1         2000      16
114      1998-10-20  추가비품  침구류    2000      1         2000      16
75       1998-06-30  추가비품  침구류    2000      1         2000      14
35       1998-02-24  추가비품  침구류    2000      1         2000      12
50       1998-04-20  추가비품  침구류    2000      1         2000      12
121      1998-10-30  추가비품  침구류    2000      1         2000      12
45       1998-04-07  추가비품  침구류    2000      1         2000      11