Overview

Dataset statistics

Number of variables: 5
Number of observations: 134
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.5 KiB
Average record size in memory: 42.0 B

Variable types

Categorical: 3
Text: 1
Numeric: 1

Dataset

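Description: Provides information on the target audience, hours of use, days of the week, and fees for the swimming, fitness, and various hobby and cultural programs operated by 성북구도시관리공단 개운산스포츠센터 (Gaeunsan Sports Center, Seongbuk-gu Urban Management Corporation).
Author: 성북구도시관리공단
URL: https://www.data.go.kr/data/15002809/fileData.do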

Alerts

프로그램그룹 (program group) is highly imbalanced (61.7%). Alert type: Imbalance
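The 61.7% figure is consistent with an imbalance score defined as one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition; the function name `imbalance_score` is illustrative and not taken from the report. It uses the 124/10 split reported for 프로그램그룹.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / log2(number of classes)).

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. This definition is an assumption that happens to match the
    61.7% shown in the alert.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    # Entropy in bits; zero counts contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Class counts for 프로그램그룹: 스포츠프로그램 = 124, 문화프로그램 = 10.
score = imbalance_score([124, 10])
print(f"{score:.1%}")
```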

Reproduction

Analysis started: 2023-12-12 10:03:47.784810
Analysis finished: 2023-12-12 10:03:48.344404
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
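A report of this kind can be regenerated from the source CSV with ydata-profiling. The sketch below is illustrative only: the file path, the report title, and the `cp949` encoding are assumptions, not details recorded in this report, and the helper name `build_report` is hypothetical.

```python
def build_report(csv_path, output_path="report.html", encoding="cp949"):
    """Profile a CSV file and write an HTML report.

    csv_path, output_path, the title, and the cp949 encoding are all
    assumptions made for illustration.
    """
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="개운산스포츠센터 programs")
    profile.to_file(output_path)
    return profile
```

Calling `build_report("programs.csv")` with a locally downloaded copy of the dataset (hypothetical filename) would write `report.html` to the working directory.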

Variables

프로그램그룹 (program group)
Categorical

Alert: IMBALANCE

Distinct: 2
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Value | Count
스포츠프로그램 (sports program) | 124
문화프로그램 (culture program) | 10

Length

Max length: 7
Median length: 7
Mean length: 6.9253731
Min length: 6
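The length statistics follow directly from the two category labels and their counts. A minimal sketch, assuming length is measured in Unicode code points (Python's `len`); the variable names are illustrative:

```python
import statistics

# Category counts reported for 프로그램그룹.
counts = {"스포츠프로그램": 124, "문화프로그램": 10}

# One length per observation: 124 values of length 7 and 10 of length 6.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

mean_length = sum(lengths) / len(lengths)
median_length = statistics.median(lengths)

print(max(lengths), median_length, round(mean_length, 7), min(lengths))
```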

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 문화프로그램
2nd row: 문화프로그램
3rd row: 문화프로그램
4th row: 문화프로그램
5th row: 문화프로그램

Common Values

Value | Count | Frequency (%)
스포츠프로그램 | 124 | 92.5%
문화프로그램 | 10 | 7.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

프로그램명 (program name)
Text

Distinct: 90
Distinct (%): 67.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 24
Median length: 22
Mean length: 19.223881
Min length: 8

Characters and Unicode

Total characters: 2576
Distinct characters: 145
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
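The category counts in this section can be approximated with Python's standard `unicodedata` module. A minimal sketch, assuming general categories are read per code point after NFC normalization; the helper name `category_counts` and the label mapping are illustrative, and the sample string is one program name from this dataset.

```python
import unicodedata
from collections import Counter

# Human-readable labels for the general categories that appear in this report.
CATEGORY_LABELS = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
}


def category_counts(text):
    """Count Unicode general categories of the code points in text."""
    normalized = unicodedata.normalize("NFC", text)
    return Counter(unicodedata.category(ch) for ch in normalized)


sample = "[NEW]월자유수영12시(월~금)"
counts = category_counts(sample)
for code, n in counts.most_common():
    print(CATEGORY_LABELS.get(code, code), n)
```

Script and block names (Hangul, Latin, ASCII) are not exposed by `unicodedata`, so this sketch covers categories only.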

Unique

Unique: 51
Unique (%): 38.1%
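The report lists Distinct (90) and Unique (51) separately. The numbers in this report are consistent with "distinct" counting the different values present and "unique" counting the values that occur exactly once. A minimal sketch of that reading, using the five sample rows of 프로그램명 rather than the full column:

```python
from collections import Counter

# The five sample rows shown for 프로그램명.
sample = [
    "[NEW스포츠]밸리댄스(성인청소년/화목)",
    "[변경]바이엘11D",
    "[변경]바이엘11D",
    "[변경]바이엘11D",
    "[변경]체르니11D",
]

freq = Counter(sample)
distinct = len(freq)                              # different values present
unique = sum(1 for n in freq.values() if n == 1)  # values seen exactly once

print(distinct, unique)
```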

Sample

1st row: [NEW스포츠]밸리댄스(성인청소년/화목)
2nd row: [변경]바이엘11D
3rd row: [변경]바이엘11D
4th row: [변경]바이엘11D
5th row: [변경]체르니11D

Value | Count | Frequency (%)
변경]바이엘11d | 3 | 2.2%
new]월자유수영18시(월~금 | 3 | 2.2%
new]월자유수영12시(월~금 | 3 | 2.2%
new]월자유수영21시(월~금 | 3 | 2.2%
new]월자유수영08시(월~금 | 3 | 2.2%
단기특강]성인소그룹반18시a(자유형,배영 | 2 | 1.4%
단기특강]성인소그룹반16시b(자유형,배영 | 2 | 1.4%
수영new]직장(성인/청소년)21시화목 | 2 | 1.4%
수영new]직장(성인/청소년)21시월수금 | 2 | 1.4%
수영new]직장(성인/청소년)20시화목 | 2 | 1.4%
Other values (83) | 114 | 82.0%

Most occurring characters

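Value | Count | Frequency (%)
(character missing) | 127 | 4.9%
) | 122 | 4.7%
( | 122 | 4.7%
[ | 121 | 4.7%
] | 121 | 4.7%
1 | 105 | 4.1%
(character missing) | 105 | 4.1%
(character missing) | 96 | 3.7%
N | 91 | 3.5%
E | 91 | 3.5%
Other values (135) | 1475 | 57.3%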

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1408 | 54.7%
Uppercase Letter | 305 | 11.8%
Decimal Number | 257 | 10.0%
Close Punctuation | 243 | 9.4%
Open Punctuation | 243 | 9.4%
Other Punctuation | 85 | 3.3%
Math Symbol | 28 | 1.1%
Space Separator | 5 | 0.2%
Dash Punctuation | 2 | 0.1%

Most frequent character per category

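Other Letter
Value | Count | Frequency (%)
(character missing) | 127 | 9.0%
(character missing) | 105 | 7.5%
(character missing) | 96 | 6.8%
(character missing) | 80 | 5.7%
(character missing) | 66 | 4.7%
(character missing) | 66 | 4.7%
(character missing) | 65 | 4.6%
(character missing) | 58 | 4.1%
(character missing) | 51 | 3.6%
(character missing) | 50 | 3.6%
Other values (103) | 644 | 45.7%

Decimal Number
Value | Count | Frequency (%)
1 | 105 | 40.9%
0 | 47 | 18.3%
2 | 26 | 10.1%
6 | 22 | 8.6%
9 | 18 | 7.0%
8 | 17 | 6.6%
7 | 11 | 4.3%
3 | 5 | 1.9%
5 | 4 | 1.6%
4 | 2 | 0.8%

Uppercase Letter
Value | Count | Frequency (%)
N | 91 | 29.8%
E | 91 | 29.8%
W | 88 | 28.9%
B | 9 | 3.0%
A | 9 | 3.0%
D | 9 | 3.0%
S | 3 | 1.0%
P | 3 | 1.0%
T | 1 | 0.3%
O | 1 | 0.3%

Other Punctuation
Value | Count | Frequency (%)
/ | 62 | 72.9%
, | 12 | 14.1%
. | 9 | 10.6%
& | 2 | 2.4%

Close Punctuation
Value | Count | Frequency (%)
) | 122 | 50.2%
] | 121 | 49.8%

Open Punctuation
Value | Count | Frequency (%)
( | 122 | 50.2%
[ | 121 | 49.8%

Math Symbol
Value | Count | Frequency (%)
~ | 26 | 92.9%
+ | 2 | 7.1%

Space Separator
Value | Count | Frequency (%)
(space) | 5 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%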

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1408 | 54.7%
Common | 863 | 33.5%
Latin | 305 | 11.8%

Most frequent character per script

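Hangul
Value | Count | Frequency (%)
(character missing) | 127 | 9.0%
(character missing) | 105 | 7.5%
(character missing) | 96 | 6.8%
(character missing) | 80 | 5.7%
(character missing) | 66 | 4.7%
(character missing) | 66 | 4.7%
(character missing) | 65 | 4.6%
(character missing) | 58 | 4.1%
(character missing) | 51 | 3.6%
(character missing) | 50 | 3.6%
Other values (103) | 644 | 45.7%

Common
Value | Count | Frequency (%)
) | 122 | 14.1%
( | 122 | 14.1%
[ | 121 | 14.0%
] | 121 | 14.0%
1 | 105 | 12.2%
/ | 62 | 7.2%
0 | 47 | 5.4%
2 | 26 | 3.0%
~ | 26 | 3.0%
6 | 22 | 2.5%
Other values (12) | 89 | 10.3%

Latin
Value | Count | Frequency (%)
N | 91 | 29.8%
E | 91 | 29.8%
W | 88 | 28.9%
B | 9 | 3.0%
A | 9 | 3.0%
D | 9 | 3.0%
S | 3 | 1.0%
P | 3 | 1.0%
T | 1 | 0.3%
O | 1 | 0.3%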

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1408 | 54.7%
ASCII | 1168 | 45.3%

Most frequent character per block

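Hangul
Value | Count | Frequency (%)
(character missing) | 127 | 9.0%
(character missing) | 105 | 7.5%
(character missing) | 96 | 6.8%
(character missing) | 80 | 5.7%
(character missing) | 66 | 4.7%
(character missing) | 66 | 4.7%
(character missing) | 65 | 4.6%
(character missing) | 58 | 4.1%
(character missing) | 51 | 3.6%
(character missing) | 50 | 3.6%
Other values (103) | 644 | 45.7%

ASCII
Value | Count | Frequency (%)
) | 122 | 10.4%
( | 122 | 10.4%
[ | 121 | 10.4%
] | 121 | 10.4%
1 | 105 | 9.0%
N | 91 | 7.8%
E | 91 | 7.8%
W | 88 | 7.5%
/ | 62 | 5.3%
0 | 47 | 4.0%
Other values (22) | 198 | 17.0%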

시간 (time)
Categorical

Distinct: 18
Distinct (%): 13.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Value | Count
11:00~11:50 | 17
10:00~10:50 | 14
09:00~09:50 | 13
12:00~12:50 | 11
08:00~08:50 | 10
Other values (13) | 69

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 12:00~12:50
2nd row: 11:00~11:50
3rd row: 11:00~11:50
4th row: 11:00~11:50
5th row: 11:00~11:50

Common Values

Value | Count | Frequency (%)
11:00~11:50 | 17 | 12.7%
10:00~10:50 | 14 | 10.4%
09:00~09:50 | 13 | 9.7%
12:00~12:50 | 11 | 8.2%
08:00~08:50 | 10 | 7.5%
21:00~21:50 | 9 | 6.7%
18:00~18:50 | 9 | 6.7%
06:00~06:50 | 7 | 5.2%
20:00~20:50 | 7 | 5.2%
16:00~16:50 | 6 | 4.5%
Other values (8) | 31 | 23.1%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
11:00~11:50 | 17 | 12.7%
10:00~10:50 | 14 | 10.4%
09:00~09:50 | 13 | 9.7%
12:00~12:50 | 11 | 8.2%
08:00~08:50 | 10 | 7.5%
21:00~21:50 | 9 | 6.7%
18:00~18:50 | 9 | 6.7%
06:00~06:50 | 7 | 5.2%
20:00~20:50 | 7 | 5.2%
19:00~19:50 | 6 | 4.5%
Other values (8) | 31 | 23.1%

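대상 (target audience)
Categorical

Distinct: 5
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Value | Count
성인 (adults) | 58
청소년 (youth) | 35
초등 (elementary school) | 24
대상없음 (no target specified) | 16
유아 (young children) | 1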

Length

Max length: 4
Median length: 2
Mean length: 2.5
Min length: 2

Unique

Unique: 1
Unique (%): 0.7%

Sample

1st row: 대상없음
2nd row: 성인
3rd row: 청소년
4th row: 초등
5th row: 성인

Common Values

Value | Count | Frequency (%)
성인 | 58 | 43.3%
청소년 | 35 | 26.1%
초등 | 24 | 17.9%
대상없음 | 16 | 11.9%
유아 | 1 | 0.7%
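The Frequency (%) column is each count divided by the number of observations (134). A minimal sketch that recomputes the percentages for 대상 from the counts listed above; the variable names are illustrative:

```python
# Counts reported for 대상.
counts = {"성인": 58, "청소년": 35, "초등": 24, "대상없음": 16, "유아": 1}

total = sum(counts.values())
percentages = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, pct in percentages.items():
    print(f"{value} | {counts[value]} | {pct}%")
```

The same counts also give the mean label length of 2.5 reported for this variable: (58×2 + 35×3 + 24×2 + 16×4 + 1×2) / 134 = 335 / 134.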

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

수강료(원) (fee, KRW)
Real number (ℝ)

Distinct: 29
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60652.985
Minimum: 0
Maximum: 1056000
Zeros: 1
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 26000
Q1: 37000
median: 48000
Q3: 57000
95-th percentile: 125250
Maximum: 1056000
Range: 1056000
Interquartile range (IQR): 20000
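Range and IQR are simple differences of the quantiles listed above. The sketch below recomputes them and, as an illustration only, applies Tukey's 1.5 × IQR fences to show how far the maximum sits from the bulk of the fees. The fence rule is a common convention introduced here as an assumption; it is not something the report itself computes.

```python
# Quantile statistics reported for 수강료(원).
minimum, q1, q3, maximum = 0, 37000, 57000, 1056000

value_range = maximum - minimum
iqr = q3 - q1

# Tukey fences (illustrative convention, not part of the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(value_range, iqr, lower_fence, upper_fence)
print("maximum beyond upper fence:", maximum > upper_fence)
```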

Descriptive statistics

Standard deviation: 92414.078
Coefficient of variation (CV): 1.5236526
Kurtosis: 102.88808
Mean: 60652.985
Median Absolute Deviation (MAD): 11000
Skewness: 9.6067186
Sum: 8127500
Variance: 8.5403618 × 10^9
Monotonicity: Not monotonic
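Several of these statistics are tied together by simple identities: mean = sum / n, CV = standard deviation / mean, and variance = standard deviation squared. A minimal sketch that checks the reported values against each other, starting from the rounded figures printed in the report (so small rounding differences are expected):

```python
# Values as printed in the report.
n = 134
total = 8127500
std = 92414.078

mean = total / n      # reported: 60652.985
cv = std / mean       # reported: 1.5236526
variance = std ** 2   # reported: 8.5403618 × 10^9

print(round(mean, 3), round(cv, 7), f"{variance:.7e}")
```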

[Figure: Histogram with fixed size bins (bins=29)]

Common Values

Value | Count | Frequency (%)
49000 | 18 | 13.4%
60000 | 15 | 11.2%
26000 | 14 | 10.4%
39000 | 13 | 9.7%
33000 | 12 | 9.0%
50000 | 8 | 6.0%
120000 | 5 | 3.7%
40000 | 5 | 3.7%
37000 | 4 | 3.0%
57000 | 4 | 3.0%
Other values (19) | 36 | 26.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | 0.7%
15000 | 1 | 0.7%
20500 | 1 | 0.7%
26000 | 14 | 10.4%
28000 | 1 | 0.7%
33000 | 12 | 9.0%
35000 | 2 | 1.5%
37000 | 4 | 3.0%
38000 | 2 | 1.5%
39000 | 13 | 9.7%

Maximum 10 values

Value | Count | Frequency (%)
1056000 | 1 | 0.7%
180000 | 4 | 3.0%
135000 | 2 | 1.5%
120000 | 5 | 3.7%
110000 | 1 | 0.7%
90000 | 4 | 3.0%
65000 | 1 | 0.7%
60000 | 15 | 11.2%
57000 | 4 | 3.0%
55000 | 4 | 3.0%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation plot]

Variable | 프로그램그룹 | 프로그램명 | 시간 | 대상 | 수강료(원)
프로그램그룹 | 1.000 | 1.000 | 0.597 | 0.029 | 0.000
프로그램명 | 1.000 | 1.000 | 1.000 | 0.744 | 0.999
시간 | 0.597 | 1.000 | 1.000 | 0.529 | 0.660
대상 | 0.029 | 0.744 | 0.529 | 1.000 | 0.137
수강료(원) | 0.000 | 0.999 | 0.660 | 0.137 | 1.000

[Figure: correlation plot]

Variable | 대상 | 시간 | 프로그램그룹
대상 | 1.000 | 0.286 | 0.031
시간 | 0.286 | 1.000 | 0.444
프로그램그룹 | 0.031 | 0.444 | 1.000

[Figure: correlation plot]

Variable | 수강료(원) | 프로그램그룹 | 시간 | 대상
수강료(원) | 1.000 | 0.000 | 0.369 | 0.101
프로그램그룹 | 0.000 | 1.000 | 0.444 | 0.031
시간 | 0.369 | 0.444 | 1.000 | 0.286
대상 | 0.101 | 0.031 | 0.286 | 1.000

Missing values

[Figure: nullity bar chart]
A simple visualization of nullity by column.

[Figure: nullity matrix]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 프로그램그룹 | 프로그램명 | 시간 | 대상 | 수강료(원)
0 | 문화프로그램 | [NEW스포츠]밸리댄스(성인청소년/화목) | 12:00~12:50 | 대상없음 | 43000
1 | 문화프로그램 | [변경]바이엘11D | 11:00~11:50 | 성인 | 40000
2 | 문화프로그램 | [변경]바이엘11D | 11:00~11:50 | 청소년 | 40000
3 | 문화프로그램 | [변경]바이엘11D | 11:00~11:50 | 초등 | 35000
4 | 문화프로그램 | [변경]체르니11D | 11:00~11:50 | 성인 | 45000
5 | 문화프로그램 | [변경]체르니11D | 11:00~11:50 | 초등 | 40000
6 | 문화프로그램 | [변경]바이엘12D | 12:00~12:50 | 성인 | 40000
7 | 문화프로그램 | [변경]바이엘12D | 12:00~12:50 | 초등 | 35000
8 | 문화프로그램 | [변경]체르니12D | 12:00~12:50 | 성인 | 45000
9 | 문화프로그램 | [변경]체르니12D | 12:00~12:50 | 초등 | 40000

Last rows

Index | 프로그램그룹 | 프로그램명 | 시간 | 대상 | 수강료(원)
124 | 스포츠프로그램 | [NEW]월자유수영12시(월~금) | 12:00~12:50 | 성인 | 57000
125 | 스포츠프로그램 | [NEW]월자유수영12시(월~금) | 12:00~12:50 | 청소년 | 47000
126 | 스포츠프로그램 | [NEW]월자유수영12시(월~금) | 12:00~12:50 | 초등 | 37000
127 | 스포츠프로그램 | [NEW]월자유수영18시(월~금) | 18:00~18:50 | 성인 | 57000
128 | 스포츠프로그램 | [NEW]월자유수영18시(월~금) | 18:00~18:50 | 청소년 | 47000
129 | 스포츠프로그램 | [NEW]월자유수영18시(월~금) | 18:00~18:50 | 초등 | 37000
130 | 스포츠프로그램 | [NEW]월자유수영21시(월~금) | 21:00~21:50 | 성인 | 57000
131 | 스포츠프로그램 | [NEW]월자유수영21시(월~금) | 21:00~21:50 | 청소년 | 47000
132 | 스포츠프로그램 | [NEW]월자유수영21시(월~금) | 21:00~21:50 | 초등 | 37000
133 | 스포츠프로그램 | [사회공헌협력]승가원수영교실 | 11:00~11:50 | 대상없음 | 20500