Overview

Dataset statistics

Number of variables: 7
Number of observations: 319
Missing cells: 32
Missing cells (%): 1.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.9 KiB
Average record size in memory: 57.4 B
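
The missing-cell percentage follows from the counts above: 32 missing cells out of 7 × 319 cells in total. A minimal check in Python, using only figures copied from this overview (the variable names are illustrative):

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 319
missing_cells = 32

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = missing_cells / total_cells * 100

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

The same division explains the per-column alerts: 26 / 319 and 6 / 319 round to 8.2% and 1.9%.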

Variable types

Text: 3
Categorical: 2
Numeric: 1
DateTime: 1

Dataset

Description: Diversity film screening status (다양성 영화 상영 현황)
Author: 경기콘텐츠진흥원 (Gyeonggi Content Agency)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=GVMBBRVHF4FD72JSN62012076383&infSeq=1

Alerts

장르구분명 is highly imbalanced (54.7%) [Imbalance]
개봉일정보 has 26 (8.2%) missing values [Missing]
배급사명 has 6 (1.9%) missing values [Missing]
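
The report does not say how the 54.7% imbalance figure is computed. One plausible reading, and an assumption here, is one minus the Shannon entropy of the class counts divided by the maximum possible entropy. The sketch below applies that formula to the 장르구분명 counts listed later in this report. The 33 "other" values are reconstructed from the reported totals: 30 values occur once, so the remaining 3 must occur twice.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Ten most common genre counts from the report, then the 33 remaining
# values (36 rows): 3 values occurring twice and 30 occurring once.
genre_counts = [160, 98, 6, 4, 3, 3, 3, 2, 2, 2] + [2] * 3 + [1] * 30

score = imbalance_score(genre_counts)
print(f"{score:.1%}")
```

Under this assumption the score is 0 for a perfectly uniform column and approaches 1 as a single class dominates.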

Reproduction

Analysis started: 2023-12-10 21:25:44.082178
Analysis finished: 2023-12-10 21:25:44.825445
Duration: 0.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

작품명
Text

Distinct: 314
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Length

Max length: 19
Median length: 13
Mean length: 6.0752351
Min length: 1
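
The mean length of each text variable can be cross-checked against the total character count reported under "Characters and Unicode": it should equal total characters divided by the number of non-missing rows. A small sketch with figures copied from this report (the dictionary layout is illustrative, and the first two text variables are taken to be 작품명 and 감독명, following the column order of the sample table):

```python
# (total characters, non-missing rows) per text variable, copied from this report.
text_variables = {
    "작품명": (1938, 319),
    "감독명": (1088, 319),
    "배급사명": (1832, 319 - 6),  # the 6 missing values are excluded
}

mean_lengths = {
    name: chars / rows for name, (chars, rows) in text_variables.items()
}

for name, mean_length in mean_lengths.items():
    print(f"{name}: {mean_length:.7f}")
```

The results agree with the reported mean lengths (6.0752351, 3.4106583 and 5.8530351).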

Characters and Unicode

Total characters: 1938
Distinct characters: 453
Distinct categories: 8
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
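
As an illustration of these character properties, the sketch below counts the Unicode general categories in one of the sample titles using Python's standard `unicodedata` module. This is a stand-in for whatever Unicode tables the profiler uses internally. The two-letter codes correspond to the category names in this report: `Lo` is Other Letter, `Lu` is Uppercase Letter, and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# "B급 며느리" is the 5th sample row of the first text variable.
counts = category_counts("B급 며느리")
print(counts)
```

Summing such counters over every row of a column would give a per-category table like the ones in this section.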

Unique

Unique: 310
Unique (%): 97.2%

Sample

1st row: 윤시내가 사라졌다
2nd row: 십개월의 미래
3rd row: 디어마이지니어스
4th row: 이장
5th row: B급 며느리

Value | Count | Frequency (%)
우리 | 4 | 0.7%
보희와 | 3 | 0.5%
(not recovered) | 3 | 0.5%
녹양 | 3 | 0.5%
나는 | 3 | 0.5%
(not recovered) | 3 | 0.5%
(not recovered) | 3 | 0.5%
비밀의 | 2 | 0.3%
마이 | 2 | 0.3%
위한 | 2 | 0.3%
Other values (516) | 555 | 95.2%

Most occurring characters

ValueCountFrequency (%)
264
 
13.6%
66
 
3.4%
53
 
2.7%
36
 
1.9%
28
 
1.4%
27
 
1.4%
25
 
1.3%
23
 
1.2%
21
 
1.1%
20
 
1.0%
Other values (443) 1375
70.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1625 | 83.8%
Space Separator | 264 | 13.6%
Other Punctuation | 28 | 1.4%
Decimal Number | 14 | 0.7%
Uppercase Letter | 3 | 0.2%
Connector Punctuation | 2 | 0.1%
Open Punctuation | 1 | 0.1%
Close Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
66
 
4.1%
53
 
3.3%
36
 
2.2%
28
 
1.7%
27
 
1.7%
25
 
1.5%
23
 
1.4%
21
 
1.3%
20
 
1.2%
19
 
1.2%
Other values (425) 1307
80.4%
Decimal Number
ValueCountFrequency (%)
1 4
28.6%
6 3
21.4%
0 2
14.3%
5 2
14.3%
2 1
 
7.1%
9 1
 
7.1%
8 1
 
7.1%
Other Punctuation
ValueCountFrequency (%)
, 16
57.1%
! 5
 
17.9%
: 5
 
17.9%
? 2
 
7.1%
Uppercase Letter
ValueCountFrequency (%)
N 1
33.3%
K 1
33.3%
B 1
33.3%
Space Separator
ValueCountFrequency (%)
264
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1624 | 83.8%
Common | 310 | 16.0%
Latin | 3 | 0.2%
Han | 1 | 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
66
 
4.1%
53
 
3.3%
36
 
2.2%
28
 
1.7%
27
 
1.7%
25
 
1.5%
23
 
1.4%
21
 
1.3%
20
 
1.2%
19
 
1.2%
Other values (424) 1306
80.4%
Common
ValueCountFrequency (%)
264
85.2%
, 16
 
5.2%
! 5
 
1.6%
: 5
 
1.6%
1 4
 
1.3%
6 3
 
1.0%
0 2
 
0.6%
? 2
 
0.6%
5 2
 
0.6%
_ 2
 
0.6%
Other values (5) 5
 
1.6%
Latin
ValueCountFrequency (%)
N 1
33.3%
K 1
33.3%
B 1
33.3%
Han
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1624 | 83.8%
ASCII | 313 | 16.2%
CJK Compat Ideographs | 1 | 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
264
84.3%
, 16
 
5.1%
! 5
 
1.6%
: 5
 
1.6%
1 4
 
1.3%
6 3
 
1.0%
0 2
 
0.6%
? 2
 
0.6%
5 2
 
0.6%
_ 2
 
0.6%
Other values (8) 8
 
2.6%
Hangul
ValueCountFrequency (%)
66
 
4.1%
53
 
3.3%
36
 
2.2%
28
 
1.7%
27
 
1.7%
25
 
1.5%
23
 
1.4%
21
 
1.3%
20
 
1.2%
19
 
1.2%
Other values (424) 1306
80.4%
CJK Compat Ideographs
ValueCountFrequency (%)
1
100.0%

감독명
Text

Distinct: 284
Distinct (%): 89.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Length

Max length: 17
Median length: 3
Mean length: 3.4106583
Min length: 2

Characters and Unicode

Total characters: 1088
Distinct characters: 159
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 254
Unique (%): 79.6%

Sample

1st row: 김진화
2nd row: 남궁선
3rd row: 구윤주
4th row: 정승오
5th row: 선호빈

Value | Count | Frequency (%)
고봉수 | 3 | 0.9%
안주영 | 3 | 0.9%
김보람 | 3 | 0.9%
백승기 | 3 | 0.9%
전규환 | 3 | 0.9%
최승연 | 2 | 0.6%
김나경 | 2 | 0.6%
장건재 | 2 | 0.6%
김태용 | 2 | 0.6%
황윤 | 2 | 0.6%
Other values (281) | 301 | 92.3%

Most occurring characters

ValueCountFrequency (%)
75
 
6.9%
56
 
5.1%
48
 
4.4%
33
 
3.0%
+ 26
 
2.4%
25
 
2.3%
23
 
2.1%
23
 
2.1%
22
 
2.0%
21
 
1.9%
Other values (149) 736
67.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1047 | 96.2%
Math Symbol | 26 | 2.4%
Other Punctuation | 8 | 0.7%
Space Separator | 7 | 0.6%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
75
 
7.2%
56
 
5.3%
48
 
4.6%
33
 
3.2%
25
 
2.4%
23
 
2.2%
23
 
2.2%
22
 
2.1%
21
 
2.0%
18
 
1.7%
Other values (146) 703
67.1%
Math Symbol
ValueCountFrequency (%)
+ 26
100.0%
Other Punctuation
ValueCountFrequency (%)
, 8
100.0%
Space Separator
ValueCountFrequency (%)
7
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1047 | 96.2%
Common | 41 | 3.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
75
 
7.2%
56
 
5.3%
48
 
4.6%
33
 
3.2%
25
 
2.4%
23
 
2.2%
23
 
2.2%
22
 
2.1%
21
 
2.0%
18
 
1.7%
Other values (146) 703
67.1%
Common
ValueCountFrequency (%)
+ 26
63.4%
, 8
 
19.5%
7
 
17.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1047 | 96.2%
ASCII | 41 | 3.8%

Most frequent character per block

Hangul
ValueCountFrequency (%)
75
 
7.2%
56
 
5.3%
48
 
4.6%
33
 
3.2%
25
 
2.4%
23
 
2.2%
23
 
2.2%
22
 
2.1%
21
 
2.0%
18
 
1.7%
Other values (146) 703
67.1%
ASCII
ValueCountFrequency (%)
+ 26
63.4%
, 8
 
19.5%
7
 
17.1%

장르구분명
Categorical

IMBALANCE 

Distinct: 43
Distinct (%): 13.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Value | Count
드라마 | 160
다큐멘터리 | 98
코미디 | 6
극영화 | 4
멜로 | 3
Other values (38) | 48

Length

Max length: 19
Median length: 3
Mean length: 3.9937304
Min length: 2

Unique

Unique: 30
Unique (%): 9.4%

Sample

1st row: 드라마
2nd row: 드라마
3rd row: 다큐멘터리
4th row: 드라마
5th row: 다큐멘터리

Common Values

Value | Count | Frequency (%)
드라마 | 160 | 50.2%
다큐멘터리 | 98 | 30.7%
코미디 | 6 | 1.9%
극영화 | 4 | 1.3%
멜로 | 3 | 0.9%
옴니버스 | 3 | 0.9%
액션 | 3 | 0.9%
실사극영화 | 2 | 0.6%
다큐 | 2 | 0.6%
스릴러 | 2 | 0.6%
Other values (33) | 36 | 11.3%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
드라마 | 164 | 49.8%
다큐멘터리 | 100 | 30.4%
코미디 | 7 | 2.1%
극영화 | 4 | 1.2%
멜로 | 4 | 1.2%
스릴러 | 4 | 1.2%
옴니버스 | 3 | 0.9%
액션 | 3 | 0.9%
멜로,로맨스 | 2 | 0.6%
애니메이션 | 2 | 0.6%
Other values (29) | 36 | 10.9%

상영시간(분)
Real number (ℝ)

Distinct: 84
Distinct (%): 26.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.366771
Minimum: 10
Maximum: 160
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 10
5-th percentile: 29.9
Q1: 83
Median: 95
Q3: 104
95-th percentile: 120
Maximum: 160
Range: 150
Interquartile range (IQR): 21
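
The range and IQR are simple differences of the quantiles listed above. The sketch below recomputes them and, purely as an illustration that is not part of the report, derives Tukey's 1.5 × IQR fences, a common rule of thumb for flagging unusually short or long running times.

```python
# Quantiles for 상영시간(분), copied from the statistics above.
minimum, q1, q3, maximum = 10, 83, 104, 160

value_range = maximum - minimum
iqr = q3 - q1

# Tukey's rule of thumb: values beyond 1.5 * IQR from the quartiles
# are candidate outliers (an illustration, not computed by the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(value_range, iqr, lower_fence, upper_fence)
```

With these figures the lower fence sits at 51.5 minutes, so the short films at the bottom of the distribution would fall outside it.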

Descriptive statistics

Standard deviation: 24.419814
Coefficient of variation (CV): 0.27023002
Kurtosis: 2.5848167
Mean: 90.366771
Median Absolute Deviation (MAD): 10
Skewness: -1.3360967
Sum: 28827
Variance: 596.32732
Monotonicity: Not monotonic
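
Several of these statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A short consistency check using the reported figures (variable names are illustrative):

```python
# Figures copied from the report for 상영시간(분).
n_observations = 319
total = 28827
std_dev = 24.419814

mean = total / n_observations
cv = std_dev / mean          # coefficient of variation
variance = std_dev ** 2      # variance is the squared standard deviation

print(f"mean={mean:.6f} cv={cv:.8f} variance={variance:.5f}")
```

All three derived values agree with the reported ones to the printed precision.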
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
100 | 18 | 5.6%
90 | 13 | 4.1%
95 | 12 | 3.8%
99 | 11 | 3.4%
104 | 11 | 3.4%
80 | 10 | 3.1%
98 | 10 | 3.1%
83 | 10 | 3.1%
93 | 10 | 3.1%
85 | 9 | 2.8%
Other values (74) | 205 | 64.3%

Minimum 10 values

Value | Count | Frequency (%)
10 | 2 | 0.6%
11 | 1 | 0.3%
12 | 1 | 0.3%
15 | 1 | 0.3%
18 | 2 | 0.6%
19 | 1 | 0.3%
20 | 1 | 0.3%
22 | 1 | 0.3%
23 | 1 | 0.3%
24 | 1 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
160 | 1 | 0.3%
144 | 1 | 0.3%
140 | 2 | 0.6%
139 | 1 | 0.3%
136 | 1 | 0.3%
131 | 1 | 0.3%
130 | 1 | 0.3%
126 | 1 | 0.3%
125 | 2 | 0.6%
123 | 2 | 0.6%

상영등급
Categorical

Distinct: 9
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Value | Count
12세이상관람가 | 70
전체관람가 | 63
15세이상관람가 | 52
12세 | 42
15세 | 40
Other values (4) | 52

Length

Max length: 8
Median length: 7
Mean length: 5.8557994
Min length: 2

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 12세이상관람가
2nd row: 12세이상관람가
3rd row: 전체관람가
4th row: 12세이상관람가
5th row: 12세이상관람가

Common Values

Value | Count | Frequency (%)
12세이상관람가 | 70 | 21.9%
전체관람가 | 63 | 19.7%
15세이상관람가 | 52 | 16.3%
12세 | 42 | 13.2%
15세 | 40 | 12.5%
청소년관람불가 | 29 | 9.1%
청소년 관람불가 | 13 | 4.1%
전체 | 9 | 2.8%
12세 예정 | 1 | 0.3%
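
Several of these labels look like spelling variants of the same rating (for example 12세 and 12세이상관람가, or 청소년 관람불가 and 청소년관람불가). The sketch below shows one way to merge them before analysis. The `CANONICAL` mapping is a hypothetical choice made for illustration, not something the dataset or the report defines, and 12세 예정 is deliberately left untouched because it may carry a different meaning.

```python
from collections import Counter

# Raw label counts copied from the Common Values table above.
raw_counts = {
    "12세이상관람가": 70,
    "전체관람가": 63,
    "15세이상관람가": 52,
    "12세": 42,
    "15세": 40,
    "청소년관람불가": 29,
    "청소년 관람불가": 13,
    "전체": 9,
    "12세 예정": 1,
}

# Hypothetical mapping from variant spellings to one canonical label.
CANONICAL = {
    "12세": "12세이상관람가",
    "15세": "15세이상관람가",
    "전체": "전체관람가",
    "청소년 관람불가": "청소년관람불가",
}

merged = Counter()
for label, count in raw_counts.items():
    # Labels without a mapping entry keep their original spelling.
    merged[CANONICAL.get(label, label)] += count

for label, count in merged.most_common():
    print(label, count)
```

With this mapping the nine raw labels collapse to five, and the row total of 319 is preserved.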

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
12세이상관람가 | 70 | 21.0%
전체관람가 | 63 | 18.9%
15세이상관람가 | 52 | 15.6%
12세 | 43 | 12.9%
15세 | 40 | 12.0%
청소년관람불가 | 29 | 8.7%
청소년 | 13 | 3.9%
관람불가 | 13 | 3.9%
전체 | 9 | 2.7%
예정 | 1 | 0.3%

개봉일정보
Date

MISSING 

Distinct: 234
Distinct (%): 79.9%
Missing: 26
Missing (%): 8.2%
Memory size: 2.6 KiB
Minimum: 2009-04-23 00:00:00
Maximum: 2023-11-22 00:00:00
Histogram with fixed size bins (bins=50)

배급사명
Text

MISSING 

Distinct: 115
Distinct (%): 36.7%
Missing: 6
Missing (%): 1.9%
Memory size: 2.6 KiB

Length

Max length: 20
Median length: 13
Mean length: 5.8530351
Min length: 2

Characters and Unicode

Total characters: 1832
Distinct characters: 192
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 78
Unique (%): 24.9%

Sample

1st row: 블루라벨픽쳐스
2nd row: 그린나래미디어
3rd row: 필름다빈
4th row: 인디스토리
5th row: 에스와이코마드

Value | Count | Frequency (%)
인디스토리 | 24 | 6.5%
필름다빈 | 19 | 5.1%
㈜인디스토리 | 17 | 4.6%
진진 | 17 | 4.6%
엣나인필름 | 14 | 3.8%
㈜시네마달 | 12 | 3.2%
시네마달 | 12 | 3.2%
무브먼트 | 12 | 3.2%
상상마당 | 12 | 3.2%
kt&g | 11 | 3.0%
Other values (116) | 220 | 59.5%
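
The same distributor appears both with and without the company marker (㈜인디스토리 and 인디스토리, ㈜시네마달 and 시네마달), which inflates the distinct count. Below is a minimal normalization sketch, assuming the marker carries no information needed for analysis. The helper name `normalize_distributor` is hypothetical. It relies on Unicode NFKC normalization, which expands the single symbol ㈜ (U+321C) into the three characters (주).

```python
import unicodedata


def normalize_distributor(name):
    """Drop the '㈜' / '(주)' company marker and tidy whitespace."""
    # NFKC turns the one-character symbol '㈜' into '(', '주', ')'.
    expanded = unicodedata.normalize("NFKC", name)
    without_marker = expanded.replace("(주)", " ")
    # Collapse any runs of whitespace left behind by the removal.
    return " ".join(without_marker.split())


for raw in ["㈜인디스토리", "(주)시네마달", "kth, 판씨네마㈜"]:
    print(raw, "->", normalize_distributor(raw))
```

NFKC also folds other compatibility characters, such as full-width Latin letters, so it is worth checking that this side effect is acceptable for the column.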

Most occurring characters

ValueCountFrequency (%)
107
 
5.8%
88
 
4.8%
74
 
4.0%
73
 
4.0%
67
 
3.7%
57
 
3.1%
52
 
2.8%
52
 
2.8%
51
 
2.8%
50
 
2.7%
Other values (182) 1161
63.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1566 | 85.5%
Uppercase Letter | 83 | 4.5%
Other Symbol | 67 | 3.7%
Space Separator | 57 | 3.1%
Close Punctuation | 18 | 1.0%
Open Punctuation | 17 | 0.9%
Other Punctuation | 17 | 0.9%
Decimal Number | 4 | 0.2%
Lowercase Letter | 3 | 0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
107
 
6.8%
88
 
5.6%
74
 
4.7%
73
 
4.7%
52
 
3.3%
52
 
3.3%
51
 
3.3%
50
 
3.2%
48
 
3.1%
47
 
3.0%
Other values (157) 924
59.0%
Uppercase Letter
ValueCountFrequency (%)
G 23
27.7%
C 13
15.7%
T 13
15.7%
K 13
15.7%
V 10
12.0%
J 3
 
3.6%
E 2
 
2.4%
N 2
 
2.4%
M 1
 
1.2%
W 1
 
1.2%
Other values (2) 2
 
2.4%
Decimal Number
ValueCountFrequency (%)
2 1
25.0%
6 1
25.0%
4 1
25.0%
0 1
25.0%
Lowercase Letter
ValueCountFrequency (%)
k 1
33.3%
h 1
33.3%
t 1
33.3%
Other Punctuation
ValueCountFrequency (%)
& 14
82.4%
, 3
 
17.6%
Other Symbol
ValueCountFrequency (%)
67
100.0%
Space Separator
ValueCountFrequency (%)
57
100.0%
Close Punctuation
ValueCountFrequency (%)
) 18
100.0%
Open Punctuation
ValueCountFrequency (%)
( 17
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1633 | 89.1%
Common | 113 | 6.2%
Latin | 86 | 4.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
107
 
6.6%
88
 
5.4%
74
 
4.5%
73
 
4.5%
67
 
4.1%
52
 
3.2%
52
 
3.2%
51
 
3.1%
50
 
3.1%
48
 
2.9%
Other values (158) 971
59.5%
Latin
ValueCountFrequency (%)
G 23
26.7%
C 13
15.1%
T 13
15.1%
K 13
15.1%
V 10
11.6%
J 3
 
3.5%
E 2
 
2.3%
N 2
 
2.3%
k 1
 
1.2%
M 1
 
1.2%
Other values (5) 5
 
5.8%
Common
ValueCountFrequency (%)
57
50.4%
) 18
 
15.9%
( 17
 
15.0%
& 14
 
12.4%
, 3
 
2.7%
2 1
 
0.9%
6 1
 
0.9%
4 1
 
0.9%
0 1
 
0.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1566 | 85.5%
ASCII | 199 | 10.9%
None | 67 | 3.7%

Most frequent character per block

Hangul
ValueCountFrequency (%)
107
 
6.8%
88
 
5.6%
74
 
4.7%
73
 
4.7%
52
 
3.3%
52
 
3.3%
51
 
3.3%
50
 
3.2%
48
 
3.1%
47
 
3.0%
Other values (157) 924
59.0%
None
ValueCountFrequency (%)
67
100.0%
ASCII
ValueCountFrequency (%)
57
28.6%
G 23
11.6%
) 18
 
9.0%
( 17
 
8.5%
& 14
 
7.0%
C 13
 
6.5%
T 13
 
6.5%
K 13
 
6.5%
V 10
 
5.0%
J 3
 
1.5%
Other values (14) 18
 
9.0%

Interactions


Correlations

Variable | 장르구분명 | 상영시간(분) | 상영등급
장르구분명 | 1.000 | 0.698 | 0.697
상영시간(분) | 0.698 | 1.000 | 0.284
상영등급 | 0.697 | 0.284 | 1.000

Variable | 상영등급 | 장르구분명
상영등급 | 1.000 | 0.317
장르구분명 | 0.317 | 1.000

Variable | 상영시간(분) | 장르구분명 | 상영등급
상영시간(분) | 1.000 | 0.311 | 0.140
장르구분명 | 0.311 | 1.000 | 0.317
상영등급 | 0.140 | 0.317 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

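First rows

Index | 작품명 | 감독명 | 장르구분명 | 상영시간(분) | 상영등급 | 개봉일정보 | 배급사명
0 | 윤시내가 사라졌다 | 김진화 | 드라마 | 107 | 12세이상관람가 | 2022-06-28 | 블루라벨픽쳐스
1 | 십개월의 미래 | 남궁선 | 드라마 | 96 | 12세이상관람가 | 2021-10-14 | 그린나래미디어
2 | 디어마이지니어스 | 구윤주 | 다큐멘터리 | 80 | 전체관람가 | 2020-10-22 | 필름다빈
3 | 이장 | 정승오 | 드라마 | 94 | 12세이상관람가 | 2020-03-25 | 인디스토리
4 | B급 며느리 | 선호빈 | 다큐멘터리 | 80 | 12세이상관람가 | 2018-01-17 | 에스와이코마드
5 | 성적표의 김민영 | 이재은, 임지선 | 드라마 | 97 | 전체관람가 | 2022-09-08 | 엣나인필름
6 | 선데이리그 | 이성일 | 코미디 | 83 | 전체관람가 | 2022-10-05 | 아이엠
7 | 낫아웃 | 이정곤 | 드라마 | 107 | 15세이상관람가 | 2021-06-03 | kth, 판씨네마㈜
8 | 가을이 여름에게 | 원은선 | 드라마 | 33 | 12세이상관람가 | <NA> | 센트럴파크
9 | 겹겹이 여름 | 백시원 | 드라마 | 34 | 12세이상관람가 | <NA> | 센트럴파크

Last rows

Index | 작품명 | 감독명 | 장르구분명 | 상영시간(분) | 상영등급 | 개봉일정보 | 배급사명
309 | 순자와 이슬이 | 김윤지 | 드라마 | 30 | 12세이상관람가 | <NA> | 호우주의보
310 | 자유연기 | 김도영 | 드라마 | 30 | 12세이상관람가 | <NA> | 센트럴파크
311 | 너에게 가는 길 | 변규리 | 다큐멘터리 | 93 | 12세이상관람가 | 2021-11-17 | 엣나인필름
312 | 관계의 가나다에 있는 우리는 | 이인의 | 드라마 | 100 | 전체관람가 | 2021-01-28 | 시네마달
313 | 왕자가 된 소녀들 | 김혜정 | 다큐멘터리 | 79 | 전체관람가 | 2013-04-18 | 영희야 놀자
314 | 미싱타는 여자들 | 이혁래 | 다큐멘터리 | 108 | 전체관람가 | 2022-01-20 | 영화사 진진
315 | 태어나길 잘했어 | 최진영 | 드라마 | 100 | 12세이상관람가 | 2022-04-14 | 그린나래미디어
316 | 세자매 | 이승원 | 드라마 | 115 | 15세이상관람가 | 2021-01-27 | 리틀빅픽쳐스
317 | 낮에는 덥고 밤에는 춥고 | 박송열 | 드라마 | 90 | 12세이상관람가 | 2022-10-27 | 필름다빈
318 | 우스운게 딱! 좋아! | 김현, 정혜연 | 드라마 | 101 | 15세이상관람가 | 2022-06-23 | 필름다빈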