Overview

Dataset statistics

Number of variables: 7
Number of observations: 300
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 17.7 KiB
Average record size in memory: 60.4 B

Variable types

Numeric: 2
Categorical: 4
Text: 1

Dataset

Description: Sample data
Author: 마크로밀엠브레인 (Macromill Embrain)
URL: https://www.findatamall.or.kr/market/dataProdDetail?gdsSn=4552&gdsSeCd=GENERAL&gdsVer=1

Alerts

High correlation: package_name is highly overall correlated with APP_NAME_N and 2 other fields
High correlation: APP_NAME_N is highly overall correlated with package_name and 2 other fields
High correlation: NO is highly overall correlated with gender
High correlation: gender is highly overall correlated with NO and 2 other fields
High correlation: age_g is highly overall correlated with package_name and 1 other field
Imbalance: age_g is highly imbalanced (67.3%)
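
The 67.3% imbalance figure for age_g is consistent with one minus the normalized Shannon entropy of the category counts this report lists for that variable (282 and 18). The sketch below rests on that assumption and is not a statement of the profiling library's internals; the function name imbalance_score is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        # Undefined for fewer than two classes; this sketch returns 0.0 by convention.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# age_g counts taken from this report: 282 rows with value 10, 18 rows with value 60.
score = imbalance_score([282, 18])
print(f"{score:.1%}")
```

A perfectly balanced split such as [150, 150] scores 0.0 under this definition, so a score above 0.67 indicates that one category carries almost all of the rows.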

Reproduction

Analysis started: 2024-03-03 16:59:56.497325
Analysis finished: 2024-03-03 16:59:58.767182
Duration: 2.27 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

NO
Real number (ℝ)

HIGH CORRELATION 

Distinct: 100
Distinct (%): 33.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5.95
Q1: 25.75
median: 50.5
Q3: 75.25
95-th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5
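
The quantiles above can be reproduced by hand. The frequency tables for NO show each integer from 1 to 100 occurring three times, so the column can be rebuilt without the source file. The sketch below assumes linear interpolation between neighbouring sorted values, which matches the reported figures; the helper name percentile is hypothetical.

```python
def percentile(values, p):
    """Percentile (0 <= p <= 100) using linear interpolation between sorted neighbours."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile() needs at least one value")
    pos = (len(data) - 1) * p / 100.0   # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac

# Rebuild NO from the report: each integer 1..100 appears three times (300 rows).
no_column = [v for v in range(1, 101) for _ in range(3)]

q1 = percentile(no_column, 25)
q3 = percentile(no_column, 75)
print("5th:", percentile(no_column, 5))
print("Q1:", q1, "median:", percentile(no_column, 50), "Q3:", q3)
print("IQR:", q3 - q1)
```

For example, the 5th percentile sits at fractional index 14.95 of the sorted column, between the values 5 and 6, which gives 5.95 up to floating-point rounding.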

Descriptive statistics

Standard deviation: 28.914301
Coefficient of variation (CV): 0.57256041
Kurtosis: -1.200217
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 15150
Variance: 836.03679
Monotonicity: Not monotonic
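
Several of the descriptive statistics for NO can be checked with the Python standard library. The sketch below assumes the sample (n - 1) convention for variance and standard deviation, which agrees with the reported values, and rebuilds the column from the frequency tables (each integer 1..100 three times). Kurtosis is left out because its bias correction is not stated in the report.

```python
import statistics

# Rebuild NO from the report: each integer 1..100 appears three times (300 rows).
no_column = [v for v in range(1, 101) for _ in range(3)]

mean = statistics.fmean(no_column)
variance = statistics.variance(no_column)   # sample variance, n - 1 denominator
std = statistics.stdev(no_column)           # square root of the sample variance
cv = std / mean                             # coefficient of variation
median = statistics.median(no_column)
# Median absolute deviation: median of the absolute distances from the median.
mad = statistics.median(abs(v - median) for v in no_column)

print(f"mean={mean} sum={sum(no_column)} variance={variance:.5f}")
print(f"std={std:.6f} cv={cv:.8f} mad={mad}")
```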
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 3 | 1.0%
65 | 3 | 1.0%
75 | 3 | 1.0%
74 | 3 | 1.0%
73 | 3 | 1.0%
72 | 3 | 1.0%
71 | 3 | 1.0%
70 | 3 | 1.0%
69 | 3 | 1.0%
68 | 3 | 1.0%
Other values (90) | 270 | 90.0%

Minimum 10 values

Value | Count | Frequency (%)
1 | 3 | 1.0%
2 | 3 | 1.0%
3 | 3 | 1.0%
4 | 3 | 1.0%
5 | 3 | 1.0%
6 | 3 | 1.0%
7 | 3 | 1.0%
8 | 3 | 1.0%
9 | 3 | 1.0%
10 | 3 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
100 | 3 | 1.0%
99 | 3 | 1.0%
98 | 3 | 1.0%
97 | 3 | 1.0%
96 | 3 | 1.0%
95 | 3 | 1.0%
94 | 3 | 1.0%
93 | 3 | 1.0%
92 | 3 | 1.0%
91 | 3 | 1.0%

package_name
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
com.interpark.app.ticket | 77
com.sec.android.app.music | 36
com.clsk.media | 32
com.yes24.neb | 24
com.yes24.commerce | 23
Other values (12) | 108

Length

Max length: 31
Median length: 24
Mean length: 19.993333
Min length: 13

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: kr.co.ticketlink.cne
2nd row: kr.co.ticketlink.cne
3rd row: kr.co.ticketlink.cne
4th row: kr.co.ticketlink.cne
5th row: kr.co.ticketlink.cne

Common Values

Value | Count | Frequency (%)
com.interpark.app.ticket | 77 | 25.7%
com.sec.android.app.music | 36 | 12.0%
com.clsk.media | 32 | 10.7%
com.yes24.neb | 24 | 8.0%
com.yes24.commerce | 23 | 7.7%
com.nhn.android.nbooks | 20 | 6.7%
skplanet.musicmate | 17 | 5.7%
com.iloen.melon | 15 | 5.0%
com.shazam.android | 11 | 3.7%
com.ktmusic.geniemusic | 11 | 3.7%
Other values (7) | 34 | 11.3%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
com.interpark.app.ticket | 77 | 25.7%
com.sec.android.app.music | 36 | 12.0%
com.clsk.media | 32 | 10.7%
com.yes24.neb | 24 | 8.0%
com.yes24.commerce | 23 | 7.7%
com.nhn.android.nbooks | 20 | 6.7%
skplanet.musicmate | 17 | 5.7%
com.iloen.melon | 15 | 5.0%
kr.co.ticketlink.cne | 11 | 3.7%
com.shazam.android | 11 | 3.7%
Other values (7) | 34 | 11.3%

APP_NAME_N
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
인터파크 티켓 | 77
Samsung Music 삼성 뮤직 | 36
바이블25 성경 찬송 | 32
예스24 NEB | 24
예스24 도서 서점 | 23
Other values (12) | 108

Length

Max length: 19
Median length: 14
Mean length: 9.8666667
Min length: 4

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 티켓링크
2nd row: 티켓링크
3rd row: 티켓링크
4th row: 티켓링크
5th row: 티켓링크

Common Values

Value | Count | Frequency (%)
인터파크 티켓 | 77 | 25.7%
Samsung Music 삼성 뮤직 | 36 | 12.0%
바이블25 성경 찬송 | 32 | 10.7%
예스24 NEB | 24 | 8.0%
예스24 도서 서점 | 23 | 7.7%
SERIES 네이버 시리즈 | 20 | 6.7%
FLO 플로 | 17 | 5.7%
멜론 Melon | 15 | 5.0%
Shazam | 11 | 3.7%
지니뮤직 genie | 11 | 3.7%
Other values (7) | 34 | 11.3%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
인터파크 | 77 | 10.5%
티켓 | 77 | 10.5%
예스24 | 54 | 7.4%
뮤직 | 36 | 4.9%
music | 36 | 4.9%
samsung | 36 | 4.9%
삼성 | 36 | 4.9%
바이블25 | 32 | 4.4%
성경 | 32 | 4.4%
찬송 | 32 | 4.4%
Other values (26) | 286 | 39.0%

total_used_time
Real number (ℝ)

Distinct: 295
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 493441.42
Minimum: 117
Maximum: 8974585
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 117
5-th percentile: 5350.95
Q1: 44013.75
median: 142309.5
Q3: 439373.5
95-th percentile: 1585793.4
Maximum: 8974585
Range: 8974468
Interquartile range (IQR): 395359.75

Descriptive statistics

Standard deviation: 1169733.7
Coefficient of variation (CV): 2.3705625
Kurtosis: 25.494522
Mean: 493441.42
Median Absolute Deviation (MAD): 115753.5
Skewness: 4.7884017
Sum: 1.4803242 × 10^8
Variance: 1.368277 × 10^12
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
13049 | 2 | 0.7%
6610390 | 2 | 0.7%
3056 | 2 | 0.7%
259107 | 2 | 0.7%
5505491 | 2 | 0.7%
50600 | 1 | 0.3%
1715 | 1 | 0.3%
1285732 | 1 | 0.3%
26757 | 1 | 0.3%
587455 | 1 | 0.3%
Other values (285) | 285 | 95.0%

Minimum 10 values

Value | Count | Frequency (%)
117 | 1 | 0.3%
471 | 1 | 0.3%
500 | 1 | 0.3%
811 | 1 | 0.3%
1173 | 1 | 0.3%
1715 | 1 | 0.3%
1812 | 1 | 0.3%
1926 | 1 | 0.3%
2068 | 1 | 0.3%
3056 | 2 | 0.7%

Maximum 10 values

Value | Count | Frequency (%)
8974585 | 1 | 0.3%
8726295 | 1 | 0.3%
6610390 | 2 | 0.7%
5760058 | 1 | 0.3%
5505491 | 2 | 0.7%
4971541 | 1 | 0.3%
4927563 | 1 | 0.3%
4184015 | 1 | 0.3%
2846597 | 1 | 0.3%
2512582 | 1 | 0.3%

last_used_time
Text

Distinct: 298
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 2100
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
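
As an illustration of the character properties mentioned above, Python's standard unicodedata module exposes the general category of each code point. The sketch below counts categories over the five sample values this report lists for the variable; the helper name category_counts is hypothetical, and the profiling tool may compute its figures differently.

```python
import unicodedata
from collections import Counter

def category_counts(strings):
    """Count characters by Unicode general category code (e.g. 'Nd', 'Po')."""
    counts = Counter()
    for s in strings:
        counts.update(unicodedata.category(ch) for ch in s)
    return counts

# The five sample values the report lists for this text variable.
samples = ["29:42.4", "24:24.5", "29:26.5", "20:00.5", "44:12.4"]
counts = category_counts(samples)
print(counts)
```

Digits fall under "Nd" (Decimal Number) and both ":" and "." fall under "Po" (Other Punctuation), which corresponds to the two distinct categories reported. Each 7-character value holds five digits and two punctuation marks, in line with the 71.4% and 28.6% shares listed in the category table.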

Unique

Unique: 296
Unique (%): 98.7%

Sample

1st row: 29:42.4
2nd row: 24:24.5
3rd row: 29:26.5
4th row: 20:00.5
5th row: 44:12.4
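
Every value of this variable is seven characters long and resembles minutes, seconds and tenths of a second (MM:SS.s), possibly the tail of a longer timestamp. That reading is an assumption, not something the report states. Under that assumption, a minimal sketch for converting the strings to seconds could look like this; the function name mmss_to_seconds is hypothetical.

```python
import re

# Assumed layout: two-digit minutes, two-digit seconds, one decimal digit.
_MMSS = re.compile(r"(\d{2}):(\d{2})\.(\d)")

def mmss_to_seconds(text):
    """Convert an 'MM:SS.s' string to seconds; raise ValueError if it does not fit."""
    match = _MMSS.fullmatch(text)
    if match is None:
        raise ValueError(f"not in MM:SS.s form: {text!r}")
    minutes, seconds, tenths = (int(g) for g in match.groups())
    if seconds >= 60:
        raise ValueError(f"seconds out of range: {text!r}")
    return minutes * 60 + seconds + tenths / 10

# Sample values from the report.
for raw in ["29:42.4", "24:24.5", "20:00.5"]:
    print(raw, "->", mmss_to_seconds(raw))
```

Because the hour and date parts are absent, the converted numbers only order rows within an hour and should not be treated as full timestamps.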

Value | Count | Frequency (%)
23:54.5 | 2 | 0.7%
52:35.7 | 2 | 0.7%
59:13.4 | 1 | 0.3%
55:19.1 | 1 | 0.3%
29:42.4 | 1 | 0.3%
29:00.2 | 1 | 0.3%
35:14.8 | 1 | 0.3%
35:04.9 | 1 | 0.3%
14:14.5 | 1 | 0.3%
08:29.5 | 1 | 0.3%
Other values (288) | 288 | 96.0%

Most occurring characters

Value | Count | Frequency (%)
: | 300 | 14.3%
. | 300 | 14.3%
5 | 213 | 10.1%
2 | 195 | 9.3%
0 | 189 | 9.0%
1 | 185 | 8.8%
4 | 183 | 8.7%
3 | 181 | 8.6%
9 | 98 | 4.7%
6 | 87 | 4.1%
Other values (2) | 169 | 8.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1500 | 71.4%
Other Punctuation | 600 | 28.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
5 | 213 | 14.2%
2 | 195 | 13.0%
0 | 189 | 12.6%
1 | 185 | 12.3%
4 | 183 | 12.2%
3 | 181 | 12.1%
9 | 98 | 6.5%
6 | 87 | 5.8%
8 | 86 | 5.7%
7 | 83 | 5.5%

Other Punctuation
Value | Count | Frequency (%)
: | 300 | 50.0%
. | 300 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2100 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
: | 300 | 14.3%
. | 300 | 14.3%
5 | 213 | 10.1%
2 | 195 | 9.3%
0 | 189 | 9.0%
1 | 185 | 8.8%
4 | 183 | 8.7%
3 | 181 | 8.6%
9 | 98 | 4.7%
6 | 87 | 4.1%
Other values (2) | 169 | 8.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2100 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
: | 300 | 14.3%
. | 300 | 14.3%
5 | 213 | 10.1%
2 | 195 | 9.3%
0 | 189 | 9.0%
1 | 185 | 8.8%
4 | 183 | 8.7%
3 | 181 | 8.6%
9 | 98 | 4.7%
6 | 87 | 4.1%
Other values (2) | 169 | 8.0%

gender
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
2 | 218
1 | 82

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 218 | 72.7%
1 | 82 | 27.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
2 | 218 | 72.7%
1 | 82 | 27.3%

age_g
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
10 | 282
60 | 18

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 10
2nd row: 10
3rd row: 10
4th row: 10
5th row: 10

Common Values

Value | Count | Frequency (%)
10 | 282 | 94.0%
60 | 18 | 6.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
10 | 282 | 94.0%
60 | 18 | 6.0%

Interactions

[Figures: four interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap]

 | NO | package_name | APP_NAME_N | total_used_time | gender | age_g
NO | 1.000 | 0.731 | 0.731 | 0.062 | 0.654 | 0.364
package_name | 0.731 | 1.000 | 1.000 | 0.192 | 0.914 | 0.635
APP_NAME_N | 0.731 | 1.000 | 1.000 | 0.192 | 0.914 | 0.635
total_used_time | 0.062 | 0.192 | 0.192 | 1.000 | 0.198 | 0.000
gender | 0.654 | 0.914 | 0.914 | 0.198 | 1.000 | 0.198
age_g | 0.364 | 0.635 | 0.635 | 0.000 | 0.198 | 1.000

[Figure: correlation heatmap]

 | package_name | age_g | APP_NAME_N | gender
package_name | 1.000 | 0.563 | 1.000 | 0.863
age_g | 0.563 | 1.000 | 0.563 | 0.127
APP_NAME_N | 1.000 | 0.563 | 1.000 | 0.863
gender | 0.863 | 0.127 | 0.863 | 1.000

[Figure: correlation heatmap]

 | NO | total_used_time | package_name | APP_NAME_N | gender | age_g
NO | 1.000 | 0.079 | 0.387 | 0.387 | 0.501 | 0.275
total_used_time | 0.079 | 1.000 | 0.075 | 0.075 | 0.195 | 0.000
package_name | 0.387 | 0.075 | 1.000 | 1.000 | 0.863 | 0.563
APP_NAME_N | 0.387 | 0.075 | 1.000 | 1.000 | 0.863 | 0.563
gender | 0.501 | 0.195 | 0.863 | 0.863 | 1.000 | 0.127
age_g | 0.275 | 0.000 | 0.563 | 0.563 | 0.127 | 1.000
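
The report does not state which association measure each matrix uses. As an illustration only, Cramér's V is one common measure for pairs of categorical variables, and it shows why package_name and APP_NAME_N score 1.000: each package name maps to exactly one app name. The sketch below implements the uncorrected form from scratch; the function name cramers_v is hypothetical, and the toy rows reuse name pairs that appear in the Sample section.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of category labels."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association information
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n   # count expected under independence
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Toy rows (not the full dataset): each package name pairs with one app name.
packages = ["kr.co.ticketlink.cne", "com.interpark.app.ticket", "com.shazam.android"] * 2
apps = ["티켓링크", "인터파크 티켓", "Shazam"] * 2
print(round(cramers_v(packages, apps), 3))
```

A value of 1 means one variable fully determines the other, and 0 means the observed counts match what independence would predict.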

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First 10 rows

(index) | NO | package_name | APP_NAME_N | total_used_time | last_used_time | gender | age_g
0 | 1 | kr.co.ticketlink.cne | 티켓링크 | 1559861 | 29:42.4 | 2 | 10
1 | 2 | kr.co.ticketlink.cne | 티켓링크 | 44196 | 24:24.5 | 2 | 10
2 | 3 | kr.co.ticketlink.cne | 티켓링크 | 77523 | 29:26.5 | 2 | 10
3 | 4 | kr.co.ticketlink.cne | 티켓링크 | 279625 | 20:00.5 | 2 | 10
4 | 5 | kr.co.ticketlink.cne | 티켓링크 | 43121 | 44:12.4 | 2 | 10
5 | 6 | kr.co.ticketlink.cne | 티켓링크 | 87828 | 14:21.0 | 2 | 10
6 | 7 | kr.co.ticketlink.cne | 티켓링크 | 764793 | 01:56.0 | 2 | 10
7 | 8 | kr.co.ticketlink.cne | 티켓링크 | 68783 | 41:18.8 | 2 | 10
8 | 9 | kr.co.ticketlink.cne | 티켓링크 | 439831 | 36:41.2 | 2 | 10
9 | 10 | com.interpark.app.ticket | 인터파크 티켓 | 13020 | 12:57.7 | 2 | 10

Last 10 rows

(index) | NO | package_name | APP_NAME_N | total_used_time | last_used_time | gender | age_g
290 | 91 | com.ktmusic.geniemusic | 지니뮤직 genie | 19735 | 35:18.6 | 2 | 10
291 | 92 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 674486 | 29:13.9 | 2 | 10
292 | 93 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 1280905 | 34:51.9 | 2 | 10
293 | 94 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 892251 | 34:03.4 | 2 | 10
294 | 95 | com.shazam.android | Shazam | 17538 | 11:57.3 | 2 | 10
295 | 96 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 5824 | 44:15.2 | 2 | 60
296 | 97 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 489882 | 00:51.4 | 2 | 60
297 | 98 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 162691 | 31:13.3 | 2 | 60
298 | 99 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 32785 | 39:19.5 | 2 | 60
299 | 100 | com.sec.android.app.music | Samsung Music 삼성 뮤직 | 426017 | 41:56.9 | 2 | 60