Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B

Variable types

Numeric: 2
Categorical: 2
Text: 2
DateTime: 1

Dataset

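Description: Data on tourism review posts for Miryang-si, Gyeongsangnam-do. It provides the tourism product number (관광상품번호), collection source category (수집원분류), tourism product category (관광상품분류), tourism product name (관광상품명), and review post URL (후기글주소).
Author: Miryang-si, Gyeongsangnam-do (경상남도 밀양시)
URL: https://www.data.go.kr/data/15111090/fileData.do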

Alerts

관광상품번호(NameID) is highly overall correlated with 관광상품분류 (High correlation)
관광상품분류 is highly overall correlated with 관광상품번호(NameID) (High correlation)
수집원분류 is highly imbalanced (69.4%) (Imbalance)
후기글번호(ReviewID) has unique values (Unique)
후기글주소 has unique values (Unique)
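
The 69.4% imbalance figure for 수집원분류 can be reproduced from the category counts listed in that variable's section. The sketch below assumes the score is one minus the normalized Shannon entropy of the class shares. The report does not state this formula, so treat it as an assumption that happens to match the displayed value; the function name `imbalance_score` is illustrative.

```python
import math

# Category counts for 수집원분류, copied from this report.
counts = {"naver": 8622, "youtube": 985, "tistory": 372, "post": 13, "brunch": 8}

def imbalance_score(class_counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    Assumed formula, chosen because it reproduces the value in the Alerts list.
    """
    n_classes = len(class_counts)
    if n_classes < 2:
        return 0.0  # imbalance is not meaningful for a single class
    total = sum(class_counts.values())
    shares = [c / total for c in class_counts.values() if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1.0 - entropy / math.log2(n_classes)

score = imbalance_score(counts)
print(f"{score:.1%}")  # 69.4%
```

A score this high mostly reflects that naver alone accounts for 86.2% of the rows.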

Reproduction

Analysis started: 2024-04-21 01:19:11.886441
Analysis finished: 2024-04-21 01:19:14.566396
Duration: 2.68 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
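
A report like this one could be regenerated roughly as follows. This is a sketch rather than the configuration actually used: the CSV path, the `cp949` encoding, and the report title are assumptions, and the original `config.json` is not reproduced here. `ProfileReport` and its `to_file` method are the usual ydata-profiling entry points.

```python
def build_report(csv_path, html_path="report.html", encoding="cp949"):
    """Profile a CSV file and write the result as an HTML report.

    csv_path and encoding are placeholders; cp949 is only a guess that is
    common for Korean public-data CSV files.
    """
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # 게시일 is the date column listed in the Sample section of this report.
    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=["게시일"])
    report = ProfileReport(df, title="Miryang tourism review posts")
    report.to_file(html_path)
    return html_path

# Hypothetical usage (the file name is not given in the report):
# build_report("miryang_reviews.csv")
```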

Variables

후기글번호(ReviewID)
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6005463.8
Minimum: 6000001
Maximum: 6010951
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 6000001
5th percentile: 6000553
Q1: 6002732.8
Median: 6005463
Q3: 6008201.2
95th percentile: 6010399
Maximum: 6010951
Range: 10950
Interquartile range (IQR): 5468.5

Descriptive statistics

Standard deviation: 3158.8466
Coefficient of variation (CV): 0.00052599545
Kurtosis: -1.1996222
Mean: 6005463.8
Median Absolute Deviation (MAD): 2734.5
Skewness: 0.0049230697
Sum: 6.0054638 × 10^10
Variance: 9978311.9
Monotonicity: Not monotonic
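
Several of these figures follow from others by simple arithmetic, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation, and variance from the values displayed above; the variable names are illustrative. It also compares the standard deviation with that of a uniform distribution over the same range, because a skewness near 0 and a kurtosis near -1.2 are what a roughly uniform spread of IDs would give (a uniform distribution has excess kurtosis of -1.2).

```python
import math

# Summary values for 후기글번호(ReviewID), copied from this report.
minimum, maximum = 6000001, 6010951
q1, q3 = 6002732.8, 6008201.2   # displayed quartiles are rounded to one decimal
mean, std = 6005463.8, 3158.8466

value_range = maximum - minimum            # reported as 10950
iqr = q3 - q1                              # about 5468.4 here; the report shows 5468.5
cv = std / mean                            # coefficient of variation
variance = std ** 2
uniform_std = value_range / math.sqrt(12)  # std of a uniform distribution on this range

print(value_range, round(iqr, 1), f"{cv:.8g}", round(variance, 1), round(uniform_std, 1))
```

The small IQR difference comes from rounding in the displayed quartiles, not from an inconsistency in the report.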
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
6007560 | 1 | < 0.1%
6004536 | 1 | < 0.1%
6006830 | 1 | < 0.1%
6010460 | 1 | < 0.1%
6008383 | 1 | < 0.1%
6001521 | 1 | < 0.1%
6000675 | 1 | < 0.1%
6005471 | 1 | < 0.1%
6002447 | 1 | < 0.1%
6001210 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
6000001 | 1 | < 0.1%
6000002 | 1 | < 0.1%
6000003 | 1 | < 0.1%
6000004 | 1 | < 0.1%
6000005 | 1 | < 0.1%
6000006 | 1 | < 0.1%
6000007 | 1 | < 0.1%
6000008 | 1 | < 0.1%
6000009 | 1 | < 0.1%
6000010 | 1 | < 0.1%

Value | Count | Frequency (%)
6010951 | 1 | < 0.1%
6010950 | 1 | < 0.1%
6010949 | 1 | < 0.1%
6010948 | 1 | < 0.1%
6010946 | 1 | < 0.1%
6010945 | 1 | < 0.1%
6010943 | 1 | < 0.1%
6010942 | 1 | < 0.1%
6010941 | 1 | < 0.1%
6010940 | 1 | < 0.1%

관광상품번호(NameID)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 841
Distinct (%): 8.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2810905
Minimum: 1000001
Maximum: 4001470
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1000001
5th percentile: 1000012
Q1: 1000096
Median: 3000136
Q3: 4000432
95th percentile: 4001196.1
Maximum: 4001470
Range: 3001469
Interquartile range (IQR): 3000336

Descriptive statistics

Standard deviation: 1218293.4
Coefficient of variation (CV): 0.43341677
Kurtosis: -1.2828298
Mean: 2810905
Median Absolute Deviation (MAD): 1000409
Skewness: -0.56116287
Sum: 2.810905 × 10^10
Variance: 1.4842387 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1000071 | 370 | 3.7%
1000009 | 236 | 2.4%
4000728 | 175 | 1.8%
1000063 | 131 | 1.3%
1000003 | 120 | 1.2%
4000353 | 102 | 1.0%
3000144 | 96 | 1.0%
3000218 | 94 | 0.9%
3000083 | 90 | 0.9%
4000605 | 90 | 0.9%
Other values (831) | 8496 | 85.0%
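
The "Other values" row is what remains after the ten most frequent values are set aside. As a small check on the table above, this sketch recomputes that row from the top-10 counts, the total row count, and the distinct count. All numbers are copied from this report; the variable names are illustrative.

```python
# Top-10 counts for 관광상품번호(NameID), copied from the table above.
top10 = {
    1000071: 370, 1000009: 236, 4000728: 175, 1000063: 131, 1000003: 120,
    4000353: 102, 3000144: 96, 3000218: 94, 3000083: 90, 4000605: 90,
}
n_rows, n_distinct = 10000, 841

other_count = n_rows - sum(top10.values())  # rows not covered by the top 10
other_distinct = n_distinct - len(top10)    # distinct values folded into "Other values"
other_share = other_count / n_rows

print(other_distinct, other_count, f"{other_share:.1%}")  # 831 8496 85.0%
```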

Value | Count | Frequency (%)
1000001 | 16 | 0.2%
1000002 | 11 | 0.1%
1000003 | 120 | 1.2%
1000004 | 8 | 0.1%
1000005 | 73 | 0.7%
1000006 | 10 | 0.1%
1000007 | 4 | < 0.1%
1000008 | 2 | < 0.1%
1000009 | 236 | 2.4%
1000010 | 2 | < 0.1%

Value | Count | Frequency (%)
4001470 | 6 | 0.1%
4001469 | 13 | 0.1%
4001468 | 12 | 0.1%
4001466 | 7 | 0.1%
4001463 | 6 | 0.1%
4001462 | 58 | 0.6%
4001461 | 2 | < 0.1%
4001460 | 1 | < 0.1%
4001451 | 2 | < 0.1%
4001446 | 1 | < 0.1%

수집원분류
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
naver: 8622
youtube: 985
tistory: 372
post: 13
brunch: 8

Length

Max length: 7
Median length: 5
Mean length: 5.2709
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: youtube
2nd row: naver
3rd row: naver
4th row: naver
5th row: naver

Common Values

Value | Count | Frequency (%)
naver | 8622 | 86.2%
youtube | 985 | 9.8%
tistory | 372 | 3.7%
post | 13 | 0.1%
brunch | 8 | 0.1%

Length

Histogram of lengths of the category

관광상품분류
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
일반음식점 (general restaurant): 3689
숙박업소 (lodging): 3179
관광지 (tourist attraction): 2791
문화축제 (cultural festival): 171
모범음식점 (model restaurant): 170

Length

Max length: 5
Median length: 4
Mean length: 4.1068
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 관광지
2nd row: 관광지
3rd row: 관광지
4th row: 일반음식점
5th row: 숙박업소

Common Values

Value | Count | Frequency (%)
일반음식점 | 3689 | 36.9%
숙박업소 | 3179 | 31.8%
관광지 | 2791 | 27.9%
문화축제 | 171 | 1.7%
모범음식점 | 170 | 1.7%

Length

Histogram of lengths of the category

관광상품명
Text

Distinct: 841
Distinct (%): 8.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 30
Median length: 20
Mean length: 6.0405
Min length: 1

Characters and Unicode

Total characters: 60405
Distinct characters: 561
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
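
As a rough illustration of these character properties, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The report itself may rely on a different library, and the helper name `category_counts` is illustrative. In this scheme `Lo` corresponds to "Other Letter" (which includes Hangul syllables) and `Zs` to "Space Separator".

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# A product name that appears in the Sample section of this report.
name = "비클래시 키즈풀빌라"
print(category_counts(name))  # Counter({'Lo': 9, 'Zs': 1})
```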

Unique

Unique: 245
Unique (%): 2.5%

Sample

1st row: 종남산
2nd row: 만어사
3rd row: 월연정
4th row: 가지산흑염소
5th row: 미르캠핑장

Value | Count | Frequency (%)
위양지 | 372 | 3.2%
트윈터널 | 236 | 2.0%
단골집 | 175 | 1.5%
풀빌라 | 154 | 1.3%
키즈풀빌라 | 153 | 1.3%
카페 | 145 | 1.2%
밀양점 | 136 | 1.2%
영남루 | 131 | 1.1%
캠핑장 | 126 | 1.1%
표충사 | 126 | 1.1%
Other values (896) | 9875 | 84.9%

Most occurring characters

Value | Count | Frequency (%)
(lost) | 2161 | 3.6%
(lost) | 1847 | 3.1%
(lost) | 1687 | 2.8%
(space) | 1629 | 2.7%
(lost) | 1395 | 2.3%
(lost) | 1352 | 2.2%
(lost) | 1194 | 2.0%
(lost) | 871 | 1.4%
(lost) | 819 | 1.4%
(lost) | 747 | 1.2%
Other values (551) | 46703 | 77.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 55497 | 91.9%
Space Separator | 1629 | 2.7%
Lowercase Letter | 1302 | 2.2%
Decimal Number | 1138 | 1.9%
Uppercase Letter | 352 | 0.6%
Close Punctuation | 177 | 0.3%
Open Punctuation | 177 | 0.3%
Other Punctuation | 132 | 0.2%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(lost) | 2161 | 3.9%
(lost) | 1847 | 3.3%
(lost) | 1687 | 3.0%
(lost) | 1395 | 2.5%
(lost) | 1352 | 2.4%
(lost) | 1194 | 2.2%
(lost) | 871 | 1.6%
(lost) | 819 | 1.5%
(lost) | 747 | 1.3%
(lost) | 744 | 1.3%
Other values (504) | 42680 | 76.9%

Uppercase Letter
Value | Count | Frequency (%)
C | 96 | 27.3%
D | 88 | 25.0%
R | 66 | 18.8%
M | 20 | 5.7%
B | 18 | 5.1%
V | 14 | 4.0%
I | 14 | 4.0%
P | 14 | 4.0%
A | 11 | 3.1%
G | 3 | 0.9%
Other values (5) | 8 | 2.3%

Lowercase Letter
Value | Count | Frequency (%)
o | 244 | 18.7%
e | 228 | 17.5%
f | 210 | 16.1%
s | 144 | 11.1%
t | 126 | 9.7%
a | 116 | 8.9%
r | 100 | 7.7%
n | 38 | 2.9%
c | 34 | 2.6%
u | 22 | 1.7%
Other values (4) | 40 | 3.1%

Decimal Number
Value | Count | Frequency (%)
1 | 319 | 28.0%
9 | 269 | 23.6%
4 | 144 | 12.7%
8 | 142 | 12.5%
3 | 95 | 8.3%
0 | 77 | 6.8%
5 | 61 | 5.4%
2 | 17 | 1.5%
7 | 13 | 1.1%
6 | 1 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
& | 100 | 75.8%
/ | 28 | 21.2%
: | 3 | 2.3%
, | 1 | 0.8%

Space Separator
Value | Count | Frequency (%)
(space) | 1629 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 177 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 177 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 55497 | 91.9%
Common | 3254 | 5.4%
Latin | 1654 | 2.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(lost) | 2161 | 3.9%
(lost) | 1847 | 3.3%
(lost) | 1687 | 3.0%
(lost) | 1395 | 2.5%
(lost) | 1352 | 2.4%
(lost) | 1194 | 2.2%
(lost) | 871 | 1.6%
(lost) | 819 | 1.5%
(lost) | 747 | 1.3%
(lost) | 744 | 1.3%
Other values (504) | 42680 | 76.9%

Latin
Value | Count | Frequency (%)
o | 244 | 14.8%
e | 228 | 13.8%
f | 210 | 12.7%
s | 144 | 8.7%
t | 126 | 7.6%
a | 116 | 7.0%
r | 100 | 6.0%
C | 96 | 5.8%
D | 88 | 5.3%
R | 66 | 4.0%
Other values (19) | 236 | 14.3%

Common
Value | Count | Frequency (%)
(space) | 1629 | 50.1%
1 | 319 | 9.8%
9 | 269 | 8.3%
) | 177 | 5.4%
( | 177 | 5.4%
4 | 144 | 4.4%
8 | 142 | 4.4%
& | 100 | 3.1%
3 | 95 | 2.9%
0 | 77 | 2.4%
Other values (8) | 125 | 3.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 55497 | 91.9%
ASCII | 4908 | 8.1%

Most frequent character per block

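Hangul
Value | Count | Frequency (%)
(lost) | 2161 | 3.9%
(lost) | 1847 | 3.3%
(lost) | 1687 | 3.0%
(lost) | 1395 | 2.5%
(lost) | 1352 | 2.4%
(lost) | 1194 | 2.2%
(lost) | 871 | 1.6%
(lost) | 819 | 1.5%
(lost) | 747 | 1.3%
(lost) | 744 | 1.3%
Other values (504) | 42680 | 76.9%

ASCII
Value | Count | Frequency (%)
(space) | 1629 | 33.2%
1 | 319 | 6.5%
9 | 269 | 5.5%
o | 244 | 5.0%
e | 228 | 4.6%
f | 210 | 4.3%
) | 177 | 3.6%
( | 177 | 3.6%
4 | 144 | 2.9%
s | 144 | 2.9%
Other values (37) | 1367 | 27.9%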

게시일
DateTime

Distinct: 1734
Distinct (%): 17.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2008-12-19 00:00:00
Maximum: 2022-09-22 00:00:00
Histogram with fixed size bins (bins=50)

후기글주소
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 80
Median length: 78
Mean length: 44.007
Min length: 19

Characters and Unicode

Total characters: 440070
Distinct characters: 71
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: https://www.youtube.com/watch?v=axsOfU1XO_M
2nd row: https://blog.naver.com/fjqm2330/222330069465
3rd row: https://blog.naver.com/lemonteaset/222568744537
4th row: https://blog.naver.com/bsseri/221981453102
5th row: https:/blog.naver.com/qhrud5696/222637110818

Value | Count | Frequency (%)
https://www.youtube.com/watch?v=axsofu1xo_m | 1 | < 0.1%
https://blog.naver.com/myhjw3405/222829308562 | 1 | < 0.1%
https://blog.naver.com/lightsout/222539056596 | 1 | < 0.1%
https://blog.naver.com/ygkimdw0/222833858900 | 1 | < 0.1%
https://blog.naver.com/cyj7470/222880335797 | 1 | < 0.1%
https://blog.naver.com/joji0706/222664169936 | 1 | < 0.1%
https://blog.naver.com/tlsltlsl2020/222574690210 | 1 | < 0.1%
https://blog.naver.com/erica_777/222801595852 | 1 | < 0.1%
https://blog.naver.com/sunbin0731/222555437913 | 1 | < 0.1%
https://blog.naver.com/lightsout/222419671533 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
/ | 37998 | 8.6%
2 | 35144 | 8.0%
t | 24404 | 5.5%
o | 24314 | 5.5%
. | 20002 | 4.5%
s | 13910 | 3.2%
a | 13792 | 3.1%
e | 13694 | 3.1%
h | 13669 | 3.1%
n | 12785 | 2.9%
Other values (61) | 230358 | 52.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 235858 | 53.6%
Decimal Number | 128446 | 29.2%
Other Punctuation | 69019 | 15.7%
Uppercase Letter | 4521 | 1.0%
Math Symbol | 1011 | 0.2%
Connector Punctuation | 902 | 0.2%
Dash Punctuation | 313 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 24404 | 10.3%
o | 24314 | 10.3%
s | 13910 | 5.9%
a | 13792 | 5.8%
e | 13694 | 5.8%
h | 13669 | 5.8%
n | 12785 | 5.4%
m | 12496 | 5.3%
c | 12320 | 5.2%
l | 12068 | 5.1%
Other values (16) | 82406 | 34.9%

Uppercase Letter
Value | Count | Frequency (%)
M | 242 | 5.4%
Y | 232 | 5.1%
Q | 228 | 5.0%
A | 222 | 4.9%
I | 212 | 4.7%
E | 211 | 4.7%
U | 199 | 4.4%
N | 187 | 4.1%
B | 170 | 3.8%
O | 167 | 3.7%
Other values (16) | 2451 | 54.2%

Decimal Number
Value | Count | Frequency (%)
2 | 35144 | 27.4%
1 | 12324 | 9.6%
0 | 11252 | 8.8%
7 | 10939 | 8.5%
8 | 10880 | 8.5%
6 | 10103 | 7.9%
4 | 9636 | 7.5%
3 | 9606 | 7.5%
5 | 9585 | 7.5%
9 | 8977 | 7.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 37998 | 55.1%
. | 20002 | 29.0%
: | 10000 | 14.5%
? | 998 | 1.4%
& | 13 | < 0.1%
@ | 8 | < 0.1%

Math Symbol
Value | Count | Frequency (%)
= | 1011 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 902 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 313 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 240379 | 54.6%
Common | 199691 | 45.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 24404 | 10.2%
o | 24314 | 10.1%
s | 13910 | 5.8%
a | 13792 | 5.7%
e | 13694 | 5.7%
h | 13669 | 5.7%
n | 12785 | 5.3%
m | 12496 | 5.2%
c | 12320 | 5.1%
l | 12068 | 5.0%
Other values (42) | 86927 | 36.2%

Common
Value | Count | Frequency (%)
/ | 37998 | 19.0%
2 | 35144 | 17.6%
. | 20002 | 10.0%
1 | 12324 | 6.2%
0 | 11252 | 5.6%
7 | 10939 | 5.5%
8 | 10880 | 5.4%
6 | 10103 | 5.1%
: | 10000 | 5.0%
4 | 9636 | 4.8%
Other values (9) | 31413 | 15.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 440070 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
/ | 37998 | 8.6%
2 | 35144 | 8.0%
t | 24404 | 5.5%
o | 24314 | 5.5%
. | 20002 | 4.5%
s | 13910 | 3.2%
a | 13792 | 3.1%
e | 13694 | 3.1%
h | 13669 | 3.1%
n | 12785 | 2.9%
Other values (61) | 230358 | 52.3%

Interactions

Correlations

Variable | 후기글번호(ReviewID) | 관광상품번호(NameID) | 수집원분류 | 관광상품분류
후기글번호(ReviewID) | 1.000 | 0.534 | 0.664 | 0.625
관광상품번호(NameID) | 0.534 | 1.000 | 0.192 | 1.000
수집원분류 | 0.664 | 0.192 | 1.000 | 0.354
관광상품분류 | 0.625 | 1.000 | 0.354 | 1.000

Variable | 관광상품분류 | 수집원분류
관광상품분류 | 1.000 | 0.138
수집원분류 | 0.138 | 1.000

Variable | 후기글번호(ReviewID) | 관광상품번호(NameID) | 수집원분류 | 관광상품분류
후기글번호(ReviewID) | 1.000 | 0.191 | 0.336 | 0.308
관광상품번호(NameID) | 0.191 | 1.000 | 0.160 | 1.000
수집원분류 | 0.336 | 0.160 | 1.000 | 0.138
관광상품분류 | 0.308 | 1.000 | 0.138 | 1.000
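
The 1.000 association between 관광상품번호(NameID) and 관광상품분류 suggests that the ID itself encodes the category. In the rows shown in the Sample section of this report, the leading digit of the ID lines up with the category. The sketch below checks that on ten of those rows. It is only an illustration on sampled values, not a statement about the full dataset, and the categories 문화축제 and 모범음식점 do not appear among the sampled rows. The helper name `categories_by_prefix` is illustrative.

```python
from collections import defaultdict

# (NameID, category) pairs copied from the Sample section of this report.
sample = [
    (1000076, "관광지"), (1000005, "관광지"), (1000070, "관광지"),
    (4000852, "일반음식점"), (3000093, "숙박업소"), (3000090, "숙박업소"),
    (4000252, "일반음식점"), (3000003, "숙박업소"), (1000096, "관광지"),
    (4000791, "일반음식점"),
]

def categories_by_prefix(rows):
    """Group category labels by the leading digit of the product ID."""
    groups = defaultdict(set)
    for name_id, category in rows:
        groups[str(name_id)[0]].add(category)
    return dict(groups)

# Each leading digit maps to exactly one category in these sampled rows.
print(categories_by_prefix(sample))
```

If the same pattern holds across all rows, the two columns carry largely redundant information, which is worth keeping in mind before using both as model features.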

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 후기글번호(ReviewID) | 관광상품번호(NameID) | 수집원분류 | 관광상품분류 | 관광상품명 | 게시일 | 후기글주소
7559 | 6007560 | 1000076 | youtube | 관광지 | 종남산 | 2022-04-20 | https://www.youtube.com/watch?v=axsOfU1XO_M
2441 | 6002442 | 1000005 | naver | 관광지 | 만어사 | 2021-05-01 | https://blog.naver.com/fjqm2330/222330069465
1870 | 6001871 | 1000070 | naver | 관광지 | 월연정 | 2021-11-15 | https://blog.naver.com/lemonteaset/222568744537
8737 | 6008738 | 4000852 | naver | 일반음식점 | 가지산흑염소 | 2020-05-28 | https://blog.naver.com/bsseri/221981453102
4094 | 6004095 | 3000093 | naver | 숙박업소 | 미르캠핑장 | 2022-02-02 | https:/blog.naver.com/qhrud5696/222637110818
3837 | 6003838 | 3000090 | naver | 숙박업소 | 밀양강캠프스쿨 | 2022-07-03 | https:/blog.naver.com/ghwls_11/222798597370
3494 | 6003495 | 4000252 | naver | 일반음식점 | 카페평리 | 2022-07-05 | https://blog.naver.com/anna901020/222799457241
4994 | 6004995 | 3000003 | naver | 숙박업소 | 비클래시 키즈풀빌라 | 2019-05-29 | https://blog.naver.com/najins2/221549493453
5485 | 6005486 | 1000096 | naver | 관광지 | 단장유원지 | 2022-07-19 | https://blog.naver.com/z112245/222819076422
3490 | 6003491 | 4000791 | naver | 일반음식점 | 어서이곳 | 2022-02-22 | https://blog.naver.com/shfksqksk1/222654372398

Index | 후기글번호(ReviewID) | 관광상품번호(NameID) | 수집원분류 | 관광상품분류 | 관광상품명 | 게시일 | 후기글주소
5381 | 6005382 | 4000696 | naver | 일반음식점 | 황금명태본가 밀양직영점 | 2022-07-16 | https://blog.naver.com/sk8968753/222814720351
7754 | 6007755 | 3000116 | youtube | 숙박업소 | 도래재별빛마을캠핑장 | 2022-02-09 | https://www.youtube.com/watch?v=TkvpATrsrsI
6430 | 6006431 | 3000154 | naver | 숙박업소 | 소담한옥 | 2022-08-22 | https://blog.naver.com/gkswltjs9712/222854538470
545 | 6000546 | 1000016 | naver | 관광지 | 금시당 | 2021-11-18 | https://blog.naver.com/tobi4788/222571484219
2397 | 6002398 | 4001226 | naver | 일반음식점 | 라라코스트밀양삼문점 | 2018-09-07 | https://blog.naver.com/kni0807/221354017378
6251 | 6006252 | 4000353 | naver | 일반음식점 | 단장면커피로스터스 | 2022-08-16 | https://blog.naver.com/koko1994/222849930543
9124 | 6009125 | 4000502 | naver | 일반음식점 | 밀성식당 | 2021-03-13 | https://blog.naver.com/alstjf5583/222273979583
3562 | 6003563 | 4000755 | naver | 일반음식점 | 카페삼랑 | 2022-06-04 | https://blog.naver.com/siyul3355/222758738992
7315 | 6007316 | 1000045 | youtube | 관광지 | 사명대사유적지 | 2017-08-01 | https://www.youtube.com/watch?v=y258QFAuVUg
8302 | 6008303 | 4000728 | naver | 일반음식점 | 단골집 | 2022-04-24 | https://blog.naver.com/baest7/222709752242
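
Two of the sampled 후기글주소 values begin with `https:/` rather than `https://`. The report does not say whether this comes from the source data, so the sketch below only shows one way such rows could be flagged, using the standard `urllib.parse.urlsplit`: a URL without `//` after the scheme parses with an empty network location. The helper name `has_host` is illustrative.

```python
from urllib.parse import urlsplit

def has_host(url):
    """Return True when the URL parses with a non-empty network location."""
    return bool(urlsplit(url).netloc)

# URLs copied from the sample rows above.
urls = [
    "https://blog.naver.com/bsseri/221981453102",
    "https:/blog.naver.com/qhrud5696/222637110818",
    "https:/blog.naver.com/ghwls_11/222798597370",
    "https://www.youtube.com/watch?v=axsOfU1XO_M",
]

# Keep only the URLs whose host part is missing after parsing.
malformed = [u for u in urls if not has_host(u)]
print(malformed)
```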