Overview

Dataset statistics

Number of variables: 8
Number of observations: 9330
Missing cells: 1923
Missing cells (%): 2.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 601.5 KiB
Average record size in memory: 66.0 B
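
The percentage and per-record figures above can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
n_variables = 8
n_observations = 9330
missing_cells = 1923
total_size_kib = 601.5

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both derived values round to the figures listed in the report (2.6% and 66.0 B).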

Variable types

Numeric: 2
Categorical: 1
Text: 4
DateTime: 1

Dataset

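Description: Multilingual restaurant information (8 fields, including restaurant name, business category, address, and language type)
Author: Jeollanam-do (전라남도)
URL: https://www.data.go.kr/data/15076623/fileData.do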

Alerts

다국어식당정보ID is highly overall correlated with 식당ID (High correlation)
식당ID is highly overall correlated with 다국어식당정보ID (High correlation)
업종 has 1908 (20.5%) missing values (Missing)
다국어식당정보ID has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 21:42:56.561108
Analysis finished: 2023-12-12 21:42:58.280514
Duration: 1.72 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

다국어식당정보ID
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 9330
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4665.5
Minimum: 1
Maximum: 9330
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 82.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 467.45
Q1: 2333.25
Median: 4665.5
Q3: 6997.75
95-th percentile: 8863.55
Maximum: 9330
Range: 9329
Interquartile range (IQR): 4664.5
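
The fractional quantiles above are consistent with linear interpolation between neighbouring order statistics. The sketch below reproduces them under two assumptions that the report does not state explicitly: that the column holds exactly the integers 1 to 9330, and that linear interpolation is the method used. `percentile` is a hypothetical helper, not part of the profiling tool.

```python
import math

def percentile(sorted_values, q):
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile() requires at least one value")
    position = (len(sorted_values) - 1) * q / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Assumption: the ID column is exactly 1..9330, already sorted.
ids = list(range(1, 9331))

q1 = percentile(ids, 25)
q3 = percentile(ids, 75)
print("5-th percentile:", percentile(ids, 5))
print("Q1:", q1)
print("Median:", percentile(ids, 50))
print("Q3:", q3)
print("95-th percentile:", percentile(ids, 95))
print("IQR:", q3 - q1)
```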

Descriptive statistics

Standard deviation: 2693.4833
Coefficient of variation (CV): 0.57731933
Kurtosis: -1.2
Mean: 4665.5
Median Absolute Deviation (MAD): 2332.5
Skewness: 0
Sum: 43529115
Variance: 7254852.5
Monotonicity: Strictly increasing
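
Most of these descriptive statistics can be checked with the Python standard library. The sketch below assumes the column holds exactly the integers 1 to 9330 (an inference from the minimum, maximum, distinct count, and sum, not something the report states) and uses the sample variance with an n - 1 denominator, which matches the variance listed above. Kurtosis and skewness are left out because their exact estimator is not shown in the report.

```python
import statistics

# Assumption: the ID column holds exactly the integers 1..9330.
ids = [float(i) for i in range(1, 9331)]

total = sum(ids)
mean = statistics.fmean(ids)
variance = statistics.variance(ids)   # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(ids)       # square root of the sample variance
cv = std_dev / mean                   # coefficient of variation
median = statistics.median(ids)
mad = statistics.median([abs(x - median) for x in ids])  # median absolute deviation

print(f"Sum: {total:.0f}")
print(f"Mean: {mean}")
print(f"Variance: {variance}")
print(f"Standard deviation: {std_dev:.4f}")
print(f"Coefficient of variation (CV): {cv:.8f}")
print(f"Median Absolute Deviation (MAD): {mad}")
```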
Histogram with fixed size bins (bins=50)
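
A histogram with fixed-size bins splits the range from minimum to maximum into equal-width intervals and counts the values in each. The following is a minimal sketch of that binning, assuming the column is exactly 1 to 9330; `histogram` is a hypothetical helper and its edge handling may differ from the plotting library used by the report.

```python
def histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: every value is identical, so use a single occupied bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum would land on the right edge, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

# Assumption: the ID column is exactly 1..9330.
counts = histogram(list(range(1, 9331)), bins=50)
print(len(counts), sum(counts), min(counts), max(counts))
```

Because the IDs are evenly spaced, every bin holds nearly the same number of records, which is why the histogram for this variable is flat.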
ValueCountFrequency (%)
1 1
 
< 0.1%
6198 1
 
< 0.1%
6218 1
 
< 0.1%
6219 1
 
< 0.1%
6220 1
 
< 0.1%
6221 1
 
< 0.1%
6222 1
 
< 0.1%
6223 1
 
< 0.1%
6224 1
 
< 0.1%
6225 1
 
< 0.1%
Other values (9320) 9320
99.9%
ValueCountFrequency (%)
1 1
< 0.1%
2 1
< 0.1%
3 1
< 0.1%
4 1
< 0.1%
5 1
< 0.1%
6 1
< 0.1%
7 1
< 0.1%
8 1
< 0.1%
9 1
< 0.1%
10 1
< 0.1%
ValueCountFrequency (%)
9330 1
< 0.1%
9329 1
< 0.1%
9328 1
< 0.1%
9327 1
< 0.1%
9326 1
< 0.1%
9325 1
< 0.1%
9324 1
< 0.1%
9323 1
< 0.1%
9322 1
< 0.1%
9321 1
< 0.1%

식당ID
Real number (ℝ)

HIGH CORRELATION 

Distinct: 3110
Distinct (%): 33.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 753151.39
Minimum: 2858
Maximum: 865303
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 82.1 KiB

Quantile statistics

Minimum: 2858
5-th percentile: 197468
Q1: 857711
Median: 858711.5
Q3: 859674
95-th percentile: 864297
Maximum: 865303
Range: 862445
Interquartile range (IQR): 1963

Descriptive statistics

Standard deviation: 219197.56
Coefficient of variation (CV): 0.29104051
Kurtosis: 2.3446412
Mean: 753151.39
Median Absolute Deviation (MAD): 985.5
Skewness: -1.9170256
Sum: 7.0269025 × 10^9
Variance: 4.8047572 × 10^10
Monotonicity: Increasing
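
The monotonicity label can be understood as a comparison of each value with its successor: 식당ID repeats once per language, so it never decreases but is not strictly increasing. The sketch below illustrates this with the first rows of the Sample section. `monotonicity` is a hypothetical helper, and the labels other than "Strictly increasing" and "Increasing" are assumed, since only those two appear in this report.

```python
def monotonicity(values):
    """Classify a sequence by comparing each pair of neighbouring values."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"

# First ten rows of the Sample section: each 식당ID appears once per language.
restaurant_ids = [2858, 2858, 2858, 3820, 3820, 3820, 4419, 4419, 4419, 4751]
record_ids = list(range(1, 11))

print("식당ID:", monotonicity(restaurant_ids))
print("다국어식당정보ID:", monotonicity(record_ids))
```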
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
2858 3
 
< 0.1%
859360 3
 
< 0.1%
859328 3
 
< 0.1%
859330 3
 
< 0.1%
859333 3
 
< 0.1%
859334 3
 
< 0.1%
859336 3
 
< 0.1%
859337 3
 
< 0.1%
859338 3
 
< 0.1%
859339 3
 
< 0.1%
Other values (3100) 9300
99.7%
ValueCountFrequency (%)
2858 3
< 0.1%
3820 3
< 0.1%
4419 3
< 0.1%
4751 3
< 0.1%
5075 3
< 0.1%
6215 3
< 0.1%
10302 3
< 0.1%
11705 3
< 0.1%
12433 3
< 0.1%
16676 3
< 0.1%
ValueCountFrequency (%)
865303 3
< 0.1%
865302 3
< 0.1%
865301 3
< 0.1%
865300 3
< 0.1%
865299 3
< 0.1%
865297 3
< 0.1%
865295 3
< 0.1%
865288 3
< 0.1%
865287 3
< 0.1%
865282 3
< 0.1%

언어타입
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 73.0 KiB
en
3110 
ja
3110 
zh-Hans
3110 

Length

Max length: 7
Median length: 2
Mean length: 3.6666667
Min length: 2
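
These length statistics follow directly from the three language tags and their counts. A minimal sketch using the counts listed for this variable (3110 rows per tag):

```python
import statistics

# Counts copied from the frequency listing for this variable.
tag_counts = {"en": 3110, "ja": 3110, "zh-Hans": 3110}

# One length entry per row of the column.
lengths = [len(tag) for tag, count in tag_counts.items() for _ in range(count)]

print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", round(statistics.fmean(lengths), 7))
print("Min length:", min(lengths))
```

Two of the three tags have length 2, so the median is 2 while the mean is (2 + 2 + 7) / 3.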

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: en
2nd row: ja
3rd row: zh-Hans
4th row: en
5th row: ja

Common Values

ValueCountFrequency (%)
en 3110
33.3%
ja 3110
33.3%
zh-Hans 3110
33.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
en 3110
33.3%
ja 3110
33.3%
zh-hans 3110
33.3%

업종
Text

MISSING 

Distinct: 222
Distinct (%): 3.0%
Missing: 1908
Missing (%): 20.5%
Memory size: 73.0 KiB

Length

Max length: 35
Median length: 24
Mean length: 6.2980329
Min length: 1

Characters and Unicode

Total characters: 46744
Distinct characters: 272
Distinct categories: 9
Distinct scripts: 5
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
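
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two values of the 업종 column; `category_counts` is a hypothetical helper, and the two-letter codes correspond to the long names used in this report (Lu = Uppercase Letter, Ll = Lowercase Letter, Lo = Other Letter, Lm = Modifier Letter, Zs = Space Separator, Ps and Pe = Open and Close Punctuation).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category (e.g. 'Lu', 'Ll')."""
    return Counter(unicodedata.category(ch) for ch in text)

# Two values from the 업종 column.
for value in ["Japanese (cuisine)", "ベーカリー"]:
    print(value, dict(category_counts(value)))
```

Note that the Katakana prolonged sound mark ー is classified as a Modifier Letter rather than Other Letter, which is one way a Japanese-language column can contribute to that category.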

Unique

Unique: 31
Unique (%): 0.4%
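
The figures in this report suggest that "Distinct" counts how many different values occur, while "Unique" counts the values that occur exactly once. The sketch below shows that distinction on a small made-up column; `distinct_and_unique` is a hypothetical helper, and skipping missing values is an assumption.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)  # missing values are skipped
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy column, not taken from the dataset.
sample = ["Bakery", "Korean cuisine", "Korean cuisine", "Rice Soup", None]
print(distinct_and_unique(sample))

# A column in which every value repeats has distinct values but no unique ones.
print(distinct_and_unique(["en", "ja", "zh-Hans"] * 3110))
```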

Sample

1st row: Bakery
2nd row: ベーカリー
3rd row: 面包店
4th row: Japanese (cuisine)
5th row: 和食
ValueCountFrequency (%)
korean 1254
 
12.4%
cuisine 1243
 
12.3%
韩餐 1165
 
11.5%
韓食 1165
 
11.5%
fish 231
 
2.3%
生鱼片 216
 
2.1%
刺身 216
 
2.1%
sliced 216
 
2.1%
raw 216
 
2.1%
soup 182
 
1.8%
Other values (227) 3994
39.6%
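
The table above counts lowercased word tokens rather than whole cell values, which is why "korean" and "cuisine" appear as separate entries. The report does not show its tokenizer, so the following is only a sketch under the assumption that values are lowercased and split on non-word characters; `word_counts` is a hypothetical helper.

```python
import re
from collections import Counter

def word_counts(values):
    """Lowercase each value, split it into word tokens, and count the tokens."""
    counts = Counter()
    for value in values:
        counts.update(re.findall(r"\w+", value.lower()))
    return counts

# The five 업종 sample rows shown for this variable.
rows = ["Bakery", "ベーカリー", "面包店", "Japanese (cuisine)", "和食"]
for token, count in word_counts(rows).most_common():
    print(token, count)
```

Japanese and Chinese values contain no spaces, so each of them is counted as a single token under this scheme.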

Most occurring characters

ValueCountFrequency (%)
e 4271
 
9.1%
i 3538
 
7.6%
n 2840
 
6.1%
2712
 
5.8%
a 2167
 
4.6%
o 2073
 
4.4%
s 1987
 
4.3%
c 1873
 
4.0%
r 1703
 
3.6%
u 1649
 
3.5%
Other values (262) 21931
46.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 26309
56.3%
Other Letter 13563
29.0%
Uppercase Letter 3587
 
7.7%
Space Separator 2712
 
5.8%
Other Punctuation 160
 
0.3%
Modifier Letter 154
 
0.3%
Close Punctuation 88
 
0.2%
Open Punctuation 88
 
0.2%
Dash Punctuation 83
 
0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1414
 
10.4%
1254
 
9.2%
1254
 
9.2%
1239
 
9.1%
641
 
4.7%
302
 
2.2%
297
 
2.2%
281
 
2.1%
271
 
2.0%
228
 
1.7%
Other values (210) 6382
47.1%
Lowercase Letter
ValueCountFrequency (%)
e 4271
16.2%
i 3538
13.4%
n 2840
10.8%
a 2167
8.2%
o 2073
7.9%
s 1987
7.6%
c 1873
7.1%
r 1703
 
6.5%
u 1649
 
6.3%
l 740
 
2.8%
Other values (13) 3468
13.2%
Uppercase Letter
ValueCountFrequency (%)
K 1254
35.0%
S 624
17.4%
R 321
 
8.9%
B 268
 
7.5%
F 256
 
7.1%
D 159
 
4.4%
C 153
 
4.3%
G 134
 
3.7%
P 104
 
2.9%
M 84
 
2.3%
Other values (10) 230
 
6.4%
Other Punctuation
ValueCountFrequency (%)
' 82
51.2%
, 78
48.8%
Close Punctuation
ValueCountFrequency (%)
45
51.1%
) 43
48.9%
Open Punctuation
ValueCountFrequency (%)
45
51.1%
( 43
48.9%
Space Separator
ValueCountFrequency (%)
2712
100.0%
Modifier Letter
ValueCountFrequency (%)
154
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 83
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 29896
64.0%
Han 11158
 
23.9%
Common 3285
 
7.0%
Katakana 1938
 
4.1%
Hiragana 467
 
1.0%

Most frequent character per script

Han
ValueCountFrequency (%)
1414
 
12.7%
1254
 
11.2%
1254
 
11.2%
1239
 
11.1%
641
 
5.7%
302
 
2.7%
297
 
2.7%
281
 
2.5%
271
 
2.4%
228
 
2.0%
Other values (143) 3977
35.6%
Katakana
ValueCountFrequency (%)
162
 
8.4%
162
 
8.4%
126
 
6.5%
125
 
6.4%
105
 
5.4%
95
 
4.9%
94
 
4.9%
89
 
4.6%
79
 
4.1%
76
 
3.9%
Other values (38) 825
42.6%
Latin
ValueCountFrequency (%)
e 4271
14.3%
i 3538
11.8%
n 2840
9.5%
a 2167
 
7.2%
o 2073
 
6.9%
s 1987
 
6.6%
c 1873
 
6.3%
r 1703
 
5.7%
u 1649
 
5.5%
K 1254
 
4.2%
Other values (33) 6541
21.9%
Hiragana
ValueCountFrequency (%)
115
24.6%
45
 
9.6%
28
 
6.0%
28
 
6.0%
26
 
5.6%
26
 
5.6%
23
 
4.9%
23
 
4.9%
23
 
4.9%
22
 
4.7%
Other values (9) 108
23.1%
Common
ValueCountFrequency (%)
2712
82.6%
154
 
4.7%
- 83
 
2.5%
' 82
 
2.5%
, 78
 
2.4%
45
 
1.4%
45
 
1.4%
) 43
 
1.3%
( 43
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32937
70.5%
CJK 11137
 
23.8%
Katakana 2092
 
4.5%
Hiragana 467
 
1.0%
None 90
 
0.2%
CJK Compat Ideographs 21
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 4271
13.0%
i 3538
10.7%
n 2840
 
8.6%
2712
 
8.2%
a 2167
 
6.6%
o 2073
 
6.3%
s 1987
 
6.0%
c 1873
 
5.7%
r 1703
 
5.2%
u 1649
 
5.0%
Other values (39) 8124
24.7%
CJK
ValueCountFrequency (%)
1414
 
12.7%
1254
 
11.3%
1254
 
11.3%
1239
 
11.1%
641
 
5.8%
302
 
2.7%
297
 
2.7%
281
 
2.5%
271
 
2.4%
228
 
2.0%
Other values (142) 3956
35.5%
Katakana
ValueCountFrequency (%)
162
 
7.7%
162
 
7.7%
154
 
7.4%
126
 
6.0%
125
 
6.0%
105
 
5.0%
95
 
4.5%
94
 
4.5%
89
 
4.3%
79
 
3.8%
Other values (39) 901
43.1%
Hiragana
ValueCountFrequency (%)
115
24.6%
45
 
9.6%
28
 
6.0%
28
 
6.0%
26
 
5.6%
26
 
5.6%
23
 
4.9%
23
 
4.9%
23
 
4.9%
22
 
4.7%
Other values (9) 108
23.1%
None
ValueCountFrequency (%)
45
50.0%
45
50.0%
CJK Compat Ideographs
ValueCountFrequency (%)
21
100.0%

식당명
Text

Distinct: 2944
Distinct (%): 31.6%
Missing: 6
Missing (%): 0.1%
Memory size: 73.0 KiB

Length

Max length: 68
Median length: 46
Mean length: 17.121943
Min length: 2

Characters and Unicode

Total characters: 159645
Distinct characters: 81
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Hoseongdang
2nd row: Hoseongdang
3rd row: Hoseongdang
4th row: Hwanggeum Dokki
5th row: Hwanggeum Dokki
ValueCountFrequency (%)
sikdang 1125
 
5.5%
hoetjip 378
 
1.8%
garden 357
 
1.7%
galbi 279
 
1.4%
gukbap 252
 
1.2%
hoegwan 219
 
1.1%
hanu 201
 
1.0%
jangeo 168
 
0.8%
sutbul 150
 
0.7%
gamjatang 138
 
0.7%
Other values (2895) 17334
84.1%

Most occurring characters

ValueCountFrequency (%)
a 16878
 
10.6%
n 16488
 
10.3%
o 13005
 
8.1%
e 12171
 
7.6%
g 11982
 
7.5%
11277
 
7.1%
i 7467
 
4.7%
u 7461
 
4.7%
m 4878
 
3.1%
k 4602
 
2.9%
Other values (71) 53436
33.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 125424
78.6%
Uppercase Letter 22260
 
13.9%
Space Separator 11277
 
7.1%
Decimal Number 294
 
0.2%
Other Punctuation 138
 
0.1%
Open Punctuation 105
 
0.1%
Close Punctuation 105
 
0.1%
Other Letter 27
 
< 0.1%
Dash Punctuation 9
 
< 0.1%
Math Symbol 6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 16878
13.5%
n 16488
13.1%
o 13005
10.4%
e 12171
9.7%
g 11982
9.6%
i 7467
 
6.0%
u 7461
 
5.9%
m 4878
 
3.9%
k 4602
 
3.7%
j 3699
 
2.9%
Other values (17) 26793
21.4%
Uppercase Letter
ValueCountFrequency (%)
S 3705
16.6%
G 2931
13.2%
J 2562
11.5%
H 2433
10.9%
M 1506
6.8%
B 1446
 
6.5%
D 1347
 
6.1%
C 993
 
4.5%
N 969
 
4.4%
Y 954
 
4.3%
Other values (16) 3414
15.3%
Decimal Number
ValueCountFrequency (%)
1 63
21.4%
2 51
17.3%
4 33
11.2%
9 30
10.2%
5 24
 
8.2%
3 24
 
8.2%
0 24
 
8.2%
6 18
 
6.1%
8 15
 
5.1%
7 12
 
4.1%
Other Punctuation
ValueCountFrequency (%)
& 57
41.3%
. 42
30.4%
, 18
 
13.0%
' 9
 
6.5%
· 6
 
4.3%
! 3
 
2.2%
: 3
 
2.2%
Other Letter
ValueCountFrequency (%)
6
22.2%
6
22.2%
6
22.2%
3
11.1%
3
11.1%
3
11.1%
Space Separator
ValueCountFrequency (%)
11277
100.0%
Open Punctuation
ValueCountFrequency (%)
( 105
100.0%
Close Punctuation
ValueCountFrequency (%)
) 105
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 9
100.0%
Math Symbol
ValueCountFrequency (%)
+ 6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 147684
92.5%
Common 11934
 
7.5%
Han 24
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 16878
 
11.4%
n 16488
 
11.2%
o 13005
 
8.8%
e 12171
 
8.2%
g 11982
 
8.1%
i 7467
 
5.1%
u 7461
 
5.1%
m 4878
 
3.3%
k 4602
 
3.1%
S 3705
 
2.5%
Other values (43) 49047
33.2%
Common
ValueCountFrequency (%)
11277
94.5%
( 105
 
0.9%
) 105
 
0.9%
1 63
 
0.5%
& 57
 
0.5%
2 51
 
0.4%
. 42
 
0.4%
4 33
 
0.3%
9 30
 
0.3%
5 24
 
0.2%
Other values (12) 147
 
1.2%
Han
ValueCountFrequency (%)
6
25.0%
6
25.0%
6
25.0%
3
12.5%
3
12.5%
Hangul
ValueCountFrequency (%)
3
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 159558
99.9%
None 60
 
< 0.1%
CJK 21
 
< 0.1%
CJK Compat Ideographs 3
 
< 0.1%
Compat Jamo 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 16878
 
10.6%
n 16488
 
10.3%
o 13005
 
8.2%
e 12171
 
7.6%
g 11982
 
7.5%
11277
 
7.1%
i 7467
 
4.7%
u 7461
 
4.7%
m 4878
 
3.1%
k 4602
 
2.9%
Other values (62) 53349
33.4%
None
ValueCountFrequency (%)
é 42
70.0%
É 12
 
20.0%
· 6
 
10.0%
CJK
ValueCountFrequency (%)
6
28.6%
6
28.6%
6
28.6%
3
14.3%
CJK Compat Ideographs
ValueCountFrequency (%)
3
100.0%
Compat Jamo
ValueCountFrequency (%)
3
100.0%

도로명주소
Text

Distinct: 8511
Distinct (%): 91.3%
Missing: 9
Missing (%): 0.1%
Memory size: 73.0 KiB

Length

Max length: 70
Median length: 60
Mean length: 31.594893
Min length: 11

Characters and Unicode

Total characters: 294496
Distinct characters: 324
Distinct categories: 6
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8163
Unique (%): 87.6%

Sample

1st row: 17 Jangpyeong-ro Suncheon-si Jeollanam-do
2nd row: 全羅南道 スンチョン市 チャンピョンロ17
3rd row: 全罗南道 顺天市 Jangpyeong路17
4th row: 10 Galti-ro Gwangyang-eup Gwangyang-si Jeollanam-do
5th row: 全羅南道 クァンヤン市 クァンヤン邑 カルティロ10
ValueCountFrequency (%)
全羅南道 3107
 
8.3%
jeollanam-do 3107
 
8.3%
全罗南道 3107
 
8.3%
suncheon-si 470
 
1.3%
顺天市 470
 
1.3%
スンチョン市 470
 
1.3%
ヨス市 426
 
1.1%
麗水市 426
 
1.1%
yeosu-si 426
 
1.1%
naju-si 253
 
0.7%
Other values (8309) 25110
67.2%

Most occurring characters

ValueCountFrequency (%)
28363
 
9.6%
n 18713
 
6.4%
o 16763
 
5.7%
a 15729
 
5.3%
- 13633
 
4.6%
e 12667
 
4.3%
g 11657
 
4.0%
8714
 
3.0%
l 8513
 
2.9%
u 7599
 
2.6%
Other values (314) 152145
51.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 125933
42.8%
Other Letter 86723
29.4%
Space Separator 28363
 
9.6%
Decimal Number 25777
 
8.8%
Uppercase Letter 14067
 
4.8%
Dash Punctuation 13633
 
4.6%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8714
 
10.0%
6463
 
7.5%
6318
 
7.3%
6214
 
7.2%
3510
 
4.0%
3380
 
3.9%
3107
 
3.6%
2724
 
3.1%
2412
 
2.8%
1950
 
2.2%
Other values (261) 41931
48.4%
Lowercase Letter
ValueCountFrequency (%)
n 18713
14.9%
o 16763
13.3%
a 15729
12.5%
e 12667
10.1%
g 11657
9.3%
l 8513
6.8%
u 7599
 
6.0%
m 6487
 
5.2%
i 4601
 
3.7%
s 4154
 
3.3%
Other values (11) 19050
15.1%
Uppercase Letter
ValueCountFrequency (%)
J 4238
30.1%
S 1841
13.1%
G 1547
 
11.0%
Y 1347
 
9.6%
H 1110
 
7.9%
D 774
 
5.5%
B 724
 
5.1%
M 668
 
4.7%
N 628
 
4.5%
C 242
 
1.7%
Other values (10) 948
 
6.7%
Decimal Number
ValueCountFrequency (%)
1 5914
22.9%
2 3751
14.6%
3 3077
11.9%
4 2316
 
9.0%
5 2223
 
8.6%
6 2013
 
7.8%
7 1926
 
7.5%
8 1590
 
6.2%
0 1533
 
5.9%
9 1434
 
5.6%
Space Separator
ValueCountFrequency (%)
28363
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 13633
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 140000
47.5%
Common 67773
23.0%
Han 48798
 
16.6%
Katakana 37925
 
12.9%

Most frequent character per script

Han
ValueCountFrequency (%)
6463
13.2%
6318
12.9%
6214
12.7%
3510
 
7.2%
3380
 
6.9%
3107
 
6.4%
2724
 
5.6%
2412
 
4.9%
1706
 
3.5%
1540
 
3.2%
Other values (190) 11424
23.4%
Katakana
ValueCountFrequency (%)
8714
23.0%
1950
 
5.1%
1839
 
4.8%
1689
 
4.5%
1655
 
4.4%
1506
 
4.0%
1492
 
3.9%
1364
 
3.6%
1182
 
3.1%
1176
 
3.1%
Other values (61) 15358
40.5%
Latin
ValueCountFrequency (%)
n 18713
13.4%
o 16763
12.0%
a 15729
11.2%
e 12667
 
9.0%
g 11657
 
8.3%
l 8513
 
6.1%
u 7599
 
5.4%
m 6487
 
4.6%
i 4601
 
3.3%
J 4238
 
3.0%
Other values (31) 33033
23.6%
Common
ValueCountFrequency (%)
28363
41.8%
- 13633
20.1%
1 5914
 
8.7%
2 3751
 
5.5%
3 3077
 
4.5%
4 2316
 
3.4%
5 2223
 
3.3%
6 2013
 
3.0%
7 1926
 
2.8%
8 1590
 
2.3%
Other values (2) 2967
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 207773
70.6%
CJK 47872
 
16.3%
Katakana 37925
 
12.9%
CJK Compat Ideographs 926
 
0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
28363
13.7%
n 18713
 
9.0%
o 16763
 
8.1%
a 15729
 
7.6%
- 13633
 
6.6%
e 12667
 
6.1%
g 11657
 
5.6%
l 8513
 
4.1%
u 7599
 
3.7%
m 6487
 
3.1%
Other values (43) 67649
32.6%
Katakana
ValueCountFrequency (%)
8714
23.0%
1950
 
5.1%
1839
 
4.8%
1689
 
4.5%
1655
 
4.4%
1506
 
4.0%
1492
 
3.9%
1364
 
3.6%
1182
 
3.1%
1176
 
3.1%
Other values (61) 15358
40.5%
CJK
ValueCountFrequency (%)
6463
13.5%
6318
13.2%
6214
13.0%
3510
 
7.3%
3380
 
7.1%
3107
 
6.5%
2724
 
5.7%
2412
 
5.0%
1706
 
3.6%
1540
 
3.2%
Other values (180) 10498
21.9%
CJK Compat Ideographs
ValueCountFrequency (%)
471
50.9%
426
46.0%
10
 
1.1%
6
 
0.6%
4
 
0.4%
3
 
0.3%
2
 
0.2%
2
 
0.2%
1
 
0.1%
1
 
0.1%

지번주소
Text

Distinct: 7164
Distinct (%): 76.8%
Missing: 0
Missing (%): 0.0%
Memory size: 73.0 KiB

Length

Max length: 64
Median length: 55
Mean length: 28.15134
Min length: 11

Characters and Unicode

Total characters: 262652
Distinct characters: 473
Distinct categories: 6
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6645
Unique (%): 71.2%

Sample

1st row: 558-2 Namjeong-dong Suncheon-si Jeollanam-do
2nd row: 全羅南道 スンチョン市 ナムジョン洞558-2
3rd row: 全罗南道 顺天市 南井洞558-2
4th row: 1794-3 Deongnye-ri Gwangyang-eup Gwangyang-si Jeollanam-do
5th row: 全羅南道 グァンヤン市 クァンヤン邑 トクレェ 里1794-3
ValueCountFrequency (%)
全罗南道 3110
 
9.0%
jeollanam-do 3110
 
9.0%
全羅南道 3110
 
9.0%
顺天市 472
 
1.4%
suncheon-si 472
 
1.4%
スンチョン市 472
 
1.4%
ヨス市 426
 
1.2%
麗水市 426
 
1.2%
yeosu-si 426
 
1.2%
ナジュ市 253
 
0.7%
Other values (7544) 22199
64.4%

Most occurring characters

ValueCountFrequency (%)
25274
 
9.6%
- 16665
 
6.3%
n 14684
 
5.6%
o 12984
 
4.9%
a 12217
 
4.7%
e 9581
 
3.6%
g 8020
 
3.1%
7346
 
2.8%
6681
 
2.5%
1 6606
 
2.5%
Other values (463) 142594
54.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 96779
36.8%
Other Letter 83247
31.7%
Decimal Number 29892
 
11.4%
Space Separator 25274
 
9.6%
Dash Punctuation 16665
 
6.3%
Uppercase Letter 10795
 
4.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
7346
 
8.8%
6681
 
8.0%
6314
 
7.6%
6220
 
7.5%
3539
 
4.3%
3386
 
4.1%
3110
 
3.7%
2728
 
3.3%
2702
 
3.2%
2495
 
3.0%
Other values (411) 38726
46.5%
Lowercase Letter
ValueCountFrequency (%)
n 14684
15.2%
o 12984
13.4%
a 12217
12.6%
e 9581
9.9%
g 8020
8.3%
l 6483
6.7%
u 5867
 
6.1%
m 5517
 
5.7%
d 4586
 
4.7%
i 3614
 
3.7%
Other values (11) 13226
13.7%
Uppercase Letter
ValueCountFrequency (%)
J 3845
35.6%
S 1301
 
12.1%
G 1188
 
11.0%
Y 1135
 
10.5%
H 837
 
7.8%
D 529
 
4.9%
B 526
 
4.9%
N 466
 
4.3%
M 391
 
3.6%
W 146
 
1.4%
Other values (9) 431
 
4.0%
Decimal Number
ValueCountFrequency (%)
1 6606
22.1%
2 3819
12.8%
3 2913
9.7%
4 2526
 
8.5%
6 2481
 
8.3%
5 2481
 
8.3%
7 2442
 
8.2%
8 2346
 
7.8%
0 2226
 
7.4%
9 2052
 
6.9%
Space Separator
ValueCountFrequency (%)
25274
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 16665
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 107574
41.0%
Common 71831
27.3%
Han 54735
20.8%
Katakana 28512
 
10.9%

Most frequent character per script

Han
ValueCountFrequency (%)
6681
12.2%
6314
11.5%
6220
11.4%
3539
 
6.5%
3386
 
6.2%
3110
 
5.7%
2728
 
5.0%
2702
 
4.9%
2495
 
4.6%
2101
 
3.8%
Other values (336) 15459
28.2%
Katakana
ValueCountFrequency (%)
7346
25.8%
1697
 
6.0%
1391
 
4.9%
1254
 
4.4%
1183
 
4.1%
1161
 
4.1%
1135
 
4.0%
863
 
3.0%
860
 
3.0%
827
 
2.9%
Other values (65) 10795
37.9%
Latin
ValueCountFrequency (%)
n 14684
13.7%
o 12984
12.1%
a 12217
11.4%
e 9581
8.9%
g 8020
 
7.5%
l 6483
 
6.0%
u 5867
 
5.5%
m 5517
 
5.1%
d 4586
 
4.3%
J 3845
 
3.6%
Other values (30) 23790
22.1%
Common
ValueCountFrequency (%)
25274
35.2%
- 16665
23.2%
1 6606
 
9.2%
2 3819
 
5.3%
3 2913
 
4.1%
4 2526
 
3.5%
6 2481
 
3.5%
5 2481
 
3.5%
7 2442
 
3.4%
8 2346
 
3.3%
Other values (2) 4278
 
6.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 179405
68.3%
CJK 53658
 
20.4%
Katakana 28512
 
10.9%
CJK Compat Ideographs 1077
 
0.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
25274
14.1%
- 16665
 
9.3%
n 14684
 
8.2%
o 12984
 
7.2%
a 12217
 
6.8%
e 9581
 
5.3%
g 8020
 
4.5%
1 6606
 
3.7%
l 6483
 
3.6%
u 5867
 
3.3%
Other values (42) 61024
34.0%
Katakana
ValueCountFrequency (%)
7346
25.8%
1697
 
6.0%
1391
 
4.9%
1254
 
4.4%
1183
 
4.1%
1161
 
4.1%
1135
 
4.0%
863
 
3.0%
860
 
3.0%
827
 
2.9%
Other values (65) 10795
37.9%
CJK
ValueCountFrequency (%)
6681
12.5%
6314
11.8%
6220
11.6%
3539
 
6.6%
3386
 
6.3%
3110
 
5.8%
2728
 
5.1%
2702
 
5.0%
2495
 
4.6%
2101
 
3.9%
Other values (320) 14382
26.8%
CJK Compat Ideographs
ValueCountFrequency (%)
475
44.1%
459
42.6%
93
 
8.6%
10
 
0.9%
8
 
0.7%
7
 
0.6%
6
 
0.6%
4
 
0.4%
4
 
0.4%
3
 
0.3%
Other values (6) 8
 
0.7%

등록일시
DateTime

Distinct: 546
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 73.0 KiB
Minimum: 2021-01-21 13:23:55
Maximum: 2021-01-21 13:33:45
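
The minimum and maximum show that all records were registered within roughly ten minutes on a single day. A small sketch of that span calculation, assuming the timestamps have one-second resolution as displayed:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
first = datetime.strptime("2021-01-21 13:23:55", FMT)
last = datetime.strptime("2021-01-21 13:33:45", FMT)

span = last - first
# Number of one-second slots from the minimum to the maximum, both inclusive.
slots = int(span.total_seconds()) + 1

print("Span:", span)
print("One-second slots covered:", slots)
```

With 546 distinct timestamps, most of the one-second slots in this window contain at least one record.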
Histogram with fixed size bins (bins=50)

Interactions

[Interaction plots omitted]

Correlations

Variable | 다국어식당정보ID | 식당ID | 언어타입
다국어식당정보ID | 1.000 | 0.867 | 0.000
식당ID | 0.867 | 1.000 | 0.000
언어타입 | 0.000 | 0.000 | 1.000

Variable | 다국어식당정보ID | 식당ID | 언어타입
다국어식당정보ID | 1.000 | 1.000 | 0.000
식당ID | 1.000 | 1.000 | 0.000
언어타입 | 0.000 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
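
Nullity correlation can be sketched as an ordinary Pearson correlation computed on present/missing indicators instead of on the values themselves. The example below uses made-up toy columns, not data from this dataset, and `nullity_correlation` is a hypothetical helper rather than the function used to draw the heatmap.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan  # a fully present (or fully missing) column has no variation
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy columns of equal length; None marks a missing cell.
category = ["Bakery", None, None, "Korean cuisine"]
road_address = ["17 Jangpyeong-ro", None, "10 Galti-ro", "6 Naedong-gil"]
name = ["A", "B", "C", "D"]

print(nullity_correlation(category, road_address))
print(nullity_correlation(category, name))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one is present where the other is missing, and a column with no missing values yields no defined correlation.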

Sample

Index | 다국어식당정보ID | 식당ID | 언어타입 | 업종 | 식당명 | 도로명주소 | 지번주소 | 등록일시
0 | 1 | 2858 | en | Bakery | Hoseongdang | 17 Jangpyeong-ro Suncheon-si Jeollanam-do | 558-2 Namjeong-dong Suncheon-si Jeollanam-do | 2021-01-21 13:23:55
1 | 2 | 2858 | ja | ベーカリー | Hoseongdang | 全羅南道 スンチョン市 チャンピョンロ17 | 全羅南道 スンチョン市 ナムジョン洞558-2 | 2021-01-21 13:23:55
2 | 3 | 2858 | zh-Hans | 面包店 | Hoseongdang | 全罗南道 顺天市 Jangpyeong路17 | 全罗南道 顺天市 南井洞558-2 | 2021-01-21 13:23:55
3 | 4 | 3820 | en | Japanese (cuisine) | Hwanggeum Dokki | 10 Galti-ro Gwangyang-eup Gwangyang-si Jeollanam-do | 1794-3 Deongnye-ri Gwangyang-eup Gwangyang-si Jeollanam-do | 2021-01-21 13:23:55
4 | 5 | 3820 | ja | 和食 | Hwanggeum Dokki | 全羅南道 クァンヤン市 クァンヤン邑 カルティロ10 | 全羅南道 グァンヤン市 クァンヤン邑 トクレェ 里1794-3 | 2021-01-21 13:23:55
5 | 6 | 3820 | zh-Hans | 日本料理 | Hwanggeum Dokki | 全罗南道 光阳市 光阳邑 Galti路10 | 全罗南道 光阳市 光阳邑 德礼里1794-3 | 2021-01-21 13:23:55
6 | 7 | 4419 | en | Japanese (cuisine) | Geobukseon Hoetjip | 11 Bonghwa 2-gil Suncheon-si Jeollanam-do | 1714-1 Jorye-dong Suncheon-si Jeollanam-do | 2021-01-21 13:23:55
7 | 8 | 4419 | ja | 和食 | Geobukseon Hoetjip | 全羅南道 スンチョン市 ポンファ2ギル11 | 全羅南道 スンチョン市 チョリェ洞1714-1 | 2021-01-21 13:23:55
8 | 9 | 4419 | zh-Hans | 日本料理 | Geobukseon Hoetjip | 全罗南道 顺天市 Bonghwa2街11 | 全罗南道 顺天市 照礼洞1714-1 | 2021-01-21 13:23:55
9 | 10 | 4751 | en | Korean cuisine | Hwawon Imone Sikdang | 34 Daejukseo-ro 15beon-gil Samhyang-eup Muan-gun Jeollanam-do | 2159 Namak-ri Samhyang-eup Muan-gun Jeollanam-do | 2021-01-21 13:23:55

Index | 다국어식당정보ID | 식당ID | 언어타입 | 업종 | 식당명 | 도로명주소 | 지번주소 | 등록일시
9320 | 9321 | 865300 | zh-Hans | 韩餐 | Seoneo Sikdang | 全罗南道 罗州市 Naju路168 | 全罗南道 罗州市 中央洞72-4 | 2021-01-21 13:33:44
9321 | 9322 | 865301 | en | restaurant | Okcheon Gwitturami | 19 Jeojeon 1-gil Suncheon-si Jeollanam-do | 121-6 Jeojeon-dong Suncheon-si Jeollanam-do | 2021-01-21 13:33:45
9322 | 9323 | 865301 | ja | 飲食店 | Okcheon Gwitturami | 全羅南道 スンチョン市 チョジョン1ギル19 | 全羅南道 スンチョン市 チョジョン洞121-6 | 2021-01-21 13:33:45
9323 | 9324 | 865301 | zh-Hans | 餐厅 | Okcheon Gwitturami | 全罗南道 顺天市 Jeojeon1街19 | 全罗南道 顺天市 楮田洞121-6 | 2021-01-21 13:33:45
9324 | 9325 | 865302 | en | Korean cuisine | Geurin Jjigae Bapsang | 6 Naedong-gil Naju-si Jeollanam-do | 1098-7 Songwol-dong Naju-si Jeollanam-do | 2021-01-21 13:33:45
9325 | 9326 | 865302 | ja | 韓食 | Geurin Jjigae Bapsang | 全羅南道 ナジュ市 ネドンギル6 | 全羅南道 ナジュ市 ソンウォル洞1098-7 | 2021-01-21 13:33:45
9326 | 9327 | 865302 | zh-Hans | 韩餐 | Geurin Jjigae Bapsang | 全罗南道 罗州市 Naedong街6 | 全罗南道 罗州市 松月洞1098-7 | 2021-01-21 13:33:45
9327 | 9328 | 865303 | en | Rice Soup | Ompanggol Kongnamul Gukbap | 101 Honam-gil Suncheon-si Jeollanam-do | 142-6 Jeojeon-dong Suncheon-si Jeollanam-do | 2021-01-21 13:33:45
9328 | 9329 | 865303 | ja | 牛肉クッパ | Ompanggol Kongnamul Gukbap | 全羅南道 スンチョン市 ホナムギル101 | 全羅南道 スンチョン市 チョジョン洞142-6 | 2021-01-21 13:33:45
9329 | 9330 | 865303 | zh-Hans | 汤饭 | Ompanggol Kongnamul Gukbap | 全罗南道 顺天市 Honam街101 | 全罗南道 顺天市 楮田洞142-6 | 2021-01-21 13:33:45