Overview

Dataset statistics

Number of variables: 4
Number of observations: 1833
Missing cells: 320
Missing cells (%): 4.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 57.4 KiB
Average record size in memory: 32.1 B
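
The percentage and per-record figures above follow from the raw counts. As a minimal sketch, they can be recomputed as below. The variable names are illustrative, and the per-column missing counts are copied from the variable sections of this report; this is not the profiler's own code.

```python
# Recompute the derived dataset statistics from the raw counts in this report.
n_variables = 4
n_observations = 1833

# Missing values per column as listed in the variable sections
# (first column, Unnamed: 1, Unnamed: 2, Unnamed: 3).
missing_per_column = [1, 159, 160, 0]

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column)
missing_pct = 100 * missing_cells / total_cells

# The total size is reported in KiB (1 KiB = 1024 bytes).
total_size_bytes = 57.4 * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Missing cells are counted over all cells (variables times observations), not over rows, which is why 320 missing cells amount to only 4.4%.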

Variable types

Text: 3
Categorical: 1

Dataset

Description: Daea Arboretum plant specimen holdings (대아수목원식물표본보유현황)
Author: Jeollabuk-do (전라북도)
URL: https://www.bigdatahub.go.kr/opendata/dataSet/detail.nm?contentId=37&rlik=49451aebf056b486&serviceId=202191

Alerts

Unnamed: 1 has 159 (8.7%) missing values (Missing)
Unnamed: 2 has 160 (8.7%) missing values (Missing)

Reproduction

Analysis started: 2024-03-14 01:13:28.355294
Analysis finished: 2024-03-14 01:13:29.012190
Duration: 0.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

대아수목원 식물표본 보유 현황
Text

Distinct: 1832
Distinct (%): 100.0%
Missing: 1
Missing (%): 0.1%
Memory size: 14.4 KiB

Length

Max length: 23
Median length: 21
Mean length: 4.4656114
Min length: 1

Characters and Unicode

Total characters: 8181
Distinct characters: 263
Distinct categories: 5
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
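
As a minimal sketch of how such character properties can be read, the snippet below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is illustrative and this is not the profiler's own implementation. The two-letter codes correspond to the category names used in this report: `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Lo` is Other Letter, `Nd` is Decimal Number and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)


# Second sample row of this variable: a Latin family name, a space,
# three Hangul syllables and one Han ideograph.
sample = "Selaginellaceae 부처손科"
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

Hangul syllables and Han ideographs both fall under Other Letter, so the category view alone does not separate them; the script and block views do.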

Unique

Unique: 1832
Unique (%): 100.0%

Sample

1st row: 번호
2nd row: Selaginellaceae 부처손科
3rd row: 1
4th row: 2
5th row: Equisetaceae 속새科

Value | Count | Frequency (%)
meliaceae | 2 | 0.1%
번호 | 1 | 0.1%
1106 | 1 | 0.1%
loganiaceae | 1 | 0.1%
1117 | 1 | 0.1%
1116 | 1 | 0.1%
1115 | 1 | 0.1%
1114 | 1 | 0.1%
1113 | 1 | 0.1%
1112 | 1 | 0.1%
Other values (1976) | 1976 | 99.4%

Most occurring characters

Value | Count | Frequency (%)
1 | 1211 | 14.8%
2 | 538 | 6.6%
3 | 537 | 6.6%
4 | 537 | 6.6%
5 | 537 | 6.6%
6 | 510 | 6.2%
7 | 430 | 5.3%
9 | 427 | 5.2%
0 | 427 | 5.2%
8 | 427 | 5.2%
Other values (253) | 2600 | 31.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 5581 | 68.2%
Lowercase Letter | 1633 | 20.0%
Other Letter | 653 | 8.0%
Space Separator | 157 | 1.9%
Uppercase Letter | 157 | 1.9%

Most frequent character per category

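Other Letter
Value | Count | Frequency (%)
(not preserved) | 158 | 24.2%
(not preserved) | 42 | 6.4%
(not preserved) | 38 | 5.8%
(not preserved) | 15 | 2.3%
(not preserved) | 11 | 1.7%
(not preserved) | 10 | 1.5%
(not preserved) | 10 | 1.5%
(not preserved) | 9 | 1.4%
(not preserved) | 9 | 1.4%
(not preserved) | 8 | 1.2%
Other values (197) | 343 | 52.5%

Lowercase Letter
Value | Count | Frequency (%)
a | 406 | 24.9%
e | 368 | 22.5%
c | 186 | 11.4%
i | 105 | 6.4%
r | 80 | 4.9%
l | 69 | 4.2%
n | 64 | 3.9%
o | 61 | 3.7%
t | 40 | 2.4%
u | 33 | 2.0%
Other values (14) | 221 | 13.5%

Uppercase Letter
Value | Count | Frequency (%)
P | 22 | 14.0%
A | 19 | 12.1%
C | 18 | 11.5%
S | 16 | 10.2%
M | 11 | 7.0%
L | 9 | 5.7%
B | 8 | 5.1%
T | 7 | 4.5%
O | 6 | 3.8%
H | 5 | 3.2%
Other values (11) | 36 | 22.9%

Decimal Number
Value | Count | Frequency (%)
1 | 1211 | 21.7%
2 | 538 | 9.6%
3 | 537 | 9.6%
4 | 537 | 9.6%
5 | 537 | 9.6%
6 | 510 | 9.1%
7 | 430 | 7.7%
9 | 427 | 7.7%
0 | 427 | 7.7%
8 | 427 | 7.7%

Space Separator
Value | Count | Frequency (%)
(space) | 157 | 100.0%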

Most occurring scripts

Value | Count | Frequency (%)
Common | 5738 | 70.1%
Latin | 1790 | 21.9%
Hangul | 495 | 6.1%
Han | 158 | 1.9%

Most frequent character per script

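Hangul
Value | Count | Frequency (%)
(not preserved) | 42 | 8.5%
(not preserved) | 38 | 7.7%
(not preserved) | 15 | 3.0%
(not preserved) | 11 | 2.2%
(not preserved) | 10 | 2.0%
(not preserved) | 10 | 2.0%
(not preserved) | 9 | 1.8%
(not preserved) | 9 | 1.8%
(not preserved) | 8 | 1.6%
(not preserved) | 8 | 1.6%
Other values (196) | 335 | 67.7%

Latin
Value | Count | Frequency (%)
a | 406 | 22.7%
e | 368 | 20.6%
c | 186 | 10.4%
i | 105 | 5.9%
r | 80 | 4.5%
l | 69 | 3.9%
n | 64 | 3.6%
o | 61 | 3.4%
t | 40 | 2.2%
u | 33 | 1.8%
Other values (35) | 378 | 21.1%

Common
Value | Count | Frequency (%)
1 | 1211 | 21.1%
2 | 538 | 9.4%
3 | 537 | 9.4%
4 | 537 | 9.4%
5 | 537 | 9.4%
6 | 510 | 8.9%
7 | 430 | 7.5%
9 | 427 | 7.4%
0 | 427 | 7.4%
8 | 427 | 7.4%

Han
Value | Count | Frequency (%)
(not preserved) | 158 | 100.0%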

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7528 | 92.0%
Hangul | 495 | 6.1%
CJK | 158 | 1.9%

Most frequent character per block

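ASCII
Value | Count | Frequency (%)
1 | 1211 | 16.1%
2 | 538 | 7.1%
3 | 537 | 7.1%
4 | 537 | 7.1%
5 | 537 | 7.1%
6 | 510 | 6.8%
7 | 430 | 5.7%
9 | 427 | 5.7%
0 | 427 | 5.7%
8 | 427 | 5.7%
Other values (46) | 1947 | 25.9%

CJK
Value | Count | Frequency (%)
(not preserved) | 158 | 100.0%

Hangul
Value | Count | Frequency (%)
(not preserved) | 42 | 8.5%
(not preserved) | 38 | 7.7%
(not preserved) | 15 | 3.0%
(not preserved) | 11 | 2.2%
(not preserved) | 10 | 2.0%
(not preserved) | 10 | 2.0%
(not preserved) | 9 | 1.8%
(not preserved) | 9 | 1.8%
(not preserved) | 8 | 1.6%
(not preserved) | 8 | 1.6%
Other values (196) | 335 | 67.7%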

Unnamed: 1
Text

MISSING 

Distinct: 1474
Distinct (%): 88.1%
Missing: 159
Missing (%): 8.7%
Memory size: 14.4 KiB

Length

Max length: 62
Median length: 49
Mean length: 26.01374
Min length: 2

Characters and Unicode

Total characters: 43547
Distinct characters: 84
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1432
Unique (%): 85.5%

Sample

1st row: 식 물 유 전 자 원 명
2nd row: 학 명
3rd row: Selaginella involvens (Sw.) Spring
4th row: Selaginella tamariscina (Beauv.) Spring
5th row: Equisetum arvense L.

Value | Count | Frequency (%)
spp | 252 | 4.3%
l | 176 | 3.0%
var | 161 | 2.8%
rosa | 127 | 2.2%
japonica | 114 | 2.0%
et | 99 | 1.7%
nakai | 82 | 1.4%
hibiscus | 82 | 1.4%
thunb | 82 | 1.4%
syriacus | 69 | 1.2%
Other values (2220) | 4561 | 78.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 4568 | 10.5%
a | 4550 | 10.4%
i | 3426 | 7.9%
s | 2583 | 5.9%
e | 2437 | 5.6%
r | 2240 | 5.1%
o | 2036 | 4.7%
n | 1992 | 4.6%
u | 1930 | 4.4%
l | 1760 | 4.0%
Other values (74) | 16025 | 36.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 32819 | 75.4%
Space Separator | 4568 | 10.5%
Uppercase Letter | 3673 | 8.4%
Other Punctuation | 2035 | 4.7%
Close Punctuation | 202 | 0.5%
Open Punctuation | 202 | 0.5%
Other Letter | 18 | < 0.1%
Dash Punctuation | 16 | < 0.1%
Decimal Number | 10 | < 0.1%
Modifier Symbol | 4 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 4550 | 13.9%
i | 3426 | 10.4%
s | 2583 | 7.9%
e | 2437 | 7.4%
r | 2240 | 6.8%
o | 2036 | 6.2%
n | 1992 | 6.1%
u | 1930 | 5.9%
l | 1760 | 5.4%
t | 1369 | 4.2%
Other values (16) | 8496 | 25.9%

Uppercase Letter
Value | Count | Frequency (%)
C | 341 | 9.3%
L | 338 | 9.2%
S | 318 | 8.7%
P | 299 | 8.1%
M | 281 | 7.7%
H | 265 | 7.2%
R | 256 | 7.0%
A | 216 | 5.9%
T | 215 | 5.9%
B | 168 | 4.6%
Other values (16) | 976 | 26.6%

Other Letter
Value | Count | Frequency (%)
(not preserved) | 2 | 11.1%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
Other values (7) | 7 | 38.9%

Decimal Number
Value | Count | Frequency (%)
1 | 3 | 30.0%
2 | 2 | 20.0%
8 | 2 | 20.0%
9 | 1 | 10.0%
4 | 1 | 10.0%
0 | 1 | 10.0%

Other Punctuation
Value | Count | Frequency (%)
. | 1557 | 76.5%
' | 473 | 23.2%
? | 4 | 0.2%
& | 1 | < 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 4568 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 202 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 202 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 16 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 36492 | 83.8%
Common | 7037 | 16.2%
Hangul | 18 | < 0.1%

Most frequent character per script

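Latin
Value | Count | Frequency (%)
a | 4550 | 12.5%
i | 3426 | 9.4%
s | 2583 | 7.1%
e | 2437 | 6.7%
r | 2240 | 6.1%
o | 2036 | 5.6%
n | 1992 | 5.5%
u | 1930 | 5.3%
l | 1760 | 4.8%
t | 1369 | 3.8%
Other values (42) | 12169 | 33.3%

Hangul
Value | Count | Frequency (%)
(not preserved) | 2 | 11.1%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
Other values (7) | 7 | 38.9%

Common
Value | Count | Frequency (%)
(space) | 4568 | 64.9%
. | 1557 | 22.1%
' | 473 | 6.7%
) | 202 | 2.9%
( | 202 | 2.9%
- | 16 | 0.2%
? | 4 | 0.1%
` | 4 | 0.1%
1 | 3 | < 0.1%
2 | 2 | < 0.1%
Other values (5) | 6 | 0.1%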

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 43529 | > 99.9%
Hangul | 18 | < 0.1%

Most frequent character per block

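ASCII
Value | Count | Frequency (%)
(space) | 4568 | 10.5%
a | 4550 | 10.5%
i | 3426 | 7.9%
s | 2583 | 5.9%
e | 2437 | 5.6%
r | 2240 | 5.1%
o | 2036 | 4.7%
n | 1992 | 4.6%
u | 1930 | 4.4%
l | 1760 | 4.0%
Other values (57) | 16007 | 36.8%

Hangul
Value | Count | Frequency (%)
(not preserved) | 2 | 11.1%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
(not preserved) | 1 | 5.6%
Other values (7) | 7 | 38.9%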

Unnamed: 2
Text

MISSING 

Distinct: 1600
Distinct (%): 95.6%
Missing: 160
Missing (%): 8.7%
Memory size: 14.4 KiB

Length

Max length: 15
Median length: 12
Mean length: 5.2492528
Min length: 1

Characters and Unicode

Total characters: 8782
Distinct characters: 621
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1596
Unique (%): 95.4%

Sample

1st row: 국 명
2nd row: 바위손
3rd row: 부처손
4th row: 쇠뜨기
5th row: 속새

Value | Count | Frequency (%)
동백나무(재배종 | 55 | 3.3%
무궁화(재배종 | 11 | 0.7%
목련(재배종 | 9 | 0.5%
아디안툼 | 2 | 0.1%
드로세라류 | 2 | 0.1%
82 | 2 | 0.1%
실새삼 | 1 | 0.1%
풀협죽도 | 1 | 0.1%
지면패랭이(꽃잔디 | 1 | 0.1%
참꽃마리 | 1 | 0.1%
Other values (1598) | 1598 | 94.9%

Most occurring characters

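Value | Count | Frequency (%)
(not preserved) | 492 | 5.6%
(not preserved) | 446 | 5.1%
- | 224 | 2.6%
(not preserved) | 223 | 2.5%
( | 221 | 2.5%
) | 221 | 2.5%
(not preserved) | 170 | 1.9%
(not preserved) | 145 | 1.7%
(not preserved) | 138 | 1.6%
(not preserved) | 136 | 1.5%
Other values (611) | 6366 | 72.5%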

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8058 | 91.8%
Dash Punctuation | 224 | 2.6%
Open Punctuation | 221 | 2.5%
Close Punctuation | 221 | 2.5%
Space Separator | 23 | 0.3%
Decimal Number | 22 | 0.3%
Lowercase Letter | 7 | 0.1%
Other Punctuation | 5 | 0.1%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 492 | 6.1%
(not preserved) | 446 | 5.5%
(not preserved) | 223 | 2.8%
(not preserved) | 170 | 2.1%
(not preserved) | 145 | 1.8%
(not preserved) | 138 | 1.7%
(not preserved) | 136 | 1.7%
(not preserved) | 104 | 1.3%
(not preserved) | 104 | 1.3%
(not preserved) | 103 | 1.3%
Other values (587) | 5997 | 74.4%

Decimal Number
Value | Count | Frequency (%)
2 | 6 | 27.3%
1 | 4 | 18.2%
8 | 3 | 13.6%
7 | 2 | 9.1%
4 | 2 | 9.1%
9 | 2 | 9.1%
3 | 1 | 4.5%
5 | 1 | 4.5%
6 | 1 | 4.5%

Lowercase Letter
Value | Count | Frequency (%)
r | 1 | 14.3%
a | 1 | 14.3%
e | 1 | 14.3%
c | 1 | 14.3%
i | 1 | 14.3%
n | 1 | 14.3%
o | 1 | 14.3%

Other Punctuation
Value | Count | Frequency (%)
, | 3 | 60.0%
' | 1 | 20.0%
. | 1 | 20.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 224 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 221 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 221 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 23 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
L | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8058 | 91.8%
Common | 716 | 8.2%
Latin | 8 | 0.1%

Most frequent character per script

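Hangul
Value | Count | Frequency (%)
(not preserved) | 492 | 6.1%
(not preserved) | 446 | 5.5%
(not preserved) | 223 | 2.8%
(not preserved) | 170 | 2.1%
(not preserved) | 145 | 1.8%
(not preserved) | 138 | 1.7%
(not preserved) | 136 | 1.7%
(not preserved) | 104 | 1.3%
(not preserved) | 104 | 1.3%
(not preserved) | 103 | 1.3%
Other values (587) | 5997 | 74.4%

Common
Value | Count | Frequency (%)
- | 224 | 31.3%
( | 221 | 30.9%
) | 221 | 30.9%
(space) | 23 | 3.2%
2 | 6 | 0.8%
1 | 4 | 0.6%
8 | 3 | 0.4%
, | 3 | 0.4%
7 | 2 | 0.3%
4 | 2 | 0.3%
Other values (6) | 7 | 1.0%

Latin
Value | Count | Frequency (%)
r | 1 | 12.5%
a | 1 | 12.5%
e | 1 | 12.5%
c | 1 | 12.5%
i | 1 | 12.5%
n | 1 | 12.5%
L | 1 | 12.5%
o | 1 | 12.5%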

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8058 | 91.8%
ASCII | 724 | 8.2%

Most frequent character per block

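Hangul
Value | Count | Frequency (%)
(not preserved) | 492 | 6.1%
(not preserved) | 446 | 5.5%
(not preserved) | 223 | 2.8%
(not preserved) | 170 | 2.1%
(not preserved) | 145 | 1.8%
(not preserved) | 138 | 1.7%
(not preserved) | 136 | 1.7%
(not preserved) | 104 | 1.3%
(not preserved) | 104 | 1.3%
(not preserved) | 103 | 1.3%
Other values (587) | 5997 | 74.4%

ASCII
Value | Count | Frequency (%)
- | 224 | 30.9%
( | 221 | 30.5%
) | 221 | 30.5%
(space) | 23 | 3.2%
2 | 6 | 0.8%
1 | 4 | 0.6%
8 | 3 | 0.4%
, | 3 | 0.4%
7 | 2 | 0.3%
4 | 2 | 0.3%
Other values (14) | 15 | 2.1%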

Unnamed: 3
Categorical

Distinct: 17
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 14.4 KiB

Value | Count
2 | 462
5 | 319
1 | 236
4 | 235
3 | 207
Other values (12) | 374

Length

Max length: 4
Median length: 1
Mean length: 1.2880524
Min length: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 표본수
2nd row: <NA>
3rd row: <NA>
4th row: 5
5th row: 5

Common Values

Value | Count | Frequency (%)
2 | 462 | 25.2%
5 | 319 | 17.4%
1 | 236 | 12.9%
4 | 235 | 12.8%
3 | 207 | 11.3%
<NA> | 160 | 8.7%
6 | 61 | 3.3%
8 | 37 | 2.0%
7 | 35 | 1.9%
9 | 34 | 1.9%
Other values (7) | 47 | 2.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
2 | 462 | 25.2%
5 | 319 | 17.4%
1 | 236 | 12.9%
4 | 235 | 12.8%
3 | 207 | 11.3%
na | 160 | 8.7%
6 | 61 | 3.3%
8 | 37 | 2.0%
7 | 35 | 1.9%
9 | 34 | 1.9%
Other values (7) | 47 | 2.6%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
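
As a minimal sketch of what a nullity correlation measures, the snippet below computes a Pearson correlation between the missing-value indicators of Unnamed: 1 and Unnamed: 2. Two things are assumptions here: that the heatmap is a Pearson correlation of such indicators, and the helper name `pearson`. The indicators cover only the first ten rows listed in the Sample section, so the result will not match the heatmap, which uses all 1833 rows.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# 1 = value is missing (<NA>), 0 = value is present, for sample rows 0 to 9.
unnamed_1_missing = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
unnamed_2_missing = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

r = pearson(unnamed_1_missing, unnamed_2_missing)
print(f"nullity correlation on the ten sample rows: {r:.2f}")
```

A value near 1 means the two columns tend to be missing in the same rows, which is what the family heading rows in this dataset produce.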

Sample

First rows

(index) | 대아수목원 식물표본 보유 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
0 | 번호 | 식 물 유 전 자 원 명 | <NA> | 표본수
1 | <NA> | 학 명 | 국 명 | <NA>
2 | Selaginellaceae 부처손科 | <NA> | <NA> | <NA>
3 | 1 | Selaginella involvens (Sw.) Spring | 바위손 | 5
4 | 2 | Selaginella tamariscina (Beauv.) Spring | 부처손 | 5
5 | Equisetaceae 속새科 | <NA> | <NA> | <NA>
6 | 3 | Equisetum arvense L. | 쇠뜨기 | 5
7 | 4 | Equisetum hyemale L. | 속새 | 9
8 | Ophioglossaceae 고사리삼科 | <NA> | <NA> | <NA>
9 | 5 | Botrychium ternatum (Thunb.) Sw. | 고사리삼 | 4

Last rows

(index) | 대아수목원 식물표본 보유 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
1823 | 1666 | Callistemon lanceolatus (Sm.) DC. | 병솔꽃나무 | 3
1824 | 1667 | Psidium cattleianum Sabine | 스트로베리구아바 | 4
1825 | Meliaceae 산석류科 | <NA> | <NA> | <NA>
1826 | 1668 | Tibouchina semidecandra Cogn. | 티보치나 | 5
1827 | 닛사科 | <NA> | <NA> | <NA>
1828 | 1669 | Davidia involucrata | 손수건나무 | 5
1829 | 학명 미확인종류 | <NA> | <NA> | <NA>
1830 | 1670 | 망고 | 망고 | 3
1831 | 1671 | 트럼펫 | 트럼펫 | 2
1832 | 1672 | 호주매화 | 호주매화 | 3
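
The sample suggests that the first two rows hold the original column headers and that rows with only the first column filled are family headings rather than specimens. As a rough sketch under that assumption, such rows could be separated from the specimen records as below. The function name `tidy` and the English keys (`number`, `scientific_name`, `korean_name`, `specimens`, `family`) are hypothetical, and the rows are copied from the first ten sample rows only.

```python
NA = None  # stands in for the report's <NA>

# (first column, Unnamed: 1, Unnamed: 2, Unnamed: 3) for sample rows 0 to 9.
rows = [
    ("번호", "식 물 유 전 자 원 명", NA, "표본수"),
    (NA, "학 명", "국 명", NA),
    ("Selaginellaceae 부처손科", NA, NA, NA),
    ("1", "Selaginella involvens (Sw.) Spring", "바위손", "5"),
    ("2", "Selaginella tamariscina (Beauv.) Spring", "부처손", "5"),
    ("Equisetaceae 속새科", NA, NA, NA),
    ("3", "Equisetum arvense L.", "쇠뜨기", "5"),
    ("4", "Equisetum hyemale L.", "속새", "9"),
    ("Ophioglossaceae 고사리삼科", NA, NA, NA),
    ("5", "Botrychium ternatum (Thunb.) Sw.", "고사리삼", "4"),
]


def tidy(rows, header_rows=2):
    """Skip the embedded header rows and attach each family heading to its specimens."""
    records = []
    family = None
    for first, scientific, korean, count in rows[header_rows:]:
        # A row with only the first column filled is treated as a family heading.
        if scientific is None and korean is None and count is None:
            family = first
            continue
        records.append({
            "number": int(first),
            "family": family,
            "scientific_name": scientific,
            "korean_name": korean,
            "specimens": int(count),  # assumes the count is numeric, as in the sample
        })
    return records


records = tidy(rows)
for record in records:
    print(record["number"], record["family"], record["scientific_name"], record["specimens"])
```

Re-profiling the tidied records would likely give more meaningful types, for example a numeric specimen count instead of a categorical column, though that has not been verified here.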