Overview

Dataset statistics

Number of variables: 4
Number of observations: 1833
Missing cells: 480
Missing cells (%): 6.5%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 57.4 KiB
Average record size in memory: 32.1 B
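
The derived figures above follow from the raw counts. The following is a minimal sketch of that arithmetic, assuming the percentages are plain ratios rounded to one decimal place and that 1 KiB is 1024 bytes. The variable names are illustrative and are not taken from ydata-profiling.

```python
# Raw counts copied from the dataset statistics above.
n_variables = 4
n_observations = 1833
missing_cells = 480
duplicate_rows = 1
total_size_kib = 57.4

# Assumption: each percentage is a plain ratio, rounded to one decimal.
total_cells = n_variables * n_observations
missing_pct = round(100 * missing_cells / total_cells, 1)
duplicate_pct = round(100 * duplicate_rows / n_observations, 1)

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = round(total_size_kib * 1024 / n_observations, 1)

print(missing_pct, duplicate_pct, avg_record_bytes)
```

Under these assumptions the three values agree with the reported 6.5%, 0.1% and 32.1 B.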

Variable types

Unsupported: 2
Text: 2

Dataset

Description: Daea Arboretum plant specimen holdings (대아수목원식물표본보유현황)
Author: Jeollabuk-do (전라북도)
URL: https://www.bigdatahub.go.kr/opendata/dataSet/detail.nm?contentId=37&rlik=49451aebf056b486&serviceId=202191

Alerts

Dataset has 1 (0.1%) duplicate row [Duplicates]
Unnamed: 1 has 159 (8.7%) missing values [Missing]
Unnamed: 2 has 160 (8.7%) missing values [Missing]
Unnamed: 3 has 160 (8.7%) missing values [Missing]
대아수목원 식물표본 보유 현황 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 3 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-03-14 01:13:32.052173
Analysis finished: 2024-03-14 01:13:32.563933
Duration: 0.51 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

대아수목원 식물표본 보유 현황
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): 0.1%
Memory size: 14.4 KiB

Unnamed: 1
Text

MISSING 

Distinct: 1474
Distinct (%): 88.1%
Missing: 159
Missing (%): 8.7%
Memory size: 14.4 KiB

Length

Max length: 62
Median length: 49
Mean length: 26.01374
Min length: 2

Characters and Unicode

Total characters: 43547
Distinct characters: 84
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
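
The category counts reported for this variable can be approximated with Python's standard `unicodedata` module. The following is a minimal sketch, not the profiler's own implementation: the `CATEGORY_NAMES` mapping and the `category_counts` helper are illustrative names, and the mapping covers only the ten categories that appear in this report. The standard library does not expose script or block properties, so those two breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes seen in this report.
# This is a hand-picked subset, not the full Unicode list.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
    "Sk": "Modifier Symbol",
}


def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


# One of the sample values listed for this variable.
counts = category_counts("Equisetum arvense L.")
print(counts)
```

Hangul syllables such as those in the Korean names fall under "Other Letter", which is why that category dominates the Unnamed: 2 column.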

Unique

Unique: 1432
Unique (%): 85.5%

Sample

1st row: 식 물 유 전 자 원 명
2nd row: 학 명
3rd row: Selaginella involvens (Sw.) Spring
4th row: Selaginella tamariscina (Beauv.) Spring
5th row: Equisetum arvense L.

Value | Count | Frequency (%)
spp | 252 | 4.3%
l | 176 | 3.0%
var | 161 | 2.8%
rosa | 127 | 2.2%
japonica | 114 | 2.0%
et | 99 | 1.7%
nakai | 82 | 1.4%
hibiscus | 82 | 1.4%
thunb | 82 | 1.4%
syriacus | 69 | 1.2%
Other values (2220) | 4561 | 78.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 4568 | 10.5%
a | 4550 | 10.4%
i | 3426 | 7.9%
s | 2583 | 5.9%
e | 2437 | 5.6%
r | 2240 | 5.1%
o | 2036 | 4.7%
n | 1992 | 4.6%
u | 1930 | 4.4%
l | 1760 | 4.0%
Other values (74) | 16025 | 36.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 32819 | 75.4%
Space Separator | 4572 | 10.5%
Uppercase Letter | 3673 | 8.4%
Other Punctuation | 2031 | 4.7%
Close Punctuation | 202 | 0.5%
Open Punctuation | 202 | 0.5%
Other Letter | 18 | < 0.1%
Dash Punctuation | 16 | < 0.1%
Decimal Number | 10 | < 0.1%
Modifier Symbol | 4 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 4550 | 13.9%
i | 3426 | 10.4%
s | 2583 | 7.9%
e | 2437 | 7.4%
r | 2240 | 6.8%
o | 2036 | 6.2%
n | 1992 | 6.1%
u | 1930 | 5.9%
l | 1760 | 5.4%
t | 1369 | 4.2%
Other values (16) | 8496 | 25.9%

Uppercase Letter
Value | Count | Frequency (%)
C | 341 | 9.3%
L | 338 | 9.2%
S | 318 | 8.7%
P | 299 | 8.1%
M | 281 | 7.7%
H | 265 | 7.2%
R | 256 | 7.0%
A | 216 | 5.9%
T | 215 | 5.9%
B | 168 | 4.6%
Other values (16) | 976 | 26.6%

Other Letter
Value | Count | Frequency (%)
(Hangul character, lost in extraction) | 2 | 11.1%
(nine further Hangul characters, lost in extraction) | 1 each | 5.6% each
Other values (7) | 7 | 38.9%

Decimal Number
Value | Count | Frequency (%)
1 | 3 | 30.0%
2 | 2 | 20.0%
8 | 2 | 20.0%
9 | 1 | 10.0%
4 | 1 | 10.0%
0 | 1 | 10.0%

Other Punctuation
Value | Count | Frequency (%)
. | 1557 | 76.7%
' | 473 | 23.3%
& | 1 | < 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 4568 | 99.9%
(non-ASCII space) | 4 | 0.1%

Close Punctuation
Value | Count | Frequency (%)
) | 202 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 202 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 16 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 36492 | 83.8%
Common | 7037 | 16.2%
Hangul | 18 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 4550 | 12.5%
i | 3426 | 9.4%
s | 2583 | 7.1%
e | 2437 | 6.7%
r | 2240 | 6.1%
o | 2036 | 5.6%
n | 1992 | 5.5%
u | 1930 | 5.3%
l | 1760 | 4.8%
t | 1369 | 3.8%
Other values (42) | 12169 | 33.3%

Hangul
Value | Count | Frequency (%)
(Hangul character, lost in extraction) | 2 | 11.1%
(nine further Hangul characters, lost in extraction) | 1 each | 5.6% each
Other values (7) | 7 | 38.9%

Common
Value | Count | Frequency (%)
(space) | 4568 | 64.9%
. | 1557 | 22.1%
' | 473 | 6.7%
) | 202 | 2.9%
( | 202 | 2.9%
- | 16 | 0.2%
(non-ASCII space) | 4 | 0.1%
` | 4 | 0.1%
1 | 3 | < 0.1%
2 | 2 | < 0.1%
Other values (5) | 6 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 43525 | 99.9%
Hangul | 18 | < 0.1%
None | 4 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 4568 | 10.5%
a | 4550 | 10.5%
i | 3426 | 7.9%
s | 2583 | 5.9%
e | 2437 | 5.6%
r | 2240 | 5.1%
o | 2036 | 4.7%
n | 1992 | 4.6%
u | 1930 | 4.4%
l | 1760 | 4.0%
Other values (56) | 16003 | 36.8%

None
Value | Count | Frequency (%)
(non-ASCII space) | 4 | 100.0%

Hangul
Value | Count | Frequency (%)
(Hangul character, lost in extraction) | 2 | 11.1%
(nine further Hangul characters, lost in extraction) | 1 each | 5.6% each
Other values (7) | 7 | 38.9%

Unnamed: 2
Text

MISSING 

Distinct: 1600
Distinct (%): 95.6%
Missing: 160
Missing (%): 8.7%
Memory size: 14.4 KiB

Length

Max length: 15
Median length: 12
Mean length: 5.2492528
Min length: 1

Characters and Unicode

Total characters: 8782
Distinct characters: 621
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1596
Unique (%): 95.4%

Sample

1st row: 국 명
2nd row: 바위손
3rd row: 부처손
4th row: 쇠뜨기
5th row: 속새

Value | Count | Frequency (%)
동백나무(재배종 | 55 | 3.3%
무궁화(재배종 | 11 | 0.7%
목련(재배종 | 9 | 0.5%
아디안툼 | 2 | 0.1%
드로세라류 | 2 | 0.1%
82 | 2 | 0.1%
실새삼 | 1 | 0.1%
풀협죽도 | 1 | 0.1%
지면패랭이(꽃잔디 | 1 | 0.1%
참꽃마리 | 1 | 0.1%
Other values (1598) | 1598 | 94.9%

Most occurring characters

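Value | Count | Frequency (%)
(Hangul character, lost in extraction) | 492 | 5.6%
(Hangul character, lost in extraction) | 446 | 5.1%
- | 224 | 2.6%
(Hangul character, lost in extraction) | 223 | 2.5%
( | 221 | 2.5%
) | 221 | 2.5%
(Hangul character, lost in extraction) | 170 | 1.9%
(Hangul character, lost in extraction) | 145 | 1.7%
(Hangul character, lost in extraction) | 138 | 1.6%
(Hangul character, lost in extraction) | 136 | 1.5%
Other values (611) | 6366 | 72.5%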

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8058 | 91.8%
Dash Punctuation | 224 | 2.6%
Open Punctuation | 221 | 2.5%
Close Punctuation | 221 | 2.5%
Space Separator | 23 | 0.3%
Decimal Number | 22 | 0.3%
Lowercase Letter | 7 | 0.1%
Other Punctuation | 5 | 0.1%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

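Other Letter
Value | Count | Frequency (%)
(Hangul character, lost in extraction) | 492 | 6.1%
(Hangul character, lost in extraction) | 446 | 5.5%
(Hangul character, lost in extraction) | 223 | 2.8%
(Hangul character, lost in extraction) | 170 | 2.1%
(Hangul character, lost in extraction) | 145 | 1.8%
(Hangul character, lost in extraction) | 138 | 1.7%
(Hangul character, lost in extraction) | 136 | 1.7%
(Hangul character, lost in extraction) | 104 | 1.3%
(Hangul character, lost in extraction) | 104 | 1.3%
(Hangul character, lost in extraction) | 103 | 1.3%
Other values (587) | 5997 | 74.4%

Decimal Number
Value | Count | Frequency (%)
2 | 6 | 27.3%
1 | 4 | 18.2%
8 | 3 | 13.6%
7 | 2 | 9.1%
4 | 2 | 9.1%
9 | 2 | 9.1%
3 | 1 | 4.5%
5 | 1 | 4.5%
6 | 1 | 4.5%

Lowercase Letter
Value | Count | Frequency (%)
r | 1 | 14.3%
a | 1 | 14.3%
e | 1 | 14.3%
c | 1 | 14.3%
i | 1 | 14.3%
n | 1 | 14.3%
o | 1 | 14.3%

Other Punctuation
Value | Count | Frequency (%)
, | 3 | 60.0%
' | 1 | 20.0%
. | 1 | 20.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 224 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 221 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 221 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 23 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
L | 1 | 100.0%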

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8058 | 91.8%
Common | 716 | 8.2%
Latin | 8 | 0.1%

Most frequent character per script

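Hangul
Value | Count | Frequency (%)
(Hangul character, lost in extraction) | 492 | 6.1%
(Hangul character, lost in extraction) | 446 | 5.5%
(Hangul character, lost in extraction) | 223 | 2.8%
(Hangul character, lost in extraction) | 170 | 2.1%
(Hangul character, lost in extraction) | 145 | 1.8%
(Hangul character, lost in extraction) | 138 | 1.7%
(Hangul character, lost in extraction) | 136 | 1.7%
(Hangul character, lost in extraction) | 104 | 1.3%
(Hangul character, lost in extraction) | 104 | 1.3%
(Hangul character, lost in extraction) | 103 | 1.3%
Other values (587) | 5997 | 74.4%

Common
Value | Count | Frequency (%)
- | 224 | 31.3%
( | 221 | 30.9%
) | 221 | 30.9%
(space) | 23 | 3.2%
2 | 6 | 0.8%
1 | 4 | 0.6%
8 | 3 | 0.4%
, | 3 | 0.4%
7 | 2 | 0.3%
4 | 2 | 0.3%
Other values (6) | 7 | 1.0%

Latin
Value | Count | Frequency (%)
r | 1 | 12.5%
a | 1 | 12.5%
e | 1 | 12.5%
c | 1 | 12.5%
i | 1 | 12.5%
n | 1 | 12.5%
L | 1 | 12.5%
o | 1 | 12.5%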
12.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8058 | 91.8%
ASCII | 724 | 8.2%

Most frequent character per block

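Hangul
Value | Count | Frequency (%)
(Hangul character, lost in extraction) | 492 | 6.1%
(Hangul character, lost in extraction) | 446 | 5.5%
(Hangul character, lost in extraction) | 223 | 2.8%
(Hangul character, lost in extraction) | 170 | 2.1%
(Hangul character, lost in extraction) | 145 | 1.8%
(Hangul character, lost in extraction) | 138 | 1.7%
(Hangul character, lost in extraction) | 136 | 1.7%
(Hangul character, lost in extraction) | 104 | 1.3%
(Hangul character, lost in extraction) | 104 | 1.3%
(Hangul character, lost in extraction) | 103 | 1.3%
Other values (587) | 5997 | 74.4%

ASCII
Value | Count | Frequency (%)
- | 224 | 30.9%
( | 221 | 30.5%
) | 221 | 30.5%
(space) | 23 | 3.2%
2 | 6 | 0.8%
1 | 4 | 0.6%
8 | 3 | 0.4%
, | 3 | 0.4%
7 | 2 | 0.3%
4 | 2 | 0.3%
Other values (14) | 15 | 2.1%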

Unnamed: 3
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 160
Missing (%): 8.7%
Memory size: 14.4 KiB

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
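
Nullity correlation can be illustrated with a few lines of plain Python. The following is a minimal sketch, assuming the heatmap is based on the Pearson correlation between per-column presence indicators, as in the missingno library. The `pearson` helper is an illustrative name, and the indicator lists cover only the first ten sample rows of this report, not the full dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Presence indicators (1 = value present, 0 = missing) for the first ten
# sample rows of this report.
unnamed_1 = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
unnamed_2 = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]

r = pearson(unnamed_1, unnamed_2)
print(round(r, 2))
```

On these ten rows the two columns are almost always missing together, so the correlation is strongly positive. A value computed on the full 1833 rows may differ.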

Sample

First rows

Index | 대아수목원 식물표본 보유 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
0 | 번호 | 식 물 유 전 자 원 명 | <NA> | 표본수
1 | NaN | 학 명 | 국 명 | NaN
2 | Selaginellaceae 부처손科 | <NA> | <NA> | NaN
3 | 1 | Selaginella involvens (Sw.) Spring | 바위손 | 5
4 | 2 | Selaginella tamariscina (Beauv.) Spring | 부처손 | 5
5 | Equisetaceae 속새科 | <NA> | <NA> | NaN
6 | 3 | Equisetum arvense L. | 쇠뜨기 | 5
7 | 4 | Equisetum hyemale L. | 속새 | 9
8 | Ophioglossaceae 고사리삼科 | <NA> | <NA> | NaN
9 | 5 | Botrychium ternatum (Thunb.) Sw. | 고사리삼 | 4

Last rows

Index | 대아수목원 식물표본 보유 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
1823 | 1666 | Callistemon lanceolatus (Sm.) DC. | 병솔꽃나무 | 3
1824 | 1667 | Psidium cattleianum Sabine | 스트로베리구아바 | 4
1825 | Meliaceae 산석류科 | <NA> | <NA> | NaN
1826 | 1668 | Tibouchina semidecandra Cogn. | 티보치나 | 5
1827 | 닛사科 | <NA> | <NA> | NaN
1828 | 1669 | Davidia involucrata | 손수건나무 | 5
1829 | 학명 미확인종류 | <NA> | <NA> | NaN
1830 | 1670 | 망고 | 망고 | 3
1831 | 1671 | 트럼펫 | 트럼펫 | 2
1832 | 1672 | 호주매화 | 호주매화 | 3
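
The sample rows suggest why two columns are flagged as unsupported: the sheet title was read as the column header, the real two-line header (번호 "number", 학 명 "scientific name", 국 명 "Korean name", 표본수 "specimen count") landed in the data, and family heading rows are interleaved with specimen rows. The following is a minimal cleaning sketch under those assumptions. The `tidy_rows` helper and the English field names are hypothetical, and `None` stands in for `<NA>`/`NaN`.

```python
def tidy_rows(raw_rows, header_rows=2):
    """Turn raw 4-column rows into flat specimen records.

    Assumption: a row whose last three cells are all missing is a family
    heading, and it applies to the specimen rows that follow it.
    """
    records = []
    family = None
    for number, scientific, korean, count in raw_rows[header_rows:]:
        if scientific is None and korean is None and count is None:
            family = number  # heading row: first cell holds the family name
            continue
        records.append({
            "number": int(number),
            "family": family,
            "scientific_name": scientific,
            "korean_name": korean,
            "specimens": int(count),
        })
    return records


# First rows of the sample above, with None standing in for <NA>/NaN.
raw = [
    ["번호", "식 물 유 전 자 원 명", None, "표본수"],
    [None, "학 명", "국 명", None],
    ["Selaginellaceae 부처손科", None, None, None],
    [1, "Selaginella involvens (Sw.) Spring", "바위손", 5],
    [2, "Selaginella tamariscina (Beauv.) Spring", "부처손", 5],
    ["Equisetaceae 속새科", None, None, None],
    [3, "Equisetum arvense L.", "쇠뜨기", 5],
]

records = tidy_rows(raw)
print(len(records), records[0]["family"])
```

Carrying the family name forward keeps the information held in the heading rows while leaving one record per specimen, so the number and specimen-count columns become purely numeric.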

Duplicate rows

Most frequently occurring

Index | Unnamed: 1 | Unnamed: 2 | # duplicates
0 | <NA> | <NA> | 159
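
The table lists only the two Text columns, which suggests (as an assumption, not something the report states) that the duplicate check ignores the unsupported columns. Under that reading, every family heading row collapses to the same pair of missing values. The following minimal sketch counts repeated pairs for the first ten sample rows only, with `None` standing in for `<NA>`.

```python
from collections import Counter

# (Unnamed: 1, Unnamed: 2) pairs for the first ten sample rows.
pairs = [
    ("식 물 유 전 자 원 명", None),
    ("학 명", "국 명"),
    (None, None),
    ("Selaginella involvens (Sw.) Spring", "바위손"),
    ("Selaginella tamariscina (Beauv.) Spring", "부처손"),
    (None, None),
    ("Equisetum arvense L.", "쇠뜨기"),
    ("Equisetum hyemale L.", "속새"),
    (None, None),
    ("Botrychium ternatum (Thunb.) Sw.", "고사리삼"),
]

# Keep only the pairs that occur more than once.
counts = Counter(pairs)
duplicated = {pair: n for pair, n in counts.items() if n > 1}
print(duplicated)
```

In this small sample the only repeated pair is the all-missing one, which matches the single entry in the table above.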