Overview

Dataset statistics

Number of variables: 2
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 797
Duplicate rows (%): 8.0%
Total size in memory: 234.4 KiB
Average record size in memory: 24.0 B

Variable types

Text: 2

Dataset

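Description: Information on international express mail (EMS) export items accepted in December 2019. The data contains the accepting office and the destination country for each item.
Author: Korea Post, Ministry of Science and ICT (과학기술정보통신부 우정사업본부)
URL: https://www.data.go.kr/data/15105953/fileData.do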

Alerts

Dataset has 797 (8.0%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2023-12-12 16:32:56.822589
Analysis finished: 2023-12-12 16:32:57.222023
Duration: 0.4 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

발생국 (origin post office)

Distinct: 1219
Distinct (%): 12.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 15
Mean length: 6.8343
Min length: 5
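
As a rough illustration of how length statistics like these can be computed, the sketch below uses Python's standard `statistics` module. The helper name `length_stats` is hypothetical, the five input strings are taken from the report's sample rows, and lengths are counted in Unicode code points (assuming precomposed Hangul syllables). It is not ydata-profiling's own implementation.

```python
from statistics import mean, median

def length_stats(values):
    """Return max, median, mean and min string length (in code points)."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# Five office names from the report's sample rows (illustrative subset only).
sample = ["서울강남우체국", "대구수성우체국", "의정부3동우체국", "서울태평로우체국", "영천임고우체국"]
stats = length_stats(sample)
print(stats)
```

Over the full column, the mean is simply total characters divided by observations: 68343 / 10000 = 6.8343.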

Characters and Unicode

Total characters: 68343
Distinct characters: 336
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
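
A minimal sketch of how per-category character counts could be derived, using Python's standard `unicodedata` module. The helper name `category_counts` and the two sample values are illustrative assumptions, not the report's actual implementation. The two-letter codes are Unicode general-category abbreviations (`Lo` = Other Letter, `Nd` = Decimal Number, `Lu` = Uppercase Letter, `Pd` = Dash Punctuation, `Zs` = Space Separator).

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Tally the Unicode general category of every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two office names from the report's sample rows (illustrative subset only).
sample = ["서울강남우체국", "의정부3동우체국"]
counts = category_counts(sample)
print(counts)
```

The standard library exposes general categories but not script or block properties, so those two breakdowns would need a third-party Unicode data source.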

Unique

Unique: 602
Unique (%): 6.0%

Sample

1st row: 서울강남우체국
2nd row: 대구수성우체국
3rd row: 의정부3동우체국
4th row: 서울태평로우체국
5th row: 영천임고우체국

Value  Count  Frequency (%)
서울강남우체국  2580  25.8%
인천우체국  1114  11.1%
국제우편물류센터  415  4.1%
김포우체국  411  4.1%
서울중앙우체국  208  2.1%
서울양천우체국  200  2.0%
서울마포우체국  182  1.8%
서울강동우체국  176  1.8%
고양일산우체국  146  1.5%
서울강서우체국  133  1.3%
Other values (1210)  4436  44.4%

Most occurring characters

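Value  Count  Frequency (%)
(not shown)  10005  14.6%
(not shown)  9994  14.6%
(not shown)  8930  13.1%
(not shown)  5048  7.4%
(not shown)  4610  6.7%
(not shown)  2920  4.3%
(not shown)  2902  4.2%
(not shown)  1757  2.6%
(not shown)  1475  2.2%
(not shown)  1397  2.0%
Other values (326)  19305  28.2%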

Most occurring categories

Value  Count  Frequency (%)
Other Letter  67975  99.5%
Decimal Number  333  0.5%
Uppercase Letter  24  < 0.1%
Dash Punctuation  10  < 0.1%
Space Separator  1  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not shown)  10005  14.7%
(not shown)  9994  14.7%
(not shown)  8930  13.1%
(not shown)  5048  7.4%
(not shown)  4610  6.8%
(not shown)  2920  4.3%
(not shown)  2902  4.3%
(not shown)  1757  2.6%
(not shown)  1475  2.2%
(not shown)  1397  2.1%
Other values (310)  18937  27.9%

Decimal Number
Value  Count  Frequency (%)
2  121  36.3%
1  80  24.0%
3  66  19.8%
4  28  8.4%
6  15  4.5%
5  15  4.5%
7  4  1.2%
8  3  0.9%
9  1  0.3%

Uppercase Letter
Value  Count  Frequency (%)
G  10  41.7%
A  9  37.5%
D  2  8.3%
I  2  8.3%
S  1  4.2%

Dash Punctuation
Value  Count  Frequency (%)
-  10  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  67975  99.5%
Common  344  0.5%
Latin  24  < 0.1%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not shown)  10005  14.7%
(not shown)  9994  14.7%
(not shown)  8930  13.1%
(not shown)  5048  7.4%
(not shown)  4610  6.8%
(not shown)  2920  4.3%
(not shown)  2902  4.3%
(not shown)  1757  2.6%
(not shown)  1475  2.2%
(not shown)  1397  2.1%
Other values (310)  18937  27.9%

Common
Value  Count  Frequency (%)
2  121  35.2%
1  80  23.3%
3  66  19.2%
4  28  8.1%
6  15  4.4%
5  15  4.4%
-  10  2.9%
7  4  1.2%
8  3  0.9%
9  1  0.3%

Latin
Value  Count  Frequency (%)
G  10  41.7%
A  9  37.5%
D  2  8.3%
I  2  8.3%
S  1  4.2%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  67975  99.5%
ASCII  368  0.5%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(not shown)  10005  14.7%
(not shown)  9994  14.7%
(not shown)  8930  13.1%
(not shown)  5048  7.4%
(not shown)  4610  6.8%
(not shown)  2920  4.3%
(not shown)  2902  4.3%
(not shown)  1757  2.6%
(not shown)  1475  2.2%
(not shown)  1397  2.1%
Other values (310)  18937  27.9%

ASCII
Value  Count  Frequency (%)
2  121  32.9%
1  80  21.7%
3  66  17.9%
4  28  7.6%
6  15  4.1%
5  15  4.1%
-  10  2.7%
G  10  2.7%
A  9  2.4%
7  4  1.1%
Other values (6)  10  2.7%

도착국가 (destination country)

Distinct: 85
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 3
Mean length: 4.4048
Min length: 3

Characters and Unicode

Total characters: 44048
Distinct characters: 128
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 일본
2nd row: 일본
3rd row: 미국
4th row: 일본
5th row: 말레이시아

Value  Count  Frequency (%)
일본  2130  21.3%
미국  1518  15.2%
중국  1253  12.5%
타이(태국)  530  5.3%
홍콩(중국)  479  4.8%
타이완(대만)  385  3.9%
캐나다  340  3.4%
싱가포르  280  2.8%
필리핀  265  2.6%
인도네시아  256  2.6%
Other values (75)  2564  25.6%

Most occurring characters

Value  Count  Frequency (%)
(space)  10000  22.7%
(not shown)  4067  9.2%
(not shown)  2559  5.8%
(not shown)  2130  4.8%
(not shown)  1754  4.0%
(  1749  4.0%
)  1749  4.0%
(not shown)  1576  3.6%
(not shown)  1310  3.0%
(not shown)  1198  2.7%
Other values (118)  15956  36.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  30550  69.4%
Space Separator  10000  22.7%
Open Punctuation  1749  4.0%
Close Punctuation  1749  4.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not shown)  4067  13.3%
(not shown)  2559  8.4%
(not shown)  2130  7.0%
(not shown)  1754  5.7%
(not shown)  1576  5.2%
(not shown)  1310  4.3%
(not shown)  1198  3.9%
(not shown)  923  3.0%
(not shown)  712  2.3%
(not shown)  646  2.1%
Other values (115)  13675  44.8%

Space Separator
Value  Count  Frequency (%)
(space)  10000  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  1749  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  1749  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  30550  69.4%
Common  13498  30.6%

Most frequent character per script

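Hangul
Value  Count  Frequency (%)
(not shown)  4067  13.3%
(not shown)  2559  8.4%
(not shown)  2130  7.0%
(not shown)  1754  5.7%
(not shown)  1576  5.2%
(not shown)  1310  4.3%
(not shown)  1198  3.9%
(not shown)  923  3.0%
(not shown)  712  2.3%
(not shown)  646  2.1%
Other values (115)  13675  44.8%

Common
Value  Count  Frequency (%)
(space)  10000  74.1%
(  1749  13.0%
)  1749  13.0%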

Most occurring blocks

Value  Count  Frequency (%)
Hangul  30550  69.4%
ASCII  13498  30.6%

Most frequent character per block

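ASCII
Value  Count  Frequency (%)
(space)  10000  74.1%
(  1749  13.0%
)  1749  13.0%

Hangul
Value  Count  Frequency (%)
(not shown)  4067  13.3%
(not shown)  2559  8.4%
(not shown)  2130  7.0%
(not shown)  1754  5.7%
(not shown)  1576  5.2%
(not shown)  1310  4.3%
(not shown)  1198  3.9%
(not shown)  923  3.0%
(not shown)  712  2.3%
(not shown)  646  2.1%
Other values (115)  13675  44.8%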

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
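
The nullity plots summarise per-column missing counts. A minimal standard-library sketch of that tally follows. The helper name `missing_summary` and the third record (with a made-up gap) are hypothetical, and treating an empty string as missing is an assumption. The profiled dataset itself reports 0 missing cells.

```python
def missing_summary(rows, columns):
    """Count missing cells (None or empty string) per column and overall."""
    per_column = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                per_column[col] += 1
    total_cells = len(rows) * len(columns)
    total_missing = sum(per_column.values())
    pct = 100.0 * total_missing / total_cells if total_cells else 0.0
    return per_column, total_missing, pct

# Hypothetical records using the report's two column names.
rows = [
    {"발생국": "서울강남우체국", "도착국가": "일본"},
    {"발생국": "김포우체국", "도착국가": "미국"},
    {"발생국": "인천우체국", "도착국가": None},  # made-up gap for illustration
]
per_column, total_missing, pct = missing_summary(rows, ["발생국", "도착국가"])
print(per_column, total_missing, f"{pct:.1f}%")
```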

Sample

Index  발생국  도착국가
83644  서울강남우체국  일본
31998  대구수성우체국  일본
2706  의정부3동우체국  미국
933  서울태평로우체국  일본
871  영천임고우체국  말레이시아
58822  고양화정2동우편취급국  우크라이나
73274  서울강남우체국  미국
89924  김포우체국  미국
88110  서울강남우체국  핀란드
78547  서울강남우체국  싱가포르

Index  발생국  도착국가
7099  부산수안동우체국  일본
6365  팔탄우체국  일본
98610  사천곤명우체국  미국
8611  서울홍파동우편취급국  대한민국
83209  서울강남우체국  일본
46809  서울중앙우체국  일본
80812  서울강남우체국  스위스
68974  서울강남우체국  중국
91653  서울마포우체국  미국
65893  서울강남우체국  싱가포르

Duplicate rows

Most frequently occurring

Index  발생국  도착국가  # duplicates
275  서울강남우체국  미국  417
706  인천우체국  중국  402
299  서울강남우체국  일본  339
103  국제우편물류센터  홍콩(중국)  283
709  인천우체국  타이(태국)  260
125  김포우체국  일본  193
309  서울강남우체국  타이완(대만)  155
316  서울강남우체국  필리핀  148
298  서울강남우체국  인도네시아  133
280  서울강남우체국  브라질  118
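
A ranking like the one above can be sketched with `collections.Counter`: count each (발생국, 도착국가) pair, keep pairs seen more than once, and sort by count. The helper name `duplicate_groups` and the six input rows are hypothetical. This shows one way to group duplicates and is not necessarily how the report computes its figures.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]

# Hypothetical rows as (origin office, destination country) tuples.
rows = [
    ("서울강남우체국", "미국"),
    ("인천우체국", "중국"),
    ("서울강남우체국", "미국"),
    ("김포우체국", "일본"),
    ("인천우체국", "중국"),
    ("서울강남우체국", "미국"),
]
groups = duplicate_groups(rows)
for row, n in groups:
    print(row, n)
```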