Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 1
Missing cells (%): < 0.1%
Duplicate rows: 7
Duplicate rows (%): 0.1%
Total size in memory: 400.4 KiB
Average record size in memory: 41.0 B
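
The percentages and the average record size follow from the raw counts. The sketch below redoes that arithmetic. It assumes that the missing-cell percentage is taken over all cells (rows × columns), that the duplicate percentage is taken over rows, and that 1 KiB is 1024 bytes; the report itself does not state these conventions.

```python
# Values copied from the dataset statistics above.
n_variables = 4
n_observations = 10_000
missing_cells = 1
duplicate_rows = 7
total_size_kib = 400.4

# Assumption: missing-cell percentage is relative to all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumption: duplicate percentage is relative to the number of rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"missing cells: {missing_pct:.4f}%")   # far below the 0.1% display threshold
print(f"duplicate rows: {duplicate_pct:.2f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results agree with the rounded figures shown in the report (< 0.1%, 0.1%, and 41.0 B).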

Variable types

Text: 1
Categorical: 2
Numeric: 1

Dataset

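Description: KOTRA public data providing the HS code, country name, country code, and number of unique importing companies, based on 2020 BL (bill of lading) data.
Author: 대한무역투자진흥공사 (Korea Trade-Investment Promotion Agency, KOTRA)
URL: https://www.data.go.kr/data/15101842/fileData.do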

Alerts

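Dataset has 7 (0.1%) duplicate rows [Duplicates]
국가코드 (country code) is highly overall correlated with 국가명 (country name) [High correlation]
국가명 (country name) is highly overall correlated with 국가코드 (country code) [High correlation]
수입기업 고유갯수 (number of unique importing companies) is highly skewed (γ1 = 40.88051953) [Skewed]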
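
The two correlation alerts are what one would expect if every country name corresponds to exactly one country code. A minimal sketch of how such a one-to-one relationship could be checked follows. The helper `is_one_to_one` is hypothetical and is not part of ydata-profiling; the example pairs are taken from the sample rows shown in this report.

```python
def is_one_to_one(pairs):
    """Return True if each left value maps to exactly one right value and vice versa."""
    forward, backward = {}, {}
    for left, right in pairs:
        # setdefault stores the first partner seen; a different partner later breaks the mapping.
        if forward.setdefault(left, right) != right:
            return False
        if backward.setdefault(right, left) != left:
            return False
    return True


# (국가명, 국가코드) pairs as they appear in the report's sample rows.
rows = [("미국", "US"), ("인도네시아", "ID"), ("케냐", "KE"), ("인도네시아", "ID"), ("미국", "US")]
print(is_one_to_one(rows))
```

If the check holds on the full data, one of the two columns is redundant for modelling purposes and can be dropped.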

Reproduction

Analysis started: 2023-12-12 19:34:04.239959
Analysis finished: 2023-12-12 19:34:05.171099
Duration: 0.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
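
A report of this kind can be regenerated with ydata-profiling. The sketch below assumes default settings, because the contents of config.json are not reproduced here. The function name `build_report`, the title, the output path, and the English description string are illustrative choices, not values taken from the original run.

```python
def build_report(df, output_path="report.html"):
    """Profile a pandas DataFrame and write the HTML report to output_path."""
    # Imported lazily so this module still loads where ydata-profiling is not installed.
    from ydata_profiling import ProfileReport

    report = ProfileReport(
        df,
        title="KOTRA import companies by HS code",  # hypothetical title
        # Metadata shown in the report's "Dataset" section.
        dataset={
            "description": "2020 BL data: HS code, country name, country code, "
                           "number of unique importing companies (KOTRA public data).",
            "author": "대한무역투자진흥공사",
            "url": "https://www.data.go.kr/data/15101842/fileData.do",
        },
    )
    report.to_file(output_path)
    return report
```

Calling `build_report(df)` on the loaded CSV would write `report.html`; timings and memory figures will differ from the run recorded above.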

Variables

국제통일상품체계 (HSCODE)
Text

Distinct: 7618
Distinct (%): 76.2%
Missing: 1
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.9572957
Min length: 1

Characters and Unicode

Total characters: 59567
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
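
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. In that module, `Nd` denotes Decimal Number and `Lu` denotes Uppercase Letter, the two categories reported for this variable. The helper `category_counts` is hypothetical, and the code `"2716S0"` is invented for the example; the report does not show which values contain the letter S.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The first two codes appear in the report's sample; "2716S0" is invented for illustration.
codes = ["229600", "902129", "2716S0"]
counts = category_counts(codes)
print(counts)
```

Scripts and blocks are not exposed by `unicodedata`, so reproducing those two breakdowns would need an additional data source.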

Unique

Unique: 5916
Unique (%): 59.2%

Sample

1st row: 229600
2nd row: 902129
3rd row: 843110
4th row: 330124
5th row: 940204
Value                Count  Frequency (%)
853400               6      0.1%
840991               6      0.1%
853810               6      0.1%
292429               6      0.1%
271290               5      0.1%
170191               5      0.1%
843359               5      0.1%
843410               5      0.1%
847432               5      0.1%
441294               5      0.1%
Other values (7608)  9945   99.5%

Most occurring characters

Value  Count  Frequency (%)
0      12926  21.7%
1      8532   14.3%
2      7413   12.4%
9      5567   9.3%
3      5154   8.7%
4      5050   8.5%
8      4308   7.2%
5      4126   6.9%
6      3341   5.6%
7      3138   5.3%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    59555  > 99.9%
Uppercase Letter  12     < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      12926  21.7%
1      8532   14.3%
2      7413   12.4%
9      5567   9.3%
3      5154   8.7%
4      5050   8.5%
8      4308   7.2%
5      4126   6.9%
6      3341   5.6%
7      3138   5.3%

Uppercase Letter
Value  Count  Frequency (%)
S      12     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  59555  > 99.9%
Latin   12     < 0.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      12926  21.7%
1      8532   14.3%
2      7413   12.4%
9      5567   9.3%
3      5154   8.7%
4      5050   8.5%
8      4308   7.2%
5      4126   6.9%
6      3341   5.6%
7      3138   5.3%

Latin
Value  Count  Frequency (%)
S      12     100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  59567  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      12926  21.7%
1      8532   14.3%
2      7413   12.4%
9      5567   9.3%
3      5154   8.7%
4      5050   8.5%
8      4308   7.2%
5      4126   6.9%
6      3341   5.6%
7      3138   5.3%

국가명 (country name)
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

미국: 4529
러시아연방: 674
베트남: 655
인도: 653
멕시코: 631
Other values (6): 2858

Length

Max length: 5
Median length: 2
Mean length: 2.7944
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미국
2nd row: 인도네시아
3rd row: 케냐
4th row: 인도네시아
5th row: 미국

Common Values

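Value                          Count  Frequency (%)
미국 (United States)            4529   45.3%
러시아연방 (Russian Federation)  674    6.7%
베트남 (Vietnam)                655    6.6%
인도 (India)                    653    6.5%
멕시코 (Mexico)                 631    6.3%
인도네시아 (Indonesia)           626    6.3%
필리핀 (Philippines)            608    6.1%
케냐 (Kenya)                    580    5.8%
아르헨티나 (Argentina)           553    5.5%
파나마 (Panama)                 488    4.9%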

Length

[Figure: Histogram of lengths of the category]

국가코드 (country code)
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

US: 4529
RU: 674
VN: 655
IN: 653
MX: 631
Other values (6): 2858

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: US
2nd row: ID
3rd row: KE
4th row: ID
5th row: US

Common Values

Value  Count  Frequency (%)
US     4529   45.3%
RU     674    6.7%
VN     655    6.6%
IN     653    6.5%
MX     631    6.3%
ID     626    6.3%
PH     608    6.1%
KE     580    5.8%
AR     553    5.5%
PA     488    4.9%

Length

[Figure: Histogram of lengths of the category]
Value  Count  Frequency (%)
us     4529   45.3%
ru     674    6.7%
vn     655    6.6%
in     653    6.5%
mx     631    6.3%
id     626    6.3%
ph     608    6.1%
ke     580    5.8%
ar     553    5.5%
pa     488    4.9%

수입기업 고유갯수 (number of unique importing companies)
Real number (ℝ)

SKEWED 

Distinct: 1049
Distinct (%): 10.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 216.6107
Minimum: 0
Maximum: 105122
Zeros: 4
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 8
Q3: 74
95-th percentile: 754.05
Maximum: 105122
Range: 105122
Interquartile range (IQR): 73
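
The IQR above is simply Q3 − Q1 (74 − 1 = 73), and the fractional 95-th percentile suggests that the profiler interpolates between data points. The sketch below shows one way to compute quartiles with linear interpolation using only the standard library. The helper `quartile_summary` and the input values are hypothetical, and the profiler's exact interpolation rule is an assumption.

```python
import statistics


def quartile_summary(values):
    """Return Q1, median, Q3 and IQR using linear interpolation between data points."""
    # method="inclusive" treats the minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical counts, not taken from the dataset.
summary = quartile_summary([1, 2, 3, 4, 5])
print(summary)
```

With a median of 8 against a mean of 216.6, the quartiles describe the typical row far better than the mean does.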

Descriptive statistics

Standard deviation: 1834.4991
Coefficient of variation (CV): 8.4691065
Kurtosis: 2136.1568
Mean: 216.6107
Median Absolute Deviation (MAD): 7
Skewness: 40.88052
Sum: 2166107
Variance: 3365386.9
Monotonicity: Not monotonic
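
Several of these statistics are tied together by simple identities: CV is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks those identities against the reported values and shows a plain moment-based skewness (Fisher-Pearson g1). The helper `moments_summary` is hypothetical, and the profiler may apply a sample-size correction to skewness that this sketch omits.

```python
import math


def moments_summary(values):
    """Return mean, sample std (ddof=1), CV, and uncorrected Fisher-Pearson skewness g1."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    std = math.sqrt(m2 * n / (n - 1))
    return {"mean": mean, "std": std, "cv": std / mean, "skewness": m3 / m2 ** 1.5}


# Identities checked against the values reported above.
reported_mean, reported_std = 216.6107, 1834.4991
derived_cv = reported_std / reported_mean
derived_variance = reported_std ** 2
print(f"CV from std/mean: {derived_cv:.4f}")
print(f"variance from std squared: {derived_variance:.1f}")

# Hypothetical right-skewed sample: many small counts and one very large one.
skewed = moments_summary([1, 1, 1, 2, 8, 74, 1000])
print(f"skewness of the hypothetical sample: {skewed['skewness']:.2f}")
```

A skewness near 40.9, as reported here, means a handful of very large counts dominate the mean; a log transform or rank-based statistics are common ways to handle such a column.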
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
1                    3378   33.8%
2                    647    6.5%
3                    282    2.8%
4                    226    2.3%
5                    156    1.6%
7                    132    1.3%
6                    131    1.3%
11                   118    1.2%
8                    118    1.2%
9                    97     1.0%
Other values (1039)  4715   47.1%

Minimum 10 values
Value  Count  Frequency (%)
0      4      < 0.1%
1      3378   33.8%
2      647    6.5%
3      282    2.8%
4      226    2.3%
5      156    1.6%
6      131    1.3%
7      132    1.3%
8      118    1.2%
9      97     1.0%

Maximum 10 values
Value   Count  Frequency (%)
105122  1      < 0.1%
102553  1      < 0.1%
53160   1      < 0.1%
30528   1      < 0.1%
29341   1      < 0.1%
28734   1      < 0.1%
27141   1      < 0.1%
22527   1      < 0.1%
19614   1      < 0.1%
14030   1      < 0.1%

Interactions


Correlations

                  국가명  국가코드  수입기업 고유갯수
국가명             1.000  1.000    0.098
국가코드           1.000  1.000    0.098
수입기업 고유갯수   0.098  0.098    1.000

          국가코드  국가명
국가코드   1.000    1.000
국가명     1.000    1.000

                  수입기업 고유갯수  국가명  국가코드
수입기업 고유갯수   1.000            0.054   0.054
국가명             0.054            1.000   1.000
국가코드           0.054            1.000   1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Row index + 국제통일상품체계 (HSCODE)  국가명      국가코드  수입기업 고유갯수
14850229600                         미국        US       2
64628902129                         인도네시아   ID       209
77162843110                         케냐        KE       44
66058330124                         인도네시아   ID       49
26613940204                         미국        US       1
45485530121                         러시아연방   RU       3
13281252220                         미국        US       146
3744292108                          미국        US       1
22792463711                         미국        US       1
76900880240                         케냐        KE       1

Row index + 국제통일상품체계 (HSCODE)  국가명      국가코드  수입기업 고유갯수
3588256197                          미국        US       1
67061200510                         인도네시아   ID       18
57463320641                         아르헨티나   AR       28
78669020421                         케냐        KE       1
3779601222                          미국        US       1
43724847432                         필리핀      PH       39
2848012158                          미국        US       1
43315851989                         필리핀      PH       85
55448441891                         아르헨티나   AR       2
28673307301                         미국        US       1

Duplicate rows

Most frequently occurring

Index  국제통일상품체계 (HSCODE)  국가명  국가코드  수입기업 고유갯수  # duplicates
0      012589                   미국    US       1                 2
1      014665                   미국    US       1                 2
2      030234                   미국    US       1                 2
3      030445                   미국    US       1                 2
4      039001                   미국    US       1                 2
5      060320                   미국    US       1                 2
6      090900                   미국    US       1                 2
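
Duplicate rows like these can be found by counting identical records. The sketch below does this with `collections.Counter` on rows represented as tuples. The helper `duplicate_rows` is hypothetical, and the three input rows are a small illustration built from values shown in this report, not the full dataset.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Rows are (HSCODE, 국가명, 국가코드, 수입기업 고유갯수) tuples.
# The first and last rows repeat one entry from the table above; the middle row is from the sample.
rows = [
    ("012589", "미국", "US", 1),
    ("229600", "미국", "US", 2),
    ("012589", "미국", "US", 1),
]
print(duplicate_rows(rows))
```

Whether such rows should be dropped depends on whether they are true repeats or distinct records that happen to share all four fields.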