Overview

Dataset statistics

Number of variables: 5
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 40.2 KiB
Average record size in memory: 41.1 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

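Description: Attachment file information registered on the main website of the Korea Craft & Design Foundation (한국공예디자인문화진흥원, KCDF). The dataset provides the attachment file ID (첨부파일아이디), stored file name (저장파일명), original file name (원파일명), extension (확장자), and file size (파일크기).
Author: Korea Craft & Design Foundation (한국공예디자인문화진흥원)
URL: https://www.data.go.kr/data/15072643/fileData.do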

Alerts

파일확장자 (file extension) is highly imbalanced (54.0%) [Imbalance]
파일크기 (file size) is highly skewed (γ1 = 28.3376151) [Skewed]
파일크기 (file size) has 27 (2.7%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 12:59:45.098281
Analysis finished: 2023-12-12 12:59:45.851309
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

첨부파일아이디 (attachment file ID)
Text

Distinct: 751
Distinct (%): 75.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

Characters and Unicode

Total characters: 20000
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 542
Unique (%): 54.2%
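
The distinct count (751) and the unique count (542) differ because they measure different things: a value is distinct if it appears at least once, and unique if it appears exactly once. A minimal sketch of the two counts, using a made-up list of IDs (the list and the function name are hypothetical, not taken from the dataset or from the profiling library):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical sample: FILE_A appears twice, the other two IDs once each.
ids = ["FILE_A", "FILE_A", "FILE_B", "FILE_C"]
print(distinct_and_unique(ids))
```

Here three values are distinct but only two are unique, which is the same kind of gap the report shows for this column.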

Sample

1st row: FILE_000000000000001
2nd row: FILE_000000000000002
3rd row: FILE_000000000000003
4th row: FILE_000000000000004
5th row: FILE_000000000000005

Value | Count | Frequency (%)
file_000000000000393 | 3 | 0.3%
file_000000000000687 | 3 | 0.3%
file_000000000000465 | 3 | 0.3%
file_000000000000718 | 3 | 0.3%
file_000000000000440 | 3 | 0.3%
file_000000000000713 | 3 | 0.3%
file_000000000000711 | 3 | 0.3%
file_000000000000318 | 3 | 0.3%
file_000000000000705 | 3 | 0.3%
file_000000000000549 | 3 | 0.3%
Other values (741) | 970 | 97.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 12296 | 61.5%
F | 1000 | 5.0%
I | 1000 | 5.0%
L | 1000 | 5.0%
E | 1000 | 5.0%
_ | 1000 | 5.0%
6 | 352 | 1.8%
5 | 350 | 1.8%
3 | 349 | 1.7%
4 | 345 | 1.7%
Other values (5) | 1308 | 6.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 15000 | 75.0%
Uppercase Letter | 4000 | 20.0%
Connector Punctuation | 1000 | 5.0%
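
These category totals can be cross-checked against a single ID. The sketch below uses Python's standard `unicodedata` module to count the Unicode general categories in the first sample value; the helper name `category_counts` is illustrative, and this is not necessarily how the profiling library computes its table.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# "Nd" = Decimal Number, "Lu" = Uppercase Letter, "Pc" = Connector Punctuation
counts = category_counts("FILE_000000000000001")
print(dict(counts))
```

Each 20-character ID contributes 15 decimal digits, 4 uppercase letters, and 1 underscore; multiplied by 1000 rows, that gives the 15000, 4000, and 1000 reported in the table.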

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 12296 | 82.0%
6 | 352 | 2.3%
5 | 350 | 2.3%
3 | 349 | 2.3%
4 | 345 | 2.3%
1 | 315 | 2.1%
2 | 308 | 2.1%
7 | 295 | 2.0%
8 | 197 | 1.3%
9 | 193 | 1.3%

Uppercase Letter
Value | Count | Frequency (%)
F | 1000 | 25.0%
I | 1000 | 25.0%
L | 1000 | 25.0%
E | 1000 | 25.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 16000 | 80.0%
Latin | 4000 | 20.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 12296 | 76.8%
_ | 1000 | 6.2%
6 | 352 | 2.2%
5 | 350 | 2.2%
3 | 349 | 2.2%
4 | 345 | 2.2%
1 | 315 | 2.0%
2 | 308 | 1.9%
7 | 295 | 1.8%
8 | 197 | 1.2%

Latin
Value | Count | Frequency (%)
F | 1000 | 25.0%
I | 1000 | 25.0%
L | 1000 | 25.0%
E | 1000 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 20000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 12296 | 61.5%
F | 1000 | 5.0%
I | 1000 | 5.0%
L | 1000 | 5.0%
E | 1000 | 5.0%
_ | 1000 | 5.0%
6 | 352 | 1.8%
5 | 350 | 1.8%
3 | 349 | 1.7%
4 | 345 | 1.7%
Other values (5) | 1308 | 6.5%

저장파일명 (stored file name)
Text

Distinct: 997
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 51
Median length: 18
Mean length: 18.25
Min length: 6

Characters and Unicode

Total characters: 18250
Distinct characters: 312
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 994
Unique (%): 99.4%

Sample

1st row: 공방교실시간표.xls
2nd row: 수강신청서.hwp
3rd row: 점2호점운영컨소시엄사업자선정공고.hwp
4th row: 점2호점운영컨소시엄사업자선정공고(2).hwp
5th row: 프랑스 포레드 파리 박람회 참가 공예단체 공모.hwp

Value | Count | Frequency (%)
(not preserved) | 43 | 3.2%
제출서류 | 18 | 1.3%
신규채용 | 18 | 1.3%
kcdf | 18 | 1.3%
신청서.hwp | 12 | 0.9%
응시원서 | 9 | 0.7%
참가신청서 | 7 | 0.5%
2013 | 6 | 0.4%
공고문 | 5 | 0.4%
신청서 | 5 | 0.4%
Other values (1136) | 1197 | 89.5%

Most occurring characters

Value | Count | Frequency (%)
1 | 1854 | 10.2%
4 | 1542 | 8.4%
0 | 1468 | 8.0%
. | 1012 | 5.5%
p | 996 | 5.5%
2 | 953 | 5.2%
6 | 859 | 4.7%
5 | 807 | 4.4%
3 | 791 | 4.3%
8 | 780 | 4.3%
Other values (302) | 7188 | 39.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 10537 | 57.7%
Lowercase Letter | 3145 | 17.2%
Other Letter | 2793 | 15.3%
Other Punctuation | 1021 | 5.6%
Space Separator | 338 | 1.9%
Uppercase Letter | 124 | 0.7%
Connector Punctuation | 106 | 0.6%
Open Punctuation | 87 | 0.5%
Close Punctuation | 87 | 0.5%
Dash Punctuation | 11 | 0.1%

Most frequent character per category

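Other Letter
Value | Count | Frequency (%)
(Hangul, not preserved) | 229 | 8.2%
(Hangul, not preserved) | 126 | 4.5%
(Hangul, not preserved) | 125 | 4.5%
(Hangul, not preserved) | 124 | 4.4%
(Hangul, not preserved) | 58 | 2.1%
(Hangul, not preserved) | 58 | 2.1%
(Hangul, not preserved) | 55 | 2.0%
(Hangul, not preserved) | 52 | 1.9%
(Hangul, not preserved) | 49 | 1.8%
(Hangul, not preserved) | 48 | 1.7%
Other values (242) | 1869 | 66.9%

Lowercase Letter
Value | Count | Frequency (%)
p | 996 | 31.7%
h | 641 | 20.4%
w | 637 | 20.3%
i | 174 | 5.5%
z | 173 | 5.5%
d | 130 | 4.1%
f | 106 | 3.4%
t | 41 | 1.3%
x | 39 | 1.2%
o | 33 | 1.0%
Other values (13) | 175 | 5.6%

Uppercase Letter
Value | Count | Frequency (%)
F | 25 | 20.2%
C | 25 | 20.2%
D | 25 | 20.2%
K | 23 | 18.5%
P | 4 | 3.2%
O | 4 | 3.2%
A | 4 | 3.2%
G | 3 | 2.4%
W | 2 | 1.6%
I | 2 | 1.6%
Other values (5) | 7 | 5.6%

Decimal Number
Value | Count | Frequency (%)
1 | 1854 | 17.6%
4 | 1542 | 14.6%
0 | 1468 | 13.9%
2 | 953 | 9.0%
6 | 859 | 8.2%
5 | 807 | 7.7%
3 | 791 | 7.5%
8 | 780 | 7.4%
7 | 756 | 7.2%
9 | 727 | 6.9%

Other Punctuation
Value | Count | Frequency (%)
. | 1012 | 99.1%
, | 6 | 0.6%
& | 2 | 0.2%
(not preserved) | 1 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 45 | 51.7%
[ | 42 | 48.3%

Close Punctuation
Value | Count | Frequency (%)
) | 45 | 51.7%
] | 42 | 48.3%

Space Separator
Value | Count | Frequency (%)
(space) | 338 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 106 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 11 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%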

Most occurring scripts

Value | Count | Frequency (%)
Common | 12188 | 66.8%
Latin | 3269 | 17.9%
Hangul | 2793 | 15.3%

Most frequent character per script

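Hangul
Value | Count | Frequency (%)
(Hangul, not preserved) | 229 | 8.2%
(Hangul, not preserved) | 126 | 4.5%
(Hangul, not preserved) | 125 | 4.5%
(Hangul, not preserved) | 124 | 4.4%
(Hangul, not preserved) | 58 | 2.1%
(Hangul, not preserved) | 58 | 2.1%
(Hangul, not preserved) | 55 | 2.0%
(Hangul, not preserved) | 52 | 1.9%
(Hangul, not preserved) | 49 | 1.8%
(Hangul, not preserved) | 48 | 1.7%
Other values (242) | 1869 | 66.9%

Latin
Value | Count | Frequency (%)
p | 996 | 30.5%
h | 641 | 19.6%
w | 637 | 19.5%
i | 174 | 5.3%
z | 173 | 5.3%
d | 130 | 4.0%
f | 106 | 3.2%
t | 41 | 1.3%
x | 39 | 1.2%
o | 33 | 1.0%
Other values (28) | 299 | 9.1%

Common
Value | Count | Frequency (%)
1 | 1854 | 15.2%
4 | 1542 | 12.7%
0 | 1468 | 12.0%
. | 1012 | 8.3%
2 | 953 | 7.8%
6 | 859 | 7.0%
5 | 807 | 6.6%
3 | 791 | 6.5%
8 | 780 | 6.4%
7 | 756 | 6.2%
Other values (12) | 1366 | 11.2%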

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 15456 | 84.7%
Hangul | 2793 | 15.3%
None | 1 | < 0.1%

Most frequent character per block

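ASCII
Value | Count | Frequency (%)
1 | 1854 | 12.0%
4 | 1542 | 10.0%
0 | 1468 | 9.5%
. | 1012 | 6.5%
p | 996 | 6.4%
2 | 953 | 6.2%
6 | 859 | 5.6%
5 | 807 | 5.2%
3 | 791 | 5.1%
8 | 780 | 5.0%
Other values (49) | 4394 | 28.4%

Hangul
Value | Count | Frequency (%)
(Hangul, not preserved) | 229 | 8.2%
(Hangul, not preserved) | 126 | 4.5%
(Hangul, not preserved) | 125 | 4.5%
(Hangul, not preserved) | 124 | 4.4%
(Hangul, not preserved) | 58 | 2.1%
(Hangul, not preserved) | 58 | 2.1%
(Hangul, not preserved) | 55 | 2.0%
(Hangul, not preserved) | 52 | 1.9%
(Hangul, not preserved) | 49 | 1.8%
(Hangul, not preserved) | 48 | 1.7%
Other values (242) | 1869 | 66.9%

None
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%
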
원파일명 (original file name)
Text

Distinct: 646
Distinct (%): 64.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 51
Median length: 43
Mean length: 19.181
Min length: 6

Characters and Unicode

Total characters: 19181
Distinct characters: 392
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 574
Unique (%): 57.4%

Sample

1st row: 공방교실시간표.xls
2nd row: 수강신청서.hwp
3rd row: 점2호점운영컨소시엄사업자선정공고.hwp
4th row: 점2호점운영컨소시엄사업자선정공고(2).hwp
5th row: 프랑스 포레드 파리 박람회 참가 공예단체 공모.hwp

Value | Count | Frequency (%)
2 | 218 | 8.4%
제안요청서.hwp | 168 | 6.4%
kcdf | 86 | 3.3%
1 | 73 | 2.8%
신규채용 | 72 | 2.8%
제출서류 | 72 | 2.8%
(not preserved) | 64 | 2.5%
붙임 | 59 | 2.3%
입찰공고문.hwp | 55 | 2.1%
3 | 48 | 1.8%
Other values (988) | 1694 | 64.9%

Most occurring characters

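Value | Count | Frequency (%)
(space) | 1610 | 8.4%
. | 1378 | 7.2%
p | 1004 | 5.2%
(Hangul, not preserved) | 720 | 3.8%
h | 647 | 3.4%
w | 642 | 3.3%
2 | 533 | 2.8%
1 | 476 | 2.5%
(Hangul, not preserved) | 463 | 2.4%
(Hangul, not preserved) | 424 | 2.2%
Other values (382) | 11284 | 58.8%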

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 9563 | 49.9%
Lowercase Letter | 3307 | 17.2%
Decimal Number | 1819 | 9.5%
Space Separator | 1610 | 8.4%
Other Punctuation | 1417 | 7.4%
Uppercase Letter | 506 | 2.6%
Close Punctuation | 298 | 1.6%
Open Punctuation | 297 | 1.5%
Connector Punctuation | 282 | 1.5%
Math Symbol | 42 | 0.2%
Other values (2) | 40 | 0.2%

Most frequent character per category

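Other Letter
Value | Count | Frequency (%)
(Hangul, not preserved) | 720 | 7.5%
(Hangul, not preserved) | 463 | 4.8%
(Hangul, not preserved) | 424 | 4.4%
(Hangul, not preserved) | 311 | 3.3%
(Hangul, not preserved) | 308 | 3.2%
(Hangul, not preserved) | 260 | 2.7%
(Hangul, not preserved) | 250 | 2.6%
(Hangul, not preserved) | 198 | 2.1%
(Hangul, not preserved) | 183 | 1.9%
(Hangul, not preserved) | 181 | 1.9%
Other values (309) | 6265 | 65.5%

Lowercase Letter
Value | Count | Frequency (%)
p | 1004 | 30.4%
h | 647 | 19.6%
w | 642 | 19.4%
i | 187 | 5.7%
z | 174 | 5.3%
d | 138 | 4.2%
f | 117 | 3.5%
t | 53 | 1.6%
c | 48 | 1.5%
o | 46 | 1.4%
Other values (14) | 251 | 7.6%

Uppercase Letter
Value | Count | Frequency (%)
F | 110 | 21.7%
C | 105 | 20.8%
D | 103 | 20.4%
K | 99 | 19.6%
A | 18 | 3.6%
O | 13 | 2.6%
S | 12 | 2.4%
P | 7 | 1.4%
G | 6 | 1.2%
R | 5 | 1.0%
Other values (12) | 28 | 5.5%

Decimal Number
Value | Count | Frequency (%)
2 | 533 | 29.3%
1 | 476 | 26.2%
0 | 356 | 19.6%
4 | 129 | 7.1%
3 | 102 | 5.6%
5 | 91 | 5.0%
6 | 60 | 3.3%
7 | 35 | 1.9%
8 | 19 | 1.0%
9 | 18 | 1.0%

Other Punctuation
Value | Count | Frequency (%)
. | 1378 | 97.2%
, | 21 | 1.5%
· | 11 | 0.8%
& | 5 | 0.4%
(not preserved) | 1 | 0.1%
# | 1 | 0.1%

Close Punctuation
Value | Count | Frequency (%)
) | 153 | 51.3%
] | 144 | 48.3%
(not preserved) | 1 | 0.3%

Open Punctuation
Value | Count | Frequency (%)
( | 152 | 51.2%
[ | 144 | 48.5%
(not preserved) | 1 | 0.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1610 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 282 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 42 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 39 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%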

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 9563 | 49.9%
Common | 5805 | 30.3%
Latin | 3813 | 19.9%

Most frequent character per script

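Hangul
Value | Count | Frequency (%)
(Hangul, not preserved) | 720 | 7.5%
(Hangul, not preserved) | 463 | 4.8%
(Hangul, not preserved) | 424 | 4.4%
(Hangul, not preserved) | 311 | 3.3%
(Hangul, not preserved) | 308 | 3.2%
(Hangul, not preserved) | 260 | 2.7%
(Hangul, not preserved) | 250 | 2.6%
(Hangul, not preserved) | 198 | 2.1%
(Hangul, not preserved) | 183 | 1.9%
(Hangul, not preserved) | 181 | 1.9%
Other values (309) | 6265 | 65.5%

Latin
Value | Count | Frequency (%)
p | 1004 | 26.3%
h | 647 | 17.0%
w | 642 | 16.8%
i | 187 | 4.9%
z | 174 | 4.6%
d | 138 | 3.6%
f | 117 | 3.1%
F | 110 | 2.9%
C | 105 | 2.8%
D | 103 | 2.7%
Other values (36) | 586 | 15.4%

Common
Value | Count | Frequency (%)
(space) | 1610 | 27.7%
. | 1378 | 23.7%
2 | 533 | 9.2%
1 | 476 | 8.2%
0 | 356 | 6.1%
_ | 282 | 4.9%
) | 153 | 2.6%
( | 152 | 2.6%
[ | 144 | 2.5%
] | 144 | 2.5%
Other values (17) | 577 | 9.9%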

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9603 | 50.1%
Hangul | 9563 | 49.9%
None | 14 | 0.1%
Misc Symbols | 1 | < 0.1%

Most frequent character per block

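ASCII
Value | Count | Frequency (%)
(space) | 1610 | 16.8%
. | 1378 | 14.3%
p | 1004 | 10.5%
h | 647 | 6.7%
w | 642 | 6.7%
2 | 533 | 5.6%
1 | 476 | 5.0%
0 | 356 | 3.7%
_ | 282 | 2.9%
i | 187 | 1.9%
Other values (58) | 2488 | 25.9%

Hangul
Value | Count | Frequency (%)
(Hangul, not preserved) | 720 | 7.5%
(Hangul, not preserved) | 463 | 4.8%
(Hangul, not preserved) | 424 | 4.4%
(Hangul, not preserved) | 311 | 3.3%
(Hangul, not preserved) | 308 | 3.2%
(Hangul, not preserved) | 260 | 2.7%
(Hangul, not preserved) | 250 | 2.6%
(Hangul, not preserved) | 198 | 2.1%
(Hangul, not preserved) | 183 | 1.9%
(Hangul, not preserved) | 181 | 1.9%
Other values (309) | 6265 | 65.5%

None
Value | Count | Frequency (%)
· | 11 | 78.6%
(not preserved) | 1 | 7.1%
(not preserved) | 1 | 7.1%
(not preserved) | 1 | 7.1%

Misc Symbols
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%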

파일확장자 (file extension)
Categorical

IMBALANCE

Distinct: 14
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
hwp | 637
zip | 170
pdf | 102
jpg | 25
doc | 19
Other values (9) | 47

Length

Max length: 4
Median length: 3
Mean length: 3.032
Min length: 3

Unique

Unique: 4
Unique (%): 0.4%

Sample

1st row: xls
2nd row: hwp
3rd row: hwp
4th row: hwp
5th row: hwp

Common Values

Value | Count | Frequency (%)
hwp | 637 | 63.7%
zip | 170 | 17.0%
pdf | 102 | 10.2%
jpg | 25 | 2.5%
doc | 19 | 1.9%
pptx | 19 | 1.9%
ppt | 9 | 0.9%
xlsx | 7 | 0.7%
docx | 5 | 0.5%
alz | 3 | 0.3%
Other values (4) | 4 | 0.4%
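
The 54.0% imbalance figure in the alert can be reproduced from these counts with an entropy-based score: one minus the Shannon entropy of the class distribution divided by the maximum entropy for the same number of classes. The sketch below is an assumption about the form of the score, not a quotation of the library's code; the function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Counts from the common-values table; "Other values (4) 4" implies
# four further extensions that occur once each (14 distinct values in total).
counts = [637, 170, 102, 25, 19, 19, 9, 7, 5, 3, 1, 1, 1, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

That the result agrees with the reported value suggests, but does not prove, that the report uses a score of this form.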

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
hwp | 637 | 63.7%
zip | 170 | 17.0%
pdf | 102 | 10.2%
jpg | 26 | 2.6%
doc | 19 | 1.9%
pptx | 19 | 1.9%
ppt | 9 | 0.9%
xlsx | 7 | 0.7%
docx | 5 | 0.5%
alz | 3 | 0.3%
Other values (3) | 3 | 0.3%

파일크기 (file size)
Real number (ℝ)

SKEWED  ZEROS

Distinct: 511
Distinct (%): 51.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 772478.82
Minimum: 0
Maximum: 1.748704 × 10^8
Zeros: 27
Zeros (%): 2.7%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 13824
Q1: 28672
Median: 74563
Q3: 187978.75
95-th percentile: 3620947.4
Maximum: 1.748704 × 10^8
Range: 1.748704 × 10^8
Interquartile range (IQR): 159306.75

Descriptive statistics

Standard deviation: 5721015.7
Coefficient of variation (CV): 7.4060487
Kurtosis: 860.7196
Mean: 772478.82
Median Absolute Deviation (MAD): 52547
Skewness: 28.337615
Sum: 7.7247882 × 10^8
Variance: 3.2730021 × 10^13
Monotonicity: Not monotonic
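
Several of these figures are derived from others, so they can be sanity-checked with plain arithmetic. The sketch below recomputes the interquartile range, coefficient of variation, variance, sum, and range from the reported mean, standard deviation, quartiles, and extremes. The inputs are the rounded values printed in the report (the maximum is the largest listed value, 174870398), so small rounding differences are expected.

```python
# Values copied from the report for 파일크기; all are rounded as displayed.
n = 1000
mean = 772478.82
std = 5721015.7
q1, q3 = 28672, 187978.75
minimum, maximum = 0, 174870398

iqr = q3 - q1                   # interquartile range
cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
total = mean * n                # sum of all observations
value_range = maximum - minimum

print(iqr, round(cv, 4), f"{variance:.4e}", f"{total:.4e}", value_range)
```

A coefficient of variation above 7 means the standard deviation is more than seven times the mean, which is consistent with the heavy right skew flagged in the alerts.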
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
74563 | 29 | 2.9%
0 | 27 | 2.7%
145688 | 23 | 2.3%
13824 | 17 | 1.7%
145698 | 15 | 1.5%
25088 | 15 | 1.5%
25600 | 13 | 1.3%
24064 | 12 | 1.2%
90624 | 12 | 1.2%
17408 | 11 | 1.1%
Other values (501) | 826 | 82.6%

Minimum 10 values
Value | Count | Frequency (%)
0 | 27 | 2.7%
5632 | 1 | 0.1%
9682 | 1 | 0.1%
9728 | 2 | 0.2%
10240 | 1 | 0.1%
11264 | 3 | 0.3%
11776 | 2 | 0.2%
12276 | 1 | 0.1%
12288 | 3 | 0.3%
12800 | 4 | 0.4%

Maximum 10 values
Value | Count | Frequency (%)
174870398 | 1 | 0.1%
13515884 | 1 | 0.1%
12095078 | 1 | 0.1%
9804288 | 1 | 0.1%
9504315 | 1 | 0.1%
9432381 | 1 | 0.1%
9263104 | 1 | 0.1%
8678400 | 1 | 0.1%
8501459 | 1 | 0.1%
8252990 | 1 | 0.1%

Interactions


Correlations

Variable | 파일확장자 | 파일크기
파일확장자 | 1.000 | 0.000
파일크기 | 0.000 | 1.000

Variable | 파일크기 | 파일확장자
파일크기 | 1.000 | 0.000
파일확장자 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 첨부파일아이디 (attachment file ID) | 저장파일명 (stored file name) | 원파일명 (original file name) | 파일확장자 (file extension) | 파일크기 (file size)
0 | FILE_000000000000001 | 공방교실시간표.xls | 공방교실시간표.xls | xls | 25088
1 | FILE_000000000000002 | 수강신청서.hwp | 수강신청서.hwp | hwp | 9682
2 | FILE_000000000000003 | 점2호점운영컨소시엄사업자선정공고.hwp | 점2호점운영컨소시엄사업자선정공고.hwp | hwp | 154357
3 | FILE_000000000000004 | 점2호점운영컨소시엄사업자선정공고(2).hwp | 점2호점운영컨소시엄사업자선정공고(2).hwp | hwp | 154209
4 | FILE_000000000000005 | 프랑스 포레드 파리 박람회 참가 공예단체 공모.hwp | 프랑스 포레드 파리 박람회 참가 공예단체 공모.hwp | hwp | 34816
5 | FILE_000000000000006 | 공모요강 한글97.hwp | 공모요강 한글97.hwp | hwp | 38124
6 | FILE_000000000000007 | 열린공예장터.hwp | 열린공예장터.hwp | hwp | 19253
7 | FILE_000000000000008 | 홈페이지 사업자 선정공고1(비투비).hwp | 홈페이지 사업자 선정공고1(비투비).hwp | hwp | 16384
8 | FILE_000000000000009 | 공예CEO과정 강의시간표.hwp | 공예CEO과정 강의시간표.hwp | hwp | 12276
9 | FILE_000000000000010 | 박람회 참가 공예단체 공모_97.hwp | 박람회 참가 공예단체 공모_97.hwp | hwp | 22016

Last rows

Index | 첨부파일아이디 (attachment file ID) | 저장파일명 (stored file name) | 원파일명 (original file name) | 파일확장자 (file extension) | 파일크기 (file size)
990 | FILE_000000000000746 | 14858254673040.hwp | [붙임] 신청서_한국공예디자인문화진흥원 직영 숍 입점.hwp | hwp | 29696
991 | FILE_000000000000747 | 14926479596640.hwp | 2017+우수공예품+지정제도+선정공모+공고문.hwp | hwp | 27648
992 | FILE_000000000000748 | 14861061896810.hwp | 1. 입찰공고문.hwp | hwp | 90624
993 | FILE_000000000000748 | 14861061896811.hwp | 2. 제안요청서.hwp | hwp | 148992
994 | FILE_000000000000749 | 14864612840260.hwp | 1. 입찰공고문.hwp | hwp | 74240
995 | FILE_000000000000749 | 14864612840271.hwp | 2. 과업지시서.hwp | hwp | 29184
996 | FILE_000000000000749 | 14864612840272.pdf | 3. 작품목록표.pdf | pdf | 975192
997 | FILE_000000000000750 | 14865411358010.hwp | 1. 입찰공고문.hwp | hwp | 90624
998 | FILE_000000000000750 | 14865411358021.hwp | 2. 제안요청서.hwp | hwp | 104960
999 | FILE_000000000000751 | 14866874904770.hwp | [양식1] 2017참가신청서_홍길동.hwp | hwp | 32768
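
In the sample rows, the 파일확장자 column agrees with the suffix of 원파일명. A quick consistency check along those lines might look like the sketch below. The helper name `extension_of` and the lowercasing step are assumptions made for illustration; the report does not state how the extension column was actually derived.

```python
import os

def extension_of(filename):
    """Return the extension of `filename` in lowercase, without the leading dot."""
    return os.path.splitext(filename)[1].lstrip(".").lower()

# (원파일명, 파일확장자) pairs copied from the sample rows.
rows = [
    ("공방교실시간표.xls", "xls"),
    ("3. 작품목록표.pdf", "pdf"),
    ("[양식1] 2017참가신청서_홍길동.hwp", "hwp"),
]

# Collect any rows whose recorded extension differs from the filename suffix.
mismatches = [(name, ext) for name, ext in rows if extension_of(name) != ext]
print(mismatches)
```

Run against the full dataset, a non-empty result would point to rows worth inspecting, for example files with an upper-case suffix or no suffix at all.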