Overview

Dataset statistics

Number of variables: 4
Number of observations: 5138
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 165.7 KiB
Average record size in memory: 33.0 B

Variable types

Numeric: 1
Text: 2
Categorical: 1

Dataset

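Description: A list of the attachments for materials registered on the Multicultural Education Portal (다문화교육 포털). The attachment file name, extension, file size, and registration date fields are available.
URL: https://www.data.go.kr/data/15038811/fileData.do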

Alerts

파일유형 is highly imbalanced (60.3%) [Imbalance]
연번 has unique values [Unique]
업로드링크 has unique values [Unique]
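
The imbalance alert reports a score between 0 and 1. The sketch below shows one way such a score can be computed, assuming it is defined as 1 minus the Shannon entropy of the category counts divided by log2 of the number of categories. That definition is an assumption about the profiler, the function name `imbalance_score` is hypothetical, and the inputs are illustrative rather than the 24 actual categories.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of category counts (assumed definition)."""
    k = len(counts)
    if k <= 1:
        return 0.0  # a single category cannot be imbalanced
    total = sum(counts)
    # Shannon entropy in bits; categories with zero count contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Illustrative inputs only (not the report's data).
balanced = imbalance_score([50, 50])
skewed = imbalance_score([97, 1, 1, 1])
print(balanced, skewed)
```

Under this definition a perfectly even split scores 0, and the score approaches 1 as a single category dominates.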

Reproduction

Analysis started: 2023-12-12 19:36:05.815957
Analysis finished: 2023-12-12 19:36:06.769620
Duration: 0.95 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
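
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, not the exact procedure used here: the function name `build_report`, the report title, the file paths, and the default encoding are all assumptions, and the settings stored in config.json are not reproduced.

```python
def build_report(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so the function can be defined even where the
    # packages are not installed; both are required when it is called.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    # The title is a placeholder chosen for this sketch.
    profile = ProfileReport(df, title="Attachment list profile")
    profile.to_file(html_path)
    return profile
```

A call such as `build_report("attachments.csv", "report.html")` would write the report. Both file names are hypothetical, and the source CSV may need a different `encoding` value.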

Variables

연번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 5138
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2569.5
Minimum: 1
Maximum: 5138
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 257.85
Q1: 1285.25
Median: 2569.5
Q3: 3853.75
95-th percentile: 4881.15
Maximum: 5138
Range: 5137
Interquartile range (IQR): 2568.5
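
The quantiles above are what linear interpolation gives if 연번 holds the integers 1 through 5138. That is an assumption, though it is consistent with the reported minimum, maximum, distinct count, and sum. A minimal sketch using only the standard library:

```python
import statistics

serials = list(range(1, 5139))  # assumed values: 1, 2, ..., 5138

# method="inclusive" interpolates linearly between the minimum and maximum.
q1, median, q3 = statistics.quantiles(serials, n=4, method="inclusive")
cuts = statistics.quantiles(serials, n=20, method="inclusive")
p5, p95 = cuts[0], cuts[-1]  # 5th and 95th percentiles

iqr = q3 - q1
value_range = max(serials) - min(serials)
print(p5, q1, median, q3, p95, iqr, value_range)
```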

Descriptive statistics

Standard deviation: 1483.3572
Coefficient of variation (CV): 0.57729409
Kurtosis: -1.2
Mean: 2569.5
Median Absolute Deviation (MAD): 1284.5
Skewness: 0
Sum: 13202091
Variance: 2200348.5
Monotonicity: Strictly increasing
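
Most of these figures can be checked by hand under the same assumption that 연번 is the sequence 1 through 5138. The sketch below uses the sample variance (n - 1 denominator), which matches the reported value. Kurtosis and skewness are left out because their exact estimators are not stated in the report.

```python
import statistics

serials = list(range(1, 5139))  # assumed values: 1, 2, ..., 5138

total = sum(serials)
mean = statistics.mean(serials)
variance = statistics.variance(serials)  # sample variance, n - 1 denominator
std = statistics.stdev(serials)
cv = std / mean                          # coefficient of variation

# Median absolute deviation: median distance from the median.
center = statistics.median(serials)
mad = statistics.median([abs(x - center) for x in serials])

# Strictly increasing: every value is smaller than its successor.
strictly_increasing = all(a < b for a, b in zip(serials, serials[1:]))
print(total, mean, variance, std, cv, mad, strictly_increasing)
```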
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
3374 | 1 | < 0.1%
3432 | 1 | < 0.1%
3431 | 1 | < 0.1%
3430 | 1 | < 0.1%
3429 | 1 | < 0.1%
3428 | 1 | < 0.1%
3427 | 1 | < 0.1%
3426 | 1 | < 0.1%
3425 | 1 | < 0.1%
Other values (5128) | 5128 | 99.8%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
5138 | 1 | < 0.1%
5137 | 1 | < 0.1%
5136 | 1 | < 0.1%
5135 | 1 | < 0.1%
5134 | 1 | < 0.1%
5133 | 1 | < 0.1%
5132 | 1 | < 0.1%
5131 | 1 | < 0.1%
5130 | 1 | < 0.1%
5129 | 1 | < 0.1%

파일명 (file name)
Text

Distinct: 4809
Distinct (%): 93.6%
Missing: 0
Missing (%): 0.0%
Memory size: 40.3 KiB

Length

Max length: 125
Median length: 73
Mean length: 24.603931
Min length: 2

Characters and Unicode

Total characters: 126415
Distinct characters: 671
Distinct categories: 16
Distinct scripts: 4
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
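
As an illustration of these character properties, Python's standard `unicodedata` module returns the general category of each code point. The sketch below counts categories for the first sample file name. The helper name `category_counts` is hypothetical, and the report's own counting code may differ.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general-category codes (e.g. 'Lo', 'Nd') in text."""
    return Counter(unicodedata.category(ch) for ch in text)

name = "3~4학년 방과후 수학 보충 자료.pdf"  # first sample row of the file-name column
counts = category_counts(name)
print(counts)
```

The two-letter codes map onto the category names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator, `Ll` is Lowercase Letter, `Sm` is Math Symbol, and `Po` is Other Punctuation.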

Unique

Unique: 4550
Unique (%): 88.6%
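
The report distinguishes distinct values (4809) from unique values (4550). The figures are consistent with "distinct" meaning the number of different values and "unique" meaning values that occur exactly once. A small sketch of that reading, with a hypothetical helper and made-up file names:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Hypothetical file names, not taken from the dataset.
names = ["a.pdf", "b.pdf", "b.pdf", "c.jpg", "c.jpg", "c.jpg", "d.hwp"]
result = distinct_and_unique(names)
print(result)  # (4, 2)
```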

Sample

1st row: 3~4학년 방과후 수학 보충 자료.pdf
2nd row: (경기)2014년 중점학교 사업계획서-마석고.pdf
3rd row: (경기)2014년 중점학교 사업계획서-마송중앙초.pdf
4th row: (경기)2014년 중점학교 사업계획서-만정중.pdf
5th row: (경기)2014년 중점학교 사업계획서-매홀중.pdf

Value | Count | Frequency (%)
다문화교육 | 488 | 3.5%
다문화 | 364 | 2.6%
중점학교 | 287 | 2.0%
예비학교 | 261 | 1.9%
운영 | 156 | 1.1%
위한 | 154 | 1.1%
(not preserved) | 92 | 0.7%
한국어교육 | 82 | 0.6%
다문화중점학교 | 81 | 0.6%
운영계획서.pdf | 81 | 0.6%
Other values (5973) | 12007 | 85.4%

Most occurring characters

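Value | Count | Frequency (%)
(space) | 8923 | 7.1%
. | 5374 | 4.3%
p | 4625 | 3.7%
(not preserved) | 4045 | 3.2%
0 | 3389 | 2.7%
_ | 3389 | 2.7%
(not preserved) | 3371 | 2.7%
2 | 2947 | 2.3%
(not preserved) | 2528 | 2.0%
(not preserved) | 2500 | 2.0%
Other values (661) | 85324 | 67.5%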

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 73052 | 57.8%
Lowercase Letter | 15389 | 12.2%
Decimal Number | 11749 | 9.3%
Space Separator | 8923 | 7.1%
Other Punctuation | 5530 | 4.4%
Connector Punctuation | 3389 | 2.7%
Close Punctuation | 2361 | 1.9%
Open Punctuation | 2356 | 1.9%
Uppercase Letter | 2070 | 1.6%
Dash Punctuation | 1479 | 1.2%
Other values (6) | 117 | 0.1%

Most frequent character per category

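Other Letter
Value | Count | Frequency (%)
(not preserved) | 4045 | 5.5%
(not preserved) | 3371 | 4.6%
(not preserved) | 2528 | 3.5%
(not preserved) | 2500 | 3.4%
(not preserved) | 2487 | 3.4%
(not preserved) | 2276 | 3.1%
(not preserved) | 1614 | 2.2%
(not preserved) | 1535 | 2.1%
(not preserved) | 1256 | 1.7%
(not preserved) | 1242 | 1.7%
Other values (569) | 50198 | 68.7%

Lowercase Letter
Value | Count | Frequency (%)
p | 4625 | 30.1%
d | 2492 | 16.2%
f | 2467 | 16.0%
g | 2033 | 13.2%
j | 1902 | 12.4%
i | 220 | 1.4%
n | 205 | 1.3%
e | 180 | 1.2%
o | 162 | 1.1%
a | 158 | 1.0%
Other values (18) | 945 | 6.1%

Uppercase Letter
Value | Count | Frequency (%)
P | 554 | 26.8%
G | 496 | 24.0%
J | 494 | 23.9%
K | 52 | 2.5%
S | 48 | 2.3%
F | 46 | 2.2%
D | 44 | 2.1%
L | 39 | 1.9%
I | 38 | 1.8%
A | 30 | 1.4%
Other values (15) | 229 | 11.1%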

Other Punctuation
Value | Count | Frequency (%)
. | 5374 | 97.2%
, | 56 | 1.0%
% | 42 | 0.8%
; | 32 | 0.6%
· | 15 | 0.3%
! | 4 | 0.1%
& | 3 | 0.1%
' | 2 | < 0.1%
\ | 1 | < 0.1%
@ | 1 | < 0.1%

Decimal Number
Value | Count | Frequency (%)
0 | 3389 | 28.8%
2 | 2947 | 25.1%
1 | 2294 | 19.5%
6 | 687 | 5.8%
4 | 648 | 5.5%
5 | 496 | 4.2%
9 | 438 | 3.7%
3 | 365 | 3.1%
8 | 280 | 2.4%
7 | 205 | 1.7%

Close Punctuation
Value | Count | Frequency (%)
) | 2151 | 91.1%
] | 208 | 8.8%
(not preserved) | 2 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 2147 | 91.1%
[ | 207 | 8.8%
(not preserved) | 2 | 0.1%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 2 | 33.3%
(not preserved) | 2 | 33.3%
(not preserved) | 2 | 33.3%

Math Symbol
Value | Count | Frequency (%)
+ | 66 | 68.0%
~ | 31 | 32.0%

Other Number
Value | Count | Frequency (%)
(not preserved) | 2 | 50.0%
(not preserved) | 2 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 8923 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 3389 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1479 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not preserved) | 6 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(not preserved) | 2 | 100.0%

Initial Punctuation
Value | Count | Frequency (%)
(not preserved) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 73016 | 57.8%
Common | 35898 | 28.4%
Latin | 17465 | 13.8%
Han | 36 | < 0.1%

Most frequent character per script

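Hangul
Value | Count | Frequency (%)
(not preserved) | 4045 | 5.5%
(not preserved) | 3371 | 4.6%
(not preserved) | 2528 | 3.5%
(not preserved) | 2500 | 3.4%
(not preserved) | 2487 | 3.4%
(not preserved) | 2276 | 3.1%
(not preserved) | 1614 | 2.2%
(not preserved) | 1535 | 2.1%
(not preserved) | 1256 | 1.7%
(not preserved) | 1242 | 1.7%
Other values (549) | 50162 | 68.7%

Latin
Value | Count | Frequency (%)
p | 4625 | 26.5%
d | 2492 | 14.3%
f | 2467 | 14.1%
g | 2033 | 11.6%
j | 1902 | 10.9%
P | 554 | 3.2%
G | 496 | 2.8%
J | 494 | 2.8%
i | 220 | 1.3%
n | 205 | 1.2%
Other values (46) | 1977 | 11.3%

Common
Value | Count | Frequency (%)
(space) | 8923 | 24.9%
. | 5374 | 15.0%
0 | 3389 | 9.4%
_ | 3389 | 9.4%
2 | 2947 | 8.2%
1 | 2294 | 6.4%
) | 2151 | 6.0%
( | 2147 | 6.0%
- | 1479 | 4.1%
6 | 687 | 1.9%
Other values (26) | 3118 | 8.7%

Han
Value | Count | Frequency (%)
(not preserved) | 4 | 11.1%
(not preserved) | 4 | 11.1%
(not preserved) | 2 | 5.6%
(not preserved) | 2 | 5.6%
(not preserved) | 2 | 5.6%
(not preserved) | 2 | 5.6%
(not preserved) | 2 | 5.6%
(not preserved) | 2 | 5.6%
(not preserved) | 2 | 5.6%
(not preserved) | 2 | 5.6%
Other values (10) | 12 | 33.3%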

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 73008 | 57.8%
ASCII | 53321 | 42.2%
CJK | 35 | < 0.1%
None | 22 | < 0.1%
Compat Jamo | 8 | < 0.1%
Misc Symbols | 6 | < 0.1%
Number Forms | 6 | < 0.1%
Enclosed Alphanum | 4 | < 0.1%
Punctuation | 4 | < 0.1%
CJK Compat Ideographs | 1 | < 0.1%

Most frequent character per block

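ASCII
Value | Count | Frequency (%)
(space) | 8923 | 16.7%
. | 5374 | 10.1%
p | 4625 | 8.7%
0 | 3389 | 6.4%
_ | 3389 | 6.4%
2 | 2947 | 5.5%
d | 2492 | 4.7%
f | 2467 | 4.6%
1 | 2294 | 4.3%
) | 2151 | 4.0%
Other values (68) | 15270 | 28.6%

Hangul
Value | Count | Frequency (%)
(not preserved) | 4045 | 5.5%
(not preserved) | 3371 | 4.6%
(not preserved) | 2528 | 3.5%
(not preserved) | 2500 | 3.4%
(not preserved) | 2487 | 3.4%
(not preserved) | 2276 | 3.1%
(not preserved) | 1614 | 2.2%
(not preserved) | 1535 | 2.1%
(not preserved) | 1256 | 1.7%
(not preserved) | 1242 | 1.7%
Other values (546) | 50154 | 68.7%

None
Value | Count | Frequency (%)
· | 15 | 68.2%
(not preserved) | 2 | 9.1%
(not preserved) | 2 | 9.1%
ȸ | 1 | 4.5%
ó | 1 | 4.5%
ĸ | 1 | 4.5%

Misc Symbols
Value | Count | Frequency (%)
(not preserved) | 6 | 100.0%

Compat Jamo
Value | Count | Frequency (%)
(not preserved) | 6 | 75.0%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%

CJK
Value | Count | Frequency (%)
(not preserved) | 4 | 11.4%
(not preserved) | 4 | 11.4%
(not preserved) | 2 | 5.7%
(not preserved) | 2 | 5.7%
(not preserved) | 2 | 5.7%
(not preserved) | 2 | 5.7%
(not preserved) | 2 | 5.7%
(not preserved) | 2 | 5.7%
(not preserved) | 2 | 5.7%
(not preserved) | 2 | 5.7%
Other values (9) | 11 | 31.4%

Enclosed Alphanum
Value | Count | Frequency (%)
(not preserved) | 2 | 50.0%
(not preserved) | 2 | 50.0%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 2 | 33.3%
(not preserved) | 2 | 33.3%
(not preserved) | 2 | 33.3%

Punctuation
Value | Count | Frequency (%)
(not preserved) | 2 | 50.0%
(not preserved) | 2 | 50.0%

CJK Compat Ideographs
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%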

파일유형 (file type)
Categorical

IMBALANCE 

Distinct: 24
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 40.3 KiB

Value | Count
pdf | 2436
jpg | 1895
JPG | 484
zip | 73
hwp | 61
Other values (19) | 189

Length

Max length: 4
Median length: 3
Mean length: 3.0035033
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: pdf
2nd row: pdf
3rd row: pdf
4th row: pdf
5th row: pdf

Common Values

Value | Count | Frequency (%)
pdf | 2436 | 47.4%
jpg | 1895 | 36.9%
JPG | 484 | 9.4%
zip | 73 | 1.4%
hwp | 61 | 1.2%
png | 59 | 1.1%
PDF | 36 | 0.7%
mp4 | 17 | 0.3%
avi | 16 | 0.3%
mpeg | 10 | 0.2%
Other values (14) | 51 | 1.0%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
pdf | 2472 | 48.1%
jpg | 2379 | 46.3%
zip | 73 | 1.4%
png | 65 | 1.3%
hwp | 64 | 1.2%
mp4 | 18 | 0.4%
avi | 16 | 0.3%
mpeg | 10 | 0.2%
pptx | 9 | 0.2%
doc | 6 | 0.1%
Other values (9) | 26 | 0.5%
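
The counts in this last table are consistent with folding the extensions to lower case before counting: pdf 2436 + PDF 36 = 2472 and jpg 1895 + JPG 484 = 2379. A minimal sketch of that merge, using only the four case-sensitive counts listed under "Common Values" and assuming case is the only difference between the two tables:

```python
from collections import Counter

# Case-sensitive counts copied from the "Common Values" table.
raw_counts = {"pdf": 2436, "jpg": 1895, "JPG": 484, "PDF": 36}

merged = Counter()
for ext, n in raw_counts.items():
    merged[ext.lower()] += n  # fold case before accumulating

print(merged["pdf"], merged["jpg"])  # 2472 2379
```

Applying the same normalization to the column before analysis would remove the spurious split between `jpg` and `JPG` that contributes to the 24 distinct categories.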

업로드링크 (upload link)
Text

UNIQUE 

Distinct: 5138
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 40.3 KiB

Length

Max length: 104
Median length: 76
Mean length: 78.148307
Min length: 2

Characters and Unicode

Total characters: 401526
Distinct characters: 53
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5138
Unique (%): 100.0%

Sample

1st row: http://file.edu4mc.or.kr/nime_upload/attach/1000000/1001800/20160118074228-1344-1.pdf
2nd row: http://file.edu4mc.or.kr/nime_upload/attach/00000/811/20161108153144630.pdf
3rd row: http://file.edu4mc.or.kr/nime_upload/attach/00000/812/20161108153204210.pdf
4th row: http://file.edu4mc.or.kr/nime_upload/attach/00000/813/20161108153226803.pdf
5th row: http://file.edu4mc.or.kr/nime_upload/attach/00000/814/20161108153256365.pdf

Value | Count | Frequency (%)
http://file.edu4mc.or.kr/nime_upload/attach/1000000/1001800/20160118074228-1344-1.pdf | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2014/20170210154930748.pdf | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2096/20170210173747374.jpg | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2096/20170210173747372.pdf | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2095/20170210173715158.jpg | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2095/20170210173715156.pdf | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2094/20170210173626349.jpg | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2094/20170210173626347.pdf | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2070/20170210171116932.pdf | 1 | < 0.1%
http://file.edu4mc.or.kr/nime_upload/attach/00000/2093/20170210173554864.jpg | 1 | < 0.1%
Other values (5128) | 5128 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 54257 | 13.5%
/ | 35677 | 8.9%
1 | 27204 | 6.8%
t | 20261 | 5.0%
. | 20100 | 5.0%
2 | 18106 | 4.5%
a | 15440 | 3.8%
e | 15140 | 3.8%
p | 14711 | 3.7%
d | 12563 | 3.1%
Other values (43) | 168067 | 41.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 180890 | 45.1%
Decimal Number | 149739 | 37.3%
Other Punctuation | 60768 | 15.1%
Connector Punctuation | 5132 | 1.3%
Dash Punctuation | 3386 | 0.8%
Uppercase Letter | 1611 | 0.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 20261 | 11.2%
a | 15440 | 8.5%
e | 15140 | 8.4%
p | 14711 | 8.1%
d | 12563 | 6.9%
i | 10218 | 5.6%
h | 10183 | 5.6%
m | 10169 | 5.6%
o | 10136 | 5.6%
c | 10135 | 5.6%
Other values (15) | 51934 | 28.7%

Uppercase Letter
Value | Count | Frequency (%)
P | 530 | 32.9%
G | 495 | 30.7%
J | 484 | 30.0%
F | 40 | 2.5%
D | 40 | 2.5%
N | 7 | 0.4%
C | 4 | 0.2%
H | 4 | 0.2%
W | 3 | 0.2%
M | 2 | 0.1%
Other values (2) | 2 | 0.1%

Decimal Number
Value | Count | Frequency (%)
0 | 54257 | 36.2%
1 | 27204 | 18.2%
2 | 18106 | 12.1%
4 | 11909 | 8.0%
8 | 8721 | 5.8%
7 | 7427 | 5.0%
6 | 7092 | 4.7%
3 | 5560 | 3.7%
5 | 5225 | 3.5%
9 | 4238 | 2.8%

Other Punctuation
Value | Count | Frequency (%)
/ | 35677 | 58.7%
. | 20100 | 33.1%
: | 4990 | 8.2%
\ | 1 | < 0.1%

Connector Punctuation
Value | Count | Frequency (%)
_ | 5132 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 3386 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 219025 | 54.5%
Latin | 182501 | 45.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 20261 | 11.1%
a | 15440 | 8.5%
e | 15140 | 8.3%
p | 14711 | 8.1%
d | 12563 | 6.9%
i | 10218 | 5.6%
h | 10183 | 5.6%
m | 10169 | 5.6%
o | 10136 | 5.6%
c | 10135 | 5.6%
Other values (27) | 53545 | 29.3%

Common
Value | Count | Frequency (%)
0 | 54257 | 24.8%
/ | 35677 | 16.3%
1 | 27204 | 12.4%
. | 20100 | 9.2%
2 | 18106 | 8.3%
4 | 11909 | 5.4%
8 | 8721 | 4.0%
7 | 7427 | 3.4%
6 | 7092 | 3.2%
3 | 5560 | 2.5%
Other values (6) | 22972 | 10.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 401526 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 54257 | 13.5%
/ | 35677 | 8.9%
1 | 27204 | 6.8%
t | 20261 | 5.0%
. | 20100 | 5.0%
2 | 18106 | 4.5%
a | 15440 | 3.8%
e | 15140 | 3.8%
p | 14711 | 3.7%
d | 12563 | 3.1%
Other values (43) | 168067 | 41.9%

Interactions


Correlations

(variable) | 연번 | 파일유형
연번 | 1.000 | 0.620
파일유형 | 0.620 | 1.000

(variable) | 연번 | 파일유형
연번 | 1.000 | 0.280
파일유형 | 0.280 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
(index) | 연번 | 파일명 | 파일유형 | 업로드링크
0 | 1 | 3~4학년 방과후 수학 보충 자료.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/1000000/1001800/20160118074228-1344-1.pdf
1 | 2 | (경기)2014년 중점학교 사업계획서-마석고.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/811/20161108153144630.pdf
2 | 3 | (경기)2014년 중점학교 사업계획서-마송중앙초.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/812/20161108153204210.pdf
3 | 4 | (경기)2014년 중점학교 사업계획서-만정중.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/813/20161108153226803.pdf
4 | 5 | (경기)2014년 중점학교 사업계획서-매홀중.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/814/20161108153256365.pdf
5 | 6 | (경기)2014년 중점학교 사업계획서-봉일천고.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/815/20161108153313540.pdf
6 | 7 | (경기)2014년 중점학교 사업계획서-부발중.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/816/20161108153503468.pdf
7 | 8 | (경남)2014년 예비학교 사업계획-합천여자중학교.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/737/20161108165714387.pdf
8 | 9 | (경남)2014년 예비학교 사업계획-화정초등학교.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/738/20161108165731256.pdf
9 | 10 | (경남)2014년 중점학교 사업계획서-.초동초.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/837/20161108165820129.pdf

Last rows
(index) | 연번 | 파일명 | 파일유형 | 업로드링크
5128 | 5129 | 다문화교육 미디어클리핑 뷰어 접속링크 전체.hwp | hwp | /nime_upload/attach/00000/2817/20230413043530636.hwp
5129 | 5130 | 3월 썸네일.png | png | /nime_upload/attach/00000/2817/20230413043530744.png
5130 | 5131 | 제2회 충청다문화포럼 자료집.pdf | pdf | http://file.edu4mc.or.kr/nime_upload/attach/00000/2276/20170216173202980.pdf
5131 | 5132 | 차수희.jpg | jpg | /nime_upload/attach/00000/2821/20230511041607215.jpg
5132 | 5133 | 2023년 한국어학급 담당교원 다문화 역량강화 워크숍(중등).pdf | pdf | /nime_upload/attach/00000/2828/20230511043129970.pdf
5133 | 5134 | 중등.jpg | jpg | /nime_upload/attach/00000/2828/20230511043130170.jpg
5134 | 5135 | 2023년 한국어학급 담당교원 다문화 역량강화 워크숍(초등).pdf | pdf | /nime_upload/attach/00000/2829/20230511043248444.pdf
5135 | 5136 | 초등.jpg | jpg | /nime_upload/attach/00000/2829/20230511043248678.jpg
5136 | 5137 | 2023년 한국어학급 관리자 다문화 역량강화 워크숍(초·중등).pdf | pdf | /nime_upload/attach/00000/2830/20230511043606529.pdf
5137 | 5138 | 관리자.jpg | jpg | /nime_upload/attach/00000/2830/20230511043606651.jpg
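
The last rows show that some upload links are stored as site-relative paths (starting with `/nime_upload/`) while others are absolute URLs. The sketch below normalizes both forms with the standard library. It assumes the relative paths are served from the same host as the absolute links, which the report does not confirm, and the helper name `absolute_link` is hypothetical.

```python
from urllib.parse import urljoin

# Host copied from the absolute links; assumed to serve the relative paths too.
BASE = "http://file.edu4mc.or.kr"

def absolute_link(link):
    """Return link unchanged if it is already absolute, else resolve it against BASE."""
    return urljoin(BASE, link)

relative = "/nime_upload/attach/00000/2817/20230413043530744.png"
already_absolute = "http://file.edu4mc.or.kr/nime_upload/attach/00000/2276/20170216173202980.pdf"

print(absolute_link(relative))
print(absolute_link(already_absolute))
```

`urljoin` leaves a URL that already carries a scheme untouched, so the function can be applied to the whole column without first separating the two forms.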