Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 10000 |
Missing cells | 1 |
Missing cells (%) | < 0.1% |
Duplicate rows | 207 |
Duplicate rows (%) | 2.1% |
Total size in memory | 312.5 KiB |
Average record size in memory | 32.0 B |
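The derived figures in this table follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (rows × variables) and that 1 KiB is 1024 bytes; the variable names are illustrative:

```python
# Raw counts copied from the table above.
n_vars = 3
n_rows = 10_000
missing_cells = 1
duplicate_rows = 207
total_size_kib = 312.5

# Share of duplicate rows among all observations.
duplicate_pct = duplicate_rows / n_rows * 100

# Share of missing cells among all cells (assumed: rows x variables).
missing_pct = missing_cells / (n_rows * n_vars) * 100

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"{duplicate_pct:.1f}%")      # 2.1%
print(f"{missing_pct:.4f}%")        # 0.0033%, i.e. below 0.1%
print(f"{avg_record_bytes:.1f} B")  # 32.0 B
```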
Variable types
Text | 1 |
---|---|
Categorical | 2 |
Dataset
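Description | Full-text file list data held in the Basic Academic Research Data Center (기초학문자료센터) system of the National Research Foundation of Korea. Key fields include the full-text file name and the authoring institution. |
---|---|
Author | National Research Foundation of Korea (한국연구재단) |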
URL | https://www.data.go.kr/data/15092444/fileData.do |
Alerts
Dataset has 207 (2.1%) duplicate rows | Duplicates |
확장자 is highly imbalanced (54.6%) | Imbalance |
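The report does not state how the imbalance figure is computed. One common definition, assumed here, is one minus the Shannon entropy of the class counts normalised by its maximum, so 0 means perfectly balanced classes and values near 1 mean a single class dominates. The function name and the example counts below are hypothetical, not taken from the dataset:

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / log2(k)) over k class counts."""
    k = len(counts)
    if k <= 1:
        return 0.0  # convention assumed here for a single class
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Hypothetical counts, not the report's full 29-class distribution.
balanced = [250, 250, 250, 250]
skewed = [970, 10, 10, 10]

print(round(imbalance_score(balanced), 3))                  # 0.0
print(imbalance_score(skewed) > imbalance_score(balanced))  # True
```

Reproducing the reported 54.6% would need the counts of all 29 distinct values, which the report only partly lists.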
Reproduction
Analysis started | 2023-12-16 15:04:46.935308 |
---|---|
Analysis finished | 2023-12-16 15:04:51.498624 |
Duration | 4.56 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
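The reported duration can be checked from the two timestamps. A short sketch using only the standard library; the format string is an assumption about how the timestamps are written:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-16 15:04:46.935308", FMT)
finished = datetime.strptime("2023-12-16 15:04:51.498624", FMT)

# Elapsed wall-clock time between start and finish.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 4.56 seconds
```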
원문목록명 (full-text file name)
Text
Distinct | 9629 |
---|---|
Distinct (%) | 96.3% |
Missing | 1 |
Missing (%) | < 0.1% |
Memory size | 156.2 KiB |
Length
Max length | 183 |
---|---|
Median length | 114 |
Mean length | 16.591759 |
Min length | 5 |
Characters and Unicode
Total characters | 165901 |
---|---|
Distinct characters | 1137 |
Distinct categories | 15 |
Distinct scripts | 6 |
Distinct blocks | 9 |
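The reported mean length is consistent with dividing the total character count by the number of non-missing values. This is an assumption about how the figure is derived, sketched with the numbers from the tables above:

```python
total_characters = 165_901  # "Total characters" above
n_rows = 10_000             # observations in the dataset
n_missing = 1               # missing values in this column

# Mean length over non-missing values only.
mean_length = total_characters / (n_rows - n_missing)
print(round(mean_length, 6))  # 16.591759
```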
Unique
Unique | 9358 |
---|---|
Unique (%) | 93.6% |
Sample
1st row | A000131.pdf |
---|---|
2nd row | 139-2.jpg |
3rd row | 언어적 입력의 품사가 영아의 초기 어휘발달에 미치는 영향.pdf |
4th row | 2213s4_273.mp3 |
5th row | AS201512.pdf |
Words
Value | Count | Frequency (%) |
표현된 | 121 | 0.8% |
사진 | 117 | 0.7% |
관한 | 86 | 0.5% |
75 | 0.5% | |
연구.pdf | 63 | 0.4% |
대한 | 57 | 0.4% |
picture | 46 | 0.3% |
및 | 41 | 0.3% |
고문서 | 35 | 0.2% |
미치는 | 29 | 0.2% |
Other values (12853) | 15093 | 95.7% |
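Entries such as `연구.pdf` in the word table above suggest that file names are split on whitespace without stripping punctuation. A minimal sketch under that assumption, using sample file names from this report:

```python
from collections import Counter

# Sample file names from the report; the tokenisation rule is an assumption.
names = [
    "A000131.pdf",
    "139-2.jpg",
    "언어적 입력의 품사가 영아의 초기 어휘발달에 미치는 영향.pdf",
    "2213s4_273.mp3",
    "AS201512.pdf",
]

# Split each name on whitespace and tally the resulting tokens.
word_counts = Counter(word for name in names for word in name.split())
print(word_counts.most_common(3))
```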
Most occurring characters
Value | Count | Frequency (%) |
0 | 24266 | 14.6% |
1 | 13213 | 8.0% |
. | 10541 | 6.4% |
p | 8580 | 5.2% |
2 | 7838 | 4.7% |
3 | 7631 | 4.6% |
(space) | 5780 | 3.5% |
5 | 5325 | 3.2% |
4 | 4553 | 2.7% |
_ | 4249 | 2.6% |
Other values (1127) | 73925 | 44.6% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 75640 | 45.6% |
Lowercase Letter | 29727 | 17.9% |
Other Letter | 25573 | 15.4% |
Other Punctuation | 10789 | 6.5% |
Uppercase Letter | 10648 | 6.4% |
Space Separator | 5780 | 3.5% |
Connector Punctuation | 4249 | 2.6% |
Dash Punctuation | 2564 | 1.5% |
Open Punctuation | 453 | 0.3% |
Close Punctuation | 453 | 0.3% |
Other values (5) | 25 | < 0.1% |
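The category names above are Unicode general categories. A rough sketch of how such a tally could be produced with Python's standard `unicodedata` module; whether the report was generated this way is not stated, and the three file names are samples from this report:

```python
import unicodedata
from collections import Counter

names = ["A000131.pdf", "139-2.jpg", "2213s4_273.mp3"]

# Two-letter general category codes: Nd = Decimal Number, Ll = Lowercase Letter,
# Lu = Uppercase Letter, Po = Other Punctuation, Pd = Dash Punctuation,
# Pc = Connector Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)
print(category_counts.most_common())
```

Even in this small sample, decimal digits dominate, as they do in the full column.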
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
의 | 780 | 3.1% |
사 | 524 | 2.0% |
에 | 451 | 1.8% |
장 | 390 | 1.5% |
소 | 387 | 1.5% |
한 | 345 | 1.3% |
구 | 343 | 1.3% |
문 | 342 | 1.3% |
기 | 326 | 1.3% |
서 | 325 | 1.3% |
Other values (1018) | 21360 | 83.5% |
Lowercase Letter
Value | Count | Frequency (%) |
p | 8580 | 28.9% |
m | 3660 | 12.3% |
d | 3577 | 12.0% |
f | 3499 | 11.8% |
g | 2081 | 7.0% |
j | 1963 | 6.6% |
s | 1492 | 5.0% |
e | 597 | 2.0% |
i | 491 | 1.7% |
n | 410 | 1.4% |
Other values (30) | 3377 | 11.4% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 1392 | 13.1% |
G | 1310 | 12.3% |
J | 1089 | 10.2% |
A | 1049 | 9.9% |
S | 1033 | 9.7% |
B | 999 | 9.4% |
M | 736 | 6.9% |
C | 664 | 6.2% |
D | 550 | 5.2% |
N | 447 | 4.2% |
Other values (20) | 1379 | 13.0% |
Decimal Number
Value | Count | Frequency (%) |
0 | 24266 | 32.1% |
1 | 13213 | 17.5% |
2 | 7838 | 10.4% |
3 | 7631 | 10.1% |
5 | 5325 | 7.0% |
4 | 4553 | 6.0% |
6 | 3702 | 4.9% |
7 | 3346 | 4.4% |
8 | 2982 | 3.9% |
9 | 2784 | 3.7% |
Other Punctuation
Value | Count | Frequency (%) |
. | 10541 | 97.7% |
, | 201 | 1.9% |
' | 36 | 0.3% |
# | 5 | < 0.1% |
· | 2 | < 0.1% |
& | 2 | < 0.1% |
; | 1 | < 0.1% |
! | 1 | < 0.1% |
Open Punctuation
Value | Count | Frequency (%) |
( | 411 | 90.7% |
[ | 31 | 6.8% |
『 | 7 | 1.5% |
〈 | 2 | 0.4% |
「 | 2 | 0.4% |
Close Punctuation
Value | Count | Frequency (%) |
) | 411 | 90.7% |
] | 31 | 6.8% |
』 | 7 | 1.5% |
〉 | 2 | 0.4% |
」 | 2 | 0.4% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2561 | 99.9% |
- | 3 | 0.1% |
Math Symbol
Value | Count | Frequency (%) |
~ | 12 | 92.3% |
+ | 1 | 7.7% |
Initial Punctuation
Value | Count | Frequency (%) |
‘ | 3 | 75.0% |
“ | 1 | 25.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 5780 | 100.0% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 4249 | 100.0% |
Final Punctuation
Value | Count | Frequency (%) |
’ | 5 | 100.0% |
Modifier Symbol
Value | Count | Frequency (%) |
` | 2 | 100.0% |
Letter Number
Value | Count | Frequency (%) |
Ⅱ | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 99952 | 60.2% |
Latin | 40347 | 24.3% |
Hangul | 24085 | 14.5% |
Han | 1486 | 0.9% |
Cyrillic | 29 | < 0.1% |
Hiragana | 2 | < 0.1% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
의 | 780 | 3.2% |
사 | 524 | 2.2% |
에 | 451 | 1.9% |
장 | 390 | 1.6% |
소 | 387 | 1.6% |
한 | 345 | 1.4% |
구 | 343 | 1.4% |
문 | 342 | 1.4% |
기 | 326 | 1.4% |
서 | 325 | 1.3% |
Other values (680) | 19872 | 82.5% |
Han
Value | Count | Frequency (%) |
圖 | 103 | 6.9% |
地 | 84 | 5.7% |
縣 | 76 | 5.1% |
輿 | 67 | 4.5% |
東 | 62 | 4.2% |
府 | 33 | 2.2% |
記 | 30 | 2.0% |
日 | 25 | 1.7% |
慶 | 23 | 1.5% |
月 | 20 | 1.3% |
Other values (327) | 963 | 64.8% |
Latin
Value | Count | Frequency (%) |
p | 8580 | 21.3% |
m | 3660 | 9.1% |
d | 3577 | 8.9% |
f | 3499 | 8.7% |
g | 2081 | 5.2% |
j | 1963 | 4.9% |
s | 1492 | 3.7% |
P | 1392 | 3.5% |
G | 1310 | 3.2% |
J | 1089 | 2.7% |
Other values (43) | 11704 | 29.0% |
Common
Value | Count | Frequency (%) |
0 | 24266 | 24.3% |
1 | 13213 | 13.2% |
. | 10541 | 10.5% |
2 | 7838 | 7.8% |
3 | 7631 | 7.6% |
(space) | 5780 | 5.8% |
5 | 5325 | 5.3% |
4 | 4553 | 4.6% |
_ | 4249 | 4.3% |
6 | 3702 | 3.7% |
Other values (28) | 12854 | 12.9% |
Cyrillic
Value | Count | Frequency (%) |
с | 5 | 17.2% |
и | 4 | 13.8% |
п | 2 | 6.9% |
е | 2 | 6.9% |
к | 2 | 6.9% |
л | 2 | 6.9% |
я | 1 | 3.4% |
о | 1 | 3.4% |
г | 1 | 3.4% |
д | 1 | 3.4% |
Other values (8) | 8 | 27.6% |
Hiragana
Value | Count | Frequency (%) |
の | 2 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 140262 | 84.5% |
Hangul | 24085 | 14.5% |
CJK | 1448 | 0.9% |
CJK Compat Ideographs | 38 | < 0.1% |
Cyrillic | 29 | < 0.1% |
None | 27 | < 0.1% |
Punctuation | 9 | < 0.1% |
Hiragana | 2 | < 0.1% |
Number Forms | 1 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 24266 | 17.3% |
1 | 13213 | 9.4% |
. | 10541 | 7.5% |
p | 8580 | 6.1% |
2 | 7838 | 5.6% |
3 | 7631 | 5.4% |
(space) | 5780 | 4.1% |
5 | 5325 | 3.8% |
4 | 4553 | 3.2% |
_ | 4249 | 3.0% |
Other values (69) | 48286 | 34.4% |
Hangul
Value | Count | Frequency (%) |
의 | 780 | 3.2% |
사 | 524 | 2.2% |
에 | 451 | 1.9% |
장 | 390 | 1.6% |
소 | 387 | 1.6% |
한 | 345 | 1.4% |
구 | 343 | 1.4% |
문 | 342 | 1.4% |
기 | 326 | 1.4% |
서 | 325 | 1.3% |
Other values (680) | 19872 | 82.5% |
CJK
Value | Count | Frequency (%) |
圖 | 103 | 7.1% |
地 | 84 | 5.8% |
縣 | 76 | 5.2% |
輿 | 67 | 4.6% |
東 | 62 | 4.3% |
府 | 33 | 2.3% |
記 | 30 | 2.1% |
日 | 25 | 1.7% |
慶 | 23 | 1.6% |
月 | 20 | 1.4% |
Other values (313) | 925 | 63.9% |
CJK Compat Ideographs
Value | Count | Frequency (%) |
嶺 | 18 | 47.4% |
醴 | 3 | 7.9% |
龍 | 3 | 7.9% |
論 | 2 | 5.3% |
利 | 2 | 5.3% |
寧 | 2 | 5.3% |
寧 | 1 | 2.6% |
禮 | 1 | 2.6% |
律 | 1 | 2.6% |
奈 | 1 | 2.6% |
Other values (4) | 4 | 10.5% |
None
Value | Count | Frequency (%) |
『 | 7 | 25.9% |
』 | 7 | 25.9% |
- | 3 | 11.1% |
〈 | 2 | 7.4% |
· | 2 | 7.4% |
〉 | 2 | 7.4% |
」 | 2 | 7.4% |
「 | 2 | 7.4% |
Punctuation
Value | Count | Frequency (%) |
’ | 5 | 55.6% |
‘ | 3 | 33.3% |
“ | 1 | 11.1% |
Cyrillic
Value | Count | Frequency (%) |
с | 5 | 17.2% |
и | 4 | 13.8% |
п | 2 | 6.9% |
е | 2 | 6.9% |
к | 2 | 6.9% |
л | 2 | 6.9% |
я | 1 | 3.4% |
о | 1 | 3.4% |
г | 1 | 3.4% |
д | 1 | 3.4% |
Other values (8) | 8 | 27.6% |
Hiragana
Value | Count | Frequency (%) |
の | 2 | 100.0% |
Number Forms
Value | Count | Frequency (%) |
Ⅱ | 1 | 100.0% |
확장자 (file extension)
Categorical
IMBALANCE
Distinct | 29 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
pdf | 3437 |
---|---|
mp3 | 3073 |
jpg | 1906 |
JPG | 1076 |
wmv | 272 |
Other values (24) | 236 |
Length
Max length | 4 |
---|---|
Median length | 3 |
Mean length | 3.0049 |
Min length | 2 |
Unique
Unique | 9 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | pdf |
---|---|
2nd row | jpg |
3rd row | pdf |
4th row | mp3 |
5th row | pdf |
Common Values
Value | Count | Frequency (%) |
pdf | 3437 | 34.4% |
mp3 | 3073 | 30.7% |
jpg | 1906 | 19.1% |
JPG | 1076 | 10.8% |
wmv | 272 | 2.7% |
TXT | 43 | 0.4% |
jpeg | 40 | 0.4% |
sav | 30 | 0.3% |
xls | 27 | 0.3% |
MP3 | 23 | 0.2% |
Other values (19) | 73 | 0.7% |
Words
Value | Count | Frequency (%) |
pdf | 3443 | 34.4% |
mp3 | 3096 | 31.0% |
jpg | 2982 | 29.8% |
wmv | 272 | 2.7% |
txt | 44 | 0.4% |
jpeg | 40 | 0.4% |
xls | 32 | 0.3% |
sav | 30 | 0.3% |
zip | 21 | 0.2% |
htm | 9 | 0.1% |
Other values (12) | 31 | 0.3% |
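The lowercased counts in the table above are what case-folding the Common Values table produces: `jpg` and `JPG` merge, as do `mp3` and `MP3`. A small sketch of that merge using only the counts listed in this report; the dictionary is a subset of the values, not the full column:

```python
from collections import Counter

# Case-sensitive counts copied from the Common Values table (subset only).
raw_counts = {"mp3": 3073, "MP3": 23, "jpg": 1906, "JPG": 1076}

merged = Counter()
for ext, count in raw_counts.items():
    merged[ext.lower()] += count  # fold case so "JPG" and "jpg" share a key

print(dict(merged))  # {'mp3': 3096, 'jpg': 2982}
```

Normalising case before profiling reduces the number of distinct extensions.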
작성기관 (authoring institution)
Categorical
Distinct | 9 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
SungkyunkwanUniv | 3295 |
---|---|
KOSSDA | 2281 |
KoreaUniv | 1405 |
ChonbukUniv | 1345 |
KongjuUniv | 706 |
Other values (4) | 968 |
Length
Max length | 16 |
---|---|
Median length | 11 |
Mean length | 11.0771 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | ChonbukUniv |
---|---|
2nd row | KoreaUniv |
3rd row | KOSSDA |
4th row | SungkyunkwanUniv |
5th row | KoreaUniv |
Common Values
Value | Count | Frequency (%) |
SungkyunkwanUniv | 3295 | |
KOSSDA | 2281 | |
KoreaUniv | 1405 | |
ChonbukUniv | 1345 | |
KongjuUniv | 706 | 7.1% |
SeoulUniv | 351 | 3.5% |
ChonnamUniv | 315 | 3.1% |
MyongjiUniv | 293 | 2.9% |
-1 | 9 | 0.1% |
Words
Value | Count | Frequency (%) |
sungkyunkwanuniv | 3295 | |
kossda | 2281 | |
koreauniv | 1405 | |
chonbukuniv | 1345 | |
kongjuuniv | 706 | 7.1% |
seouluniv | 351 | 3.5% |
chonnamuniv | 315 | 3.1% |
myongjiuniv | 293 | 2.9% |
1 | 9 | 0.1% |
Correlations
Variable | 확장자 | 작성기관 |
---|---|---|
확장자 | 1.000 | 0.713 |
작성기관 | 0.713 | 1.000 |
Variable | 작성기관 | 확장자 |
---|---|---|
작성기관 | 1.000 | 0.359 |
확장자 | 0.359 | 1.000 |
Variable | 확장자 | 작성기관 |
---|---|---|
확장자 | 1.000 | 0.359 |
작성기관 | 0.359 | 1.000 |
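The report does not say which coefficient each matrix above uses. For two categorical variables, one common association measure is Cramér's V, which ranges from 0 (no association) to 1 (one variable determines the other). The sketch below is a plain, uncorrected implementation offered as an illustration only. It is not claimed to reproduce 0.713 or 0.359, and the function name and example data are hypothetical:

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0  # undefined for a constant variable; 0.0 assumed here
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Hypothetical data: extension fully determined by institution, then unrelated.
print(cramers_v(["jpg", "jpg", "mp3", "mp3"], ["A", "A", "B", "B"]))  # 1.0
print(cramers_v(["jpg", "jpg", "mp3", "mp3"], ["A", "B", "A", "B"]))  # 0.0
```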
First rows
Index | 원문목록명 | 확장자 | 작성기관 |
---|---|---|---|
15303 | A000131.pdf | pdf | ChonbukUniv |
46395 | 139-2.jpg | jpg | KoreaUniv |
84200 | 언어적 입력의 품사가 영아의 초기 어휘발달에 미치는 영향.pdf | pdf | KOSSDA |
77296 | 2213s4_273.mp3 | mp3 | SungkyunkwanUniv |
2601 | AS201512.pdf | pdf | KoreaUniv |
6176 | BS20704.pdf | pdf | KOSSDA |
42188 | 1000100550808_47.mp3 | mp3 | SungkyunkwanUniv |
17390 | 44950010.JPG | JPG | KOSSDA |
59433 | 1000100570071_17.mp3 | mp3 | SungkyunkwanUniv |
39238 | m290603.jpg | jpg | KoreaUniv |
Last rows
Index | 원문목록명 | 확장자 | 작성기관 |
---|---|---|---|
27278 | 김동규소장 연길7.jpg | jpg | ChonbukUniv |
70768 | 한국교육의 종합이해와 미래구상III(면담조사자료집).pdf | pdf | KOSSDA |
16164 | 고05948 copy.jpg | jpg | ChonbukUniv |
63542 | 1000100570266_98.mp3 | mp3 | SungkyunkwanUniv |
30276 | 박상호소장 통문1.JPG | JPG | ChonbukUniv |
27006 | 1000100560084_38.mp3 | mp3 | SungkyunkwanUniv |
16193 | 중간보고서.pdf | pdf | ChonbukUniv |
73099 | S1MM06-12.jpg | jpg | KOSSDA |
17110 | 2222s4_659.mp3 | mp3 | SungkyunkwanUniv |
1170 | A200056.pdf | pdf | KoreaUniv |
Most frequently occurring
Index | 원문목록명 | 확장자 | 작성기관 | # duplicates |
---|---|---|---|---|
1 | 1-1.jpg | jpg | KoreaUniv | 4 |
14 | 16-1.jpg | jpg | KoreaUniv | 4 |
21 | 26-1.jpg | jpg | KoreaUniv | 4 |
105 | B000021.pdf | pdf | ChonnamUniv | 4 |
107 | B000051.pdf | pdf | ChonnamUniv | 4 |
112 | B000081.pdf | pdf | ChonnamUniv | 4 |
131 | B000381.pdf | pdf | ChonnamUniv | 4 |
171 | G000121.pdf | pdf | MyongjiUniv | 4 |
177 | G000571.pdf | pdf | MyongjiUniv | 4 |
3 | 10-2.jpg | jpg | KoreaUniv | 3 |
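A table like the one above can be built by counting how often each full row occurs and keeping those that occur more than once. A minimal sketch with illustrative rows; the tuple layout (원문목록명, 확장자, 작성기관) mirrors the columns, and the data is a made-up subset, not the real dataset:

```python
from collections import Counter

# Illustrative rows (원문목록명, 확장자, 작성기관); not the full dataset.
rows = [
    ("1-1.jpg", "jpg", "KoreaUniv"),
    ("1-1.jpg", "jpg", "KoreaUniv"),
    ("10-2.jpg", "jpg", "KoreaUniv"),
    ("1-1.jpg", "jpg", "KoreaUniv"),
    ("139-2.jpg", "jpg", "KoreaUniv"),
]

occurrences = Counter(rows)
# Keep only rows that appear more than once, most frequent first.
duplicated = [(row, n) for row, n in occurrences.most_common() if n > 1]
print(duplicated)  # [(('1-1.jpg', 'jpg', 'KoreaUniv'), 3)]
```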