Overview

Dataset statistics

Number of variables: 5
Number of observations: 1127
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 46.4 KiB
Average record size in memory: 42.1 B

Variable types

Categorical: 1
Numeric: 2
Text: 2

Dataset

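Description: A list of the papers in the scholarly proceedings published by the National Academy of Sciences, Republic of Korea (대한민국학술원), divided into natural sciences (자연과학) and humanities and social sciences (인문사회). Fields: field category (구분), sequence number (순번), publication year (년도), author (저자), paper title (제목).
Author: Ministry of Education, Secretariat of the National Academy of Sciences (교육부 학술원사무국)
URL: https://www.data.go.kr/data/15067006/fileData.do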

Alerts

순번 is highly overall correlated with 년도 [High correlation]
년도 is highly overall correlated with 순번 [High correlation]

Reproduction

Analysis started: 2024-03-16 04:17:48.025190
Analysis finished: 2024-03-16 04:17:51.368071
Duration: 3.34 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
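
A report of this kind can in principle be regenerated from the source CSV. The following is a minimal sketch, not the configuration actually used for this run: the function name `build_profile`, the file paths, the report title, and the UTF-8 encoding are all assumptions.

```python
from pathlib import Path


def build_profile(csv_path: str, html_path: str = "report.html") -> Path:
    """Profile a CSV file and write the HTML report to html_path."""
    # Third-party imports live inside the function so this module can be
    # loaded even when pandas / ydata-profiling are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is UTF-8. Korean public datasets are sometimes
    # CP949-encoded; in that case pass encoding="cp949" instead.
    df = pd.read_csv(csv_path, encoding="utf-8")

    # Hypothetical title; the title of the original report is not known.
    report = ProfileReport(df, title="Profiling Report")
    report.to_file(html_path)
    return Path(html_path)
```

Calling `build_profile("papers.csv")` with a locally downloaded copy of the dataset (the file name here is hypothetical) would write `report.html` to the working directory.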

Variables

구분
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB
자연과학: 599
인문사회: 528

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 인문사회
2nd row: 인문사회
3rd row: 인문사회
4th row: 인문사회
5th row: 인문사회

Common Values

Value | Count | Frequency (%)
자연과학 | 599 | 53.1%
인문사회 | 528 | 46.9%
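
The frequency column is each count divided by the number of observations. As a small check of the figures above (a sketch that uses only the counts printed in this report):

```python
from collections import Counter

# Counts as reported for the categorical variable 구분.
counts = Counter({"자연과학": 599, "인문사회": 528})
total = sum(counts.values())  # number of observations

# Share of each category as a percentage, rounded to one decimal place.
shares = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, n in counts.most_common():
    print(f"{value}\t{n}\t{shares[value]}%")
```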

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
자연과학 | 599 | 53.1%
인문사회 | 528 | 46.9%

순번
Real number (ℝ)

HIGH CORRELATION 

Distinct: 599
Distinct (%): 53.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 283.36823
Minimum: 1
Maximum: 599
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 29
Q1: 141.5
Median: 282
Q3: 423
95-th percentile: 542.7
Maximum: 599
Range: 598
Interquartile range (IQR): 281.5

Descriptive statistics

Standard deviation: 164.6629
Coefficient of variation (CV): 0.5810916
Kurtosis: -1.1463141
Mean: 283.36823
Median Absolute Deviation (MAD): 141
Skewness: 0.039555428
Sum: 319356
Variance: 27113.87
Monotonicity: Not monotonic
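
The report does not say how 순번 is assigned. The sketch below assumes that it restarts at 1 within each 구분 category (1 to 599 for 자연과학, 1 to 528 for 인문사회) and checks whether that assumption is consistent with the statistics above. The quantile method (`inclusive`, linear interpolation) is also an assumption about how the report computes Q1 and Q3.

```python
import statistics

# Assumption (not stated in the report): 순번 restarts at 1 within each 구분
# category, running to 599 for 자연과학 and to 528 for 인문사회.
seq = sorted(list(range(1, 600)) + list(range(1, 529)))

n = len(seq)
total = sum(seq)
distinct = len(set(seq))
mean = total / n
variance = statistics.variance(seq)  # sample variance (n - 1 denominator)
std = statistics.stdev(seq)          # sample standard deviation
q1, median, q3 = statistics.quantiles(seq, n=4, method="inclusive")

print(n, total, distinct)
print(round(mean, 5), round(std, 4), round(variance, 2))
print(q1, median, q3, q3 - q1)
```

Under this assumption the values 1 to 528 each occur twice and 529 to 599 once, which would also explain the counts of 2 and 1 in the value tables for this variable.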
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 2 | 0.2%
349 | 2 | 0.2%
363 | 2 | 0.2%
362 | 2 | 0.2%
361 | 2 | 0.2%
360 | 2 | 0.2%
359 | 2 | 0.2%
358 | 2 | 0.2%
357 | 2 | 0.2%
356 | 2 | 0.2%
Other values (589) | 1107 | 98.2%

Minimum 10 values
Value | Count | Frequency (%)
1 | 2 | 0.2%
2 | 2 | 0.2%
3 | 2 | 0.2%
4 | 2 | 0.2%
5 | 2 | 0.2%
6 | 2 | 0.2%
7 | 2 | 0.2%
8 | 2 | 0.2%
9 | 2 | 0.2%
10 | 2 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
599 | 1 | 0.1%
598 | 1 | 0.1%
597 | 1 | 0.1%
596 | 1 | 0.1%
595 | 1 | 0.1%
594 | 1 | 0.1%
593 | 1 | 0.1%
592 | 1 | 0.1%
591 | 1 | 0.1%
590 | 1 | 0.1%

년도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 64
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1995.3736
Minimum: 1959
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.0 KiB

Quantile statistics

Minimum: 1959
5-th percentile: 1966
Q1: 1981
Median: 1995
Q3: 2011
95-th percentile: 2022
Maximum: 2023
Range: 64
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 17.658788
Coefficient of variation (CV): 0.0088498658
Kurtosis: -1.1312322
Mean: 1995.3736
Median Absolute Deviation (MAD): 15
Skewness: -0.090075557
Sum: 2248786
Variance: 311.8328
Monotonicity: Not monotonic
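
Several of these statistics follow from others by simple arithmetic. The sketch below recomputes them from the printed values only. Small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the report for 년도.
n = 1127
total = 2248786
std = 17.658788
minimum, maximum = 1959, 2023
q1, q3 = 1981, 2011

mean = total / n                  # arithmetic mean = sum / observations
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
value_range = maximum - minimum
iqr = q3 - q1

print(f"mean={mean:.4f} cv={cv:.10f} variance={variance:.4f}")
print(f"range={value_range} iqr={iqr}")
```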
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
2023 | 40 | 3.5%
2020 | 30 | 2.7%
2019 | 26 | 2.3%
1977 | 24 | 2.1%
1980 | 24 | 2.1%
1984 | 24 | 2.1%
2022 | 24 | 2.1%
1983 | 24 | 2.1%
1982 | 24 | 2.1%
1981 | 24 | 2.1%
Other values (54) | 863 | 76.6%

Minimum 10 values
Value | Count | Frequency (%)
1959 | 8 | 0.7%
1960 | 9 | 0.8%
1961 | 8 | 0.7%
1963 | 8 | 0.7%
1964 | 10 | 0.9%
1965 | 10 | 0.9%
1966 | 7 | 0.6%
1967 | 4 | 0.4%
1968 | 4 | 0.4%
1969 | 8 | 0.7%

Maximum 10 values
Value | Count | Frequency (%)
2023 | 40 | 3.5%
2022 | 24 | 2.1%
2021 | 13 | 1.2%
2020 | 30 | 2.7%
2019 | 26 | 2.3%
2018 | 23 | 2.0%
2017 | 20 | 1.8%
2016 | 22 | 2.0%
2015 | 23 | 2.0%
2014 | 20 | 1.8%

저자
Text

Distinct: 628
Distinct (%): 55.7%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Length

Max length: 18
Median length: 3
Mean length: 3.6078083
Min length: 2

Characters and Unicode

Total characters: 4066
Distinct characters: 224
Distinct categories: 6
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
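
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The category names used in this report map to two-letter codes: Other Letter is `Lo`, Space Separator is `Zs`, Lowercase Letter is `Ll`, Uppercase Letter is `Lu`, Other Punctuation is `Po`, and Dash Punctuation is `Pd`. This sketch is not the report's own implementation, and the helper name `category_counts` is made up for the example. Script and block properties are not available from the standard library and are not covered here.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# Author strings taken from the sample rows of this report.
authors = ["김두헌", "황병국 외", "김영중 외"]
counts = sum((category_counts(a) for a in authors), Counter())
print(counts)
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category table for this variable.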

Unique

Unique: 397
Unique (%): 35.2%

Sample

1st row: 김두헌
2nd row: 최현배
3rd row: 이숭녕
4th row: 신기석
5th row: 이숭령
Value | Count | Frequency (%)
[value lost in extraction] | 291 | 20.3%
박세희 | 22 | 1.5%
기우항 | 15 | 1.0%
박정기 | 13 | 0.9%
고승제 | 13 | 0.9%
김준보 | 11 | 0.8%
김철수 | 11 | 0.8%
이숭녕 | 10 | 0.7%
강영선 | 9 | 0.6%
김재근 | 9 | 0.6%
Other values (543) | 1029 | 71.8%

Most occurring characters

ValueCountFrequency (%)
318
 
7.8%
291
 
7.2%
227
 
5.6%
220
 
5.4%
101
 
2.5%
87
 
2.1%
86
 
2.1%
86
 
2.1%
66
 
1.6%
63
 
1.5%
Other values (214) 2521
62.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3661 | 90.0%
Space Separator | 318 | 7.8%
Lowercase Letter | 54 | 1.3%
Uppercase Letter | 19 | 0.5%
Other Punctuation | 12 | 0.3%
Dash Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
291
 
7.9%
227
 
6.2%
220
 
6.0%
101
 
2.8%
87
 
2.4%
86
 
2.3%
86
 
2.3%
66
 
1.8%
63
 
1.7%
60
 
1.6%
Other values (190) 2374
64.8%
Lowercase Letter
ValueCountFrequency (%)
n 8
14.8%
i 6
11.1%
o 5
9.3%
u 5
9.3%
k 5
9.3%
e 5
9.3%
g 5
9.3%
a 5
9.3%
h 3
 
5.6%
c 3
 
5.6%
Other values (3) 4
7.4%
Uppercase Letter
ValueCountFrequency (%)
P 5
26.3%
H 4
21.1%
Y 3
15.8%
S 3
15.8%
U 1
 
5.3%
G 1
 
5.3%
B 1
 
5.3%
K 1
 
5.3%
Space Separator
ValueCountFrequency (%)
318
100.0%
Other Punctuation
ValueCountFrequency (%)
, 12
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3652 | 89.8%
Common | 332 | 8.2%
Latin | 73 | 1.8%
Han | 9 | 0.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
291
 
8.0%
227
 
6.2%
220
 
6.0%
101
 
2.8%
87
 
2.4%
86
 
2.4%
86
 
2.4%
66
 
1.8%
63
 
1.7%
60
 
1.6%
Other values (181) 2365
64.8%
Latin
ValueCountFrequency (%)
n 8
 
11.0%
i 6
 
8.2%
o 5
 
6.8%
u 5
 
6.8%
k 5
 
6.8%
e 5
 
6.8%
g 5
 
6.8%
P 5
 
6.8%
a 5
 
6.8%
H 4
 
5.5%
Other values (11) 20
27.4%
Han
ValueCountFrequency (%)
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Common
ValueCountFrequency (%)
318
95.8%
, 12
 
3.6%
- 2
 
0.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3652 | 89.8%
ASCII | 405 | 10.0%
CJK | 6 | 0.1%
CJK Compat Ideographs | 3 | 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
318
78.5%
, 12
 
3.0%
n 8
 
2.0%
i 6
 
1.5%
o 5
 
1.2%
u 5
 
1.2%
k 5
 
1.2%
e 5
 
1.2%
g 5
 
1.2%
P 5
 
1.2%
Other values (14) 31
 
7.7%
Hangul
ValueCountFrequency (%)
291
 
8.0%
227
 
6.2%
220
 
6.0%
101
 
2.8%
87
 
2.4%
86
 
2.4%
86
 
2.4%
66
 
1.8%
63
 
1.7%
60
 
1.6%
Other values (181) 2365
64.8%
CJK Compat Ideographs
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
CJK
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%

제목
Text

Distinct: 1125
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Length

Max length: 178
Median length: 105
Mean length: 29.684117
Min length: 3

Characters and Unicode

Total characters: 33454
Distinct characters: 1204
Distinct categories: 15
Distinct scripts: 5
Distinct blocks: 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1123
Unique (%): 99.6%
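
The text statistics for 제목 can be cross-checked against each other. The sketch below uses only numbers printed in this report and reads "distinct" as the number of different titles and "unique" as the number of titles that occur exactly once.

```python
n_rows = 1127         # observations
total_chars = 33454   # total characters in 제목
distinct = 1125       # distinct titles
unique = 1123         # titles that occur exactly once

mean_length = total_chars / n_rows
distinct_pct = round(100 * distinct / n_rows, 1)
unique_pct = round(100 * unique / n_rows, 1)

repeated_titles = distinct - unique   # titles occurring more than once
rows_in_repeats = n_rows - unique     # rows taken up by those titles

print(f"{mean_length:.6f} {distinct_pct} {unique_pct}")
print(repeated_titles, rows_in_repeats)
```

Two repeated titles covering four rows means that each of those two titles appears exactly twice.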

Sample

1st row: 존재의 질서
2nd row: "달아"의 읽기에 대하여
3rd row: "-" 음고재론
4th row: 조선문제에 관한 로청 외교관계
5th row: 중기어의 이와작용의 고찰

Value | Count | Frequency (%)
연구 | 225 | 3.5%
관한 | 176 | 2.7%
of | 131 | 2.0%
the | 77 | 1.2%
in | 62 | 1.0%
[value lost in extraction] | 62 | 1.0%
대한 | 58 | 0.9%
[value lost in extraction] | 55 | 0.9%
and | 52 | 0.8%
a | 38 | 0.6%
Other values (4162) | 5504 | 85.5%

Most occurring characters

ValueCountFrequency (%)
5565
 
16.6%
e 1102
 
3.3%
1003
 
3.0%
o 934
 
2.8%
i 934
 
2.8%
a 894
 
2.7%
n 827
 
2.5%
t 736
 
2.2%
r 687
 
2.1%
s 545
 
1.6%
Other values (1194) 20227
60.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15795 | 47.2%
Lowercase Letter | 9915 | 29.6%
Space Separator | 5565 | 16.6%
Uppercase Letter | 1369 | 4.1%
Decimal Number | 198 | 0.6%
Other Punctuation | 180 | 0.5%
Dash Punctuation | 160 | 0.5%
Open Punctuation | 97 | 0.3%
Close Punctuation | 97 | 0.3%
Math Symbol | 25 | 0.1%
Other values (5) | 53 | 0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1003
 
6.4%
498
 
3.2%
475
 
3.0%
365
 
2.3%
319
 
2.0%
263
 
1.7%
240
 
1.5%
208
 
1.3%
206
 
1.3%
205
 
1.3%
Other values (1089) 12013
76.1%
Lowercase Letter
ValueCountFrequency (%)
e 1102
11.1%
o 934
 
9.4%
i 934
 
9.4%
a 894
 
9.0%
n 827
 
8.3%
t 736
 
7.4%
r 687
 
6.9%
s 545
 
5.5%
l 448
 
4.5%
c 391
 
3.9%
Other values (18) 2417
24.4%
Uppercase Letter
ValueCountFrequency (%)
S 141
 
10.3%
C 122
 
8.9%
A 113
 
8.3%
T 99
 
7.2%
P 93
 
6.8%
R 79
 
5.8%
K 69
 
5.0%
M 66
 
4.8%
O 60
 
4.4%
N 60
 
4.4%
Other values (16) 467
34.1%
Other Punctuation
ValueCountFrequency (%)
, 62
34.4%
: 32
17.8%
. 23
 
12.8%
· 19
 
10.6%
' 17
 
9.4%
/ 11
 
6.1%
" 8
 
4.4%
; 3
 
1.7%
? 3
 
1.7%
! 1
 
0.6%
Decimal Number
ValueCountFrequency (%)
1 52
26.3%
2 29
14.6%
3 25
12.6%
9 23
11.6%
0 19
 
9.6%
4 19
 
9.6%
5 14
 
7.1%
7 7
 
3.5%
6 7
 
3.5%
8 3
 
1.5%
Math Symbol
ValueCountFrequency (%)
< 5
20.0%
> 5
20.0%
4
16.0%
= 3
12.0%
~ 3
12.0%
+ 3
12.0%
1
 
4.0%
1
 
4.0%
Letter Number
ValueCountFrequency (%)
10
50.0%
5
25.0%
2
 
10.0%
2
 
10.0%
1
 
5.0%
Open Punctuation
ValueCountFrequency (%)
( 74
76.3%
10
 
10.3%
[ 7
 
7.2%
6
 
6.2%
Close Punctuation
ValueCountFrequency (%)
) 74
76.3%
10
 
10.3%
] 7
 
7.2%
6
 
6.2%
Dash Punctuation
ValueCountFrequency (%)
- 158
98.8%
2
 
1.2%
Initial Punctuation
ValueCountFrequency (%)
8
53.3%
7
46.7%
Final Punctuation
ValueCountFrequency (%)
8
53.3%
7
46.7%
Space Separator
ValueCountFrequency (%)
5565
100.0%
Modifier Letter
ValueCountFrequency (%)
2
100.0%
Other Number
ValueCountFrequency (%)
² 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 14398 | 43.0%
Latin | 11304 | 33.8%
Common | 6353 | 19.0%
Han | 1397 | 4.2%
Greek | 2 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1003
 
7.0%
498
 
3.5%
475
 
3.3%
365
 
2.5%
319
 
2.2%
263
 
1.8%
240
 
1.7%
208
 
1.4%
206
 
1.4%
205
 
1.4%
Other values (576) 10616
73.7%
Han
ValueCountFrequency (%)
34
 
2.4%
34
 
2.4%
24
 
1.7%
23
 
1.6%
21
 
1.5%
19
 
1.4%
13
 
0.9%
12
 
0.9%
11
 
0.8%
11
 
0.8%
Other values (503) 1195
85.5%
Latin
ValueCountFrequency (%)
e 1102
 
9.7%
o 934
 
8.3%
i 934
 
8.3%
a 894
 
7.9%
n 827
 
7.3%
t 736
 
6.5%
r 687
 
6.1%
s 545
 
4.8%
l 448
 
4.0%
c 391
 
3.5%
Other values (48) 3806
33.7%
Common
ValueCountFrequency (%)
5565
87.6%
- 158
 
2.5%
( 74
 
1.2%
) 74
 
1.2%
, 62
 
1.0%
1 52
 
0.8%
: 32
 
0.5%
2 29
 
0.5%
3 25
 
0.4%
. 23
 
0.4%
Other values (35) 259
 
4.1%
Greek
ValueCountFrequency (%)
β 1
50.0%
α 1
50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 17545 | 52.4%
Hangul | 14396 | 43.0%
CJK | 1361 | 4.1%
None | 58 | 0.2%
CJK Compat Ideographs | 36 | 0.1%
Punctuation | 30 | 0.1%
Number Forms | 20 | 0.1%
Math Operators | 6 | < 0.1%
Compat Jamo | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5565
31.7%
e 1102
 
6.3%
o 934
 
5.3%
i 934
 
5.3%
a 894
 
5.1%
n 827
 
4.7%
t 736
 
4.2%
r 687
 
3.9%
s 545
 
3.1%
l 448
 
2.6%
Other values (73) 4873
27.8%
Hangul
ValueCountFrequency (%)
1003
 
7.0%
498
 
3.5%
475
 
3.3%
365
 
2.5%
319
 
2.2%
263
 
1.8%
240
 
1.7%
208
 
1.4%
206
 
1.4%
205
 
1.4%
Other values (575) 10614
73.7%
CJK
ValueCountFrequency (%)
34
 
2.5%
34
 
2.5%
24
 
1.8%
23
 
1.7%
21
 
1.5%
19
 
1.4%
13
 
1.0%
12
 
0.9%
11
 
0.8%
11
 
0.8%
Other values (482) 1159
85.2%
None
ValueCountFrequency (%)
· 19
32.8%
10
17.2%
10
17.2%
6
 
10.3%
6
 
10.3%
2
 
3.4%
2
 
3.4%
² 1
 
1.7%
β 1
 
1.7%
α 1
 
1.7%
Number Forms
ValueCountFrequency (%)
10
50.0%
5
25.0%
2
 
10.0%
2
 
10.0%
1
 
5.0%
Punctuation
ValueCountFrequency (%)
8
26.7%
8
26.7%
7
23.3%
7
23.3%
CJK Compat Ideographs
ValueCountFrequency (%)
5
13.9%
4
 
11.1%
4
 
11.1%
2
 
5.6%
2
 
5.6%
2
 
5.6%
2
 
5.6%
2
 
5.6%
1
 
2.8%
1
 
2.8%
Other values (11) 11
30.6%
Math Operators
ValueCountFrequency (%)
4
66.7%
1
 
16.7%
1
 
16.7%
Compat Jamo
ValueCountFrequency (%)
2
100.0%

Interactions

[Four interaction plots (Matplotlib v3.7.2 SVG images)]

Correlations

 | 구분 | 순번 | 년도
구분 | 1.000 | 0.270 | 0.103
순번 | 0.270 | 1.000 | 0.966
년도 | 0.103 | 0.966 | 1.000

 | 순번 | 년도 | 구분
순번 | 1.000 | 0.992 | 0.206
년도 | 0.992 | 1.000 | 0.075
구분 | 0.206 | 0.075 | 1.000
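
The "High correlation" alerts near the top of the report correspond to the large off-diagonal entries in these matrices. The sketch below flags such pairs in the first matrix. The threshold of 0.5 is an assumption chosen for illustration, not a value stated in this report.

```python
from itertools import combinations

names = ["구분", "순번", "년도"]
# First correlation matrix as printed in the report.
matrix = [
    [1.000, 0.270, 0.103],
    [0.270, 1.000, 0.966],
    [0.103, 0.966, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off for "high" correlation; adjust as needed

# Visit each unordered pair once (upper triangle of the matrix).
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i, j in combinations(range(len(names)), 2)
    if abs(matrix[i][j]) >= THRESHOLD
]
print(high_pairs)
```

With these inputs only the pair 순번 and 년도 is flagged, which matches the two alerts.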

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
Index | 구분 | 순번 | 년도 | 저자 | 제목
0 | 인문사회 | 1 | 1959 | 김두헌 | 존재의 질서
1 | 인문사회 | 2 | 1959 | 최현배 | "달아"의 읽기에 대하여
2 | 인문사회 | 3 | 1959 | 이숭녕 | "-" 음고재론
3 | 인문사회 | 4 | 1959 | 신기석 | 조선문제에 관한 로청 외교관계
4 | 인문사회 | 5 | 1960 | 이숭령 | 중기어의 이와작용의 고찰
5 | 인문사회 | 6 | 1960 | 김방한 | 몽고어 Monguor방언의 어두자음군에 관한 고찰
6 | 인문사회 | 7 | 1960 | 신기석 | 조선국의 미구파사에 대한 청국의 간섭
7 | 인문사회 | 8 | 1961 | 김태길 | 평가판단의 논리에 관하여.
8 | 인문사회 | 9 | 1961 | 남광우 | 사동,피동형의 역사적 고찰
9 | 인문사회 | 10 | 1961 | 김방한 | 한.몽대역어휘집 관한 연구

Last rows
Index | 구분 | 순번 | 년도 | 저자 | 제목
1117 | 자연과학 | 590 | 2023 | 황병국 외 | 식물의 비생물적 스트레스내성발현에서 고추 아스코르베이트 퍼옥시다아제1(CaPOA1)의 기능적 역할
1118 | 자연과학 | 591 | 2023 | 박세희 | History of the Metatheorem in Ordered Fixed Point Theory
1119 | 자연과학 | 592 | 2023 | 기우항 | Semi-invariant Submanifolds in a Complex Space Form Satisfying ∇
1120 | 자연과학 | 593 | 2023 | 우경식 | 국제해양시추탐사프로그램(IODP)의 중요성과 대한민국
1121 | 자연과학 | 594 | 2023 | 김영중 외 | 천연물의약품 산업 혁신을 위한 정책제안 연구
1122 | 자연과학 | 595 | 2023 | 송진원 | 차세대염기서열분석법을 이용한 한타바이러스 감염 정밀 분석
1123 | 자연과학 | 596 | 2023 | 한재용 외 | 조류 원시생식세포 매개의 유전자 편집 및 활용
1124 | 자연과학 | 597 | 2023 | 김용균 외 | 꽃노랑총채벌레(Frankliniella occidentalis)의 주간행동을 중개하는 일일리듬 유전자의 발현 양상
1125 | 자연과학 | 598 | 2023 | 백종범 | 인류의 삶에 가장 중요한 화학반응: 암모니아 합성
1126 | 자연과학 | 599 | 2023 | 최병인 | 간세포암과 전암성 병변: 최신 영상진단