Overview

Dataset statistics

Number of variables: 8
Number of observations: 33
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.3 KiB
Average record size in memory: 71.0 B
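The dataset-level counts above can be reproduced with a few lines of plain Python. This is a minimal sketch, not ydata-profiling's own code. The helper name `dataset_stats` is hypothetical, and treating `None` or an empty string as a missing cell is an assumption. The input is only the first three rows from the Sample section, because the full 33-row file is not reproduced in this report.

```python
# Minimal sketch: dataset-level counts similar to "Dataset statistics".
# dataset_stats is a hypothetical helper, not part of ydata-profiling.

# First three rows copied from the Sample section of this report.
SAMPLE_ROWS = [
    (1201, 1004, 1009, "빅데이터란?", "개요", "여상수 교수", "Y",
     "https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_01.pdf"),
    (1210, 1004, 1010, "MPI(Message Passing Interface)", "분산처리기술 분석 기술", "여상수 교수", "Y",
     "https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_10.pdf"),
    (1223, 1004, 1011, "Qoobah", "국내 빅데이터 플랫폼", "여상수 교수", "Y",
     "https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_23.pdf"),
]


def dataset_stats(rows):
    """Count variables, observations, missing cells and duplicate rows."""
    n_vars = len(rows[0])
    n_obs = len(rows)
    # Assumption: None and empty strings count as missing cells.
    missing = sum(1 for row in rows for cell in row if cell is None or cell == "")
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1  # every repeat after the first occurrence counts
        else:
            seen.add(row)
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / (n_vars * n_obs),
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs,
    }


print(dataset_stats(SAMPLE_ROWS))
```

The two memory figures agree with each other: 71.0 B × 33 = 2,343 B ≈ 2.29 KiB, which rounds to the reported 2.3 KiB.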

Variable types

Numeric: 2
Categorical: 3
Text: 2
Boolean: 1

Dataset

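Description: A list of the big data science learning content on the website of the National Science Museum (국립중앙과학관). Data fields: unique ID (고유 아이디), major category code (대분류코드), middle category code (중분류코드), content name (컨텐츠명), name (이름), reviewer (감수자), public status (공개유무), attachment (첨부파일). ※ National Science Museum, 481 Daedeok-daero, Yuseong-gu, Daejeon
URL: https://www.data.go.kr/data/15067837/fileData.do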

Alerts

대분류코드 has constant value "1004" (Constant)
감수자 has constant value "여상수 교수" (Constant)
공개유무 has constant value "True" (Constant)
고유 아이디 is highly overall correlated with 중분류코드 and 1 other field (High correlation)
중분류코드 is highly overall correlated with 고유 아이디 and 1 other field (High correlation)
이름 is highly overall correlated with 고유 아이디 and 1 other field (High correlation)
고유 아이디 has unique values (Unique)
컨텐츠명 has unique values (Unique)
첨부파일 has unique values (Unique)
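The CONSTANT and UNIQUE alerts follow from distinct-value counts. The sketch below makes two simplifying assumptions:

- A column is flagged constant when it has exactly one distinct value.
- A column is flagged unique when no value repeats.

ydata-profiling's own rules may add further conditions. `column_alerts` is a hypothetical name. The example columns are reconstructed from this report: 대분류코드 is 1004 in all 33 rows. 고유 아이디 has 33 distinct values between 1201 and 1233. Assuming integer IDs, that means each integer in the range appears exactly once.

```python
from collections import Counter


def column_alerts(name, values):
    """Return simplified CONSTANT / UNIQUE alert messages for one column."""
    counts = Counter(values)
    alerts = []
    if len(counts) == 1:
        # Only one distinct value across all rows.
        alerts.append(f'{name} has constant value "{values[0]}"')
    if len(values) > 1 and len(counts) == len(values):
        # No value repeats, so the distinct count equals the row count.
        alerts.append(f"{name} has unique values")
    return alerts


print(column_alerts("대분류코드", [1004] * 33))
print(column_alerts("고유 아이디", list(range(1201, 1234))))
```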

Reproduction

Analysis started: 2023-12-12 11:09:12.920796
Analysis finished: 2023-12-12 11:09:14.491823
Duration: 1.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

고유 아이디
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 33
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1217
Minimum: 1201
Maximum: 1233
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 1201
5th percentile: 1202.6
Q1: 1209
Median: 1217
Q3: 1225
95th percentile: 1231.4
Maximum: 1233
Range: 32
Interquartile range (IQR): 16
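고유 아이디 has 33 distinct values between 1201 and 1233, a range of 32. Assuming the IDs are integers, the column must therefore contain each integer from 1201 to 1233 exactly once. Under that assumption, the quantiles above can be checked with linear interpolation between order statistics, which is the default rule in NumPy's `percentile` and pandas' `quantile`. The `percentile` function below is a hypothetical stdlib-only helper, not the report's code.

```python
import math


def percentile(values, q):
    """Linear-interpolation quantile for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional rank in the sorted list
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


ids = list(range(1201, 1234))       # reconstructed 고유 아이디 column

for label, q in [("5th percentile", 0.05), ("Q1", 0.25), ("Median", 0.5),
                 ("Q3", 0.75), ("95th percentile", 0.95)]:
    print(f"{label}: {round(percentile(ids, q), 1)}")

iqr = percentile(ids, 0.75) - percentile(ids, 0.25)
print(f"IQR: {iqr}")
```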

Descriptive statistics

Standard deviation: 9.6695398
Coefficient of variation (CV): 0.0079453901
Kurtosis: -1.2
Mean: 1217
Median absolute deviation (MAD): 8
Skewness: 0
Sum: 40161
Variance: 93.5
Monotonicity: Not monotonic
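These descriptive statistics can be recomputed under one assumption: 고유 아이디 holds the integers 1201 to 1233 once each. For integer IDs, 33 distinct values over a range of 32 leave no other possibility.

The sketch uses two kinds of variance and moments:

- The sample variance (n − 1 denominator), which matches the reported 93.5.
- The bias-corrected skewness and excess kurtosis estimators of pandas' `Series.skew` and `Series.kurt`. These are assumed, not confirmed, to be the ones behind the report's values.

```python
import math
import statistics

ids = list(range(1201, 1234))   # assumed 고유 아이디 column: 1201..1233 once each
n = len(ids)

mean = statistics.fmean(ids)
variance = statistics.variance(ids)      # sample variance, n - 1 in the denominator
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation
med = statistics.median(ids)
mad = statistics.median(abs(x - med) for x in ids)   # median absolute deviation

# Central moments with n in the denominator.
m2 = sum((x - mean) ** 2 for x in ids) / n
m3 = sum((x - mean) ** 3 for x in ids) / n
m4 = sum((x - mean) ** 4 for x in ids) / n

# Bias-corrected skewness (G1) and excess kurtosis (G2).
skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)

print(f"Mean {mean}, Variance {variance}, SD {std:.7f}, CV {cv:.10f}")
print(f"MAD {mad}, Skewness {skew}, Kurtosis {kurt:.4f}")
```

The symmetric, evenly spaced values explain the zero skewness. The kurtosis of -1.2 is what the G2 estimator gives for 33 consecutive integers.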
Figure: Histogram with fixed size bins (bins=33)

Common values

Value | Count | Frequency (%)
1201 | 1 | 3.0%
1216 | 1 | 3.0%
1217 | 1 | 3.0%
1232 | 1 | 3.0%
1222 | 1 | 3.0%
1226 | 1 | 3.0%
1227 | 1 | 3.0%
1215 | 1 | 3.0%
1218 | 1 | 3.0%
1210 | 1 | 3.0%
Other values (23) | 23 | 69.7%

Minimum 10 values

Value | Count | Frequency (%)
1201 | 1 | 3.0%
1202 | 1 | 3.0%
1203 | 1 | 3.0%
1204 | 1 | 3.0%
1205 | 1 | 3.0%
1206 | 1 | 3.0%
1207 | 1 | 3.0%
1208 | 1 | 3.0%
1209 | 1 | 3.0%
1210 | 1 | 3.0%

Maximum 10 values

Value | Count | Frequency (%)
1233 | 1 | 3.0%
1232 | 1 | 3.0%
1231 | 1 | 3.0%
1230 | 1 | 3.0%
1229 | 1 | 3.0%
1228 | 1 | 3.0%
1227 | 1 | 3.0%
1226 | 1 | 3.0%
1225 | 1 | 3.0%
1224 | 1 | 3.0%

대분류코드
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Value | Count
1004 | 33

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1004
2nd row: 1004
3rd row: 1004
4th row: 1004
5th row: 1004

Common Values

Value | Count | Frequency (%)
1004 | 33 | 100.0%

Length

Figure: Histogram of lengths of the category

Words

Value | Count | Frequency (%)
1004 | 33 | 100.0%

중분류코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 6
Distinct (%): 18.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1011.6061
Minimum: 1009
Maximum: 1017
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 429.0 B

Quantile statistics

Minimum: 1009
5th percentile: 1009.6
Q1: 1010
Median: 1011
Q3: 1012
95th percentile: 1017
Maximum: 1017
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.4992423
Coefficient of variation (CV): 0.0024705687
Kurtosis: 0.9678869
Mean: 1011.6061
Median absolute deviation (MAD): 1
Skewness: 1.4900543
Sum: 33383
Variance: 6.2462121
Monotonicity: Not monotonic
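중분류코드 has only six distinct values, and its value-count table lists all of them. The whole column can therefore be rebuilt from those counts and the statistics above recomputed.

This sketch makes two assumptions. The sample variance uses an n − 1 denominator. Skewness and kurtosis use the pandas-style bias-corrected estimators (G1 and G2), which are assumed, not confirmed, to be what the report uses. The variable names are illustrative.

```python
import math

# Value counts of 중분류코드 as listed in this report (33 rows in total).
COUNTS = {1009: 2, 1010: 13, 1011: 8, 1012: 3, 1013: 2, 1017: 5}

# Expand the frequency table back into the full column.
values = sorted(v for v, c in COUNTS.items() for _ in range(c))
n = len(values)

mean = sum(values) / n
dev = [x - mean for x in values]
variance = sum(d ** 2 for d in dev) / (n - 1)      # sample variance
m2 = sum(d ** 2 for d in dev) / n                  # central moments, n denominator
m3 = sum(d ** 3 for d in dev) / n
m4 = sum(d ** 4 for d in dev) / n
skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)

# Frequency (%) column, rounded to one decimal as in the report.
freq_pct = {v: round(100 * c / n, 1) for v, c in COUNTS.items()}

print(f"n={n} sum={sum(values)} mean={mean:.4f} variance={variance:.7f}")
print(f"skewness={skew:.7f} kurtosis={kurt:.7f}")
print(freq_pct)
```

The long right tail at 1017 (five rows, well above the bulk at 1010 to 1011) accounts for the positive skewness.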
Figure: Histogram with fixed size bins (bins=6)

Common values

Value | Count | Frequency (%)
1010 | 13 | 39.4%
1011 | 8 | 24.2%
1017 | 5 | 15.2%
1012 | 3 | 9.1%
1009 | 2 | 6.1%
1013 | 2 | 6.1%

Minimum 10 values

Value | Count | Frequency (%)
1009 | 2 | 6.1%
1010 | 13 | 39.4%
1011 | 8 | 24.2%
1012 | 3 | 9.1%
1013 | 2 | 6.1%
1017 | 5 | 15.2%

Maximum 10 values

Value | Count | Frequency (%)
1017 | 5 | 15.2%
1013 | 2 | 6.1%
1012 | 3 | 9.1%
1011 | 8 | 24.2%
1010 | 13 | 39.4%
1009 | 2 | 6.1%

컨텐츠명
Text

UNIQUE 

Distinct: 33
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Length

Max length: 36
Median length: 26
Mean length: 11.030303
Min length: 1

Characters and Unicode

Total characters: 364
Distinct characters: 91
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
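The category counts in the tables below can be approximated with Python's standard `unicodedata` module. It exposes each code point's general category, for example `Ll` for Lowercase Letter and `Lo` for Other Letter, which is the category of Hangul syllables. The standard library does not expose the Unicode Script or Block properties. This sketch therefore approximates the Hangul script by checking character names. `category_counts` and `is_hangul` are hypothetical helpers, not ydata-profiling's API. The example strings are sample values of 컨텐츠명.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pc": "Connector Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(text):
    """Count characters of text by Unicode general category (long names)."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def is_hangul(ch):
    """Rough Hangul check via the character name; stdlib has no Script property."""
    return unicodedata.name(ch, "").startswith("HANGUL")


print(category_counts("GFS(Google File System)"))
print(category_counts("빅데이터란?"))
print([ch for ch in "빅데이터 표준화" if is_hangul(ch)])
```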

Unique

Unique: 33
Unique (%): 100.0%

Sample

1st row: 빅데이터란?
2nd row: MPI(Message Passing Interface)
3rd row: Qoobah
4th row: Batch
5th row: CEP

Words

Value | Count | Frequency (%)
빅데이터와 | 3 | 4.7%
마이닝 | 3 | 4.7%
system | 2 | 3.1%
(not shown) | 2 | 3.1%
file | 2 | 3.1%
zookeeper | 1 | 1.6%
아파치 | 1 | 1.6%
재단 | 1 | 1.6%
iris | 1 | 1.6%
지능형 | 1 | 1.6%
Other values (47) | 47 | 73.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 31 | 8.5%
e | 24 | 6.6%
o | 20 | 5.5%
a | 17 | 4.7%
l | 13 | 3.6%
(not shown) | 10 | 2.7%
t | 10 | 2.7%
S | 10 | 2.7%
s | 10 | 2.7%
i | 10 | 2.7%
Other values (81) | 209 | 57.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 177 | 48.6%
Other Letter | 82 | 22.5%
Uppercase Letter | 60 | 16.5%
Space Separator | 31 | 8.5%
Close Punctuation | 4 | 1.1%
Open Punctuation | 4 | 1.1%
Other Punctuation | 4 | 1.1%
Decimal Number | 2 | 0.5%

Most frequent character per category

Other Letter

Value | Count | Frequency (%)
(not shown) | 10 | 12.2%
(not shown) | 7 | 8.5%
(not shown) | 7 | 8.5%
(not shown) | 6 | 7.3%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 2 | 2.4%
(not shown) | 2 | 2.4%
Other values (34) | 36 | 43.9%

Lowercase Letter

Value | Count | Frequency (%)
e | 24 | 13.6%
o | 20 | 11.3%
a | 17 | 9.6%
l | 13 | 7.3%
t | 10 | 5.6%
s | 10 | 5.6%
i | 10 | 5.6%
u | 8 | 4.5%
r | 8 | 4.5%
p | 8 | 4.5%
Other values (13) | 49 | 27.7%

Uppercase Letter

Value | Count | Frequency (%)
S | 10 | 16.7%
I | 8 | 13.3%
F | 5 | 8.3%
B | 5 | 8.3%
H | 5 | 8.3%
M | 5 | 8.3%
P | 5 | 8.3%
R | 4 | 6.7%
D | 3 | 5.0%
G | 2 | 3.3%
Other values (6) | 8 | 13.3%

Other Punctuation

Value | Count | Frequency (%)
/ | 2 | 50.0%
? | 1 | 25.0%
, | 1 | 25.0%

Decimal Number

Value | Count | Frequency (%)
3 | 1 | 50.0%
4 | 1 | 50.0%

Space Separator

Value | Count | Frequency (%)
(space) | 31 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 4 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 237 | 65.1%
Hangul | 82 | 22.5%
Common | 45 | 12.4%

Most frequent character per script

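Hangul

Value | Count | Frequency (%)
(not shown) | 10 | 12.2%
(not shown) | 7 | 8.5%
(not shown) | 7 | 8.5%
(not shown) | 6 | 7.3%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 2 | 2.4%
(not shown) | 2 | 2.4%
Other values (34) | 36 | 43.9%

Latin

Value | Count | Frequency (%)
e | 24 | 10.1%
o | 20 | 8.4%
a | 17 | 7.2%
l | 13 | 5.5%
t | 10 | 4.2%
S | 10 | 4.2%
s | 10 | 4.2%
i | 10 | 4.2%
I | 8 | 3.4%
u | 8 | 3.4%
Other values (29) | 107 | 45.1%

Common

Value | Count | Frequency (%)
(space) | 31 | 68.9%
) | 4 | 8.9%
( | 4 | 8.9%
/ | 2 | 4.4%
? | 1 | 2.2%
3 | 1 | 2.2%
, | 1 | 2.2%
4 | 1 | 2.2%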

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 282 | 77.5%
Hangul | 82 | 22.5%

Most frequent character per block

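ASCII

Value | Count | Frequency (%)
(space) | 31 | 11.0%
e | 24 | 8.5%
o | 20 | 7.1%
a | 17 | 6.0%
l | 13 | 4.6%
t | 10 | 3.5%
S | 10 | 3.5%
s | 10 | 3.5%
i | 10 | 3.5%
I | 8 | 2.8%
Other values (37) | 129 | 45.7%

Hangul

Value | Count | Frequency (%)
(not shown) | 10 | 12.2%
(not shown) | 7 | 8.5%
(not shown) | 7 | 8.5%
(not shown) | 6 | 7.3%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 3 | 3.7%
(not shown) | 2 | 2.4%
(not shown) | 2 | 2.4%
Other values (34) | 36 | 43.9%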

이름
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 45.5%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Value | Count
분산처리기술 분석 기술 | 6
빅데이터 활용&관련 기술 | 5
해외 빅데이터 플랫폼 | 4
국내 빅데이터 플랫폼 | 3
빅데이터 융합기술 | 3
Other values (10) | 12

Length

Max length: 14
Median length: 12
Mean length: 9.5151515
Min length: 2

Unique

Unique: 8
Unique (%): 24.2%

Sample

1st row: 개요
2nd row: 분산처리기술 분석 기술
3rd row: 국내 빅데이터 플랫폼
4th row: 아키텍처
5th row: 아키텍처

Common Values

Value | Count | Frequency (%)
분산처리기술 분석 기술 | 6 | 18.2%
빅데이터 활용&관련 기술 | 5 | 15.2%
해외 빅데이터 플랫폼 | 4 | 12.1%
국내 빅데이터 플랫폼 | 3 | 9.1%
빅데이터 융합기술 | 3 | 9.1%
아키텍처 | 2 | 6.1%
저장방법 | 2 | 6.1%
개요 | 1 | 3.0%
수집- 로그수집 agent | 1 | 3.0%
특징 | 1 | 3.0%
Other values (5) | 5 | 15.2%
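In these summaries, Distinct counts how many different values occur, while Unique counts the values that occur exactly once. For 이름, the five values grouped under "Other values (5)" cover 5 rows between them, so each must occur once. That is enough to reconstruct the full list of counts. The helper below is a hypothetical sketch of that bookkeeping, not ydata-profiling's code.

```python
def distinct_and_unique(counts, n_rows):
    """Return (distinct, distinct %, unique, unique %) from a list of value counts."""
    distinct = len(counts)                        # number of different values
    unique = sum(1 for c in counts if c == 1)     # values seen exactly once
    return (distinct, round(100 * distinct / n_rows, 1),
            unique, round(100 * unique / n_rows, 1))


# Counts for 이름: ten listed values plus five "Other values" seen once each.
NAME_COUNTS = [6, 5, 4, 3, 3, 2, 2, 1, 1, 1] + [1] * 5
print(distinct_and_unique(NAME_COUNTS, sum(NAME_COUNTS)))
```

The same rule explains why a constant column such as 대분류코드 reports Distinct 1 but Unique 0: its single value occurs 33 times, not once.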

Length

Figure: Histogram of lengths of the category

Words

Value | Count | Frequency (%)
빅데이터 | 18 | 22.2%
기술 | 12 | 14.8%
플랫폼 | 8 | 9.9%
분산처리기술 | 6 | 7.4%
분석 | 6 | 7.4%
활용&관련 | 5 | 6.2%
해외 | 4 | 4.9%
국내 | 3 | 3.7%
융합기술 | 3 | 3.7%
아키텍처 | 2 | 2.5%
Other values (12) | 14 | 17.3%

감수자
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Value | Count
여상수 교수 | 33

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 여상수 교수
2nd row: 여상수 교수
3rd row: 여상수 교수
4th row: 여상수 교수
5th row: 여상수 교수

Common Values

Value | Count | Frequency (%)
여상수 교수 | 33 | 100.0%

Length

Figure: Histogram of lengths of the category

Words

Value | Count | Frequency (%)
여상수 | 33 | 50.0%
교수 | 33 | 50.0%
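The Words tables split each value into whitespace-separated tokens and count them. A constant two-word value such as 여상수 교수 therefore yields two words at 50% each. This is a minimal sketch that assumes lowercase normalization, which entries such as `zookeeper` and `b_e_01.pdf` elsewhere in this report suggest. ydata-profiling's actual tokenizer may also strip punctuation. `word_frequencies` is a hypothetical name.

```python
from collections import Counter


def word_frequencies(values):
    """Map each lowercased word to (count, percentage of all words)."""
    words = Counter(word.lower() for value in values for word in value.split())
    total = sum(words.values())
    return {word: (count, round(100 * count / total, 1))
            for word, count in words.most_common()}


# 감수자 holds the same value in all 33 rows.
print(word_frequencies(["여상수 교수"] * 33))
```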

공개유무
Boolean

CONSTANT 

Distinct: 1
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 165.0 B

Value | Count | Frequency (%)
True | 33 | 100.0%

첨부파일
Text

UNIQUE 

Distinct: 33
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 396.0 B

Length

Max length: 70
Median length: 70
Mean length: 70
Min length: 70

Characters and Unicode

Total characters: 2310
Distinct characters: 36
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 100.0%

Sample

1st row: https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_01.pdf
2nd row: https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_10.pdf
3rd row: https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_23.pdf
4th row: https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_03.pdf
5th row: https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_04.pdf
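Every attachment URL has the same 70-character shape. In the 20 sample rows of this report, the file number NN in `B_E_NN.pdf` also equals 고유 아이디 minus 1200. That one-to-one mapping is a plausible reason for the 1.000 correlation between 고유 아이디 and 첨부파일, but the sample alone does not prove it holds for all 33 rows. The sketch below checks only the rows shown. `file_number` and the regular expression are assumptions about the naming scheme.

```python
import re

BASE_URL = "https://smart.science.go.kr/upload_data/subject/bigdata/pdf/"
FILE_PATTERN = re.compile(r"B_E_(\d{2})\.pdf$")   # assumed naming scheme

# (고유 아이디, file name) pairs from the 20 sample rows in this report.
SAMPLE = [
    (1201, "B_E_01.pdf"), (1210, "B_E_10.pdf"), (1223, "B_E_23.pdf"), (1203, "B_E_03.pdf"),
    (1204, "B_E_04.pdf"), (1205, "B_E_05.pdf"), (1202, "B_E_02.pdf"), (1208, "B_E_08.pdf"),
    (1211, "B_E_11.pdf"), (1212, "B_E_12.pdf"), (1227, "B_E_27.pdf"), (1215, "B_E_15.pdf"),
    (1216, "B_E_16.pdf"), (1218, "B_E_18.pdf"), (1224, "B_E_24.pdf"), (1229, "B_E_29.pdf"),
    (1230, "B_E_30.pdf"), (1231, "B_E_31.pdf"), (1233, "B_E_33.pdf"), (1228, "B_E_28.pdf"),
]


def file_number(url):
    """Extract NN from a .../B_E_NN.pdf attachment URL."""
    match = FILE_PATTERN.search(url)
    if match is None:
        raise ValueError(f"unexpected attachment URL: {url}")
    return int(match.group(1))


urls = [(uid, BASE_URL + name) for uid, name in SAMPLE]
lengths = {len(url) for _, url in urls}
mismatches = [(uid, url) for uid, url in urls if file_number(url) != uid - 1200]
print(f"URL lengths: {lengths}, mismatches: {mismatches}")
```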

Words

Value | Count | Frequency (%)
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_01.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_21.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_33.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_31.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_30.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_29.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_24.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_18.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_16.pdf | 1 | 3.0%
https://smart.science.go.kr/upload_data/subject/bigdata/pdf/b_e_15.pdf | 1 | 3.0%
Other values (23) | 23 | 69.7%

Most occurring characters

Value | Count | Frequency (%)
/ | 231 | 10.0%
t | 198 | 8.6%
a | 198 | 8.6%
d | 165 | 7.1%
. | 132 | 5.7%
p | 132 | 5.7%
s | 132 | 5.7%
c | 99 | 4.3%
e | 99 | 4.3%
_ | 99 | 4.3%
Other values (26) | 825 | 35.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1683 | 72.9%
Other Punctuation | 396 | 17.1%
Connector Punctuation | 99 | 4.3%
Uppercase Letter | 66 | 2.9%
Decimal Number | 66 | 2.9%

Most frequent character per category

Lowercase Letter

Value | Count | Frequency (%)
t | 198 | 11.8%
a | 198 | 11.8%
d | 165 | 9.8%
p | 132 | 7.8%
s | 132 | 7.8%
c | 99 | 5.9%
e | 99 | 5.9%
b | 66 | 3.9%
i | 66 | 3.9%
r | 66 | 3.9%
Other values (10) | 462 | 27.5%

Decimal Number

Value | Count | Frequency (%)
1 | 14 | 21.2%
2 | 14 | 21.2%
0 | 12 | 18.2%
3 | 8 | 12.1%
4 | 3 | 4.5%
5 | 3 | 4.5%
8 | 3 | 4.5%
6 | 3 | 4.5%
7 | 3 | 4.5%
9 | 3 | 4.5%

Other Punctuation

Value | Count | Frequency (%)
/ | 231 | 58.3%
. | 132 | 33.3%
: | 33 | 8.3%

Uppercase Letter

Value | Count | Frequency (%)
B | 33 | 50.0%
E | 33 | 50.0%

Connector Punctuation

Value | Count | Frequency (%)
_ | 99 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1749 | 75.7%
Common | 561 | 24.3%

Most frequent character per script

Latin

Value | Count | Frequency (%)
t | 198 | 11.3%
a | 198 | 11.3%
d | 165 | 9.4%
p | 132 | 7.5%
s | 132 | 7.5%
c | 99 | 5.7%
e | 99 | 5.7%
b | 66 | 3.8%
i | 66 | 3.8%
r | 66 | 3.8%
Other values (12) | 528 | 30.2%

Common

Value | Count | Frequency (%)
/ | 231 | 41.2%
. | 132 | 23.5%
_ | 99 | 17.6%
: | 33 | 5.9%
1 | 14 | 2.5%
2 | 14 | 2.5%
0 | 12 | 2.1%
3 | 8 | 1.4%
4 | 3 | 0.5%
5 | 3 | 0.5%
Other values (4) | 12 | 2.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2310 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
/ | 231 | 10.0%
t | 198 | 8.6%
a | 198 | 8.6%
d | 165 | 7.1%
. | 132 | 5.7%
p | 132 | 5.7%
s | 132 | 5.7%
c | 99 | 4.3%
e | 99 | 4.3%
_ | 99 | 4.3%
Other values (26) | 825 | 35.7%

Interactions


Correlations

Variable | 고유 아이디 | 중분류코드 | 컨텐츠명 | 이름 | 첨부파일
고유 아이디 | 1.000 | 0.957 | 1.000 | 0.894 | 1.000
중분류코드 | 0.957 | 1.000 | 1.000 | 1.000 | 1.000
컨텐츠명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
이름 | 0.894 | 1.000 | 1.000 | 1.000 | 1.000
첨부파일 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Variable | 고유 아이디 | 중분류코드 | 이름
고유 아이디 | 1.000 | 0.916 | 0.549
중분류코드 | 0.916 | 1.000 | 0.816
이름 | 0.549 | 0.816 | 1.000
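The three High correlation alerts can be reproduced from the second, three-variable matrix by flagging every pair whose absolute coefficient exceeds a threshold. The threshold of 0.5 used here is an assumption, not a value read from config.json. It was chosen because it yields exactly the alerts listed in the Alerts section. `high_correlation_alerts` and `alert_message` are hypothetical helpers.

```python
NAMES = ["고유 아이디", "중분류코드", "이름"]
# Second correlation matrix from this report.
MATRIX = [
    [1.000, 0.916, 0.549],
    [0.916, 1.000, 0.816],
    [0.549, 0.816, 1.000],
]
THRESHOLD = 0.5  # assumed alert threshold


def high_correlation_alerts(names, matrix, threshold):
    """Map each variable to the other variables it is highly correlated with."""
    alerts = {}
    for i, name in enumerate(names):
        partners = [names[j] for j in range(len(names))
                    if j != i and abs(matrix[i][j]) > threshold]
        if partners:
            alerts[name] = partners
    return alerts


def alert_message(name, partners):
    """Format an alert in the style used in the Alerts section."""
    others = len(partners) - 1
    if others == 0:
        return f"{name} is highly overall correlated with {partners[0]}"
    noun = "field" if others == 1 else "fields"
    return f"{name} is highly overall correlated with {partners[0]} and {others} other {noun}"


for name, partners in high_correlation_alerts(NAMES, MATRIX, THRESHOLD).items():
    print(alert_message(name, partners))
```

With a stricter threshold such as 0.9, only the 고유 아이디 / 중분류코드 pair (0.916) would be flagged, and 이름 would drop out of the alerts.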

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Row | 고유 아이디 | 대분류코드 | 중분류코드 | 컨텐츠명 | 이름 | 감수자 | 공개유무 | 첨부파일
0 | 1201 | 1004 | 1009 | 빅데이터란? | 개요 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_01.pdf
1 | 1210 | 1004 | 1010 | MPI(Message Passing Interface) | 분산처리기술 분석 기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_10.pdf
2 | 1223 | 1004 | 1011 | Qoobah | 국내 빅데이터 플랫폼 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_23.pdf
3 | 1203 | 1004 | 1010 | Batch | 아키텍처 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_03.pdf
4 | 1204 | 1004 | 1010 | CEP | 아키텍처 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_04.pdf
5 | 1205 | 1004 | 1010 | Flume | 수집- 로그수집 agent | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_05.pdf
6 | 1202 | 1004 | 1009 | 빅데이터의 속성 3V, 4V | 특징 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_02.pdf
7 | 1208 | 1004 | 1010 | GFS(Google File System) | 저장방법 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_08.pdf
8 | 1211 | 1004 | 1010 | BSP(Bulk Synchronous Parallel) | 분산처리기술 분석 기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_11.pdf
9 | 1212 | 1004 | 1010 | R | 분산처리기술 분석 기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_12.pdf

Last rows

Row | 고유 아이디 | 대분류코드 | 중분류코드 | 컨텐츠명 | 이름 | 감수자 | 공개유무 | 첨부파일
23 | 1227 | 1004 | 1017 | 실시간대용량 스트림 분석 | 빅데이터 활용&관련 기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_27.pdf
24 | 1215 | 1004 | 1010 | Oozie / Hcatalog / Zookeeper | 빅데이터 관리 기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_15.pdf
25 | 1216 | 1004 | 1011 | IBM InfoSphere BigInsights | 빅데이터 기반 플랫폼 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_16.pdf
26 | 1218 | 1004 | 1011 | Cloudera Impala | 해외 빅데이터 플랫폼 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_18.pdf
27 | 1224 | 1004 | 1017 | 텍스트 마이닝 | 빅데이터 활용&관련 기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_24.pdf
28 | 1229 | 1004 | 1012 | 빅데이터와 경영 | 빅데이터 융합기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_29.pdf
29 | 1230 | 1004 | 1012 | 빅데이터와 과학 | 빅데이터 융합기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_30.pdf
30 | 1231 | 1004 | 1012 | 빅데이터와 건강 | 빅데이터 융합기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_31.pdf
31 | 1233 | 1004 | 1013 | 빅데이터 표준화 | 표준화 동향 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_33.pdf
32 | 1228 | 1004 | 1017 | 프로세스 마이닝 | 빅데이터 활용&관련 기술 | 여상수 교수 | Y | https://smart.science.go.kr/upload_data/subject/bigdata/pdf/B_E_28.pdf