Overview

Dataset statistics

Number of variables: 5
Number of observations: 220
Missing cells: 2
Missing cells (%): 0.2%
Duplicate rows: 1
Duplicate rows (%): 0.5%
Total size in memory: 8.7 KiB
Average record size in memory: 40.6 B

Variable types

Unsupported: 1
Text: 1
Categorical: 3

Dataset

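Description: A list of publications issued by Statistics Korea (통계청) and held by the Statistics Library, made up of a monograph list and a serials list.
Author: Statistics Korea (통계청)
URL: https://www.data.go.kr/data/15126791/fileData.do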

Alerts

[Duplicates] Dataset has 1 (0.5%) duplicate rows
[High correlation] Unnamed: 4 is highly overall correlated with Unnamed: 2 and 1 other fields
[High correlation] Unnamed: 3 is highly overall correlated with Unnamed: 2 and 1 other fields
[High correlation] Unnamed: 2 is highly overall correlated with Unnamed: 3 and 1 other fields
[Imbalance] Unnamed: 2 is highly imbalanced (56.8%)
[Imbalance] Unnamed: 3 is highly imbalanced (57.9%)
[Imbalance] Unnamed: 4 is highly imbalanced (94.7%)
[Unsupported] 통계청 발간 간행물 목록(단행본) is an unsupported type, check if it needs cleaning or further analysis
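The Unnamed: columns and the unsupported first column in these alerts are consistent with the file's title line having been read as the header: the Sample section shows the actual column labels (No., 서명, 저자, 출판사, 출판년) sitting in a data row. The following is a minimal pandas sketch of re-reading such a layout. The CSV text is a made-up stand-in that only mirrors the layout visible in the Sample section, and `skiprows=2` is an assumption about where the real header sits in the source file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the source file: a title line, an empty row,
# the real header row, then data (layout inferred from the report's Sample).
raw = (
    "통계청 발간 간행물 목록(단행본),,,,\n"
    ",,,,\n"
    "No.,서명,저자,출판사,출판년\n"
    "1,(2023)지역통계실무,통계청 . 통계교육원,통계교육원,2023\n"
    "2,(2023) 국민계정,통계청 . 통계교육원,통계교육원,2023\n"
)

# Skip the title line and the empty row so the third line becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=2)

print(df.columns.tolist())
print(df.dtypes)
```

Read this way, the header labels and the all-missing row no longer appear as category values, which would likely change the imbalance and correlation alerts above.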

Reproduction

Analysis started: 2024-03-14 08:40:42.512461
Analysis finished: 2024-03-14 08:40:43.769638
Duration: 1.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

통계청 발간 간행물 목록(단행본)
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): 0.5%
Memory size: 1.8 KiB

Unnamed: 1
Text

Distinct: 218
Distinct (%): 99.5%
Missing: 1
Missing (%): 0.5%
Memory size: 1.8 KiB

Length

Max length: 102
Median length: 65
Mean length: 36.547945
Min length: 2

Characters and Unicode

Total characters: 8004
Distinct characters: 309
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
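As a rough illustration of how per-category character counts like those in this section can be derived, here is a minimal sketch using Python's standard `unicodedata` module. It is not the profiler's own implementation; the helper name `category_counts` and the two sample titles are chosen for illustration. Note that `unicodedata.category` returns two-letter codes (`Lo`, `Nd`, `Zs`, ...) where the report prints long names (Other Letter, Decimal Number, Space Separator, ...).

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two titles taken from the report's Sample section.
titles = ["(2023)지역통계실무", "(2023) 국민계정"]
counts = category_counts(titles)

for code, n in counts.most_common():
    print(code, n)
```

Scripts and blocks are not exposed by the standard library, so this sketch covers general categories only.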

Unique

Unique: 217
Unique (%): 99.1%

Sample

1st row: 서명
2nd row: (2023년)『청년패널조사』정기통계품질진단 결과보고서 = 2023 Regular Assessment Report
3rd row: (2023년)『엔지니어링서비스업경영분석』정기통계품질진단 결과보고서 = 2023 Regular Assessment Report
4th row: (2023년)『사망원인통계』정기통계품질진단 결과보고서 = 2023 Regular Assessment Report
5th row: (2023)지역통계실무

Value | Count | Frequency (%)
2023 | 120 | 11.1%
 | 78 | 7.2%
결과보고서 | 69 | 6.4%
report | 68 | 6.3%
assessment | 67 | 6.2%
regular | 67 | 6.2%
 | 21 | 1.9%
이해 | 16 | 1.5%
맞춤형 | 16 | 1.5%
통계분석 | 14 | 1.3%
Other values (330) | 544 | 50.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 861 | 10.8%
2 | 577 | 7.2%
s | 278 | 3.5%
) | 277 | 3.5%
( | 277 | 3.5%
0 | 275 | 3.4%
e | 275 | 3.4%
3 | 271 | 3.4%
 | 180 | 2.2%
 | 177 | 2.2%
Other values (299) | 4556 | 56.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3529 | 44.1%
Lowercase Letter | 1416 | 17.7%
Decimal Number | 1166 | 14.6%
Space Separator | 861 | 10.8%
Close Punctuation | 348 | 4.3%
Open Punctuation | 348 | 4.3%
Uppercase Letter | 237 | 3.0%
Math Symbol | 69 | 0.9%
Other Punctuation | 29 | 0.4%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 180 | 5.1%
 | 177 | 5.0%
 | 148 | 4.2%
 | 115 | 3.3%
 | 104 | 2.9%
 | 98 | 2.8%
 | 97 | 2.7%
 | 96 | 2.7%
 | 87 | 2.5%
 | 81 | 2.3%
Other values (250) | 2346 | 66.5%

Lowercase Letter
Value | Count | Frequency (%)
s | 278 | 19.6%
e | 275 | 19.4%
t | 143 | 10.1%
r | 136 | 9.6%
n | 75 | 5.3%
o | 75 | 5.3%
a | 73 | 5.2%
u | 72 | 5.1%
p | 70 | 4.9%
l | 70 | 4.9%
Other values (6) | 149 | 10.5%

Uppercase Letter
Value | Count | Frequency (%)
R | 141 | 59.5%
A | 72 | 30.4%
S | 11 | 4.6%
C | 3 | 1.3%
I | 2 | 0.8%
N | 1 | 0.4%
T | 1 | 0.4%
B | 1 | 0.4%
E | 1 | 0.4%
H | 1 | 0.4%
Other values (3) | 3 | 1.3%

Decimal Number
Value | Count | Frequency (%)
2 | 577 | 49.5%
0 | 275 | 23.6%
3 | 271 | 23.2%
1 | 28 | 2.4%
4 | 6 | 0.5%
9 | 4 | 0.3%
6 | 3 | 0.3%
7 | 1 | 0.1%
5 | 1 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
· | 18 | 62.1%
: | 7 | 24.1%
& | 2 | 6.9%
. | 2 | 6.9%

Close Punctuation
Value | Count | Frequency (%)
) | 277 | 79.6%
 | 71 | 20.4%

Open Punctuation
Value | Count | Frequency (%)
( | 277 | 79.6%
 | 71 | 20.4%

Space Separator
Value | Count | Frequency (%)
(space) | 861 | 100.0%

Math Symbol
Value | Count | Frequency (%)
= | 69 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3529 | 44.1%
Common | 2822 | 35.3%
Latin | 1653 | 20.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 180 | 5.1%
 | 177 | 5.0%
 | 148 | 4.2%
 | 115 | 3.3%
 | 104 | 2.9%
 | 98 | 2.8%
 | 97 | 2.7%
 | 96 | 2.7%
 | 87 | 2.5%
 | 81 | 2.3%
Other values (250) | 2346 | 66.5%

Latin
Value | Count | Frequency (%)
s | 278 | 16.8%
e | 275 | 16.6%
t | 143 | 8.7%
R | 141 | 8.5%
r | 136 | 8.2%
n | 75 | 4.5%
o | 75 | 4.5%
a | 73 | 4.4%
u | 72 | 4.4%
A | 72 | 4.4%
Other values (19) | 313 | 18.9%

Common
Value | Count | Frequency (%)
(space) | 861 | 30.5%
2 | 577 | 20.4%
) | 277 | 9.8%
( | 277 | 9.8%
0 | 275 | 9.7%
3 | 271 | 9.6%
 | 71 | 2.5%
 | 71 | 2.5%
= | 69 | 2.4%
1 | 28 | 1.0%
Other values (10) | 45 | 1.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4315 | 53.9%
Hangul | 3529 | 44.1%
None | 160 | 2.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 861 | 20.0%
2 | 577 | 13.4%
s | 278 | 6.4%
) | 277 | 6.4%
( | 277 | 6.4%
0 | 275 | 6.4%
e | 275 | 6.4%
3 | 271 | 6.3%
t | 143 | 3.3%
R | 141 | 3.3%
Other values (36) | 940 | 21.8%

Hangul
Value | Count | Frequency (%)
 | 180 | 5.1%
 | 177 | 5.0%
 | 148 | 4.2%
 | 115 | 3.3%
 | 104 | 2.9%
 | 98 | 2.8%
 | 97 | 2.7%
 | 96 | 2.7%
 | 87 | 2.5%
 | 81 | 2.3%
Other values (250) | 2346 | 66.5%

None
Value | Count | Frequency (%)
 | 71 | 44.4%
 | 71 | 44.4%
· | 18 | 11.2%

Unnamed: 2
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Value | Count
통계청 . 통계교육원 | 113
통계청 | 101
통계교육원 | 2
<NA> | 1
저자 | 1
Other values (2) | 2

Length

Max length: 14
Median length: 11
Mean length: 7.2227273
Min length: 2

Unique

Unique: 4
Unique (%): 1.8%

Sample

1st row: <NA>
2nd row: 저자
3rd row: 통계청
4th row: 통계청
5th row: 통계청

Common Values

Value | Count | Frequency (%)
통계청 . 통계교육원 | 113 | 51.4%
통계청 | 101 | 45.9%
통계교육원 | 2 | 0.9%
<NA> | 1 | 0.5%
저자 | 1 | 0.5%
통계청 . 경인지방통계청 | 1 | 0.5%
통계청 . 강원지방통계지청 | 1 | 0.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values (Plot)]

Value | Count | Frequency (%)
통계청 | 216 | 48.0%
 | 115 | 25.6%
통계교육원 | 115 | 25.6%
na | 1 | 0.2%
저자 | 1 | 0.2%
경인지방통계청 | 1 | 0.2%
강원지방통계지청 | 1 | 0.2%

Unnamed: 3
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Value | Count
통계교육원 | 115
통계청 | 100
<NA> | 1
출판사 | 1
통계청 : 경인통계청 | 1
Other values (2) | 2

Length

Max length: 11
Median length: 5
Mean length: 4.1363636
Min length: 3

Unique

Unique: 5
Unique (%): 2.3%

Sample

1st row: <NA>
2nd row: 출판사
3rd row: 통계청
4th row: 통계청
5th row: 통계청

Common Values

Value | Count | Frequency (%)
통계교육원 | 115 | 52.3%
통계청 | 100 | 45.5%
<NA> | 1 | 0.5%
출판사 | 1 | 0.5%
통계청 : 경인통계청 | 1 | 0.5%
통계청 통계교육원 | 1 | 0.5%
강원지방통계지청 | 1 | 0.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values (Plot)]

Value | Count | Frequency (%)
통계교육원 | 116 | 52.0%
통계청 | 102 | 45.7%
na | 1 | 0.4%
출판사 | 1 | 0.4%
 | 1 | 0.4%
경인통계청 | 1 | 0.4%
강원지방통계지청 | 1 | 0.4%

Unnamed: 4
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Value | Count
2023 | 218
<NA> | 1
출판년 | 1

Length

Max length: 4
Median length: 4
Mean length: 3.9954545
Min length: 3

Unique

Unique: 2
Unique (%): 0.9%

Sample

1st row: <NA>
2nd row: 출판년
3rd row: 2023
4th row: 2023
5th row: 2023

Common Values

Value | Count | Frequency (%)
2023 | 218 | 99.1%
<NA> | 1 | 0.5%
출판년 | 1 | 0.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values (Plot)]

Value | Count | Frequency (%)
2023 | 218 | 99.1%
na | 1 | 0.5%
출판년 | 1 | 0.5%

Correlations

 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4
Unnamed: 2 | 1.000 | 0.995 | 1.000
Unnamed: 3 | 0.995 | 1.000 | 1.000
Unnamed: 4 | 1.000 | 1.000 | 1.000

 | Unnamed: 4 | Unnamed: 3 | Unnamed: 2
Unnamed: 4 | 1.000 | 0.991 | 0.991
Unnamed: 3 | 0.991 | 1.000 | 0.892
Unnamed: 2 | 0.991 | 0.892 | 1.000

 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4
Unnamed: 2 | 1.000 | 0.892 | 0.991
Unnamed: 3 | 0.892 | 1.000 | 0.991
Unnamed: 4 | 0.991 | 0.991 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
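The quantities behind these three views can be approximated with plain pandas. The sketch below is an illustration under assumptions, not the profiler's own code: the toy frame and its column names `a`, `b`, `c` are invented, and nullity correlation is taken here as the Pearson correlation of the 0/1 missingness indicators.

```python
import pandas as pd

# Toy frame: columns a and b are missing in the same rows, c is complete.
df = pd.DataFrame({
    "a": [1.0, None, 3.0, None],
    "b": ["x", None, "z", None],
    "c": [1, 2, 3, 4],
})

# Per-column missing counts (the bar-style view).
null_counts = df.isna().sum()

# Row-by-column missingness mask (the matrix-style view).
mask = df.isna()

# Nullity correlation: correlate the 0/1 indicators, keeping only columns
# whose missingness actually varies (a constant column has no correlation).
varying = mask.loc[:, mask.nunique() > 1]
nullity_corr = varying.astype(int).corr()

print(null_counts)
print(nullity_corr)
```

A value near 1 means two columns tend to be missing together, as with the single all-missing row visible at index 0 of the Sample section.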

Sample

 | 통계청 발간 간행물 목록(단행본) | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4
0 | NaN | <NA> | <NA> | <NA> | <NA>
1 | No. | 서명 | 저자 | 출판사 | 출판년
2 | 1 | (2023년)『청년패널조사』정기통계품질진단 결과보고서 = 2023 Regular Assessment Report | 통계청 | 통계청 | 2023
3 | 2 | (2023년)『엔지니어링서비스업경영분석』정기통계품질진단 결과보고서 = 2023 Regular Assessment Report | 통계청 | 통계청 | 2023
4 | 3 | (2023년)『사망원인통계』정기통계품질진단 결과보고서 = 2023 Regular Assessment Report | 통계청 | 통계청 | 2023
5 | 4 | (2023)지역통계실무 | 통계청 . 통계교육원 | 통계교육원 | 2023
6 | 5 | (2023)(제1기) 파이썬 중급 통계분석 | 통계청 . 통계교육원 | 통계교육원 | 2023
7 | 6 | (2023년)(제1기) 국가통계실무 3 | 통계청 . 통계교육원 | 통계교육원 | 2023
8 | 7 | (2023)(제1기) 재무제표 | 통계청 . 통계교육원 | 통계교육원 | 2023
9 | 8 | (2023년)『보육실태조사』정기통계품질진단 결과보고서 = 2023 Regular Assessment Report | 통계청 | 통계청 | 2023

 | 통계청 발간 간행물 목록(단행본) | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4
210 | 209 | (2023)(제1기) R 중급 통계분석 | 통계청 . 통계교육원 | 통계교육원 | 2023
211 | 210 | (2023년) 고용통계의 이해 | 통계청 . 통계교육원 | 통계교육원 | 2023
212 | 211 | (2023) 통계를 활용한 통합사회 지도교사 연수 | 통계청 . 통계교육원 | 통계교육원 | 2023
213 | 212 | (2023) 실용통계 지도교사 통계교육 연수 | 통계청 . 통계교육원 | 통계교육원 | 2023
214 | 213 | (2023) 국민계정 | 통계청 . 통계교육원 | 통계교육원 | 2023
215 | 214 | (2023 지역통계 표준매뉴얼) 노인등록통계 | 통계청 | 통계청 | 2023
216 | 215 | (2023년)지역통계 우수사례집 | 통계청 | 통계청 | 2023
217 | 216 | (2023)(제1기)통계보고서 작성 | 통계청 . 통계교육원 | 통계교육원 | 2023
218 | 217 | (2023)(제2기)오피스를 활용한 데이터시각화(서울) | 통계청 . 통계교육원 | 통계교육원 | 2023
219 | 218 | (2023년)(제1기) 국가통계실무 4 | 통계청 . 통계교육원 | 통계교육원 | 2023

Duplicate rows

Most frequently occurring

 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | # duplicates
0 | (2023) 통계를 활용한 통합사회 지도교사 연수 | 통계청 . 통계교육원 | 통계교육원 | 2023 | 2
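A table like this one can be reproduced approximately with pandas. The sketch below is a hedged illustration, not the profiler's implementation: the toy frame and its column names `title` and `author` are invented, and only the first title is taken from the report.

```python
import pandas as pd

# Toy frame with one title entered twice (column names are hypothetical).
df = pd.DataFrame({
    "title": [
        "(2023) 통계를 활용한 통합사회 지도교사 연수",
        "(2023) 국민계정",
        "(2023) 통계를 활용한 통합사회 지도교사 연수",
    ],
    "author": ["통계청 . 통계교육원"] * 3,
})

# Keep every row that has at least one exact copy elsewhere in the frame.
dupes = df[df.duplicated(keep=False)]

# Collapse the copies into one line per distinct row with its occurrence count.
counts = (
    dupes.groupby(list(df.columns))
    .size()
    .reset_index(name="# duplicates")
)

print(counts)
```

Calling `df.drop_duplicates()` would then keep a single copy of each repeated row.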