Overview

Dataset statistics

Number of variables: 4
Number of observations: 231
Missing cells: 31
Missing cells (%): 3.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 KiB
Average record size in memory: 32.6 B
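
The missing-cell percentage can be checked by hand from the counts in this block. The sketch below is only an arithmetic illustration using figures copied from this report; the variable names are made up for the example.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 4
n_observations = 231
missing_cells = 31

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of all cells

# The alerts attribute all 31 missing cells to one column (설명),
# so that column's own missing rate divides by the row count only.
column_missing_pct = 100 * missing_cells / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Missing in 설명 (%): {column_missing_pct:.1f}%")
```

The two rates differ only in the denominator: all cells in the table versus the rows of a single column.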

Variable types

Text: 3
Categorical: 1

Dataset

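Description: Basic management-rule information for records, covering functions, preservation, facilities, distribution, access, disinfection, indexing, classification, damage, disposal, relationships, codes, and disclosure. The provided fields are 대분류코드 (major category code), 대분류명 (major category name), 설명 (description), and 등록일자 (registration date).
Author: 법무부 (Ministry of Justice)
URL: https://www.data.go.kr/data/15042254/fileData.do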

Alerts

등록일자 is highly imbalanced (84.2%) [Imbalance]
설명 has 31 (13.4%) missing values [Missing]
대분류코드 has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 13:40:22.404811
Analysis finished: 2023-12-12 13:40:23.068202
Duration: 0.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
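
A report of this kind could be regenerated along the following lines. This is a minimal sketch, assuming the dataset has been downloaded as a CSV file. The function name, the default output path, and the `encoding` parameter are hypothetical, and the settings stored in config.json are not reproduced here, so the output may differ from this report.

```python
def build_profile(csv_path, out_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so the function can be defined even where the
    # third-party libraries are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Read every column as text so codes such as "AD01" are kept verbatim.
    # The file's encoding is an assumption; adjust it to the downloaded file.
    df = pd.read_csv(csv_path, dtype=str, encoding=encoding)

    profile = ProfileReport(df, title="Profiling report")
    profile.to_file(out_path)
    return out_path
```

Calling `build_profile` with the path of the downloaded file writes the HTML report next to the script unless `out_path` says otherwise.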

Variables

대분류코드
Text

UNIQUE 

Distinct: 231
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 924
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
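
As an illustration of these character properties, the sketch below looks up the Unicode general category of each character in the five sample codes listed for this variable. It uses Python's standard `unicodedata` module, which exposes general categories (such as `Lu` for Uppercase Letter and `Nd` for Decimal Number) but not scripts or blocks, so only the category breakdown is shown. It is not necessarily how the report itself computes these figures.

```python
import unicodedata
from collections import Counter

# Sample values copied from the 대분류코드 "Sample" rows of this report.
codes = ["AD01", "AD02", "AD03", "AD04", "BA01"]

# Map every character to its Unicode general category and count them.
category_counts = Counter(
    unicodedata.category(ch) for code in codes for ch in code
)

for category, count in category_counts.items():
    print(category, count)
```

Each sample code consists of two uppercase letters followed by two digits, which is in line with the even split between Uppercase Letter and Decimal Number reported for the full column.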

Unique

Unique: 231
Unique (%): 100.0%

Sample

1st row: AD01
2nd row: AD02
3rd row: AD03
4th row: AD04
5th row: BA01

Value | Count | Frequency (%)
ad01 | 1 | 0.4%
st27 | 1 | 0.4%
sr09 | 1 | 0.4%
st01 | 1 | 0.4%
st02 | 1 | 0.4%
st03 | 1 | 0.4%
st04 | 1 | 0.4%
st05 | 1 | 0.4%
st06 | 1 | 0.4%
st07 | 1 | 0.4%
Other values (221) | 221 | 95.7%

Most occurring characters

Value | Count | Frequency (%)
R | 121 | 13.1%
D | 114 | 12.3%
0 | 95 | 10.3%
1 | 81 | 8.8%
Z | 66 | 7.1%
2 | 60 | 6.5%
T | 50 | 5.4%
3 | 41 | 4.4%
S | 37 | 4.0%
4 | 35 | 3.8%
Other values (12) | 224 | 24.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 462 | 50.0%
Decimal Number | 462 | 50.0%

Most frequent character per category

Uppercase Letter

Value | Count | Frequency (%)
R | 121 | 26.2%
D | 114 | 24.7%
Z | 66 | 14.3%
T | 50 | 10.8%
S | 37 | 8.0%
F | 28 | 6.1%
K | 22 | 4.8%
A | 7 | 1.5%
G | 5 | 1.1%
V | 5 | 1.1%
Other values (2) | 7 | 1.5%

Decimal Number

Value | Count | Frequency (%)
0 | 95 | 20.6%
1 | 81 | 17.5%
2 | 60 | 13.0%
3 | 41 | 8.9%
4 | 35 | 7.6%
5 | 33 | 7.1%
6 | 31 | 6.7%
8 | 30 | 6.5%
9 | 29 | 6.3%
7 | 27 | 5.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 462 | 50.0%
Common | 462 | 50.0%

Most frequent character per script

Latin

Value | Count | Frequency (%)
R | 121 | 26.2%
D | 114 | 24.7%
Z | 66 | 14.3%
T | 50 | 10.8%
S | 37 | 8.0%
F | 28 | 6.1%
K | 22 | 4.8%
A | 7 | 1.5%
G | 5 | 1.1%
V | 5 | 1.1%
Other values (2) | 7 | 1.5%

Common

Value | Count | Frequency (%)
0 | 95 | 20.6%
1 | 81 | 17.5%
2 | 60 | 13.0%
3 | 41 | 8.9%
4 | 35 | 7.6%
5 | 33 | 7.1%
6 | 31 | 6.7%
8 | 30 | 6.5%
9 | 29 | 6.3%
7 | 27 | 5.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 924 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
R | 121 | 13.1%
D | 114 | 12.3%
0 | 95 | 10.3%
1 | 81 | 8.8%
Z | 66 | 7.1%
2 | 60 | 6.5%
T | 50 | 5.4%
3 | 41 | 4.4%
S | 37 | 4.0%
4 | 35 | 3.8%
Other values (12) | 224 | 24.2%

대분류명
Text

Distinct: 225
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

Length

Max length: 15
Median length: 13
Mean length: 7.2640693
Min length: 4
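
The mean lengths in this report are consistent with dividing each variable's total character count by its number of non-missing values. The sketch below checks that arithmetic with figures copied from this report. Taking the mean over non-missing values only is an assumption made for the check, not something the report states.

```python
# Totals copied from this report; non_missing = observations - missing values.
columns = {
    "대분류코드": {"total_chars": 924, "non_missing": 231},
    "대분류명": {"total_chars": 1678, "non_missing": 231},
    "설명": {"total_chars": 3964, "non_missing": 231 - 31},
}

# Mean length = total characters / number of non-missing values.
mean_lengths = {
    name: stats["total_chars"] / stats["non_missing"]
    for name, stats in columns.items()
}

for name, mean in mean_lengths.items():
    print(f"{name}: mean length = {mean:.7f}")
```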

Characters and Unicode

Total characters: 1678
Distinct characters: 186
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 219
Unique (%): 94.8%

Sample

1st row: 행정박물관리대상유형
2nd row: 행정박물형태
3rd row: 행정박물재질
4th row: 행정박물진행상태
5th row: 작업구분

Value | Count | Frequency (%)
rfid | 3 | 1.2%
보존기간 | 3 | 1.2%
저장매체 | 2 | 0.8%
사용여부 | 2 | 0.8%
단위업무전송여부 | 2 | 0.8%
작업구분 | 2 | 0.8%
공개재분류진행상태 | 2 | 0.8%
인수상태구분코드 | 2 | 0.8%
리더기 | 2 | 0.8%
사용자구분 | 1 | 0.4%
Other values (222) | 222 | 91.4%

Most occurring characters

ValueCountFrequency (%)
65
 
3.9%
64
 
3.8%
62
 
3.7%
61
 
3.6%
60
 
3.6%
56
 
3.3%
55
 
3.3%
47
 
2.8%
43
 
2.6%
42
 
2.5%
Other values (176) 1123
66.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1633 | 97.3%
Uppercase Letter | 21 | 1.3%
Space Separator | 12 | 0.7%
Other Punctuation | 6 | 0.4%
Dash Punctuation | 2 | 0.1%
Open Punctuation | 2 | 0.1%
Close Punctuation | 2 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
65
 
4.0%
64
 
3.9%
62
 
3.8%
61
 
3.7%
60
 
3.7%
56
 
3.4%
55
 
3.4%
47
 
2.9%
43
 
2.6%
42
 
2.6%
Other values (165) 1078
66.0%

Uppercase Letter

Value | Count | Frequency (%)
F | 6 | 28.6%
M | 4 | 19.0%
R | 4 | 19.0%
I | 3 | 14.3%
D | 3 | 14.3%
B | 1 | 4.8%

Space Separator

Value | Count | Frequency (%)
(space) | 12 | 100.0%

Other Punctuation

Value | Count | Frequency (%)
/ | 6 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 2 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
[ | 2 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
] | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1633 | 97.3%
Common | 24 | 1.4%
Latin | 21 | 1.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
65
 
4.0%
64
 
3.9%
62
 
3.8%
61
 
3.7%
60
 
3.7%
56
 
3.4%
55
 
3.4%
47
 
2.9%
43
 
2.6%
42
 
2.6%
Other values (165) 1078
66.0%

Latin

Value | Count | Frequency (%)
F | 6 | 28.6%
M | 4 | 19.0%
R | 4 | 19.0%
I | 3 | 14.3%
D | 3 | 14.3%
B | 1 | 4.8%

Common

Value | Count | Frequency (%)
(space) | 12 | 50.0%
/ | 6 | 25.0%
- | 2 | 8.3%
[ | 2 | 8.3%
] | 2 | 8.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1633 | 97.3%
ASCII | 45 | 2.7%

Most frequent character per block

Hangul
ValueCountFrequency (%)
65
 
4.0%
64
 
3.9%
62
 
3.8%
61
 
3.7%
60
 
3.7%
56
 
3.4%
55
 
3.4%
47
 
2.9%
43
 
2.6%
42
 
2.6%
Other values (165) 1078
66.0%

ASCII

Value | Count | Frequency (%)
(space) | 12 | 26.7%
/ | 6 | 13.3%
F | 6 | 13.3%
M | 4 | 8.9%
R | 4 | 8.9%
I | 3 | 6.7%
D | 3 | 6.7%
- | 2 | 4.4%
[ | 2 | 4.4%
] | 2 | 4.4%

설명
Text

MISSING 

Distinct: 193
Distinct (%): 96.5%
Missing: 31
Missing (%): 13.4%
Memory size: 1.9 KiB

Length

Max length: 67
Median length: 47
Mean length: 19.82
Min length: 4

Characters and Unicode

Total characters: 3964
Distinct characters: 299
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3

Unique

Unique: 190
Unique (%): 95.0%

Sample

1st row: 01. 관인류 ,02. 견본류 ,03. 상징류 ,04. 기념류 ,05. 상장,훈장류 ,06. 사무집기류 ,07. 기타
2nd row: 01.관인 ,02.현판 ,03.기 ,04.휘호 ,05.트로피 ,06.수치 등
3rd row: 01.금속 ,02.석재 ,03.유리 ,04.자기 ,05.목재 ,06.종이 등
4th row: 행정박물 이관처리진행상태
5th row: 생산현황통보전송파일데이터구분

Value | Count | Frequency (%)
관리 | 63 | 9.1%
구분 | 13 | 1.9%
대한 | 13 | 1.9%
(value missing in source) | 9 | 1.3%
상태 | 8 | 1.2%
유형 | 8 | 1.2%
기록물 | 6 | 0.9%
오류 | 6 | 0.9%
공용 | 5 | 0.7%
상태를 | 5 | 0.7%
Other values (466) | 559 | 80.4%

Most occurring characters

ValueCountFrequency (%)
497
 
12.5%
0 123
 
3.1%
: 119
 
3.0%
106
 
2.7%
, 104
 
2.6%
97
 
2.4%
96
 
2.4%
69
 
1.7%
. 66
 
1.7%
59
 
1.5%
Other values (289) 2628
66.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2708 | 68.3%
Space Separator | 497 | 12.5%
Decimal Number | 306 | 7.7%
Other Punctuation | 301 | 7.6%
Lowercase Letter | 35 | 0.9%
Close Punctuation | 32 | 0.8%
Open Punctuation | 32 | 0.8%
Uppercase Letter | 32 | 0.8%
Dash Punctuation | 13 | 0.3%
Math Symbol | 8 | 0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
106
 
3.9%
97
 
3.6%
96
 
3.5%
69
 
2.5%
59
 
2.2%
54
 
2.0%
50
 
1.8%
46
 
1.7%
45
 
1.7%
45
 
1.7%
Other values (245) 2041
75.4%

Uppercase Letter

Value | Count | Frequency (%)
N | 6 | 18.8%
M | 5 | 15.6%
F | 4 | 12.5%
E | 3 | 9.4%
O | 3 | 9.4%
Y | 3 | 9.4%
D | 2 | 6.2%
A | 1 | 3.1%
C | 1 | 3.1%
I | 1 | 3.1%
Other values (3) | 3 | 9.4%

Decimal Number

Value | Count | Frequency (%)
0 | 123 | 40.2%
1 | 58 | 19.0%
2 | 45 | 14.7%
3 | 31 | 10.1%
4 | 23 | 7.5%
5 | 12 | 3.9%
6 | 5 | 1.6%
9 | 3 | 1.0%
7 | 3 | 1.0%
8 | 3 | 1.0%

Lowercase Letter

Value | Count | Frequency (%)
d | 5 | 14.3%
t | 5 | 14.3%
l | 5 | 14.3%
u | 5 | 14.3%
a | 5 | 14.3%
f | 5 | 14.3%
e | 5 | 14.3%

Other Punctuation

Value | Count | Frequency (%)
: | 119 | 39.5%
, | 104 | 34.6%
. | 66 | 21.9%
/ | 10 | 3.3%
· | 2 | 0.7%

Math Symbol

Value | Count | Frequency (%)
~ | 4 | 50.0%
= | 3 | 37.5%
+ | 1 | 12.5%

Close Punctuation

Value | Count | Frequency (%)
) | 31 | 96.9%
] | 1 | 3.1%

Open Punctuation

Value | Count | Frequency (%)
( | 31 | 96.9%
[ | 1 | 3.1%

Space Separator

Value | Count | Frequency (%)
(space) | 497 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 13 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2708 | 68.3%
Common | 1189 | 30.0%
Latin | 67 | 1.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
106
 
3.9%
97
 
3.6%
96
 
3.5%
69
 
2.5%
59
 
2.2%
54
 
2.0%
50
 
1.8%
46
 
1.7%
45
 
1.7%
45
 
1.7%
Other values (245) 2041
75.4%

Common

Value | Count | Frequency (%)
(space) | 497 | 41.8%
0 | 123 | 10.3%
: | 119 | 10.0%
, | 104 | 8.7%
. | 66 | 5.6%
1 | 58 | 4.9%
2 | 45 | 3.8%
3 | 31 | 2.6%
) | 31 | 2.6%
( | 31 | 2.6%
Other values (14) | 84 | 7.1%

Latin

Value | Count | Frequency (%)
N | 6 | 9.0%
M | 5 | 7.5%
d | 5 | 7.5%
t | 5 | 7.5%
l | 5 | 7.5%
u | 5 | 7.5%
a | 5 | 7.5%
f | 5 | 7.5%
e | 5 | 7.5%
F | 4 | 6.0%
Other values (10) | 17 | 25.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2708 | 68.3%
ASCII | 1254 | 31.6%
None | 2 | 0.1%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
(space) | 497 | 39.6%
0 | 123 | 9.8%
: | 119 | 9.5%
, | 104 | 8.3%
. | 66 | 5.3%
1 | 58 | 4.6%
2 | 45 | 3.6%
3 | 31 | 2.5%
) | 31 | 2.5%
( | 31 | 2.5%
Other values (33) | 149 | 11.9%

Hangul
ValueCountFrequency (%)
106
 
3.9%
97
 
3.6%
96
 
3.5%
69
 
2.5%
59
 
2.2%
54
 
2.0%
50
 
1.8%
46
 
1.7%
45
 
1.7%
45
 
1.7%
Other values (245) 2041
75.4%
None

Value | Count | Frequency (%)
· | 2 | 100.0%

등록일자
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 KiB

2013-01-28: 221
2013-12-09: 5
2015-10-02: 4
2014-12-01: 1

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 1
Unique (%): 0.4%
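
The report distinguishes "Distinct" (how many different values occur) from "Unique" (how many values occur exactly once). The sketch below recomputes both for 등록일자 from the value counts listed for this variable. It is an illustration of the two definitions, not the report's own code.

```python
from collections import Counter

# Value counts copied from the 등록일자 section of this report.
value_counts = Counter({
    "2013-01-28": 221,
    "2013-12-09": 5,
    "2015-10-02": 4,
    "2014-12-01": 1,
})

n_rows = sum(value_counts.values())

# Distinct: number of different values in the column.
distinct = len(value_counts)
# Unique: number of values that appear exactly once.
unique = sum(1 for count in value_counts.values() if count == 1)

print(f"Distinct: {distinct} ({100 * distinct / n_rows:.1f}%)")
print(f"Unique: {unique} ({100 * unique / n_rows:.1f}%)")
```

Only 2014-12-01 occurs a single time, which is why the column has four distinct values but one unique value.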

Sample

1st row: 2013-01-28
2nd row: 2013-01-28
3rd row: 2013-01-28
4th row: 2013-01-28
5th row: 2013-01-28

Common Values

Value | Count | Frequency (%)
2013-01-28 | 221 | 95.7%
2013-12-09 | 5 | 2.2%
2015-10-02 | 4 | 1.7%
2014-12-01 | 1 | 0.4%
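
The 84.2% imbalance figure in the alerts can be reproduced from these counts with a normalized-entropy score: one minus the Shannon entropy of the value distribution divided by the maximum entropy for the same number of classes. The formula is an assumption about how the figure was derived. The report does not state it, although the result matches.

```python
import math

# Counts copied from the "Common Values" table for 등록일자.
counts = [221, 5, 4, 1]
total = sum(counts)
probs = [c / total for c in counts]

# Shannon entropy of the observed distribution, in bits.
entropy = -sum(p * math.log2(p) for p in probs)
# Maximum entropy: all classes equally likely.
max_entropy = math.log2(len(counts))

# 0 means perfectly balanced, 1 means a single class dominates entirely.
imbalance = 1 - entropy / max_entropy

print(f"Imbalance: {imbalance:.1%}")
```

A score near 1 reflects that 221 of the 231 rows share a single date.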

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
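
Since the plots themselves are not preserved in this text, the sketch below shows what the two nullity views summarize, using five consecutive rows (AD04 to CR01) from the sample table and only two of the four columns. `None` stands for the `<NA>` entries, and the `#`/`.` rendering is an ad hoc stand-in for the graphical matrix.

```python
# Rows AD04..CR01 from this report's sample, reduced to two columns.
rows = [
    {"대분류코드": "AD04", "설명": "행정박물 이관처리진행상태"},
    {"대분류코드": "BA01", "설명": None},
    {"대분류코드": "BA02", "설명": None},
    {"대분류코드": "BA03", "설명": None},
    {"대분류코드": "CR01", "설명": "생산현황통보전송파일데이터구분"},
]
columns = list(rows[0].keys())

# Nullity by column: how many values are missing in each column.
missing_per_column = {
    col: sum(1 for row in rows if row[col] is None) for col in columns
}

# Nullity matrix: one line per row, "#" = present, "." = missing.
matrix_lines = [
    "".join("." if row[col] is None else "#" for col in columns)
    for row in rows
]

print(missing_per_column)
print("\n".join(matrix_lines))
```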

Sample

First rows

Index | 대분류코드 | 대분류명 | 설명 | 등록일자
0 | AD01 | 행정박물관리대상유형 | 01. 관인류 ,02. 견본류 ,03. 상징류 ,04. 기념류 ,05. 상장,훈장류 ,06. 사무집기류 ,07. 기타 | 2013-01-28
1 | AD02 | 행정박물형태 | 01.관인 ,02.현판 ,03.기 ,04.휘호 ,05.트로피 ,06.수치 등 | 2013-01-28
2 | AD03 | 행정박물재질 | 01.금속 ,02.석재 ,03.유리 ,04.자기 ,05.목재 ,06.종이 등 | 2013-01-28
3 | AD04 | 행정박물진행상태 | 행정박물 이관처리진행상태 | 2013-01-28
4 | BA01 | 작업구분 | <NA> | 2013-01-28
5 | BA02 | 작업상태 | <NA> | 2013-01-28
6 | BA03 | 작업종료상태 | <NA> | 2013-01-28
7 | CR01 | 생산현황통보전송파일데이터구분 | 생산현황통보전송파일데이터구분 | 2013-01-28
8 | CR02 | 이관전송파일데이터구분 | 이관전송파일데이터구분 | 2013-01-28
9 | CR03 | 생산현황통보서식 | 국가기록원으로 통보하는 19종의 서식파일 | 2013-01-28

Last rows

Index | 대분류코드 | 대분류명 | 설명 | 등록일자
221 | ZZ24 | 선정여부 | <NA> | 2013-01-28
222 | ZZ25 | 지정여부 | <NA> | 2013-01-28
223 | ZZ26 | 온라인신청여부 | <NA> | 2013-01-28
224 | ZZ27 | 보존기간 변경 사유 | <NA> | 2013-01-28
225 | ZZ28 | 보존기간 재조정 상태 | <NA> | 2013-01-28
226 | ZZ29 | 단위업무작업유형코드 | <NA> | 2013-01-28
227 | ZZ30 | 단위업무수정항목코드 | <NA> | 2013-01-28
228 | ZZ31 | 단위업무전송여부 | 단위업무 작업 후 전자문서시스템으로 배포여부 | 2013-01-28
229 | ZZ32 | 단위업무현존여부 | 단위업무의 현존 여부 | 2013-01-28
230 | ZZ33 | 단위업무승인신청상태 | 단위업무 CAMS 승인신청 상태 | 2013-01-28