Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 390.6 KiB
Average record size in memory: 40.0 B
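
The average record size can be checked from the total memory size and the number of observations. A small arithmetic sketch, assuming KiB means 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Check: average record size = total memory / number of observations.
total_kib = 390.6   # "Total size in memory" as reported
n_rows = 10_000     # "Number of observations" as reported

avg_bytes = total_kib * 1024 / n_rows
print(f"{avg_bytes:.1f} B")  # prints "40.0 B"
```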

Variable types

Text: 3
DateTime: 1

Dataset

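Description: Provides information on documents produced by Korea Western Power (한국서부발전). The data provided are document title (문서제목), creation date (생성일), document number (문서번호), and responsible department (담당부서). Example record: [탈질설비]고압가스 저장시설 안전점검표,2019-01-01,평(환)-1,환경화학부
URL: https://www.data.go.kr/data/15044425/fileData.do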

Alerts

문서번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 09:53:39.447934
Analysis finished: 2023-12-12 09:53:40.534632
Duration: 1.09 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

문서제목
Text

Distinct: 9232
Distinct (%): 92.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 76
Median length: 57
Mean length: 28.3679
Min length: 4
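
The reported mean length (28.3679) equals the total character count (283679) divided by the 10000 observations. A minimal sketch of how such length statistics could be computed for a list of strings; `length_stats` is a hypothetical helper, not the profiler's own code, and it need not reproduce the profiler's median:

```python
from statistics import mean, median

def length_stats(values):
    """Return min, max, mean and median string length for a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }

# Two titles taken from the sample rows of this report.
titles = ["태안 1~4호기 LED 등기구 구매", "[2023사업연도]내부회계관리제도 운영계획(안)"]
stats = length_stats(titles)
print(stats)
```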

Characters and Unicode

Total characters: 283679
Distinct characters: 763
Distinct categories: 15
Distinct scripts: 5
Distinct blocks: 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
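
As a rough illustration of the per-category counts, Python's standard `unicodedata` module exposes the general category of each code point. This is only a sketch: `category_counts` is a hypothetical helper, and the standard library does not expose the script or block properties also used in this report.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)

# 'Lo' = Other Letter (Hangul syllables), 'Zs' = Space Separator,
# 'Nd' = Decimal Number, 'Lu' = Uppercase Letter, 'Sm' = Math Symbol ('~').
counts = category_counts("태안 1~4호기 LED 등기구 구매")
print(counts)
```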

Unique

Unique: 8910
Unique (%): 89.1%
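
The report lists both a distinct count (9232) and a unique count (8910). A minimal sketch of the difference, assuming the usual reading that "distinct" counts different values and "unique" counts values that occur exactly once; `distinct_and_unique` is a hypothetical helper:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

print(distinct_and_unique(["a", "b", "b", "c"]))  # prints (3, 2)
```

Under that reading, the gap between the two figures is the number of titles that appear more than once.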

Sample

1st row: 태안 5,6호기 터빈 경상정비용 안전난간사다리 구매
2nd row: [2023사업연도]내부회계관리제도 운영계획(안)
3rd row: 태안 1~4호기 LED 등기구 구매
4th row: 한국서부발전 전력거래 전문인력 Pool 특성화 교육 출강 요청
5th row: ‘22년도 발전용 연료유 정기재물조사 결과 보고

Value | Count | Frequency (%)
| 1396 | 2.3%
시행 | 1362 | 2.3%
요청 | 1290 | 2.2%
알림 | 1252 | 2.1%
제출 | 1242 | 2.1%
구매 | 964 | 1.6%
결과 | 741 | 1.2%
2022년 | 717 | 1.2%
2023년 | 633 | 1.1%
태안 | 527 | 0.9%
Other values (12622) | 49451 | 83.0%
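
A word-frequency table of this kind can be approximated by splitting each title into tokens and counting them. The sketch below assumes plain whitespace tokenization, which may differ from what the profiler does; `word_frequencies` is a hypothetical helper:

```python
from collections import Counter

def word_frequencies(values, top=10):
    """Return (word, count, share of all tokens) for the most common words."""
    words = Counter(w for v in values for w in v.split())
    total = sum(words.values())
    return [(w, n, n / total) for w, n in words.most_common(top)]

print(word_frequencies(["a b", "a c"], top=1))  # prints [('a', 2, 0.5)]
```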

Most occurring characters

Value | Count | Frequency (%)
(space) | 49961 | 17.6%
2 | 8888 | 3.1%
| 5174 | 1.8%
| 4232 | 1.5%
| 3895 | 1.4%
0 | 3741 | 1.3%
| 3499 | 1.2%
| 3420 | 1.2%
) | 3245 | 1.1%
( | 3227 | 1.1%
Other values (753) | 194397 | 68.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 183487 | 64.7%
Space Separator | 49961 | 17.6%
Decimal Number | 20770 | 7.3%
Uppercase Letter | 8976 | 3.2%
Lowercase Letter | 7989 | 2.8%
Close Punctuation | 4715 | 1.7%
Open Punctuation | 4697 | 1.7%
Other Punctuation | 1575 | 0.6%
Dash Punctuation | 555 | 0.2%
Math Symbol | 517 | 0.2%
Other values (5) | 437 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 5174 | 2.8%
| 4232 | 2.3%
| 3895 | 2.1%
| 3499 | 1.9%
| 3420 | 1.9%
| 3158 | 1.7%
| 3098 | 1.7%
| 2928 | 1.6%
| 2921 | 1.6%
| 2711 | 1.5%
Other values (646) | 148451 | 80.9%

Uppercase Letter
Value | Count | Frequency (%)
C | 1308 | 14.6%
S | 969 | 10.8%
G | 703 | 7.8%
P | 702 | 7.8%
T | 609 | 6.8%
I | 519 | 5.8%
M | 465 | 5.2%
O | 385 | 4.3%
A | 381 | 4.2%
B | 376 | 4.2%
Other values (16) | 2559 | 28.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 1123 | 14.1%
a | 754 | 9.4%
r | 720 | 9.0%
l | 632 | 7.9%
n | 608 | 7.6%
o | 581 | 7.3%
i | 576 | 7.2%
t | 471 | 5.9%
u | 304 | 3.8%
s | 262 | 3.3%
Other values (16) | 1958 | 24.5%

Other Punctuation
Value | Count | Frequency (%)
, | 811 | 51.5%
· | 299 | 19.0%
. | 137 | 8.7%
# | 129 | 8.2%
/ | 73 | 4.6%
: | 58 | 3.7%
& | 28 | 1.8%
! | 14 | 0.9%
| 7 | 0.4%
| 7 | 0.4%
Other values (4) | 12 | 0.8%

Decimal Number
Value | Count | Frequency (%)
2 | 8888 | 42.8%
0 | 3741 | 18.0%
3 | 2473 | 11.9%
1 | 2398 | 11.5%
4 | 805 | 3.9%
5 | 682 | 3.3%
8 | 549 | 2.6%
9 | 508 | 2.4%
7 | 400 | 1.9%
6 | 325 | 1.6%

Math Symbol
Value | Count | Frequency (%)
~ | 398 | 77.0%
| 101 | 19.5%
| 7 | 1.4%
| 4 | 0.8%
× | 3 | 0.6%
| 2 | 0.4%
+ | 1 | 0.2%
= | 1 | 0.2%

Close Punctuation
Value | Count | Frequency (%)
) | 3245 | 68.8%
] | 1123 | 23.8%
| 157 | 3.3%
| 156 | 3.3%
| 34 | 0.7%

Open Punctuation
Value | Count | Frequency (%)
( | 3227 | 68.7%
[ | 1119 | 23.8%
| 161 | 3.4%
| 156 | 3.3%
| 34 | 0.7%

Other Symbol
Value | Count | Frequency (%)
| 6 | 54.5%
| 3 | 27.3%
| 2 | 18.2%

Initial Punctuation
Value | Count | Frequency (%)
| 201 | 95.3%
| 10 | 4.7%

Modifier Symbol
Value | Count | Frequency (%)
` | 99 | 92.5%
´ | 8 | 7.5%

Final Punctuation
Value | Count | Frequency (%)
| 87 | 89.7%
| 10 | 10.3%

Space Separator
Value | Count | Frequency (%)
(space) | 49961 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 555 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 11 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 183440 | 64.7%
Common | 83225 | 29.3%
Latin | 16963 | 6.0%
Han | 49 | < 0.1%
Greek | 2 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 5174 | 2.8%
| 4232 | 2.3%
| 3895 | 2.1%
| 3499 | 1.9%
| 3420 | 1.9%
| 3158 | 1.7%
| 3098 | 1.7%
| 2928 | 1.6%
| 2921 | 1.6%
| 2711 | 1.5%
Other values (629) | 148404 | 80.9%

Common
Value | Count | Frequency (%)
(space) | 49961 | 60.0%
2 | 8888 | 10.7%
0 | 3741 | 4.5%
) | 3245 | 3.9%
( | 3227 | 3.9%
3 | 2473 | 3.0%
1 | 2398 | 2.9%
] | 1123 | 1.3%
[ | 1119 | 1.3%
, | 811 | 1.0%
Other values (44) | 6239 | 7.5%

Latin
Value | Count | Frequency (%)
C | 1308 | 7.7%
e | 1123 | 6.6%
S | 969 | 5.7%
a | 754 | 4.4%
r | 720 | 4.2%
G | 703 | 4.1%
P | 702 | 4.1%
l | 632 | 3.7%
T | 609 | 3.6%
n | 608 | 3.6%
Other values (41) | 8835 | 52.1%

Han
Value | Count | Frequency (%)
| 16 | 32.7%
| 7 | 14.3%
| 4 | 8.2%
| 3 | 6.1%
| 3 | 6.1%
| 2 | 4.1%
| 2 | 4.1%
| 2 | 4.1%
| 1 | 2.0%
| 1 | 2.0%
Other values (8) | 8 | 16.3%

Greek
Value | Count | Frequency (%)
φ | 2 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 183431 | 64.7%
ASCII | 98734 | 34.8%
None | 1024 | 0.4%
Punctuation | 315 | 0.1%
Math Operators | 101 | < 0.1%
CJK | 49 | < 0.1%
Arrows | 9 | < 0.1%
CJK Compat | 9 | < 0.1%
Compat Jamo | 7 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 49961 | 50.6%
2 | 8888 | 9.0%
0 | 3741 | 3.8%
) | 3245 | 3.3%
( | 3227 | 3.3%
3 | 2473 | 2.5%
1 | 2398 | 2.4%
C | 1308 | 1.3%
e | 1123 | 1.1%
] | 1123 | 1.1%
Other values (73) | 21247 | 21.5%

Hangul
Value | Count | Frequency (%)
| 5174 | 2.8%
| 4232 | 2.3%
| 3895 | 2.1%
| 3499 | 1.9%
| 3420 | 1.9%
| 3158 | 1.7%
| 3098 | 1.7%
| 2928 | 1.6%
| 2921 | 1.6%
| 2711 | 1.5%
Other values (627) | 148395 | 80.9%

None
Value | Count | Frequency (%)
· | 299 | 29.2%
| 161 | 15.7%
| 157 | 15.3%
| 156 | 15.2%
| 156 | 15.2%
| 34 | 3.3%
| 34 | 3.3%
´ | 8 | 0.8%
| 7 | 0.7%
| 4 | 0.4%
Other values (4) | 8 | 0.8%

Punctuation
Value | Count | Frequency (%)
| 201 | 63.8%
| 87 | 27.6%
| 10 | 3.2%
| 10 | 3.2%
| 7 | 2.2%

Math Operators
Value | Count | Frequency (%)
| 101 | 100.0%

CJK
Value | Count | Frequency (%)
| 16 | 32.7%
| 7 | 14.3%
| 4 | 8.2%
| 3 | 6.1%
| 3 | 6.1%
| 2 | 4.1%
| 2 | 4.1%
| 2 | 4.1%
| 1 | 2.0%
| 1 | 2.0%
Other values (8) | 8 | 16.3%

Compat Jamo
Value | Count | Frequency (%)
| 7 | 100.0%

Arrows
Value | Count | Frequency (%)
| 7 | 77.8%
| 2 | 22.2%

CJK Compat
Value | Count | Frequency (%)
| 6 | 66.7%
| 3 | 33.3%

생성일
DateTime

Distinct: 261
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2022-08-01 00:00:00
Maximum: 2023-06-30 00:00:00
Histogram with fixed size bins (bins=50)
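
A fixed-size-bin histogram divides the range between the minimum and maximum into equal-width bins. The sketch below shows one way to assign a date to one of 50 bins over the reported range; `bin_index` is a hypothetical helper, it assumes `start` is earlier than `end`, and it is not the plotting code used by the report.

```python
from datetime import date

def bin_index(d, start, end, bins=50):
    """Map a date to one of `bins` equal-width bins over [start, end]."""
    span = (end - start).days          # assumes start < end
    i = (d - start).days * bins // span
    return min(i, bins - 1)            # the maximum falls in the last bin

start, end = date(2022, 8, 1), date(2023, 6, 30)   # reported minimum and maximum
print(bin_index(date(2022, 11, 25), start, end))
```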

문서번호
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 14
Mean length: 11.1957
Min length: 5

Characters and Unicode

Total characters: 111957
Distinct characters: 104
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 태2(터)-95063
2nd row: 기획(예)-35982
3rd row: 태1(전)-57036
4th row: 발전(전)-85870
5th row: 평경(경)-4537

Value | Count | Frequency (%)
태2(터)-95063 | 1 | < 0.1%
인경(경)-25341 | 1 | < 0.1%
태1(전)-23894 | 1 | < 0.1%
태2(보)-79958 | 1 | < 0.1%
김건(토)-14230 | 1 | < 0.1%
구건-96280 | 1 | < 0.1%
군기(기)-93904 | 1 | < 0.1%
안전(산)-46790 | 1 | < 0.1%
서(경)-54630 | 1 | < 0.1%
태3운(계)-7560 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%
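
The sample values suggest a common shape for document numbers: a prefix, an optional code in parentheses, a hyphen, and a numeric serial. The sketch below parses that shape. The pattern is inferred from the samples shown here only and is an assumption, not a documented format; `DOC_NO` and `parse_doc_no` are hypothetical names.

```python
import re

# Hypothetical pattern inferred from the sample values only:
# <prefix>(<code>)-<serial>, where the parenthesised code is optional.
DOC_NO = re.compile(r"^(?P<prefix>[^()\-]+)(?:\((?P<code>[^()]+)\))?-(?P<serial>\d+)$")

def parse_doc_no(value):
    """Split a document number into (prefix, code, serial); None if it does not match."""
    m = DOC_NO.match(value)
    if m is None:
        return None
    return m.group("prefix"), m.group("code"), int(m.group("serial"))

print(parse_doc_no("태2(터)-95063"))
print(parse_doc_no("구건-96280"))   # no parenthesised code in this value
```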

Most occurring characters

Value | Count | Frequency (%)
- | 10000 | 8.9%
( | 9749 | 8.7%
) | 9749 | 8.7%
1 | 6720 | 6.0%
2 | 5753 | 5.1%
3 | 5374 | 4.8%
5 | 5140 | 4.6%
6 | 4894 | 4.4%
7 | 4865 | 4.3%
4 | 4847 | 4.3%
Other values (94) | 44866 | 40.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 51787 | 46.3%
Other Letter | 29443 | 26.3%
Dash Punctuation | 10000 | 8.9%
Open Punctuation | 9749 | 8.7%
Close Punctuation | 9749 | 8.7%
Uppercase Letter | 1229 | 1.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 3962 | 13.5%
| 2383 | 8.1%
| 1809 | 6.1%
| 1301 | 4.4%
| 1299 | 4.4%
| 1168 | 4.0%
| 1138 | 3.9%
| 1036 | 3.5%
| 1029 | 3.5%
| 1015 | 3.4%
Other values (76) | 13303 | 45.2%

Decimal Number
Value | Count | Frequency (%)
1 | 6720 | 13.0%
2 | 5753 | 11.1%
3 | 5374 | 10.4%
5 | 5140 | 9.9%
6 | 4894 | 9.5%
7 | 4865 | 9.4%
4 | 4847 | 9.4%
8 | 4831 | 9.3%
9 | 4715 | 9.1%
0 | 4648 | 9.0%

Uppercase Letter
Value | Count | Frequency (%)
I | 553 | 45.0%
G | 425 | 34.6%
T | 128 | 10.4%
C | 113 | 9.2%
L | 10 | 0.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 9749 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 9749 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 81285 | 72.6%
Hangul | 29443 | 26.3%
Latin | 1229 | 1.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 3962 | 13.5%
| 2383 | 8.1%
| 1809 | 6.1%
| 1301 | 4.4%
| 1299 | 4.4%
| 1168 | 4.0%
| 1138 | 3.9%
| 1036 | 3.5%
| 1029 | 3.5%
| 1015 | 3.4%
Other values (76) | 13303 | 45.2%

Common
Value | Count | Frequency (%)
- | 10000 | 12.3%
( | 9749 | 12.0%
) | 9749 | 12.0%
1 | 6720 | 8.3%
2 | 5753 | 7.1%
3 | 5374 | 6.6%
5 | 5140 | 6.3%
6 | 4894 | 6.0%
7 | 4865 | 6.0%
4 | 4847 | 6.0%
Other values (3) | 14194 | 17.5%

Latin
Value | Count | Frequency (%)
I | 553 | 45.0%
G | 425 | 34.6%
T | 128 | 10.4%
C | 113 | 9.2%
L | 10 | 0.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 82514 | 73.7%
Hangul | 29443 | 26.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 10000 | 12.1%
( | 9749 | 11.8%
) | 9749 | 11.8%
1 | 6720 | 8.1%
2 | 5753 | 7.0%
3 | 5374 | 6.5%
5 | 5140 | 6.2%
6 | 4894 | 5.9%
7 | 4865 | 5.9%
4 | 4847 | 5.9%
Other values (8) | 15423 | 18.7%

Hangul
Value | Count | Frequency (%)
| 3962 | 13.5%
| 2383 | 8.1%
| 1809 | 6.1%
| 1301 | 4.4%
| 1299 | 4.4%
| 1168 | 4.0%
| 1138 | 3.9%
| 1036 | 3.5%
| 1029 | 3.5%
| 1015 | 3.4%
Other values (76) | 13303 | 45.2%

담당부서
Text

Distinct: 127
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 5
Mean length: 4.6926
Min length: 3

Characters and Unicode

Total characters: 46926
Distinct characters: 136
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 터빈부
2nd row: 재무예산실
3rd row: 전기부
4th row: 전력거래실
5th row: 경영지원부

Value | Count | Frequency (%)
경영지원부 | 927 | 9.3%
발전부 | 473 | 4.7%
전기부 | 423 | 4.2%
화공설비부 | 414 | 4.1%
계약부 | 340 | 3.4%
계측제어부 | 331 | 3.3%
기계부 | 306 | 3.1%
환경화학부 | 296 | 3.0%
안전품질부 | 260 | 2.6%
터빈부 | 222 | 2.2%
Other values (118) | 6013 | 60.1%

Most occurring characters

Value | Count | Frequency (%)
| 8632 | 18.4%
| 2954 | 6.3%
| 1858 | 4.0%
| 1832 | 3.9%
| 1523 | 3.2%
| 1407 | 3.0%
| 1306 | 2.8%
| 1090 | 2.3%
| 1030 | 2.2%
| 1025 | 2.2%
Other values (126) | 24269 | 51.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 45549 | 97.1%
Uppercase Letter | 1020 | 2.2%
Decimal Number | 352 | 0.8%
Space Separator | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 8632 | 19.0%
| 2954 | 6.5%
| 1858 | 4.1%
| 1832 | 4.0%
| 1523 | 3.3%
| 1407 | 3.1%
| 1306 | 2.9%
| 1090 | 2.4%
| 1030 | 2.3%
| 1025 | 2.3%
Other values (116) | 22892 | 50.3%

Uppercase Letter
Value | Count | Frequency (%)
C | 366 | 35.9%
I | 312 | 30.6%
T | 266 | 26.1%
G | 56 | 5.5%
L | 10 | 1.0%
N | 10 | 1.0%

Decimal Number
Value | Count | Frequency (%)
1 | 181 | 51.4%
2 | 125 | 35.5%
3 | 46 | 13.1%

Space Separator
Value | Count | Frequency (%)
(space) | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 45549 | 97.1%
Latin | 1020 | 2.2%
Common | 357 | 0.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 8632 | 19.0%
| 2954 | 6.5%
| 1858 | 4.1%
| 1832 | 4.0%
| 1523 | 3.3%
| 1407 | 3.1%
| 1306 | 2.9%
| 1090 | 2.4%
| 1030 | 2.3%
| 1025 | 2.3%
Other values (116) | 22892 | 50.3%

Latin
Value | Count | Frequency (%)
C | 366 | 35.9%
I | 312 | 30.6%
T | 266 | 26.1%
G | 56 | 5.5%
L | 10 | 1.0%
N | 10 | 1.0%

Common
Value | Count | Frequency (%)
1 | 181 | 50.7%
2 | 125 | 35.0%
3 | 46 | 12.9%
(space) | 5 | 1.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 45549 | 97.1%
ASCII | 1377 | 2.9%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 8632 | 19.0%
| 2954 | 6.5%
| 1858 | 4.1%
| 1832 | 4.0%
| 1523 | 3.3%
| 1407 | 3.1%
| 1306 | 2.9%
| 1090 | 2.4%
| 1030 | 2.3%
| 1025 | 2.3%
Other values (116) | 22892 | 50.3%

ASCII
Value | Count | Frequency (%)
C | 366 | 26.6%
I | 312 | 22.7%
T | 266 | 19.3%
1 | 181 | 13.1%
2 | 125 | 9.1%
G | 56 | 4.1%
3 | 46 | 3.3%
L | 10 | 0.7%
N | 10 | 0.7%
(space) | 5 | 0.4%

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
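
The quantity behind both nullity displays is the number of missing cells per column, which this report gives as zero overall. A minimal sketch of that count for records held as dictionaries; `missing_per_column` is a hypothetical helper, and treating empty strings as missing is an assumption:

```python
def missing_per_column(rows, columns):
    """Count None or empty-string cells per column for a list of dict records."""
    return {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}

rows = [
    {"문서제목": "태안 1~4호기 LED 등기구 구매", "담당부서": "전기부"},
    {"문서제목": "", "담당부서": None},   # made-up incomplete record for illustration
]
print(missing_per_column(rows, ["문서제목", "담당부서"]))
```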

Sample

Index | 문서제목 | 생성일 | 문서번호 | 담당부서
22622 | 태안 5,6호기 터빈 경상정비용 안전난간사다리 구매 | 2022-11-25 | 태2(터)-95063 | 터빈부
55383 | [2023사업연도]내부회계관리제도 운영계획(안) | 2023-04-24 | 기획(예)-35982 | 재무예산실
529 | 태안 1~4호기 LED 등기구 구매 | 2022-08-02 | 태1(전)-57036 | 전기부
17291 | 한국서부발전 전력거래 전문인력 Pool 특성화 교육 출강 요청 | 2022-11-01 | 발전(전)-85870 | 전력거래실
33789 | ‘22년도 발전용 연료유 정기재물조사 결과 보고 | 2023-01-16 | 평경(경)-4537 | 경영지원부
62539 | 2023년도 발전5사 정비적격기업(1차) 현장실사 계획 보고 | 2023-05-30 | 서(안)-46997 | 안전품질부
16567 | 업무지시서(근태-22-58) 발전기술원 근무변경 알림 | 2022-10-27 | 태3운(화)-84481 | 화공설비부
2985 | 최종방류구 Filter Backwash Water Transfer Pump 구매 | 2022-08-17 | 태1(보)-61050 | 보일러부
53879 | 9,10호기 보일러 GAH Ash Blower 에어필터 구매 | 2023-04-17 | 태3(보)-33624 | 보일러부
67608 | [IGCC발전처]’23년 설비보강 긴급공사 가동 전 점검 시행(안) | 2023-06-21 | 태IG(공)-54920 | 공정안전부

Index | 문서제목 | 생성일 | 문서번호 | 담당부서
10596 | [9,10호기 수처리설비]RO Membrane 교체 결과 보고 | 2022-09-27 | 태3운(화)-74643 | 화공설비부
50758 | 견적의뢰(1호기 GAH Air MTR 및 Mixing HTR 하부 정비) | 2023-03-31 | 태1(보)-28819 | 보일러부
27838 | [회사 중요자료 유출방지를 위한]불시 보안점검 계획(안) | 2022-12-21 | 정보(정)-103704 | 정보보안실
37103 | 계약의뢰(1~8호기 증기터빈 복수기 진공펌프 Internal Parts 구매) | 2023-02-01 | 서기(기)-9209 | 기계부
7928 | 2022년 8월 공시자료 점검관리대장 제출 | 2022-09-14 | 태경(경)-69124 | 경영지원부
52762 | 구매계획(1~4호기 원격조작형 차단기 인출입장치 구매) | 2023-04-11 | 태경(계)-31872 | 계약부
32120 | 발전부산물(정제회 등) 판매대금 청구내역 및 미납내역 알림 | 2023-01-09 | 태경(경)-2191 | 경영지원부
66677 | 오만 마나 태양광 사업 대주단 보험자문계약 체결 | 2023-06-19 | 해외(사업)-53551 | 해외사업실
12009 | 2022년 09월 천안청수 연료전지 열요금 청구 알림 | 2022-10-05 | 평복(전)-76901 | 복합전기부
44981 | [태안 1∼8호기 탈황폐수 무방류설비]운전원 OJT 교육 결과보고(1분기) | 2023-03-08 | 태1운(화)-20681 | 화공설비부