Overview

Dataset statistics

Number of variables: 3
Number of observations: 2075
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 70
Duplicate rows (%): 3.4%
Total size in memory: 50.8 KiB
Average record size in memory: 25.1 B
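The percentage and per-record figures above follow from the raw counts. The check below is a minimal arithmetic sketch. It assumes that missing cells are measured against all rows × variables (6,225 cells) and that 1 KiB = 1,024 bytes. The helper names are illustrative.

```python
# Illustrative helpers: recompute the derived figures in "Dataset statistics"
# from the raw counts reported above.


def percent(part: int, whole: int) -> float:
    """Share of `whole` that `part` represents, as a percentage rounded to 1 dp."""
    return round(100 * part / whole, 1)


def average_record_size(total_bytes: float, n_rows: int) -> float:
    """Mean in-memory bytes per row, rounded to 1 dp."""
    return round(total_bytes / n_rows, 1)


N_ROWS = 2075
N_VARS = 3
N_CELLS = N_ROWS * N_VARS  # 6225 cells in total (assumed denominator)

dup_pct = percent(70, N_ROWS)                         # Duplicate rows (%)
missing_pct = percent(0, N_CELLS)                     # Missing cells (%)
avg_size = average_record_size(50.8 * 1024, N_ROWS)   # 50.8 KiB -> bytes

print(dup_pct, missing_pct, avg_size)
```

Because 50.8 KiB is itself a rounded figure, the per-record size is only a consistency check. It is not an exact reconstruction.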

Variable types

Text: 1
Categorical: 2

Dataset

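Description: Detailed attachment-file data from the Water Safety Comprehensive Management component of the Water Rescuer (수상구조사) system. It provides fields such as file ID, file content, and file extension.
URL: https://www.data.go.kr/data/15118345/fileData.do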

Alerts

첨부파일내용(FILE_CN) has constant value "0" (Constant)
Dataset has 70 (3.4%) duplicate rows (Duplicates)
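The Constant alert fires because 첨부파일내용(FILE_CN) holds the same value in every row. The sketch below is a minimal plain-Python version of that check. It assumes a column counts as constant when its non-missing values reduce to exactly one distinct value, and it does not reproduce ydata-profiling's own implementation. `constant_columns` and the sample dict are hypothetical. The sample values are copied from the Sample section, and storing FILE_CN as the string "0" is also an assumption.

```python
from typing import Dict, List, Sequence


def constant_columns(table: Dict[str, Sequence[object]]) -> List[str]:
    """Names of columns whose non-missing values collapse to one distinct value."""
    result = []
    for name, values in table.items():
        distinct = {v for v in values if v is not None}  # ignore missing cells
        if len(distinct) == 1:
            result.append(name)
    return result


# Three rows copied from the Sample section of this report.
sample = {
    "첨부파일명(ATCH_FILE_ID)": [
        "FILE_000000000000431", "FILE_000000000000441", "FILE_000000000000442",
    ],
    "첨부파일내용(FILE_CN)": ["0", "0", "0"],
    "첨부파일확장명(FILE_EXTSN)": ["jpg", "jsp", "jpg"],
}
print(constant_columns(sample))
```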

Reproduction

Analysis started: 2023-12-12 09:29:31.301902
Analysis finished: 2023-12-12 09:29:31.604262
Duration: 0.3 seconds
Software version: ydata-profiling v4.5.1

Variables

첨부파일명(ATCH_FILE_ID)
Text

Distinct: 1907
Distinct (%): 91.9%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 KiB

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20
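Every ID is exactly 20 characters long. The character statistics that follow show only F, I, L, E, `_` and the digits. Together with the sample rows, this suggests a fixed `FILE_` prefix followed by 15 digits. The validation sketch below is built on that inferred pattern. The pattern is an assumption, not a documented format, and the names `FILE_ID_PATTERN` and `is_valid_file_id` are hypothetical.

```python
import re

# Inferred (not documented) layout: the literal prefix "FILE_" followed by
# exactly 15 ASCII digits, for a total length of 20 characters.
FILE_ID_PATTERN = re.compile(r"FILE_[0-9]{15}")


def is_valid_file_id(value: str) -> bool:
    """True if `value` matches the inferred FILE_ + 15-digit layout."""
    return FILE_ID_PATTERN.fullmatch(value) is not None


ids = [
    "FILE_000000000000431",  # first sample row
    "FILE_000000000006800",  # last sample row
    "FILE_431",              # too short
    "file_000000000000431",  # wrong case
]
print([is_valid_file_id(i) for i in ids])
```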

Characters and Unicode

Total characters: 41500
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
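In this column the three categories are Decimal Number (`Nd`), Uppercase Letter (`Lu`) and Connector Punctuation (`Pc`). The standard-library function `unicodedata.category` returns these general-category codes, so the category counts can be reproduced with a short sketch. The code-to-name mapping is written out by hand and covers only these three codes. Script and block properties are not exposed by `unicodedata`, so those tables are not reproduced here.

```python
import unicodedata
from collections import Counter

# Hand-written names for the three Unicode general-category codes that occur
# in this column; any other code is passed through unchanged.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    codes = Counter(unicodedata.category(ch) for v in values for ch in v)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}


print(category_counts(["FILE_000000000000431", "FILE_000000000000441"]))
```

Each ID contributes 15 digits, 4 uppercase letters and 1 underscore. Across 2075 rows this gives the 31125, 8300 and 2075 characters reported in the category tables.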

Unique

Unique: 1802
Unique (%): 86.8%
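Distinct counts how many different IDs occur (1907, or 91.9% of 2075 rows). Unique counts only the IDs that occur exactly once (1802, or 86.8%). The sketch below assumes those two definitions and applies them to rows from the Sample section, where FILE_000000000006790 appears three times. The helper name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): how many different values occur, and how
    many of those occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


ids = [
    "FILE_000000000000431",
    "FILE_000000000006790",
    "FILE_000000000006790",
    "FILE_000000000006790",
    "FILE_000000000006800",
]
print(distinct_and_unique(ids))
```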

Sample

1st row: FILE_000000000000431
2nd row: FILE_000000000000441
3rd row: FILE_000000000000442
4th row: FILE_000000000000452
5th row: FILE_000000000000470
Most frequent values (lowercased)

Value                 Count  Frequency (%)
file_000000000005110      4  0.2%
file_000000000005554      4  0.2%
file_000000000006139      4  0.2%
file_000000000003420      4  0.2%
file_000000000006138      4  0.2%
file_000000000000551      4  0.2%
file_000000000006150      4  0.2%
file_000000000004230      4  0.2%
file_000000000004412      4  0.2%
file_000000000004600      4  0.2%
Other values (1897)    2035  98.1%
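The IDs appear in lowercase (file_ rather than FILE_), which suggests the values were lowercased before counting. Everything beyond the top 10 is folded into a single "Other values" row. The sketch below is a hypothetical `top_values` helper that mimics this layout. Lowercasing is optional because the exact tokenisation the report applies is an assumption.

```python
from collections import Counter


def top_values(values, n=10, lowercase=False):
    """Return the n most common values as (value, count, percent) rows, plus
    one "Other values (k)" row that aggregates every remaining value."""
    if lowercase:
        values = [v.lower() for v in values]
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(n)
    rows = [(v, c, round(100 * c / total, 1)) for v, c in top]
    rest = total - sum(c for _, c in top)
    if rest:
        # k is the number of distinct values not shown individually
        k = len(counts) - len(top)
        rows.append((f"Other values ({k})", rest, round(100 * rest / total, 1)))
    return rows


values = ["FILE_000000000000551"] * 4 + ["FILE_000000000000431", "FILE_000000000000441"]
for row in top_values(values, n=1, lowercase=True):
    print(row)
```

With the report's figures this gives 1907 − 10 = 1897 other values covering 2075 − 10 × 4 = 2035 rows.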

Most occurring characters

Value             Count  Frequency (%)
0                 23934  57.7%
F                  2075  5.0%
I                  2075  5.0%
L                  2075  5.0%
E                  2075  5.0%
_                  2075  5.0%
1                  1279  3.1%
2                   962  2.3%
4                   949  2.3%
3                   930  2.2%
Other values (5)   3071  7.4%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         31125  75.0%
Uppercase Letter        8300  20.0%
Connector Punctuation   2075  5.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      23934  76.9%
1       1279  4.1%
2        962  3.1%
4        949  3.0%
3        930  3.0%
5        903  2.9%
6        773  2.5%
7        515  1.7%
9        451  1.4%
8        429  1.4%

Uppercase Letter
Value  Count  Frequency (%)
F       2075  25.0%
I       2075  25.0%
L       2075  25.0%
E       2075  25.0%

Connector Punctuation
Value  Count  Frequency (%)
_       2075  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  33200  80.0%
Latin    8300  20.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      23934  72.1%
_       2075  6.2%
1       1279  3.9%
2        962  2.9%
4        949  2.9%
3        930  2.8%
5        903  2.7%
6        773  2.3%
7        515  1.6%
9        451  1.4%

Latin
Value  Count  Frequency (%)
F       2075  25.0%
I       2075  25.0%
L       2075  25.0%
E       2075  25.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  41500  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 23934  57.7%
F                  2075  5.0%
I                  2075  5.0%
L                  2075  5.0%
E                  2075  5.0%
_                  2075  5.0%
1                  1279  3.1%
2                   962  2.3%
4                   949  2.3%
3                   930  2.2%
Other values (5)   3071  7.4%

첨부파일내용(FILE_CN)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 KiB

Value  Count
0       2075

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0       2075  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0       2075  100.0%

첨부파일확장명(FILE_EXTSN)
Categorical

Distinct: 15
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 KiB

Value              Count
pdf                 1093
jpg                  518
hwp                  200
png                  128
jpeg                  52
Other values (10)     84

Length

Max length: 4
Median length: 3
Mean length: 3.026506
Min length: 3
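Every extension is 3 or 4 characters long, so the mean fixes how many 4-character values there are. From 3 + n/2075 = 3.026506 we get n ≈ 55. The sketch below computes the same four length statistics with the standard `statistics` module. The example list is illustrative and is not the full column.

```python
from statistics import mean, median


def length_stats(values):
    """Max, median, mean and min string length, as in the Length panel."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": round(mean(lengths), 6),
        "min": min(lengths),
    }


exts = ["pdf", "jpg", "hwp", "jpeg"]  # illustrative values, not the full column
print(length_stats(exts))

# Number of 4-character extensions implied by the reported mean length.
n_four_char = round((3.026506 - 3) * 2075)
print(n_four_char)
```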

Unique

Unique: 5
Unique (%): 0.2%

Sample

1st row: jpg
2nd row: jsp
3rd row: jpg
4th row: jsp
5th row: jpg

Common Values

Value             Count  Frequency (%)
pdf                1093  52.7%
jpg                 518  25.0%
hwp                 200  9.6%
png                 128  6.2%
jpeg                 52  2.5%
JPG                  39  1.9%
PNG                  29  1.4%
PDF                   6  0.3%
bmp                   3  0.1%
jsp                   2  0.1%
Other values (5)      5  0.2%

Length

Histogram of lengths of the category

Value counts after lowercasing

Value  Count  Frequency (%)
pdf     1099  53.0%
jpg      557  26.8%
hwp      200  9.6%
png      157  7.6%
jpeg      52  2.5%
bmp        4  0.2%
jsp        2  0.1%
doc        1  < 0.1%
docx       1  < 0.1%
heic       1  < 0.1%
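This table merges case variants: pdf 1093 + PDF 6 = 1099, jpg 518 + JPG 39 = 557, and png 128 + PNG 29 = 157. The sketch below folds counts the same way. It is applied to the ten values listed under Common Values, with counts copied from the report. The helper name is hypothetical. Values hidden in "Other values (5)" are not included, so bmp stays at 3 here rather than the 4 shown above.

```python
from collections import Counter


def merge_case_variants(counts):
    """Fold counts for values that differ only in letter case into a single
    lowercase key."""
    merged = Counter()
    for value, n in counts.items():
        merged[value.lower()] += n
    return dict(merged)


# The ten values listed under Common Values for 첨부파일확장명(FILE_EXTSN).
reported = {
    "pdf": 1093, "jpg": 518, "hwp": 200, "png": 128, "jpeg": 52,
    "JPG": 39, "PNG": 29, "PDF": 6, "bmp": 3, "jsp": 2,
}
merged = merge_case_variants(reported)
print(merged)
```

jpeg and jpg remain separate keys. Treating them as one format would require an additional, analyst-specific alias mapping.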

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      첨부파일명(ATCH_FILE_ID)  첨부파일내용(FILE_CN)  첨부파일확장명(FILE_EXTSN)
0     FILE_000000000000431    0                    jpg
1     FILE_000000000000441    0                    jsp
2     FILE_000000000000442    0                    jpg
3     FILE_000000000000452    0                    jsp
4     FILE_000000000000470    0                    jpg
5     FILE_000000000000480    0                    jpg
6     FILE_000000000000481    0                    jpg
7     FILE_000000000000482    0                    jpg
8     FILE_000000000000498    0                    hwp
9     FILE_000000000000499    0                    hwp

Last rows

      첨부파일명(ATCH_FILE_ID)  첨부파일내용(FILE_CN)  첨부파일확장명(FILE_EXTSN)
2065  FILE_000000000006740    0                    pdf
2066  FILE_000000000006750    0                    hwp
2067  FILE_000000000006760    0                    pdf
2068  FILE_000000000006761    0                    pdf
2069  FILE_000000000006770    0                    pdf
2070  FILE_000000000006780    0                    pdf
2071  FILE_000000000006790    0                    png
2072  FILE_000000000006790    0                    png
2073  FILE_000000000006790    0                    png
2074  FILE_000000000006800    0                    jpg

Duplicate rows

Most frequently occurring

      첨부파일명(ATCH_FILE_ID)  첨부파일내용(FILE_CN)  첨부파일확장명(FILE_EXTSN)  # duplicates
0     FILE_000000000000551    0                    jpg                      4
47    FILE_000000000005554    0                    jpg                      4
59    FILE_000000000006138    0                    jpg                      4
60    FILE_000000000006139    0                    jpg                      4
61    FILE_000000000006150    0                    jpg                      4
22    FILE_000000000004170    0                    pdf                      3
27    FILE_000000000004412    0                    hwp                      3
28    FILE_000000000004600    0                    hwp                      3
30    FILE_000000000004611    0                    hwp                      3
36    FILE_000000000004891    0                    pdf                      3
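The # duplicates column gives the size of each group of identical rows. The report does not state whether the Duplicate rows statistic (70) counts such groups or the surplus copies beyond the first occurrence, so the sketch below returns both. It runs on a few rows built from the table above, and the helper name `duplicate_summary` is hypothetical.

```python
from collections import Counter


def duplicate_summary(rows, top=10):
    """Group identical rows and return three things: the number of row values
    that occur more than once, the number of surplus copies beyond the first
    occurrence, and the `top` largest groups as (row, count) pairs."""
    counts = Counter(tuple(r) for r in rows)
    groups = {row: n for row, n in counts.items() if n > 1}
    surplus = sum(n - 1 for n in groups.values())
    largest = sorted(groups.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return len(groups), surplus, largest


rows = (
    [("FILE_000000000000551", "0", "jpg")] * 4
    + [("FILE_000000000004170", "0", "pdf")] * 3
    + [("FILE_000000000000431", "0", "jpg")]
)
n_groups, n_surplus, largest = duplicate_summary(rows)
print(n_groups, n_surplus)
print(largest[0])
```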