Overview

Dataset statistics

Number of variables: 4
Number of observations: 3704
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 55
Duplicate rows (%): 1.5%
Total size in memory: 115.9 KiB
Average record size in memory: 32.0 B
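
The two derived figures above (duplicate share and average record size) follow from the raw counts. A minimal check in Python, using only the numbers shown in this table; the variable names are illustrative:

```python
# Figures copied from the Dataset statistics table.
n_rows = 3704
n_duplicate_rows = 55
total_kib = 115.9

# Share of rows flagged as duplicates, in percent.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average in-memory size of one record, in bytes (1 KiB = 1024 B).
avg_record_bytes = total_kib * 1024 / n_rows

print(f"{duplicate_pct:.1f}%")      # 1.5%
print(f"{avg_record_bytes:.1f} B")  # 32.0 B
```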

Variable types

Text: 2
Categorical: 2

Dataset

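Description: Media status records from the aerial photograph metadata of the National Geographic Information Institute. Includes the media management number (미디어관리번호), media type (미디어종류), image type (영상종류), and related fields.
Author: National Geographic Information Institute, Ministry of Land, Infrastructure and Transport (국토교통부 국토지리정보원)
URL: https://www.data.go.kr/data/15067537/fileData.do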

Alerts

Dataset has 55 (1.5%) duplicate rows [Duplicates]
미디어종류 is highly overall correlated with 영상종류 [High correlation]
영상종류 is highly overall correlated with 미디어종류 [High correlation]
미디어종류 is highly imbalanced (73.2%) [Imbalance]
영상종류 is highly imbalanced (92.5%) [Imbalance]
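
The report does not state how the imbalance percentages are derived. One formula that reproduces both figures is one minus the normalised Shannon entropy of the class counts; the sketch below assumes that definition. The function name `imbalance_score` is illustrative, and the counts are taken from the Common Values tables later in this report.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class distribution and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0  # assumption: a single-class column is scored as 0
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(k)

# Class counts copied from the Common Values tables.
media_type_counts = [3399, 297, 8]                  # 미디어종류: DVD, HDD, CD
image_type_counts = [3604, 45, 42, 5, 4, 2, 1, 1]   # 영상종류: PDT001 ... PDT999

print(f"{imbalance_score(media_type_counts):.1%}")  # 73.2%
print(f"{imbalance_score(image_type_counts):.1%}")  # 92.5%
```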

Reproduction

Analysis started: 2023-12-12 11:54:59.590708
Analysis finished: 2023-12-12 11:55:00.033180
Duration: 0.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

미디어관리번호
Text

Distinct: 3619
Distinct (%): 97.7%
Missing: 0
Missing (%): 0.0%
Memory size: 29.1 KiB

Length

Max length: 13
Median length: 13
Mean length: 12.843952
Min length: 1

Characters and Unicode

Total characters: 47574
Distinct characters: 27
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
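
As a rough illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two values taken from the Sample section of this report; the helper name `category_counts` is illustrative, and this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`.
    'Nd' = Decimal Number, 'Lu' = Uppercase Letter,
    'Ll' = Lowercase Letter, 'Pd' = Dash Punctuation."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

# Two values that appear in the Sample section of the report.
sample = ["2002110011005", "ORT-2008-0023"]
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```

For these two values the digits fall under `Nd`, the letters `O`, `R`, `T` under `Lu`, and the hyphens under `Pd`, which corresponds to the category names used in the tables that follow.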

Unique

Unique: 3537
Unique (%): 95.5%

Sample

1st row: 2002110011005
2nd row: 2002110011006
3rd row: 2002110011007
4th row: 2002110011008
5th row: 2002110011009

Value                Count   Frequency (%)
air-2007-003a        3       0.1%
12                   3       0.1%
2                    3       0.1%
2007040002001        2       0.1%
2007000017008        2       0.1%
2007040006001        2       0.1%
2007040006002        2       0.1%
2007040001001        2       0.1%
2007040001002        2       0.1%
2007040001003        2       0.1%
Other values (3609)  3681    99.4%

Most occurring characters

Value              Count   Frequency (%)
0                  22936   48.2%
1                  5957    12.5%
9                  4117    8.7%
2                  3436    7.2%
8                  2023    4.3%
3                  1714    3.6%
5                  1710    3.6%
4                  1461    3.1%
7                  1411    3.0%
6                  1144    2.4%
Other values (17)  1665    3.5%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    45909   96.5%
Uppercase Letter  1069    2.2%
Dash Punctuation  590     1.2%
Lowercase Letter  6       < 0.1%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      22936   50.0%
1      5957    13.0%
9      4117    9.0%
2      3436    7.5%
8      2023    4.4%
3      1714    3.7%
5      1710    3.7%
4      1461    3.2%
7      1411    3.1%
6      1144    2.5%

Uppercase Letter
Value  Count   Frequency (%)
A      295     27.6%
R      251     23.5%
I      209     19.6%
B      83      7.8%
D      44      4.1%
E      44      4.1%
M      44      4.1%
O      42      3.9%
T      42      3.9%
W      15      1.4%

Lowercase Letter
Value  Count   Frequency (%)
y      1       16.7%
h      1       16.7%
g      1       16.7%
r      1       16.7%
f      1       16.7%
d      1       16.7%

Dash Punctuation
Value  Count   Frequency (%)
-      590     100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  46499   97.7%
Latin   1075    2.3%

Most frequent character per script

Latin
Value             Count   Frequency (%)
A                 295     27.4%
R                 251     23.3%
I                 209     19.4%
B                 83      7.7%
D                 44      4.1%
E                 44      4.1%
M                 44      4.1%
O                 42      3.9%
T                 42      3.9%
W                 15      1.4%
Other values (6)  6       0.6%

Common
Value  Count   Frequency (%)
0      22936   49.3%
1      5957    12.8%
9      4117    8.9%
2      3436    7.4%
8      2023    4.4%
3      1714    3.7%
5      1710    3.7%
4      1461    3.1%
7      1411    3.0%
6      1144    2.5%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  47574   100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
0                  22936   48.2%
1                  5957    12.5%
9                  4117    8.7%
2                  3436    7.2%
8                  2023    4.3%
3                  1714    3.6%
5                  1710    3.6%
4                  1461    3.1%
7                  1411    3.0%
6                  1144    2.4%
Other values (17)  1665    3.5%

캐비넷관리번호
Text

Distinct: 64
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 29.1 KiB

Length

Max length: 7
Median length: 7
Mean length: 6.99973
Min length: 6

Characters and Unicode

Total characters: 25927
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 0.4%

Sample

1st row: DA02-07
2nd row: DA02-07
3rd row: DA02-07
4th row: DA02-07
5th row: DA02-07

Value              Count   Frequency (%)
da01-03            177     4.8%
da02-04            175     4.7%
da02-03            173     4.7%
da03-02            171     4.6%
da01-07            171     4.6%
da01-06            171     4.6%
da01-05            170     4.6%
da04-01            170     4.6%
da02-02            169     4.6%
da01-04            169     4.6%
Other values (54)  1988    53.7%

Most occurring characters

Value              Count   Frequency (%)
0                  7377    28.5%
D                  3732    14.4%
-                  3688    14.2%
A                  3602    13.9%
2                  1958    7.6%
1                  1764    6.8%
3                  1188    4.6%
4                  817     3.2%
6                  560     2.2%
5                  412     1.6%
Other values (10)  829     3.2%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    14857   57.3%
Uppercase Letter  7379    28.5%
Dash Punctuation  3688    14.2%
Lowercase Letter  3       < 0.1%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      7377    49.7%
2      1958    13.2%
1      1764    11.9%
3      1188    8.0%
4      817     5.5%
6      560     3.8%
5      412     2.8%
7      403     2.7%
8      376     2.5%
9      2       < 0.1%

Uppercase Letter
Value  Count   Frequency (%)
D      3732    50.6%
A      3602    48.8%
O      42      0.6%
E      1       < 0.1%
W      1       < 0.1%
R      1       < 0.1%

Lowercase Letter
Value  Count   Frequency (%)
r      1       33.3%
f      1       33.3%
d      1       33.3%

Dash Punctuation
Value  Count   Frequency (%)
-      3688    100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  18545   71.5%
Latin   7382    28.5%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      7377    39.8%
-      3688    19.9%
2      1958    10.6%
1      1764    9.5%
3      1188    6.4%
4      817     4.4%
6      560     3.0%
5      412     2.2%
7      403     2.2%
8      376     2.0%

Latin
Value  Count   Frequency (%)
D      3732    50.6%
A      3602    48.8%
O      42      0.6%
r      1       < 0.1%
f      1       < 0.1%
d      1       < 0.1%
E      1       < 0.1%
W      1       < 0.1%
R      1       < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  25927   100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
0                  7377    28.5%
D                  3732    14.4%
-                  3688    14.2%
A                  3602    13.9%
2                  1958    7.6%
1                  1764    6.8%
3                  1188    4.6%
4                  817     3.2%
6                  560     2.2%
5                  412     1.6%
Other values (10)  829     3.2%

미디어종류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.1 KiB

Value  Count
DVD    3399
HDD    297
CD     8

Length

Max length: 3
Median length: 3
Mean length: 2.9978402
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: DVD
2nd row: DVD
3rd row: DVD
4th row: DVD
5th row: DVD

Common Values

Value  Count   Frequency (%)
DVD    3399    91.8%
HDD    297     8.0%
CD     8       0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count   Frequency (%)
dvd    3399    91.8%
hdd    297     8.0%
cd     8       0.2%

영상종류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 29.1 KiB

Value             Count
PDT001            3604
PDT002            45
PDT004            42
PDT008            5
PDT123            4
Other values (3)  4

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: PDT001
2nd row: PDT001
3rd row: PDT001
4th row: PDT001
5th row: PDT001

Common Values

Value   Count   Frequency (%)
PDT001  3604    97.3%
PDT002  45      1.2%
PDT004  42      1.1%
PDT008  5       0.1%
PDT123  4       0.1%
PDT007  2       0.1%
PDT003  1       < 0.1%
PDT999  1       < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count   Frequency (%)
pdt001  3604    97.3%
pdt002  45      1.2%
pdt004  42      1.1%
pdt008  5       0.1%
pdt123  4       0.1%
pdt007  2       0.1%
pdt003  1       < 0.1%
pdt999  1       < 0.1%

Correlations

                캐비넷관리번호  미디어종류  영상종류
캐비넷관리번호  1.000           0.995       0.997
미디어종류      0.995           1.000       0.744
영상종류        0.997           0.744       1.000

            영상종류  미디어종류
영상종류    1.000     0.644
미디어종류  0.644     1.000

            미디어종류  영상종류
미디어종류  1.000       0.644
영상종류    0.644       1.000
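
The report does not name the association measure behind these matrices. A common choice for two categorical columns is Cramér's V, so the sketch below shows the plain (uncorrected) version as an assumption, not as the method used here. The function name `cramers_v` and both contingency tables are hypothetical and are not taken from the dataset.

```python
from math import sqrt

def cramers_v(table):
    """Plain (uncorrected) Cramér's V for a contingency table given as a
    list of rows. Assumes every row and column has a non-zero total and
    that the table has at least two rows and two columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))

# Hypothetical counts for illustration only.
perfect = [[10, 0], [0, 10]]      # each row value implies one column value
independent = [[5, 5], [5, 5]]    # row value tells nothing about the column

print(cramers_v(perfect))       # 1.0
print(cramers_v(independent))   # 0.0
```

A value near 1 means one column almost determines the other; a value near 0 means the columns are close to independent.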

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      미디어관리번호  캐비넷관리번호  미디어종류  영상종류
0     2002110011005   DA02-07         DVD         PDT001
1     2002110011006   DA02-07         DVD         PDT001
2     2002110011007   DA02-07         DVD         PDT001
3     2002110011008   DA02-07         DVD         PDT001
4     2002110011009   DA02-07         DVD         PDT001
5     2002110011010   DA02-07         DVD         PDT001
6     2002110011011   DA02-07         DVD         PDT001
7     2002110011012   DA02-07         DVD         PDT001
8     2002110011013   DA02-07         DVD         PDT001
9     2002110011014   DA02-07         DVD         PDT001

Last rows

      미디어관리번호  캐비넷관리번호  미디어종류  영상종류
3694  ORT-2008-0023   DO05-07         HDD         PDT004
3695  DEM-2008-0019   DD05-07         HDD         PDT002
3696  DEM-2007-0046   DD05-08         HDD         PDT002
3697  ORT-2007-0047   DO05-08         HDD         PDT004
3698  DEM-2008-0021   DD05-07         HDD         PDT002
3699  DEM-2008-0020   DD05-07         HDD         PDT002
3700  DEM-2007-0009   DD05-08         HDD         PDT002
3701  ORT-2007-0010   DO05-08         HDD         PDT004
3702  DEM-2006-0031   DD07-01         HDD         PDT002
3703  ORT-2006-0032   DO07-01         HDD         PDT004

Duplicate rows

Most frequently occurring

   미디어관리번호  캐비넷관리번호  미디어종류  영상종류  # duplicates
0  2007000017002   DA04-03         DVD         PDT001    2
1  2007000017003   DA04-03         DVD         PDT001    2
2  2007000017004   DA04-03         DVD         PDT001    2
3  2007000017005   DA04-03         DVD         PDT001    2
4  2007000017006   DA04-03         DVD         PDT001    2
5  2007000017007   DA04-03         DVD         PDT001    2
6  2007000017008   DA04-03         DVD         PDT001    2
7  2007000017009   DA04-03         DVD         PDT001    2
8  2007000017010   DA04-03         DVD         PDT001    2
9  2007000017011   DA04-03         DVD         PDT001    2
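
A duplicate count like the one in this table can be obtained by counting identical full rows. The sketch below uses only the Python standard library; the individual values appear in this report, but the small `rows` list is an illustrative arrangement, not the actual file contents.

```python
from collections import Counter

# Each row is (미디어관리번호, 캐비넷관리번호, 미디어종류, 영상종류).
# Illustrative arrangement of values that appear in the report.
rows = [
    ("2007000017002", "DA04-03", "DVD", "PDT001"),
    ("2007000017002", "DA04-03", "DVD", "PDT001"),
    ("2007000017003", "DA04-03", "DVD", "PDT001"),
    ("2002110011005", "DA02-07", "DVD", "PDT001"),
]

# Count how many times each complete row occurs.
occurrences = Counter(rows)

# Keep only rows that occur more than once.
duplicates = {row: n for row, n in occurrences.items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```

Before deduplicating the real file, it is worth confirming whether repeated rows are data-entry repeats or distinct physical media that happen to share all four attributes.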