Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 3704 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 55 |
Duplicate rows (%) | 1.5% |
Total size in memory | 115.9 KiB |
Average record size in memory | 32.0 B |
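The percentage and per-record figures above follow from the raw counts. As a quick sanity check, the sketch below recomputes them with plain arithmetic. The variable names are illustrative, and treating 1 KiB as 1024 bytes is an assumption rather than something the report states.

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 3704
n_duplicate_rows = 55
total_size_kib = 115.9

# Share of duplicate rows, as a percentage of all rows.
duplicate_pct = n_duplicate_rows / n_rows * 100

# Average record size in bytes, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures in the table (1.5% and 32.0 B).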
Variable types
Text | 2 |
---|---|
Categorical | 2 |
Dataset
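Description | Media status from the aerial-photography metadata of the National Geographic Information Institute (국토지리정보원). Includes 미디어관리번호 (media management number), 미디어종류 (media type), 영상종류 (image type), and related fields. |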
---|---|
Author | 국토교통부 국토지리정보원 (Ministry of Land, Infrastructure and Transport, National Geographic Information Institute) |
URL | https://www.data.go.kr/data/15067537/fileData.do |
Alerts
Dataset has 55 (1.5%) duplicate rows | Duplicates |
---|---|
미디어종류 is highly overall correlated with 영상종류 | High correlation |
영상종류 is highly overall correlated with 미디어종류 | High correlation |
미디어종류 is highly imbalanced (73.2%) | Imbalance |
영상종류 is highly imbalanced (92.5%) | Imbalance |
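The imbalance percentages can be reproduced from the category counts reported in the Common Values tables. The sketch below assumes the score is one minus the Shannon entropy normalized by the maximum entropy for that number of classes. That definition is an assumption on my part, although it does match both reported figures. The function name `imbalance_score` is hypothetical.

```python
from math import log2


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of k category counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. Assumed definition, not taken from the report itself.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(n_classes)


# Counts for 미디어종류: DVD, HDD, CD.
media_type = imbalance_score([3399, 297, 8])
# Counts for 영상종류: PDT001, PDT002, PDT004, PDT008, PDT123, PDT007, PDT003, PDT999.
image_type = imbalance_score([3604, 45, 42, 5, 4, 2, 1, 1])

print(f"{media_type:.1%}, {image_type:.1%}")
```

Under this definition a column with equal class counts scores 0, so values of 73.2% and 92.5% indicate that a single category (DVD, PDT001) accounts for nearly all rows.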
Reproduction
Analysis started | 2023-12-12 11:54:59.590708 |
---|---|
Analysis finished | 2023-12-12 11:55:00.033180 |
Duration | 0.44 seconds |
Software version | ydata-profiling v4.5.1 |
미디어관리번호
Text
Distinct | 3619 |
---|---|
Distinct (%) | 97.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 29.1 KiB |
Length
Max length | 13 |
---|---|
Median length | 13 |
Mean length | 12.843952 |
Min length | 1 |
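The length statistics are plain summaries of the per-value string lengths, and the mean equals the total character count divided by the number of observations (47574 / 3704). The sketch below shows both on a few values taken from this report. The helper name `length_stats` and the choice of sample values are illustrative only.

```python
from statistics import mean, median


def length_stats(values):
    """Summarize string lengths the way the Length table does."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# A handful of values that appear elsewhere in this report.
sample = ["2002110011005", "ORT-2008-0023", "12", "2"]
stats = length_stats(sample)

# Mean length for the full column, from the reported totals.
reported_mean = 47574 / 3704

print(stats)
print(round(reported_mean, 6))
```

The minimum length of 1 in the table is consistent with short values such as `2` appearing in the column alongside the 13-character identifiers.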
Characters and Unicode
Total characters | 47574 |
---|---|
Distinct characters | 27 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 3537 |
---|---|
Unique (%) | 95.5% |
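"Distinct" and "Unique" measure different things. The sketch below assumes the usual profiling definitions: distinct is the number of different values, and unique is the number of values that occur exactly once. Under those definitions, 3619 - 3537 = 82 values in this column occur more than once. The function name and the toy input are illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy input built from identifiers that appear in this report.
sample = [
    "2007040002001", "2007040002001",   # occurs twice
    "2002110011005", "2002110011006",   # each occurs once
    "12", "12", "12",                   # occurs three times
]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)
```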
Sample
1st row | 2002110011005 |
---|---|
2nd row | 2002110011006 |
3rd row | 2002110011007 |
4th row | 2002110011008 |
5th row | 2002110011009 |
Value | Count | Frequency (%) |
air-2007-003a | 3 | 0.1% |
12 | 3 | 0.1% |
2 | 3 | 0.1% |
2007040002001 | 2 | 0.1% |
2007000017008 | 2 | 0.1% |
2007040006001 | 2 | 0.1% |
2007040006002 | 2 | 0.1% |
2007040001001 | 2 | 0.1% |
2007040001002 | 2 | 0.1% |
2007040001003 | 2 | 0.1% |
Other values (3609) | 3681 | 99.4% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 22936 | 48.2% |
1 | 5957 | 12.5% |
9 | 4117 | 8.7% |
2 | 3436 | 7.2% |
8 | 2023 | 4.3% |
3 | 1714 | 3.6% |
5 | 1710 | 3.6% |
4 | 1461 | 3.1% |
7 | 1411 | 3.0% |
6 | 1144 | 2.4% |
Other values (17) | 1665 | 3.5% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 45909 | 96.5% |
Uppercase Letter | 1069 | 2.2% |
Dash Punctuation | 590 | 1.2% |
Lowercase Letter | 6 | < 0.1% |
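The category names in this table are Unicode general categories. A minimal sketch of how such counts can be derived with Python's standard `unicodedata` module follows. The mapping dictionary covers only the four categories seen in this report, and the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general-category codes mapped to the names used above.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Three identifiers taken from the sample rows of this report.
sample = ["2002110011005", "ORT-2008-0023", "DEM-2007-0046"]
counts = category_counts(sample)
print(counts.most_common())
```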
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 22936 | 50.0% |
1 | 5957 | 13.0% |
9 | 4117 | 9.0% |
2 | 3436 | 7.5% |
8 | 2023 | 4.4% |
3 | 1714 | 3.7% |
5 | 1710 | 3.7% |
4 | 1461 | 3.2% |
7 | 1411 | 3.1% |
6 | 1144 | 2.5% |
Uppercase Letter
Value | Count | Frequency (%) |
A | 295 | 27.6% |
R | 251 | 23.5% |
I | 209 | 19.6% |
B | 83 | 7.8% |
D | 44 | 4.1% |
E | 44 | 4.1% |
M | 44 | 4.1% |
O | 42 | 3.9% |
T | 42 | 3.9% |
W | 15 | 1.4% |
Lowercase Letter
Value | Count | Frequency (%) |
y | 1 | 16.7% |
h | 1 | 16.7% |
g | 1 | 16.7% |
r | 1 | 16.7% |
f | 1 | 16.7% |
d | 1 | 16.7% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 590 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 46499 | 97.7% |
Latin | 1075 | 2.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
A | 295 | 27.4% |
R | 251 | 23.3% |
I | 209 | 19.4% |
B | 83 | 7.7% |
D | 44 | 4.1% |
E | 44 | 4.1% |
M | 44 | 4.1% |
O | 42 | 3.9% |
T | 42 | 3.9% |
W | 15 | 1.4% |
Other values (6) | 6 | 0.6% |
Common
Value | Count | Frequency (%) |
0 | 22936 | 49.3% |
1 | 5957 | 12.8% |
9 | 4117 | 8.9% |
2 | 3436 | 7.4% |
8 | 2023 | 4.4% |
3 | 1714 | 3.7% |
5 | 1710 | 3.7% |
4 | 1461 | 3.1% |
7 | 1411 | 3.0% |
6 | 1144 | 2.5% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 47574 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 22936 | 48.2% |
1 | 5957 | 12.5% |
9 | 4117 | 8.7% |
2 | 3436 | 7.2% |
8 | 2023 | 4.3% |
3 | 1714 | 3.6% |
5 | 1710 | 3.6% |
4 | 1461 | 3.1% |
7 | 1411 | 3.0% |
6 | 1144 | 2.4% |
Other values (17) | 1665 | 3.5% |
캐비넷관리번호
Text
Distinct | 64 |
---|---|
Distinct (%) | 1.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 29.1 KiB |
Value | Count | Frequency (%) |
da01-03 | 177 | 4.8% |
da02-04 | 175 | 4.7% |
da02-03 | 173 | 4.7% |
da03-02 | 171 | 4.6% |
da01-07 | 171 | 4.6% |
da01-06 | 171 | 4.6% |
da01-05 | 170 | 4.6% |
da04-01 | 170 | 4.6% |
da02-02 | 169 | 4.6% |
da01-04 | 169 | 4.6% |
Other values (54) | 1988 | 53.7% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 7377 | 28.5% |
D | 3732 | 14.4% |
- | 3688 | 14.2% |
A | 3602 | 13.9% |
2 | 1958 | 7.6% |
1 | 1764 | 6.8% |
3 | 1188 | 4.6% |
4 | 817 | 3.2% |
6 | 560 | 2.2% |
5 | 412 | 1.6% |
Other values (10) | 829 | 3.2% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 14857 | 57.3% |
Uppercase Letter | 7379 | 28.5% |
Dash Punctuation | 3688 | 14.2% |
Lowercase Letter | 3 | < 0.1% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 7377 | 49.7% |
2 | 1958 | 13.2% |
1 | 1764 | 11.9% |
3 | 1188 | 8.0% |
4 | 817 | 5.5% |
6 | 560 | 3.8% |
5 | 412 | 2.8% |
7 | 403 | 2.7% |
8 | 376 | 2.5% |
9 | 2 | < 0.1% |
Uppercase Letter
Value | Count | Frequency (%) |
D | 3732 | 50.6% |
A | 3602 | 48.8% |
O | 42 | 0.6% |
E | 1 | < 0.1% |
W | 1 | < 0.1% |
R | 1 | < 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
r | 1 | 33.3% |
f | 1 | 33.3% |
d | 1 | 33.3% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 3688 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 18545 | 71.5% |
Latin | 7382 | 28.5% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 7377 | 39.8% |
- | 3688 | 19.9% |
2 | 1958 | 10.6% |
1 | 1764 | 9.5% |
3 | 1188 | 6.4% |
4 | 817 | 4.4% |
6 | 560 | 3.0% |
5 | 412 | 2.2% |
7 | 403 | 2.2% |
8 | 376 | 2.0% |
Latin
Value | Count | Frequency (%) |
D | 3732 | 50.6% |
A | 3602 | 48.8% |
O | 42 | 0.6% |
r | 1 | < 0.1% |
f | 1 | < 0.1% |
d | 1 | < 0.1% |
E | 1 | < 0.1% |
W | 1 | < 0.1% |
R | 1 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 25927 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 7377 | 28.5% |
D | 3732 | 14.4% |
- | 3688 | 14.2% |
A | 3602 | 13.9% |
2 | 1958 | 7.6% |
1 | 1764 | 6.8% |
3 | 1188 | 4.6% |
4 | 817 | 3.2% |
6 | 560 | 2.2% |
5 | 412 | 1.6% |
Other values (10) | 829 | 3.2% |
미디어종류
Categorical
High correlation, Imbalance
Distinct | 3 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 29.1 KiB |
DVD | 3399 |
---|---|
HDD | 297 |
CD | 8 |
Length
Max length | 3 |
---|---|
Median length | 3 |
Mean length | 2.9978402 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | DVD |
---|---|
2nd row | DVD |
3rd row | DVD |
4th row | DVD |
5th row | DVD |
Common Values
Value | Count | Frequency (%) |
DVD | 3399 | 91.8% |
HDD | 297 | 8.0% |
CD | 8 | 0.2% |
영상종류
Categorical
High correlation, Imbalance
Distinct | 8 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 29.1 KiB |
PDT001 | 3604 |
---|---|
PDT002 | 45 |
PDT004 | 42 |
PDT008 | 5 |
PDT123 | 4 |
Other values (3) | 4 |
Length
Max length | 6 |
---|---|
Median length | 6 |
Mean length | 6 |
Min length | 6 |
Unique
Unique | 2 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | PDT001 |
---|---|
2nd row | PDT001 |
3rd row | PDT001 |
4th row | PDT001 |
5th row | PDT001 |
Common Values
Value | Count | Frequency (%) |
PDT001 | 3604 | 97.3% |
PDT002 | 45 | 1.2% |
PDT004 | 42 | 1.1% |
PDT008 | 5 | 0.1% |
PDT123 | 4 | 0.1% |
PDT007 | 2 | 0.1% |
PDT003 | 1 | < 0.1% |
PDT999 | 1 | < 0.1% |
Correlations
Variable | 캐비넷관리번호 | 미디어종류 | 영상종류 |
---|---|---|---|
캐비넷관리번호 | 1.000 | 0.995 | 0.997 |
미디어종류 | 0.995 | 1.000 | 0.744 |
영상종류 | 0.997 | 0.744 | 1.000 |
Variable | 영상종류 | 미디어종류 |
---|---|---|
영상종류 | 1.000 | 0.644 |
미디어종류 | 0.644 | 1.000 |
Variable | 미디어종류 | 영상종류 |
---|---|---|
미디어종류 | 1.000 | 0.644 |
영상종류 | 0.644 | 1.000 |
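The report does not say which association measure produced the matrices above. One common choice for two categorical columns is Cramér's V, and the sketch below shows an uncorrected version built from a chi-squared statistic. Treating it as the measure behind these numbers is an assumption, and the function name `cramers_v` and the toy columns are illustrative.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed count per (x, y) pair
    x_totals = Counter(xs)         # row marginals
    y_totals = Counter(ys)         # column marginals

    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy columns: media type fully determines image type here.
media = ["DVD", "DVD", "HDD", "HDD"]
image = ["PDT001", "PDT001", "PDT002", "PDT002"]
perfect = cramers_v(media, image)

# Toy columns with no association at all.
unrelated = cramers_v(["DVD", "DVD", "HDD", "HDD"],
                      ["PDT001", "PDT002", "PDT001", "PDT002"])
print(perfect, unrelated)
```

A value of 1 means one column fully determines the other and 0 means no association, which gives a scale for reading the coefficients between 0.644 and 0.997 reported here.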
First rows
Row | 미디어관리번호 | 캐비넷관리번호 | 미디어종류 | 영상종류 |
---|---|---|---|---|
0 | 2002110011005 | DA02-07 | DVD | PDT001 |
1 | 2002110011006 | DA02-07 | DVD | PDT001 |
2 | 2002110011007 | DA02-07 | DVD | PDT001 |
3 | 2002110011008 | DA02-07 | DVD | PDT001 |
4 | 2002110011009 | DA02-07 | DVD | PDT001 |
5 | 2002110011010 | DA02-07 | DVD | PDT001 |
6 | 2002110011011 | DA02-07 | DVD | PDT001 |
7 | 2002110011012 | DA02-07 | DVD | PDT001 |
8 | 2002110011013 | DA02-07 | DVD | PDT001 |
9 | 2002110011014 | DA02-07 | DVD | PDT001 |
Last rows
Row | 미디어관리번호 | 캐비넷관리번호 | 미디어종류 | 영상종류 |
---|---|---|---|---|
3694 | ORT-2008-0023 | DO05-07 | HDD | PDT004 |
3695 | DEM-2008-0019 | DD05-07 | HDD | PDT002 |
3696 | DEM-2007-0046 | DD05-08 | HDD | PDT002 |
3697 | ORT-2007-0047 | DO05-08 | HDD | PDT004 |
3698 | DEM-2008-0021 | DD05-07 | HDD | PDT002 |
3699 | DEM-2008-0020 | DD05-07 | HDD | PDT002 |
3700 | DEM-2007-0009 | DD05-08 | HDD | PDT002 |
3701 | ORT-2007-0010 | DO05-08 | HDD | PDT004 |
3702 | DEM-2006-0031 | DD07-01 | HDD | PDT002 |
3703 | ORT-2006-0032 | DO07-01 | HDD | PDT004 |
Most frequently occurring
Row | 미디어관리번호 | 캐비넷관리번호 | 미디어종류 | 영상종류 | # duplicates |
---|---|---|---|---|---|
0 | 2007000017002 | DA04-03 | DVD | PDT001 | 2 |
1 | 2007000017003 | DA04-03 | DVD | PDT001 | 2 |
2 | 2007000017004 | DA04-03 | DVD | PDT001 | 2 |
3 | 2007000017005 | DA04-03 | DVD | PDT001 | 2 |
4 | 2007000017006 | DA04-03 | DVD | PDT001 | 2 |
5 | 2007000017007 | DA04-03 | DVD | PDT001 | 2 |
6 | 2007000017008 | DA04-03 | DVD | PDT001 | 2 |
7 | 2007000017009 | DA04-03 | DVD | PDT001 | 2 |
8 | 2007000017010 | DA04-03 | DVD | PDT001 | 2 |
9 | 2007000017011 | DA04-03 | DVD | PDT001 | 2 |
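A table like the one above can be built by grouping identical rows and keeping the groups that occur more than once. The sketch below does this with the standard library on a toy input. The row values are taken from this report, but how often each one is repeated here is made up for illustration. Whether the report's "Duplicate rows" figure counts such groups or the surplus copies is not stated, so the group-count interpretation is an assumption.

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: c for row, c in counts.items() if c > 1}


# Toy input: two rows repeated twice, one row appearing once.
rows = [
    ("2007000017002", "DA04-03", "DVD", "PDT001"),
    ("2007000017002", "DA04-03", "DVD", "PDT001"),
    ("2007000017003", "DA04-03", "DVD", "PDT001"),
    ("2007000017003", "DA04-03", "DVD", "PDT001"),
    ("2002110011005", "DA02-07", "DVD", "PDT001"),
]
groups = duplicate_groups(rows)

for row, n in groups.items():
    print(" | ".join(row), "|", n)
```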