gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	6258
Missing cells	6006
Missing cells (%)	24.0%
Duplicate rows	54
Duplicate rows (%)	0.9%
Total size in memory	201.8 KiB
Average record size in memory	33.0 B
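
The percentages and the average record size in this table can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, assuming the usual definitions (missing cells over total cells, duplicate rows over observations, total memory over observations). The variable names are illustrative and are not taken from the profiler.

```python
# Raw counts copied from the "Dataset statistics" table.
n_variables = 4
n_observations = 6258
missing_cells = 6006
duplicate_rows = 54
total_memory_kib = 201.8

# Assumed definitions; the profiler's own rounding may differ slightly.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the three results agree with the figures reported in the table.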

Variable types

Text	3
Numeric	1

Dataset

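Description	Media storage records from the aerial photograph metadata of the National Geographic Information Institute. (Includes 미디어관리번호 (media management number), 미디어자료일련번호 (media data serial number), and others.)
Author	Ministry of Land, Infrastructure and Transport, National Geographic Information Institute (국토교통부 국토지리정보원)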
URL	https://www.data.go.kr/data/15067536/fileData.do

Alerts

Dataset has 54 (0.9%) duplicate rows	Duplicates
`비고` has 5951 (95.1%) missing values	Missing

Reproduction

Analysis started	2023-12-12 17:21:15.176475
Analysis finished	2023-12-12 17:21:15.734576
Duration	0.56 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

미디어관리번호
Text

Distinct	3631
Distinct (%)	58.0%
Missing	0
Missing (%)	0.0%
Memory size	49.0 KiB

Length

Max length	13
Median length	13
Mean length	12.989773
Min length	1

Characters and Unicode

Total characters	81290
Distinct characters	21
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
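
As an illustration of how such character properties can be tallied, here is a small sketch using Python's standard `unicodedata` module. It covers only the general category (`Nd` = Decimal Number, `Lu` = Uppercase Letter, `Pd` = Dash Punctuation); the standard library does not expose script or block lookups, so those are left out. The helper name `category_counts` is hypothetical, and this is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two 미디어관리번호 values that appear in the report's sample tables.
sample = ["2006100002023", "AIR-2005-012B"]
counts = category_counts(sample)
print(counts)
```

For these two values the tally is 20 decimal digits, 4 uppercase letters, and 2 dashes, the same three categories the report lists for this column.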

Unique

Unique	3421
Unique (%)	54.7%

Sample

1st row	2006100002023
2nd row	2007110016001
3rd row	2007110016002
4th row	2007110016003
5th row	2007110016004

Value	Count	Frequency (%)
air-2006-015a	289	4.6%
air-2006-015b	289	4.6%
air-2006-027a	144	2.3%
air-2006-027b	144	2.3%
air-2005-021a	115	1.8%
air-2005-021b	115	1.8%
air-2007-007a	46	0.7%
air-2007-007b	46	0.7%
air-2005-003a	44	0.7%
air-2006-008a	38	0.6%
Other values (3621)	4988	79.7%

Most occurring characters

Value	Count	Frequency (%)
0	31790	39.1%
1	7229	8.9%
2	6992	8.6%
-	5712	7.0%
9	4274	5.3%
A	4075	5.0%
6	2995	3.7%
5	2970	3.7%
R	2812	3.5%
I	2770	3.4%
Other values (11)	9671	11.9%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	64318	79.1%
Uppercase Letter	11260	13.9%
Dash Punctuation	5712	7.0%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	31790	49.4%
1	7229	11.2%
2	6992	10.9%
9	4274	6.6%
6	2995	4.7%
5	2970	4.6%
8	2182	3.4%
7	2166	3.4%
3	2029	3.2%
4	1691	2.6%

Uppercase Letter

Value	Count	Frequency (%)
A	4075	36.2%
R	2812	25.0%
I	2770	24.6%
B	1282	11.4%
W	105	0.9%
D	44	0.4%
E	44	0.4%
M	44	0.4%
O	42	0.4%
T	42	0.4%

Dash Punctuation

Value	Count	Frequency (%)
-	5712	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	70030	86.1%
Latin	11260	13.9%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	31790	45.4%
1	7229	10.3%
2	6992	10.0%
-	5712	8.2%
9	4274	6.1%
6	2995	4.3%
5	2970	4.2%
8	2182	3.1%
7	2166	3.1%
3	2029	2.9%

Latin

Value	Count	Frequency (%)
A	4075	36.2%
R	2812	25.0%
I	2770	24.6%
B	1282	11.4%
W	105	0.9%
D	44	0.4%
E	44	0.4%
M	44	0.4%
O	42	0.4%
T	42	0.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	81290	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	31790	39.1%
1	7229	8.9%
2	6992	8.6%
-	5712	7.0%
9	4274	5.3%
A	4075	5.0%
6	2995	3.7%
5	2970	3.7%
R	2812	3.5%
I	2770	3.4%
Other values (11)	9671	11.9%

사업지구코드
Text

Distinct	375
Distinct (%)	6.0%
Missing	55
Missing (%)	0.9%
Memory size	49.0 KiB

Length

Max length	11
Median length	11
Mean length	11
Min length	11

Characters and Unicode

Total characters	68233
Distinct characters	12
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	7
Unique (%)	0.1%

Sample

1st row	200610A0002
2nd row	200711A0016
3rd row	200711A0016
4th row	200711A0016
5th row	200711A0016

Value	Count	Frequency (%)
198500a0003	147	2.4%
198609a0004	134	2.2%
198900a0002	96	1.5%
198900a0008	88	1.4%
198700a0001	87	1.4%
198800a0003	83	1.3%
198700a0011	77	1.2%
200505a0001	71	1.1%
198800a0004	68	1.1%
198500a0001	67	1.1%
Other values (365)	5285	85.2%

Most occurring characters

Value	Count	Frequency (%)
0	30581	44.8%
1	8187	12.0%
9	7081	10.4%
A	6083	8.9%
2	4200	6.2%
8	2740	4.0%
5	2105	3.1%
7	1952	2.9%
3	1728	2.5%
4	1723	2.5%
Other values (2)	1853	2.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	61670	90.4%
Uppercase Letter	6083	8.9%
Space Separator	480	0.7%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	30581	49.6%
1	8187	13.3%
9	7081	11.5%
2	4200	6.8%
8	2740	4.4%
5	2105	3.4%
7	1952	3.2%
3	1728	2.8%
4	1723	2.8%
6	1373	2.2%

Uppercase Letter

Value	Count	Frequency (%)
A	6083	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	480	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	62150	91.1%
Latin	6083	8.9%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	30581	49.2%
1	8187	13.2%
9	7081	11.4%
2	4200	6.8%
8	2740	4.4%
5	2105	3.4%
7	1952	3.1%
3	1728	2.8%
4	1723	2.8%
6	1373	2.2%

Latin

Value	Count	Frequency (%)
A	6083	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	68233	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	30581	44.8%
1	8187	12.0%
9	7081	10.4%
A	6083	8.9%
2	4200	6.2%
8	2740	4.0%
5	2105	3.1%
7	1952	2.9%
3	1728	2.5%
4	1723	2.5%
Other values (2)	1853	2.7%

미디어자료일련번호
Real number (ℝ)

Distinct	289
Distinct (%)	4.6%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	22.274049

Minimum	1
Maximum	289
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	55.1 KiB

Quantile statistics

Minimum	1
5-th percentile	1
Q1	1
median	1
Q3	13
95-th percentile	139
Maximum	289
Range	288
Interquartile range (IQR)	12

Descriptive statistics

Standard deviation	51.276088
Coefficient of variation (CV)	2.3020551
Kurtosis	9.9493427
Mean	22.274049
Median Absolute Deviation (MAD)	0
Skewness	3.153189
Sum	139391
Variance	2629.2372
Monotonicity	Not monotonic
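
Several of the descriptive statistics are functions of the others, which gives a quick consistency check. The sketch below assumes the standard definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum); the variable names are illustrative.

```python
import math

# Values copied from the statistics tables for 미디어자료일련번호.
n = 6258
total = 139391
std = 51.276088
q1, q3 = 1, 13
minimum, maximum = 1, 289

mean = total / n             # reported mean: 22.274049
cv = std / mean              # reported CV: 2.3020551
variance = std ** 2          # reported variance: 2629.2372
iqr = q3 - q1                # reported IQR: 12
value_range = maximum - minimum  # reported range: 288

print(mean, cv, variance, iqr, value_range)
print(math.isclose(cv, 2.3020551, abs_tol=1e-5))
```

The large gap between the median (1) and the mean (about 22.3) is consistent with the strong right skew reported above.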

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
1	3645	58.2%
2	192	3.1%
3	132	2.1%
4	106	1.7%
5	100	1.6%
6	96	1.5%
7	82	1.3%
8	78	1.2%
9	70	1.1%
10	67	1.1%
Other values (279)	1690	27.0%

Minimum 10 values

Value	Count	Frequency (%)
1	3645	58.2%
2	192	3.1%
3	132	2.1%
4	106	1.7%
5	100	1.6%
6	96	1.5%
7	82	1.3%
8	78	1.2%
9	70	1.1%
10	67	1.1%

Maximum 10 values

Value	Count	Frequency (%)
289	2	< 0.1%
288	2	< 0.1%
287	2	< 0.1%
286	2	< 0.1%
285	2	< 0.1%
284	2	< 0.1%
283	2	< 0.1%
282	2	< 0.1%
281	2	< 0.1%
280	2	< 0.1%

비고
Text

MISSING

Distinct	136
Distinct (%)	44.3%
Missing	5951
Missing (%)	95.1%
Memory size	49.0 KiB

Length

Max length	19
Median length	11
Mean length	10.283388
Min length	3

Characters and Unicode

Total characters	3157
Distinct characters	113
Distinct categories	8
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	50
Unique (%)	16.3%

Sample

1st row	123
2nd row	198700A0001
3rd row	198700A0001
4th row	198700A0001
5th row	1987 서해안

Value	Count	Frequency (%)
1987	27	5.2%
서해안	27	5.2%
경기도	21	4.0%
1992	19	3.7%
경인지구	19	3.7%
경남	15	2.9%
신도시	9	1.7%
인천	6	1.2%
제주	6	1.2%
서울	6	1.2%
Other values (166)	364	70.1%

Most occurring characters

Value	Count	Frequency (%)
0	785	24.9%
1	252	8.0%
9	230	7.3%
(space)	213	6.7%
A	156	4.9%
2	151	4.8%
7	103	3.3%
8	103	3.3%
구	74	2.3%
4	69	2.2%
Other values (103)	1021	32.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	1804	57.1%
Other Letter	949	30.1%
Space Separator	213	6.7%
Uppercase Letter	160	5.1%
Lowercase Letter	21	0.7%
Open Punctuation	5	0.2%
Close Punctuation	3	0.1%
Math Symbol	2	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
구	74	7.8%
시	63	6.6%
경	58	6.1%
도	45	4.7%
해	41	4.3%
안	39	4.1%
서	39	4.1%
남	39	4.1%
지	32	3.4%
기	27	2.8%
Other values (77)	492	51.8%

Decimal Number

Value	Count	Frequency (%)
0	785	43.5%
1	252	14.0%
9	230	12.7%
2	151	8.4%
7	103	5.7%
8	103	5.7%
4	69	3.8%
3	38	2.1%
5	37	2.1%
6	36	2.0%

Lowercase Letter

Value	Count	Frequency (%)
l	4	19.0%
o	3	14.3%
u	2	9.5%
x	2	9.5%
i	2	9.5%
n	2	9.5%
a	2	9.5%
m	2	9.5%
s	2	9.5%

Uppercase Letter

Value	Count	Frequency (%)
A	156	97.5%
M	2	1.2%
W	2	1.2%

Space Separator

Value	Count	Frequency (%)
(space)	213	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	5	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	3	100.0%

Math Symbol

Value	Count	Frequency (%)
~	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	2027	64.2%
Hangul	949	30.1%
Latin	181	5.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
구	74	7.8%
시	63	6.6%
경	58	6.1%
도	45	4.7%
해	41	4.3%
안	39	4.1%
서	39	4.1%
남	39	4.1%
지	32	3.4%
기	27	2.8%
Other values (77)	492	51.8%

Common

Value	Count	Frequency (%)
0	785	38.7%
1	252	12.4%
9	230	11.3%
(space)	213	10.5%
2	151	7.4%
7	103	5.1%
8	103	5.1%
4	69	3.4%
3	38	1.9%
5	37	1.8%
Other values (4)	46	2.3%

Latin

Value	Count	Frequency (%)
A	156	86.2%
l	4	2.2%
o	3	1.7%
u	2	1.1%
x	2	1.1%
i	2	1.1%
n	2	1.1%
a	2	1.1%
m	2	1.1%
s	2	1.1%
Other values (2)	4	2.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2208	69.9%
Hangul	949	30.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	785	35.6%
1	252	11.4%
9	230	10.4%
(space)	213	9.6%
A	156	7.1%
2	151	6.8%
7	103	4.7%
8	103	4.7%
4	69	3.1%
3	38	1.7%
Other values (16)	108	4.9%

Hangul

Value	Count	Frequency (%)
구	74	7.8%
시	63	6.6%
경	58	6.1%
도	45	4.7%
해	41	4.3%
안	39	4.1%
서	39	4.1%
남	39	4.1%
지	32	3.4%
기	27	2.8%
Other values (77)	492	51.8%

Interactions

미디어자료일련번호 vs. 미디어자료일련번호 (plot not reproduced)

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
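
To make the idea of nullity correlation concrete, the sketch below computes a Pearson correlation between the missing-value indicators of two columns. This is an assumption about how such a heatmap is typically computed, not a statement of the report's own implementation; the function name `nullity_correlation` and the toy columns are hypothetical.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is never missing or always missing.
        return math.nan
    return cov / math.sqrt(var_a * var_b)


# Made-up toy columns, not rows from the dataset.
code = ["x", None, "y", None]
note = [None, "a", None, "b"]
complete = ["p", "q", "r", "s"]

print(nullity_correlation(code, code))      # missing together
print(nullity_correlation(code, note))      # missing in opposite rows
print(nullity_correlation(code, complete))  # undefined: no missing values
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and columns with no missing values (such as 미디어관리번호 here) have no defined nullity correlation.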

First rows

	미디어관리번호	사업지구코드	미디어자료일련번호	비고
0	2006100002023	200610A0002	1	<NA>
1	2007110016001	200711A0016	1	<NA>
2	2007110016002	200711A0016	1	<NA>
3	2007110016003	200711A0016	1	<NA>
4	2007110016004	200711A0016	1	<NA>
5	2007110016005	200711A0016	1	<NA>
6	2007110016006	200711A0016	1	<NA>
7	2007110016007	200711A0016	1	<NA>
8	2007110016008	200711A0016	1	<NA>
9	2007110016009	200711A0016	1	<NA>

Last rows

	미디어관리번호	사업지구코드	미디어자료일련번호	비고
6248	AIR-2005-012B	199300A0002	1	<NA>
6249	AIR-2005-012B	199300A0003	2	<NA>
6250	AIR-2005-012B	200209A0002	3	<NA>
6251	AIR-2005-013A	199600A0002	1	<NA>
6252	AIR-2005-013A	199700A0023	2	<NA>
6253	AIR-2005-014A	199100A0006	1	<NA>
6254	AIR-2005-014A	199600A0003	2	<NA>
6255	AIR-2005-014A	199600A0013	3	<NA>
6256	AIR-2005-015A	196800A0002	1	<NA>
6257	AIR-2005-015A	199400A0002	2	<NA>

Most frequently occurring

	미디어관리번호	사업지구코드	미디어자료일련번호	비고	# duplicates
0	1970000003001	197000A0003	1	<NA>	2
1	2007000017002	200700A0017	1	<NA>	2
2	2007000017003	200700A0017	1	<NA>	2
3	2007000017004	200700A0017	1	<NA>	2
4	2007000017005	200700A0017	1	<NA>	2
5	2007000017006	200700A0017	1	<NA>	2
6	2007000017007	200700A0017	1	<NA>	2
7	2007000017008	200700A0017	1	<NA>	2
8	2007000017009	200700A0017	1	<NA>	2
9	2007000017010	200700A0017	1	<NA>	2
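
The "# duplicates" column can be read as the number of times an identical row occurs. A minimal sketch of that count, using `collections.Counter` on toy rows shaped like the table above; the third row is invented for contrast, and the exact counting rule used by the profiler is an assumption here.

```python
from collections import Counter

# Toy rows: (미디어관리번호, 사업지구코드, 미디어자료일련번호, 비고).
rows = [
    ("1970000003001", "197000A0003", 1, None),
    ("1970000003001", "197000A0003", 1, None),
    ("2007000017002", "200700A0017", 1, None),  # invented single occurrence
]

# Count identical rows and keep only those occurring more than once.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicated.items():
    print(row, n)
```

Rows are compared on all four columns at once, so two records count as duplicates only when every field matches, including a missing 비고.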
