gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	1166
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	3
Duplicate rows (%)	0.3%
Total size in memory	37.7 KiB
Average record size in memory	33.1 B
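
The derived figures in this table follow from the raw counts. A minimal sketch in Python, using only numbers reported above; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" table.
n_observations = 1166
n_duplicate_rows = 3
total_size_kib = 37.7

# Average record size: total bytes divided by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Share of duplicate rows, as a percentage of all rows.
duplicate_pct = 100 * n_duplicate_rows / n_observations

print(f"{avg_record_bytes:.1f} B")  # 33.1 B
print(f"{duplicate_pct:.1f}%")      # 0.3%
```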

Variable types

Numeric	1
Text	3

Dataset

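Description	Lists the overseas resource development survey projects carried out by the Korea Mine Rehabilitation and Mineral Resources Corporation (한국광해광업공단) since 1978, organized by year, country, mineral type, and mine name.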
URL	https://www.data.go.kr/data/15025211/fileData.do

Alerts

Dataset has 3 (0.3%) duplicate rows

Duplicates

Reproduction

Analysis started	2023-12-12 04:52:33.432182
Analysis finished	2023-12-12 04:52:34.386715
Duration	0.95 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
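
The reported duration can be recovered from the two timestamps. A short sketch, assuming both timestamps are in the same time zone (the report does not state one):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 04:52:33.432182", FMT)
finished = datetime.strptime("2023-12-12 04:52:34.386715", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 0.95 seconds
```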

년도 (year)
Real number (ℝ)

Distinct	45
Distinct (%)	3.9%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	2001.7856

Minimum	1978
Maximum	2022
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	10.4 KiB

Quantile statistics

Minimum	1978
5-th percentile	1978
Q1	1994
median	2005
Q3	2011
95-th percentile	2019
Maximum	2022
Range	44
Interquartile range (IQR)	17

Descriptive statistics

Standard deviation	12.335872
Coefficient of variation (CV)	0.0061624341
Kurtosis	-0.74158811
Mean	2001.7856
Median Absolute Deviation (MAD)	8
Skewness	-0.57391927
Sum	2334082
Variance	152.17373
Monotonicity	Increasing
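
Several of the quantile and descriptive statistics above are simple functions of one another. The sketch below rechecks them from the reported values only; the tolerances are an assumption chosen to absorb the rounding in the report:

```python
import math

# Values copied from the tables above.
n = 1166
total = 2334082          # Sum
std = 12.335872          # Standard deviation
minimum, maximum = 1978, 2022
q1, q3 = 1994, 2011

mean = total / n                 # arithmetic mean
variance = std ** 2              # variance is the squared standard deviation
cv = std / mean                  # coefficient of variation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

assert math.isclose(mean, 2001.7856, abs_tol=1e-4)
assert math.isclose(variance, 152.17373, abs_tol=1e-4)
assert math.isclose(cv, 0.0061624341, abs_tol=1e-7)
assert value_range == 44 and iqr == 17
```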

[Figure: histogram with fixed size bins (bins=45)]

Value	Count	Frequency (%)
2011	70	6.0%
1978	65	5.6%
2012	61	5.2%
2010	59	5.1%
2009	57	4.9%
2013	46	3.9%
2007	46	3.9%
2008	43	3.7%
1979	36	3.1%
2006	32	2.7%
Other values (35)	651	55.8%

Minimum 10 values

Value	Count	Frequency (%)
1978	65	5.6%
1979	36	3.1%
1980	25	2.1%
1981	22	1.9%
1982	13	1.1%
1983	6	0.5%
1984	5	0.4%
1985	9	0.8%
1986	11	0.9%
1987	9	0.8%

Maximum 10 values

Value	Count	Frequency (%)
2022	11	0.9%
2021	23	2.0%
2020	14	1.2%
2019	14	1.2%
2018	15	1.3%
2017	14	1.2%
2016	15	1.3%
2015	23	2.0%
2014	30	2.6%
2013	46	3.9%

국가명 (country name)
Text

Distinct	79
Distinct (%)	6.8%
Missing	0
Missing (%)	0.0%
Memory size	9.2 KiB

Length

Max length	8
Median length	7
Mean length	3.3567753
Min length	2

Characters and Unicode

Total characters	3914
Distinct characters	114
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
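
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below applies it to the five sample values this report lists for 국가명; it is not the profiler's own implementation:

```python
import unicodedata
from collections import Counter

# The five sample values of 국가명 shown in this report.
samples = ["페루", "볼리비아", "볼리비아", "볼리비아", "볼리비아"]

# Two-letter general category codes: "Lo" = Other Letter, "Zs" = Space Separator.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
print(category_counts)            # Counter({'Lo': 18})
print(unicodedata.category(" "))  # Zs
```

The standard library does not expose script or block properties, so reproducing those tables would need a third-party package.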

Unique

Unique	20
Unique (%)	1.7%

Sample

1st row	페루
2nd row	볼리비아
3rd row	볼리비아
4th row	볼리비아
5th row	볼리비아

Value	Count	Frequency (%)
인도네시아	196	16.8%
호주	124	10.6%
몽골	93	8.0%
중국	84	7.2%
필리핀	71	6.1%
캐나다	54	4.6%
페루	48	4.1%
미국	41	3.5%
볼리비아	39	3.3%
카자흐스탄	34	2.9%
Other values (70)	383	32.8%

Most occurring characters

Value	Count	Frequency (%)
아	372	9.5%
시	244	6.2%
도	205	5.2%
네	202	5.2%
인	201	5.1%
국	147	3.8%
리	137	3.5%
주	131	3.3%
호	124	3.2%
스	109	2.8%
Other values (104)	2042	52.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	3913	> 99.9%
Space Separator	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	372	9.5%
시	244	6.2%
도	205	5.2%
네	202	5.2%
인	201	5.1%
국	147	3.8%
리	137	3.5%
주	131	3.3%
호	124	3.2%
스	109	2.8%
Other values (103)	2041	52.2%

Space Separator

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	3913	> 99.9%
Common	1	< 0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	372	9.5%
시	244	6.2%
도	205	5.2%
네	202	5.2%
인	201	5.1%
국	147	3.8%
리	137	3.5%
주	131	3.3%
호	124	3.2%
스	109	2.8%
Other values (103)	2041	52.2%

Common

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	3913	> 99.9%
ASCII	1	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	372	9.5%
시	244	6.2%
도	205	5.2%
네	202	5.2%
인	201	5.1%
국	147	3.8%
리	137	3.5%
주	131	3.3%
호	124	3.2%
스	109	2.8%
Other values (103)	2041	52.2%

ASCII

Value	Count	Frequency (%)
(space)	1	100.0%

광산이름 (mine name)
Text

Distinct	1124
Distinct (%)	96.4%
Missing	0
Missing (%)	0.0%
Memory size	9.2 KiB

Length

Max length	17
Median length	15
Mean length	3.8301887
Min length	1

Characters and Unicode

Total characters	4466
Distinct characters	600
Distinct categories	10
Distinct scripts	4
Distinct blocks	4

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1085
Unique (%)	93.1%

Sample

1st row	토로모쵸
2nd row	야루이꼬야
3rd row	출츄카니
4th row	빠드꼬요
5th row	에스페란쟈

Value	Count	Frequency (%)
탐보	3	0.3%
잠발레스	3	0.3%
tbs	3	0.3%
kwb	3	0.3%
카라토르	2	0.2%
심바이	2	0.2%
사라	2	0.2%
지캄보	2	0.2%
존시트톨고이	2	0.2%
오프레이	2	0.2%
Other values (1129)	1160	98.0%

Most occurring characters

Value	Count	Frequency (%)
스	174	3.9%
라	141	3.2%
이	128	2.9%
리	100	2.2%
르	91	2.0%
아	78	1.7%
마	77	1.7%
로	75	1.7%
바	70	1.6%
트	67	1.5%
Other values (590)	3465	77.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	4040	90.5%
Uppercase Letter	259	5.8%
Lowercase Letter	88	2.0%
Space Separator	19	0.4%
Open Punctuation	16	0.4%
Close Punctuation	16	0.4%
Decimal Number	13	0.3%
Dash Punctuation	12	0.3%
Other Punctuation	2	< 0.1%
Letter Number	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
스	174	4.3%
라	141	3.5%
이	128	3.2%
리	100	2.5%
르	91	2.3%
아	78	1.9%
마	77	1.9%
로	75	1.9%
바	70	1.7%
트	67	1.7%
Other values (535)	3039	75.2%

Uppercase Letter

Value	Count	Frequency (%)
B	32	12.4%
S	29	11.2%
M	26	10.0%
P	19	7.3%
K	19	7.3%
C	17	6.6%
T	15	5.8%
R	14	5.4%
A	11	4.2%
I	10	3.9%
Other values (15)	67	25.9%

Lowercase Letter

Value	Count	Frequency (%)
a	16	18.2%
e	11	12.5%
r	7	8.0%
n	5	5.7%
u	5	5.7%
o	5	5.7%
m	5	5.7%
i	4	4.5%
t	4	4.5%
s	4	4.5%
Other values (11)	22	25.0%

Decimal Number

Value	Count	Frequency (%)
1	7	53.8%
2	5	38.5%
3	1	7.7%

Space Separator

Value	Count	Frequency (%)
(space)	19	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	16	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	16	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	12	100.0%

Other Punctuation

Value	Count	Frequency (%)
/	2	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅱ	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	4038	90.4%
Latin	348	7.8%
Common	78	1.7%
Han	2	< 0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
스	174	4.3%
라	141	3.5%
이	128	3.2%
리	100	2.5%
르	91	2.3%
아	78	1.9%
마	77	1.9%
로	75	1.9%
바	70	1.7%
트	67	1.7%
Other values (534)	3037	75.2%

Latin

Value	Count	Frequency (%)
B	32	9.2%
S	29	8.3%
M	26	7.5%
P	19	5.5%
K	19	5.5%
C	17	4.9%
a	16	4.6%
T	15	4.3%
R	14	4.0%
A	11	3.2%
Other values (37)	150	43.1%

Common

Value	Count	Frequency (%)
(space)	19	24.4%
(	16	20.5%
)	16	20.5%
-	12	15.4%
1	7	9.0%
2	5	6.4%
/	2	2.6%
3	1	1.3%

Han

Value	Count	Frequency (%)
社	2	100.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	4038	90.4%
ASCII	425	9.5%
CJK	2	< 0.1%
Number Forms	1	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
스	174	4.3%
라	141	3.5%
이	128	3.2%
리	100	2.5%
르	91	2.3%
아	78	1.9%
마	77	1.9%
로	75	1.9%
바	70	1.7%
트	67	1.7%
Other values (534)	3037	75.2%

ASCII

Value	Count	Frequency (%)
B	32	7.5%
S	29	6.8%
M	26	6.1%
P	19	4.5%
K	19	4.5%
(space)	19	4.5%
C	17	4.0%
a	16	3.8%
(	16	3.8%
)	16	3.8%
Other values (44)	216	50.8%

CJK

Value	Count	Frequency (%)
社	2	100.0%

Number Forms

Value	Count	Frequency (%)
Ⅱ	1	100.0%

광종 (mineral type)
Text

Distinct	65
Distinct (%)	5.6%
Missing	0
Missing (%)	0.0%
Memory size	9.2 KiB

Length

Max length	9
Median length	8
Mean length	2.1492281
Min length	1

Characters and Unicode

Total characters	2506
Distinct characters	84
Distinct categories	5
Distinct scripts	2
Distinct blocks	2
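
The length figures for the three text columns can be cross-checked against the total character counts. This is a sketch using only values reported above; the median check relies on the assumption that every row is at least as long as the reported minimum length:

```python
n_rows = 1166  # Number of observations

# column: (total characters, mean length, median length, min length), as reported
reported = {
    "국가명": (3914, 3.3567753, 7, 2),
    "광산이름": (4466, 3.8301887, 15, 1),
    "광종": (2506, 2.1492281, 8, 1),
}

checks = {}
for column, (total_chars, mean_len, median_len, min_len) in reported.items():
    # Mean length should equal total characters / number of rows.
    mean_matches = abs(total_chars / n_rows - mean_len) < 1e-6
    # With an even row count, a median of m forces the upper half of the
    # sorted lengths to be >= m, which sets a floor on the total.
    upper_half = n_rows // 2
    floor_total = upper_half * median_len + (n_rows - upper_half) * min_len
    checks[column] = (mean_matches, total_chars >= floor_total)

print(checks)
```

Under these assumptions each mean length matches its character total, while each reported median length implies more characters than the total allows. The median rows may therefore have been computed on a different basis than the means; the report does not say.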

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	26
Unique (%)	2.2%

Sample

1st row	동
2nd row	우라늄
3rd row	우라늄
4th row	우라늄
5th row	우라늄

Value	Count	Frequency (%)
유연탄	311	26.5%
동	274	23.3%
금	107	9.1%
우라늄	63	5.4%
아연	50	4.3%
철	45	3.8%
무연탄	28	2.4%
니켈	28	2.4%
연	22	1.9%
주석	21	1.8%
Other values (51)	225	19.2%

Most occurring characters

Value	Count	Frequency (%)
연	421	16.8%
탄	356	14.2%
유	315	12.6%
동	279	11.1%
금	126	5.0%
늄	73	2.9%
우	63	2.5%
라	63	2.5%
석	62	2.5%
아	53	2.1%
Other values (74)	695	27.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	2477	98.8%
Other Punctuation	15	0.6%
Space Separator	8	0.3%
Open Punctuation	3	0.1%
Close Punctuation	3	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
연	421	17.0%
탄	356	14.4%
유	315	12.7%
동	279	11.3%
금	126	5.1%
늄	73	2.9%
우	63	2.5%
라	63	2.5%
석	62	2.5%
아	53	2.1%
Other values (70)	666	26.9%

Other Punctuation

Value	Count	Frequency (%)
,	15	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	8	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	3	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	2477	98.8%
Common	29	1.2%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
연	421	17.0%
탄	356	14.4%
유	315	12.7%
동	279	11.3%
금	126	5.1%
늄	73	2.9%
우	63	2.5%
라	63	2.5%
석	62	2.5%
아	53	2.1%
Other values (70)	666	26.9%

Common

Value	Count	Frequency (%)
,	15	51.7%
(space)	8	27.6%
(	3	10.3%
)	3	10.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	2477	98.8%
ASCII	29	1.2%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
연	421	17.0%
탄	356	14.4%
유	315	12.7%
동	279	11.3%
금	126	5.1%
늄	73	2.9%
우	63	2.5%
라	63	2.5%
석	62	2.5%
아	53	2.1%
Other values (70)	666	26.9%

ASCII

Value	Count	Frequency (%)
,	15	51.7%
(space)	8	27.6%
(	3	10.3%
)	3	10.3%

[Figure: interaction plot of 년도 against 년도]

Phik (φk)

	년도	국가명	광종
년도	1.000	0.692	0.614
국가명	0.692	1.000	0.829
광종	0.614	0.829	1.000

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

First rows

	년도	국가명	광산이름	광종
0	1978	페루	토로모쵸	동
1	1978	볼리비아	야루이꼬야	우라늄
2	1978	볼리비아	출츄카니	우라늄
3	1978	볼리비아	빠드꼬요	우라늄
4	1978	볼리비아	에스페란쟈	우라늄
5	1978	볼리비아	룬라야	우라늄
6	1978	미국	쓰리스타	무연탄
7	1978	페루	띤따야	동
8	1978	태국	푸비엥	우라늄
9	1978	콜롬비아	마카레나	우라늄

Last rows

	년도	국가명	광산이름	광종
1156	2022	인도네시아	PUP	니켈
1157	2022	인도네시아	모라모	석회석
1158	2022	몽골	하단하르	몰리브덴
1159	2022	몽골	소곳	동
1160	2022	인도네시아	MDK	니켈
1161	2022	인도네시아	TBS	유연탄
1162	2022	몽골	하르골	유연탄
1163	2022	카자흐스탄	쿠르다이	동
1164	2022	몽골	운드르차간	텅스텐
1165	2022	몽골	노썬보르츠	동
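
Value-count tables of the kind used throughout this report can be sketched with `collections.Counter`. The example below counts the 광종 values of the ten first rows listed above. It counts whole cell values, which is an assumption: the profiler may tokenize text columns differently.

```python
from collections import Counter

# 광종 values of the ten first rows (index 0 to 9).
first_rows = ["동", "우라늄", "우라늄", "우라늄", "우라늄",
              "우라늄", "무연탄", "동", "우라늄", "우라늄"]

counts = Counter(first_rows)

# Print value, count, and share of rows, most frequent first.
for value, count in counts.most_common():
    print(f"{value}\t{count}\t{100 * count / len(first_rows):.1f}%")
```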

Most frequently occurring

	년도	국가명	광산이름	광종	# duplicates
0	1986	호주	글레니스크릭	유연탄	2
1	2018	몽골	타카라진	금	2
2	2022	몽골	노썬보르츠	동	2
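
Duplicate rows like these can be found by counting identical row tuples. A minimal sketch: the row data is taken from this report, but the list is a hand-built illustration in which the duplicated row is entered twice to match its reported count.

```python
from collections import Counter

# Rows as (년도, 국가명, 광산이름, 광종) tuples.
rows = [
    (2022, "몽골", "소곳", "동"),
    (2022, "몽골", "노썬보르츠", "동"),
    (2022, "카자흐스탄", "쿠르다이", "동"),
    (2022, "몽골", "노썬보르츠", "동"),
]

# Keep only rows that occur more than once, with their occurrence count.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}
print(duplicates)  # {(2022, '몽골', '노썬보르츠', '동'): 2}
```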
