gimi9 Pandas Profiling

Dataset statistics

Number of variables	15
Number of observations	417
Missing cells	1593
Missing cells (%)	25.5%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	49.4 KiB
Average record size in memory	121.3 B
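The two derived figures above follow from the raw counts. The missing-cell share is taken over all 15 × 417 = 6,255 cells. The average record size is the total memory divided by the number of rows, assuming 1 KiB = 1,024 B. This minimal Python check uses only the numbers printed in the report:

```python
# Figures copied from the "Dataset statistics" table; nothing here reads the data.
N_VARIABLES = 15
N_OBSERVATIONS = 417
MISSING_CELLS = 1593
TOTAL_MEMORY_KIB = 49.4  # already rounded in the report


def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Percentage of all cells (variables x observations) that are missing."""
    return 100.0 * missing / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average in-memory size of one row in bytes, assuming 1 KiB = 1024 B."""
    return total_kib * 1024 / n_obs


print(f"Missing cells (%): {missing_cells_pct(MISSING_CELLS, N_VARIABLES, N_OBSERVATIONS):.1f}%")
print(f"Average record size: {avg_record_size_bytes(TOTAL_MEMORY_KIB, N_OBSERVATIONS):.1f} B")
```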

Variable types

Text	7
Categorical	5
Numeric	1
Boolean	1
DateTime	1

Dataset

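Description	Master code data used by the Overseas Independence Movement Historic Sites (국외독립운동사적지) website. It includes country codes, region codes for each country (in Korean, English, Japanese, and Chinese), and textbook classification codes.
Author	독립기념관 (Independence Hall of Korea)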
URL	https://www.data.go.kr/data/15122341/fileData.do

Alerts

`등록일` has constant value "2015-11-23"	Constant
`비고` is highly overall correlated with `코드1` and 3 other fields	High correlation
`코드1` is highly overall correlated with `상위코드` and 3 other fields	High correlation
`구분A` is highly overall correlated with `코드1` and 3 other fields	High correlation
`사용구분` is highly overall correlated with `코드1` and 3 other fields	High correlation
`상위코드` is highly overall correlated with `코드2` and 4 other fields	High correlation
`코드2` is highly overall correlated with `상위코드`	High correlation
`코드명_일본어` has 369 (88.5%) missing values	Missing
`코드명_중국어` has 384 (92.1%) missing values	Missing
`코드약어` has 177 (42.4%) missing values	Missing
`구분B` has 246 (59.0%) missing values	Missing
`수정일` has 414 (99.3%) missing values	Missing
`코드` has unique values	Unique
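Each Missing alert gives a column's missing count as a share of the 417 observations. The sketch below recomputes those shares from the counts in the report. It also adds the 3 missing values of 코드명_영어, which has no alert and is reported only in its own variable section. Together these account for all 1,593 missing cells.

```python
N_OBSERVATIONS = 417

# Missing-value counts copied from the report.
alerted = {
    "코드명_일본어": 369,
    "코드명_중국어": 384,
    "코드약어": 177,
    "구분B": 246,
    "수정일": 414,
}
not_alerted = {"코드명_영어": 3}  # from the variable section; no alert is raised


def missing_pct(count: int, n_obs: int = N_OBSERVATIONS) -> float:
    """Missing count as a percentage of all observations."""
    return 100.0 * count / n_obs


for column, count in alerted.items():
    print(f"{column}: {count} ({missing_pct(count):.1f}%)")

total_missing = sum(alerted.values()) + sum(not_alerted.values())
print("Total missing cells:", total_missing)
```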

Reproduction

Analysis started	2023-12-11 23:53:31.202641
Analysis finished	2023-12-11 23:53:32.648628
Duration	1.45 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
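The Duration row is the difference between the start and finish timestamps. That difference is about 1.446 s, which the report rounds to 1.45. This is a quick check with Python's standard datetime module:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2023-12-11 23:53:31.202641")
finished = datetime.fromisoformat("2023-12-11 23:53:32.648628")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```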

코드 (code)
Text

UNIQUE

Distinct	417
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Characters and Unicode

Total characters	2502
Distinct characters	19
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	417
Unique (%)	100.0%

Sample

1st row	ARA001
2nd row	ARA002
3rd row	ARA003
4th row	ARA004
5th row	ARA005

Words (lowercased)

Value	Count	Frequency (%)
ara001	1	0.2%
sta031	1	0.2%
sta107	1	0.2%
sta106	1	0.2%
sta105	1	0.2%
sta104	1	0.2%
sta103	1	0.2%
sta102	1	0.2%
sta101	1	0.2%
sta100	1	0.2%
Other values (407)	407	97.6%

Most occurring characters

Value	Count	Frequency (%)
A	577	23.1%
0	313	12.5%
1	260	10.4%
S	238	9.5%
T	238	9.5%
R	169	6.8%
2	132	5.3%
3	93	3.7%
4	82	3.3%
5	82	3.3%
Other values (9)	318	12.7%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	1251	50.0%
Decimal Number	1251	50.0%
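The category tables assign every character to its Unicode general category. The report does not show how ydata-profiling computes these. The sketch below reproduces the category counts with Python's standard unicodedata module, which exposes the general category but not the script or block properties. The input is two codes from this column. Only the category codes that occur in this report are mapped to long names; CATEGORY_NAMES is an illustrative helper, not part of any library. Applied to all 417 codes (3 letters and 3 digits each), the same counting gives 1,251 of each category, which matches the table.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general-category codes that appear in this report
# (a subset of the Unicode list; extend as needed).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["ARA001", "STA229"]))
```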

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	313	25.0%
1	260	20.8%
2	132	10.6%
3	93	7.4%
4	82	6.6%
5	82	6.6%
6	79	6.3%
7	71	5.7%
8	70	5.6%
9	69	5.5%

Uppercase Letter

Value	Count	Frequency (%)
A	577	46.1%
S	238	19.0%
T	238	19.0%
R	169	13.5%
D	13	1.0%
L	7	0.6%
G	3	0.2%
E	3	0.2%
U	3	0.2%

Most occurring scripts

Value	Count	Frequency (%)
Latin	1251	50.0%
Common	1251	50.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	313	25.0%
1	260	20.8%
2	132	10.6%
3	93	7.4%
4	82	6.6%
5	82	6.6%
6	79	6.3%
7	71	5.7%
8	70	5.6%
9	69	5.5%

Latin

Value	Count	Frequency (%)
A	577	46.1%
S	238	19.0%
T	238	19.0%
R	169	13.5%
D	13	1.0%
L	7	0.6%
G	3	0.2%
E	3	0.2%
U	3	0.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2502	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
A	577	23.1%
0	313	12.5%
1	260	10.4%
S	238	9.5%
T	238	9.5%
R	169	6.8%
2	132	5.3%
3	93	3.7%
4	82	3.3%
5	82	3.3%
Other values (9)	318	12.7%

코드1 (code 1)
Categorical

HIGH CORRELATION

Distinct	5
Distinct (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

STA	238
ARA	166
LAD	7
EDU	3
GRD	3

Length

Max length	3
Median length	3
Mean length	3
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	ARA
2nd row	ARA
3rd row	ARA
4th row	ARA
5th row	ARA

Common Values

Value	Count	Frequency (%)
STA	238	57.1%
ARA	166	39.8%
LAD	7	1.7%
EDU	3	0.7%
GRD	3	0.7%

Length

Histogram of lengths of the category

Words (lowercased)

Value	Count	Frequency (%)
sta	238	57.1%
ara	166	39.8%
lad	7	1.7%
edu	3	0.7%
grd	3	0.7%

코드2 (code 2)
Real number (ℝ)

HIGH CORRELATION

Distinct	239
Distinct (%)	57.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	101.51079

Minimum	0
Maximum	238
Zeros	1
Zeros (%)	0.2%
Negative	0
Negative (%)	0.0%
Memory size	3.8 KiB

Quantile statistics

Minimum	0
5th percentile	5
Q1	46
Median	98
Q3	150
95th percentile	217.2
Maximum	238
Range	238
Interquartile range (IQR)	104

Descriptive statistics

Standard deviation	65.113846
Coefficient of variation (CV)	0.64144753
Kurtosis	-0.93913102
Mean	101.51079
Median Absolute Deviation (MAD)	52
Skewness	0.24241985
Sum	42330
Variance	4239.813
Monotonicity	Not monotonic
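Several of the 코드2 statistics are simple functions of others, so a quick consistency check is possible. The relationships are: Mean = Sum / n, CV = standard deviation / mean, Variance = standard deviation squared, IQR = Q3 - Q1, and Range = Maximum - Minimum. This sketch uses only values printed above:

```python
# Values copied from the 코드2 statistics above.
n = 417
total = 42330
std_dev = 65.113846
q1, q3 = 46, 150
minimum, maximum = 0, 238

mean = total / n                 # Mean = Sum / n
cv = std_dev / mean              # coefficient of variation = SD / mean
variance = std_dev ** 2          # variance = SD squared
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"mean={mean:.5f} cv={cv:.8f} variance={variance:.3f}")
print(f"IQR={iqr} range={value_range}")
```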

Histogram with fixed size bins (bins=50)

Common Values

Value	Count	Frequency (%)
1	5	1.2%
3	5	1.2%
2	5	1.2%
4	3	0.7%
5	3	0.7%
6	3	0.7%
7	3	0.7%
116	2	0.5%
109	2	0.5%
110	2	0.5%
Other values (229)	384	92.1%

Minimum 10 values

Value	Count	Frequency (%)
0	1	0.2%
1	5	1.2%
2	5	1.2%
3	5	1.2%
4	3	0.7%
5	3	0.7%
6	3	0.7%
7	3	0.7%
8	2	0.5%
9	2	0.5%

Maximum 10 values

Value	Count	Frequency (%)
238	1	0.2%
237	1	0.2%
236	1	0.2%
235	1	0.2%
234	1	0.2%
233	1	0.2%
232	1	0.2%
231	1	0.2%
230	1	0.2%
229	1	0.2%

코드명 (code name)
Text

Distinct	393
Distinct (%)	94.2%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

Length

Max length	14
Median length	13
Mean length	4.2230216
Min length	1

Characters and Unicode

Total characters	1761
Distinct characters	293
Distinct categories	6
Distinct scripts	3
Distinct blocks	2

Unique

Unique	369
Unique (%)	88.5%

Sample

1st row	간쑤성
2nd row	광둥성
3rd row	광시좡족자치구
4th row	구이저우성
5th row	네이멍구자치구

Words (lowercased)

Value	Count	Frequency (%)
군도	8	1.7%
세인트	5	1.1%
제도	4	0.9%
프랑스	3	0.6%
미국령	3	0.6%
섬	3	0.6%
사모아	2	0.4%
버진아일랜드	2	0.4%
기니	2	0.4%
미얀마	2	0.4%
Other values (406)	433	92.7%

Most occurring characters

Value	Count	Frequency (%)
아	86	4.9%
이	64	3.6%
스	62	3.5%
리	51	2.9%
(space)	50	2.8%
현	43	2.4%
라	36	2.0%
도	35	2.0%
시	34	1.9%
나	32	1.8%
Other values (283)	1268	72.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1704	96.8%
Space Separator	50	2.8%
Close Punctuation	2	0.1%
Open Punctuation	2	0.1%
Uppercase Letter	2	0.1%
Other Punctuation	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	86	5.0%
이	64	3.8%
스	62	3.6%
리	51	3.0%
현	43	2.5%
라	36	2.1%
도	35	2.1%
시	34	2.0%
나	32	1.9%
니	31	1.8%
Other values (277)	1230	72.2%

Uppercase Letter

Value	Count	Frequency (%)
C	1	50.0%
D	1	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	50	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	2	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	2	100.0%

Other Punctuation

Value	Count	Frequency (%)
.	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1704	96.8%
Common	55	3.1%
Latin	2	0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	86	5.0%
이	64	3.8%
스	62	3.6%
리	51	3.0%
현	43	2.5%
라	36	2.1%
도	35	2.1%
시	34	2.0%
나	32	1.9%
니	31	1.8%
Other values (277)	1230	72.2%

Common

Value	Count	Frequency (%)
(space)	50	90.9%
)	2	3.6%
(	2	3.6%
.	1	1.8%

Latin

Value	Count	Frequency (%)
C	1	50.0%
D	1	50.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1704	96.8%
ASCII	57	3.2%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	86	5.0%
이	64	3.8%
스	62	3.6%
리	51	3.0%
현	43	2.5%
라	36	2.1%
도	35	2.1%
시	34	2.0%
나	32	1.9%
니	31	1.8%
Other values (277)	1230	72.2%

ASCII

Value	Count	Frequency (%)
(space)	50	87.7%
)	2	3.5%
(	2	3.5%
C	1	1.8%
.	1	1.8%
D	1	1.8%

코드명_영어 (code name, English)
Text

Distinct	411
Distinct (%)	99.3%
Missing	3
Missing (%)	0.7%
Memory size	3.4 KiB

Length

Max length	38
Median length	29
Mean length	9.7874396
Min length	3

Characters and Unicode

Total characters	4052
Distinct characters	59
Distinct categories	7
Distinct scripts	2
Distinct blocks	1

Unique

Unique	409
Unique (%)	98.8%

Sample

1st row	Gansusheng
2nd row	Guangdong
3rd row	GuangxiZhuangzu
4th row	Guizhou
5th row	Neimenggu

Words (lowercased)

Value	Count	Frequency (%)
islands	15	2.5%
republic	11	1.8%
and	10	1.7%
of	7	1.2%
united	6	1.0%
saint	4	0.7%
america	3	0.5%
the	3	0.5%
french	3	0.5%
arab	3	0.5%
Other values (490)	531	89.1%

Most occurring characters

Value	Count	Frequency (%)
A	391	9.6%
I	249	6.1%
N	239	5.9%
(space)	197	4.9%
a	195	4.8%
E	184	4.5%
R	158	3.9%
S	158	3.9%
O	140	3.5%
i	140	3.5%
Other values (49)	2001	49.4%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	2635	65.0%
Lowercase Letter	1143	28.2%
Space Separator	197	4.9%
Other Punctuation	65	1.6%
Open Punctuation	5	0.1%
Close Punctuation	5	0.1%
Dash Punctuation	2	< 0.1%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
A	391	14.8%
I	249	9.4%
N	239	9.1%
E	184	7.0%
R	158	6.0%
S	158	6.0%
O	140	5.3%
T	130	4.9%
L	129	4.9%
U	110	4.2%
Other values (16)	747	28.3%

Lowercase Letter

Value	Count	Frequency (%)
a	195	17.1%
i	140	12.2%
n	110	9.6%
o	87	7.6%
e	72	6.3%
s	60	5.2%
g	54	4.7%
h	54	4.7%
r	50	4.4%
t	43	3.8%
Other values (16)	278	24.3%

Other Punctuation

Value	Count	Frequency (%)
,	57	87.7%
.	5	7.7%
'	3	4.6%

Space Separator

Value	Count	Frequency (%)
(space)	197	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	5	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	5	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	3778	93.2%
Common	274	6.8%

Most frequent character per script

Latin

Value	Count	Frequency (%)
A	391	10.3%
I	249	6.6%
N	239	6.3%
a	195	5.2%
E	184	4.9%
R	158	4.2%
S	158	4.2%
O	140	3.7%
i	140	3.7%
T	130	3.4%
Other values (42)	1794	47.5%

Common

Value	Count	Frequency (%)
(space)	197	71.9%
,	57	20.8%
.	5	1.8%
(	5	1.8%
)	5	1.8%
'	3	1.1%
-	2	0.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	4052	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
A	391	9.6%
I	249	6.1%
N	239	5.9%
(space)	197	4.9%
a	195	4.8%
E	184	4.5%
R	158	3.9%
S	158	3.9%
O	140	3.5%
i	140	3.5%
Other values (49)	2001	49.4%

코드명_일본어 (code name, Japanese)
Text

MISSING

Distinct	48
Distinct (%)	100.0%
Missing	369
Missing (%)	88.5%
Memory size	3.4 KiB

Length

Max length	4
Median length	3
Mean length	3
Min length	1

Characters and Unicode

Total characters	144
Distinct characters	78
Distinct categories	2
Distinct scripts	3
Distinct blocks	4

Unique

Unique	48
Unique (%)	100.0%

Sample

1st row	香川현
2nd row	鹿兒島
3rd row	神奈川현
4th row	高知현
5th row	京都府

Words (lowercased)

Value	Count	Frequency (%)
香川현	1	2.1%
沖승현	1	2.1%
愛知현	1	2.1%
秋田현	1	2.1%
山形현	1	2.1%
山口현	1	2.1%
山梨현	1	2.1%
愛媛현	1	2.1%
大阪府	1	2.1%
大分縣	1	2.1%
Other values (37)	37	78.7%

Most occurring characters

Value	Count	Frequency (%)
현	41	28.5%
山	6	4.2%
島	5	3.5%
岡	3	2.1%
福	3	2.1%
川	3	2.1%
府	2	1.4%
大	2	1.4%
崎	2	1.4%
長	2	1.4%
Other values (68)	75	52.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	143	99.3%
Space Separator	1	0.7%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
현	41	28.7%
山	6	4.2%
島	5	3.5%
岡	3	2.1%
福	3	2.1%
川	3	2.1%
府	2	1.4%
大	2	1.4%
崎	2	1.4%
長	2	1.4%
Other values (67)	74	51.7%

Space Separator

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Han	97	67.4%
Hangul	46	31.9%
Common	1	0.7%

Most frequent character per script

Han

Value	Count	Frequency (%)
山	6	6.2%
島	5	5.2%
岡	3	3.1%
福	3	3.1%
川	3	3.1%
府	2	2.1%
大	2	2.1%
崎	2	2.1%
長	2	2.1%
宮	2	2.1%
Other values (61)	67	69.1%

Hangul

Value	Count	Frequency (%)
현	41	89.1%
승	1	2.2%
광	1	2.2%
청	1	2.2%
정	1	2.2%
회	1	2.2%

Common

Value	Count	Frequency (%)
(space)	1	100.0%

Most occurring blocks

Value	Count	Frequency (%)
CJK	95	66.0%
Hangul	46	31.9%
CJK Compat Ideographs	2	1.4%
ASCII	1	0.7%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
현	41	89.1%
승	1	2.2%
광	1	2.2%
청	1	2.2%
정	1	2.2%
회	1	2.2%

CJK

Value	Count	Frequency (%)
山	6	6.3%
島	5	5.3%
岡	3	3.2%
福	3	3.2%
川	3	3.2%
府	2	2.1%
大	2	2.1%
崎	2	2.1%
長	2	2.1%
宮	2	2.1%
Other values (59)	65	68.4%

CJK Compat Ideographs

Value	Count	Frequency (%)
奈	1	50.0%
良	1	50.0%

ASCII

Value	Count	Frequency (%)
(space)	1	100.0%
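Two observations about this column may be worth addressing before the data is used. First, 41 of its 46 Hangul characters are 현, which appears to stand in for the kanji 県 (prefecture) in names such as 香川현. Second, two characters fall in the CJK Compatibility Ideographs block. The sketch below assumes the goal is only to flag the former and to normalize the latter, using Python's standard library. The helper names are hypothetical.

```python
import unicodedata
from typing import List

HANGUL_FIRST, HANGUL_LAST = "\uac00", "\ud7a3"  # precomposed Hangul syllables


def hangul_chars(text: str) -> List[str]:
    """Return the precomposed Hangul syllables found in text."""
    return [ch for ch in text if HANGUL_FIRST <= ch <= HANGUL_LAST]


def unify_compat_ideographs(text: str) -> str:
    """NFC normalization maps most CJK Compatibility Ideographs to their
    unified equivalents (e.g. U+F900 -> U+8C48); text that is already in
    NFC, such as these Han and Hangul names, is returned unchanged."""
    return unicodedata.normalize("NFC", text)


for name in ["香川현", "京都府", "大分縣"]:
    print(name, hangul_chars(name))
```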

코드명_중국어 (code name, Chinese)
Text

MISSING

Distinct	33
Distinct (%)	100.0%
Missing	384
Missing (%)	92.1%
Memory size	3.4 KiB

Length

Max length	8
Median length	7
Mean length	3.6060606
Min length	2

Characters and Unicode

Total characters	119
Distinct characters	59
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

Unique

Unique	33
Unique (%)	100.0%

Sample

1st row	甘肅省
2nd row	廣東省
3rd row	廣西壯族自治區
4th row	貴州省
5th row	內蒙古自治區

Words (lowercased)

Value	Count	Frequency (%)
山東省	1	3.0%
江蘇省	1	3.0%
浙江省	1	3.0%
吉林省	1	3.0%
重慶	1	3.0%
靑海省	1	3.0%
天津	1	3.0%
西藏自治區	1	3.0%
海南省	1	3.0%
廣東省	1	3.0%
Other values (23)	23	69.7%

Most occurring characters

Value	Count	Frequency (%)
省	22	18.5%
(space)	7	5.9%
自	5	4.2%
治	5	4.2%
區	5	4.2%
西	5	4.2%
江	4	3.4%
南	4	3.4%
海	3	2.5%
北	3	2.5%
Other values (49)	56	47.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	112	94.1%
Space Separator	7	5.9%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
省	22	19.6%
自	5	4.5%
治	5	4.5%
區	5	4.5%
西	5	4.5%
江	4	3.6%
南	4	3.6%
海	3	2.7%
北	3	2.7%
山	2	1.8%
Other values (48)	54	48.2%

Space Separator

Value	Count	Frequency (%)
(space)	7	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Han	112	94.1%
Common	7	5.9%

Most frequent character per script

Han

Value	Count	Frequency (%)
省	22	19.6%
自	5	4.5%
治	5	4.5%
區	5	4.5%
西	5	4.5%
江	4	3.6%
南	4	3.6%
海	3	2.7%
北	3	2.7%
山	2	1.8%
Other values (48)	54	48.2%

Common

Value	Count	Frequency (%)
(space)	7	100.0%

Most occurring blocks

Value	Count	Frequency (%)
CJK	112	94.1%
ASCII	7	5.9%

Most frequent character per block

CJK

Value	Count	Frequency (%)
省	22	19.6%
自	5	4.5%
治	5	4.5%
區	5	4.5%
西	5	4.5%
江	4	3.6%
南	4	3.6%
海	3	2.7%
北	3	2.7%
山	2	1.8%
Other values (48)	54	48.2%

ASCII

Value	Count	Frequency (%)
(space)	7	100.0%

코드약어 (code abbreviation)
Text

MISSING

Distinct	238
Distinct (%)	99.2%
Missing	177
Missing (%)	42.4%
Memory size	3.4 KiB

Length

Max length	2
Median length	2
Mean length	2
Min length	2

Characters and Unicode

Total characters	480
Distinct characters	26
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	236
Unique (%)	98.3%
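In these summaries, Distinct counts the different non-missing values, and Unique counts the values that occur exactly once. For 코드약어, 240 values are present (417 - 177). Of these, 238 are distinct and 236 are unique. That is consistent with exactly two values occurring twice (236 + 2 × 2 = 240), and ES and MS are each counted twice in the table below. This small illustration uses a hypothetical sample:

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def distinct_and_unique(values: Iterable[Optional[str]]) -> Tuple[int, int]:
    """Distinct = number of different non-missing values;
    Unique = number of values that occur exactly once."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


sample = ["ES", "MS", "HS", "ES", "MS", "GH", None]  # None stands for a missing cell
print(distinct_and_unique(sample))  # (4, 2)
```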

Sample

1st row	ES
2nd row	MS
3rd row	HS
4th row	XX
5th row	GH

Words (lowercased)

Value	Count	Frequency (%)
es	2	0.8%
ms	2	0.8%
is	1	0.4%
iq	1	0.4%
hu	1	0.4%
ir	1	0.4%
au	1	0.4%
at	1	0.4%
hn	1	0.4%
jo	1	0.4%
Other values (228)	228	95.0%

Most occurring characters

Value	Count	Frequency (%)
M	37	7.7%
S	32	6.7%
G	29	6.0%
T	28	5.8%
A	27	5.6%
C	26	5.4%
N	24	5.0%
B	23	4.8%
E	20	4.2%
I	19	4.0%
Other values (16)	215	44.8%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	480	100.0%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
M	37	7.7%
S	32	6.7%
G	29	6.0%
T	28	5.8%
A	27	5.6%
C	26	5.4%
N	24	5.0%
B	23	4.8%
E	20	4.2%
I	19	4.0%
Other values (16)	215	44.8%

Most occurring scripts

Value	Count	Frequency (%)
Latin	480	100.0%

Most frequent character per script

Latin

Value	Count	Frequency (%)
M	37	7.7%
S	32	6.7%
G	29	6.0%
T	28	5.8%
A	27	5.6%
C	26	5.4%
N	24	5.0%
B	23	4.8%
E	20	4.2%
I	19	4.0%
Other values (16)	215	44.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	480	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
M	37	7.7%
S	32	6.7%
G	29	6.0%
T	28	5.8%
A	27	5.6%
C	26	5.4%
N	24	5.0%
B	23	4.8%
E	20	4.2%
I	19	4.0%
Other values (16)	215	44.8%

상위코드 (parent code)
Categorical

HIGH CORRELATION

Distinct	7
Distinct (%)	1.7%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

<NA>	251
america	53
japan	47
china	34
russia	14
Other values (2)	18

Length

Max length	7
Median length	4
Mean length	4.676259
Min length	4

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	china
2nd row	china
3rd row	china
4th row	china
5th row	china

Common Values

Value	Count	Frequency (%)
<NA>	251	60.2%
america	53	12.7%
japan	47	11.3%
china	34	8.2%
russia	14	3.4%
asia	11	2.6%
europe	7	1.7%

Length

Histogram of lengths of the category

Words (lowercased)

Value	Count	Frequency (%)
na	251	60.2%
america	53	12.7%
japan	47	11.3%
china	34	8.2%
russia	14	3.4%
asia	11	2.6%
europe	7	1.7%
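상위코드 reports Missing 0 even though 251 rows show <NA>. Its Mean length of 4.676 is reproduced only if those 251 entries are counted as 4-character strings: (251×4 + 53×7 + 47×5 + 34×5 + 14×6 + 11×4 + 7×6) / 417 = 1950 / 417 ≈ 4.676. This suggests the placeholder is stored as the literal text "<NA>" rather than as a real missing value, and the same may hold for the 3 <NA> entries in 구분A. If so, converting it before profiling would move those 251 rows (60.2%) into the missing count. The sketch below assumes the literal string "<NA>":

```python
from collections import Counter
from typing import List, Optional

PLACEHOLDER = "<NA>"  # assumption: the literal text used for "no parent code"


def to_missing(values: List[str], placeholder: str = PLACEHOLDER) -> List[Optional[str]]:
    """Replace the placeholder string with None (a real missing value)."""
    return [None if v == placeholder else v for v in values]


raw = ["china", "<NA>", "japan", "<NA>", "america"]  # hypothetical sample
cleaned = to_missing(raw)
n_missing = sum(v is None for v in cleaned)
print(n_missing, Counter(v for v in cleaned if v is not None))
```

With pandas, the equivalent would be something like `df["상위코드"].replace("<NA>", pd.NA)`.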

구분A (classification A)
Categorical

HIGH CORRELATION

Distinct	27
Distinct (%)	6.5%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

0	214
3	52
4	48
1	35
2	15
Other values (22)	53

Length

Max length	4
Median length	1
Mean length	1.0935252
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	1

Common Values

Value	Count	Frequency (%)
0	214	51.3%
3	52	12.5%
4	48	11.5%
1	35	8.4%
2	15	3.6%
	10	2.4%
<NA>	3	0.7%
6	2	0.5%
7	2	0.5%
8	2	0.5%
Other values (17)	34	8.2%

Length

Histogram of lengths of the category

Words (lowercased)

Value	Count	Frequency (%)
0	214	52.6%
3	52	12.8%
4	48	11.8%
1	35	8.6%
2	15	3.7%
na	3	0.7%
16	2	0.5%
5	2	0.5%
17	2	0.5%
24	2	0.5%
Other values (16)	32	7.9%

구분B (classification B)
Text

MISSING

Distinct	52
Distinct (%)	30.4%
Missing	246
Missing (%)	59.0%
Memory size	3.4 KiB

Length

Max length	2
Median length	2
Mean length	1.6432749
Min length	1

Characters and Unicode

Total characters	281
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

Unique

Unique	4
Unique (%)	2.3%

Sample

1st row	1
2nd row	2
3rd row	3
4th row	4
5th row	5

Words (lowercased)

Value	Count	Frequency (%)
1	24	14.5%
9	4	2.4%
4	4	2.4%
11	4	2.4%
5	4	2.4%
3	4	2.4%
13	4	2.4%
2	4	2.4%
14	4	2.4%
10	4	2.4%
Other values (41)	106	63.9%

Most occurring characters

Value	Count	Frequency (%)
1	72	25.6%
2	46	16.4%
3	41	14.6%
4	34	12.1%
5	16	5.7%
6	14	5.0%
7	14	5.0%
8	13	4.6%
0	13	4.6%
9	13	4.6%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	276	98.2%
Space Separator	5	1.8%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	72	26.1%
2	46	16.7%
3	41	14.9%
4	34	12.3%
5	16	5.8%
6	14	5.1%
7	14	5.1%
8	13	4.7%
0	13	4.7%
9	13	4.7%

Space Separator

Value	Count	Frequency (%)
(space)	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	281	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	72	25.6%
2	46	16.4%
3	41	14.6%
4	34	12.1%
5	16	5.7%
6	14	5.0%
7	14	5.0%
8	13	4.6%
0	13	4.6%
9	13	4.6%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	281	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	72	25.6%
2	46	16.4%
3	41	14.6%
4	34	12.1%
5	16	5.7%
6	14	5.0%
7	14	5.0%
8	13	4.6%
0	13	4.6%
9	13	4.6%

비고 (remarks)
Categorical

HIGH CORRELATION

Distinct	5
Distinct (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

국가별 코드	238
국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	166
대륙명	7
교과서 분류코드	3
등급번호 ( 사적지 고유번호 생성 시 등급 번호에 해당 )	3

Length

Max length	72
Median length	6
Mean length	32.42446
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시
2nd row	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시
3rd row	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시
4th row	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시
5th row	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시

Common Values

Value	Count	Frequency (%)
국가별 코드	238	57.1%
국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	166	39.8%
대륙명	7	1.7%
교과서 분류코드	3	0.7%
등급번호 ( 사적지 고유번호 생성 시 등급 번호에 해당 )	3	0.7%

Length

Histogram of lengths of the category

Words (lowercased)

Value	Count	Frequency (%)
국가별	404	14.2%
해당	335	11.8%
코드	238	8.4%
(empty string)	172	6.0%
같은	166	5.8%
표시	166	5.8%
지역	166	5.8%
경우	166	5.8%
목록을	166	5.8%
항목이	166	5.8%
Other values (14)	698	24.6%
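The lowercased word tables appear to come from a simple word count. Each value seems to be lowercased, split on whitespace, and stripped of leading and trailing punctuation. That would explain both the na entries (from <NA>) and the blank entry in this table: 172 = 166 standalone colons (from the 166 rows with the region-list remark) + 3 "(" and 3 ")" (from the 3 rows with the 등급번호 remark), each reduced to an empty string. The sketch below follows that procedure; it is an inference from the output, not ydata-profiling's documented behavior.

```python
import string
from collections import Counter
from typing import Iterable


def word_counts(values: Iterable[str]) -> Counter:
    """Lowercase each value, split on whitespace and strip leading/trailing
    punctuation from every token (tokens made only of punctuation become "")."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


sample = [
    "국가별 코드",
    "등급번호 ( 사적지 고유번호 생성 시 등급 번호에 해당 )",
    "<NA>",
]
print(word_counts(sample))
```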

사용구분 (in-use flag)
Boolean

HIGH CORRELATION

Distinct	2
Distinct (%)	0.5%
Missing	0
Missing (%)	0.0%
Memory size	549.0 B

False	215
True	202

Common Values

Value	Count	Frequency (%)
False	215	51.6%
True	202	48.4%
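The first and last sample rows show 사용구분 as Y or N, while this section reports it as a Boolean. The values therefore appear to have been mapped from Y to True and from N to False. The sketch below shows that mapping, which is an assumption; unexpected codes raise an error rather than being guessed. It also includes the share calculation behind the table.

```python
YES_NO = {"Y": True, "N": False}  # assumed encoding in the source file


def to_bool(codes):
    """Map Y/N codes to booleans; raises KeyError on anything else."""
    return [YES_NO[c] for c in codes]


def share_pct(count: int, total: int) -> float:
    """Count as a percentage of total."""
    return 100.0 * count / total


print(to_bool(["Y", "N", "N"]))
print(f"False: {share_pct(215, 417):.1f}%  True: {share_pct(202, 417):.1f}%")
```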

등록일 (registration date)
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

2015-11-23	417

Length

Max length	10
Median length	10
Mean length	10
Min length	10

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	2015-11-23
2nd row	2015-11-23
3rd row	2015-11-23
4th row	2015-11-23
5th row	2015-11-23

Common Values

Value	Count	Frequency (%)
2015-11-23	417	100.0%

Length

Histogram of lengths of the category

Words (lowercased)

Value	Count	Frequency (%)
2015-11-23	417	100.0%

수정일 (modification date)
Date

MISSING

Distinct	2
Distinct (%)	66.7%
Missing	414
Missing (%)	99.3%
Memory size	3.4 KiB

Minimum	2015-12-08 00:00:00
Maximum	2017-10-12 00:00:00

Histogram

Histogram with fixed size bins (bins=2)

Interactions

Scatter plot of 코드2 against 코드2, the only numeric variable (plot not reproduced)

Correlations

Correlation table (heatmap view not reproduced)

	코드1	코드2	코드명_일본어	코드명_중국어	상위코드	구분A	구분B	비고	사용구분	수정일
코드1	1.000	0.547	NaN	NaN	NaN	0.927	1.000	1.000	0.750	0.000
코드2	0.547	1.000	1.000	1.000	0.871	0.753	0.410	0.547	0.381	1.000
코드명_일본어	NaN	1.000	1.000	NaN	1.000	1.000	1.000	NaN	NaN	NaN
코드명_중국어	NaN	1.000	NaN	1.000	NaN	NaN	1.000	NaN	NaN	NaN
상위코드	NaN	0.871	1.000	NaN	1.000	1.000	0.000	NaN	NaN	NaN
구분A	0.927	0.753	1.000	NaN	1.000	1.000	0.000	0.927	1.000	0.000
구분B	1.000	0.410	1.000	1.000	0.000	0.000	1.000	1.000	0.000	NaN
비고	1.000	0.547	NaN	NaN	NaN	0.927	1.000	1.000	0.750	0.000
사용구분	0.750	0.381	NaN	NaN	NaN	1.000	0.000	0.750	1.000	0.000
수정일	0.000	1.000	NaN	NaN	NaN	0.000	NaN	0.000	0.000	1.000

Correlation table (heatmap view not reproduced)

	비고	코드1	구분A	사용구분	상위코드
비고	1.000	1.000	0.750	0.881	1.000
코드1	1.000	1.000	0.750	0.881	1.000
구분A	0.750	0.750	1.000	0.966	0.942
사용구분	0.881	0.881	0.966	1.000	1.000
상위코드	1.000	1.000	0.942	1.000	1.000

Correlation table (heatmap view not reproduced)

	코드2	코드1	상위코드	구분A	비고	사용구분
코드2	1.000	0.256	0.742	0.382	0.256	0.290
코드1	0.256	1.000	1.000	0.750	1.000	0.881
상위코드	0.742	1.000	1.000	0.942	1.000	1.000
구분A	0.382	0.750	0.942	1.000	0.750	0.966
비고	0.256	1.000	1.000	0.750	1.000	0.881
사용구분	0.290	0.881	1.000	0.966	0.881	1.000
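The report does not name the coefficient behind each correlation table. For pairs of categorical variables, a common choice is Cramér's V, which ranges from 0 (no association) to 1 (each category of one variable determines the category of the other). The value of 1.000 between 코드1 and 비고 is consistent with a one-to-one mapping, and the two columns do have identical category counts (238, 166, 7, 3, 3). The sketch below computes the plain, uncorrected statistic. The report's figures may differ if ydata-profiling applies a bias correction or uses a different method.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plain (uncorrected) Cramér's V between two categorical sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    pair_counts = Counter(zip(x, y))
    x_counts, y_counts = Counter(x), Counter(y)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, n_a in x_counts.items():
        for b, n_b in y_counts.items():
            expected = n_a * n_b / n
            observed = pair_counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        return float("nan")  # a constant column: association is undefined
    return math.sqrt(chi2 / (n * k))


# Hypothetical rows mirroring the 코드1 / 비고 pairing (remarks shortened).
code1 = ["STA", "STA", "ARA", "LAD"]
remark = ["국가별 코드", "국가별 코드", "국가별 지역목록", "대륙명"]
print(cramers_v(code1, remark))  # 1.0
```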

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.

Heatmap: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
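Nullity correlation, as used for such heatmaps (for example by the missingno library), is the correlation between columns' is-missing indicators. It is +1 when two columns are always missing together and -1 when exactly one of them is missing in every row. This self-contained sketch uses hypothetical columns:

```python
import math
from typing import Optional, Sequence


def nullity_correlation(a: Sequence[Optional[object]], b: Sequence[Optional[object]]) -> float:
    """Pearson correlation between the is-missing indicators of two columns
    (None marks a missing cell)."""
    if len(a) != len(b) or not a:
        raise ValueError("columns must be non-empty and of equal length")
    xa = [1.0 if v is None else 0.0 for v in a]
    xb = [1.0 if v is None else 0.0 for v in b]
    n = len(xa)
    mean_a, mean_b = sum(xa) / n, sum(xb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(xa, xb))
    var_a = sum((p - mean_a) ** 2 for p in xa)
    var_b = sum((q - mean_b) ** 2 for q in xb)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a column that is never (or always) missing
    return cov / math.sqrt(var_a * var_b)


col_a = [None, "x", None, "y"]  # hypothetical columns
col_b = [None, "p", None, "q"]
col_c = ["u", None, "v", None]
print(nullity_correlation(col_a, col_b))  # 1.0
print(nullity_correlation(col_a, col_c))  # -1.0
```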

Sample

First rows and last rows (in that order)

	코드	코드1	코드2	코드명	코드명_영어	코드명_일본어	코드명_중국어	코드약어	상위코드	구분A	구분B	비고	사용구분	등록일	수정일
0	ARA001	ARA	1	간쑤성	Gansusheng	<NA>	甘肅省	<NA>	china	1	1	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
1	ARA002	ARA	2	광둥성	Guangdong	<NA>	廣東省	<NA>	china	1	2	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
2	ARA003	ARA	3	광시좡족자치구	GuangxiZhuangzu	<NA>	廣西壯族自治區	<NA>	china	1	3	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
3	ARA004	ARA	4	구이저우성	Guizhou	<NA>	貴州省	<NA>	china	1	4	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
4	ARA005	ARA	5	네이멍구자치구	Neimenggu	<NA>	內蒙古自治區	<NA>	china	1	5	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
5	ARA006	ARA	6	닝샤후이족자치구	Ningxia	<NA>	寧夏回族自治區	<NA>	china	1	6	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
6	ARA007	ARA	7	랴오닝성	Liaoning	<NA>	遼寧省	<NA>	china	1	7	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
7	ARA008	ARA	8	마카오	Macau	<NA>	澳門	<NA>	china	1	8	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
8	ARA009	ARA	9	베이징	Beijing	<NA>	北京	<NA>	china	1	9	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>
9	ARA010	ARA	10	산둥성	Shandong	<NA>	山東省	<NA>	china	1	10	국가별 지역목록입니다보조항목1 : 코드관리의 국가(카테고리:STA)에서 보조항목1과 해당 항목이 같은 경우 해당 지역 목록을 표시	Y	2015-11-23	<NA>

	코드	코드1	코드2	코드명	코드명_영어	코드명_일본어	코드명_중국어	코드약어	상위코드	구분A	구분B	비고	사용구분	등록일	수정일
407	STA229	STA	229	프랑스 남부 지역	FRENCH SOUTHERN TERRITORIES	<NA>	<NA>	TF	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
408	STA230	STA	230	프랑스령 기아나	FRENCH GUIANA	<NA>	<NA>	GF	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
409	STA231	STA	231	프랑스령 폴리네시아	FRENCH POLYNESIA	<NA>	<NA>	PF	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
410	STA232	STA	232	피지	FIJI	<NA>	<NA>	FJ	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
411	STA233	STA	233	핀란드	FINLAND	<NA>	<NA>	FI	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
412	STA234	STA	234	필리핀	PHILIPPINES	<NA>	<NA>	PH	<NA>	12	<NA>	국가별 코드	Y	2015-11-23	<NA>
413	STA235	STA	235	핏케언 군도	PITCAIRN	<NA>	<NA>	PN	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
414	STA236	STA	236	허드 섬 및 맥도날드 군도	HEARD AND MC DONALD ISLANDS	<NA>	<NA>	HM	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
415	STA237	STA	237	헝가리	HUNGARY	<NA>	<NA>	HU	<NA>	0	<NA>	국가별 코드	N	2015-11-23	<NA>
416	STA238	STA	238	홍콩	HONG KONG	<NA>	<NA>	HK	<NA>	0	<NA>	국가별 코드	N	2015-11-23	2015-12-08
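In every row shown in the sample, 코드 is 코드1 followed by 코드2 zero-padded to three digits: ARA001 is ARA + 1, and STA229 is STA + 229. This is consistent with 코드 always having length 6 and 코드1 always having length 3. It is an observation from the sample, not something the dataset documents. The hypothetical helpers below check the rule on the sampled rows and could be run over the full file to confirm it.

```python
import re
from typing import Tuple

CODE_PATTERN = re.compile(r"([A-Z]{3})([0-9]{3})")


def split_code(code: str) -> Tuple[str, int]:
    """Split e.g. "ARA001" into ("ARA", 1)."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"unexpected code format: {code!r}")
    return match.group(1), int(match.group(2))


def build_code(prefix: str, number: int) -> str:
    """Inverse of split_code: ("STA", 229) -> "STA229"."""
    return f"{prefix}{number:03d}"


# (코드, 코드1, 코드2) triples taken from the first and last rows above.
rows = [("ARA001", "ARA", 1), ("ARA010", "ARA", 10), ("STA229", "STA", 229), ("STA238", "STA", 238)]
for code, code1, code2 in rows:
    assert split_code(code) == (code1, code2)
    assert build_code(code1, code2) == code
print("all sampled rows follow the pattern")
```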
