gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	527
Missing cells	387
Missing cells (%)	18.4%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	17.1 KiB
Average record size in memory	33.3 B

Variable types

Numeric	1
Text	3

Dataset

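Description	Code data for the Allimi (알리미) notification system that Nam-gu, Daegu Metropolitan City uses to send its own text messages. Provides fields such as main code (주코드), sub code (부코드), code name (코드명), and code description (코드설명).
Author	Nam-gu, Daegu Metropolitan City (대구광역시 남구)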
URL	https://www.data.go.kr/data/15089399/fileData.do

Alerts

`부코드` has 37 (7.0%) missing values	Missing
`코드설명` has 350 (66.4%) missing values	Missing
`주코드` has 39 (7.4%) zeros	Zeros
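
The alert percentages follow from the counts reported here. A minimal Python sketch, with the counts copied from this report and all variable names chosen only for illustration:

```python
# Counts copied from the report; the names are illustrative, not part of the tool.
n_rows = 527
n_cols = 4
missing = {"부코드": 37, "코드설명": 350}
zeros = {"주코드": 39}

def pct(count, total):
    """Share of total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Per-column alerts are relative to the number of rows.
missing_pct = {col: pct(n, n_rows) for col, n in missing.items()}
zeros_pct = {col: pct(n, n_rows) for col, n in zeros.items()}

# The dataset-level figure is relative to every cell (rows x columns).
missing_cells = sum(missing.values())
missing_cells_pct = pct(missing_cells, n_rows * n_cols)

print(missing_pct, zeros_pct, missing_cells, missing_cells_pct)
```

The two column-level missing counts add up to the dataset-level "Missing cells" figure, which is divided by 2108 cells rather than 527 rows.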

Reproduction

Analysis started	2023-12-12 10:26:48.179876
Analysis finished	2023-12-12 10:26:48.770126
Duration	0.59 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

주코드
Real number (ℝ)

ZEROS

Distinct	36
Distinct (%)	6.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	53.98482

Minimum	0
Maximum	210
Zeros	39
Zeros (%)	7.4%
Negative	0
Negative (%)	0.0%
Memory size	4.8 KiB

Quantile statistics

Minimum	0
5-th percentile	0
Q1	14
median	24
Q3	102
95-th percentile	107
Maximum	210
Range	210
Interquartile range (IQR)	88
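
Range and IQR are simple differences of the quantiles listed above. The sketch below recomputes them and, as an extra that this report does not show, applies the conventional 1.5 × IQR upper fence to the maximum; the fence rule is an assumption added for illustration.

```python
# Quantiles copied from the table for 주코드.
minimum, q1, median, q3, maximum = 0, 14, 24, 102, 210

value_range = maximum - minimum  # spread of the whole column
iqr = q3 - q1                    # spread of the middle 50% of values

# Conventional Tukey upper fence (not part of the report itself).
upper_fence = q3 + 1.5 * iqr
max_is_outlier = maximum > upper_fence

print(value_range, iqr, upper_fence, max_is_outlier)
```

Under that rule the single value 210 stays inside the fence, despite sitting far from the next-largest value.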

Descriptive statistics

Standard deviation	45.858931
Coefficient of variation (CV)	0.84947827
Kurtosis	-1.7030688
Mean	53.98482
Median Absolute Deviation (MAD)	24
Skewness	0.19142761
Sum	28450
Variance	2103.0416
Monotonicity	Increasing
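
Several of these figures are derived from one another. A small consistency check, using only numbers copied from this report (variable names are illustrative):

```python
import math

# Figures copied from the report for 주코드.
n = 527
total = 28450
std = 45.858931
variance = 2103.0416

mean = total / n                    # Mean = Sum / number of observations
cv = std / mean                     # Coefficient of variation = std / mean
std_from_var = math.sqrt(variance)  # Standard deviation = sqrt(Variance)

print(round(mean, 5), round(cv, 5), round(std_from_var, 4))
```

The recomputed values agree with the reported mean, CV, and standard deviation up to the rounding of the printed figures.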

Histogram with fixed size bins (bins=36)

Common values

Value	Count	Frequency (%)
102	106	20.1%
18	95	18.0%
0	39	7.4%
101	37	7.0%
107	37	7.0%
104	37	7.0%
5	25	4.7%
103	15	2.8%
12	13	2.5%
6	13	2.5%
Other values (26)	110	20.9%

Minimum 10 values

Value	Count	Frequency (%)
0	39	7.4%
1	4	0.8%
2	5	0.9%
3	4	0.8%
4	7	1.3%
5	25	4.7%
6	13	2.5%
8	4	0.8%
9	3	0.6%
10	5	0.9%

Maximum 10 values

Value	Count	Frequency (%)
210	1	0.2%
109	1	0.2%
108	1	0.2%
107	37	7.0%
106	1	0.2%
105	3	0.6%
104	37	7.0%
103	15	2.8%
102	106	20.1%
101	37	7.0%

부코드
Text

MISSING

Distinct	331
Distinct (%)	67.6%
Missing	37
Missing (%)	7.0%
Memory size	4.2 KiB

Length

Max length	11
Median length	9
Mean length	4.1959184
Min length	1

Characters and Unicode

Total characters	2056
Distinct characters	54
Distinct categories	4
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
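
As an illustration of how such category counts can be obtained, the sketch below uses Python's standard `unicodedata` module. The sample string is hypothetical and not taken from the dataset, and whether the profiler computes its counts this way is an assumption. The standard library exposes general categories (codes such as `Nd`, `Lu`, `Ll`, `Pd`, `Lo`) but not scripts or blocks.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of text by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Hypothetical value in the style of a code field; not from the dataset.
sample = "AB-0123x"
counts = category_counts(sample)

# Nd = Decimal Number, Lu = Uppercase Letter,
# Ll = Lowercase Letter, Pd = Dash Punctuation.
print(counts)

# Hangul syllables fall under Lo (Other Letter).
hangul_category = unicodedata.category("지")
print(hangul_category)
```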

Unique

Unique	258
Unique (%)	52.7%

Sample

1st row	1
2nd row	3
3rd row	4
4th row	5
5th row	7

Value	Count	Frequency (%)
1	24	4.9%
2	18	3.7%
3	15	3.1%
4	9	1.8%
5	8	1.6%
0	8	1.6%
7	5	1.0%
10	5	1.0%
8	4	0.8%
9	4	0.8%
Other values (313)	390	79.6%

Most occurring characters

Value	Count	Frequency (%)
0	528	25.7%
4	288	14.0%
3	224	10.9%
1	186	9.0%
2	150	7.3%
6	140	6.8%
9	87	4.2%
7	80	3.9%
5	60	2.9%
8	56	2.7%
Other values (44)	257	12.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	1799	87.5%
Uppercase Letter	184	8.9%
Lowercase Letter	70	3.4%
Dash Punctuation	3	0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
e	10	14.3%
c	10	14.3%
t	10	14.3%
o	7	10.0%
j	7	10.0%
b	7	10.0%
z	1	1.4%
y	1	1.4%
x	1	1.4%
w	1	1.4%
Other values (15)	15	21.4%

Uppercase Letter

Value	Count	Frequency (%)
N	37	20.1%
S	36	19.6%
M	24	13.0%
C	21	11.4%
A	14	7.6%
F	8	4.3%
P	6	3.3%
T	6	3.3%
E	6	3.3%
D	5	2.7%
Other values (8)	21	11.4%

Decimal Number

Value	Count	Frequency (%)
0	528	29.3%
4	288	16.0%
3	224	12.5%
1	186	10.3%
2	150	8.3%
6	140	7.8%
9	87	4.8%
7	80	4.4%
5	60	3.3%
8	56	3.1%

Dash Punctuation

Value	Count	Frequency (%)
-	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1802	87.6%
Latin	254	12.4%

Most frequent character per script

Latin

Value	Count	Frequency (%)
N	37	14.6%
S	36	14.2%
M	24	9.4%
C	21	8.3%
A	14	5.5%
e	10	3.9%
c	10	3.9%
t	10	3.9%
F	8	3.1%
o	7	2.8%
Other values (33)	77	30.3%

Common

Value	Count	Frequency (%)
0	528	29.3%
4	288	16.0%
3	224	12.4%
1	186	10.3%
2	150	8.3%
6	140	7.8%
9	87	4.8%
7	80	4.4%
5	60	3.3%
8	56	3.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2056	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	528	25.7%
4	288	14.0%
3	224	10.9%
1	186	9.0%
2	150	7.3%
6	140	6.8%
9	87	4.2%
7	80	3.9%
5	60	2.9%
8	56	2.7%
Other values (44)	257	12.5%

코드명
Text

Distinct	387
Distinct (%)	73.4%
Missing	0
Missing (%)	0.0%
Memory size	4.2 KiB

Length

Max length	36
Median length	29
Mean length	7.5616698
Min length	1

Characters and Unicode

Total characters	3985
Distinct characters	298
Distinct categories	11
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	336
Unique (%)	63.8%

Sample

1st row	시스템
2nd row	민원정보과
3rd row	20111100000000
4th row	20080100000000
5th row	1

Value	Count	Frequency (%)
true	23	3.1%
이통사	20	2.7%
메시지	13	1.8%
false	12	1.6%
단말기	10	1.4%
전송	8	1.1%
오류	7	1.0%
536642212	6	0.8%
536642514	6	0.8%
536642943	6	0.8%
Other values (455)	624	84.9%

Most occurring characters

Value	Count	Frequency (%)
6	223	5.6%
(space)	210	5.3%
3	210	5.3%
5	163	4.1%
0	150	3.8%
4	149	3.7%
지	146	3.7%
1	111	2.8%
방	110	2.8%
2	107	2.7%
Other values (288)	2406	60.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1969	49.4%
Decimal Number	1175	29.5%
Uppercase Letter	341	8.6%
Lowercase Letter	211	5.3%
Space Separator	210	5.3%
Close Punctuation	25	0.6%
Open Punctuation	25	0.6%
Other Punctuation	19	0.5%
Dash Punctuation	6	0.2%
Connector Punctuation	2	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
지	146	7.4%
방	110	5.6%
사	94	4.8%
기	86	4.4%
시	80	4.1%
주	50	2.5%
보	48	2.4%
서	47	2.4%
전	36	1.8%
이	35	1.8%
Other values (221)	1237	62.8%

Uppercase Letter

Value	Count	Frequency (%)
E	55	16.1%
T	37	10.9%
R	35	10.3%
S	35	10.3%
U	31	9.1%
A	22	6.5%
M	20	5.9%
L	16	4.7%
F	16	4.7%
C	13	3.8%
Other values (13)	61	17.9%

Lowercase Letter

Value	Count	Frequency (%)
e	28	13.3%
o	25	11.8%
c	17	8.1%
a	17	8.1%
r	16	7.6%
l	16	7.6%
m	15	7.1%
n	13	6.2%
u	11	5.2%
s	9	4.3%
Other values (13)	44	20.9%

Decimal Number

Value	Count	Frequency (%)
6	223	19.0%
3	210	17.9%
5	163	13.9%
0	150	12.8%
4	149	12.7%
1	111	9.4%
2	107	9.1%
7	24	2.0%
8	22	1.9%
9	16	1.4%

Other Punctuation

Value	Count	Frequency (%)
.	15	78.9%
/	2	10.5%
,	1	5.3%
;	1	5.3%

Math Symbol

Value	Count	Frequency (%)
+	1	50.0%
>	1	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	210	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	25	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	25	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	6	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1969	49.4%
Common	1464	36.7%
Latin	552	13.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
지	146	7.4%
방	110	5.6%
사	94	4.8%
기	86	4.4%
시	80	4.1%
주	50	2.5%
보	48	2.4%
서	47	2.4%
전	36	1.8%
이	35	1.8%
Other values (221)	1237	62.8%

Latin

Value	Count	Frequency (%)
E	55	10.0%
T	37	6.7%
R	35	6.3%
S	35	6.3%
U	31	5.6%
e	28	5.1%
o	25	4.5%
A	22	4.0%
M	20	3.6%
c	17	3.1%
Other values (36)	247	44.7%

Common

Value	Count	Frequency (%)
6	223	15.2%
(space)	210	14.3%
3	210	14.3%
5	163	11.1%
0	150	10.2%
4	149	10.2%
1	111	7.6%
2	107	7.3%
)	25	1.7%
(	25	1.7%
Other values (11)	91	6.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2016	50.6%
Hangul	1969	49.4%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
6	223	11.1%
(space)	210	10.4%
3	210	10.4%
5	163	8.1%
0	150	7.4%
4	149	7.4%
1	111	5.5%
2	107	5.3%
E	55	2.7%
T	37	1.8%
Other values (57)	601	29.8%

Hangul

Value	Count	Frequency (%)
지	146	7.4%
방	110	5.6%
사	94	4.8%
기	86	4.4%
시	80	4.1%
주	50	2.5%
보	48	2.4%
서	47	2.4%
전	36	1.8%
이	35	1.8%
Other values (221)	1237	62.8%

코드설명
Text

MISSING

Distinct	108
Distinct (%)	61.0%
Missing	350
Missing (%)	66.4%
Memory size	4.2 KiB

Length

Max length	29
Median length	27
Mean length	7.3446328
Min length	2

Characters and Unicode

Total characters	1300
Distinct characters	193
Distinct categories	10
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	63
Unique (%)	35.6%

Sample

1st row	관리부서
2nd row	민원행정
3rd row	새올-인터넷민원용
4th row	암호 초기값
5th row	관리자가 부서업무 문안 보임

Value	Count	Frequency (%)
표시여부	7	2.6%
민원행정	7	2.6%
사용여부	4	1.5%
대명3동	3	1.1%
대명9동	3	1.1%
대명6동	3	1.1%
대명5동	3	1.1%
statistics_visible	3	1.1%
민원행정과	3	1.1%
조직도	3	1.1%
Other values (144)	228	85.4%

Most occurring characters

Value	Count	Frequency (%)
(space)	90	6.9%
과	48	3.7%
동	39	3.0%
대	38	2.9%
원	34	2.6%
시	33	2.5%
정	32	2.5%
민	30	2.3%
부	29	2.2%
지	28	2.2%
Other values (183)	899	69.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1064	81.8%
Space Separator	90	6.9%
Uppercase Letter	73	5.6%
Decimal Number	42	3.2%
Lowercase Letter	19	1.5%
Dash Punctuation	3	0.2%
Connector Punctuation	3	0.2%
Close Punctuation	2	0.2%
Open Punctuation	2	0.2%
Other Punctuation	2	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
과	48	4.5%
동	39	3.7%
대	38	3.6%
원	34	3.2%
시	33	3.1%
정	32	3.0%
민	30	2.8%
부	29	2.7%
지	28	2.6%
명	27	2.5%
Other values (147)	726	68.2%

Uppercase Letter

Value	Count	Frequency (%)
S	20	27.4%
I	12	16.4%
T	9	12.3%
M	8	11.0%
A	5	6.8%
B	4	5.5%
E	3	4.1%
L	3	4.1%
V	3	4.1%
C	3	4.1%
Other values (2)	3	4.1%

Lowercase Letter

Value	Count	Frequency (%)
e	5	26.3%
t	4	21.1%
y	3	15.8%
b	2	10.5%
u	1	5.3%
r	1	5.3%
g	1	5.3%
a	1	5.3%
p	1	5.3%

Decimal Number

Value	Count	Frequency (%)
1	15	35.7%
2	6	14.3%
3	6	14.3%
0	3	7.1%
5	3	7.1%
6	3	7.1%
9	3	7.1%
4	3	7.1%

Other Punctuation

Value	Count	Frequency (%)
/	1	50.0%
:	1	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	90	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	3	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	3	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	2	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1064	81.8%
Common	144	11.1%
Latin	92	7.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
과	48	4.5%
동	39	3.7%
대	38	3.6%
원	34	3.2%
시	33	3.1%
정	32	3.0%
민	30	2.8%
부	29	2.7%
지	28	2.6%
명	27	2.5%
Other values (147)	726	68.2%

Latin

Value	Count	Frequency (%)
S	20	21.7%
I	12	13.0%
T	9	9.8%
M	8	8.7%
e	5	5.4%
A	5	5.4%
B	4	4.3%
t	4	4.3%
y	3	3.3%
E	3	3.3%
Other values (11)	19	20.7%

Common

Value	Count	Frequency (%)
(space)	90	62.5%
1	15	10.4%
2	6	4.2%
3	6	4.2%
-	3	2.1%
0	3	2.1%
_	3	2.1%
5	3	2.1%
6	3	2.1%
9	3	2.1%
Other values (5)	9	6.2%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1064	81.8%
ASCII	236	18.2%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	90	38.1%
S	20	8.5%
1	15	6.4%
I	12	5.1%
T	9	3.8%
M	8	3.4%
2	6	2.5%
3	6	2.5%
e	5	2.1%
A	5	2.1%
Other values (26)	60	25.4%

Hangul

Value	Count	Frequency (%)
과	48	4.5%
동	39	3.7%
대	38	3.6%
원	34	3.2%
시	33	3.1%
정	32	3.0%
민	30	2.8%
부	29	2.7%
지	28	2.6%
명	27	2.5%
Other values (147)	726	68.2%

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
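
One common way to quantify nullity correlation is the Pearson correlation between the missing-value indicators of two columns. The sketch below shows that idea on toy data. The function name and the sample columns are hypothetical, and whether the report's plotting backend computes the heatmap exactly this way is an assumption.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns.

    Assumes each column contains both missing (None) and present values;
    otherwise an indicator has zero variance and the result is undefined.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Toy columns (hypothetical, not the dataset); None marks a missing cell.
sub_code = [None, "1", "3", None, "5", "7"]
description = [None, "a", None, None, "b", None]

r = nullity_correlation(sub_code, description)
print(r)
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present where the other is missing, and a value near 0 means the missingness patterns are unrelated.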

First rows

	부코드	코드명	코드설명
0	<NA>	시스템	<NA>
1	1	민원정보과	관리부서
2	3	20111100000000	민원행정
3	4	20080100000000	새올-인터넷민원용
4	5	1	암호 초기값
5	7	Y	관리자가 부서업무 문안 보임
6	8	53	지역번호
7	9	664	국번
8	10	TRUE	개별업무 회신번호 수정가능여부
9	11	TRUE	부서업무 회신번호 수정가능여부

Last rows

	주코드	부코드	코드명	코드설명
517	107	3440069	536642314	민원행정과
518	107	3440070	536642514	주민생활지원국
519	107	3440071	536642514	주민생활지원과
520	107	3440072	536642501	복지지원과
521	107	3440073	536642641	지역경제과
522	107	3440074	536642714	환경관리과
523	107	3440075	536642753	위생과
524	108	<NA>	발송현황-자동문안삭제	<NA>
525	109	<NA>	세정관련 교체코드	<NA>
526	210	<NA>	민원행정 발송금지 부서	<NA>
