gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	400
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	66
Duplicate rows (%)	16.5%
Total size in memory	10.3 KiB
Average record size in memory	26.3 B

Variable types

Categorical	1
Text	1
Numeric	1

Dataset

Description	Selection status of 이어드림 스쿨 participants by age; use it to look up information related to 이어드림 스쿨 (ages are calculated as of January 18, 2022).
URL	https://www.data.go.kr/data/15103027/fileData.do

Alerts

Dataset has 66 (16.5%) duplicate rows


Reproduction

Analysis started	2023-12-12 09:02:38.976748
Analysis finished	2023-12-12 09:02:39.329277
Duration	0.35 seconds
Software version	ydata-profiling v4.5.1

입교년도 (enrollment year)
Categorical

Distinct	2
Distinct (%)	0.5%
Missing	0
Missing (%)	0.0%
Memory size	3.3 KiB


Length

Max length	4
Median length	4
Mean length	4
Min length	4

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	2022
2nd row	2022
3rd row	2022
4th row	2022
5th row	2022

Common Values

Value	Count	Frequency (%)
2022	200	50.0%
2023	200	50.0%
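The Common Values table is a plain value count, with each count expressed as a share of the 400 observations. A minimal standard-library sketch of that computation follows; the function name `frequency_table` is hypothetical and not part of ydata-profiling (in pandas the counts come from `Series.value_counts()`).

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100 * count / total) for value, count in counts.most_common()]


# 입교년도 column as summarised above: 200 rows for each year
years = ["2022"] * 200 + ["2023"] * 200
for value, count, pct in frequency_table(years):
    print(f"{value}\t{count}\t{pct:.1f}%")
```

Formatting to one decimal with `:.1f` rounds exact halves to even, which is consistent with the 만나이 tables showing 41/400 = 10.25% as 10.2% but 51/400 = 12.75% as 12.8%.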

Length

Histogram of lengths of the category


이름 (name)
Text

Distinct	56
Distinct (%)	14.0%
Missing	0
Missing (%)	0.0%
Memory size	3.3 KiB

Length

Max length	4
Median length	3
Mean length	3
Min length	2

Characters and Unicode

Total characters	1200
Distinct characters	54
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
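As a sketch of the category analysis, Python's standard `unicodedata.category()` returns the two-letter general category of each character: Hangul syllables such as 김 are `Lo` (Other Letter) and the masking asterisk is `Po` (Other Punctuation). The long-name mapping below is a hand-written assumption covering only the two categories seen in this report, and the stdlib module does not expose scripts or blocks, so those are not shown.

```python
import unicodedata
from collections import Counter

# Hand-written long names for the two general categories in this report;
# unicodedata.category() itself returns only the two-letter code.
CATEGORY_NAMES = {"Lo": "Other Letter", "Po": "Other Punctuation"}


def category_counts(texts):
    """Count characters per Unicode general category across all strings."""
    counter = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counter[CATEGORY_NAMES.get(code, code)] += 1
    return counter


# Two masked names from the sample rows
print(category_counts(["강**", "정**"]))
```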

Unique

Unique	18
Unique (%)	4.5%
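Distinct and Unique can both be derived from value counts: Distinct is the number of different values (56/400 = 14.0% here), and Unique is read here as the number of values that occur exactly once (18/400 = 4.5%). That reading is an assumption, but it agrees with 입교년도 reporting 0 unique values, since each year occurs 200 times. The helper name and toy data below are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, distinct_pct, unique, unique_pct) for one column.

    distinct: number of different values
    unique:   number of values that occur exactly once
    """
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, 100 * distinct / n, unique, 100 * unique / n


# Hypothetical toy column of masked names, not the real dataset
names = ["김**", "이**", "김**", "박**", "최**"]
print(distinct_and_unique(names))
```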

Sample

1st row	강**
2nd row	정**
3rd row	양**
4th row	최**
5th row	정**

Value	Count	Frequency (%)
김	63	15.8%
이	61	15.2%
박	39	9.8%
최	27	6.8%
정	26	6.5%
조	14	3.5%
강	12	3.0%
신	12	3.0%
윤	10	2.5%
장	9	2.2%
Other values (43)	127	31.8%

Most occurring characters

Value	Count	Frequency (%)
*	799	66.6%
김	63	5.2%
이	61	5.1%
박	39	3.2%
최	27	2.2%
정	26	2.2%
조	14	1.2%
강	12	1.0%
신	12	1.0%
윤	10	0.8%
Other values (44)	137	11.4%
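This table counts every character in every value, so percentages are relative to the 1,200 total characters (799 / 1,200 ≈ 66.6% for `*`) rather than to the 400 rows. Because `*` takes one of the ten slots, 장 (9 occurrences) falls into "Other values (44)" here. A sketch of the top-k-plus-remainder layout, with a hypothetical function name:

```python
from collections import Counter


def top_characters(texts, k=10):
    """Count every character across all strings and keep the k most common.

    The remaining characters are folded into one "Other values (n)" row;
    percentages are taken over the total number of characters.
    """
    counts = Counter(ch for text in texts for ch in text)
    total = sum(counts.values())
    ranked = counts.most_common()
    rows = [(ch, c, f"{100 * c / total:.1f}%") for ch, c in ranked[:k]]
    rest = ranked[k:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, f"{100 * other / total:.1f}%"))
    return rows


# The five sample rows of 이름
sample = ["강**", "정**", "양**", "최**", "정**"]
for row in top_characters(sample, k=2):
    print(*row, sep="\t")
```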

Most occurring categories

Value	Count	Frequency (%)
Other Punctuation	799	66.6%
Other Letter	401	33.4%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
김	63	15.7%
이	61	15.2%
박	39	9.7%
최	27	6.7%
정	26	6.5%
조	14	3.5%
강	12	3.0%
신	12	3.0%
윤	10	2.5%
장	9	2.2%
Other values (43)	128	31.9%

Other Punctuation

Value	Count	Frequency (%)
*	799	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	799	66.6%
Hangul	401	33.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
김	63	15.7%
이	61	15.2%
박	39	9.7%
최	27	6.7%
정	26	6.5%
조	14	3.5%
강	12	3.0%
신	12	3.0%
윤	10	2.5%
장	9	2.2%
Other values (43)	128	31.9%

Common

Value	Count	Frequency (%)
*	799	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	799	66.6%
Hangul	401	33.4%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
*	799	100.0%

Hangul

Value	Count	Frequency (%)
김	63	15.7%
이	61	15.2%
박	39	9.7%
최	27	6.7%
정	26	6.5%
조	14	3.5%
강	12	3.0%
신	12	3.0%
윤	10	2.5%
장	9	2.2%
Other values (43)	128	31.9%

만나이 (age in full years)
Real number (ℝ)

Distinct	22
Distinct (%)	5.5%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	28.62

Minimum	18
Maximum	39
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	3.6 KiB

Quantile statistics

Minimum	18
5-th percentile	23
Q1	25
median	28
Q3	32
95-th percentile	38
Maximum	39
Range	21
Interquartile range (IQR)	7
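The quantiles can be checked by rebuilding the 400 ages from the value-count tables that follow in this section (every age from 18 to 39 appears in them) and interpolating linearly between order statistics, as `statistics.quantiles(..., method="inclusive")` and pandas' default `Series.quantile` do. That the report uses this convention is an assumption; for this data every cut point falls between two equal values, so the interpolation rule makes no difference.

```python
import statistics

# 만나이 value -> count, copied from the value-count tables in this section
AGE_COUNTS = {
    18: 1, 19: 2, 20: 2, 21: 4, 22: 9, 23: 18, 24: 33, 25: 41, 26: 51,
    27: 34, 28: 28, 29: 29, 30: 22, 31: 25, 32: 16, 33: 21, 34: 12,
    35: 12, 36: 10, 37: 9, 38: 10, 39: 11,
}

# Expand the frequency table back into individual observations
ages = sorted(age for age, count in AGE_COUNTS.items() for _ in range(count))
assert len(ages) == 400

# Linear interpolation between order statistics ("inclusive" method)
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
ventiles = statistics.quantiles(ages, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print(min(ages), p5, q1, median, q3, p95, max(ages))
print("range", max(ages) - min(ages), "IQR", q3 - q1)
```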

Descriptive statistics

Standard deviation	4.5290043
Coefficient of variation (CV)	0.15824613
Kurtosis	-0.39747803
Mean	28.62
Median Absolute Deviation (MAD)	3
Skewness	0.5295929
Sum	11448
Variance	20.51188
Monotonicity	Not monotonic
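Mean, variance, standard deviation and CV can be recomputed directly from the frequency table as weighted sums, without listing all 400 rows. The sketch assumes the reported variance is the sample variance (divisor n - 1); under that assumption it agrees with the figures above. MAD needs the individual observations, so the table is expanded for that step only. Skewness and kurtosis are not covered.

```python
import math
import statistics

# 만나이 value -> count, copied from the value-count tables that follow
AGE_COUNTS = {
    18: 1, 19: 2, 20: 2, 21: 4, 22: 9, 23: 18, 24: 33, 25: 41, 26: 51,
    27: 34, 28: 28, 29: 29, 30: 22, 31: 25, 32: 16, 33: 21, 34: 12,
    35: 12, 36: 10, 37: 9, 38: 10, 39: 11,
}

n = sum(AGE_COUNTS.values())
total = sum(age * count for age, count in AGE_COUNTS.items())
mean = total / n

# Sample variance (divisor n - 1) from weighted squared deviations
variance = sum(count * (age - mean) ** 2 for age, count in AGE_COUNTS.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

# Median absolute deviation: median of |x - median(x)| over all observations
ages = [age for age, count in AGE_COUNTS.items() for _ in range(count)]
median = statistics.median(ages)
mad = statistics.median(abs(age - median) for age in ages)

print(f"n={n} sum={total} mean={mean:.2f}")
print(f"variance={variance:.5f} std={std:.7f} cv={cv:.8f} MAD={mad:g}")
```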

Histogram with fixed size bins (bins=22)

Common Values

Value	Count	Frequency (%)
26	51	12.8%
25	41	10.2%
27	34	8.5%
24	33	8.2%
29	29	7.2%
28	28	7.0%
31	25	6.2%
30	22	5.5%
33	21	5.2%
23	18	4.5%
Other values (12)	98	24.5%

Minimum 10 values

Value	Count	Frequency (%)
18	1	0.2%
19	2	0.5%
20	2	0.5%
21	4	1.0%
22	9	2.2%
23	18	4.5%
24	33	8.2%
25	41	10.2%
26	51	12.8%
27	34	8.5%

Maximum 10 values

Value	Count	Frequency (%)
39	11	2.8%
38	10	2.5%
37	9	2.2%
36	10	2.5%
35	12	3.0%
34	12	3.0%
33	21	5.2%
32	16	4.0%
31	25	6.2%
30	22	5.5%
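The minimum and maximum tables list the ten smallest and ten largest distinct values, not rows, which is why the minimum table alone covers 195 of the 400 observations. A sketch with a hypothetical helper, run on the ages from the first ten sample rows:

```python
from collections import Counter


def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    ordered = sorted(counts.items())   # ascending by value
    smallest = ordered[:k]
    largest = ordered[::-1][:k]        # descending by value
    return smallest, largest


# 만나이 of rows 0-9 in the sample
sample = [25, 37, 28, 33, 23, 31, 26, 27, 28, 30]
low, high = extreme_values(sample, k=3)
print(low)
print(high)
```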

Interactions

(Plot of 만나이 against 만나이)

Correlations

Phik (φk)

	입교년도	이름	만나이
입교년도	1.000	0.000	0.086
이름	0.000	1.000	0.000
만나이	0.086	0.000	1.000

Auto

	만나이	입교년도
만나이	1.000	0.065
입교년도	0.065	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

	입교년도	이름	만나이
0	2022	강**	25
1	2022	정**	37
2	2022	양**	28
3	2022	최**	33
4	2022	정**	23
5	2022	소**	31
6	2022	김**	26
7	2022	안**	27
8	2022	이**	28
9	2022	이**	30

Last rows

	입교년도	이름	만나이
390	2023	최**	26
391	2023	한**	36
392	2023	한**	23
393	2023	한**	34
394	2023	허**	33
395	2023	홍**	25
396	2023	홍**	29
397	2023	황**	26
398	2023	황**	26
399	2023	황**	22

Duplicate rows

Most frequently occurring

	입교년도	이름	만나이	# duplicates
21	2022	이**	26	6
42	2023	김**	30	6
3	2022	김**	25	5
20	2022	이**	25	5
43	2023	김**	31	5
47	2023	박**	27	5
52	2023	이**	24	5
53	2023	이**	26	5
1	2022	김**	23	4
4	2022	김**	26	4
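The 66 in the duplicate-row alert appears to count distinct rows that occur more than once, not surplus copies. Assuming the index column numbers the duplicated groups, index 53 implies at least 54 groups; the ten groups shown already hold 40 surplus copies (5 + 5 + 6×4 + 3 + 3), and each of the other 44 or more groups adds at least one, so surplus copies would total at least 84, not 66. pandas' `DataFrame.duplicated().sum()` counts surplus copies and would therefore report a larger number. The sketch below (hypothetical helper and toy rows) computes both conventions.

```python
from collections import Counter


def duplicate_summary(rows):
    """Summarise duplicated rows in a table given as a list of tuples.

    Returns (duplicated_groups, redundant_rows, top):
      duplicated_groups: distinct rows that occur more than once
      redundant_rows:    copies beyond each first occurrence
                         (what DataFrame.duplicated().sum() counts)
      top:               duplicated groups sorted by descending count
    """
    counts = Counter(rows)
    groups = {row: c for row, c in counts.items() if c > 1}
    duplicated_groups = len(groups)
    redundant_rows = sum(c - 1 for c in groups.values())
    top = sorted(groups.items(), key=lambda item: item[1], reverse=True)
    return duplicated_groups, redundant_rows, top


# Hypothetical toy rows (입교년도, 이름, 만나이), not taken from the dataset
rows = [
    ("2022", "이**", 26), ("2022", "이**", 26), ("2022", "이**", 26),
    ("2023", "김**", 30), ("2023", "김**", 30),
    ("2022", "강**", 25),
]
groups, redundant, top = duplicate_summary(rows)
print(groups, redundant)
print(top[0])
```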
