Overview

Dataset statistics

Number of variables: 3
Number of observations: 400
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 66
Duplicate rows (%): 16.5%
Total size in memory: 10.3 KiB
Average record size in memory: 26.3 B
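The duplicate percentage is the duplicate count divided by the number of observations: 66 / 400 = 16.5%. The report does not say here whether 66 counts distinct repeated rows or surplus copies, so the sketch below computes both on a small hypothetical sample. The `rows` list and the `duplicate_summary` helper are illustrative assumptions, not part of the report.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (distinct repeated rows, surplus copies) for a list of row tuples."""
    counts = Counter(rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    distinct_repeated = len(repeated)                  # row values seen more than once
    surplus = sum(n - 1 for n in repeated.values())    # copies beyond the first
    return distinct_repeated, surplus


# Hypothetical sample in the report's column order: (입교년도, 이름, 만나이).
rows = [
    (2022, "이**", 26),
    (2022, "이**", 26),
    (2022, "김**", 25),
    (2023, "김**", 30),
    (2023, "김**", 30),
    (2023, "김**", 30),
]

distinct_repeated, surplus_copies = duplicate_summary(rows)

# Percentage as reported in the overview: duplicate count over observations.
reported_pct = round(66 / 400 * 100, 1)
print(distinct_repeated, surplus_copies, reported_pct)
```

Either definition yields a percentage once divided by the number of observations; only the numerator differs.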

Variable types

Categorical: 1
Text: 1
Numeric: 1

Dataset

Description: Data on selection status by age for Yeardream School (이어드림 스쿨), provided for looking up information about the school. Ages are calculated as of January 18, 2022.
URL: https://www.data.go.kr/data/15103027/fileData.do

Alerts

Duplicates: Dataset has 66 (16.5%) duplicate rows

Reproduction

Analysis started: 2023-12-12 09:02:38.976748
Analysis finished: 2023-12-12 09:02:39.329277
Duration: 0.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
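The reported duration follows from the two timestamps. A minimal sketch of that check, using only the values printed above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 09:02:38.976748")
finished = datetime.fromisoformat("2023-12-12 09:02:39.329277")

# Difference in seconds, rounded to two decimals as in the report.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```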

Variables

입교년도 (enrollment year)
Categorical

Distinct: 2
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value  Count
2022   200
2023   200

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022
2nd row: 2022
3rd row: 2022
4th row: 2022
5th row: 2022

Common Values

Value  Count  Frequency (%)
2022   200    50.0%
2023   200    50.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; 2022 and 2023 each account for 200 rows (50.0%)]

이름 (name)
Text

Distinct: 56
Distinct (%): 14.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Length

Max length: 4
Median length: 3
Mean length: 3
Min length: 2

Characters and Unicode

Total characters: 1200
Distinct characters: 54
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
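As an illustration of these character properties, the sketch below looks up the Unicode general category of each character in the five sample names listed in this report. It uses Python's standard `unicodedata` module, which exposes general categories and character names; scripts and blocks are not covered here. The category codes `Po` and `Lo` correspond to "Other Punctuation" and "Other Letter".

```python
import unicodedata
from collections import Counter

# The five sample values shown for this variable (masked names).
names = ["강**", "정**", "양**", "최**", "정**"]

# Count characters by Unicode general category.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)
print(dict(category_counts))

# Character names show which characters are Hangul syllables.
for ch in sorted(set("".join(names))):
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))
```

Across the full column, the same split accounts for the 1200 total characters: 799 asterisks (Other Punctuation) plus 401 Hangul characters (Other Letter).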

Unique

Unique: 18
Unique (%): 4.5%

Sample

1st row: 강**
2nd row: 정**
3rd row: 양**
4th row: 최**
5th row: 정**
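
Value              Count  Frequency (%)
[value missing]    63     15.8%
[value missing]    61     15.2%
[value missing]    39     9.8%
[value missing]    27     6.8%
[value missing]    26     6.5%
[value missing]    14     3.5%
[value missing]    12     3.0%
[value missing]    12     3.0%
[value missing]    10     2.5%
[value missing]    9      2.2%
Other values (43)  127    31.8%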

Most occurring characters

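Value              Count  Frequency (%)
*                  799    66.6%
[value missing]    63     5.2%
[value missing]    61     5.1%
[value missing]    39     3.2%
[value missing]    27     2.2%
[value missing]    26     2.2%
[value missing]    14     1.2%
[value missing]    12     1.0%
[value missing]    12     1.0%
[value missing]    10     0.8%
Other values (44)  137    11.4%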

Most occurring categories

Value              Count  Frequency (%)
Other Punctuation  799    66.6%
Other Letter       401    33.4%

Most frequent character per category

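Other Letter

Value              Count  Frequency (%)
[value missing]    63     15.7%
[value missing]    61     15.2%
[value missing]    39     9.7%
[value missing]    27     6.7%
[value missing]    26     6.5%
[value missing]    14     3.5%
[value missing]    12     3.0%
[value missing]    12     3.0%
[value missing]    10     2.5%
[value missing]    9      2.2%
Other values (43)  128    31.9%

Other Punctuation

Value  Count  Frequency (%)
*      799    100.0%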

Most occurring scripts

Value   Count  Frequency (%)
Common  799    66.6%
Hangul  401    33.4%

Most frequent character per script

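Hangul

Value              Count  Frequency (%)
[value missing]    63     15.7%
[value missing]    61     15.2%
[value missing]    39     9.7%
[value missing]    27     6.7%
[value missing]    26     6.5%
[value missing]    14     3.5%
[value missing]    12     3.0%
[value missing]    12     3.0%
[value missing]    10     2.5%
[value missing]    9      2.2%
Other values (43)  128    31.9%

Common

Value  Count  Frequency (%)
*      799    100.0%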

Most occurring blocks

Value   Count  Frequency (%)
ASCII   799    66.6%
Hangul  401    33.4%

Most frequent character per block

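ASCII

Value  Count  Frequency (%)
*      799    100.0%

Hangul

Value              Count  Frequency (%)
[value missing]    63     15.7%
[value missing]    61     15.2%
[value missing]    39     9.7%
[value missing]    27     6.7%
[value missing]    26     6.5%
[value missing]    14     3.5%
[value missing]    12     3.0%
[value missing]    12     3.0%
[value missing]    10     2.5%
[value missing]    9      2.2%
Other values (43)  128    31.9%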

만나이 (age in full years)
Real number (ℝ)

Distinct: 22
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.62
Minimum: 18
Maximum: 39
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.6 KiB

Quantile statistics

Minimum: 18
5-th percentile: 23
Q1: 25
Median: 28
Q3: 32
95-th percentile: 38
Maximum: 39
Range: 21
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.5290043
Coefficient of variation (CV): 0.15824613
Kurtosis: -0.39747803
Mean: 28.62
Median Absolute Deviation (MAD): 3
Skewness: 0.5295929
Sum: 11448
Variance: 20.51188
Monotonicity: Not monotonic
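Several of these figures are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the mean is the sum divided by the number of observations, the CV is the standard deviation divided by the mean, and the variance is the squared standard deviation. A minimal sketch of those relationships, using only the values reported above (the variable names are illustrative):

```python
# Values copied from the report for the 만나이 column.
n_obs = 400
total = 11448
minimum, maximum = 18, 39
q1, q3 = 25, 32
std = 4.5290043

mean = total / n_obs           # 28.62
value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean                # coefficient of variation
variance = std ** 2

print(mean, value_range, iqr, round(cv, 8), round(variance, 5))
```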
[Figure: Histogram with fixed size bins (bins=22)]

Common values

Value              Count  Frequency (%)
26                 51     12.8%
25                 41     10.2%
27                 34     8.5%
24                 33     8.2%
29                 29     7.2%
28                 28     7.0%
31                 25     6.2%
30                 22     5.5%
33                 21     5.2%
23                 18     4.5%
Other values (12)  98     24.5%

Minimum 10 values

Value  Count  Frequency (%)
18     1      0.2%
19     2      0.5%
20     2      0.5%
21     4      1.0%
22     9      2.2%
23     18     4.5%
24     33     8.2%
25     41     10.2%
26     51     12.8%
27     34     8.5%

Maximum 10 values

Value  Count  Frequency (%)
39     11     2.8%
38     10     2.5%
37     9      2.2%
36     10     2.5%
35     12     3.0%
34     12     3.0%
33     21     5.2%
32     16     4.0%
31     25     6.2%
30     22     5.5%
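Taken together, the frequency tables above list a count for every age from 18 to 39, so the summary statistics can be recomputed from them. The sketch below rebuilds the 400 values and recomputes the sum, mean, median, quartiles and variance. The use of the sample variance and of linear ("inclusive") quantile interpolation are assumptions chosen because they reproduce the reported figures; the report does not state which methods it used.

```python
import statistics

# Age -> count, combined from the frequency tables in this section.
age_counts = {
    18: 1, 19: 2, 20: 2, 21: 4, 22: 9, 23: 18, 24: 33, 25: 41,
    26: 51, 27: 34, 28: 28, 29: 29, 30: 22, 31: 25, 32: 16, 33: 21,
    34: 12, 35: 12, 36: 10, 37: 9, 38: 10, 39: 11,
}

# Expand to one value per observation, in ascending order.
ages = sorted(age for age, n in age_counts.items() for _ in range(n))

n_obs = len(ages)
total = sum(ages)
mean = total / n_obs
median = statistics.median(ages)
# Quartiles with linear interpolation between closest ranks.
q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
# Sample variance (divides by n - 1).
variance = statistics.variance(ages)

print(n_obs, total, mean, median, q1, q3, round(variance, 5))
```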

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

          입교년도  이름   만나이
입교년도  1.000     0.000  0.086
이름      0.000     1.000  0.000
만나이    0.086     0.000  1.000

[Figure: correlation heatmap]

          만나이  입교년도
만나이    1.000   0.065
입교년도  0.065   1.000

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     입교년도  이름  만나이
0    2022      강**  25
1    2022      정**  37
2    2022      양**  28
3    2022      최**  33
4    2022      정**  23
5    2022      소**  31
6    2022      김**  26
7    2022      안**  27
8    2022      이**  28
9    2022      이**  30

Last rows

     입교년도  이름  만나이
390  2023      최**  26
391  2023      한**  36
392  2023      한**  23
393  2023      한**  34
394  2023      허**  33
395  2023      홍**  25
396  2023      홍**  29
397  2023      황**  26
398  2023      황**  26
399  2023      황**  22

Duplicate rows

Most frequently occurring

    입교년도  이름  만나이  # duplicates
21  2022      이**  26      6
42  2023      김**  30      6
3   2022      김**  25      5
20  2022      이**  25      5
43  2023      김**  31      5
47  2023      박**  27      5
52  2023      이**  24      5
53  2023      이**  26      5
1   2022      김**  23      4
4   2022      김**  26      4