Overview

Dataset statistics

Number of variables: 8
Number of observations: 187
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.5 KiB
Average record size in memory: 68.7 B
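The overview counts can be recomputed from the raw rows. Below is a minimal stdlib sketch. It assumes each record is a tuple of cell values in which only `None` marks a missing cell. The helper `overview_stats` and the toy rows are hypothetical, not part of ydata-profiling or of this dataset:

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical record type: one tuple of cell values per row; None marks a missing cell.
Row = tuple[Optional[str], ...]


def overview_stats(rows: Sequence[Row]) -> dict[str, float]:
    """Recompute the overview counts for a list of rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    # Only None counts as missing; placeholder strings are ordinary values.
    missing = sum(cell is None for row in rows for cell in row)
    # Every occurrence of a row beyond its first is a duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "observations": n_obs,
        "variables": n_vars,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


# Toy input (hypothetical rows in the column order 수행년도, 차수, 번호, 이름).
rows = [
    ("2018", "1", "1", "구*호"),
    ("2018", "1", "2", "오*성"),
    ("2018", "1", "1", "구*호"),  # exact copy of the first row
    ("2018", "1", "3", None),     # one missing cell
]
print(overview_stats(rows))
```

With the toy rows, this reports 1 missing cell out of 16 (6.25%) and 1 duplicate row out of 4 (25%).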

Variable types

Categorical: 5
Numeric: 2
Text: 1

Dataset

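Description: Records of training delivered, by year and round, to develop specialist personnel (welders) for the construction site of Saeul Nuclear Power Units 3 and 4, which are currently under construction. The data include the training year (수행년도), round (차수), trainee number (교육생번호), name (이름), birth year (출생년도), region (지역), gender (성별), and other fields.
URL: https://www.data.go.kr/data/15115519/fileData.do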

Alerts

차수 is highly overall correlated with 비고 [High correlation]
비고 is highly overall correlated with 차수 [High correlation]
성별 is highly imbalanced (79.5%) [Imbalance]
비고 is highly imbalanced (54.5%) [Imbalance]
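The imbalance percentages are consistent with the score 1 − H/ln k. Here H is the Shannon entropy (natural log) of the category frequencies and k is the number of categories. The sketch below uses a hypothetical helper, `imbalance`. It reproduces both alert values from the counts reported in the variable sections (성별: 181 and 6; 비고: 146, 13, 11, 6, 5, 3, 3). The formula is inferred from those numbers rather than quoted from the library:

```python
import math
from typing import Sequence


def imbalance(counts: Sequence[int]) -> float:
    """Imbalance score 1 - H / ln(k) for a list of category counts.

    H is the Shannon entropy (natural log) of the category frequencies and k
    is the number of categories: 0 means perfectly balanced, values close to 1
    mean a single category dominates.
    """
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        return 0.0  # this sketch returns 0 when there are fewer than two categories
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts)
    return 1 - entropy / math.log(k)


print(f"{imbalance([181, 6]):.1%}")                    # 성별 -> 79.5%
print(f"{imbalance([146, 13, 11, 6, 5, 3, 3]):.1%}")   # 비고 -> 54.5%
```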

Reproduction

Analysis started: 2023-12-12 10:49:36.542206
Analysis finished: 2023-12-12 10:49:38.649469
Duration: 2.11 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

수행년도 (training year)
Categorical

Distinct: 5
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
2021: 42
2020: 40
2018: 35
2019: 35
2022: 35

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2018
2nd row: 2018
3rd row: 2018
4th row: 2018
5th row: 2018

Common Values

Value | Count | Frequency (%)
2021 | 42 | 22.5%
2020 | 40 | 21.4%
2018 | 35 | 18.7%
2019 | 35 | 18.7%
2022 | 35 | 18.7%
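Each Frequency (%) above is the count divided by the 187 observations. Distinct (%) is the number of distinct values divided by the same total (5 / 187 ≈ 2.7%). The following minimal sketch rebuilds the 수행년도 column from the reported counts and recomputes the table. The helper `frequency_table` is hypothetical:

```python
from collections import Counter
from typing import Hashable, Sequence


def frequency_table(values: Sequence[Hashable]) -> list[tuple[Hashable, int, float]]:
    """Return (value, count, percent of all rows) tuples, most frequent first."""
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in Counter(values).most_common()]


# 수행년도 rebuilt from the reported counts (row order only affects how ties are listed).
years = ["2021"] * 42 + ["2020"] * 40 + ["2018"] * 35 + ["2019"] * 35 + ["2022"] * 35
for value, count, pct in frequency_table(years):
    print(f"{value} | {count} | {pct:.1f}%")

distinct = len(set(years))
print(f"Distinct: {distinct} ({100 * distinct / len(years):.1f}%)")  # Distinct: 5 (2.7%)
```

`Counter.most_common` lists equal counts in first-seen order, so the three years with 35 trainees come out as 2018, 2019, 2022.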

Length

Figure: Histogram of lengths of the category

차수 (training round)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
1: 73
2: 63
3: 51

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 73 | 39.0%
2 | 63 | 33.7%
3 | 51 | 27.3%

Length

Figure: Histogram of lengths of the category

번호 (trainee number)
Real number (ℝ)

Distinct: 25
Distinct (%): 13.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.0855615
Minimum: 1
Maximum: 25
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 4
Median: 7
Q3: 11
95th percentile: 19.7
Maximum: 25
Range: 24
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.4539278
Coefficient of variation (CV): 0.67452678
Kurtosis: 0.57477644
Mean: 8.0855615
Median Absolute Deviation (MAD): 3
Skewness: 0.94485752
Sum: 1512
Variance: 29.745328
Monotonicity: Not monotonic
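These statistics follow from standard definitions. Variance is the squared standard deviation. CV is the standard deviation divided by the mean (here 5.4539278 / 8.0855615 ≈ 0.6745). MAD is the median of the absolute deviations from the median. The following stdlib sketch assumes the sample (n − 1) standard deviation, which is the pandas default. Skewness and kurtosis are left out, and the helper `describe` is hypothetical:

```python
import statistics
from typing import Sequence


def describe(values: Sequence[float]) -> dict[str, float]:
    """Basic descriptive statistics using the sample (n - 1) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(values)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation; undefined when mean == 0
        "mad": statistics.median(abs(v - median) for v in values),  # median absolute deviation
        "sum": sum(values),
        "range": max(values) - min(values),
    }


print(describe([1, 2, 3, 4, 5]))

# Consistency checks against the reported 번호 figures.
print(5.4539278 / 8.0855615)  # close to the reported CV
print(5.4539278 ** 2)         # close to the reported variance
```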
Figure: Histogram with fixed size bins (bins=25)

Common Values

Value | Count | Frequency (%)
1 | 14 | 7.5%
3 | 14 | 7.5%
4 | 14 | 7.5%
5 | 14 | 7.5%
6 | 14 | 7.5%
2 | 14 | 7.5%
7 | 13 | 7.0%
8 | 13 | 7.0%
9 | 13 | 7.0%
10 | 13 | 7.0%
Other values (15) | 51 | 27.3%

Minimum 10 values

Value | Count | Frequency (%)
1 | 14 | 7.5%
2 | 14 | 7.5%
3 | 14 | 7.5%
4 | 14 | 7.5%
5 | 14 | 7.5%
6 | 14 | 7.5%
7 | 13 | 7.0%
8 | 13 | 7.0%
9 | 13 | 7.0%
10 | 13 | 7.0%

Maximum 10 values

Value | Count | Frequency (%)
25 | 1 | 0.5%
24 | 1 | 0.5%
23 | 2 | 1.1%
22 | 2 | 1.1%
21 | 2 | 1.1%
20 | 2 | 1.1%
19 | 2 | 1.1%
18 | 2 | 1.1%
17 | 2 | 1.1%
16 | 3 | 1.6%

이름 (name, masked)
Text

Distinct: 140
Distinct (%): 74.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 561
Distinct characters: 99
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
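The category counts in this section can be reproduced with Python's `unicodedata.category`, which returns the two-letter general category of a code point. It gives `Lo` (Other Letter) for precomposed Hangul syllables and `Po` (Other Punctuation) for the masking asterisk. The sketch below uses a hypothetical helper, `category_counts`, with an assumed code-to-name mapping. Script and block detection are left out because the standard library does not expose them directly:

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general categories that occur in 이름 (assumed mapping; extend as needed).
CATEGORY_NAMES = {"Lo": "Other Letter", "Po": "Other Punctuation"}


def category_counts(texts: Iterable[str]) -> Counter:
    """Count characters across all strings by Unicode general category."""
    counts: Counter = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)  # e.g. "Lo" for '구', "Po" for '*'
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample names shown in this report.
sample = ["구*호", "오*성", "이*준", "최*훈", "이*영"]
print(category_counts(sample))
```

For the five sample names, this yields 10 Other Letter and 5 Other Punctuation characters. That is the same 2:1 ratio as in the full column (374 vs 187), because every masked name has the form letter, asterisk, letter.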

Unique

Unique: 108
Unique (%): 57.8%
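Distinct and Unique measure different things. Distinct counts the different values (140 names), while Unique counts the values that occur exactly once (108 names). So 32 masked names occur more than once. A minimal sketch with a hypothetical helper and toy input:

```python
from collections import Counter
from typing import Hashable, Sequence


def distinct_and_unique(values: Sequence[Hashable]) -> tuple[int, int]:
    """Return (distinct, unique): distinct values, and values that occur exactly once."""
    counts = Counter(values)
    return len(counts), sum(1 for c in counts.values() if c == 1)


# Hypothetical toy column: one name repeated, two names seen once.
names = ["김*민", "김*민", "구*호", "오*성"]
print(distinct_and_unique(names))  # (3, 2)
```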

Sample

1st row: 구*호
2nd row: 오*성
3rd row: 이*준
4th row: 최*훈
5th row: 이*영

Common Values

Value | Count | Frequency (%)
김*민 | 6 | 3.2%
김*훈 | 5 | 2.7%
정*화 | 3 | 1.6%
김*규 | 3 | 1.6%
김*중 | 3 | 1.6%
김*환 | 3 | 1.6%
정*우 | 3 | 1.6%
김*영 | 3 | 1.6%
이*진 | 3 | 1.6%
김*원 | 3 | 1.6%
Other values (130) | 152 | 81.3%

Most occurring characters

Value | Count | Frequency (%)
* | 187 | 33.3%
 | 64 | 11.4%
 | 25 | 4.5%
 | 15 | 2.7%
 | 14 | 2.5%
 | 13 | 2.3%
 | 9 | 1.6%
 | 8 | 1.4%
 | 8 | 1.4%
 | 7 | 1.2%
Other values (89) | 211 | 37.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 374 | 66.7%
Other Punctuation | 187 | 33.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 64 | 17.1%
 | 25 | 6.7%
 | 15 | 4.0%
 | 14 | 3.7%
 | 13 | 3.5%
 | 9 | 2.4%
 | 8 | 2.1%
 | 8 | 2.1%
 | 7 | 1.9%
 | 7 | 1.9%
Other values (88) | 204 | 54.5%

Other Punctuation
Value | Count | Frequency (%)
* | 187 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 374 | 66.7%
Common | 187 | 33.3%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 64 | 17.1%
 | 25 | 6.7%
 | 15 | 4.0%
 | 14 | 3.7%
 | 13 | 3.5%
 | 9 | 2.4%
 | 8 | 2.1%
 | 8 | 2.1%
 | 7 | 1.9%
 | 7 | 1.9%
Other values (88) | 204 | 54.5%

Common
Value | Count | Frequency (%)
* | 187 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 374 | 66.7%
ASCII | 187 | 33.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
* | 187 | 100.0%

Hangul
Value | Count | Frequency (%)
 | 64 | 17.1%
 | 25 | 6.7%
 | 15 | 4.0%
 | 14 | 3.7%
 | 13 | 3.5%
 | 9 | 2.4%
 | 8 | 2.1%
 | 8 | 2.1%
 | 7 | 1.9%
 | 7 | 1.9%
Other values (88) | 204 | 54.5%

출생년도 (birth year)
Real number (ℝ)

Distinct: 38
Distinct (%): 20.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1985.7594
Minimum: 1963
Maximum: 2003
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.8 KiB

Quantile statistics

Minimum: 1963
5th percentile: 1969.3
Q1: 1979
Median: 1987
Q3: 1994
95th percentile: 1998.7
Maximum: 2003
Range: 40
Interquartile range (IQR): 15
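The fractional percentiles come from linear interpolation between neighbouring ranks. With n = 187, the 5th percentile sits at 0-based position 0.05 × 186 = 9.3, between the 10th and 11th smallest birth years. According to the smallest values listed in this section (1963 ×1, 1966 ×2, 1967 ×3, 1968 ×3, 1969 ×1, 1970 ×2), those are 1969 and 1970, giving 1969 + 0.3 × (1970 − 1969) = 1969.3. Similarly, the 95th percentile at position 176.7 falls between 1998 and 1999 and evaluates to 1998.7. The following is a sketch of that method, the default in NumPy and pandas. The helper `quantile` is hypothetical:

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between closest ranks (NumPy/pandas default)."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need a non-empty sequence and 0 <= q <= 1")
    data = sorted(values)
    pos = q * (len(data) - 1)          # 0-based position of the q-quantile
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


print(quantile([1, 2, 3, 4], 0.5))    # 2.5, halfway between 2 and 3
```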

Descriptive statistics

Standard deviation: 9.38661
Coefficient of variation (CV): 0.0047269625
Kurtosis: -0.84414144
Mean: 1985.7594
Median Absolute Deviation (MAD): 7
Skewness: -0.32452372
Sum: 371337
Variance: 88.108447
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=38)

Common Values

Value | Count | Frequency (%)
1987 | 11 | 5.9%
1994 | 11 | 5.9%
1995 | 11 | 5.9%
1996 | 10 | 5.3%
1979 | 9 | 4.8%
1993 | 9 | 4.8%
1982 | 8 | 4.3%
1991 | 8 | 4.3%
1984 | 8 | 4.3%
1997 | 8 | 4.3%
Other values (28) | 94 | 50.3%

Minimum 10 values

Value | Count | Frequency (%)
1963 | 1 | 0.5%
1966 | 2 | 1.1%
1967 | 3 | 1.6%
1968 | 3 | 1.6%
1969 | 1 | 0.5%
1970 | 2 | 1.1%
1971 | 6 | 3.2%
1972 | 4 | 2.1%
1973 | 1 | 0.5%
1974 | 3 | 1.6%

Maximum 10 values

Value | Count | Frequency (%)
2003 | 2 | 1.1%
2001 | 1 | 0.5%
2000 | 3 | 1.6%
1999 | 4 | 2.1%
1998 | 4 | 2.1%
1997 | 8 | 4.3%
1996 | 10 | 5.3%
1995 | 11 | 5.9%
1994 | 11 | 5.9%
1993 | 9 | 4.8%

지역 (region)
Categorical

Distinct: 12
Distinct (%): 6.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
울산광역시: 101
부산광역시: 53
경상남도: 10
경기도: 4
대구광역시: 4
Other values (7): 15

Length

Max length: 5
Median length: 5
Mean length: 4.8342246
Min length: 3

Unique

Unique: 3
Unique (%): 1.6%

Sample

1st row: 부산광역시
2nd row: 울산광역시
3rd row: 부산광역시
4th row: 울산광역시
5th row: 울산광역시

Common Values

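Value | Count | Frequency (%)
울산광역시 (Ulsan Metropolitan City) | 101 | 54.0%
부산광역시 (Busan Metropolitan City) | 53 | 28.3%
경상남도 (South Gyeongsang Province) | 10 | 5.3%
경기도 (Gyeonggi Province) | 4 | 2.1%
대구광역시 (Daegu Metropolitan City) | 4 | 2.1%
전라남도 (South Jeolla Province) | 4 | 2.1%
경상북도 (North Gyeongsang Province) | 3 | 1.6%
충청남도 (South Chungcheong Province) | 3 | 1.6%
서울특별시 (Seoul Special City) | 2 | 1.1%
광주광역시 (Gwangju Metropolitan City) | 1 | 0.5%
Other values (2) | 2 | 1.1%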

Length

Figure: Histogram of lengths of the category

성별 (gender)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
181
6

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 181 | 96.8%
 | 6 | 3.2%

Length

Figure: Histogram of lengths of the category

비고 (remarks)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
<NA>: 146
일반 1차: 13
일반 2차: 11
자진 퇴교: 6
전문가 3차: 5
Other values (2): 6

Length

Max length: 6
Median length: 4
Mean length: 4.2780749
Min length: 4
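The 비고 length statistics can be reproduced from the value counts alone. This works if "<NA>" is treated as the 4-character string it appears to be and the space inside values such as "일반 1차" is counted. The following sketch uses a hypothetical helper, `length_stats`:

```python
import statistics
from typing import Sequence


def length_stats(values: Sequence[str]) -> dict[str, float]:
    """Max, median, mean and min string length (spaces count as characters)."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# 비고 rebuilt from its reported value counts; "<NA>" is kept as a literal 4-character string.
remarks = (["<NA>"] * 146 + ["일반 1차"] * 13 + ["일반 2차"] * 11 + ["자진 퇴교"] * 6
           + ["전문가 3차"] * 5 + ["전문가 1차"] * 3 + ["전문가 2차"] * 3)
stats = length_stats(remarks)
print(stats["max"], stats["median"], round(stats["mean"], 7), stats["min"])  # 6 4 4.2780749 4
```

The mean is 800 characters over 187 values. Because 146 of the 187 values have length 4, the median is 4 as well.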

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

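Value | Count | Frequency (%)
<NA> | 146 | 78.1%
일반 1차 (general, round 1) | 13 | 7.0%
일반 2차 (general, round 2) | 11 | 5.9%
자진 퇴교 (voluntary withdrawal) | 6 | 3.2%
전문가 3차 (expert, round 3) | 5 | 2.7%
전문가 1차 (expert, round 1) | 3 | 1.6%
전문가 2차 (expert, round 2) | 3 | 1.6%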
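비고 reports Missing: 0 even though <NA> is its most common value. Its minimum length is 4, which is the length of the string "<NA>", and its word counts contain "na". Together, these suggest that the <NA> entries are stored as literal text rather than as true missing values. If so, converting them would give 비고 146 missing cells (78.1%) and 6 distinct non-missing values. Below is a minimal sketch of that conversion. The marker set and the helper `normalize_missing` are assumptions to adjust to the actual file:

```python
from typing import AbstractSet, Iterable, Optional

# Hypothetical placeholder strings to treat as missing; adjust to the actual file.
MISSING_MARKERS = frozenset({"<NA>", ""})


def normalize_missing(
    values: Iterable[str], markers: AbstractSet[str] = MISSING_MARKERS
) -> list[Optional[str]]:
    """Return a copy of values with placeholder strings replaced by None."""
    return [None if v.strip() in markers else v for v in values]


remarks = ["<NA>", "<NA>", "일반 1차", "자진 퇴교"]
cleaned = normalize_missing(remarks)
print(cleaned)                            # [None, None, '일반 1차', '자진 퇴교']
print(sum(v is None for v in cleaned))    # 2
```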

Length

Figure: Histogram of lengths of the category

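Common Values (Plot)

Word counts (each value is split into words; percentages are relative to the 228 words in total):
Value | Count | Frequency (%)
na | 146 | 64.0%
일반 (general) | 24 | 10.5%
1차 (round 1) | 16 | 7.0%
2차 (round 2) | 14 | 6.1%
전문가 (expert) | 11 | 4.8%
자진 (voluntary) | 6 | 2.6%
퇴교 (withdrawal) | 6 | 2.6%
3차 (round 3) | 5 | 2.2%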
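The word table is consistent with splitting each 비고 value on whitespace, removing non-word characters and lowercasing, which turns "<NA>" into "na". Its percentages are relative to all 228 words rather than to the 187 rows. The sketch below works under that assumption, though the tokenizer inside ydata-profiling itself may differ. The helper `word_counts` is hypothetical:

```python
import re
from collections import Counter
from typing import Iterable


def word_counts(values: Iterable[str]) -> Counter:
    """Count words after splitting on whitespace, stripping non-word characters and lowercasing."""
    words: Counter = Counter()
    for value in values:
        for token in value.split():
            word = re.sub(r"\W", "", token).lower()  # "<NA>" -> "na"
            if word:
                words[word] += 1
    return words


# 비고 rebuilt from its reported value counts.
remarks = (["<NA>"] * 146 + ["일반 1차"] * 13 + ["일반 2차"] * 11 + ["자진 퇴교"] * 6
           + ["전문가 3차"] * 5 + ["전문가 1차"] * 3 + ["전문가 2차"] * 3)
words = word_counts(remarks)
total = sum(words.values())  # 228
for word, count in words.most_common():
    print(f"{word} | {count} | {100 * count / total:.1f}%")
```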

Interactions

Figure: interaction plots (4 panels)

Correlations

Variable | 수행년도 | 차수 | 번호 | 출생년도 | 지역 | 성별 | 비고
수행년도 | 1.000 | 0.352 | 0.115 | 0.447 | 0.214 | 0.223 | 0.516
차수 | 0.352 | 1.000 | 0.000 | 0.250 | 0.249 | 0.000 | 0.996
번호 | 0.115 | 0.000 | 1.000 | 0.393 | 0.000 | 0.000 | 0.295
출생년도 | 0.447 | 0.250 | 0.393 | 1.000 | 0.379 | 0.196 | 0.412
지역 | 0.214 | 0.249 | 0.000 | 0.379 | 1.000 | 0.000 | 0.454
성별 | 0.223 | 0.000 | 0.000 | 0.196 | 0.000 | 1.000 | 0.000
비고 | 0.516 | 0.996 | 0.295 | 0.412 | 0.454 | 0.000 | 1.000

Variable | 지역 | 수행년도 | 차수 | 성별 | 비고
지역 | 1.000 | 0.116 | 0.111 | 0.000 | 0.280
수행년도 | 0.116 | 1.000 | 0.283 | 0.270 | 0.373
차수 | 0.111 | 0.283 | 1.000 | 0.000 | 0.884
성별 | 0.000 | 0.270 | 0.000 | 1.000 | 0.000
비고 | 0.280 | 0.373 | 0.884 | 0.000 | 1.000

Variable | 번호 | 출생년도 | 수행년도 | 차수 | 지역 | 성별 | 비고
번호 | 1.000 | -0.092 | 0.062 | 0.000 | 0.000 | 0.000 | 0.188
출생년도 | -0.092 | 1.000 | 0.190 | 0.136 | 0.158 | 0.148 | 0.203
수행년도 | 0.062 | 0.190 | 1.000 | 0.283 | 0.116 | 0.270 | 0.373
차수 | 0.000 | 0.136 | 0.283 | 1.000 | 0.111 | 0.000 | 0.884
지역 | 0.000 | 0.158 | 0.116 | 0.111 | 1.000 | 0.000 | 0.280
성별 | 0.000 | 0.148 | 0.270 | 0.000 | 0.000 | 1.000 | 0.000
비고 | 0.188 | 0.203 | 0.373 | 0.884 | 0.280 | 0.000 | 1.000
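The correlation matrices above lost their method titles in extraction. One of them covers only the categorical variables 지역, 수행년도, 차수, 성별 and 비고. It is assumed here to show Cramér's V, which ydata-profiling reports for categorical pairs, possibly with a bias correction. For an r × c contingency table, V = sqrt(χ² / (n · (min(r, c) − 1))). The textbook (uncorrected) version below is a sketch for intuition and is not expected to reproduce the printed values exactly. The helper `cramers_v` and the toy tables are hypothetical:

```python
import math
from typing import Sequence


def cramers_v(table: Sequence[Sequence[int]]) -> float:
    """Uncorrected Cramér's V for an r x c contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:  # skip empty rows/columns
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_sums), len(col_sums)) - 1
    return math.sqrt(chi2 / (n * k)) if n and k else 0.0


# Hypothetical tables: each round maps to exactly one remark category, vs. no association.
print(cramers_v([[20, 0, 0], [0, 15, 0], [0, 0, 10]]))  # approximately 1.0
print(cramers_v([[5, 5], [5, 5]]))                       # 0.0
```

A value near 1 means that knowing one variable almost determines the other. That is the pattern behind the high 차수/비고 figures, where remark labels such as "전문가 3차" carry round information.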

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

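First rows

Row | 수행년도 | 차수 | 번호 | 이름 | 출생년도 | 지역 | 성별 | 비고
0 | 2018 | 1 | 1 | 구*호 | 1989 | 부산광역시 |  | <NA>
1 | 2018 | 1 | 2 | 오*성 | 1979 | 울산광역시 |  | <NA>
2 | 2018 | 1 | 3 | 이*준 | 1977 | 부산광역시 |  | <NA>
3 | 2018 | 1 | 4 | 최*훈 | 1982 | 울산광역시 |  | <NA>
4 | 2018 | 1 | 5 | 이*영 | 1991 | 울산광역시 |  | <NA>
5 | 2018 | 1 | 6 | 정*우 | 1980 | 울산광역시 |  | <NA>
6 | 2018 | 1 | 7 | 윤*무 | 1983 | 울산광역시 |  | <NA>
7 | 2018 | 1 | 8 | 허*성 | 1971 | 울산광역시 |  | <NA>
8 | 2018 | 1 | 9 | 이*호 | 1985 | 부산광역시 |  | <NA>
9 | 2018 | 1 | 10 | 정*윤 | 1983 | 울산광역시 |  | <NA>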
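
Last rows

Row | 수행년도 | 차수 | 번호 | 이름 | 출생년도 | 지역 | 성별 | 비고
177 | 2022 | 3 | 2 | 김*훈 | 1998 | 울산광역시 |  | 일반 2차
178 | 2022 | 3 | 3 | 김*민 | 1994 | 울산광역시 |  | 일반 2차
179 | 2022 | 3 | 4 | 김*진 | 2001 | 울산광역시 |  | 일반 2차
180 | 2022 | 3 | 5 | 문*덕 | 1984 | 부산광역시 |  | 일반 2차
181 | 2022 | 3 | 6 | 박*현 | 2000 | 부산광역시 |  | 일반 2차
182 | 2022 | 3 | 7 | 심*별 | 1994 | 울산광역시 |  | 일반 2차
183 | 2022 | 3 | 8 | 심*석 | 1991 | 부산광역시 |  | 일반 2차
184 | 2022 | 3 | 9 | 이*환 | 1996 | 부산광역시 |  | 일반 2차
185 | 2022 | 3 | 10 | 정*귀 | 1978 | 울산광역시 |  | 일반 2차
186 | 2022 | 3 | 11 | 정*우 | 1996 | 울산광역시 |  | 일반 2차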