Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2432
Duplicate rows (%): 24.3%
Total size in memory: 859.4 KiB
Average record size in memory: 88.0 B
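
The duplicate figures above can be checked with a few lines of Python. The sketch below is an assumption about how such counts may be derived, not ydata-profiling's own code; the function name `duplicate_stats` and the toy rows are hypothetical. It returns both the number of distinct row patterns that occur more than once and the number of surplus copies, since tools differ on which of the two they report as "duplicate rows".

```python
from collections import Counter


def duplicate_stats(rows):
    """Return (repeated_patterns, surplus_rows, total_rows) for a list of row tuples."""
    counts = Counter(rows)
    # Distinct row patterns that appear at least twice.
    repeated = sum(1 for c in counts.values() if c > 1)
    # Copies beyond the first occurrence of each pattern.
    surplus = sum(c - 1 for c in counts.values() if c > 1)
    return repeated, surplus, len(rows)


# Toy rows (hypothetical), shaped like (소속, 지도자구분, 종목).
rows = [
    ("용인대학교", "코치", "유도"),
    ("용인대학교", "코치", "유도"),
    ("용인대학교", "코치", "유도"),
    ("단국대학교", "감독", "빙상"),
    ("경기도청", "코치", "수영"),
    ("경기도청", "코치", "수영"),
]
print(duplicate_stats(rows))

# The percentage in the report is count / observations.
print(f"{2432 / 10000:.1%}")
```

With the report's own numbers, 2432 out of 10000 observations gives the 24.3% shown above.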

Variable types

Unsupported: 1
Text: 2
Categorical: 7

Dataset

Description: Personal and registration information by coach (identifying information excluded)
Author: 대한체육회 (Korean Sport & Olympic Committee)
URL: https://www.data.go.kr/data/15052696/fileData.do

Alerts

Dataset has 2432 (24.3%) duplicate rows [Duplicates]
세부종목 is highly overall correlated with 소속세부구분 [High correlation]
소속세부구분 is highly overall correlated with 세부종목 [High correlation]
세부종목 is highly imbalanced (90.3%) [Imbalance]
소속구분 is highly imbalanced (94.2%) [Imbalance]
소속세부구분 is highly imbalanced (51.4%) [Imbalance]
등록년도 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
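
The imbalance percentages in these alerts are consistent with a score of one minus the normalized Shannon entropy of the class counts. The sketch below works under that assumption; it is not taken from ydata-profiling's source, and the function name `imbalance` is hypothetical. The counts are the ones listed under Common Values for 소속구분 and 소속세부구분 in this report.

```python
import math


def imbalance(counts):
    """1 - (entropy / log2(k)): 0 means balanced, values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Not defined for fewer than two classes; 0.0 is a convention chosen here.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


print(f"{imbalance([9933, 67]):.1%}")         # 소속구분: 엘리트, 동호인
print(f"{imbalance([8467, 844, 689]):.1%}")   # 소속세부구분: three classes
```

Under this assumption the two calls give 94.2% and 51.4%, matching the alerts. The 세부종목 figure cannot be checked the same way because the report lists only its ten most frequent values.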

Reproduction

Analysis started: 2023-12-12 06:06:22.290391
Analysis finished: 2023-12-12 06:06:23.963933
Duration: 1.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
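
A report like this one can be regenerated with a short script. The sketch below is an assumption about the workflow rather than the configuration actually used (the referenced config.json is not reproduced here). The function name `build_report`, the file paths, and the report title are hypothetical; `ProfileReport` and its `to_file` method are part of ydata-profiling's public API.

```python
def build_report(csv_path, html_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Dataset profile")  # title is a placeholder
    report.to_file(html_path)
    return html_path
```

If the downloaded file does not decode as UTF-8, trying `encoding="cp949"` may help; that is a guess about this particular file, not something the report states.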

Variables

등록년도 (registration year)
Unsupported

REJECTED  UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

소속 (affiliation)
Text

Distinct: 3493
Distinct (%): 34.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 20
Mean length: 6.345
Min length: 2
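
These length statistics are plain summaries of the character count of each value. As a minimal sketch (the function name `length_stats` is hypothetical, and the inputs are the five sample rows shown for this variable, not the full column), they can be computed as follows. Lengths count Unicode code points, so the result assumes precomposed Hangul syllables.

```python
import statistics


def length_stats(values):
    """Return max, median, mean, and min of the string lengths in values."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


sample = ["내동중학교(남)", "단국대학교", "신안중학교", "광주체육고등학교", "제주여자중학교"]
print(length_stats(sample))
```

As a cross-check on the report itself, the mean length times the number of observations (6.345 × 10000) equals the 63450 total characters listed for this variable.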

Characters and Unicode

Total characters: 63450
Distinct characters: 468
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
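
To illustrate, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below (the function name `category_counts` is hypothetical) tallies categories for one sample value from this variable. The two-letter codes correspond to the names used in this report: `Lo` is Other Letter, `Ps` is Open Punctuation, and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the Unicode general category of every character in text."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts("내동중학교(남)")
print(counts)
```

The standard library does not expose script or block properties, so the script and block tables in this report would need a third-party package or the Unicode data files.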

Unique

Unique: 1255
Unique (%): 12.6%

Sample

1st row: 내동중학교(남)
2nd row: 단국대학교
3rd row: 신안중학교
4th row: 광주체육고등학교
5th row: 제주여자중학교

Value | Count | Frequency (%)
서울 | 115 | 1.1%
한국체육대학교 | 66 | 0.6%
경기 | 55 | 0.5%
인천 | 47 | 0.4%
대전체육고등학교 | 38 | 0.4%
서울체육고등학교 | 37 | 0.3%
전북체육고등학교 | 36 | 0.3%
대전체육중학교 | 36 | 0.3%
광주 | 34 | 0.3%
부산 | 32 | 0.3%
Other values (3576) | 10238 | 95.4%

Most occurring characters

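Value | Count | Frequency (%)
(not preserved) | 7354 | 11.6%
(not preserved) | 7204 | 11.4%
(not preserved) | 4024 | 6.3%
(not preserved) | 3353 | 5.3%
(not preserved) | 2852 | 4.5%
(not preserved) | 2002 | 3.2%
(not preserved) | 1844 | 2.9%
(not preserved) | 1094 | 1.7%
(not preserved) | 1030 | 1.6%
(not preserved) | 990 | 1.6%
Other values (458) | 31703 | 50.0%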

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 61923 | 97.6%
Space Separator | 735 | 1.2%
Uppercase Letter | 336 | 0.5%
Open Punctuation | 131 | 0.2%
Close Punctuation | 131 | 0.2%
Decimal Number | 107 | 0.2%
Lowercase Letter | 43 | 0.1%
Dash Punctuation | 24 | < 0.1%
Other Punctuation | 18 | < 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 7354 | 11.9%
(not preserved) | 7204 | 11.6%
(not preserved) | 4024 | 6.5%
(not preserved) | 3353 | 5.4%
(not preserved) | 2852 | 4.6%
(not preserved) | 2002 | 3.2%
(not preserved) | 1844 | 3.0%
(not preserved) | 1094 | 1.8%
(not preserved) | 1030 | 1.7%
(not preserved) | 990 | 1.6%
Other values (402) | 30176 | 48.7%

Uppercase Letter
Value | Count | Frequency (%)
C | 56 | 16.7%
F | 52 | 15.5%
U | 44 | 13.1%
K | 28 | 8.3%
A | 19 | 5.7%
B | 16 | 4.8%
N | 15 | 4.5%
O | 14 | 4.2%
S | 12 | 3.6%
H | 11 | 3.3%
Other values (13) | 69 | 20.5%

Lowercase Letter
Value | Count | Frequency (%)
n | 7 | 16.3%
u | 6 | 14.0%
e | 5 | 11.6%
c | 4 | 9.3%
o | 3 | 7.0%
r | 3 | 7.0%
a | 2 | 4.7%
i | 2 | 4.7%
t | 1 | 2.3%
s | 1 | 2.3%
Other values (9) | 9 | 20.9%

Decimal Number
Value | Count | Frequency (%)
1 | 46 | 43.0%
5 | 22 | 20.6%
2 | 17 | 15.9%
8 | 11 | 10.3%
3 | 11 | 10.3%

Other Punctuation
Value | Count | Frequency (%)
. | 12 | 66.7%
& | 5 | 27.8%
· | 1 | 5.6%

Space Separator
Value | Count | Frequency (%)
(space) | 735 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 131 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 131 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 24 | 100.0%

Math Symbol
Value | Count | Frequency (%)
= | 1 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 61924 | 97.6%
Common | 1147 | 1.8%
Latin | 379 | 0.6%

Most frequent character per script

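Hangul
Value | Count | Frequency (%)
(not preserved) | 7354 | 11.9%
(not preserved) | 7204 | 11.6%
(not preserved) | 4024 | 6.5%
(not preserved) | 3353 | 5.4%
(not preserved) | 2852 | 4.6%
(not preserved) | 2002 | 3.2%
(not preserved) | 1844 | 3.0%
(not preserved) | 1094 | 1.8%
(not preserved) | 1030 | 1.7%
(not preserved) | 990 | 1.6%
Other values (403) | 30177 | 48.7%

Latin
Value | Count | Frequency (%)
C | 56 | 14.8%
F | 52 | 13.7%
U | 44 | 11.6%
K | 28 | 7.4%
A | 19 | 5.0%
B | 16 | 4.2%
N | 15 | 4.0%
O | 14 | 3.7%
S | 12 | 3.2%
H | 11 | 2.9%
Other values (32) | 112 | 29.6%

Common
Value | Count | Frequency (%)
(space) | 735 | 64.1%
( | 131 | 11.4%
) | 131 | 11.4%
1 | 46 | 4.0%
- | 24 | 2.1%
5 | 22 | 1.9%
2 | 17 | 1.5%
. | 12 | 1.0%
8 | 11 | 1.0%
3 | 11 | 1.0%
Other values (3) | 7 | 0.6%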

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 61923 | 97.6%
ASCII | 1525 | 2.4%
None | 2 | < 0.1%

Most frequent character per block

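Hangul
Value | Count | Frequency (%)
(not preserved) | 7354 | 11.9%
(not preserved) | 7204 | 11.6%
(not preserved) | 4024 | 6.5%
(not preserved) | 3353 | 5.4%
(not preserved) | 2852 | 4.6%
(not preserved) | 2002 | 3.2%
(not preserved) | 1844 | 3.0%
(not preserved) | 1094 | 1.8%
(not preserved) | 1030 | 1.7%
(not preserved) | 990 | 1.6%
Other values (402) | 30176 | 48.7%

ASCII
Value | Count | Frequency (%)
(space) | 735 | 48.2%
( | 131 | 8.6%
) | 131 | 8.6%
C | 56 | 3.7%
F | 52 | 3.4%
1 | 46 | 3.0%
U | 44 | 2.9%
K | 28 | 1.8%
- | 24 | 1.6%
5 | 22 | 1.4%
Other values (44) | 256 | 16.8%

None
Value | Count | Frequency (%)
· | 1 | 50.0%
(not preserved) | 1 | 50.0%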

성별 (sex)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
(not preserved) | 8513
(not preserved) | 1487

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (not preserved)
2nd row: (not preserved)
3rd row: (not preserved)
4th row: (not preserved)
5th row: (not preserved)

Common Values

Value | Count | Frequency (%)
(not preserved) | 8513 | 85.1%
(not preserved) | 1487 | 14.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

지도자구분 (coach type)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
감독 | 5120
코치 | 4826
트레이너 | 54

Length

Max length: 4
Median length: 2
Mean length: 2.0108
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 코치
2nd row: 감독
3rd row: 코치
4th row: 코치
5th row: 감독

Common Values

Value | Count | Frequency (%)
감독 | 5120 | 51.2%
코치 | 4826 | 48.3%
트레이너 | 54 | 0.5%
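
A Common Values table of this kind is a count per distinct value plus its share of all rows. The following minimal sketch rebuilds the percentages from the counts listed above; the function name `frequency_table` is hypothetical and the rounding to one decimal place is an assumption that matches the figures shown.

```python
from collections import Counter


def frequency_table(counts):
    """Return (value, count, percent) tuples, most frequent first."""
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]


counts = Counter({"감독": 5120, "코치": 4826, "트레이너": 54})
for value, n, pct in frequency_table(counts):
    print(value, n, f"{pct}%")
```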

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

종목 (sport)
Text

Distinct: 55
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 2
Mean length: 3.0801
Min length: 2

Characters and Unicode

Total characters: 30801
Distinct characters: 109
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 탁구
2nd row: 빙상
3rd row: 테니스
4th row: 레슬링
5th row: 배드민턴

Value | Count | Frequency (%)
야구소프트볼(야구 | 736 | 7.4%
수영 | 684 | 6.8%
유도 | 487 | 4.9%
테니스 | 459 | 4.6%
축구 | 449 | 4.5%
사격 | 441 | 4.4%
탁구 | 404 | 4.0%
소프트테니스 | 397 | 4.0%
양궁 | 362 | 3.6%
농구 | 357 | 3.6%
Other values (45) | 5224 | 52.2%

Most occurring characters

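Value | Count | Frequency (%)
(not preserved) | 3042 | 9.9%
(not preserved) | 1505 | 4.9%
(not preserved) | 1296 | 4.2%
(not preserved) | 1254 | 4.1%
(not preserved) | 1251 | 4.1%
(not preserved) | 1248 | 4.1%
(not preserved) | 1199 | 3.9%
(not preserved) | 1081 | 3.5%
(not preserved) | 864 | 2.8%
(not preserved) | 856 | 2.8%
Other values (99) | 17205 | 55.9%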

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 29166 | 94.7%
Open Punctuation | 769 | 2.5%
Close Punctuation | 769 | 2.5%
Decimal Number | 83 | 0.3%
Connector Punctuation | 9 | < 0.1%
Other Punctuation | 5 | < 0.1%

Most frequent character per category

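Other Letter
Value | Count | Frequency (%)
(not preserved) | 3042 | 10.4%
(not preserved) | 1505 | 5.2%
(not preserved) | 1296 | 4.4%
(not preserved) | 1254 | 4.3%
(not preserved) | 1251 | 4.3%
(not preserved) | 1248 | 4.3%
(not preserved) | 1199 | 4.1%
(not preserved) | 1081 | 3.7%
(not preserved) | 864 | 3.0%
(not preserved) | 856 | 2.9%
Other values (93) | 15570 | 53.4%

Decimal Number
Value | Count | Frequency (%)
5 | 61 | 73.5%
3 | 22 | 26.5%

Open Punctuation
Value | Count | Frequency (%)
( | 769 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 769 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 9 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
· | 5 | 100.0%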

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 29166 | 94.7%
Common | 1635 | 5.3%

Most frequent character per script

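Hangul
Value | Count | Frequency (%)
(not preserved) | 3042 | 10.4%
(not preserved) | 1505 | 5.2%
(not preserved) | 1296 | 4.4%
(not preserved) | 1254 | 4.3%
(not preserved) | 1251 | 4.3%
(not preserved) | 1248 | 4.3%
(not preserved) | 1199 | 4.1%
(not preserved) | 1081 | 3.7%
(not preserved) | 864 | 3.0%
(not preserved) | 856 | 2.9%
Other values (93) | 15570 | 53.4%

Common
Value | Count | Frequency (%)
( | 769 | 47.0%
) | 769 | 47.0%
5 | 61 | 3.7%
3 | 22 | 1.3%
_ | 9 | 0.6%
· | 5 | 0.3%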

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 29166 | 94.7%
ASCII | 1630 | 5.3%
None | 5 | < 0.1%

Most frequent character per block

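Hangul
Value | Count | Frequency (%)
(not preserved) | 3042 | 10.4%
(not preserved) | 1505 | 5.2%
(not preserved) | 1296 | 4.4%
(not preserved) | 1254 | 4.3%
(not preserved) | 1251 | 4.3%
(not preserved) | 1248 | 4.3%
(not preserved) | 1199 | 4.1%
(not preserved) | 1081 | 3.7%
(not preserved) | 864 | 3.0%
(not preserved) | 856 | 2.9%
Other values (93) | 15570 | 53.4%

ASCII
Value | Count | Frequency (%)
( | 769 | 47.2%
) | 769 | 47.2%
5 | 61 | 3.7%
3 | 22 | 1.3%
_ | 9 | 0.6%

None
Value | Count | Frequency (%)
· | 5 | 100.0%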

세부종목 (sub-discipline)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9490
리커브 | 347
단체전 | 42
기계체조 | 35
에어로빅 | 23
Other values (11) | 63

Length

Max length: 6
Median length: 4
Mean length: 3.9572
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9490 | 94.9%
리커브 | 347 | 3.5%
단체전 | 42 | 0.4%
기계체조 | 35 | 0.4%
에어로빅 | 23 | 0.2%
컴파운드 | 11 | 0.1%
경영 | 8 | 0.1%
개인전 | 8 | 0.1%
플러레 | 6 | 0.1%
스피드 | 6 | 0.1%
Other values (6) | 24 | 0.2%

Length

[Figure: histogram of lengths of the category]

종별 (division)
Categorical

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
중학부 | 3103
고등부 | 2780
초등부 | 2047
실업(일반) | 835
대학부 | 795
Other values (3) | 440

Length

Max length: 10
Median length: 3
Mean length: 3.3839
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 중학부
2nd row: 대학부
3rd row: 중학부
4th row: 고등부
5th row: 중학부

Common Values

Value | Count | Frequency (%)
중학부 | 3103 | 31.0%
고등부 | 2780 | 27.8%
초등부 | 2047 | 20.5%
실업(일반) | 835 | 8.3%
대학부 | 795 | 8.0%
기타(일반) | 369 | 3.7%
군,경찰 | 45 | 0.4%
시도군청(삭제예정) | 26 | 0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

소속구분 (affiliation category)
Categorical

IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
엘리트 | 9933
동호인 | 67

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 엘리트
2nd row: 엘리트
3rd row: 엘리트
4th row: 엘리트
5th row: 엘리트

Common Values

Value | Count | Frequency (%)
엘리트 | 9933 | 99.3%
동호인 | 67 | 0.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

소속세부구분 (affiliation subcategory)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
운동부(학교,직장) | 8467
클럽,체육관 등 | 844
<NA> | 689

Length

Max length: 10
Median length: 10
Mean length: 9.4178
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 운동부(학교,직장)
2nd row: 운동부(학교,직장)
3rd row: 운동부(학교,직장)
4th row: 운동부(학교,직장)
5th row: 운동부(학교,직장)

Common Values

Value | Count | Frequency (%)
운동부(학교,직장) | 8467 | 84.7%
클럽,체육관 등 | 844 | 8.4%
<NA> | 689 | 6.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

시도 (province or metropolitan city)
Categorical

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
경기 | 1398
서울 | 1258
강원 | 703
경북 | 702
충남 | 628
Other values (13) | 5311

Length

Max length: 4
Median length: 2
Mean length: 2.006
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기
2nd row: 충남
3rd row: 경기
4th row: 광주
5th row: 제주

Common Values

Value | Count | Frequency (%)
경기 | 1398 | 14.0%
서울 | 1258 | 12.6%
강원 | 703 | 7.0%
경북 | 702 | 7.0%
충남 | 628 | 6.3%
대구 | 618 | 6.2%
부산 | 602 | 6.0%
경남 | 581 | 5.8%
인천 | 555 | 5.5%
전남 | 539 | 5.4%
Other values (8) | 2416 | 24.2%

Length

[Figure: histogram of lengths of the category]

Correlations

 | 성별 | 지도자구분 | 종목 | 세부종목 | 종별 | 소속구분 | 소속세부구분 | 시도
성별 | 1.000 | 0.141 | 0.371 | 0.333 | 0.258 | 0.010 | 0.037 | 0.096
지도자구분 | 0.141 | 1.000 | 0.291 | 0.000 | 0.119 | 0.000 | 0.005 | 0.109
종목 | 0.371 | 0.291 | 1.000 | 0.969 | 0.698 | 0.357 | 0.891 | 0.496
세부종목 | 0.333 | 0.000 | 0.969 | 1.000 | 0.587 | 0.000 | 0.929 | 0.642
종별 | 0.258 | 0.119 | 0.698 | 0.587 | 1.000 | 0.405 | 0.161 | 0.243
소속구분 | 0.010 | 0.000 | 0.357 | 0.000 | 0.405 | 1.000 | 0.000 | 0.074
소속세부구분 | 0.037 | 0.005 | 0.891 | 0.929 | 0.161 | 0.000 | 1.000 | 0.228
시도 | 0.096 | 0.109 | 0.496 | 0.642 | 0.243 | 0.074 | 0.228 | 1.000

 | 성별 | 소속구분 | 종별 | 세부종목 | 시도 | 지도자구분 | 소속세부구분
성별 | 1.000 | 0.007 | 0.194 | 0.300 | 0.076 | 0.232 | 0.024
소속구분 | 0.007 | 1.000 | 0.305 | 0.000 | 0.058 | 0.000 | 0.000
종별 | 0.194 | 0.305 | 1.000 | 0.311 | 0.104 | 0.075 | 0.121
세부종목 | 0.300 | 0.000 | 0.311 | 1.000 | 0.271 | 0.000 | 0.907
시도 | 0.076 | 0.058 | 0.104 | 0.271 | 1.000 | 0.049 | 0.180
지도자구분 | 0.232 | 0.000 | 0.075 | 0.000 | 0.049 | 1.000 | 0.008
소속세부구분 | 0.024 | 0.000 | 0.121 | 0.907 | 0.180 | 0.008 | 1.000

 | 성별 | 지도자구분 | 세부종목 | 종별 | 소속구분 | 소속세부구분 | 시도
성별 | 1.000 | 0.232 | 0.300 | 0.194 | 0.007 | 0.024 | 0.076
지도자구분 | 0.232 | 1.000 | 0.000 | 0.075 | 0.000 | 0.008 | 0.049
세부종목 | 0.300 | 0.000 | 1.000 | 0.311 | 0.000 | 0.907 | 0.271
종별 | 0.194 | 0.075 | 0.311 | 1.000 | 0.305 | 0.121 | 0.104
소속구분 | 0.007 | 0.000 | 0.000 | 0.305 | 1.000 | 0.000 | 0.058
소속세부구분 | 0.024 | 0.008 | 0.907 | 0.121 | 0.000 | 1.000 | 0.180
시도 | 0.076 | 0.049 | 0.271 | 0.104 | 0.058 | 0.180 | 1.000
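
The report does not say which association measure fills these matrices. For pairs of categorical variables a common choice is Cramér's V, which rescales the chi-squared statistic to the range 0 to 1. The following is a minimal sketch of the uncorrected statistic under that assumption. ydata-profiling may apply a bias correction or a different measure, so the sketch is not expected to reproduce the values above exactly. The function name `cramers_v` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0                 # undefined case; 0.0 is a convention chosen here
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


a = ["x", "x", "y", "y"]
print(cramers_v(a, ["p", "p", "q", "q"]))  # labels move together
print(cramers_v(a, ["p", "q", "p", "q"]))  # labels are independent
```

A value near 1, such as the 0.907 to 0.929 reported between 세부종목 and 소속세부구분, means one variable's value largely determines the other's.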

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
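
The report lists 0 missing cells, yet `<NA>` appears as a counted value in 세부종목 (9490 rows) and 소속세부구분 (689 rows). One possible reading, which is an assumption and not something the report states, is that missing entries reached the profiler as literal strings. The sketch below counts such placeholders per column; the function name `count_missing` and the placeholder list are hypothetical, and the three rows are taken from the sample rows shown in this report.

```python
def count_missing(rows, columns, placeholders=("<NA>", "")):
    """Count cells per column that are None or equal to a placeholder string."""
    missing = {name: 0 for name in columns}
    for row in rows:
        for name, cell in zip(columns, row):
            if cell is None or cell in placeholders:
                missing[name] += 1
    return missing


columns = ["종목", "세부종목", "소속세부구분"]
rows = [
    ("탁구", "<NA>", "운동부(학교,직장)"),
    ("빙상", "<NA>", "클럽,체육관 등"),
    ("테니스", "<NA>", "<NA>"),
]
print(count_missing(rows, columns))
```

If the placeholders were converted to real missing values before profiling, the missing-cell counts and the nullity plots would likely look different.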

Sample

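(index) | 등록년도 | 소속 | 성별 | 지도자구분 | 종목 | 세부종목 | 종별 | 소속구분 | 소속세부구분 | 시도
16191 | 2008 | 내동중학교(남) | (not preserved) | 코치 | 탁구 | <NA> | 중학부 | 엘리트 | 운동부(학교,직장) | 경기
76873 | 2016 | 단국대학교 | (not preserved) | 감독 | 빙상 | <NA> | 대학부 | 엘리트 | 운동부(학교,직장) | 충남
73925 | 2013 | 신안중학교 | (not preserved) | 코치 | 테니스 | <NA> | 중학부 | 엘리트 | 운동부(학교,직장) | 경기
9018 | 2006 | 광주체육고등학교 | (not preserved) | 코치 | 레슬링 | <NA> | 고등부 | 엘리트 | 운동부(학교,직장) | 광주
39433 | 2019 | 제주여자중학교 | (not preserved) | 감독 | 배드민턴 | <NA> | 중학부 | 엘리트 | 운동부(학교,직장) | 제주
52309 | 2014 | 경기고양고 | (not preserved) | 코치 | 축구 | <NA> | 고등부 | 엘리트 | 운동부(학교,직장) | 경기
44634 | 2018 | 강원주문진중 | (not preserved) | 감독 | 축구 | <NA> | 중학부 | 엘리트 | 운동부(학교,직장) | 강원
3975 | 2011 | 인천연성초등학교 | (not preserved) | 감독 | 아이스하키 | <NA> | 초등부 | 엘리트 | 운동부(학교,직장) | 인천
75824 | 2013 | 경북여자고등학교 | (not preserved) | 코치 | 테니스 | <NA> | 고등부 | 엘리트 | 운동부(학교,직장) | 대구
66110 | 2014 | 용인대학교 | (not preserved) | 코치 | 유도 | <NA> | 대학부 | 엘리트 | 운동부(학교,직장) | 경기

(index) | 등록년도 | 소속 | 성별 | 지도자구분 | 종목 | 세부종목 | 종별 | 소속구분 | 소속세부구분 | 시도
9416 | 2019 | 인천조동초등학교 | (not preserved) | 코치 | 탁구 | <NA> | 초등부 | 엘리트 | 운동부(학교,직장) | 인천
77271 | 2016 | 유봉여자고등학교 | (not preserved) | 감독 | 빙상 | <NA> | 고등부 | 엘리트 | 클럽,체육관 등 | 강원
58924 | 2007 | 영신중학교 | (not preserved) | 코치 | 골프 | <NA> | 중학부 | 엘리트 | 운동부(학교,직장) | 대구
89438 | 2018 | 조선대학교 | (not preserved) | 코치 | 배구 | <NA> | 대학부 | 엘리트 | 운동부(학교,직장) | 광주
33907 | 2005 | 일산주엽복싱체육관 | (not preserved) | 코치 | 복싱 | <NA> | 고등부 | 엘리트 | 운동부(학교,직장) | 경기
62577 | 2006 | 경기도청 | (not preserved) | 코치 | 수영 | <NA> | 실업(일반) | 엘리트 | 운동부(학교,직장) | 경기
32308 | 2007 | 운남중학교 | (not preserved) | 코치 | 복싱 | <NA> | 중학부 | 엘리트 | 클럽,체육관 등 | 광주
81756 | 2007 | 명덕여자중학교 | (not preserved) | 감독 | 테니스 | <NA> | 중학부 | 엘리트 | <NA> | 울산
4043 | 2019 | 돌핀스아이스하키팀 | (not preserved) | 감독 | 아이스하키 | <NA> | 초등부 | 엘리트 | 클럽,체육관 등 | 서울
2816 | 2012 | 대영고등학교 | (not preserved) | 코치 | 검도 | <NA> | 고등부 | 엘리트 | 클럽,체육관 등 | 서울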

Duplicate rows

Most frequently occurring

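(index) | 소속 | 성별 | 지도자구분 | 종목 | 세부종목 | 종별 | 소속구분 | 소속세부구분 | 시도 | # duplicates
183 | 경북일반 | (not preserved) | 코치 | 사격 | <NA> | 기타(일반) | 엘리트 | 운동부(학교,직장) | 경북 | 11
408 | 국군체육부대(상무) | (not preserved) | 감독 | 사격 | <NA> | 군,경찰 | 엘리트 | 운동부(학교,직장) | 경북 | 10
1885 | 전남일반 | (not preserved) | 코치 | 사격 | <NA> | 기타(일반) | 엘리트 | 운동부(학교,직장) | 전남 | 10
1472 | 안양여자중학교 | (not preserved) | 감독 | 탁구 | <NA> | 중학부 | 엘리트 | 운동부(학교,직장) | 경기 | 7
1609 | 용산고등학교 | (not preserved) | 감독 | 소프트테니스 | <NA> | 고등부 | 엘리트 | 운동부(학교,직장) | 서울 | 7
1621 | 용인대학교 | (not preserved) | 코치 | 유도 | <NA> | 대학부 | 엘리트 | 운동부(학교,직장) | 경기 | 7
1981 | 정화중학교 | (not preserved) | 코치 | 빙상 | <NA> | 중학부 | 엘리트 | 클럽,체육관 등 | 대구 | 7
2221 | 충청북도청 | (not preserved) | 감독 | 펜싱 | <NA> | 실업(일반) | 엘리트 | 운동부(학교,직장) | 충북 | 7
2290 | 하동중앙중학교 | (not preserved) | 감독 | 레슬링 | <NA> | 중학부 | 엘리트 | 운동부(학교,직장) | 경남 | 7
66 | 건국대 | (not preserved) | 코치 | 야구소프트볼(야구) | <NA> | 대학부 | 엘리트 | 운동부(학교,직장) | 충북 | 6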