Overview

Dataset statistics

Number of variables: 9
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1832
Duplicate rows (%): 18.3%
Total size in memory: 791.0 KiB
Average record size in memory: 81.0 B

Variable types

Numeric: 1
Categorical: 7
Text: 1

Dataset

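Description: Registered athletes by sport (registration year, sex, division, sport, event, affiliation, etc.)
Author: 대한체육회 (Korean Sport & Olympic Committee)
URL: https://www.data.go.kr/data/15052695/fileData.do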

Alerts

Dataset has 1832 (18.3%) duplicate rows [Duplicates]
세부종목 is highly overall correlated with 종목 and 1 other field [High correlation]
소속구분 is highly overall correlated with 종목 and 1 other field [High correlation]
종목 is highly overall correlated with 세부종목 and 2 other fields [High correlation]
소속세부구분 is highly overall correlated with 종목 [High correlation]
세부종목 is highly imbalanced (92.8%) [Imbalance]
소속구분 is highly imbalanced (76.9%) [Imbalance]
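The imbalance percentages above are consistent with an entropy-based score, 1 - H/ln(k), where H is the Shannon entropy (natural log) of the category proportions and k is the number of categories. The report does not state its formula, so the sketch below assumes this definition; it reproduces both figures from the category counts listed later in this report. The function name and the single-category convention are illustrative choices.

```python
import math
from collections.abc import Sequence


def imbalance(counts: Sequence[int]) -> float:
    """Return 1 - H / ln(k) for category counts (0 = balanced, near 1 = dominated).

    H is the Shannon entropy (natural log) of the category proportions and
    k is the number of categories with a non-zero count.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    k = len(probs)
    if k < 2:
        # Entropy-based balance is undefined for a single category;
        # this sketch simply reports 0.0 in that case (an assumption).
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(k)


# Counts copied from the 소속구분 and 세부종목 value tables in this report.
print(f"{imbalance([9625, 375]):.1%}")                  # 소속구분 -> 76.9%
print(f"{imbalance([9728, 226, 39, 3, 2, 1, 1]):.1%}")  # 세부종목 -> 92.8%
```

A uniform distribution scores 0.0 because its entropy equals ln(k); the score approaches 1 as one category absorbs nearly all rows.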

Reproduction

Analysis started: 2023-12-12 02:02:07.066351
Analysis finished: 2023-12-12 02:02:08.750211
Duration: 1.68 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

등록년도 (registration year)
Real number (ℝ)

Distinct: 42
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2006.2882
Minimum: 1977
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1977
5th percentile: 2003
Q1: 2003
Median: 2005
Q3: 2009
95th percentile: 2015
Maximum: 2019
Range: 42
Interquartile range (IQR): 6
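A minimal sketch of how quantile statistics like these can be computed with only the Python standard library. It assumes linear interpolation between order statistics (`method="inclusive"`); the report does not say which interpolation rule it uses, and other rules can give slightly different percentiles on small samples. The function name and toy input are hypothetical.

```python
import statistics


def quantile_summary(values: list[float]) -> dict[str, float]:
    """Quantile statistics with linear interpolation between order statistics."""
    # method="inclusive" interpolates at position q * (n - 1) in the sorted data,
    # the same rule as the default linear method in NumPy and pandas.
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    ventiles = statistics.quantiles(values, n=20, method="inclusive")  # 5% steps
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "p5": ventiles[0],
        "q1": quartiles[0],
        "median": quartiles[1],
        "q3": quartiles[2],
        "p95": ventiles[-1],
        "max": hi,
        "range": hi - lo,
        "iqr": quartiles[2] - quartiles[0],
    }


print(quantile_summary([1, 2, 3, 4, 5]))  # toy data, not from the dataset
```

For 등록년도, the reported Q1 (2003) and Q3 (2009) give an IQR of 6, and the range is 2019 - 1977 = 42.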

Descriptive statistics

Standard deviation: 4.4176795
Coefficient of variation (CV): 0.0022019167
Kurtosis: 2.9619092
Mean: 2006.2882
Median absolute deviation (MAD): 2
Skewness: 0.32558538
Sum: 20062882
Variance: 19.515892
Monotonicity: Not monotonic
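Several of these descriptive statistics are tied together by simple formulas: the variance is the square of the standard deviation, CV = standard deviation / mean, and MAD is the median of the absolute deviations from the median. A sketch with the standard library, assuming the sample (n - 1) standard deviation, followed by a cross-check of the reported CV and variance. The function name is hypothetical.

```python
import statistics


def descriptive_summary(values: list[float]) -> dict[str, float]:
    """Mean, spread and MAD for a numeric column (sample statistics)."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation, divides by n - 1
    median = statistics.median(values)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation
        "mad": statistics.median(abs(v - median) for v in values),
        "sum": sum(values),
    }


# Cross-check the reported 등록년도 figures: CV = std / mean, variance = std ** 2.
std, mean = 4.4176795, 2006.2882
print(round(std / mean, 10))  # 0.0022019167
print(round(std ** 2, 6))     # 19.515892
```

The very small CV reflects a narrow spread (a few years) relative to a mean around 2006, not a lack of variation in absolute terms.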
Figure: Histogram with fixed-size bins (bins=42)
Common values
Value Count Frequency (%)
2003 2799 28.0%
2004 1265 12.7%
2005 971 9.7%
2006 809 8.1%
2007 679 6.8%
2008 548 5.5%
2009 497 5.0%
2010 424 4.2%
2011 341 3.4%
2013 227 2.3%
Other values (32) 1440 14.4%
Minimum 10 values
Value Count Frequency (%)
1977 2 < 0.1%
1978 2 < 0.1%
1979 1 < 0.1%
1980 3 < 0.1%
1981 3 < 0.1%
1982 3 < 0.1%
1983 5 0.1%
1985 2 < 0.1%
1986 2 < 0.1%
1987 1 < 0.1%
Maximum 10 values
Value Count Frequency (%)
2019 87 0.9%
2018 103 1.0%
2017 139 1.4%
2016 142 1.4%
2015 148 1.5%
2014 211 2.1%
2013 227 2.3%
2012 225 2.2%
2011 341 3.4%
2010 424 4.2%
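The frequency tables above (the most common values, then the smallest and largest values) can be reproduced with a `Counter`: the top N values by count with the remainder folded into an "Other values" row, and the lowest and highest distinct values with their counts. A sketch; the function names and the toy years are hypothetical, and treating "Unique" as "values that occur exactly once" is an assumption about the report's definition.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def frequency_table(values: Iterable[Hashable], top: int = 10):
    """Top-N (value, count, share) rows plus an aggregated 'Other values' row."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(value, n, n / total) for value, n in counts.most_common(top)]
    other_distinct = len(counts) - len(rows)
    other_count = total - sum(n for _, n, _ in rows)
    return rows, (f"Other values ({other_distinct})", other_count, other_count / total)


def extreme_values(values: Iterable, k: int = 10):
    """The k smallest and k largest distinct values with their counts."""
    items = sorted(Counter(values).items())
    return items[:k], items[::-1][:k]


def unique_count(values: Iterable[Hashable]) -> int:
    """Number of values that occur exactly once (assumed meaning of 'Unique')."""
    return sum(1 for n in Counter(values).values() if n == 1)


years = [2003, 2003, 2003, 2004, 2004, 2005, 1977]  # hypothetical toy data
print(frequency_table(years, top=2))
print(extreme_values(years, k=2))
print(unique_count(years))
```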

성별 (sex)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
8044
1956

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
8044
80.4%
1956
 
19.6%

Length

Figure: Histogram of lengths of the category


종별 (division)
Categorical

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
기타(일반) 2973
고등부 1961
대학부 1782
실업(일반) 1400
중학부 1026
Other values (3) 858

Length

Max length: 10
Median length: 3
Mean length: 4.4926
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 실업(일반)
2nd row: 기타(일반)
3rd row: 대학부
4th row: 대학부
5th row: 고등부

Common Values

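Value Count Frequency (%)
기타(일반) (other, general) 2973 29.7%
고등부 (high school) 1961 19.6%
대학부 (university) 1782 17.8%
실업(일반) (corporate teams, general) 1400 14.0%
중학부 (middle school) 1026 10.3%
초등부 (elementary school) 521 5.2%
시도군청(삭제예정) (city/provincial/county office; scheduled for deletion) 245 2.5%
군,경찰 (military, police) 92 0.9%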
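The length statistics of a categorical column follow directly from its value counts. The sketch below expands the 종별 counts above to one string length per row; it reproduces the reported max (10), median (3), mean (4.4926) and min (3), which suggests lengths are counted in Unicode code points, as Python's `len` does. The function name is hypothetical.

```python
import statistics


def length_stats(value_counts: dict[str, int]) -> dict[str, float]:
    """Max/median/mean/min string length over all rows, from value counts."""
    # One length per row; len() counts Unicode code points.
    lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# Counts copied from the 종별 table above.
division_counts = {
    "기타(일반)": 2973, "고등부": 1961, "대학부": 1782, "실업(일반)": 1400,
    "중학부": 1026, "초등부": 521, "시도군청(삭제예정)": 245, "군,경찰": 92,
}
print(length_stats(division_counts))
```

The median is 3 because the four three-character divisions together cover 5290 of the 10000 rows.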

Length

Figure: Histogram of lengths of the category


종목 (sport)
Categorical

HIGH CORRELATION

Distinct: 43
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
궁도 2655
태권도 1106
검도 1036
탁구 855
레슬링 615
Other values (38) 3733

Length

Max length: 12
Median length: 2
Mean length: 3.0536
Min length: 2

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 스쿼시
2nd row: 궁도
3rd row: 검도
4th row: 태권도
5th row: 레슬링

Common Values

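Value Count Frequency (%)
궁도 (traditional archery) 2655 26.6%
태권도 (taekwondo) 1106 11.1%
검도 (kendo) 1036 10.4%
탁구 (table tennis) 855 8.6%
레슬링 (wrestling) 615 6.2%
소프트테니스 (soft tennis) 615 6.2%
하키 (hockey) 547 5.5%
야구소프트볼(야구) (baseball/softball: baseball) 509 5.1%
아이스하키 (ice hockey) 391 3.9%
자전거 (cycling) 379 3.8%
Other values (33) 1292 12.9%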

Length

Figure: Histogram of lengths of the category

세부종목 (event)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA> 9728
단체전 226
개인전 39
BMX 3
스피드 2
Other values (2) 2

Length

Max length: 5
Median length: 4
Mean length: 3.973
Min length: 3

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value Count Frequency (%)
<NA> 9728 97.3%
단체전 (team event) 226 2.3%
개인전 (individual event) 39 0.4%
BMX 3 < 0.1%
스피드 (speed) 2 < 0.1%
인라인하키 (inline hockey) 1 < 0.1%
리커브 (recurve) 1 < 0.1%
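The 세부종목 summary reports 0 missing cells, yet `<NA>` is its most common value (9728 rows). Its length statistics (median length 4, mean 3.973) are consistent with `<NA>` being stored as a literal four-character string rather than a true null; 소속세부구분 shows the same pattern. A minimal sketch, assuming that is the case, that converts the placeholder to `None` before counting missing values. The placeholder set and function names are hypothetical.

```python
PLACEHOLDERS = frozenset({"<NA>"})  # assumption: the literal text shown in the tables


def normalize_missing(values, placeholders=PLACEHOLDERS):
    """Replace placeholder strings with None so they are counted as missing."""
    return [None if v in placeholders else v for v in values]


def missing_summary(values):
    """(missing count, missing share), where missing means None."""
    n_missing = sum(v is None for v in values)
    return n_missing, n_missing / len(values)


# Rebuild the 세부종목 column from its value counts in the table above.
event_counts = {"<NA>": 9728, "단체전": 226, "개인전": 39, "BMX": 3,
                "스피드": 2, "인라인하키": 1, "리커브": 1}
events = [value for value, n in event_counts.items() for _ in range(n)]

print(sum(len(v) for v in events) / len(events))  # 3.973, treating "<NA>" as text
print(missing_summary(events))                     # before normalization
print(missing_summary(normalize_missing(events)))  # after normalization
```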

Length

Figure: Histogram of lengths of the category


소속 (affiliation)
Text

Distinct: 2329
Distinct (%): 23.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 17
Mean length: 6.0772
Min length: 2

Characters and Unicode

Total characters: 60772
Distinct characters: 435
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
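Python's standard `unicodedata` module exposes the general category of each code point, which is enough to rebuild a "Most occurring categories" table like the one below. It does not expose scripts or blocks; those would need an additional lookup table or a third-party package. A sketch using three sample affiliation values from this report; the name mapping covers only the categories that occur here, and the function name is hypothetical.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator", "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Nd": "Decimal Number", "Pd": "Dash Punctuation", "Po": "Other Punctuation",
}


def category_counts(texts):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)  # e.g. "Lo" for a Hangul syllable
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["계양스쿼시클럽", "함안 성심정", "(사)서울특별시승마협회"]))
```

Precomposed Hangul syllables fall under "Other Letter" (Lo), which is why that category dominates the counts for 소속.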

Unique

Unique: 885
Unique (%): 8.8%

Sample

1st row: 계양스쿼시클럽
2nd row: 함안 성심정
3rd row: 한국해양대학교
4th row: 목원대학교
5th row: 대구체육고등학교
Most common words (values split on whitespace)
Value Count Frequency (%)
시흥 132 1.0%
파주 129 1.0%
한국체육대학교 115 0.9%
승마협회 102 0.8%
고양 81 0.6%
포항 80 0.6%
평창 79 0.6%
연무정 70 0.5%
군자정 67 0.5%
여주 62 0.5%
Other values (2422) 11970 92.9%
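The counts in the table above sum to 12,887 (917 for the ten listed entries plus 11,970 for the rest), more than the 10,000 rows, so they appear to count whitespace-separated words rather than whole 소속 values; for example, 시흥 시흥정 would contribute both 시흥 and 시흥정. A sketch of that tokenization, assuming a plain whitespace split; the function name is hypothetical, and the toy input is built from names that appear in this report.

```python
from collections import Counter


def word_counts(texts):
    """Count whitespace-separated words across all values of a text column."""
    return Counter(word for text in texts for word in text.split())


# Toy input built from affiliation names that appear in this report.
names = ["시흥 시흥정", "시흥 시흥정", "수원 연무정", "연무정"]
counts = word_counts(names)
print(counts.most_common(3))
print(sum(counts.values()), "words from", len(names), "values")
```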

Most occurring characters

ValueCountFrequency (%)
4998
 
8.2%
4615
 
7.6%
2909
 
4.8%
2889
 
4.8%
2486
 
4.1%
2277
 
3.7%
2209
 
3.6%
1134
 
1.9%
1048
 
1.7%
953
 
1.6%
Other values (425) 35254
58.0%

Most occurring categories

Value Count Frequency (%)
Other Letter 57102 94.0%
Space Separator 2889 4.8%
Uppercase Letter 280 0.5%
Open Punctuation 204 0.3%
Close Punctuation 204 0.3%
Lowercase Letter 55 0.1%
Decimal Number 16 < 0.1%
Dash Punctuation 11 < 0.1%
Other Punctuation 11 < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4998
 
8.8%
4615
 
8.1%
2909
 
5.1%
2486
 
4.4%
2277
 
4.0%
2209
 
3.9%
1134
 
2.0%
1048
 
1.8%
953
 
1.7%
952
 
1.7%
Other values (377) 33521
58.7%
Uppercase Letter
Value Count Frequency (%)
B 64 22.9%
O 50 17.9%
K 43 15.4%
T 24 8.6%
D 14 5.0%
G 14 5.0%
H 9 3.2%
A 8 2.9%
N 8 2.9%
M 7 2.5%
Other values (11) 39 13.9%
Lowercase Letter
Value Count Frequency (%)
a 12 21.8%
e 10 18.2%
r 9 16.4%
t 8 14.5%
w 6 10.9%
d 2 3.6%
i 2 3.6%
p 1 1.8%
c 1 1.8%
l 1 1.8%
Other values (3) 3 5.5%
Decimal Number
Value Count Frequency (%)
1 7 43.8%
3 3 18.8%
2 3 18.8%
5 2 12.5%
0 1 6.2%
Other Punctuation
Value Count Frequency (%)
& 9 81.8%
, 1 9.1%
. 1 9.1%
Open Punctuation
Value Count Frequency (%)
( 203 99.5%
[ 1 0.5%
Close Punctuation
Value Count Frequency (%)
) 203 99.5%
] 1 0.5%
Space Separator
Value Count Frequency (%)
(space) 2889 100.0%
Dash Punctuation
Value Count Frequency (%)
- 11 100.0%

Most occurring scripts

Value Count Frequency (%)
Hangul 57102 94.0%
Common 3335 5.5%
Latin 335 0.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4998
 
8.8%
4615
 
8.1%
2909
 
5.1%
2486
 
4.4%
2277
 
4.0%
2209
 
3.9%
1134
 
2.0%
1048
 
1.8%
953
 
1.7%
952
 
1.7%
Other values (377) 33521
58.7%
Latin
Value Count Frequency (%)
B 64 19.1%
O 50 14.9%
K 43 12.8%
T 24 7.2%
D 14 4.2%
G 14 4.2%
a 12 3.6%
e 10 3.0%
H 9 2.7%
r 9 2.7%
Other values (24) 86 25.7%
Common
Value Count Frequency (%)
(space) 2889 86.6%
( 203 6.1%
) 203 6.1%
- 11 0.3%
& 9 0.3%
1 7 0.2%
3 3 0.1%
2 3 0.1%
5 2 0.1%
, 1 < 0.1%
Other values (4) 4 0.1%

Most occurring blocks

Value Count Frequency (%)
Hangul 57102 94.0%
ASCII 3670 6.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4998
 
8.8%
4615
 
8.1%
2909
 
5.1%
2486
 
4.4%
2277
 
4.0%
2209
 
3.9%
1134
 
2.0%
1048
 
1.8%
953
 
1.7%
952
 
1.7%
Other values (377) 33521
58.7%
ASCII
Value Count Frequency (%)
(space) 2889 78.7%
( 203 5.5%
) 203 5.5%
B 64 1.7%
O 50 1.4%
K 43 1.2%
T 24 0.7%
D 14 0.4%
G 14 0.4%
a 12 0.3%
Other values (38) 154 4.2%

소속구분 (affiliation type)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
엘리트 9625
동호인 375

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동호인
2nd row: 엘리트
3rd row: 동호인
4th row: 엘리트
5th row: 엘리트

Common Values

Value Count Frequency (%)
엘리트 (elite) 9625 96.2%
동호인 (amateur club member) 375 3.8%

Length

Figure: Histogram of lengths of the category


소속세부구분 (affiliation subtype)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
운동부(학교,직장) 7091
<NA> 2576
클럽,체육관 등 333

Length

Max length: 10
Median length: 10
Mean length: 8.3878
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 운동부(학교,직장)
3rd row: <NA>
4th row: <NA>
5th row: 운동부(학교,직장)

Common Values

Value Count Frequency (%)
운동부(학교,직장) (athletic team: school, workplace) 7091 70.9%
<NA> 2576 25.8%
클럽,체육관 등 (clubs, gyms, etc.) 333 3.3%

Length

Figure: Histogram of lengths of the category


시도 (province or metropolitan city)
Categorical

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
경기 2105
서울 1368
경북 858
전남 828
강원 797
Other values (13) 4044

Length

Max length: 4
Median length: 2
Mean length: 2.001
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 인천
2nd row: 경남
3rd row: 부산
4th row: 대전
5th row: 대구

Common Values

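Value Count Frequency (%)
경기 (Gyeonggi) 2105 21.1%
서울 (Seoul) 1368 13.7%
경북 (Gyeongbuk) 858 8.6%
전남 (Jeonnam) 828 8.3%
강원 (Gangwon) 797 8.0%
전북 (Jeonbuk) 655 6.6%
인천 (Incheon) 472 4.7%
부산 (Busan) 465 4.7%
충남 (Chungnam) 434 4.3%
충북 (Chungbuk) 387 3.9%
Other values (8) 1631 16.3%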

Length

Figure: Histogram of lengths of the category

Interactions


Correlations

Variable  등록년도  성별  종별  종목  세부종목  소속구분  소속세부구분  시도
등록년도  1.000  0.044  0.360  0.518  0.439  0.067  0.078  0.149
성별  0.044  1.000  0.295  0.482  0.350  0.000  0.000  0.140
종별  0.360  0.295  1.000  0.749  0.464  0.255  0.208  0.423
종목  0.518  0.482  0.749  1.000  0.932  0.613  0.817  0.583
세부종목  0.439  0.350  0.464  0.932  1.000  NaN  0.572  0.430
소속구분  0.067  0.000  0.255  0.613  NaN  1.000  0.204  0.384
소속세부구분  0.078  0.000  0.208  0.817  0.572  0.204  1.000  0.220
시도  0.149  0.140  0.423  0.583  0.430  0.384  0.220  1.000
Variable  종목  종별  세부종목  성별  소속구분  시도  소속세부구분
종목  1.000  0.400  0.768  0.404  0.518  0.188  0.714
종별  0.400  1.000  0.183  0.221  0.192  0.192  0.223
세부종목  0.768  0.183  1.000  0.250  1.000  0.212  0.409
성별  0.404  0.221  0.250  1.000  0.000  0.110  0.000
소속구분  0.518  0.192  1.000  0.000  1.000  0.303  0.131
시도  0.188  0.192  0.212  0.110  0.303  1.000  0.173
소속세부구분  0.714  0.223  0.409  0.000  0.131  0.173  1.000
Variable  등록년도  성별  종별  종목  세부종목  소속구분  소속세부구분  시도
등록년도  1.000  0.034  0.181  0.207  0.233  0.052  0.060  0.057
성별  0.034  1.000  0.221  0.404  0.250  0.000  0.000  0.110
종별  0.181  0.221  1.000  0.400  0.183  0.192  0.223  0.192
종목  0.207  0.404  0.400  1.000  0.768  0.518  0.714  0.188
세부종목  0.233  0.250  0.183  0.768  1.000  1.000  0.409  0.212
소속구분  0.052  0.000  0.192  0.518  1.000  1.000  0.131  0.303
소속세부구분  0.060  0.000  0.223  0.714  0.409  0.131  1.000  0.173
시도  0.057  0.110  0.192  0.188  0.212  0.303  0.173  1.000
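The matrices above report pairwise association strengths on a 0 to 1 scale, but the report does not label which measure produced each one. For two categorical variables a common choice is Cramér's V, computed from the chi-square statistic of their contingency table. The sketch below implements the uncorrected version as an illustration of that measure; it is not necessarily the profiler's exact computation, which may for instance apply a bias correction. The function name is hypothetical.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return float("nan")  # undefined when either variable is constant
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n  # count expected under independence
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0 (perfect association)
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # 0.0 (independent)
```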

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.

Sample

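First rows
Index | 등록년도 | 성별 | 종별 | 종목 | 세부종목 | 소속 | 소속구분 | 소속세부구분 | 시도
43810 | 2006 |  | 실업(일반) | 스쿼시 | <NA> | 계양스쿼시클럽 | 동호인 | <NA> | 인천
95106 | 2007 |  | 기타(일반) | 궁도 | <NA> | 함안 성심정 | 엘리트 | 운동부(학교,직장) | 경남
31665 | 2003 |  | 대학부 | 검도 | <NA> | 한국해양대학교 | 동호인 | <NA> | 부산
91324 | 2008 |  | 대학부 | 태권도 | <NA> | 목원대학교 | 엘리트 | <NA> | 대전
30171 | 2004 |  | 고등부 | 레슬링 | <NA> | 대구체육고등학교 | 엘리트 | 운동부(학교,직장) | 대구
45190 | 2008 |  | 기타(일반) | 궁도 | <NA> | 여주 청심정 | 엘리트 | 운동부(학교,직장) | 경기
8749 | 1995 |  | 기타(일반) | 승마 | <NA> | (사)서울특별시승마협회 | 엘리트 | 운동부(학교,직장) | 서울
92205 | 2007 |  | 대학부 | 태권도 | <NA> | 대불대학교 | 엘리트 | <NA> | 전남
59059 | 2003 |  | 초등부 | 소프트테니스 | <NA> | 옥산초등학교 | 엘리트 | <NA> | 충남
36355 | 2004 |  | 기타(일반) | 궁도 | <NA> | 포항 송학정 | 엘리트 | 운동부(학교,직장) | 경북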
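Last rows
Index | 등록년도 | 성별 | 종별 | 종목 | 세부종목 | 소속 | 소속구분 | 소속세부구분 | 시도
18259 | 2014 |  | 실업(일반) | 검도 | <NA> | 무안군청 | 엘리트 | 운동부(학교,직장) | 전남
52501 | 2003 |  | 시도군청(삭제예정) | 궁도 | <NA> | 화성 반월정 | 엘리트 | <NA> | 경기
82414 | 2004 |  | 고등부 | 야구소프트볼(야구) | <NA> | 배재고 | 엘리트 | 운동부(학교,직장) | 서울
9923 | 2006 |  | 고등부 | 하키 | <NA> | 송곡여자고등학교 | 엘리트 | 운동부(학교,직장) | 서울
27619 | 2011 |  | 기타(일반) | 궁도 | <NA> | 수원 연무정 | 엘리트 | 운동부(학교,직장) | 경기
3513 | 2013 |  | 실업(일반) | 아이스하키 | <NA> | 웨이브즈 | 엘리트 | 클럽,체육관 등 | 서울
29578 | 2007 |  | 대학부 | 검도 | <NA> | 원광대학교 | 엘리트 | 클럽,체육관 등 | 전북
47767 | 2002 |  | 고등부 | 탁구 | <NA> | 안양여자고등학교 | 엘리트 | 운동부(학교,직장) | 경기
85564 | 2005 |  | 대학부 | 야구소프트볼(야구) | <NA> | 단국대 | 엘리트 | 운동부(학교,직장) | 충남
76878 | 2006 |  | 기타(일반) | 궁도 | <NA> | 시흥 시흥정 | 엘리트 | 운동부(학교,직장) | 경기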

Duplicate rows

Most frequently occurring

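Index | 등록년도 | 성별 | 종별 | 종목 | 세부종목 | 소속 | 소속구분 | 소속세부구분 | 시도 | # duplicates
215 | 2003 |  | 기타(일반) | 궁도 | <NA> | 안산 광덕정 | 엘리트 | 운동부(학교,직장) | 경기 | 14
66 | 2003 |  | 고등부 | 야구소프트볼(야구) | <NA> | 경동고 | 엘리트 | 운동부(학교,직장) | 서울 | 13
264 | 2003 |  | 기타(일반) | 궁도 | <NA> | 파주 선무정 | 엘리트 | 운동부(학교,직장) | 경기 | 12
281 | 2003 |  | 기타(일반) | 궁도 | <NA> | 한성정 | 엘리트 | 운동부(학교,직장) | 경기 | 11
154 | 2003 |  | 기타(일반) | 궁도 | <NA> | 경산 경조정 | 엘리트 | 운동부(학교,직장) | 경북 | 10
185 | 2003 |  | 기타(일반) | 궁도 | <NA> | 남원 관덕정 | 엘리트 | 운동부(학교,직장) | 전북 | 10
202 | 2003 |  | 기타(일반) | 궁도 | <NA> | 수원 연무정 | 엘리트 | 운동부(학교,직장) | 경기 | 10
274 | 2003 |  | 기타(일반) | 궁도 | <NA> | 평택 화궁정 | 엘리트 | 운동부(학교,직장) | 경기 | 10
157 | 2003 |  | 기타(일반) | 궁도 | <NA> | 고봉정 | 엘리트 | 운동부(학교,직장) | 경기 | 9
160 | 2003 |  | 기타(일반) | 궁도 | <NA> | 고양 비호정 | 엘리트 | 운동부(학교,직장) | 경기 | 9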
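There are two common ways to summarize duplicates: the number of rows that repeat an earlier row, and the set of distinct row patterns that occur more than once, each with its count (as in the # duplicates column above). The report does not say which definition produced the 1832 figure, so the sketch below computes both. The function name and the toy rows are hypothetical.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (rows that repeat an earlier row, duplicated patterns with counts)."""
    counts = Counter(tuple(row) for row in rows)
    groups = sorted(
        ((row, n) for row, n in counts.items() if n > 1),
        key=lambda item: item[1],
        reverse=True,  # most frequently occurring pattern first
    )
    repeats = sum(n - 1 for _, n in groups)
    return repeats, groups


# Hypothetical toy rows: (등록년도, 종목, 소속)
rows = [
    (2003, "궁도", "안산 광덕정"),
    (2003, "궁도", "안산 광덕정"),
    (2003, "궁도", "안산 광덕정"),
    (2003, "야구소프트볼(야구)", "경동고"),
    (2003, "야구소프트볼(야구)", "경동고"),
    (2004, "검도", "한국해양대학교"),
]
repeats, groups = duplicate_summary(rows)
print(repeats)      # rows that repeat an earlier row
print(len(groups))  # distinct patterns occurring more than once
for row, n in groups:
    print(n, row)
```

On real data the two numbers can differ substantially, so it is worth checking which one a "duplicate rows" figure refers to before deduplicating.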