Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 450
Missing cells (%): 0.6%
Duplicate rows: 1898
Duplicate rows (%): 19.0%
Total size in memory: 625.0 KiB
Average record size in memory: 64.0 B
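
The percentages and the average record size can be re-derived from the raw counts. The short check below is a sketch that assumes the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics.
n_variables = 7
n_observations = 10_000
missing_cells = 450
duplicate_rows = 1898
total_size_kib = 625.0

# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the three values round to the figures reported in the statistics.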

Variable types

Categorical: 6
Text: 1

Dataset

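Description: Provides data on event-level competition operations for para sports competitions (by competition, event, city/province, gender, registration category, athlete category, and affiliation)
Author: 대한장애인체육회 (Korea Paralympic Committee)
URL: https://www.data.go.kr/data/15072754/fileData.do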

Alerts

Dataset has 1898 (19.0%) duplicate rows (Duplicates)
종목명 is highly overall correlated with 선수구분 (High correlation)
선수구분 is highly overall correlated with 종목명 (High correlation)
등록구분 is highly imbalanced (83.2%) (Imbalance)
소속 has 450 (4.5%) missing values (Missing)
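
The 83.2% imbalance figure for 등록구분 is consistent with an entropy-based score: one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. That this is the formula behind the alert is an assumption; the sketch below only shows that it reproduces the reported number from the counts 9751 and 249. The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H(p) / log2(k) for a list of class counts.

    0.0 means perfectly balanced classes, 1.0 means a single class
    holds every observation. Assumed formula, not confirmed by the report.
    """
    k = len(counts)
    total = sum(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    return 1 - entropy / math.log2(k)


# Counts for 등록구분: 선수 = 9751, 비장애인선수 = 249.
score = imbalance_score([9751, 249])
print(f"{100 * score:.1f}%")
```

For comparison, two equally sized classes give a score of 0 under the same formula.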

Reproduction

Analysis started: 2023-12-12 13:27:16.619009
Analysis finished: 2023-12-12 13:27:17.870102
Duration: 1.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
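
The duration is the difference between the two timestamps. A minimal check, assuming both timestamps share the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 13:27:16.619009")
finished = datetime.fromisoformat("2023-12-12 13:27:17.870102")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```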

Variables

대회명
Categorical

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
43회 전국장애인체육대회 | 1161
39회 전국장애인체육대회 | 1133
37회 전국장애인체육대회 | 1128
42회 전국장애인체육대회 | 1109
41회 전국장애인체육대회 | 1106
Other values (13) | 4363

Length

Max length: 15
Median length: 13
Mean length: 13.267
Min length: 13

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 37회 전국장애인체육대회
2nd row: 41회 전국장애인체육대회
3rd row: 36회 전국장애인체육대회
4th row: 17회 전국장애학생체육대회
5th row: 42회 전국장애인체육대회

Common Values

Value | Count | Frequency (%)
43회 전국장애인체육대회 | 1161 | 11.6%
39회 전국장애인체육대회 | 1133 | 11.3%
37회 전국장애인체육대회 | 1128 | 11.3%
42회 전국장애인체육대회 | 1109 | 11.1%
41회 전국장애인체육대회 | 1106 | 11.1%
38회 전국장애인체육대회 | 1096 | 11.0%
36회 전국장애인체육대회 | 1023 | 10.2%
13회 전국장애학생체육대회 | 357 | 3.6%
17회 전국장애학생체육대회 | 330 | 3.3%
12회 전국장애학생체육대회 | 327 | 3.3%
Other values (8) | 1230 | 12.3%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
전국장애인체육대회 | 7756 | 38.8%
전국장애학생체육대회 | 1818 | 9.1%
43회 | 1161 | 5.8%
39회 | 1133 | 5.7%
37회 | 1128 | 5.6%
42회 | 1109 | 5.5%
41회 | 1106 | 5.5%
38회 | 1096 | 5.5%
36회 | 1023 | 5.1%
전국장애인동계체육대회 | 426 | 2.1%
Other values (9) | 2244 | 11.2%
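
The table above counts parts of the 대회명 values rather than whole values. The numbers agree with splitting each value on whitespace and counting the resulting tokens, although the report does not state the tokenization, so treat that as an assumption. The sketch below rebuilds two of the token counts from the seven 전국장애인체육대회 rows of the Common Values table.

```python
from collections import Counter

# Whole-value counts copied from the Common Values table
# (only the seven 전국장애인체육대회 editions listed there).
value_counts = {
    "43회 전국장애인체육대회": 1161,
    "39회 전국장애인체육대회": 1133,
    "37회 전국장애인체육대회": 1128,
    "42회 전국장애인체육대회": 1109,
    "41회 전국장애인체육대회": 1106,
    "38회 전국장애인체육대회": 1096,
    "36회 전국장애인체육대회": 1023,
}

# Assumption: tokens are whitespace-separated pieces of each value.
token_counts = Counter()
for value, count in value_counts.items():
    for token in value.split():
        token_counts[token] += count

print(token_counts["전국장애인체육대회"])
print(token_counts["43회"])
```

The token 전국장애인체육대회 sums to 7756, matching the first row of the table. Every value in this sample has two tokens, so the denominator is 20000 tokens and 7756 / 20000 rounds to the reported 38.8%.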

종목명
Categorical

HIGH CORRELATION 

Distinct: 46
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
축구 | 1031
탁구 | 784
육상트랙 | 695
역도 | 670
육상필드 | 541
Other values (41) | 6279

Length

Max length: 8
Median length: 2
Mean length: 2.8201
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 사이클
2nd row: 수영
3rd row: 볼링
4th row: e스포츠
5th row: 유도

Common Values

Value | Count | Frequency (%)
축구 | 1031 | 10.3%
탁구 | 784 | 7.8%
육상트랙 | 695 | 7.0%
역도 | 670 | 6.7%
육상필드 | 541 | 5.4%
수영 | 541 | 5.4%
볼링 | 508 | 5.1%
배드민턴 | 438 | 4.4%
농구 | 415 | 4.2%
론볼 | 361 | 3.6%
Other values (36) | 4016 | 40.2%

Length

Histogram of lengths of the category

시도명
Categorical

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
경기 | 1044
서울 | 962
충북 | 762
경북 | 694
충남 | 643
Other values (14) | 5895

Length

Max length: 4
Median length: 2
Mean length: 2.0012
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 제주
2nd row: 대전
3rd row: 강원
4th row: 경기
5th row: 경기

Common Values

Value | Count | Frequency (%)
경기 | 1044 | 10.4%
서울 | 962 | 9.6%
충북 | 762 | 7.6%
경북 | 694 | 6.9%
충남 | 643 | 6.4%
대구 | 634 | 6.3%
부산 | 626 | 6.3%
전남 | 588 | 5.9%
울산 | 578 | 5.8%
인천 | 557 | 5.6%
Other values (9) | 2912 | 29.1%

Length

Histogram of lengths of the category

성별
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
남자 | 7311
여자 | 2689

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 여자
2nd row: 남자
3rd row: 남자
4th row: 남자
5th row: 남자

Common Values

Value | Count | Frequency (%)
남자 | 7311 | 73.1%
여자 | 2689 | 26.9%

Length

Histogram of lengths of the category

Common Values (Plot)

등록구분
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
선수 | 9751
비장애인선수 | 249

Length

Max length: 6
Median length: 2
Mean length: 2.0996
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 비장애인선수
2nd row: 선수
3rd row: 선수
4th row: 선수
5th row: 선수

Common Values

Value | Count | Frequency (%)
선수 | 9751 | 97.5%
비장애인선수 | 249 | 2.5%
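
The length statistics for 등록구분 follow directly from the two value counts. The sketch below assumes length is measured in Unicode code points of precomposed (NFC) Hangul, which is how Python's `len` treats the strings as typed here.

```python
# Value counts copied from the Common Values table for 등록구분.
value_counts = {"선수": 9751, "비장애인선수": 249}

total = sum(value_counts.values())

# Count-weighted mean of the string lengths (in code points).
mean_length = sum(len(v) * c for v, c in value_counts.items()) / total
max_length = max(len(v) for v in value_counts)
min_length = min(len(v) for v in value_counts)

print(f"Mean length: {mean_length:.4f}")
print(f"Max length: {max_length}, Min length: {min_length}")
```

The same calculation on 선수구분 (3 characters for 7821 rows, 4 characters for 2179 rows) gives 3.2179, matching that variable's reported mean length.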

Length

Histogram of lengths of the category

Common Values (Plot)

선수구분
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
선수부 | 7821
동호인부 | 2179

Length

Max length: 4
Median length: 3
Mean length: 3.2179
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 선수부
2nd row: 선수부
3rd row: 선수부
4th row: 선수부
5th row: 선수부

Common Values

Value | Count | Frequency (%)
선수부 | 7821 | 78.2%
동호인부 | 2179 | 21.8%

Length

Histogram of lengths of the category

Common Values (Plot)

소속
Text

MISSING 

Distinct: 1575
Distinct (%): 16.5%
Missing: 450
Missing (%): 4.5%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 18
Mean length: 8.1419895
Min length: 1

Characters and Unicode

Total characters: 77756
Distinct characters: 472
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
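
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for three 소속 values taken from the sample rows in this report; it is not the profiler's own implementation. The two-letter codes map to the category names used in the tables: Lo is Other Letter, Lu is Uppercase Letter, Ps and Pe are Open and Close Punctuation, Zs is Space Separator, and Po is Other Punctuation.

```python
import unicodedata
from collections import Counter

# Three 소속 values that appear in the report's sample rows.
samples = ["일반(개인)", "강농 F.C", "창원농아FC"]

# Tally the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)

for code, count in category_counts.most_common():
    print(code, count)
```

Hangul syllables fall under Lo, which is why Other Letter dominates the category table for this variable.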

Unique

Unique: 487
Unique (%): 5.1%

Sample

1st row: 대전광역시 장애인수영연맹
2nd row: 동해일출
3rd row: 일반(개인)
4th row: 평택시청
5th row: 울산장애인컬링팀

Value | Count | Frequency (%)
일반(개인 | 1002 | 9.7%
소속팀없음 | 551 | 5.3%
울산광역시장애인역도연맹 | 65 | 0.6%
fc | 62 | 0.6%
부산육상 | 62 | 0.6%
충청남도장애인육상연맹 | 55 | 0.5%
잠실육상클럽 | 46 | 0.4%
부산장애인역도연맹 | 45 | 0.4%
경기도장애인육상연맹 | 38 | 0.4%
구미혜당학교 | 37 | 0.4%
Other values (1627) | 8408 | 81.1%

Most occurring characters

ValueCountFrequency (%)
5142
 
6.6%
3537
 
4.5%
3494
 
4.5%
1993
 
2.6%
1764
 
2.3%
1658
 
2.1%
1603
 
2.1%
1603
 
2.1%
1569
 
2.0%
1285
 
1.7%
Other values (462) 54108
69.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 72329 | 93.0%
Uppercase Letter | 1330 | 1.7%
Close Punctuation | 1264 | 1.6%
Open Punctuation | 1259 | 1.6%
Space Separator | 829 | 1.1%
Lowercase Letter | 569 | 0.7%
Other Punctuation | 120 | 0.2%
Decimal Number | 32 | < 0.1%
Dash Punctuation | 24 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5142
 
7.1%
3537
 
4.9%
3494
 
4.8%
1993
 
2.8%
1764
 
2.4%
1658
 
2.3%
1603
 
2.2%
1603
 
2.2%
1569
 
2.2%
1285
 
1.8%
Other values (410) 48681
67.3%
Uppercase Letter
Value | Count | Frequency (%)
C | 501 | 37.7%
F | 435 | 32.7%
B | 149 | 11.2%
D | 48 | 3.6%
R | 28 | 2.1%
S | 25 | 1.9%
G | 20 | 1.5%
N | 19 | 1.4%
M | 19 | 1.4%
K | 19 | 1.4%
Other values (11) | 67 | 5.0%

Lowercase Letter
Value | Count | Frequency (%)
n | 83 | 14.6%
i | 65 | 11.4%
a | 51 | 9.0%
l | 49 | 8.6%
o | 39 | 6.9%
c | 37 | 6.5%
e | 33 | 5.8%
u | 30 | 5.3%
m | 29 | 5.1%
y | 22 | 3.9%
Other values (8) | 131 | 23.0%

Decimal Number
Value | Count | Frequency (%)
7 | 18 | 56.2%
1 | 6 | 18.8%
8 | 5 | 15.6%
2 | 1 | 3.1%
3 | 1 | 3.1%
5 | 1 | 3.1%

Other Punctuation
Value | Count | Frequency (%)
. | 113 | 94.2%
& | 6 | 5.0%
, | 1 | 0.8%

Close Punctuation
Value | Count | Frequency (%)
) | 1264 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1259 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 829 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 24 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 72329 | 93.0%
Common | 3528 | 4.5%
Latin | 1899 | 2.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5142
 
7.1%
3537
 
4.9%
3494
 
4.8%
1993
 
2.8%
1764
 
2.4%
1658
 
2.3%
1603
 
2.2%
1603
 
2.2%
1569
 
2.2%
1285
 
1.8%
Other values (410) 48681
67.3%
Latin
Value | Count | Frequency (%)
C | 501 | 26.4%
F | 435 | 22.9%
B | 149 | 7.8%
n | 83 | 4.4%
i | 65 | 3.4%
a | 51 | 2.7%
l | 49 | 2.6%
D | 48 | 2.5%
o | 39 | 2.1%
c | 37 | 1.9%
Other values (29) | 442 | 23.3%

Common
Value | Count | Frequency (%)
) | 1264 | 35.8%
( | 1259 | 35.7%
(space) | 829 | 23.5%
. | 113 | 3.2%
- | 24 | 0.7%
7 | 18 | 0.5%
1 | 6 | 0.2%
& | 6 | 0.2%
8 | 5 | 0.1%
, | 1 | < 0.1%
Other values (3) | 3 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 72329 | 93.0%
ASCII | 5427 | 7.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
5142
 
7.1%
3537
 
4.9%
3494
 
4.8%
1993
 
2.8%
1764
 
2.4%
1658
 
2.3%
1603
 
2.2%
1603
 
2.2%
1569
 
2.2%
1285
 
1.8%
Other values (410) 48681
67.3%
ASCII
Value | Count | Frequency (%)
) | 1264 | 23.3%
( | 1259 | 23.2%
(space) | 829 | 15.3%
C | 501 | 9.2%
F | 435 | 8.0%
B | 149 | 2.7%
. | 113 | 2.1%
n | 83 | 1.5%
i | 65 | 1.2%
a | 51 | 0.9%
Other values (42) | 678 | 12.5%

Correlations

Variable | 대회명 | 종목명 | 시도명 | 성별 | 등록구분 | 선수구분
대회명 | 1.000 | 0.787 | 0.128 | 0.000 | 0.122 | 0.317
종목명 | 0.787 | 1.000 | 0.378 | 0.349 | 0.521 | 0.652
시도명 | 0.128 | 0.378 | 1.000 | 0.082 | 0.082 | 0.082
성별 | 0.000 | 0.349 | 0.082 | 1.000 | 0.039 | 0.025
등록구분 | 0.122 | 0.521 | 0.082 | 0.039 | 1.000 | 0.030
선수구분 | 0.317 | 0.652 | 0.082 | 0.025 | 0.030 | 1.000

Variable | 대회명 | 등록구분 | 성별 | 시도명 | 종목명 | 선수구분
대회명 | 1.000 | 0.096 | 0.000 | 0.039 | 0.321 | 0.250
등록구분 | 0.096 | 1.000 | 0.025 | 0.073 | 0.416 | 0.019
성별 | 0.000 | 0.025 | 1.000 | 0.073 | 0.278 | 0.016
시도명 | 0.039 | 0.073 | 0.073 | 1.000 | 0.104 | 0.072
종목명 | 0.321 | 0.416 | 0.278 | 0.104 | 1.000 | 0.526
선수구분 | 0.250 | 0.019 | 0.016 | 0.072 | 0.526 | 1.000

Variable | 대회명 | 종목명 | 시도명 | 성별 | 등록구분 | 선수구분
대회명 | 1.000 | 0.321 | 0.039 | 0.000 | 0.096 | 0.250
종목명 | 0.321 | 1.000 | 0.104 | 0.278 | 0.416 | 0.526
시도명 | 0.039 | 0.104 | 1.000 | 0.073 | 0.073 | 0.072
성별 | 0.000 | 0.278 | 0.073 | 1.000 | 0.025 | 0.016
등록구분 | 0.096 | 0.416 | 0.073 | 0.025 | 1.000 | 0.019
선수구분 | 0.250 | 0.526 | 0.072 | 0.016 | 0.019 | 1.000
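
The report does not name the association measure behind these matrices. For pairs of categorical variables, a common choice is Cramér's V, which is derived from the chi-squared statistic of the contingency table and ranges from 0 (no association) to 1 (one variable determines the other). The sketch below implements the plain, uncorrected form on hypothetical contingency tables; it is an assumption that a measure of this kind was used, and the values above cannot be reproduced without the underlying data.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    # Normalize by n and the smaller table dimension minus one.
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0
    return math.sqrt(chi2 / (n * k))


# Hypothetical tables, not taken from the dataset.
perfect = [[10, 0], [0, 10]]       # each row category maps to one column
independent = [[10, 10], [10, 10]]  # rows and columns are unrelated

print(cramers_v(perfect))
print(cramers_v(independent))
```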

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

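Index | 대회명 | 종목명 | 시도명 | 성별 | 등록구분 | 선수구분 | 소속
6821 | 37회 전국장애인체육대회 | 사이클 | 제주 | 여자 | 비장애인선수 | 선수부 | <NA>
33690 | 41회 전국장애인체육대회 | 수영 | 대전 | 남자 | 선수 | 선수부 | 대전광역시 장애인수영연맹
3852 | 36회 전국장애인체육대회 | 볼링 | 강원 | 남자 | 선수 | 선수부 | 동해일출
45859 | 17회 전국장애학생체육대회 | e스포츠 | 경기 | 남자 | 선수 | 선수부 | 일반(개인)
38153 | 42회 전국장애인체육대회 | 유도 | 경기 | 남자 | 선수 | 선수부 | 평택시청
45163 | 20회 전국장애인동계체육대회 | 컬링 | 울산 | 남자 | 선수 | 선수부 | 울산장애인컬링팀
24658 | 11회 전국장애학생체육대회 | 육상필드 | 충남 | 여자 | 선수 | 선수부 | <NA>
42175 | 15회 전국장애학생체육대회 | 축구 | 울산 | 남자 | 선수 | 선수부 | 울산지적축구학생부
20531 | 39회 전국장애인체육대회 | 역도 | 경기 | 여자 | 선수 | 동호인부 | 경기도장애인역도연맹
24652 | 11회 전국장애학생체육대회 | 육상필드 | 인천 | 남자 | 선수 | 선수부 | <NA>

Index | 대회명 | 종목명 | 시도명 | 성별 | 등록구분 | 선수구분 | 소속
41427 | 42회 전국장애인체육대회 | 휠체어럭비 | 서울 | 남자 | 선수 | 동호인부 | 송파 한석맨파워
52799 | 43회 전국장애인체육대회 | 테니스 | 서울 | 여자 | 선수 | 선수부 | 드림휠테니스클럽
10498 | 37회 전국장애인체육대회 | 배구 | 인천 | 남자 | 선수 | 선수부 | <NA>
20789 | 39회 전국장애인체육대회 | 조정 | 경북 | 여자 | 선수 | 선수부 | 경상북도장애인조정연맹
19098 | 39회 전국장애인체육대회 | 축구 | 경남 | 남자 | 선수 | 선수부 | 창원농아FC
19205 | 39회 전국장애인체육대회 | 축구 | 강원 | 남자 | 선수 | 선수부 | 강농 F.C
23778 | 16회 전국장애인동계체육대회 | 크로스컨트리스키 | 전북 | 남자 | 선수 | 선수부 | 전북장애인크로스컨트리스키팀
13931 | 38회 전국장애인체육대회 | 론볼 | 서울 | 여자 | 선수 | 동호인부 | 송파론볼클럽
49515 | 43회 전국장애인체육대회 | 론볼 | 전북 | 여자 | 선수 | 선수부 | 전북장애인론볼연맹
49842 | 43회 전국장애인체육대회 | 파크골프 | 경기 | 남자 | 선수 | 선수부 | 화성장애인골프협회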

Duplicate rows

Most frequently occurring

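Index | 대회명 | 종목명 | 시도명 | 성별 | 등록구분 | 선수구분 | 소속 | # duplicates
988 | 38회 전국장애인체육대회 | 탁구 | 전북 | 남자 | 선수 | 선수부 | 전라북도장애인탁구협회 | 8
333 | 17회 전국장애학생체육대회 | 육상트랙 | 충남 | 남자 | 선수 | 선수부 | 일반(개인) | 7
564 | 36회 전국장애인체육대회 | 축구 | 충남 | 남자 | 선수 | 동호인부 | 충청남도서부 장애인종합복지관 축구팀 | 7
731 | 37회 전국장애인체육대회 | 축구 | 경기 | 남자 | 선수 | 선수부 | 용인시농아인축구클럽 | 7
1180 | 39회 전국장애인체육대회 | 축구 | 대구 | 남자 | 선수 | 동호인부 | 지적드림FC | 7
1405 | 41회 전국장애인체육대회 | 축구 | 울산 | 남자 | 선수 | 선수부 | 울산드래곤축구회(청각) | 7
1802 | 43회 전국장애인체육대회 | 육상필드 | 경기 | 남자 | 선수 | 선수부 | 경기도장애인육상연맹 | 7
0 | 11회 전국장애학생체육대회 | e스포츠 | 경기 | 남자 | 선수 | 선수부 | 일반(개인) | 6
133 | 13회 전국장애학생체육대회 | 육상트랙 | 부산 | 남자 | 선수 | 선수부 | 부산육상 | 6
226 | 16회 전국장애학생체육대회 | e스포츠 | 경기 | 남자 | 선수 | 선수부 | 소속팀없음 | 6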
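
A table like this can be built by grouping identical rows and counting each group. The report does not say whether the headline figure of 1898 counts distinct repeated row patterns or surplus copies, so the sketch below computes both on a small set of hypothetical rows (shortened to three columns for readability).

```python
from collections import Counter

# Hypothetical rows, shortened to three columns; not taken from the dataset.
rows = [
    ("A회", "축구", "경기"),
    ("A회", "축구", "경기"),
    ("A회", "축구", "경기"),
    ("B회", "탁구", "서울"),
    ("C회", "수영", "부산"),
    ("C회", "수영", "부산"),
]

# Count how often each distinct row occurs.
row_counts = Counter(rows)

# Keep only row patterns that occur more than once, most frequent first.
duplicate_groups = [
    (row, count) for row, count in row_counts.most_common() if count > 1
]

n_groups = len(duplicate_groups)                       # distinct repeated patterns
n_surplus = sum(count - 1 for _, count in duplicate_groups)  # copies beyond the first

for row, count in duplicate_groups:
    print(row, count)
print(n_groups, n_surplus)
```

Tuples are hashable, so `Counter` can group whole rows directly; with a real table each row would be converted to a tuple of its column values first.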