Overview

Dataset statistics

Number of variables: 7
Number of observations: 8316
Missing cells: 3
Missing cells (%): < 0.1%
Duplicate rows: 1968
Duplicate rows (%): 23.7%
Total size in memory: 479.3 KiB
Average record size in memory: 59.0 B
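
The two derived figures in this block follow from the raw counts. A minimal sketch, assuming the duplicate percentage is duplicate rows over observations and the average record size is total memory over the row count with 1 KiB = 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Figures copied from the Dataset statistics block.
n_observations = 8316
n_duplicate_rows = 1968
total_memory_kib = 479.3

# Share of rows flagged as duplicates, in percent.
duplicate_pct = 100 * n_duplicate_rows / n_observations

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"{duplicate_pct:.1f}%")      # 23.7%
print(f"{avg_record_bytes:.1f} B")  # 59.0 B
```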

Variable types

Numeric: 2
Categorical: 5

Dataset

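Description: The Ministry of Health and Welfare (보건복지부) provides information on the status of hematopoietic stem cell donations (sex, age, province/city, donation year and month, category).
Author: Ministry of Health and Welfare (보건복지부)
URL: https://www.data.go.kr/data/15075226/fileData.do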

Alerts

건수 has constant value "1" (Constant)
Dataset has 1968 (23.7%) duplicate rows (Duplicates)
기증연 is highly overall correlated with 장기 (High correlation)
장기 is highly overall correlated with 기증연 (High correlation)
장기 is highly imbalanced (86.2%) (Imbalance)
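
The 86.2% imbalance figure for 장기 can be reproduced from its two category counts (8155 and 161). The sketch below assumes the score is one minus the Shannon entropy of the class shares divided by its maximum, log2 of the number of classes. That formula is an assumption here, chosen because it matches the reported value. The function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts for 장기: 말초혈 = 8155, 골수 = 161.
score = imbalance_score([8155, 161])
print(f"{score:.1%}")  # 86.2%
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as one class dominates.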

Reproduction

Analysis started: 2023-12-23 07:57:52.144040
Analysis finished: 2023-12-23 07:58:00.781023
Duration: 8.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
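
The reported duration is the difference between the two timestamps. A small sketch using only the standard library (the format string is an assumption about how the timestamps are laid out):

```python
from datetime import datetime

# Timestamp layout assumed from the values shown in the Reproduction block.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-23 07:57:52.144040", fmt)
finished = datetime.strptime("2023-12-23 07:58:00.781023", fmt)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 8.64 seconds
```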

Variables

기증연
Real number (ℝ)

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2020.3011
Minimum: 2017
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 2017
5-th percentile: 2018
Q1: 2019
Median: 2020
Q3: 2021
95-th percentile: 2022
Maximum: 2022
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.2727794
Coefficient of variation (CV): 0.00062999492
Kurtosis: -0.91136347
Mean: 2020.3011
Median Absolute Deviation (MAD): 1
Skewness: -0.24659832
Sum: 16800824
Variance: 1.6199675
Monotonicity: Not monotonic
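
Because 기증연 has only six distinct values, the mean, variance, and coefficient of variation can be recomputed from its value counts alone. The sketch below assumes the sample variance (n - 1 in the denominator), which agrees with the figures reported here. The dictionary name `year_counts` is illustrative.

```python
import math

# Value counts for 기증연, copied from the frequency table of this variable.
year_counts = {2017: 50, 2018: 680, 2019: 1696, 2020: 2004, 2021: 2062, 2022: 1824}

n = sum(year_counts.values())                       # number of observations
total = sum(y * c for y, c in year_counts.items())  # the "Sum" statistic
mean = total / n

# Sample variance (n - 1 denominator); an assumption that matches the report.
ss = sum(c * (y - mean) ** 2 for y, c in year_counts.items())
variance = ss / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total)  # 8316 16800824
print(round(mean, 4), round(std, 4))
```

The very small CV reflects that calendar years cluster tightly relative to their magnitude, so the statistic says little about this variable.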
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
2021   2062   24.8%
2020   2004   24.1%
2022   1824   21.9%
2019   1696   20.4%
2018   680    8.2%
2017   50     0.6%

장기
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 65.1 KiB
말초혈: 8155
골수: 161

Length

Max length: 3
Median length: 3
Mean length: 2.9806397
Min length: 2
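
The mean length is a count-weighted average of the category label lengths. A minimal sketch, assuming length is counted in characters (말초혈 has 3, 골수 has 2); the mapping name `length_counts` is illustrative:

```python
# Label length in characters -> number of rows with a label of that length.
# 말초혈 (3 characters) occurs 8155 times, 골수 (2 characters) occurs 161 times.
length_counts = {3: 8155, 2: 161}

total_rows = sum(length_counts.values())
# Weight each length by how often it occurs.
mean_length = sum(length * n for length, n in length_counts.items()) / total_rows

print(round(mean_length, 7))  # 2.9806397
```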

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 말초혈
2nd row: 말초혈
3rd row: 말초혈
4th row: 말초혈
5th row: 말초혈

Common Values

Value   Count  Frequency (%)
말초혈  8155   98.1%
골수    161    1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Bar plot of the value counts: 말초혈 8155 (98.1%), 골수 161 (1.9%).

시도
Categorical

Distinct: 13
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 65.1 KiB
서울: 4900
경기: 1063
부산: 566
대구: 429
대전: 351
Other values (8): 1007

Length

Max length: 4
Median length: 2
Mean length: 2.0084175
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울
2nd row: 서울
3rd row: 서울
4th row: 서울
5th row: 서울

Common Values

Value             Count  Frequency (%)
서울              4900   58.9%
경기              1063   12.8%
부산              566    6.8%
대구              429    5.2%
대전              351    4.2%
인천              233    2.8%
전남              221    2.7%
경남              150    1.8%
울산              147    1.8%
전북              139    1.7%
Other values (3)  117    1.4%

Length

Histogram of lengths of the category

성별
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 65.1 KiB
남자: 5466
여자: 2850

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 여자
2nd row: 남자
3rd row: 남자
4th row: 남자
5th row: 남자

Common Values

Value  Count  Frequency (%)
남자   5466   65.7%
여자   2850   34.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Bar plot of the value counts: 남자 5466 (65.7%), 여자 2850 (34.3%).

연령
Real number (ℝ)

Distinct: 76
Distinct (%): 0.9%
Missing: 3
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.334657
Minimum: 0
Maximum: 79
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 73.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 19
Q1: 26
Median: 32
Q3: 42
95-th percentile: 57
Maximum: 79
Range: 79
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 11.92171
Coefficient of variation (CV): 0.34722088
Kurtosis: -0.0070940693
Mean: 34.334657
Median Absolute Deviation (MAD): 8
Skewness: 0.5167843
Sum: 285424
Variance: 142.12716
Monotonicity: Not monotonic
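
The mean of 연령 is consistent with dividing the sum by the number of non-missing values rather than by all 8316 rows. A short sketch of that arithmetic, using only figures from this report (the variable names are illustrative, and the exclusion of missing values is an inference from the numbers, not something the report states):

```python
# Figures copied from the 연령 statistics.
n_rows = 8316
n_missing = 3
value_sum = 285424
std = 11.92171

n_valid = n_rows - n_missing  # assumption: missing ages are excluded from the average
mean = value_sum / n_valid
cv = std / mean               # coefficient of variation

print(round(mean, 6))  # 34.334657
print(round(cv, 4))    # 0.3472
```

Dividing by all 8316 rows instead would give roughly 34.32, which does not match the reported mean.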
Histogram with fixed size bins (bins=50)
Most frequent values:

Value              Count  Frequency (%)
28                 383    4.6%
27                 372    4.5%
24                 354    4.3%
29                 333    4.0%
26                 330    4.0%
25                 318    3.8%
23                 301    3.6%
30                 299    3.6%
33                 291    3.5%
38                 256    3.1%
Other values (66)  5076   61.0%

Ten smallest values:

Value  Count  Frequency (%)
0      1      < 0.1%
1      4      < 0.1%
2      9      0.1%
3      9      0.1%
4      9      0.1%
5      9      0.1%
6      9      0.1%
7      16     0.2%
8      13     0.2%
9      13     0.2%

Ten largest values:

Value  Count  Frequency (%)
79     1      < 0.1%
74     1      < 0.1%
73     4      < 0.1%
72     4      < 0.1%
71     3      < 0.1%
70     3      < 0.1%
69     3      < 0.1%
68     5      0.1%
67     15     0.2%
66     18     0.2%

혈액형
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 65.1 KiB
A: 2940
O: 2321
B: 2189
AB: 866

Length

Max length: 2
Median length: 1
Mean length: 1.1041366
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: A
3rd row: A
4th row: B
5th row: B

Common Values

Value  Count  Frequency (%)
A      2940   35.4%
O      2321   27.9%
B      2189   26.3%
AB     866    10.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Bar plot of the value counts: A 2940 (35.4%), O 2321 (27.9%), B 2189 (26.3%), AB 866 (10.4%).

건수
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 65.1 KiB
1: 8316

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      8316   100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Bar plot of the value counts: 1 8316 (100.0%).

Interactions


Correlations

Variable  기증연  장기   시도   성별   연령   혈액형
기증연    1.000   0.094  0.106  0.040  0.087  0.000
장기      0.094   1.000  0.096  0.028  0.408  0.000
시도      0.106   0.096  1.000  0.089  0.117  0.062
성별      0.040   0.028  0.089  1.000  0.154  0.000
연령      0.087   0.408  0.117  0.154  1.000  0.020
혈액형    0.000   0.000  0.062  0.000  0.020  1.000

Variable  장기   시도   성별   혈액형
장기      1.000  0.075  0.018  0.000
시도      0.075  1.000  0.069  0.029
성별      0.018  0.069  1.000  0.000
혈액형    0.000  0.029  0.000  1.000

Variable  기증연  연령   장기   시도   성별   혈액형
기증연    1.000   0.014  0.517  0.052  0.048  0.000
연령      0.014   1.000  0.313  0.049  0.118  0.012
장기      0.517   0.313  1.000  0.075  0.018  0.000
시도      0.052   0.049  0.075  1.000  0.069  0.029
성별      0.048   0.118  0.018  0.069  1.000  0.000
혈액형    0.000   0.012  0.000  0.029  0.000  1.000
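
The high-correlation alert for 기증연 and 장기 lines up with the last matrix in this section, where that pair scores 0.517. The sketch below scans the upper triangle of that matrix for pairs above a cut-off. The 0.5 threshold and the helper name `high_pairs` are assumptions made for illustration, not values stated in the report.

```python
# Last correlation matrix of this section, rows and columns in the same order.
names = ["기증연", "연령", "장기", "시도", "성별", "혈액형"]
matrix = [
    [1.000, 0.014, 0.517, 0.052, 0.048, 0.000],
    [0.014, 1.000, 0.313, 0.049, 0.118, 0.012],
    [0.517, 0.313, 1.000, 0.075, 0.018, 0.000],
    [0.052, 0.049, 0.075, 1.000, 0.069, 0.029],
    [0.048, 0.118, 0.018, 0.069, 1.000, 0.000],
    [0.000, 0.012, 0.000, 0.029, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the report

def high_pairs(names, matrix, threshold):
    """Return (row, column, value) for each upper-triangle entry above threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] > threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs

print(high_pairs(names, matrix, THRESHOLD))  # [('기증연', '장기', 0.517)]
```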

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index  기증연  장기    시도  성별  연령  혈액형  건수
0      2018    말초혈  서울  여자  22    A       1
1      2018    말초혈  서울  남자  39    A       1
2      2018    말초혈  서울  남자  27    A       1
3      2018    말초혈  서울  남자  41    B       1
4      2018    말초혈  서울  남자  41    B       1
5      2018    말초혈  서울  남자  36    A       1
6      2018    말초혈  서울  남자  49    B       1
7      2018    말초혈  서울  여자  2     B       1
8      2018    말초혈  서울  여자  21    B       1
9      2018    말초혈  서울  여자  55    AB      1

Index  기증연  장기    시도  성별  연령  혈액형  건수
8306   2017    골수    <NA>  남자  34    O       1
8307   2022    골수    <NA>  남자  30    A       1
8308   2019    골수    <NA>  여자  34    A       1
8309   2019    골수    <NA>  남자  36    A       1
8310   2018    말초혈  <NA>  남자  21    A       1
8311   2018    말초혈  <NA>  여자  25    A       1
8312   2019    골수    <NA>  여자  20    B       1
8313   2019    말초혈  <NA>  남자  40    O       1
8314   2021    말초혈  <NA>  여자  29    O       1
8315   2020    말초혈  <NA>  남자  42    AB      1

Duplicate rows

Most frequently occurring

Index  기증연  장기    시도  성별  연령  혈액형  건수  # duplicates
779    2020    말초혈  서울  남자  27    A       1     21
1265   2021    말초혈  서울  남자  28    A       1     20
1272   2021    말초혈  서울  남자  29    O       1     18
350    2019    말초혈  서울  남자  33    A       1     16
1253   2021    말초혈  서울  남자  25    A       1     16
1269   2021    말초혈  서울  남자  29    A       1     16
786    2020    말초혈  서울  남자  28    O       1     15
808    2020    말초혈  서울  남자  35    A       1     15
1267   2021    말초혈  서울  남자  28    B       1     15
1699   2022    말초혈  서울  남자  24    O       1     15
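
A count of this kind can be sketched by tallying identical rows. The example below uses a small hypothetical sample (not the real dataset) in the report's column order, and the standard library's `collections.Counter`. It illustrates the idea and is not the tool's own implementation.

```python
from collections import Counter

# Hypothetical mini-sample: (기증연, 장기, 시도, 성별, 연령, 혈액형, 건수).
rows = [
    (2020, "말초혈", "서울", "남자", 27, "A", 1),
    (2020, "말초혈", "서울", "남자", 27, "A", 1),
    (2021, "말초혈", "서울", "남자", 28, "A", 1),
    (2020, "말초혈", "서울", "남자", 27, "A", 1),
    (2018, "말초혈", "서울", "여자", 22, "A", 1),
]

# Tally identical rows, then keep combinations that occur more than once,
# most frequent first.
counts = Counter(rows)
duplicates = [(row, n) for row, n in counts.most_common() if n > 1]

for row, n in duplicates:
    print(row, n)
```

Since the records carry no donor identifier, identical rows may well be distinct donors who share year, region, sex, age, and blood type, so the duplicate alert does not by itself indicate a data-entry error.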