Overview

Dataset statistics

Number of variables: 7
Number of observations: 278
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 25
Duplicate rows (%): 9.0%
Total size in memory: 16.1 KiB
Average record size in memory: 59.5 B

Variable types

Numeric: 1
Categorical: 6

Dataset

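Description: Basic information on inpatients at the National Center for Mental Health (국립정신건강센터). The data include age group (연령대), sex (성별), diagnosis code (상병코드), diagnosis name (상병명), insurance type (보험유형), treatment year (진료년도), and treatment month (진료월).
Author: Ministry of Health and Welfare, National Center for Mental Health (보건복지부 국립정신건강센터)
URL: https://www.data.go.kr/data/15059739/fileData.do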

Alerts

진료년도 has constant value "2020" (Constant)
Dataset has 25 (9.0%) duplicate rows (Duplicates)
상병명 is highly overall correlated with 상병코드 (High correlation)
상병코드 is highly overall correlated with 상병명 (High correlation)
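
The two high-correlation alerts reflect that 상병명 is close to a relabelling of 상병코드. The report lists 39 distinct codes but 40 distinct names, so at least one code must appear with more than one name. The following is a minimal Python sketch of how such codes could be found. The helper names `names_per_code` and `ambiguous_codes` are hypothetical, and the pairs are copied from rows shown in the Sample section, not from the full dataset.

```python
from collections import defaultdict

# (상병코드, 상병명) pairs copied from rows shown in the report's Sample section.
rows = [
    ("F10.2", "Dependence syndrome of alcohol"),
    ("F20.0", "Paranoid schizophrenia"),
    ("U07.1", "Coronavirus disease 2019[COVID-19]"),
    ("U07.1", "Coronavirus disease 2019, virus identified [COVID-19, virus identified]"),
    ("F32.1", "Moderate depressive episode"),
]


def names_per_code(pairs):
    """Map each code to the set of distinct names it appears with."""
    mapping = defaultdict(set)
    for code, name in pairs:
        mapping[code].add(name)
    return dict(mapping)


def ambiguous_codes(pairs):
    """Keep only the codes that appear with more than one name."""
    return {
        code: names
        for code, names in names_per_code(pairs).items()
        if len(names) > 1
    }


ambiguous = ambiguous_codes(rows)
print(sorted(ambiguous))
```

In these sample rows only U07.1 carries two spellings of its name, which would be enough to keep the association between the two columns slightly below a perfect one-to-one mapping.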

Reproduction

Analysis started: 2023-12-12 09:09:53.566948
Analysis finished: 2023-12-12 09:09:54.389088
Duration: 0.82 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연령대
Real number (ℝ)

Distinct: 8
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.417266
Minimum: 10
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 10
5-th percentile: 20
Q1: 30
Median: 50
Q3: 50
95-th percentile: 70
Maximum: 80
Range: 70
Interquartile range (IQR): 20
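
The quartiles, range, and IQR above can be recomputed from the value counts that the report lists for 연령대. The sketch below uses only the Python standard library. The report does not state which interpolation convention the profiler uses; `method="inclusive"` is an assumption here, and for this data the choice does not affect the quartiles because the neighbouring order statistics are equal.

```python
import statistics

# Value -> count, taken from the 연령대 frequency table in this report.
counts = {10: 4, 20: 38, 30: 50, 40: 42, 50: 75, 60: 54, 70: 10, 80: 5}

# Expand the frequency table into a sorted list of individual observations.
values = sorted(v for v, c in counts.items() for _ in range(c))

# Quartiles with linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

iqr = q3 - q1
value_range = max(values) - min(values)

print(q1, median, q3, iqr, value_range)
```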

Descriptive statistics

Standard deviation: 15.512147
Coefficient of variation (CV): 0.3572806
Kurtosis: -0.72539903
Mean: 43.417266
Median Absolute Deviation (MAD): 10
Skewness: -0.057879162
Sum: 12070
Variance: 240.6267
Monotonicity: Increasing
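
Several of these figures follow directly from the 연령대 value counts listed in this report. The sketch below recomputes the sum, mean, variance, standard deviation, CV, and MAD with the Python standard library. It assumes the sample (n - 1) form of the variance, which is the form that reproduces the reported 240.6267; skewness and kurtosis are left out because their bias corrections are not stated in the report.

```python
import statistics

# Value -> count, taken from the 연령대 frequency table in this report.
counts = {10: 4, 20: 38, 30: 50, 40: 42, 50: 75, 60: 54, 70: 10, 80: 5}
values = [v for v, c in counts.items() for _ in range(c)]

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
med = statistics.median(values)
mad = statistics.median([abs(v - med) for v in values])

print(total, round(mean, 6), round(variance, 4), round(std, 6), round(cv, 7), mad)
```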
[Figure: histogram with fixed size bins (bins=8)]
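
Common values

Value | Count | Frequency (%)
50 | 75 | 27.0%
60 | 54 | 19.4%
30 | 50 | 18.0%
40 | 42 | 15.1%
20 | 38 | 13.7%
70 | 10 | 3.6%
80 | 5 | 1.8%
10 | 4 | 1.4%

Smallest values (ascending)

Value | Count | Frequency (%)
10 | 4 | 1.4%
20 | 38 | 13.7%
30 | 50 | 18.0%
40 | 42 | 15.1%
50 | 75 | 27.0%
60 | 54 | 19.4%
70 | 10 | 3.6%
80 | 5 | 1.8%

Largest values (descending)

Value | Count | Frequency (%)
80 | 5 | 1.8%
70 | 10 | 3.6%
60 | 54 | 19.4%
50 | 75 | 27.0%
40 | 42 | 15.1%
30 | 50 | 18.0%
20 | 38 | 13.7%
10 | 4 | 1.4%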

성별
Categorical

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
(not shown) | 162 | 58.3%
(not shown) | 116 | 41.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

상병코드
Categorical

HIGH CORRELATION 

Distinct: 39
Distinct (%): 14.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 5
Median length: 5
Mean length: 4.8489209
Min length: 3

Unique

Unique: 14
Unique (%): 5.0%

Sample

1st row: F31.9
2nd row: F32.1
3rd row: F91.8
4th row: F91.8
5th row: F10.2

Common Values

Value | Count | Frequency (%)
U07.1 | 101 | 36.3%
F20.0 | 24 | 8.6%
F20.9 | 23 | 8.3%
F29 | 17 | 6.1%
F31.2 | 12 | 4.3%
F32.1 | 12 | 4.3%
F10.2 | 9 | 3.2%
F32.9 | 7 | 2.5%
F32.2 | 7 | 2.5%
F31.9 | 7 | 2.5%
Other values (29) | 59 | 21.2%

Length

[Figure: histogram of lengths of the category]

상병명
Categorical

HIGH CORRELATION 

Distinct: 40
Distinct (%): 14.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 104
Median length: 97
Mean length: 49.568345
Min length: 19

Unique

Unique: 14
Unique (%): 5.0%

Sample

1st row: Bipolar affective disorder, unspecified
2nd row: Moderate depressive episode
3rd row: Other conduct disorders
4th row: Other conduct disorders
5th row: Dependence syndrome of alcohol

Common Values

Value | Count | Frequency (%)
Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 98 | 35.3%
Paranoid schizophrenia | 24 | 8.6%
Schizophrenia, unspecified | 23 | 8.3%
Unspecified nonorganic psychosis | 17 | 6.1%
Bipolar affective disorder, current episode manic with psychotic symptoms | 12 | 4.3%
Moderate depressive episode | 12 | 4.3%
Dependence syndrome of alcohol | 9 | 3.2%
Depressive episode, unspecified | 7 | 2.5%
Severe depressive episode without psychotic symptoms | 7 | 2.5%
Bipolar affective disorder, unspecified | 7 | 2.5%
Other values (30) | 62 | 22.3%

Length

[Figure: histogram of lengths of the category]

Word frequencies (lowercased tokens)

Value | Count | Frequency (%)
virus | 196 | 12.8%
identified | 196 | 12.8%
disease | 103 | 6.7%
coronavirus | 101 | 6.6%
2019 | 98 | 6.4%
covid-19 | 98 | 6.4%
unspecified | 62 | 4.0%
schizophrenia | 53 | 3.5%
episode | 51 | 3.3%
disorder | 47 | 3.1%
Other values (68) | 530 | 34.5%
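
The token counts in this table look like whole-word counts over the lowercased 상병명 strings: the 98 rows carrying the long COVID-19 name contain "virus" and "identified" twice each, which accounts for 196 occurrences of both. The sketch below shows one tokenization that reproduces those figures. The rule used (split on whitespace, strip surrounding punctuation) is an assumption, not something the report documents, and `word_counts` is a hypothetical helper name.

```python
import string
from collections import Counter


def word_counts(texts):
    """Count lowercased whitespace-separated tokens, ignoring edge punctuation."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            # Remove punctuation at the ends only, so "covid-19" keeps its hyphen.
            token = token.strip(string.punctuation)
            if token:
                counts[token] += 1
    return counts


name = "Coronavirus disease 2019, virus identified [COVID-19, virus identified]"

# The report lists this name 98 times in the Common Values table.
counts = word_counts([name] * 98)
print(counts["virus"], counts["identified"], counts["covid-19"], counts["2019"])
```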

보험유형
Categorical

Distinct: 6
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

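Max length: 8
Median length: 7
Mean length: 5.1007194
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 건강보험
2nd row: 건강보험
3rd row: 건강보험
4th row: 건강보험
5th row: 건강보험

Common Values

Value | Count | Frequency (%)
건강보험 (National Health Insurance) | 129 | 46.4%
의료급여1종 (Medical Aid, Type 1) | 118 | 42.4%
의료급여2종 (Medical Aid, Type 2) | 16 | 5.8%
의료급여2종장애 (Medical Aid, Type 2, disability) | 8 | 2.9%
건강보험장애인 (National Health Insurance, person with disability) | 4 | 1.4%
일반 (general) | 3 | 1.1%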

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

진료년도
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value | Count | Frequency (%)
2020 | 278 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

진료월
Categorical

Distinct: 5
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 7
2nd row: 5
3rd row: 6
4th row: 8
5th row: 4

Common Values

Value | Count | Frequency (%)
4 | 74 | 26.6%
8 | 73 | 26.3%
5 | 57 | 20.5%
7 | 45 | 16.2%
6 | 29 | 10.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation plot]

Variable | 연령대 | 성별 | 상병코드 | 상병명 | 보험유형 | 진료월
연령대 | 1.000 | 0.000 | 0.711 | 0.742 | 0.230 | 0.140
성별 | 0.000 | 1.000 | 0.407 | 0.432 | 0.251 | 0.000
상병코드 | 0.711 | 0.407 | 1.000 | 1.000 | 0.724 | 0.579
상병명 | 0.742 | 0.432 | 1.000 | 1.000 | 0.717 | 0.611
보험유형 | 0.230 | 0.251 | 0.724 | 0.717 | 1.000 | 0.304
진료월 | 0.140 | 0.000 | 0.579 | 0.611 | 0.304 | 1.000

[Figure: correlation plot]

Variable | 진료월 | 상병명 | 상병코드 | 보험유형 | 성별
진료월 | 1.000 | 0.292 | 0.291 | 0.210 | 0.000
상병명 | 0.292 | 1.000 | 0.998 | 0.381 | 0.320
상병코드 | 0.291 | 0.998 | 1.000 | 0.385 | 0.318
보험유형 | 0.210 | 0.381 | 0.385 | 1.000 | 0.179
성별 | 0.000 | 0.320 | 0.318 | 0.179 | 1.000

[Figure: correlation plot]

Variable | 연령대 | 성별 | 상병코드 | 상병명 | 보험유형 | 진료월
연령대 | 1.000 | 0.000 | 0.341 | 0.344 | 0.129 | 0.085
성별 | 0.000 | 1.000 | 0.318 | 0.320 | 0.179 | 0.000
상병코드 | 0.341 | 0.318 | 1.000 | 0.998 | 0.385 | 0.291
상병명 | 0.344 | 0.320 | 0.998 | 1.000 | 0.381 | 0.292
보험유형 | 0.129 | 0.179 | 0.385 | 0.381 | 1.000 | 0.210
진료월 | 0.085 | 0.000 | 0.291 | 0.292 | 0.210 | 1.000
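
The report text does not say which association measure produced each matrix. For pairs of categorical columns, one commonly used measure is Cramér's V, which rescales the chi-squared statistic of the contingency table to the range 0 to 1. The sketch below is a plain, uncorrected version written for illustration; the function name `cramers_v` and the toy inputs are assumptions, and a profiling tool may apply a bias correction that yields slightly different numbers.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Toy data: a one-to-one relabelling versus two unrelated columns.
relabelled = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
unrelated = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(relabelled, unrelated)
```

A value near 1, such as the 0.998 reported between 상병코드 and 상병명, indicates that one column almost fully determines the other.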

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

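First rows

Index | 연령대 | 성별 | 상병코드 | 상병명 | 보험유형 | 진료년도 | 진료월
0 | 10 | (not shown) | F31.9 | Bipolar affective disorder, unspecified | 건강보험 | 2020 | 7
1 | 10 | (not shown) | F32.1 | Moderate depressive episode | 건강보험 | 2020 | 5
2 | 10 | (not shown) | F91.8 | Other conduct disorders | 건강보험 | 2020 | 6
3 | 10 | (not shown) | F91.8 | Other conduct disorders | 건강보험 | 2020 | 8
4 | 20 | (not shown) | F10.2 | Dependence syndrome of alcohol | 건강보험 | 2020 | 4
5 | 20 | (not shown) | F20.0 | Paranoid schizophrenia | 건강보험 | 2020 | 7
6 | 20 | (not shown) | F20.0 | Paranoid schizophrenia | 건강보험 | 2020 | 8
7 | 20 | (not shown) | F20.9 | Schizophrenia, unspecified | 건강보험 | 2020 | 8
8 | 20 | (not shown) | F25.1 | Schizoaffective disorder, depressive type | 건강보험 | 2020 | 7
9 | 20 | (not shown) | F25.1 | Schizoaffective disorder, depressive type | 건강보험 | 2020 | 8

Last rows

Index | 연령대 | 성별 | 상병코드 | 상병명 | 보험유형 | 진료년도 | 진료월
268 | 70 | (not shown) | U07.1 | Coronavirus disease 2019[COVID-19] | 의료급여1종 | 2020 | 4
269 | 70 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 4
270 | 70 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 5
271 | 70 | (not shown) | F25.1 | Schizoaffective disorder, depressive type | 건강보험 | 2020 | 4
272 | 70 | (not shown) | F32.1 | Moderate depressive episode | 건강보험 | 2020 | 7
273 | 80 | (not shown) | F00.1 | Dementia in Alzheimer’s disease with late onset(G30.1†) | 건강보험 | 2020 | 8
274 | 80 | (not shown) | F10.2 | Dependence syndrome of alcohol | 건강보험 | 2020 | 7
275 | 80 | (not shown) | F10.2 | Dependence syndrome of alcohol | 건강보험 | 2020 | 8
276 | 80 | (not shown) | F22.0 | Delusional disorder | 건강보험 | 2020 | 4
277 | 80 | (not shown) | F32.1 | Moderate depressive episode | 의료급여1종 | 2020 | 8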

Duplicate rows

Most frequently occurring

Index | 연령대 | 성별 | 상병코드 | 상병명 | 보험유형 | 진료년도 | 진료월 | # duplicates
14 | 50 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 4 | 9
21 | 60 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 4 | 8
22 | 60 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 5 | 7
15 | 50 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 5 | 6
17 | 50 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 4 | 6
18 | 50 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 5 | 5
7 | 30 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 4 | 4
23 | 60 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 4 | 4
24 | 60 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 의료급여1종 | 2020 | 5 | 4
6 | 30 | (not shown) | U07.1 | Coronavirus disease 2019, virus identified [COVID-19, virus identified] | 건강보험 | 2020 | 8 | 3
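
A table like this can be built by counting identical rows and keeping the patterns that occur more than once. The sketch below assumes that the reported "25 duplicate rows" refers to distinct repeated row patterns; the index values in the table, which run up to 24, are consistent with that reading, but the report does not state it. The helper name `duplicate_groups` is hypothetical and the example rows are made up for illustration.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(rows)
    groups = [(row, n) for row, n in counts.items() if n > 1]
    groups.sort(key=lambda item: item[1], reverse=True)
    return groups


# Made-up rows of (연령대, 상병코드, 진료월); not taken from the dataset.
rows = [
    (50, "U07.1", 4),
    (50, "U07.1", 4),
    (50, "U07.1", 4),
    (60, "U07.1", 5),
    (60, "U07.1", 5),
    (20, "F20.0", 7),
]

groups = duplicate_groups(rows)
share = len(groups) / len(rows)  # repeated patterns as a fraction of all rows

# Under the same reading, the report's own figure is 25 patterns out of 278 rows.
reported_share = round(25 / 278 * 100, 1)
print(groups, share, reported_share)
```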