Overview

Dataset statistics

Number of variables: 4
Number of observations: 206
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.0 KiB
Average record size in memory: 34.6 B

Variable types

Text: 2
Numeric: 2

Alerts

상병코드 has unique values (Unique)
상병명 has unique values (Unique)
국비 has 108 (52.4%) zeros (Zeros)
사비 has 48 (23.3%) zeros (Zeros)
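
The Zeros alerts are ratios of zero-valued cells to the number of observations. As a minimal sketch in plain Python, the percentages can be re-derived from the counts reported here. The helper name `zero_share` is hypothetical and not part of ydata-profiling.

```python
def zero_share(n_zeros: int, n_rows: int) -> float:
    """Return the percentage of zero-valued rows, rounded to one decimal."""
    return round(100 * n_zeros / n_rows, 1)


N_ROWS = 206  # number of observations reported in the overview

print("국비:", zero_share(108, N_ROWS))
print("사비:", zero_share(48, N_ROWS))
```

Both results agree with the percentages shown in the alerts.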

Reproduction

Analysis started: 2024-03-18 04:38:28.827194
Analysis finished: 2024-03-18 04:38:29.855744
Duration: 1.03 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상병코드 (diagnosis code)
Text

UNIQUE 

Distinct: 206
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.4514563
Min length: 3

Characters and Unicode

Total characters: 917
Distinct characters: 29
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
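
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using only the five sample values shown for this variable. It does not reproduce the report's internals, and the standard library does not expose script or block names, so only categories are covered.

```python
import unicodedata
from collections import Counter

# The five sample values shown in this report for 상병코드.
codes = ["A090", "A099", "A319", "B701", "C159"]

# Map every character to its general category code:
# "Lu" is Uppercase Letter, "Nd" is Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for code in codes for ch in code
)

print(category_counts)
```

Each code contributes exactly one uppercase letter, which is consistent with the report showing 206 uppercase letters for 206 values.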

Unique

Unique: 206
Unique (%): 100.0%

Sample

1st row: A090
2nd row: A099
3rd row: A319
4th row: B701
5th row: C159
Value | Count | Frequency (%)
a090 | 1 | 0.5%
m509 | 1 | 0.5%
m512 | 1 | 0.5%
m4809 | 1 | 0.5%
m4836 | 1 | 0.5%
m4854 | 1 | 0.5%
m4855 | 1 | 0.5%
m4856 | 1 | 0.5%
m4859 | 1 | 0.5%
m500 | 1 | 0.5%
Other values (196) | 196 | 95.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 127 | 13.8%
1 | 98 | 10.7%
9 | 87 | 9.5%
2 | 78 | 8.5%
8 | 66 | 7.2%
4 | 60 | 6.5%
3 | 58 | 6.3%
5 | 55 | 6.0%
6 | 44 | 4.8%
M | 39 | 4.3%
Other values (19) | 205 | 22.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 711 | 77.5%
Uppercase Letter | 206 | 22.5%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
M | 39 | 18.9%
K | 22 | 10.7%
S | 22 | 10.7%
E | 15 | 7.3%
N | 13 | 6.3%
G | 13 | 6.3%
C | 13 | 6.3%
D | 12 | 5.8%
J | 11 | 5.3%
R | 11 | 5.3%
Other values (9) | 35 | 17.0%

Decimal Number
Value | Count | Frequency (%)
0 | 127 | 17.9%
1 | 98 | 13.8%
9 | 87 | 12.2%
2 | 78 | 11.0%
8 | 66 | 9.3%
4 | 60 | 8.4%
3 | 58 | 8.2%
5 | 55 | 7.7%
6 | 44 | 6.2%
7 | 38 | 5.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 711 | 77.5%
Latin | 206 | 22.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 39 | 18.9%
K | 22 | 10.7%
S | 22 | 10.7%
E | 15 | 7.3%
N | 13 | 6.3%
G | 13 | 6.3%
C | 13 | 6.3%
D | 12 | 5.8%
J | 11 | 5.3%
R | 11 | 5.3%
Other values (9) | 35 | 17.0%

Common
Value | Count | Frequency (%)
0 | 127 | 17.9%
1 | 98 | 13.8%
9 | 87 | 12.2%
2 | 78 | 11.0%
8 | 66 | 9.3%
4 | 60 | 8.4%
3 | 58 | 8.2%
5 | 55 | 7.7%
6 | 44 | 6.2%
7 | 38 | 5.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 917 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 127 | 13.8%
1 | 98 | 10.7%
9 | 87 | 9.5%
2 | 78 | 8.5%
8 | 66 | 7.2%
4 | 60 | 6.5%
3 | 58 | 6.3%
5 | 55 | 6.0%
6 | 44 | 4.8%
M | 39 | 4.3%
Other values (19) | 205 | 22.4%

상병명 (diagnosis name)
Text

UNIQUE 

Distinct: 206
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 66
Median length: 35.5
Mean length: 17.436893
Min length: 2

Characters and Unicode

Total characters: 3592
Distinct characters: 340
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 206
Unique (%): 100.0%

Sample

1st row: 감염성 기원의 기타 및 상세불명의 위장염 및 결장염
2nd row: 상세불명 기원의 위장염 및 결장염
3rd row: 상세불명의 마이코박테리아감염
4th row: 고충증
5th row: 상세불명의 식도의 악성 신생물
Value | Count | Frequency (%)
상세불명의 | 63 | 7.5%
 | 40 | 4.7%
기타 | 37 | 4.4%
동반한 | 24 | 2.8%
신생물 | 22 | 2.6%
않은 | 18 | 2.1%
상세불명 | 16 | 1.9%
폐쇄성 | 14 | 1.7%
악성 | 12 | 1.4%
당뇨병 | 11 | 1.3%
Other values (368) | 588 | 69.6%

Most occurring characters

ValueCountFrequency (%)
642
 
17.9%
149
 
4.1%
100
 
2.8%
96
 
2.7%
90
 
2.5%
84
 
2.3%
83
 
2.3%
, 76
 
2.1%
66
 
1.8%
56
 
1.6%
Other values (330) 2150
59.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2683 | 74.7%
Space Separator | 642 | 17.9%
Other Punctuation | 85 | 2.4%
Lowercase Letter | 77 | 2.1%
Decimal Number | 41 | 1.1%
Open Punctuation | 20 | 0.6%
Close Punctuation | 20 | 0.6%
Uppercase Letter | 17 | 0.5%
Dash Punctuation | 5 | 0.1%
Math Symbol | 2 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
149
 
5.6%
100
 
3.7%
96
 
3.6%
90
 
3.4%
84
 
3.1%
83
 
3.1%
66
 
2.5%
56
 
2.1%
48
 
1.8%
44
 
1.6%
Other values (281) 1867
69.6%
Lowercase Letter
Value | Count | Frequency (%)
s | 11 | 14.3%
i | 9 | 11.7%
r | 8 | 10.4%
t | 7 | 9.1%
e | 7 | 9.1%
d | 6 | 7.8%
a | 5 | 6.5%
o | 5 | 6.5%
n | 4 | 5.2%
p | 4 | 5.2%
Other values (8) | 11 | 14.3%

Decimal Number
Value | Count | Frequency (%)
2 | 11 | 26.8%
1 | 8 | 19.5%
9 | 5 | 12.2%
3 | 5 | 12.2%
0 | 5 | 12.2%
8 | 2 | 4.9%
5 | 2 | 4.9%
4 | 1 | 2.4%
7 | 1 | 2.4%
6 | 1 | 2.4%

Uppercase Letter
Value | Count | Frequency (%)
G | 6 | 35.3%
P | 2 | 11.8%
N | 2 | 11.8%
F | 1 | 5.9%
U | 1 | 5.9%
O | 1 | 5.9%
C | 1 | 5.9%
M | 1 | 5.9%
S | 1 | 5.9%
D | 1 | 5.9%

Other Punctuation
Value | Count | Frequency (%)
, | 76 | 89.4%
* | 5 | 5.9%
. | 3 | 3.5%
 | 1 | 1.2%

Open Punctuation
Value | Count | Frequency (%)
( | 17 | 85.0%
[ | 3 | 15.0%

Close Punctuation
Value | Count | Frequency (%)
) | 17 | 85.0%
] | 3 | 15.0%

Space Separator
Value | Count | Frequency (%)
(space) | 642 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2681 | 74.6%
Common | 815 | 22.7%
Latin | 94 | 2.6%
Han | 2 | 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
149
 
5.6%
100
 
3.7%
96
 
3.6%
90
 
3.4%
84
 
3.1%
83
 
3.1%
66
 
2.5%
56
 
2.1%
48
 
1.8%
44
 
1.6%
Other values (279) 1865
69.6%
Latin
Value | Count | Frequency (%)
s | 11 | 11.7%
i | 9 | 9.6%
r | 8 | 8.5%
t | 7 | 7.4%
e | 7 | 7.4%
G | 6 | 6.4%
d | 6 | 6.4%
a | 5 | 5.3%
o | 5 | 5.3%
n | 4 | 4.3%
Other values (18) | 26 | 27.7%

Common
Value | Count | Frequency (%)
(space) | 642 | 78.8%
, | 76 | 9.3%
( | 17 | 2.1%
) | 17 | 2.1%
2 | 11 | 1.3%
1 | 8 | 1.0%
9 | 5 | 0.6%
- | 5 | 0.6%
* | 5 | 0.6%
3 | 5 | 0.6%
Other values (11) | 24 | 2.9%
Han
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2681 | 74.6%
ASCII | 908 | 25.3%
CJK | 2 | 0.1%
Punctuation | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 642 | 70.7%
, | 76 | 8.4%
( | 17 | 1.9%
) | 17 | 1.9%
2 | 11 | 1.2%
s | 11 | 1.2%
i | 9 | 1.0%
1 | 8 | 0.9%
r | 8 | 0.9%
t | 7 | 0.8%
Other values (38) | 102 | 11.2%
Hangul
ValueCountFrequency (%)
149
 
5.6%
100
 
3.7%
96
 
3.6%
90
 
3.4%
84
 
3.1%
83
 
3.1%
66
 
2.5%
56
 
2.1%
48
 
1.8%
44
 
1.6%
Other values (279) 1865
69.6%
CJK
ValueCountFrequency (%)
1
50.0%
1
50.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

국비 (government-funded)
Real number (ℝ)

ZEROS 

Distinct: 13
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3592233
Minimum: 0
Maximum: 42
Zeros: 108
Zeros (%): 52.4%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 4
Maximum: 42
Range: 42
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 4.1067568
Coefficient of variation (CV): 3.0213996
Kurtosis: 55.624989
Mean: 1.3592233
Median Absolute Deviation (MAD): 0
Skewness: 6.8234862
Sum: 280
Variance: 16.865451
Monotonicity: Not monotonic
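
Several of these figures can be cross-checked from the 국비 frequency tables alone. The sketch below uses only Python's standard `statistics` module and the 13 value/count pairs listed for this variable. It assumes the report's variance uses the sample (n - 1) denominator, which is the convention that matches the printed numbers.

```python
import statistics

# Value -> count pairs copied from the 국비 frequency tables in this report.
value_counts = {0: 108, 1: 64, 2: 12, 3: 9, 4: 3, 5: 2, 6: 2,
                11: 1, 15: 1, 17: 1, 20: 1, 26: 1, 42: 1}

# Expand the table back into one value per observation.
values = [v for v, n in value_counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation

print("observations:", len(values), "sum:", sum(values))
print("mean:", round(mean, 7))
print("variance:", round(variance, 6))
print("std:", round(std, 7))
print("cv:", round(cv, 7))
```

The expanded list has 206 observations summing to 280, in line with the overview and the Sum entry.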
Histogram with fixed size bins (bins=13)
Value | Count | Frequency (%)
0 | 108 | 52.4%
1 | 64 | 31.1%
2 | 12 | 5.8%
3 | 9 | 4.4%
4 | 3 | 1.5%
5 | 2 | 1.0%
6 | 2 | 1.0%
11 | 1 | 0.5%
17 | 1 | 0.5%
20 | 1 | 0.5%
Other values (3) | 3 | 1.5%

Value | Count | Frequency (%)
0 | 108 | 52.4%
1 | 64 | 31.1%
2 | 12 | 5.8%
3 | 9 | 4.4%
4 | 3 | 1.5%
5 | 2 | 1.0%
6 | 2 | 1.0%
11 | 1 | 0.5%
15 | 1 | 0.5%
17 | 1 | 0.5%

Value | Count | Frequency (%)
42 | 1 | 0.5%
26 | 1 | 0.5%
20 | 1 | 0.5%
17 | 1 | 0.5%
15 | 1 | 0.5%
11 | 1 | 0.5%
6 | 2 | 1.0%
5 | 2 | 1.0%
4 | 3 | 1.5%
3 | 9 | 4.4%

사비 (self-funded)
Real number (ℝ)

ZEROS 

Distinct: 13
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.1941748
Minimum: 0
Maximum: 64
Zeros: 48
Zeros (%): 23.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 2
95-th percentile: 5
Maximum: 64
Range: 64
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 5.7831686
Coefficient of variation (CV): 2.6356919
Kurtosis: 69.065346
Mean: 2.1941748
Median Absolute Deviation (MAD): 1
Skewness: 7.5408784
Sum: 452
Variance: 33.445039
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Value | Count | Frequency (%)
1 | 93 | 45.1%
0 | 48 | 23.3%
2 | 31 | 15.0%
3 | 13 | 6.3%
4 | 7 | 3.4%
5 | 5 | 2.4%
6 | 2 | 1.0%
29 | 2 | 1.0%
19 | 1 | 0.5%
21 | 1 | 0.5%
Other values (3) | 3 | 1.5%

Value | Count | Frequency (%)
0 | 48 | 23.3%
1 | 93 | 45.1%
2 | 31 | 15.0%
3 | 13 | 6.3%
4 | 7 | 3.4%
5 | 5 | 2.4%
6 | 2 | 1.0%
7 | 1 | 0.5%
19 | 1 | 0.5%
21 | 1 | 0.5%

Value | Count | Frequency (%)
64 | 1 | 0.5%
29 | 2 | 1.0%
24 | 1 | 0.5%
21 | 1 | 0.5%
19 | 1 | 0.5%
7 | 1 | 0.5%
6 | 2 | 1.0%
5 | 5 | 2.4%
4 | 7 | 3.4%
3 | 13 | 6.3%

Interactions


Correlations

 | 국비 | 사비
국비 | 1.000 | 0.816
사비 | 0.816 | 1.000

 | 국비 | 사비
국비 | 1.000 | -0.133
사비 | -0.133 | 1.000
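
The extracted text does not say which correlation coefficient each matrix uses. For readers who want to experiment, the sketch below computes a plain Pearson coefficient over the 20 rows listed in the Sample section (first 10 and last 10). The choice of Pearson is an assumption made for illustration, the function `pearson` is written here and is not a ydata-profiling API, and a result on 20 rows should not be expected to match either matrix, since those were computed on all 206 observations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# 국비 and 사비 for the 20 rows in the Sample section (indices 0-9 and 196-205).
gukbi = [1, 2, 1, 0, 0, 1, 1, 0, 0, 4, 0, 1, 0, 0, 0, 0, 0, 3, 1, 0]
sabi = [2, 3, 0, 2, 1, 0, 2, 3, 2, 0, 1, 1, 1, 1, 2, 1, 1, 64, 0, 1]

r = pearson(gukbi, sabi)
print(f"Pearson r on the 20 sample rows: {r:.3f}")
```

A single extreme row such as U071 (사비 = 64) can dominate a Pearson coefficient, which is one reason rank-based and linear measures may disagree on data this skewed.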

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 상병코드 | 상병명 | 국비 | 사비
0 | A090 | 감염성 기원의 기타 및 상세불명의 위장염 및 결장염 | 1 | 2
1 | A099 | 상세불명 기원의 위장염 및 결장염 | 2 | 3
2 | A319 | 상세불명의 마이코박테리아감염 | 1 | 0
3 | B701 | 고충증 | 0 | 2
4 | C159 | 상세불명의 식도의 악성 신생물 | 0 | 1
5 | C1691 | 상세불명의 위의 악성 신생물, 진행형 | 1 | 0
6 | C189 | 상세불명의 결장의 악성 신생물 | 1 | 2
7 | C220 | 간세포암종의 악성 신생물 | 0 | 3
8 | C252 | 췌장의 꼬리의 악성 신생물 | 0 | 2
9 | C3491 | 상세불명의 기관지 또는 폐의 악성 신생물, 오른쪽 | 4 | 0

Index | 상병코드 | 상병명 | 국비 | 사비
196 | S7600 | 엉덩이의 근육 및 힘줄의 손상, 열상 | 0 | 1
197 | S82320 | 비골골절(모든 부분)을 동반한 경골 몸통의 골절, 폐쇄성 | 1 | 1
198 | S82430 | 비골만의 몸통의 골절, 폐쇄성 | 0 | 1
199 | S82820 | 양측 복사골절, 발목, 폐쇄성 | 0 | 1
200 | S82830 | 삼복사골절, 발목, 폐쇄성 | 0 | 2
201 | S8320 | 내측반달연골의 찢김 | 0 | 1
202 | T814 | 달리 분류되지 않은 처치에 따른 감염 | 0 | 1
203 | U071 | 바이러스가 확인된 코로나바이러스 질환 2019 [바이러스가 확인된 코로나-19] | 3 | 64
204 | Z048 | 기타 명시된 이유의 검사 및 관찰 | 1 | 0
205 | Z470 | 골절판 및 기타 내부고정장치의 제거를 포함한 추적치료를 위하여 보건서비스와 접하고 있는 사람 | 0 | 1