Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 843
Missing cells (%): 1.4%
Duplicate rows: 56
Duplicate rows (%): 0.6%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
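
The two percentages follow from the raw counts. The sketch below assumes that missing cells are measured against all cells (variables × observations) and duplicate rows against the number of observations. The variable names are illustrative and not part of the report.

```python
# Counts copied from the dataset statistics of this report.
n_variables = 6
n_observations = 10000
missing_cells = 843
duplicate_rows = 56

# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Rounded to one decimal place, both values agree with the figures reported here.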

Variable types

Text: 2
Categorical: 1
Numeric: 1
DateTime: 2

Dataset

Description: Loan records by library user (as of 2020): book title, gender, birth year, loan date, return date, etc.
Author: Dongjak-gu, Seoul (서울특별시 동작구)
URL: https://www.data.go.kr/data/15065639/fileData.do

Alerts

Dataset has 56 (0.6%) duplicate rows (Duplicates)
반납일 has 833 (8.3%) missing values (Missing)

Reproduction

Analysis started: 2023-12-12 09:09:51.669883
Analysis finished: 2023-12-12 09:09:53.429064
Duration: 1.76 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

서명 (book title)
Text

Distinct: 8761
Distinct (%): 87.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 138
Median length: 78
Mean length: 17.6853
Min length: 1

Characters and Unicode

Total characters: 176853
Distinct characters: 1409
Distinct categories: 17
Distinct scripts: 4
Distinct blocks: 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
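
As a minimal sketch of how such character properties can be read, the snippet below counts characters by Unicode general category with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The standard library returns two-letter codes (for example `Lo` for Other Letter, `Zs` for Space Separator, `Ll` for Lowercase Letter) and does not expose the script or block properties, so this illustrates only the category breakdown and is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# Two titles taken from the sample rows of this variable.
latin = category_counts("Gulliver's travels")
hangul = category_counts("날마다 그림")

print(dict(latin))
print(dict(hangul))
```

Hangul syllables fall under `Lo` (Other Letter), which is why that category dominates a column of mostly Korean titles.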

Unique

Unique: 7789
Unique (%): 77.9%

Sample

1st row: 날마다 그림
2nd row: 악몽을 파는 가게 : 스티븐 킹 단편집. 2
3rd row: Gulliver's travels
4th row: 나는 어린이입니다:철학동화
5th row: 치카치카 군단과 충치왕국
Value | Count | Frequency (%)
(not shown) | 2008 | 4.6%
the | 417 | 1.0%
장편소설 | 318 | 0.7%
이야기 | 300 | 0.7%
1 | 216 | 0.5%
and | 179 | 0.4%
2 | 178 | 0.4%
(not shown) | 158 | 0.4%
우리 | 140 | 0.3%
a | 121 | 0.3%
Other values (16984) | 39779 | 90.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 34008 | 19.2%
e | 3286 | 1.9%
(not shown) | 3197 | 1.8%
: | 2684 | 1.5%
(not shown) | 2550 | 1.4%
a | 2224 | 1.3%
o | 2048 | 1.2%
(not shown) | 1984 | 1.1%
t | 1958 | 1.1%
n | 1836 | 1.0%
Other values (1399) | 121078 | 68.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 100728 | 57.0%
Space Separator | 34008 | 19.2%
Lowercase Letter | 25391 | 14.4%
Other Punctuation | 7077 | 4.0%
Uppercase Letter | 3580 | 2.0%
Decimal Number | 2665 | 1.5%
Close Punctuation | 1472 | 0.8%
Open Punctuation | 1472 | 0.8%
Math Symbol | 322 | 0.2%
Dash Punctuation | 103 | 0.1%
Other values (7) | 35 | < 0.1%

Most frequent character per category

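Other Letter
Value | Count | Frequency (%)
(not shown) | 3197 | 3.2%
(not shown) | 2550 | 2.5%
(not shown) | 1984 | 2.0%
(not shown) | 1579 | 1.6%
(not shown) | 1483 | 1.5%
(not shown) | 1448 | 1.4%
(not shown) | 1415 | 1.4%
(not shown) | 1405 | 1.4%
(not shown) | 1258 | 1.2%
(not shown) | 1201 | 1.2%
Other values (1289) | 83208 | 82.6%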
Lowercase Letter
Value | Count | Frequency (%)
e | 3286 | 12.9%
a | 2224 | 8.8%
o | 2048 | 8.1%
t | 1958 | 7.7%
n | 1836 | 7.2%
r | 1744 | 6.9%
s | 1719 | 6.8%
i | 1693 | 6.7%
h | 1346 | 5.3%
l | 1061 | 4.2%
Other values (16) | 6476 | 25.5%
Uppercase Letter
Value | Count | Frequency (%)
T | 453 | 12.7%
S | 318 | 8.9%
B | 272 | 7.6%
M | 256 | 7.2%
W | 206 | 5.8%
D | 205 | 5.7%
A | 203 | 5.7%
C | 198 | 5.5%
P | 174 | 4.9%
H | 163 | 4.6%
Other values (16) | 1132 | 31.6%
Other Punctuation
Value | Count | Frequency (%)
: | 2684 | 37.9%
, | 1531 | 21.6%
. | 1031 | 14.6%
! | 855 | 12.1%
? | 535 | 7.6%
' | 250 | 3.5%
· | 113 | 1.6%
& | 24 | 0.3%
% | 10 | 0.1%
; | 10 | 0.1%
Other values (8) | 34 | 0.5%
Decimal Number
Value | Count | Frequency (%)
1 | 748 | 28.1%
2 | 468 | 17.6%
0 | 455 | 17.1%
3 | 279 | 10.5%
4 | 175 | 6.6%
5 | 163 | 6.1%
7 | 98 | 3.7%
6 | 96 | 3.6%
9 | 95 | 3.6%
8 | 88 | 3.3%
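Math Symbol
Value | Count | Frequency (%)
= | 281 | 87.3%
~ | 25 | 7.8%
+ | 5 | 1.6%
> | 3 | 0.9%
< | 3 | 0.9%
| (vertical bar) | 3 | 0.9%
× | 2 | 0.6%
Close Punctuation
Value | Count | Frequency (%)
) | 1416 | 96.2%
] | 51 | 3.5%
(not shown) | 2 | 0.1%
(not shown) | 2 | 0.1%
(not shown) | 1 | 0.1%
Open Punctuation
Value | Count | Frequency (%)
( | 1416 | 96.2%
[ | 51 | 3.5%
(not shown) | 2 | 0.1%
(not shown) | 2 | 0.1%
(not shown) | 1 | 0.1%
Letter Number
Value | Count | Frequency (%)
(not shown) | 7 | 53.8%
(not shown) | 3 | 23.1%
(not shown) | 2 | 15.4%
(not shown) | 1 | 7.7%
Currency Symbol
Value | Count | Frequency (%)
¤ | 1 | 50.0%
(not shown) | 1 | 50.0%
Space Separator
Value | Count | Frequency (%)
(space) | 34008 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 103 | 100.0%
Modifier Symbol
Value | Count | Frequency (%)
` | 7 | 100.0%
Final Punctuation
Value | Count | Frequency (%)
(not shown) | 5 | 100.0%
Initial Punctuation
Value | Count | Frequency (%)
(not shown) | 4 | 100.0%
Other Symbol
Value | Count | Frequency (%)
(not shown) | 2 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 2 | 100.0%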

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 100538 | 56.8%
Common | 47141 | 26.7%
Latin | 28984 | 16.4%
Han | 190 | 0.1%

Most frequent character per script

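Hangul
Value | Count | Frequency (%)
(not shown) | 3197 | 3.2%
(not shown) | 2550 | 2.5%
(not shown) | 1984 | 2.0%
(not shown) | 1579 | 1.6%
(not shown) | 1483 | 1.5%
(not shown) | 1448 | 1.4%
(not shown) | 1415 | 1.4%
(not shown) | 1405 | 1.4%
(not shown) | 1258 | 1.3%
(not shown) | 1201 | 1.2%
Other values (1225) | 83018 | 82.6%
Han
Value | Count | Frequency (%)
(not shown) | 12 | 6.3%
(not shown) | 11 | 5.8%
(not shown) | 11 | 5.8%
(not shown) | 11 | 5.8%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
Other values (54) | 85 | 44.7%
Latin
Value | Count | Frequency (%)
e | 3286 | 11.3%
a | 2224 | 7.7%
o | 2048 | 7.1%
t | 1958 | 6.8%
n | 1836 | 6.3%
r | 1744 | 6.0%
s | 1719 | 5.9%
i | 1693 | 5.8%
h | 1346 | 4.6%
l | 1061 | 3.7%
Other values (46) | 10069 | 34.7%
Common
Value | Count | Frequency (%)
(space) | 34008 | 72.1%
: | 2684 | 5.7%
, | 1531 | 3.2%
) | 1416 | 3.0%
( | 1416 | 3.0%
. | 1031 | 2.2%
! | 855 | 1.8%
1 | 748 | 1.6%
? | 535 | 1.1%
2 | 468 | 1.0%
Other values (44) | 2449 | 5.2%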

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 100523 | 56.8%
ASCII | 75950 | 42.9%
CJK | 189 | 0.1%
None | 145 | 0.1%
Punctuation | 15 | < 0.1%
Compat Jamo | 15 | < 0.1%
Number Forms | 13 | < 0.1%
Misc Symbols | 2 | < 0.1%
CJK Compat Ideographs | 1 | < 0.1%

Most frequent character per block

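ASCII
Value | Count | Frequency (%)
(space) | 34008 | 44.8%
e | 3286 | 4.3%
: | 2684 | 3.5%
a | 2224 | 2.9%
o | 2048 | 2.7%
t | 1958 | 2.6%
n | 1836 | 2.4%
r | 1744 | 2.3%
s | 1719 | 2.3%
i | 1693 | 2.2%
Other values (77) | 22750 | 30.0%
Hangul
Value | Count | Frequency (%)
(not shown) | 3197 | 3.2%
(not shown) | 2550 | 2.5%
(not shown) | 1984 | 2.0%
(not shown) | 1579 | 1.6%
(not shown) | 1483 | 1.5%
(not shown) | 1448 | 1.4%
(not shown) | 1415 | 1.4%
(not shown) | 1405 | 1.4%
(not shown) | 1258 | 1.3%
(not shown) | 1201 | 1.2%
Other values (1222) | 83003 | 82.6%
None
Value | Count | Frequency (%)
· | 113 | 77.9%
(not shown) | 9 | 6.2%
(not shown) | 4 | 2.8%
(not shown) | 2 | 1.4%
(not shown) | 2 | 1.4%
(not shown) | 2 | 1.4%
(not shown) | 2 | 1.4%
(not shown) | 2 | 1.4%
(not shown) | 2 | 1.4%
× | 2 | 1.4%
Other values (5) | 5 | 3.4%
CJK
Value | Count | Frequency (%)
(not shown) | 12 | 6.3%
(not shown) | 11 | 5.8%
(not shown) | 11 | 5.8%
(not shown) | 11 | 5.8%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
(not shown) | 10 | 5.3%
Other values (53) | 84 | 44.4%
Number Forms
Value | Count | Frequency (%)
(not shown) | 7 | 53.8%
(not shown) | 3 | 23.1%
(not shown) | 2 | 15.4%
(not shown) | 1 | 7.7%
Punctuation
Value | Count | Frequency (%)
(not shown) | 6 | 40.0%
(not shown) | 5 | 33.3%
(not shown) | 4 | 26.7%
Compat Jamo
Value | Count | Frequency (%)
(not shown) | 5 | 33.3%
(not shown) | 5 | 33.3%
(not shown) | 5 | 33.3%
Misc Symbols
Value | Count | Frequency (%)
(not shown) | 2 | 100.0%
CJK Compat Ideographs
Value | Count | Frequency (%)
(not shown) | 1 | 100.0%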

성별 (gender)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.2985
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (not shown)
2nd row: (not shown)
3rd row: <NA>
4th row: (not shown)
5th row: (not shown)

Common Values

Value | Count | Frequency (%)
(not shown) | 6003 | 60.0%
(not shown) | 3002 | 30.0%
<NA> | 995 | 10.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
(not shown) | 6003 | 60.0%
(not shown) | 3002 | 30.0%
na | 995 | 10.0%

생년 (birth year)
Real number (ℝ)

Distinct: 81
Distinct (%): 0.8%
Missing: 10
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1987.0931
Minimum: 1938
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1938
5th percentile: 1964
Q1: 1976
Median: 1981
Q3: 2007
95th percentile: 2013
Maximum: 2019
Range: 81
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 16.863545
Coefficient of variation (CV): 0.0084865401
Kurtosis: -0.83210111
Mean: 1987.0931
Median Absolute Deviation (MAD): 7
Skewness: 0.28709982
Sum: 19851060
Variance: 284.37916
Monotonicity: Not monotonic
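
Several of these statistics are simple functions of one another. The sketch below recomputes the coefficient of variation, variance, range, and interquartile range from the reported mean, standard deviation, and quantiles. It is a consistency check on the published figures, not the profiler's own code, and the variable names are illustrative.

```python
import math

# Values copied from the statistics reported for this variable.
mean = 1987.0931
std = 16.863545
q1, q3 = 1976, 2007
minimum, maximum = 1938, 2019

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV: {cv:.8f}")
print(f"Variance: {variance:.4f}")
print(f"IQR: {iqr}, range: {value_range}")

# The recomputed values agree with the reported ones up to rounding.
assert math.isclose(cv, 0.0084865401, rel_tol=1e-6)
assert math.isclose(variance, 284.37916, rel_tol=1e-6)
```

The very small CV reflects that birth years cluster in a narrow band relative to their magnitude (around 1987), so the standard deviation or IQR is the more informative measure of spread for this variable.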
Histogram with fixed size bins (bins=50)
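
A fixed-size binning like the one named in the caption can be sketched as follows. The function `fixed_width_histogram` and the short list of years are hypothetical. The list only spans the reported minimum and maximum and is not the real column.

```python
def fixed_width_histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            # All values are identical: everything lands in the first bin.
            idx = 0
        else:
            # The maximum value is placed in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return width, counts


# Hypothetical birth years covering the reported range 1938 to 2019.
years = [1938, 1964, 1976, 1979, 1979, 1981, 2007, 2012, 2013, 2019]
width, counts = fixed_width_histogram(years, bins=50)

print(f"bin width: {width:.2f} years")
print(f"non-empty bins: {sum(1 for c in counts if c)}")
```

With 50 bins over an 81-year range each bin covers about 1.62 years, so adjacent birth years can share a bin.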

Common values

Value | Count | Frequency (%)
1979 | 595 | 5.9%
1978 | 531 | 5.3%
1982 | 498 | 5.0%
1980 | 496 | 5.0%
2012 | 459 | 4.6%
1977 | 452 | 4.5%
1981 | 449 | 4.5%
1976 | 446 | 4.5%
2011 | 394 | 3.9%
1975 | 376 | 3.8%
Other values (71) | 5294 | 52.9%

Minimum 10 values

Value | Count | Frequency (%)
1938 | 5 | 0.1%
1939 | 2 | < 0.1%
1940 | 5 | 0.1%
1942 | 6 | 0.1%
1943 | 14 | 0.1%
1944 | 4 | < 0.1%
1945 | 6 | 0.1%
1946 | 11 | 0.1%
1947 | 7 | 0.1%
1948 | 5 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
2019 | 21 | 0.2%
2018 | 15 | 0.1%
2017 | 40 | 0.4%
2016 | 55 | 0.5%
2015 | 113 | 1.1%
2014 | 224 | 2.2%
2013 | 289 | 2.9%
2012 | 459 | 4.6%
2011 | 394 | 3.9%
2010 | 365 | 3.6%

분류번호 (classification number)
Text

Distinct: 946
Distinct (%): 9.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 5
Mean length: 4.5588
Min length: 1

Characters and Unicode

Total characters: 45588
Distinct characters: 17
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 491
Unique (%): 4.9%

Sample

1st row: 652.52
2nd row: 808
3rd row: 747
4th row: 863
5th row: 813.8
Value | Count | Frequency (%)
813.8 | 1203 | 12.0%
843.6 | 919 | 9.2%
843 | 456 | 4.6%
813.7 | 367 | 3.7%
747 | 338 | 3.4%
408 | 284 | 2.8%
808.91 | 266 | 2.7%
833.8 | 261 | 2.6%
843.5 | 229 | 2.3%
808.9 | 224 | 2.2%
Other values (935) | 5453 | 54.5%

Most occurring characters

Value | Count | Frequency (%)
8 | 9889 | 21.7%
3 | 6753 | 14.8%
. | 6678 | 14.6%
1 | 5342 | 11.7%
4 | 3860 | 8.5%
0 | 2665 | 5.8%
9 | 2473 | 5.4%
7 | 2376 | 5.2%
5 | 2052 | 4.5%
6 | 1909 | 4.2%
Other values (7) | 1591 | 3.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 38903 | 85.3%
Other Punctuation | 6679 | 14.7%
Other Letter | 4 | < 0.1%
Lowercase Letter | 1 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
8 | 9889 | 25.4%
3 | 6753 | 17.4%
1 | 5342 | 13.7%
4 | 3860 | 9.9%
0 | 2665 | 6.9%
9 | 2473 | 6.4%
7 | 2376 | 6.1%
5 | 2052 | 5.3%
6 | 1909 | 4.9%
2 | 1584 | 4.1%
Other Letter
Value | Count | Frequency (%)
(not shown) | 2 | 50.0%
(not shown) | 1 | 25.0%
(not shown) | 1 | 25.0%
Other Punctuation
Value | Count | Frequency (%)
. | 6678 | > 99.9%
/ | 1 | < 0.1%
Lowercase Letter
Value | Count | Frequency (%)
b | 1 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 45583 | > 99.9%
Hangul | 4 | < 0.1%
Latin | 1 | < 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
8 | 9889 | 21.7%
3 | 6753 | 14.8%
. | 6678 | 14.7%
1 | 5342 | 11.7%
4 | 3860 | 8.5%
0 | 2665 | 5.8%
9 | 2473 | 5.4%
7 | 2376 | 5.2%
5 | 2052 | 4.5%
6 | 1909 | 4.2%
Other values (3) | 1586 | 3.5%
Hangul
Value | Count | Frequency (%)
(not shown) | 2 | 50.0%
(not shown) | 1 | 25.0%
(not shown) | 1 | 25.0%
Latin
Value | Count | Frequency (%)
b | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 45584 | > 99.9%
Compat Jamo | 2 | < 0.1%
Hangul | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
8 | 9889 | 21.7%
3 | 6753 | 14.8%
. | 6678 | 14.6%
1 | 5342 | 11.7%
4 | 3860 | 8.5%
0 | 2665 | 5.8%
9 | 2473 | 5.4%
7 | 2376 | 5.2%
5 | 2052 | 4.5%
6 | 1909 | 4.2%
Other values (4) | 1587 | 3.5%
Compat Jamo
Value | Count | Frequency (%)
(not shown) | 2 | 100.0%
Hangul
Value | Count | Frequency (%)
(not shown) | 1 | 50.0%
(not shown) | 1 | 50.0%

대출일 (loan date)
Date

Distinct: 88
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2020-01-02 00:00:00
Maximum: 2020-03-31 00:00:00
Histogram with fixed size bins (bins=50)

반납일 (return date)
Date

MISSING 

Distinct: 99
Distinct (%): 1.1%
Missing: 833
Missing (%): 8.3%
Memory size: 156.2 KiB
Minimum: 2020-01-02 00:00:00
Maximum: 2020-04-09 00:00:00
Histogram with fixed size bins (bins=50)

Interactions


Correlations

Variable | 성별 | 생년 | 대출일 | 반납일
성별 | 1.000 | 0.251 | 0.114 | 0.084
생년 | 0.251 | 1.000 | 0.177 | 0.237
대출일 | 0.114 | 0.177 | 1.000 | 0.900
반납일 | 0.084 | 0.237 | 0.900 | 1.000

Variable | 생년 | 성별
생년 | 1.000 | 0.191
성별 | 0.191 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
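
One common way to compute a nullity correlation is the Pearson correlation between the missingness indicators of two columns. The sketch below assumes that definition. The function `nullity_correlation` and the toy columns are hypothetical and do not come from the dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    None marks a missing cell. Returns None when either column has no
    variation in missingness (always present or always missing).
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy columns, not taken from the dataset.
birth_year = [1970, None, 2009, 2008, None, 2011]
return_date = ["2020-02-14", None, "2020-02-13", "2020-01-21", None, "2020-03-31"]
loan_date = ["2020-01-29", "2020-03-31", "2020-01-30",
             "2020-01-11", "2020-01-03", "2020-02-20"]

same = nullity_correlation(birth_year, return_date)       # missing together
undefined = nullity_correlation(birth_year, loan_date)    # loan_date never missing
opposite = nullity_correlation([1, None, 3, None], [None, 2, None, 4])

print(same, undefined, opposite)
```

A value near 1 means two variables tend to be missing in the same rows, a value near -1 means one is present exactly when the other is missing, and a column that is never missing has no defined nullity correlation.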

Sample

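Index | 서명 | 성별 | 생년 | 분류번호 | 대출일 | 반납일
83023 | 날마다 그림 | (not shown) | 1970 | 652.52 | 2020-01-29 | 2020-02-14
45957 | 악몽을 파는 가게 : 스티븐 킹 단편집. 2 | (not shown) | 1981 | 808 | 2020-03-31 | <NA>
92479 | Gulliver's travels | <NA> | 2009 | 747 | 2020-01-30 | 2020-02-13
13652 | 나는 어린이입니다:철학동화 | (not shown) | 2008 | 863 | 2020-01-11 | 2020-01-21
20327 | 치카치카 군단과 충치왕국 | (not shown) | 2013 | 813.8 | 2020-01-03 | 2020-01-16
66441 | Homework! | (not shown) | 2011 | 843.6 | 2020-02-20 | 2020-03-31
64114 | 동전 하나로도 행복했던 구멍가게의 날들 | (not shown) | 1995 | 818 | 2020-03-03 | 2020-03-04
40595 | 백년 목 | (not shown) | 1966 | 514.321 | 2020-01-16 | 2020-01-29
36138 | 쿠키 : 한 입의 인생 수업 | (not shown) | 1976 | 843 | 2020-01-02 | 2020-01-16
7291 | Dog man. [2], Unleashed | (not shown) | 1981 | 843.6 | 2020-01-29 | 2020-02-06

Index | 서명 | 성별 | 생년 | 분류번호 | 대출일 | 반납일
105 | UFO를 따라간 외계인 | <NA> | 2009 | 813.8 | 2020-02-09 | 2020-03-03
36028 | 초록은 어디에 있을까? | (not shown) | 2015 | 650.8 | 2020-02-04 | 2020-02-13
63613 | 나의 뇌는 특별하다 : 템플 그랜딘의 자폐성 뇌 이야기 | (not shown) | 1978 | 513.896 | 2020-02-14 | 2020-02-20
21821 | 성性 정치학 | (not shown) | 1971 | 337 | 2020-03-07 | 2020-03-14
2383 | Little Beauty | (not shown) | 2011 | 375.1 | 2020-01-29 | 2020-02-14
26681 | 엉덩이탐정 : 뿡뿡 무지개 다이아몬드를 찾아라! | (not shown) | 1980 | 833.8 | 2020-01-17 | 2020-02-04
10782 | 국가대표 물고기 금붕이 | (not shown) | 1974 | 813.8 | 2020-02-02 | 2020-02-16
86401 | (에곤 실레)백 년간의 잠:임순만 장편소설 | (not shown) | 1997 | 813.7 | 2020-01-16 | 2020-02-04
53165 | 이솝 이야기 | (not shown) | 1973 | 808.9 | 2020-02-16 | 2020-02-23
44764 | 오 마이 갓 어쩌다 사춘기 | (not shown) | 2008 | 813.8 | 2020-01-14 | 2020-01-21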

Duplicate rows

Most frequently occurring

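Index | 서명 | 성별 | 생년 | 분류번호 | 대출일 | 반납일 | # duplicates
9 | (The)Stars | (not shown) | 2012 | 747 | 2020-01-30 | 2020-02-02 | 3
22 | Harry Potter and the Order of the Phoenix | (not shown) | 2008 | 843.5 | 2020-01-17 | 2020-01-17 | 3
41 | 개구쟁이 아치 | (not shown) | 2009 | 375.108 | 2020-01-14 | 2020-01-31 | 3
0 | (Disney Princess)magical tales | (not shown) | 1975 | 843.6 | 2020-02-18 | <NA> | 2
1 | (Disney) Aladdin | (not shown) | 1975 | 843.6 | 2020-01-14 | 2020-01-19 | 2
2 | (The) lion, the witch and the wardrobe | (not shown) | 1971 | 843.5 | 2020-01-21 | 2020-02-01 | 2
3 | (The)Day of the bad haircut | (not shown) | 1973 | 747 | 2020-01-14 | 2020-01-21 | 2
4 | (The)Fault in our stars | (not shown) | 1973 | 843.6 | 2020-01-12 | 2020-02-02 | 2
5 | (The)Huggles' hug | (not shown) | 1984 | 747 | 2020-01-04 | 2020-01-08 | 2
6 | (The)Pizza Monster | (not shown) | 2011 | 843.5 | 2020-01-21 | 2020-02-11 | 2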
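
A listing like this can be approximated by counting how often each complete record occurs and keeping those seen more than once. The sketch below uses only the standard library. `duplicate_summary` is a hypothetical helper, and the toy records use a reduced set of columns for illustration rather than the full rows of the dataset.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(rows)
    dupes = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(dupes, key=lambda item: item[1], reverse=True)


# Hypothetical toy records as (title, birth year, loan date) tuples.
rows = [
    ("(The)Stars", 2012, "2020-01-30"),
    ("(The)Stars", 2012, "2020-01-30"),
    ("(The)Stars", 2012, "2020-01-30"),
    ("(Disney) Aladdin", 1975, "2020-01-14"),
    ("(Disney) Aladdin", 1975, "2020-01-14"),
    ("Homework!", 2011, "2020-02-20"),
]

summary = duplicate_summary(rows)
for row, n in summary:
    print(n, row)
```

Rows must match on every column to count as duplicates here, so two loans of the same title by different borrowers or on different dates stay distinct.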