Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 13303
Missing cells (%): 13.3%
Duplicate rows: 2
Duplicate rows (%): < 0.1%
Total size in memory: 888.7 KiB
Average record size in memory: 91.0 B
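The derived figures in this table can be checked from the raw counts. The sketch below recomputes the missing-cell percentage and the average record size. The variable names are illustrative, and it assumes that 1 KiB is 1024 B and that the missing-cell total is the sum of the two per-column missing counts listed under Alerts (9044 for 개장일자 and 4259 for 신청일자).

```python
# Values copied from the Dataset statistics and Alerts sections of this report.
n_variables = 10
n_observations = 10_000
missing_cells = 13_303
total_size_kib = 888.7
missing_by_column = {"개장일자": 9044, "신청일자": 4259}

# Missing cells as a share of all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
print("Per-column counts add up:", sum(missing_by_column.values()) == missing_cells)
```

Only the two date columns contribute to the missing-cell total, so the `<NA>` entries listed for 연장차수 and 사용료 are not counted as missing cells.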

Variable types

Numeric: 2
Text: 1
DateTime: 4
Boolean: 1
Categorical: 2

Dataset

Description: 부산시설공단_영락공원봉안사용현황_20201022
Author: 부산시설공단
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15067589

Alerts

Dataset has 2 (< 0.1%) duplicate rows [Duplicates]
봉안번호 is highly overall correlated with 연장차수 [High correlation]
연장차수 is highly overall correlated with 봉안번호 [High correlation]
개장여부 is highly imbalanced (54.5%) [Imbalance]
개장일자 has 9044 (90.4%) missing values [Missing]
신청일자 has 4259 (42.6%) missing values [Missing]

Reproduction

Analysis started: 2023-12-10 17:10:54.541015
Analysis finished: 2023-12-10 17:10:57.680358
Duration: 3.14 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

봉안번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9998
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20239187
Minimum: 10
Maximum: 34083545
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 10
5-th percentile: 10100725
Q1: 21118200
Median: 22140768
Q3: 23062332
95-th percentile: 33978917
Maximum: 34083545
Range: 34083535
Interquartile range (IQR): 1944131.8

Descriptive statistics

Standard deviation: 7423333
Coefficient of variation (CV): 0.36678019
Kurtosis: 0.92586467
Mean: 20239187
Median Absolute Deviation (MAD): 1021731
Skewness: -0.68688104
Sum: 2.0239187 × 10^11
Variance: 5.5105873 × 10^13
Monotonicity: Not monotonic
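Several of the 봉안번호 statistics are simple functions of others, so they can be cross-checked. The sketch below recomputes the range, IQR, coefficient of variation, and variance from the values printed in this report. Because the inputs are the rounded figures shown here, small differences in the last digits are expected (for example, the IQR from the rounded quartiles is 1944132, while the report shows 1944131.8).

```python
# Values copied from the quantile and descriptive statistics of 봉안번호.
minimum, maximum = 10, 34_083_545
q1, q3 = 21_118_200, 23_062_332
mean, std = 20_239_187, 7_423_333

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared

print("Range:", value_range)
print("IQR (from rounded quartiles):", iqr)
print(f"CV: {cv:.8f}")
print(f"Variance: {variance:.7e}")
```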
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
21426359             2      < 0.1%
23368286             2      < 0.1%
21322855             1      < 0.1%
22448754             1      < 0.1%
10305045             1      < 0.1%
21220675             1      < 0.1%
23367867             1      < 0.1%
22140679             1      < 0.1%
22552279             1      < 0.1%
23265383             1      < 0.1%
Other values (9988)  9988   99.9%

Value  Count  Frequency (%)
10     1      < 0.1%
12     1      < 0.1%
21     1      < 0.1%
24     1      < 0.1%
31     1      < 0.1%
51     1      < 0.1%
57     1      < 0.1%
77     1      < 0.1%
79     1      < 0.1%
83     1      < 0.1%

Value     Count  Frequency (%)
34083545  1      < 0.1%
34083541  1      < 0.1%
34083538  1      < 0.1%
34083497  1      < 0.1%
34083487  1      < 0.1%
34083473  1      < 0.1%
34083450  1      < 0.1%
34083445  1      < 0.1%
34083443  1      < 0.1%
34083442  1      < 0.1%

순번
Real number (ℝ)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2114
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 8
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4668317
Coefficient of variation (CV): 0.38536544
Kurtosis: 9.1981667
Mean: 1.2114
Median Absolute Deviation (MAD): 0
Skewness: 2.4556232
Sum: 12114
Variance: 0.21793183
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
1      8109   81.1%
2      1690   16.9%
3      184    1.8%
4      15     0.1%
8      1      < 0.1%
5      1      < 0.1%

Value  Count  Frequency (%)
1      8109   81.1%
2      1690   16.9%
3      184    1.8%
4      15     0.1%
5      1      < 0.1%
8      1      < 0.1%

Value  Count  Frequency (%)
8      1      < 0.1%
5      1      < 0.1%
4      15     0.1%
3      184    1.8%
2      1690   16.9%
1      8109   81.1%

봉안정보
Text

Distinct: 9998
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 140000
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
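As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in the first sample value of this variable. It is only a minimal example of the idea, not the profiler's own implementation, and the standard library does not expose script or block properties, so only categories are shown.

```python
import unicodedata
from collections import Counter

# First sample row of the text variable in this report.
sample = "02동 13실 22855호"

# General category per character: Nd = Decimal Number,
# Lo = Other Letter, Zs = Space Separator.
categories = Counter(unicodedata.category(ch) for ch in sample)

for category, count in categories.most_common():
    print(category, count)
```

Every value of this variable is 14 characters long, so 9 digits, 3 letters, and 2 spaces per value scale to the 90000, 30000, and 20000 totals reported for the 10000 rows.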

Unique

Unique: 9996
Unique (%): > 99.9%

Sample

1st row: 02동 13실 22855호
2nd row: 01동 06실 07825호
3rd row: 02동 19실 35936호
4th row: 01동 07실 10771호
5th row: 01동 09실 16189호
Value                Count  Frequency (%)
02동                 6831   22.8%
01동                 1872   6.2%
03동                 877    2.9%
39실                 472    1.6%
00실                 420    1.4%
00동                 420    1.4%
40실                 405    1.4%
22실                 363    1.2%
07실                 352    1.2%
23실                 321    1.1%
Other values (9987)  17667  58.9%

Most occurring characters

Value             Count  Frequency (%)
(space)           20000  14.3%
0                 19866  14.2%
2                 15866  11.3%
1                 10609  7.6%
동                10000  7.1%
실                10000  7.1%
호                10000  7.1%
3                 9282   6.6%
4                 6396   4.6%
7                 6022   4.3%
Other values (4)  21959  15.7%

Most occurring categories

Value            Count  Frequency (%)
Decimal Number   90000  64.3%
Other Letter     30000  21.4%
Space Separator  20000  14.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      19866  22.1%
2      15866  17.6%
1      10609  11.8%
3      9282   10.3%
4      6396   7.1%
7      6022   6.7%
6      5878   6.5%
5      5760   6.4%
8      5175   5.8%
9      5146   5.7%

Other Letter
Value  Count  Frequency (%)
동     10000  33.3%
실     10000  33.3%
호     10000  33.3%

Space Separator
Value    Count  Frequency (%)
(space)  20000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  110000  78.6%
Hangul  30000   21.4%

Most frequent character per script

Common
Value    Count  Frequency (%)
(space)  20000  18.2%
0        19866  18.1%
2        15866  14.4%
1        10609  9.6%
3        9282   8.4%
4        6396   5.8%
7        6022   5.5%
6        5878   5.3%
5        5760   5.2%
8        5175   4.7%

Hangul
Value  Count  Frequency (%)
동     10000  33.3%
실     10000  33.3%
호     10000  33.3%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   110000  78.6%
Hangul  30000   21.4%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
(space)  20000  18.2%
0        19866  18.1%
2        15866  14.4%
1        10609  9.6%
3        9282   8.4%
4        6396   5.8%
7        6022   5.5%
6        5878   5.3%
5        5760   5.2%
8        5175   4.7%

Hangul
Value  Count  Frequency (%)
동     10000  33.3%
실     10000  33.3%
호     10000  33.3%

봉안일자
Date

Distinct: 4975
Distinct (%): 49.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1974-12-21 00:00:00
Maximum: 2020-09-25 00:00:00
Histogram with fixed size bins (bins=50)

만료일자
Date

Distinct: 3802
Distinct (%): 38.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 1984-12-20 00:00:00
Maximum: 2035-09-24 00:00:00
Histogram with fixed size bins (bins=50)

개장여부
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 87.9 KiB

Value  Count  Frequency (%)
False  9043   90.4%
True   957    9.6%
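The 54.5% imbalance figure in the alert for 개장여부 can be reproduced from these two counts. The sketch below assumes the score is one minus the Shannon entropy of the class frequencies, normalized by the maximum entropy for the number of classes. That definition is an assumption here, not something the report states, but it yields the reported value.

```python
import math

# Class counts for 개장여부 as listed in this report.
counts = {"False": 9043, "True": 957}
total = sum(counts.values())

# Shannon entropy (in bits) of the observed class frequencies.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

# Assumed imbalance score: 0 for a perfectly balanced variable,
# approaching 1 as a single class dominates.
imbalance = 1 - entropy / math.log2(len(counts))

print(f"Imbalance: {imbalance:.1%}")
```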

개장일자
Date

MISSING 

Distinct: 679
Distinct (%): 71.0%
Missing: 9044
Missing (%): 90.4%
Memory size: 156.2 KiB
Minimum: 1996-08-14 00:00:00
Maximum: 2020-09-25 00:00:00
Histogram with fixed size bins (bins=50)

연장차수
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
<NA>   4259
1회    3861
2회    1754
3회    126

Length

Max length: 4
Median length: 2
Mean length: 2.8518
Min length: 2
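The mean length of 2.8518 follows directly from the category counts if the `<NA>` entries are treated as four-character strings. The sketch below performs that calculation. It is offered as a plausible explanation, under that assumption, of why this variable reports 0 missing values while listing 4259 `<NA>` entries.

```python
# Category counts for 연장차수 as listed in this report.
counts = {"<NA>": 4259, "1회": 3861, "2회": 1754, "3회": 126}
total = sum(counts.values())

# Weighted mean of the string lengths; "<NA>" contributes length 4,
# the other categories length 2.
mean_length = sum(len(value) * n for value, n in counts.items()) / total

print(f"Mean length: {mean_length:.4f}")
```

The same calculation on the 사용료 counts (60000: 5244, `<NA>`: 4259, 0: 492, 30000: 5) gives 4.3773, which matches the mean length reported for that variable.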

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 1회
4th row: <NA>
5th row: 2회

Common Values

Value  Count  Frequency (%)
<NA>   4259   42.6%
1회    3861   38.6%
2회    1754   17.5%
3회    126    1.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na     4259   42.6%
1회    3861   38.6%
2회    1754   17.5%
3회    126    1.3%

신청일자
Date

MISSING 

Distinct: 1859
Distinct (%): 32.4%
Missing: 4259
Missing (%): 42.6%
Memory size: 156.2 KiB
Minimum: 2010-03-07 00:00:00
Maximum: 2020-09-25 00:00:00
Histogram with fixed size bins (bins=50)

사용료
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
60000  5244
<NA>   4259
0      492
30000  5

Length

Max length: 5
Median length: 5
Mean length: 4.3773
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 60000
4th row: <NA>
5th row: 60000

Common Values

Value  Count  Frequency (%)
60000  5244   52.4%
<NA>   4259   42.6%
0      492    4.9%
30000  5      0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
60000  5244   52.4%
na     4259   42.6%
0      492    4.9%
30000  5      < 0.1%

Interactions

[Figures: four interaction plots]

Correlations

          봉안번호  순번   개장여부  연장차수  사용료
봉안번호  1.000     0.444  0.341     0.872     0.288
순번      0.444     1.000  0.105     0.114     0.310
개장여부  0.341     0.105  1.000     0.000     0.027
연장차수  0.872     0.114  0.000     1.000     0.397
사용료    0.288     0.310  0.027     0.397     1.000

          연장차수  개장여부  사용료
연장차수  1.000     0.000     0.145
개장여부  0.000     1.000     0.045
사용료    0.145     0.045     1.000

          봉안번호  순번    개장여부  연장차수  사용료
봉안번호  1.000     -0.174  0.245     0.566     0.096
순번      -0.174    1.000   0.076     0.047     0.135
개장여부  0.245     0.076   1.000     0.000     0.045
연장차수  0.566     0.047   0.000     1.000     0.145
사용료    0.096     0.135   0.045     0.145     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
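To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between missing-value indicators for three columns. It uses the ten rows of the second sample table in this report and treats `<NA>` entries as missing, which is an assumption for illustration. The helper names `nullity` and `pearson` are hypothetical and are not part of any profiling library.

```python
from math import sqrt

# (개장일자, 신청일자, 사용료) for the ten rows of the second sample table;
# None stands for a "<NA>" entry.
rows = [
    (None, "2017-08-15", "60000"),
    (None, None, None),
    ("2017-06-25", "2015-03-14", "60000"),
    (None, "2020-05-20", "60000"),
    (None, None, None),
    (None, "2020-07-11", "60000"),
    ("2017-10-27", "2016-05-30", "60000"),
    ("2018-04-18", None, None),
    (None, "2018-05-05", "60000"),
    (None, "2020-01-29", "60000"),
]

def nullity(column_index):
    """Return 1 where the value is missing and 0 where it is present."""
    return [1 if row[column_index] is None else 0 for row in rows]

def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

opened_null, applied_null, fee_null = nullity(0), nullity(1), nullity(2)

# 신청일자 and 사용료 are always missing together in these rows.
r_applied_fee = pearson(applied_null, fee_null)
# 개장일자 is missing largely independently of 신청일자 in these rows.
r_opened_applied = pearson(opened_null, applied_null)

print(f"신청일자 vs 사용료: {r_applied_fee:.3f}")
print(f"개장일자 vs 신청일자: {r_opened_applied:.3f}")
```

A value near 1 means the two columns tend to be missing in the same rows, while a value near 0 means missingness in one says little about the other. Ten rows are far too few to draw conclusions about the full dataset.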

Sample

(index)  봉안번호  순번  봉안정보          봉안일자    만료일자    개장여부  개장일자    연장차수  신청일자    사용료
26430    21322855  2     02동 13실 22855호  2008-08-11  2023-08-10  N         <NA>        <NA>      <NA>        <NA>
11377    10607825  1     01동 06실 07825호  1996-12-11  2011-12-10  N         <NA>        <NA>      <NA>        <NA>
39556    21935936  1     02동 19실 35936호  2001-12-29  2021-12-28  N         <NA>        1회       2016-09-11  60000
14323    10710771  2     01동 07실 10771호  2016-06-19  2031-06-18  N         <NA>        <NA>      <NA>        <NA>
19741    10916189  1     01동 09실 16189호  1998-12-19  2023-12-18  N         <NA>        2회       2018-04-15  60000
49851    22346196  1     02동 23실 46196호  2003-02-21  2023-02-20  N         <NA>        1회       2017-05-06  60000
21102    21117542  2     02동 11실 17542호  2006-08-12  2021-08-11  N         <NA>        <NA>      <NA>        <NA>
3743     10100192  1     01동 01실 00192호  1995-03-14  2020-03-13  N         <NA>        2회       2016-09-11  60000
77421    23673710  1     02동 36실 73710호  2006-09-15  2021-09-14  N         <NA>        <NA>      <NA>        <NA>
5722     10202171  1     01동 02실 02171호  1995-08-14  2025-08-13  N         <NA>        3회       2020-03-04  60000

(index)  봉안번호  순번  봉안정보          봉안일자    만료일자    개장여부  개장일자    연장차수  신청일자    사용료
44654    22141013  1     02동 21실 41013호  2002-08-08  2022-08-07  N         <NA>        1회       2017-08-15  60000
29096    21425503  2     02동 14실 25503호  2018-02-26  2033-02-25  N         <NA>        <NA>      <NA>        <NA>
4676     10101125  1     01동 01실 01125호  1995-05-18  2020-05-17  Y         2017-06-25  2회       2015-03-14  60000
27336    21323756  1     02동 13실 23756호  2000-04-14  2025-04-13  N         <NA>        2회       2020-05-20  60000
40031    21936409  2     02동 19실 36409호  2012-01-26  2027-01-25  N         <NA>        <NA>      <NA>        <NA>
5400     10101849  1     01동 01실 01849호  1995-07-18  2025-07-17  N         <NA>        3회       2020-07-11  60000
8574     10305022  1     01동 03실 05022호  1996-06-07  2021-06-06  Y         2017-10-27  2회       2016-05-30  60000
56477    22552818  1     02동 25실 52818호  2003-11-16  2018-11-15  Y         2018-04-18  <NA>      <NA>        <NA>
12565    10609013  1     01동 06실 09013호  1997-04-04  2022-04-03  N         <NA>        2회       2018-05-05  60000
20293    21016738  1     02동 10실 16738호  1999-02-13  2024-02-12  N         <NA>        2회       2020-01-29  60000

Duplicate rows

Most frequently occurring

(index)  봉안번호  순번  봉안정보          봉안일자    만료일자    개장여부  개장일자  연장차수  신청일자  사용료  # duplicates
0        21426359  2     02동 14실 26359호  2020-04-17  2035-04-16  N         <NA>      <NA>      <NA>      <NA>    2
1        23368286  2     02동 33실 68286호  2020-07-02  2035-07-01  N         <NA>      <NA>      <NA>      <NA>    2