Overview

Dataset statistics

Number of variables: 7
Number of observations: 50
Missing cells: 21
Missing cells (%): 6.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.1 KiB
Average record size in memory: 62.6 B
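
The percentage and memory figures above follow from the raw counts. A minimal Python sketch of that arithmetic, assuming the 3.1 KiB total is a rounding of 50 records at the reported 62.6 B each (the variable names are illustrative and not part of the report):

```python
# Counts copied from the dataset statistics above.
n_variables = 7
n_observations = 50
missing_cells = 21
avg_record_bytes = 62.6  # reported average record size

# Share of missing cells over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total memory implied by the average record size, in KiB.
total_kib = avg_record_bytes * n_observations / 1024

print(f"{missing_pct:.1f}%")   # 6.0%
print(f"{total_kib:.1f} KiB")  # 3.1 KiB
```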

Variable types

Categorical: 2
Text: 1
Numeric: 4

Dataset

Description: 부산광역시_하천현황_20230324 (Busan Metropolitan City river status, 2023-03-24)
Author: 부산광역시 (Busan Metropolitan City)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=3076446

Alerts

하천연장(km) (river length) is highly overall correlated with 요개수연장 양안(km) (length requiring improvement, both banks) and 3 other fields [High correlation]
요개수연장 양안(km) is highly overall correlated with 하천연장(km) and 3 other fields [High correlation]
개수연장 양안(km) (improved length, both banks) is highly overall correlated with 하천연장(km) and 1 other field [High correlation]
미개수연장 양안(km) (unimproved length, both banks) is highly overall correlated with 하천연장(km) and 2 other fields [High correlation]
구분 (river classification) is highly overall correlated with 하천연장(km) and 3 other fields [High correlation]
비고(담당구역별) (remarks, by responsible district) is highly overall correlated with 구분 [High correlation]
구분 is highly imbalanced (53.1%) [Imbalance]
개수연장 양안(km) has 3 (6.0%) missing values [Missing]
미개수연장 양안(km) has 18 (36.0%) missing values [Missing]
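
The 53.1% imbalance figure for 구분 can be reproduced from the class counts reported for that variable (45 and 5). The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum, log2 of the number of classes. That definition is an assumption, chosen because it matches the reported value, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the base-2 Shannon entropy of the
    class distribution. 0 means perfectly balanced; values near 1 mean one
    class dominates. Assumed definition, not taken from the report."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Class counts for 구분: 지방하천 = 45, 국가하천 = 5.
score = imbalance_score([45, 5])
print(f"{100 * score:.1f}%")  # 53.1%
```

A perfectly balanced pair such as `[25, 25]` scores 0 under this definition, so the reported 53.1% indicates a strong skew toward 지방하천.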

Reproduction

Analysis started: 2023-12-10 16:07:36.861680
Analysis finished: 2023-12-10 16:07:39.409250
Duration: 2.55 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
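
The reported duration is the difference between the two timestamps above. A small sketch of that check using only the Python standard library (the format string is an assumption about how the timestamps are written):

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction section (assumed format).
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-10 16:07:36.861680", FMT)
finished = datetime.strptime("2023-12-10 16:07:39.409250", FMT)

elapsed = (finished - started).total_seconds()
print(f"{elapsed:.2f} seconds")  # 2.55 seconds
```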

Variables

구분 (river classification)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 532.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 국가하천
2nd row: 국가하천
3rd row: 국가하천
4th row: 국가하천
5th row: 국가하천

Common Values

Value | Count | Frequency (%)
지방하천 (local river) | 45 | 90.0%
국가하천 (national river) | 5 | 10.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the common values; image not preserved. Counts as in the Common Values table: 지방하천 45 (90.0%), 국가하천 5 (10.0%).]

하천명 (river name)
Text

Distinct: 46
Distinct (%): 92.0%
Missing: 0
Missing (%): 0.0%
Memory size: 532.0 B

Length

Max length: 7
Median length: 3
Mean length: 3.1
Min length: 3

Characters and Unicode

Total characters: 155
Distinct characters: 72
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
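
As an illustration of those character properties, the sketch below counts Unicode general categories for one value of this variable using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in this report: `Lo` is Other Letter, `Ps` is Open Punctuation, `Pe` is Close Punctuation, and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (e.g. 'Lo', 'Ps') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

# First sample value of the text variable.
counts = category_counts("낙동강(본류)")
for code, n in sorted(counts.items()):
    print(code, n)
```

For this value the Hangul syllables fall under `Lo` and the two parentheses under `Ps` and `Pe`, which is consistent with the single Open Punctuation and Close Punctuation characters reported for the whole column.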

Unique

Unique: 43
Unique (%): 86.0%

Sample

1st row: 낙동강(본류)
2nd row: 서낙동강
3rd row: 평강천
4th row: 맥도강
5th row: 수영강

Value | Count | Frequency (%)
송정천 | 3 | 5.7%
(value not preserved) | 3 | 5.7%
수영강 | 2 | 3.8%
호계천 | 2 | 3.8%
용소천 | 1 | 1.9%
좌광천 | 1 | 1.9%
낙동강(본류 | 1 | 1.9%
덕선천 | 1 | 1.9%
지사천 | 1 | 1.9%
해반천 | 1 | 1.9%
Other values (37) | 37 | 69.8%

Most occurring characters

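Value | Count | Frequency (%)
(not preserved) | 47 | 30.3%
(not preserved) | 7 | 4.5%
(not preserved) | 6 | 3.9%
(not preserved) | 4 | 2.6%
(not preserved) | 3 | 1.9%
(not preserved) | 3 | 1.9%
(not preserved) | 3 | 1.9%
(not preserved) | 3 | 1.9%
(not preserved) | 3 | 1.9%
(not preserved) | 3 | 1.9%
Other values (62) | 73 | 47.1%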

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 150 | 96.8%
Space Separator | 3 | 1.9%
Close Punctuation | 1 | 0.6%
Open Punctuation | 1 | 0.6%

Most frequent character per category

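Other Letter
Value | Count | Frequency (%)
(not preserved) | 47 | 31.3%
(not preserved) | 7 | 4.7%
(not preserved) | 6 | 4.0%
(not preserved) | 4 | 2.7%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
Other values (59) | 68 | 45.3%

Space Separator
Value | Count | Frequency (%)
(space) | 3 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%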

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 150 | 96.8%
Common | 5 | 3.2%

Most frequent character per script

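Hangul
Value | Count | Frequency (%)
(not preserved) | 47 | 31.3%
(not preserved) | 7 | 4.7%
(not preserved) | 6 | 4.0%
(not preserved) | 4 | 2.7%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
Other values (59) | 68 | 45.3%

Common
Value | Count | Frequency (%)
(space) | 3 | 60.0%
) | 1 | 20.0%
( | 1 | 20.0%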

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 150 | 96.8%
ASCII | 5 | 3.2%

Most frequent character per block

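Hangul
Value | Count | Frequency (%)
(not preserved) | 47 | 31.3%
(not preserved) | 7 | 4.7%
(not preserved) | 6 | 4.0%
(not preserved) | 4 | 2.7%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
Other values (59) | 68 | 45.3%

ASCII
Value | Count | Frequency (%)
(space) | 3 | 60.0%
) | 1 | 20.0%
( | 1 | 20.0%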

하천연장(km) (river length)
Real number (ℝ)

HIGH CORRELATION

Distinct: 47
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.486
Minimum: 0.69
Maximum: 20.26
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 582.0 B

Quantile statistics

Minimum: 0.69
5th percentile: 0.97
Q1: 2.5725
Median: 4.02
Q3: 6.62
95th percentile: 16.062
Maximum: 20.26
Range: 19.57
Interquartile range (IQR): 4.0475

Descriptive statistics

Standard deviation: 4.626181
Coefficient of variation (CV): 0.84327033
Kurtosis: 2.519101
Mean: 5.486
Median Absolute Deviation (MAD): 1.755
Skewness: 1.6917596
Sum: 274.3
Variance: 21.401551
Monotonicity: Not monotonic
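
Several of the 하천연장(km) statistics above are derived from one another, which gives a quick consistency check. The sketch below recomputes the mean, coefficient of variation, variance, IQR, and range from the other reported values, using the standard textbook definitions (an assumption about how the report derives them; small differences come from rounding in the printed figures):

```python
# Values copied from the 하천연장(km) statistics above.
n = 50                  # observations, none missing
total = 274.3           # Sum
mean = 5.486
std = 4.626181
q1, q3 = 2.5725, 6.62
minimum, maximum = 0.69, 20.26

derived = {
    "mean": total / n,
    "cv": std / mean,           # coefficient of variation
    "variance": std ** 2,
    "iqr": q3 - q1,             # interquartile range
    "range": maximum - minimum,
}
for name, value in derived.items():
    print(f"{name}: {value:.4f}")
```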
Histogram with fixed size bins (bins=47)

Common Values

Value | Count | Frequency (%)
1.61 | 2 | 4.0%
0.97 | 2 | 4.0%
2.9 | 2 | 4.0%
6.68 | 1 | 2.0%
3.57 | 1 | 2.0%
3.65 | 1 | 2.0%
0.85 | 1 | 2.0%
4.53 | 1 | 2.0%
8.7 | 1 | 2.0%
4.34 | 1 | 2.0%
Other values (37) | 37 | 74.0%

Minimum 10 values

Value | Count | Frequency (%)
0.69 | 1 | 2.0%
0.85 | 1 | 2.0%
0.97 | 2 | 4.0%
1.61 | 2 | 4.0%
1.69 | 1 | 2.0%
1.8 | 1 | 2.0%
1.99 | 1 | 2.0%
2.07 | 1 | 2.0%
2.35 | 1 | 2.0%
2.52 | 1 | 2.0%

Maximum 10 values

Value | Count | Frequency (%)
20.26 | 1 | 2.0%
18.55 | 1 | 2.0%
17.34 | 1 | 2.0%
14.5 | 1 | 2.0%
13.24 | 1 | 2.0%
12.54 | 1 | 2.0%
9.0 | 1 | 2.0%
8.9 | 1 | 2.0%
8.7 | 1 | 2.0%
8.27 | 1 | 2.0%

요개수연장 양안(km) (length requiring improvement, both banks)
Real number (ℝ)

HIGH CORRELATION

Distinct: 48
Distinct (%): 96.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.9832
Minimum: 0.85
Maximum: 44.28
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 582.0 B

Quantile statistics

Minimum: 0.85
5th percentile: 1.544
Q1: 3.655
Median: 7.67
Q3: 11.545
95th percentile: 27.437
Maximum: 44.28
Range: 43.43
Interquartile range (IQR): 7.89

Descriptive statistics

Standard deviation: 9.0477259
Coefficient of variation (CV): 0.90629517
Kurtosis: 3.9249024
Mean: 9.9832
Median Absolute Deviation (MAD): 4.035
Skewness: 1.867474
Sum: 499.16
Variance: 81.861345
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=48)

Common Values

Value | Count | Frequency (%)
3.22 | 2 | 4.0%
3.6 | 2 | 4.0%
0.97 | 1 | 2.0%
3.65 | 1 | 2.0%
0.85 | 1 | 2.0%
9.06 | 1 | 2.0%
15.06 | 1 | 2.0%
8.68 | 1 | 2.0%
18.47 | 1 | 2.0%
4.94 | 1 | 2.0%
Other values (38) | 38 | 76.0%

Minimum 10 values

Value | Count | Frequency (%)
0.85 | 1 | 2.0%
0.97 | 1 | 2.0%
1.22 | 1 | 2.0%
1.94 | 1 | 2.0%
2.1 | 1 | 2.0%
2.3 | 1 | 2.0%
3.22 | 2 | 4.0%
3.38 | 1 | 2.0%
3.4 | 1 | 2.0%
3.6 | 2 | 4.0%

Maximum 10 values

Value | Count | Frequency (%)
44.28 | 1 | 2.0%
34.74 | 1 | 2.0%
28.22 | 1 | 2.0%
26.48 | 1 | 2.0%
24.86 | 1 | 2.0%
19.87 | 1 | 2.0%
18.47 | 1 | 2.0%
17.8 | 1 | 2.0%
16.54 | 1 | 2.0%
16.38 | 1 | 2.0%

개수연장 양안(km) (improved length, both banks)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 46
Distinct (%): 97.9%
Missing: 3
Missing (%): 6.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.9657447
Minimum: 0.26
Maximum: 34.44
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 582.0 B

Quantile statistics

Minimum: 0.26
5th percentile: 0.605
Q1: 2.99
Median: 4.62
Q3: 7.755
95th percentile: 19.296
Maximum: 34.44
Range: 34.18
Interquartile range (IQR): 4.765

Descriptive statistics

Standard deviation: 7.0724935
Coefficient of variation (CV): 1.0153248
Kurtosis: 4.6971826
Mean: 6.9657447
Median Absolute Deviation (MAD): 2.02
Skewness: 2.049013
Sum: 327.39
Variance: 50.020164
Monotonicity: Not monotonic
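
For a variable with missing values, the figures above fit together only if the mean and the distinct percentage are taken over the 47 non-missing observations, while the missing percentage is taken over all 50. The sketch below checks this with the reported numbers (the variable names are illustrative, and the denominators are an inference from the arithmetic, not something the report states):

```python
# Values copied from the 개수연장 양안(km) statistics above.
n_observations = 50
n_missing = 3
n_distinct = 46
total = 327.39  # Sum over the non-missing values

n_valid = n_observations - n_missing

mean = total / n_valid                      # mean over non-missing values
distinct_pct = 100 * n_distinct / n_valid   # distinct share of non-missing values
missing_pct = 100 * n_missing / n_observations

print(f"mean = {mean:.4f}")               # 6.9657
print(f"distinct = {distinct_pct:.1f}%")  # 97.9%
print(f"missing = {missing_pct:.1f}%")    # 6.0%
```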
Histogram with fixed size bins (bins=46)

Common Values

Value | Count | Frequency (%)
3.67 | 2 | 4.0%
2.6 | 1 | 2.0%
0.5 | 1 | 2.0%
0.97 | 1 | 2.0%
3.65 | 1 | 2.0%
0.85 | 1 | 2.0%
1.8 | 1 | 2.0%
10.96 | 1 | 2.0%
6.13 | 1 | 2.0%
16.87 | 1 | 2.0%
Other values (36) | 36 | 72.0%
(Missing) | 3 | 6.0%

Minimum 10 values

Value | Count | Frequency (%)
0.26 | 1 | 2.0%
0.31 | 1 | 2.0%
0.5 | 1 | 2.0%
0.85 | 1 | 2.0%
0.97 | 1 | 2.0%
1.0 | 1 | 2.0%
1.22 | 1 | 2.0%
1.8 | 1 | 2.0%
1.94 | 1 | 2.0%
2.6 | 1 | 2.0%

Maximum 10 values

Value | Count | Frequency (%)
34.44 | 1 | 2.0%
26.48 | 1 | 2.0%
20.28 | 1 | 2.0%
17.0 | 1 | 2.0%
16.87 | 1 | 2.0%
16.54 | 1 | 2.0%
15.68 | 1 | 2.0%
14.2 | 1 | 2.0%
11.72 | 1 | 2.0%
10.96 | 1 | 2.0%

미개수연장 양안(km) (unimproved length, both banks)
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 32
Distinct (%): 100.0%
Missing: 18
Missing (%): 36.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.3678125
Minimum: 0.2
Maximum: 24.98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 582.0 B

Quantile statistics

Minimum: 0.2
5th percentile: 0.629
Q1: 2.03
Median: 3.515
Q3: 4.83
95th percentile: 19.023
Maximum: 24.98
Range: 24.78
Interquartile range (IQR): 2.8

Descriptive statistics

Standard deviation: 6.133356
Coefficient of variation (CV): 1.1426174
Kurtosis: 5.0281499
Mean: 5.3678125
Median Absolute Deviation (MAD): 1.455
Skewness: 2.2936898
Sum: 171.77
Variance: 37.618056
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)

Common Values

Value | Count | Frequency (%)
4.1 | 1 | 2.0%
2.17 | 1 | 2.0%
2.2 | 1 | 2.0%
2.1 | 1 | 2.0%
2.6 | 1 | 2.0%
3.09 | 1 | 2.0%
0.8 | 1 | 2.0%
5.01 | 1 | 2.0%
1.38 | 1 | 2.0%
1.5 | 1 | 2.0%
Other values (22) | 22 | 44.0%
(Missing) | 18 | 36.0%

Minimum 10 values

Value | Count | Frequency (%)
0.2 | 1 | 2.0%
0.42 | 1 | 2.0%
0.8 | 1 | 2.0%
1.38 | 1 | 2.0%
1.5 | 1 | 2.0%
1.6 | 1 | 2.0%
1.8 | 1 | 2.0%
1.82 | 1 | 2.0%
2.1 | 1 | 2.0%
2.17 | 1 | 2.0%

Maximum 10 values

Value | Count | Frequency (%)
24.98 | 1 | 2.0%
24.6 | 1 | 2.0%
14.46 | 1 | 2.0%
14.2 | 1 | 2.0%
9.84 | 1 | 2.0%
7.98 | 1 | 2.0%
7.26 | 1 | 2.0%
5.01 | 1 | 2.0%
4.77 | 1 | 2.0%
4.7 | 1 | 2.0%

비고(담당구역별) (remarks, by responsible district)
Categorical

HIGH CORRELATION

Distinct: 23
Distinct (%): 46.0%
Missing: 0
Missing (%): 0.0%
Memory size: 532.0 B

Length

Max length: 11
Median length: 10
Mean length: 4.48
Min length: 2

Unique

Unique: 15
Unique (%): 30.0%

Sample

1st row: 낙동강관리본부
2nd row: 강서구
3rd row: 강서구
4th row: 강서구
5th row: 금정,수영,해운대구

Common Values

Value | Count | Frequency (%)
기장군 | 14 | 28.0%
강서구 | 5 | 10.0%
강서구(좌안) | 4 | 8.0%
부산진구 | 3 | 6.0%
사상구 | 3 | 6.0%
해운대구 | 2 | 4.0%
북구 | 2 | 4.0%
동구 | 2 | 4.0%
동구,부산진구 | 1 | 2.0%
금정,수영,해운대구 | 1 | 2.0%
Other values (13) | 13 | 26.0%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
기장군 | 16 | 30.8%
강서구 | 5 | 9.6%
강서구(좌안 | 4 | 7.7%
부산진구 | 3 | 5.8%
사상구 | 3 | 5.8%
해운대구 | 3 | 5.8%
북구 | 2 | 3.8%
동구 | 2 | 3.8%
금정구 | 2 | 3.8%
동구,부산진구,남구 | 1 | 1.9%
Other values (11) | 11 | 21.2%

Interactions

[Interaction plots (16 panels); images not preserved.]

Correlations

Variable | 구분 | 하천명 | 하천연장(km) | 요개수연장 양안(km) | 개수연장 양안(km) | 미개수연장 양안(km) | 비고(담당구역별)
구분 | 1.000 | 0.000 | 0.692 | 0.712 | 0.539 | 0.977 | 0.800
하천명 | 0.000 | 1.000 | 0.000 | 0.000 | 0.622 | 0.781 | 0.000
하천연장(km) | 0.692 | 0.000 | 1.000 | 0.982 | 0.840 | 0.728 | 0.703
요개수연장 양안(km) | 0.712 | 0.000 | 0.982 | 1.000 | 0.874 | 0.825 | 0.854
개수연장 양안(km) | 0.539 | 0.622 | 0.840 | 0.874 | 1.000 | 0.654 | 0.885
미개수연장 양안(km) | 0.977 | 0.781 | 0.728 | 0.825 | 0.654 | 1.000 | 0.653
비고(담당구역별) | 0.800 | 0.000 | 0.703 | 0.854 | 0.885 | 0.653 | 1.000

Variable | 구분 | 비고(담당구역별)
구분 | 1.000 | 0.539
비고(담당구역별) | 0.539 | 1.000

Variable | 하천연장(km) | 요개수연장 양안(km) | 개수연장 양안(km) | 미개수연장 양안(km) | 구분 | 비고(담당구역별)
하천연장(km) | 1.000 | 0.953 | 0.688 | 0.603 | 0.648 | 0.277
요개수연장 양안(km) | 0.953 | 1.000 | 0.720 | 0.617 | 0.668 | 0.435
개수연장 양안(km) | 0.688 | 0.720 | 1.000 | 0.080 | 0.375 | 0.475
미개수연장 양안(km) | 0.603 | 0.617 | 0.080 | 1.000 | 0.802 | 0.278
구분 | 0.648 | 0.668 | 0.375 | 0.802 | 1.000 | 0.539
비고(담당구역별) | 0.277 | 0.435 | 0.475 | 0.278 | 0.539 | 1.000
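
The High correlation alerts near the top of this report can be cross-checked against the last (six-variable) matrix above. The sketch below flags every pair whose coefficient is at least 0.5. The 0.5 cut-off is an assumption that is not stated in this report; it is used here because it reproduces the number of correlated fields named in each alert. `high_correlations` is a hypothetical helper.

```python
# Last correlation matrix above, row by row (values copied from the report).
names = ["하천연장(km)", "요개수연장 양안(km)", "개수연장 양안(km)",
         "미개수연장 양안(km)", "구분", "비고(담당구역별)"]
matrix = [
    [1.000, 0.953, 0.688, 0.603, 0.648, 0.277],
    [0.953, 1.000, 0.720, 0.617, 0.668, 0.435],
    [0.688, 0.720, 1.000, 0.080, 0.375, 0.475],
    [0.603, 0.617, 0.080, 1.000, 0.802, 0.278],
    [0.648, 0.668, 0.375, 0.802, 1.000, 0.539],
    [0.277, 0.435, 0.475, 0.278, 0.539, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off, not stated in the report

def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables whose coefficient
    meets or exceeds the threshold."""
    return {
        name: [other for other, value in zip(names, row)
               if other != name and value >= threshold]
        for name, row in zip(names, matrix)
    }

flagged = high_correlations(names, matrix, THRESHOLD)
for name, others in flagged.items():
    print(f"{name}: {len(others)} field(s) -> {', '.join(others)}")
```

With this cut-off, 하천연장(km), 요개수연장 양안(km), and 구분 each have four flagged partners, 미개수연장 양안(km) has three, 개수연장 양안(km) has two, and 비고(담당구역별) has only 구분, which lines up with the alert wording.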

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

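First rows

Index | 구분 | 하천명 | 하천연장(km) | 요개수연장 양안(km) | 개수연장 양안(km) | 미개수연장 양안(km) | 비고(담당구역별)
0 | 국가하천 | 낙동강(본류) | 20.26 | 34.74 | 20.28 | 14.46 | 낙동강관리본부
1 | 국가하천 | 서낙동강 | 18.55 | 28.22 | 3.24 | 24.98 | 강서구
2 | 국가하천 | 평강천 | 12.54 | 24.86 | 0.26 | 24.6 | 강서구
3 | 국가하천 | 맥도강 | 7.84 | 14.2 | <NA> | 14.2 | 강서구
4 | 국가하천 | 수영강 | 9.0 | 19.87 | 15.68 | 4.19 | 금정,수영,해운대구
5 | 지방하천 | 괴정천 | 5.37 | 10.74 | 5.97 | 4.77 | 사하구
6 | 지방하천 | 학장천 | 5.86 | 11.72 | 11.72 | <NA> | 사상구
7 | 지방하천 | 덕천천 | 3.7 | 7.4 | 3.67 | 3.73 | 북구
8 | 지방하천 | 대리천 | 1.69 | 3.38 | 2.96 | 0.42 | 북구
9 | 지방하천 | 대천천 | 5.44 | 10.88 | 2.9 | 7.98 | 북구,금정구

Last rows

Index | 구분 | 하천명 | 하천연장(km) | 요개수연장 양안(km) | 개수연장 양안(km) | 미개수연장 양안(km) | 비고(담당구역별)
40 | 지방하천 | 만화천 | 2.9 | 5.19 | 3.69 | 1.5 | 기장군
41 | 지방하천 | 서부천 | 3.29 | 4.53 | 3.15 | 1.38 | 기장군
42 | 지방하천 | 송정천 | 5.6 | 11.02 | 6.01 | 5.01 | 기장군, 해운대구
43 | 지방하천 | 철마천 | 8.9 | 17.8 | 17.0 | 0.8 | 기장군
44 | 지방하천 | 구칠천 | 2.07 | 3.4 | 0.31 | 3.09 | 기장군
45 | 지방하천 | 이곡천 | 2.66 | 3.6 | 1.0 | 2.6 | 기장군
46 | 지방하천 | 송정천 | 2.35 | 3.67 | 3.67 | <NA> | 금정구, 기장군
47 | 지방하천 | 임기천 | 2.58 | 2.1 | <NA> | 2.1 | 기장군
48 | 지방하천 | 삼락천 | 4.6 | 9.2 | 7.0 | 2.2 | 사상구
49 | 지방하천 | 감전천 | 2.9 | 5.8 | 3.63 | 2.17 | 사상구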