Overview

Dataset statistics

Number of variables: 8
Number of observations: 68
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.6 KiB
Average record size in memory: 69.9 B

Variable types

Categorical: 4
DateTime: 1
Text: 1
Numeric: 2

Alerts

분양량 is highly overall correlated with 세입액(원) (High correlation)
세입액(원) is highly overall correlated with 분양량 (High correlation)
집계년도 is highly overall correlated with 어종명 and 1 other field (High correlation)
어종명 is highly overall correlated with 집계년도 and 1 other field (High correlation)
단가(원) is highly overall correlated with 집계년도 and 1 other field (High correlation)
단가(원) is highly imbalanced (52.7%) (Imbalance)
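
The 52.7% imbalance figure for 단가(원) (unit price in won) can be reproduced from the value counts reported for that variable (250: 58, 300: 5, 180: 5). The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalized by its maximum (log2 of the number of classes). That formula is an assumption about how the profiler derives the alert, and the name imbalance_score is illustrative only.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced; values near 1 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    # Skip empty classes: p * log2(p) tends to 0 as p tends to 0.
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_classes)


# Counts for 단가(원) as listed in this report: 250, 300, 180.
unit_price_counts = [58, 5, 5]
score = imbalance_score(unit_price_counts)
print(f"{score:.1%}")
```

With these counts the sketch yields roughly 0.527, which agrees with the percentage shown in the alert.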

Reproduction

Analysis started: 2023-12-10 23:00:23.773913
Analysis finished: 2023-12-10 23:00:24.625394
Duration: 0.85 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
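
A report of this kind is typically produced by passing a pandas DataFrame to ydata-profiling. The sketch below is a minimal, hedged example: the function name build_profile, the CSV path argument, the report title, and the output file name are all placeholders, since the original input file and configuration are not given here.

```python
def build_profile(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (paths are placeholders)."""
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Profiling Report")
    profile.to_file(output_path)
    return profile
```

Calling build_profile("data.csv") with a hypothetical file would write report.html next to the script; column types such as DateTime depend on how the data is loaded and inferred.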

Variables

집계년도
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 676.0 B

2020: 24
2017: 17
2018: 15
2019: 12

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value   Count   Frequency (%)
2020    24      35.3%
2017    17      25.0%
2018    15      22.1%
2019    12      17.6%
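
The frequency column is each count divided by the 68 observations, and Distinct (%) is the number of distinct values divided by the same total. A small sketch of that arithmetic for 집계년도 (aggregation year) follows; the helper name as_percent is illustrative.

```python
# Value counts for 집계년도 as listed in this report.
year_counts = {"2020": 24, "2017": 17, "2018": 15, "2019": 12}
n_observations = sum(year_counts.values())


def as_percent(count, total):
    """Format count / total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


frequencies = {year: as_percent(c, n_observations) for year, c in year_counts.items()}
distinct_pct = as_percent(len(year_counts), n_observations)

for year, freq in frequencies.items():
    print(year, freq)
print("Distinct (%):", distinct_pct)
```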

Length

Histogram of lengths of the category

어종명
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 676.0 B

송어: 41
산천어: 14
송어전암컷: 8
송어일반란: 5

Length

Max length: 5
Median length: 2
Mean length: 2.7794118
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 송어
2nd row: 송어
3rd row: 송어
4th row: 송어
5th row: 송어

Common Values

Value        Count   Frequency (%)
송어         41      60.3%
산천어       14      20.6%
송어전암컷   8       11.8%
송어일반란   5       7.4%

Length

Histogram of lengths of the category

분양일
DateTime

Distinct: 12
Distinct (%): 17.6%
Missing: 0
Missing (%): 0.0%
Memory size: 676.0 B
Minimum: 2017-08-28 00:00:00
Maximum: 2020-04-22 00:00:00

Histogram with fixed size bins (bins=12)

지역명
Categorical

Distinct: 14
Distinct (%): 20.6%
Missing: 0
Missing (%): 0.0%
Memory size: 676.0 B

경기: 16
충남: 11
강원: 10
서울: 6
인천: 5
Other values (9): 20

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 3
Unique (%): 4.4%

Sample

1st row: 경기
2nd row: 경기
3rd row: 충남
4th row: 충북
5th row: 전북

Common Values

Value              Count   Frequency (%)
경기               16      23.5%
충남               11      16.2%
강원               10      14.7%
서울               6       8.8%
인천               5       7.4%
전남               4       5.9%
충북               3       4.4%
전북               3       4.4%
경북               3       4.4%
광주               2       2.9%
Other values (4)   5       7.4%

Length

Histogram of lengths of the category

성명
Text

Distinct: 55
Distinct (%): 80.9%
Missing: 0
Missing (%): 0.0%
Memory size: 676.0 B

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 340
Distinct characters: 60
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
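
As a rough illustration of those character properties, the general category of each character can be read with Python's standard unicodedata module. The sketch below uses only the five sample values shown for 성명 (name), not the full column, so the counts differ from the report while the shares should match.

```python
import unicodedata
from collections import Counter

# The five sample values of 성명 listed in this report.
sample_names = ["김*수 외", "최*원 외", "이*호 외", "이*선 외", "최*석 외"]

# Map every character to its Unicode general category:
# "Lo" = Other Letter, "Po" = Other Punctuation, "Zs" = Space Separator.
category_counts = Counter(
    unicodedata.category(ch) for name in sample_names for ch in name
)

total_chars = sum(category_counts.values())
for category, count in category_counts.most_common():
    print(category, count, f"{100 * count / total_chars:.1f}%")
```

Each sample value contributes three Hangul letters, one asterisk, and one space, which is consistent with the 60% / 20% / 20% split reported for the whole column.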

Unique

Unique: 48
Unique (%): 70.6%

Sample

1st row: 김*수 외
2nd row: 최*원 외
3rd row: 이*호 외
4th row: 이*선 외
5th row: 최*석 외

Words

Value               Count   Frequency (%)
외                  68      50.0%
이*호               4       2.9%
최*수               4       2.9%
유*선               3       2.2%
김*수               3       2.2%
민*훈               2       1.5%
남*현               2       1.5%
김*윤               2       1.5%
최*석               1       0.7%
이*담               1       0.7%
Other values (46)   46      33.8%

Most occurring characters

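Value               Count   Frequency (%)
*                   68      20.0%
(space)             68      20.0%
외                  68      20.0%
(not recovered)     16      4.7%
(not recovered)     15      4.4%
(not recovered)     8       2.4%
(not recovered)     7       2.1%
(not recovered)     7       2.1%
(not recovered)     4       1.2%
(not recovered)     4       1.2%
Other values (50)   75      22.1%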

Most occurring categories

Value               Count   Frequency (%)
Other Letter        204     60.0%
Other Punctuation   68      20.0%
Space Separator     68      20.0%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
외                  68      33.3%
(not recovered)     16      7.8%
(not recovered)     15      7.4%
(not recovered)     8       3.9%
(not recovered)     7       3.4%
(not recovered)     7       3.4%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
Other values (48)   67      32.8%

Other Punctuation
Value               Count   Frequency (%)
*                   68      100.0%

Space Separator
Value               Count   Frequency (%)
(space)             68      100.0%

Most occurring scripts

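Value               Count   Frequency (%)
Hangul              204     60.0%
Common              136     40.0%

Most frequent character per script

Hangul
Value               Count   Frequency (%)
외                  68      33.3%
(not recovered)     16      7.8%
(not recovered)     15      7.4%
(not recovered)     8       3.9%
(not recovered)     7       3.4%
(not recovered)     7       3.4%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
Other values (48)   67      32.8%

Common
Value               Count   Frequency (%)
*                   68      50.0%
(space)             68      50.0%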

Most occurring blocks

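Value               Count   Frequency (%)
Hangul              204     60.0%
ASCII               136     40.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
*                   68      50.0%
(space)             68      50.0%

Hangul
Value               Count   Frequency (%)
외                  68      33.3%
(not recovered)     16      7.8%
(not recovered)     15      7.4%
(not recovered)     8       3.9%
(not recovered)     7       3.4%
(not recovered)     7       3.4%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
(not recovered)     4       2.0%
Other values (48)   67      32.8%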

분양량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 40
Distinct (%): 58.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5723.5294
Minimum: 100
Maximum: 77550
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 744.0 B

Quantile statistics

Minimum: 100
5-th percentile: 200
Q1: 600
Median: 1500
Q3: 3350
95-th percentile: 27480
Maximum: 77550
Range: 77450
Interquartile range (IQR): 2750

Descriptive statistics

Standard deviation: 12599.018
Coefficient of variation (CV): 2.2012673
Kurtosis: 17.677222
Mean: 5723.5294
Median Absolute Deviation (MAD): 1000
Skewness: 3.9342719
Sum: 389200
Variance: 1.5873526 × 10^8
Monotonicity: Not monotonic
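
Several of the 분양량 (distribution quantity) figures are derived from others and can be cross-checked with simple arithmetic. The sketch below recomputes the range, IQR, coefficient of variation, variance, and mean from the reported values; because the inputs are already rounded, the results agree with the report only up to rounding.

```python
import math

# Values copied from the 분양량 statistics in this report.
q1, q3 = 600, 3350
minimum, maximum = 100, 77550
mean, std = 5723.5294, 12599.018
n_observations, total = 68, 389200

iqr = q3 - q1                        # interquartile range
value_range = maximum - minimum      # range
cv = std / mean                      # coefficient of variation
variance = std ** 2                  # variance is the squared standard deviation
mean_from_sum = total / n_observations

print(iqr, value_range)
print(round(cv, 4), f"{variance:.4e}", round(mean_from_sum, 4))
```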
Histogram with fixed size bins (bins=40)

Common values

Value               Count   Frequency (%)
1500                6       8.8%
500                 6       8.8%
1000                5       7.4%
200                 5       7.4%
2000                3       4.4%
1300                3       4.4%
300                 3       4.4%
1100                2       2.9%
600                 2       2.9%
2500                2       2.9%
Other values (30)   31      45.6%

Minimum 10 values

Value   Count   Frequency (%)
100     2       2.9%
200     5       7.4%
300     3       4.4%
500     6       8.8%
600     2       2.9%
700     1       1.5%
800     1       1.5%
950     1       1.5%
1000    5       7.4%
1100    2       2.9%

Maximum 10 values

Value   Count   Frequency (%)
77550   1       1.5%
50450   1       1.5%
37000   1       1.5%
30000   1       1.5%
22800   1       1.5%
21000   1       1.5%
20000   1       1.5%
12000   1       1.5%
11200   1       1.5%
10000   1       1.5%

세입액(원)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 42
Distinct (%): 61.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1399058.8
Minimum: 25000
Maximum: 19387500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 744.0 B

Quantile statistics

Minimum: 25000
5-th percentile: 50000
Q1: 150000
Median: 375000
Q3: 806250
95-th percentile: 6712500
Maximum: 19387500
Range: 19362500
Interquartile range (IQR): 656250

Descriptive statistics

Standard deviation: 3122229.5
Coefficient of variation (CV): 2.2316642
Kurtosis: 18.546861
Mean: 1399058.8
Median Absolute Deviation (MAD): 250000
Skewness: 4.0401763
Sum: 95136000
Variance: 9.7483168 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=42)

Common values

Value               Count   Frequency (%)
375000              6       8.8%
125000              5       7.4%
50000               5       7.4%
250000              4       5.9%
150000              3       4.4%
325000              3       4.4%
400000              2       2.9%
75000               2       2.9%
500000              2       2.9%
625000              2       2.9%
Other values (32)   34      50.0%

Minimum 10 values

Value    Count   Frequency (%)
25000    2       2.9%
50000    5       7.4%
54000    1       1.5%
75000    2       2.9%
125000   5       7.4%
150000   3       4.4%
175000   1       1.5%
198000   1       1.5%
200000   1       1.5%
237500   1       1.5%

Maximum 10 values

Value      Count   Frequency (%)
19387500   1       1.5%
12612500   1       1.5%
9250000    1       1.5%
7500000    1       1.5%
5250000    1       1.5%
5000000    1       1.5%
4104000    1       1.5%
3000000    1       1.5%
2800000    1       1.5%
1800000    2       2.9%

단가(원)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 676.0 B

250: 58
300: 5
180: 5

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 250
2nd row: 250
3rd row: 250
4th row: 250
5th row: 250

Common Values

Value   Count   Frequency (%)
250     58      85.3%
300     5       7.4%
180     5       7.4%

Length

Histogram of lengths of the category

Interactions


Correlations

             집계년도  어종명  분양일  지역명  성명   분양량  세입액(원)  단가(원)
집계년도     1.000    0.869   1.000   0.000   0.263  0.225   0.273      0.516
어종명       0.869    1.000   0.740   0.000   0.000  0.000   0.000      0.748
분양일       1.000    0.740   1.000   0.000   0.946  0.000   0.000      0.684
지역명       0.000    0.000   0.000   1.000   0.939  0.000   0.000      0.000
성명         0.263    0.000   0.946   0.939   1.000  0.000   0.000      0.000
분양량       0.225    0.000   0.000   0.000   0.000  1.000   1.000      0.000
세입액(원)   0.273    0.000   0.000   0.000   0.000  1.000   1.000      0.000
단가(원)     0.516    0.748   0.684   0.000   0.000  0.000   0.000      1.000

             지역명  단가(원)  집계년도  어종명
지역명       1.000   0.000     0.000     0.000
단가(원)     0.000   1.000     0.513     0.790
집계년도     0.000   0.513     1.000     0.533
어종명       0.000   0.790     0.533     1.000

             분양량  세입액(원)  집계년도  어종명  지역명  단가(원)
분양량       1.000   0.997       0.148     0.000   0.000   0.000
세입액(원)   0.997   1.000       0.189     0.000   0.000   0.000
집계년도     0.148   0.189       1.000     0.533   0.000   0.513
어종명       0.000   0.000       0.533     1.000   0.000   0.790
지역명       0.000   0.000       0.000     0.000   1.000   0.000
단가(원)     0.000   0.000       0.513     0.790   0.000   1.000
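
The High correlation alerts listed in the overview appear to correspond to the off-diagonal entries of the six-variable matrix that exceed a cut-off. The sketch below assumes a threshold of 0.5 (an assumption, not stated in this report) and a hypothetical helper high_correlation_pairs; with that threshold it flags the same variable pairs as the alerts.

```python
from itertools import combinations

columns = ["분양량", "세입액(원)", "집계년도", "어종명", "지역명", "단가(원)"]
matrix = [
    [1.000, 0.997, 0.148, 0.000, 0.000, 0.000],
    [0.997, 1.000, 0.189, 0.000, 0.000, 0.000],
    [0.148, 0.189, 1.000, 0.533, 0.000, 0.513],
    [0.000, 0.000, 0.533, 1.000, 0.000, 0.790],
    [0.000, 0.000, 0.000, 0.000, 1.000, 0.000],
    [0.000, 0.000, 0.513, 0.790, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off for a "high" correlation


def high_correlation_pairs(names, values, threshold):
    """Return the variable pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(values[i][j]) >= threshold:
            pairs.append((names[i], names[j]))
    return pairs


flagged = high_correlation_pairs(columns, matrix, THRESHOLD)
for left, right in flagged:
    print(left, "<->", right)
```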

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     집계년도  어종명      분양일      지역명  성명      분양량  세입액(원)  단가(원)
0    2020      송어        2020-03-23  경기    김*수 외  2400    400000      250
1    2020      송어        2020-04-22  경기    최*원 외  5700    1425000     250
2    2020      송어        2020-03-23  충남    이*호 외  3100    775000      250
3    2020      송어        2020-03-23  충북    이*선 외  500     125000      250
4    2020      송어        2020-03-24  전북    최*석 외  1000    250000      250
5    2020      송어        2020-03-24  강원    임*원 외  1300    325000      250
6    2020      송어        2020-03-24  경기    김*엽 외  200     50000       250
7    2020      송어        2020-03-25  전남    김*완 외  1000    250000      250
8    2020      송어        2020-03-25  충남    송*호 외  100     25000       250
9    2020      송어        2020-03-25  경기    강*호 외  1000    250000      250

Last rows

     집계년도  어종명      분양일      지역명  성명      분양량  세입액(원)  단가(원)
58   2017      송어전암컷  2017-08-28  서울    이*남 외  3300    825000      250
59   2017      산천어      2017-08-28  경북    최*수 외  200     50000       250
60   2017      송어전암컷  2017-08-28  경북    최*수 외  200     50000       250
61   2017      산천어      2017-08-28  경기    김*호 외  2550    637500      250
62   2017      송어일반란  2017-08-28  경기    엄*성 외  22800   4104000     180
63   2017      송어전암컷  2017-08-28  경기    유*선 외  50450   12612500    250
64   2017      산천어      2017-08-28  강원    김*수 외  1500    375000      250
65   2017      송어일반란  2017-08-28  충북    최*수 외  1100    198000      180
66   2017      송어전암컷  2017-08-28  강원    박*호 외  500     125000      250
67   2017      송어일반란  2017-08-28  강원    이*진 외  10000   1800000     180
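
In most sample rows, 세입액(원) (revenue) appears to equal 분양량 (quantity) multiplied by 단가(원) (unit price), which would explain the near-perfect correlation between the two numeric variables. The sketch below checks that relationship on the first ten rows. The split of each row's trailing digits into three numbers is an assumption made while reading the sample, and the relationship itself is an observation, not something the report states.

```python
# (row index, 분양량, 세입액(원), 단가(원)) for the first ten sample rows,
# as read from the sample table (the column split is an assumption).
first_rows = [
    (0, 2400, 400000, 250),
    (1, 5700, 1425000, 250),
    (2, 3100, 775000, 250),
    (3, 500, 125000, 250),
    (4, 1000, 250000, 250),
    (5, 1300, 325000, 250),
    (6, 200, 50000, 250),
    (7, 1000, 250000, 250),
    (8, 100, 25000, 250),
    (9, 1000, 250000, 250),
]

# Collect the rows where quantity * unit price differs from the recorded revenue.
mismatches = [
    idx for idx, quantity, revenue, unit_price in first_rows
    if quantity * unit_price != revenue
]
print("rows where quantity * unit price != revenue:", mismatches)
```

Under this reading, only row 0 departs from the product (2400 × 250 = 600000 against a recorded 400000). Whether that reflects a data-entry issue or a different pricing rule cannot be determined from the report alone.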