Overview

Dataset statistics

Number of variables: 7
Number of observations: 29
Missing cells: 4
Missing cells (%): 2.0%
Duplicate rows: 1
Duplicate rows (%): 3.4%
Total size in memory: 1.8 KiB
Average record size in memory: 62.6 B
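The percentages in this block are simple ratios over the 29 × 7 = 203 cells and the 29 rows. The Python sketch below reproduces that arithmetic from the reported counts. The helper `pct` is a hypothetical name. The memory line is only a consistency check: it confirms that the two memory figures agree after rounding.

```python
# Counts copied from the "Dataset statistics" block.
n_vars, n_obs = 7, 29
missing_cells = 4
duplicate_rows = 1
avg_record_bytes = 62.6

def pct(part, whole):
    """Percentage rounded to one decimal place, as the report displays it."""
    return round(100 * part / whole, 1)

missing_pct = pct(missing_cells, n_obs * n_vars)   # 4 of 203 cells
duplicate_pct = pct(duplicate_rows, n_obs)         # 1 of 29 rows
total_kib = round(avg_record_bytes * n_obs / 1024, 1)

print(missing_pct, duplicate_pct, total_kib)  # 2.0 3.4 1.8
```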

Variable types

Categorical: 5
Numeric: 1
DateTime: 1

Dataset

Description: Sample data (샘플 데이터)
Author: 한국신용데이터 (Korea Credit Data)
URL: https://bigdata-region.kr/#/dataset/1efe2a80-8791-4964-89d9-052104838d24

Alerts

2022-09 has a constant value (2022-09-01) [Constant]
Dataset has 1 (3.4%) duplicate row [Duplicates]
전체 is highly overall correlated with 통합 and 1 other field [High correlation]
전국 is highly overall correlated with 통합 and 1 other field [High correlation]
16.6667 is highly overall correlated with 270000 and 4 other fields [High correlation]
50만원 미만 is highly overall correlated with 270000 and 1 other field [High correlation]
통합 is highly overall correlated with 전체 and 2 other fields [High correlation]
270000 is highly overall correlated with 50만원 미만 and 1 other field [High correlation]
16.6667 is highly imbalanced (63.8%) [Imbalance]
270000 has 2 (6.9%) missing values [Missing]
2022-09 has 2 (6.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-22 20:41:02.647211
Analysis finished: 2023-12-22 20:41:11.482297
Duration: 8.84 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

통합
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 17.2%
Missing: 0
Missing (%): 0.0%
Memory size: 364.0 B
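Distinct (%) is the number of distinct values divided by the number of non-missing values. The categorical columns report Missing: 0 but list `<NA>` twice. This means `<NA>` is counted as an ordinary value, so the denominator is 29. The numeric 270000 column excludes its 2 missing cells, which gives 7 / 27. Unique counts values that occur exactly once.

The sketch below uses the counts from this variable's Common Values table. The variable names are illustrative.

```python
from collections import Counter

# Value counts for 통합, copied from its Common Values table.
# "<NA>" is kept as a literal string because the report counts it as a value.
counts = Counter({"업종": 7, "지역": 7, "지역X업종": 7, "통합": 6, "<NA>": 2})

n = sum(counts.values())                            # 29 rows, none flagged missing
distinct = len(counts)                              # 5
distinct_pct = round(100 * distinct / n, 1)         # 17.2
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once: 0

# Numeric column 270000: 7 distinct values over 27 non-missing rows.
numeric_distinct_pct = round(100 * 7 / 27, 1)       # 25.9
```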

Length

Max length: 5
Median length: 2
Mean length: 2.862069
Min length: 2
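The length statistics are character counts over all 29 values. They are reproduced exactly only when `<NA>` is treated as the 4-character string "<NA>". This supports reading those cells as literal text rather than true missing values.

The sketch below assumes that Python's `len` (which counts Unicode code points, so each Hangul syllable counts as 1) matches the report's notion of length.

```python
import statistics

# 통합 values expanded from the Common Values table (order does not matter).
values = ["업종"] * 7 + ["지역"] * 7 + ["지역X업종"] * 7 + ["통합"] * 6 + ["<NA>"] * 2

# len() counts code points: "지역X업종" has 5, the literal "<NA>" has 4.
lengths = [len(v) for v in values]

stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": round(statistics.mean(lengths), 6),
    "min": min(lengths),
}
print(stats)  # {'max': 5, 'median': 2, 'mean': 2.862069, 'min': 2}
```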

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 통합
2nd row: 통합
3rd row: 통합
4th row: 통합
5th row: 통합

Common Values

Value | Count | Frequency (%)
업종 (industry) | 7 | 24.1%
지역 (region) | 7 | 24.1%
지역X업종 (region × industry) | 7 | 24.1%
통합 (combined) | 6 | 20.7%
<NA> | 2 | 6.9%

Length

[Figure: histogram of category lengths]

Common Values (Plot)

Value | Count | Frequency (%)
업종 | 7 | 24.1%
지역 | 7 | 24.1%
지역x업종 | 7 | 24.1%
통합 | 6 | 20.7%
na | 2 | 6.9%

전체
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 10.3%
Missing: 0
Missing (%): 0.0%
Memory size: 364.0 B

Length

Max length: 4
Median length: 3
Mean length: 2.6206897
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전체
2nd row: 전체
3rd row: 전체
4th row: 전체
5th row: 전체

Common Values

Value | Count | Frequency (%)
유통업 (distribution industry) | 14 | 48.3%
전체 (all) | 13 | 44.8%
<NA> | 2 | 6.9%

Length

[Figure: histogram of category lengths]

Common Values (Plot)

Value | Count | Frequency (%)
유통업 | 14 | 48.3%
전체 | 13 | 44.8%
na | 2 | 6.9%

전국
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 10.3%
Missing: 0
Missing (%): 0.0%
Memory size: 364.0 B

Length

Max length: 5
Median length: 4
Mean length: 3.5862069
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전국
2nd row: 전국
3rd row: 전국
4th row: 전국
5th row: 전국

Common Values

Value | Count | Frequency (%)
서울특별시 (Seoul Special City) | 14 | 48.3%
전국 (nationwide) | 13 | 44.8%
<NA> | 2 | 6.9%

Length

[Figure: histogram of category lengths]

Common Values (Plot)

Value | Count | Frequency (%)
서울특별시 | 14 | 48.3%
전국 | 13 | 44.8%
na | 2 | 6.9%

50만원 미만
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 27.6%
Missing: 0
Missing (%): 0.0%
Memory size: 364.0 B

Length

Max length: 19
Median length: 18
Mean length: 14.793103
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50만원 이상 - 70만원 미만
2nd row: 70만원 이상 - 100만원 미만
3rd row: 100만원 이상 - 150만원 미만
4th row: 150만원 이상 - 200만원 미만
5th row: 200만원 이상 - 300만원 미만

Common Values

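Value | Count | Frequency (%)
50만원 이상 - 70만원 미만 (500,000 to under 700,000 KRW) | 4 | 13.8%
70만원 이상 - 100만원 미만 (700,000 to under 1,000,000 KRW) | 4 | 13.8%
100만원 이상 - 150만원 미만 (1,000,000 to under 1,500,000 KRW) | 4 | 13.8%
150만원 이상 - 200만원 미만 (1,500,000 to under 2,000,000 KRW) | 4 | 13.8%
200만원 이상 - 300만원 미만 (2,000,000 to under 3,000,000 KRW) | 4 | 13.8%
300만원 이상 (3,000,000 KRW or more) | 4 | 13.8%
50만원 미만 (under 500,000 KRW) | 3 | 10.3%
<NA> | 2 | 6.9%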

Length

[Figure: histogram of category lengths]

Common Values (Plot)

Word counts across all values (lowercased, 116 words in total):
Value | Count | Frequency (%)
이상 (or more) | 24 | 20.7%
미만 (under) | 23 | 19.8%
- | 20 | 17.2%
70만원 | 8 | 6.9%
100만원 | 8 | 6.9%
150만원 | 8 | 6.9%
200만원 | 8 | 6.9%
300만원 | 8 | 6.9%
50만원 | 7 | 6.0%
na | 2 | 1.7%
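This word-level table can be reproduced by splitting every value on whitespace and counting the tokens. The tokenization rule below is an assumption, chosen because it matches the reported counts: lowercase the value, split on whitespace, then strip `<` and `>`. ydata-profiling's own word analysis may differ in details such as stop-word handling.

```python
from collections import Counter

# Values of 50만원 미만 expanded from its Common Values table.
values = (
    ["50만원 이상 - 70만원 미만"] * 4
    + ["70만원 이상 - 100만원 미만"] * 4
    + ["100만원 이상 - 150만원 미만"] * 4
    + ["150만원 이상 - 200만원 미만"] * 4
    + ["200만원 이상 - 300만원 미만"] * 4
    + ["300만원 이상"] * 4
    + ["50만원 미만"] * 3
    + ["<NA>"] * 2
)

def tokenize(text):
    # Assumed rule: lowercase, split on whitespace, strip angle brackets
    # so "<NA>" becomes "na"; the "-" range separator stays a token.
    return [tok.strip("<>") for tok in text.lower().split()]

words = Counter(tok for v in values for tok in tokenize(v))
total = sum(words.values())  # 116 tokens

for word, count in words.most_common(3):
    print(word, count, f"{100 * count / total:.1f}%")
# 이상 24 20.7%
# 미만 23 19.8%
# - 20 17.2%
```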

16.6667
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 6.9%
Missing: 0
Missing (%): 0.0%
Memory size: 364.0 B

Length

Max length: 7
Median length: 7
Mean length: 6.7931034
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 16.6667
2nd row: 16.6667
3rd row: 16.6667
4th row: 16.6667
5th row: 16.6667

Common Values

Value | Count | Frequency (%)
16.6667 | 27 | 93.1%
<NA> | 2 | 6.9%
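The 63.8% imbalance score matches one minus a normalized entropy: the Shannon entropy of the value counts divided by the log of the number of classes. This is believed to be how ydata-profiling scores imbalance, but treat the formula in this sketch as an assumption that happens to reproduce the reported figure. `imbalance` is a hypothetical helper name.

```python
import math

def imbalance(counts):
    """1 - normalized Shannon entropy: 0 for a uniform split, near 1 when one class dominates."""
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(len(counts))

# 16.6667 has two classes: "16.6667" (27 rows) and "<NA>" (2 rows).
score = imbalance([27, 2])
print(f"{100 * score:.1f}%")  # 63.8%
```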

Length

[Figure: histogram of category lengths]

Common Values (Plot)

Value | Count | Frequency (%)
16.6667 | 27 | 93.1%
na | 2 | 6.9%

270000
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 7
Distinct (%): 25.9%
Missing: 2
Missing (%): 6.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1710000
Minimum: 270000
Maximum: 5100000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 393.0 B

Quantile statistics

Minimum: 270000
5th percentile: 270000
Q1: 670000
Median: 1100000
Q3: 2200000
95th percentile: 5100000
Maximum: 5100000
Range: 4830000
Interquartile range (IQR): 1530000
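The quantiles above are consistent with linear interpolation between order statistics over the 27 non-missing values. This is the R type 7 rule, which NumPy also uses by default. Python's `statistics.quantiles` with `method="inclusive"` applies the same rule, so it works as a standard-library cross-check. The value list is rebuilt from this variable's value-count table.

```python
import statistics

# Non-missing values of 270000, expanded from its value-count table.
values = [270000] * 3 + [550000, 790000, 1100000, 1600000, 2200000, 5100000] * 4

# method="inclusive" interpolates linearly between order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
ventiles = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print(q1, median, q3, q3 - q1)  # 670000.0 1100000.0 2200000.0 1530000.0
```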

Descriptive statistics

Standard deviation: 1562227.2
Coefficient of variation (CV): 0.91358316
Kurtosis: 1.1117072
Mean: 1710000
Median Absolute Deviation (MAD): 550000
Skewness: 1.4933881
Sum: 46170000
Variance: 2.4405538 × 10^12
Monotonicity: Not monotonic
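The dispersion measures use the sample standard deviation, which divides by n - 1. With n = 27 this reproduces the reported standard deviation, variance and CV. MAD is taken here as the median of absolute deviations from the median, with no scaling constant.

The sketch uses only the standard library. Skewness and kurtosis are left out because libraries differ in how they bias-correct them.

```python
import statistics

values = [270000] * 3 + [550000, 790000, 1100000, 1600000, 2200000, 5100000] * 4

mean = statistics.mean(values)          # 46170000 / 27
std = statistics.stdev(values)          # sample standard deviation (n - 1)
variance = statistics.variance(values)  # std ** 2, about 2.44e12
cv = std / mean                         # coefficient of variation
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)  # median absolute deviation

print(mean, round(std, 1), round(cv, 8), mad)  # 1710000 1562227.2 0.91358316 550000
```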
[Figure: histogram with fixed-size bins (bins=7)]
Common values

Value | Count | Frequency (%)
550000 | 4 | 13.8%
790000 | 4 | 13.8%
1100000 | 4 | 13.8%
1600000 | 4 | 13.8%
2200000 | 4 | 13.8%
5100000 | 4 | 13.8%
270000 | 3 | 10.3%
(Missing) | 2 | 6.9%

Smallest values

Value | Count | Frequency (%)
270000 | 3 | 10.3%
550000 | 4 | 13.8%
790000 | 4 | 13.8%
1100000 | 4 | 13.8%
1600000 | 4 | 13.8%
2200000 | 4 | 13.8%
5100000 | 4 | 13.8%

Largest values

Value | Count | Frequency (%)
5100000 | 4 | 13.8%
2200000 | 4 | 13.8%
1600000 | 4 | 13.8%
1100000 | 4 | 13.8%
790000 | 4 | 13.8%
550000 | 4 | 13.8%
270000 | 3 | 10.3%
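The fixed-size-bin histogram (bins=7) spans the observed range from 270000 to 5100000, so each bin is 690000 wide. The sketch below computes the bin counts. It assumes equal-width bins with the last bin closed on the right, which is NumPy's `histogram` convention. `fixed_width_histogram` is a hypothetical helper.

```python
values = [270000] * 3 + [550000, 790000, 1100000, 1600000, 2200000, 5100000] * 4

def fixed_width_histogram(data, bins):
    """Counts per equal-width bin over [min, max]; the last bin includes max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        i = int((x - lo) / width)
        counts[min(i, bins - 1)] += 1  # clamp the maximum into the last bin
    return counts

counts = fixed_width_histogram(values, bins=7)
print(counts)  # [11, 8, 4, 0, 0, 0, 4]
```

Most of the mass falls in the first two bins, and 5100000 sits alone in the last one. This pattern is consistent with the positive skewness reported above.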

2022-09
Date

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 3.7%
Missing: 2
Missing (%): 6.9%
Memory size: 364.0 B
Minimum: 2022-09-01 00:00:00
Maximum: 2022-09-01 00:00:00
[Figure: histogram with fixed-size bins (bins=1)]
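The Sample section shows the raw cells as "2022-09", while Minimum and Maximum read 2022-09-01 00:00:00. Parsing a year-month string into a full timestamp fills in day 1 at midnight, which would explain this display. The sketch assumes the parse format `%Y-%m`. It also reproduces Distinct (%) = 1 / 27.

```python
from datetime import datetime

raw = ["2022-09"] * 27  # the non-missing cells of the 2022-09 column
parsed = {datetime.strptime(s, "%Y-%m") for s in raw}  # day defaults to 1

print(parsed)                                  # {datetime.datetime(2022, 9, 1, 0, 0)}
print(round(100 * len(parsed) / len(raw), 1))  # 3.7
```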

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 통합 | 전체 | 전국 | 50만원 미만 | 270000
통합 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000
전체 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000
전국 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000
50만원 미만 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
270000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
[Figure: correlation heatmap]

Variable | 전체 | 전국 | 16.6667 | 50만원 미만 | 통합
전체 | 1.000 | 0.000 | 1.000 | 0.000 | 0.959
전국 | 0.000 | 1.000 | 1.000 | 0.000 | 0.959
16.6667 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
50만원 미만 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000
통합 | 0.959 | 0.959 | 1.000 | 0.000 | 1.000
[Figure: correlation heatmap]

Variable | 270000 | 통합 | 전체 | 전국 | 50만원 미만 | 16.6667
270000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.953 | 1.000
통합 | 0.000 | 1.000 | 0.959 | 0.959 | 0.000 | 1.000
전체 | 0.000 | 0.959 | 1.000 | 0.000 | 0.000 | 1.000
전국 | 0.000 | 0.959 | 0.000 | 1.000 | 0.000 | 1.000
50만원 미만 | 0.953 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
16.6667 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
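Two of the categorical entries can be reproduced with a bias-corrected Cramér's V (Bergsma's correction), computed on the 27 rows where the categorical columns are not `<NA>`. On those rows, 통합 vs 전체 gives 0.959 and 전체 vs 전국 gives 0, which matches the second matrix. The report does not say which method produced which matrix, so treat this as a plausible reconstruction, not ydata-profiling's actual code path.

The row reconstruction is hypothetical. It assumes that rows 10 to 18, which the Sample section does not show, follow the same pattern as the visible rows.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    r, k = len(row), len(col)
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Hypothetical reconstruction: each 통합 group fixes the 전체 and 전국 values.
groups = {
    "통합": ("전체", "전국", 6),
    "업종": ("유통업", "전국", 7),
    "지역": ("전체", "서울특별시", 7),
    "지역X업종": ("유통업", "서울특별시", 7),
}
level, industry, region = [], [], []
for name, (ind, reg, count) in groups.items():
    level += [name] * count
    industry += [ind] * count
    region += [reg] * count

v_level_industry = cramers_v_corrected(level, industry)
v_industry_region = cramers_v_corrected(industry, region)
print(round(v_level_industry, 3))   # 0.959
print(round(v_industry_region, 3))  # 0.0
```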

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
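Nullity correlation is commonly computed as the Pearson correlation between per-column missing-value indicators, the approach popularized by the missingno library. Only 270000 and 2022-09 have missing cells here, and both are empty in the same two rows (27 and 28). Their nullity correlation is therefore 1. A column with no missing cells has a zero-variance indicator, so its correlation is undefined.

The sketch below is hypothetical and uses only the standard library.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # never (or always) missing: correlation undefined
    return cov / (sx * sy)

# 1 = missing, 0 = present, for each of the 29 rows.
amount_missing = [0] * 27 + [1, 1]  # column 270000
month_missing = [0] * 27 + [1, 1]   # column 2022-09
group_missing = [0] * 29            # column 통합 reports no missing cells

print(round(pearson(amount_missing, month_missing), 6))  # 1.0
print(pearson(amount_missing, group_missing))            # nan
```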

Sample

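First rows

Index | 통합 | 전체 | 전국 | 50만원 미만 | 16.6667 | 270000 | 2022-09
0 | 통합 | 전체 | 전국 | 50만원 이상 - 70만원 미만 | 16.6667 | 550000 | 2022-09
1 | 통합 | 전체 | 전국 | 70만원 이상 - 100만원 미만 | 16.6667 | 790000 | 2022-09
2 | 통합 | 전체 | 전국 | 100만원 이상 - 150만원 미만 | 16.6667 | 1100000 | 2022-09
3 | 통합 | 전체 | 전국 | 150만원 이상 - 200만원 미만 | 16.6667 | 1600000 | 2022-09
4 | 통합 | 전체 | 전국 | 200만원 이상 - 300만원 미만 | 16.6667 | 2200000 | 2022-09
5 | 통합 | 전체 | 전국 | 300만원 이상 | 16.6667 | 5100000 | 2022-09
6 | 업종 | 유통업 | 전국 | 50만원 미만 | 16.6667 | 270000 | 2022-09
7 | 업종 | 유통업 | 전국 | 50만원 이상 - 70만원 미만 | 16.6667 | 550000 | 2022-09
8 | 업종 | 유통업 | 전국 | 70만원 이상 - 100만원 미만 | 16.6667 | 790000 | 2022-09
9 | 업종 | 유통업 | 전국 | 100만원 이상 - 150만원 미만 | 16.6667 | 1100000 | 2022-09

Last rows

Index | 통합 | 전체 | 전국 | 50만원 미만 | 16.6667 | 270000 | 2022-09
19 | 지역 | 전체 | 서울특별시 | 300만원 이상 | 16.6667 | 5100000 | 2022-09
20 | 지역X업종 | 유통업 | 서울특별시 | 50만원 미만 | 16.6667 | 270000 | 2022-09
21 | 지역X업종 | 유통업 | 서울특별시 | 50만원 이상 - 70만원 미만 | 16.6667 | 550000 | 2022-09
22 | 지역X업종 | 유통업 | 서울특별시 | 70만원 이상 - 100만원 미만 | 16.6667 | 790000 | 2022-09
23 | 지역X업종 | 유통업 | 서울특별시 | 100만원 이상 - 150만원 미만 | 16.6667 | 1100000 | 2022-09
24 | 지역X업종 | 유통업 | 서울특별시 | 150만원 이상 - 200만원 미만 | 16.6667 | 1600000 | 2022-09
25 | 지역X업종 | 유통업 | 서울특별시 | 200만원 이상 - 300만원 미만 | 16.6667 | 2200000 | 2022-09
26 | 지역X업종 | 유통업 | 서울특별시 | 300만원 이상 | 16.6667 | 5100000 | 2022-09
27 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
28 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>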
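The variable names in this report (통합, 전체, 전국, 50만원 미만, 16.6667, 270000, 2022-09) look like a data row rather than headers. Two counts point the same way:

- The 통합 group has 6 rows, while every other group has 7.
- The bracket missing from the 통합 group is exactly 50만원 미만, which also appears only 3 times.

If so, the file was probably read with its first data row promoted to the header. In pandas, passing `header=None` (optionally with `names=`) to `read_csv` keeps that row as data.

The standard-library sketch below shows the same idea. The CSV text is a hypothetical three-line excerpt, and the English column names are made-up placeholders, since the report does not document what the columns mean.

```python
import csv
import io

# Hypothetical excerpt: the first line is a data row, not a header.
raw = """통합,전체,전국,50만원 미만,16.6667,270000,2022-09
통합,전체,전국,50만원 이상 - 70만원 미만,16.6667,550000,2022-09
업종,유통업,전국,50만원 미만,16.6667,270000,2022-09
"""

# Placeholder names (assumptions, not from the source data).
columns = ["level", "industry", "region", "bracket", "share", "amount", "month"]

rows = [dict(zip(columns, record)) for record in csv.reader(io.StringIO(raw))]

print(len(rows))           # 3: the first line is kept as data
print(rows[0]["bracket"])  # 50만원 미만
```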

Duplicate rows

Most frequently occurring

Index | 통합 | 전체 | 전국 | 50만원 미만 | 16.6667 | 270000 | 2022-09 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2
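The overview reports 1 duplicate row (3.4% of 29), while this table shows the same row occurring twice. The two figures agree if duplicates are counted as extra copies beyond the first occurrence.

The sketch below uses three representative rows instead of the full dataset, with `None` standing in for `<NA>`.

```python
from collections import Counter

NA = None  # stand-in for a missing cell

# Three representative rows: one data row and the two fully empty rows (27 and 28).
rows = [
    ("통합", "전체", "전국", "50만원 이상 - 70만원 미만", "16.6667", 550000, "2022-09"),
    (NA,) * 7,
    (NA,) * 7,
]

counts = Counter(rows)
duplicates = {row: c for row, c in counts.items() if c > 1}
n_duplicate_rows = sum(c - 1 for c in duplicates.values())  # copies beyond the first

print(duplicates)        # {(None, None, None, None, None, None, None): 2}
print(n_duplicate_rows)  # 1
```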