Overview

Dataset statistics

Number of variables: 7
Number of observations: 30
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 64.4 B
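
The figures in this block can be cross-checked with a little arithmetic. The sketch below uses only the numbers printed above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 7
n_observations = 30
missing_cells = 0
avg_record_bytes = 64.4

# Total number of cells and the share of them that is missing.
n_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / n_cells

# Total size implied by the average record size (1 KiB = 1024 bytes).
total_kib = n_observations * avg_record_bytes / 1024

print(f"cells={n_cells}, missing={missing_pct:.1f}%, total={total_kib:.1f} KiB")
```

The implied total rounds to the 1.9 KiB reported for the whole dataset.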

Variable types

Text: 1
Categorical: 4
Numeric: 1
DateTime: 1

Dataset

Description: Sample data
Author: 한국신용데이터 (Korea Credit Data)
URL: https://bigdata-region.kr/#/dataset/788cf74c-d027-43d4-94b3-72588835c54c

Alerts

연월 has constant value "" [Constant]
게시글내키워드평균중요도 is highly overall correlated with 키워드분류 and 2 other fields [High correlation]
게시글내키워드평균빈도 is highly overall correlated with 키워드길이 and 2 other fields [High correlation]
키워드길이 is highly overall correlated with 키워드분류 and 1 other field [High correlation]
키워드분류 is highly overall correlated with 키워드길이 and 2 other fields [High correlation]
키워드포함게시글작성비율 is highly overall correlated with 키워드분류 and 2 other fields [High correlation]
키워드포함게시글작성비율 is highly imbalanced (64.6%) [Imbalance]
게시글내키워드평균빈도 is highly imbalanced (64.7%) [Imbalance]
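
The two imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of the class distribution. The sketch below is a reconstruction under that assumption, not code taken from the report: `imbalance_score` is a hypothetical helper, and the class counts are those listed in the Common Values tables of the two flagged columns.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform distribution, 1 for a single class."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 1.0  # assumption: a single class counts as fully imbalanced
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Class counts from the Common Values tables of the two flagged columns.
ratio_counts = [27, 2, 1]  # 키워드포함게시글작성비율: values 0, 1, 34
freq_counts = [28, 2]      # 게시글내키워드평균빈도: values 1, 2

print(f"{imbalance_score(ratio_counts):.1%}")  # 64.6%
print(f"{imbalance_score(freq_counts):.1%}")   # 64.7%
```

Normalizing by log(k) keeps the score comparable between columns with different numbers of classes.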

Reproduction

Analysis started: 2023-12-10 13:46:59.423138
Analysis finished: 2023-12-10 13:47:00.323416
Duration: 0.9 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
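
A report of this kind could be regenerated along the following lines. This is a minimal sketch, not the configuration actually used: `build_report`, the CSV path passed to it, and the report title are assumptions, while the dataset metadata is copied from the Dataset block above.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so the function can be defined without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(
        df,
        title="Profiling report",  # assumed title; the original is not shown
        dataset={
            "description": "샘플 데이터",
            "author": "한국신용데이터",
            "url": "https://bigdata-region.kr/#/dataset/788cf74c-d027-43d4-94b3-72588835c54c",
        },
    )
    report.to_file(output_path)
    return report
```

Calling `build_report("keywords_sample.csv")` would write `report.html`; the file name is hypothetical.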

Variables

키워드 (keyword)
Text

Distinct: 29
Distinct (%): 96.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 7
Median length: 6
Mean length: 4.1666667
Min length: 2

Characters and Unicode

Total characters: 125
Distinct characters: 40
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
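
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category and name of a code point. The helper `describe` below is hypothetical and only sketches the idea; script and block lookups are not part of `unicodedata` and are left out.

```python
import unicodedata

def describe(char):
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(char):04X}", unicodedata.category(char), unicodedata.name(char))

# '가' is the first character of the first sample keyword.
print(describe("가"))  # ('U+AC00', 'Lo', 'HANGUL SYLLABLE GA')

# Every character of the sample keyword '가건물' is in category Lo (Other Letter).
print({unicodedata.category(c) for c in "가건물"})  # {'Lo'}
```

The category code `Lo` corresponds to the "Other Letter" label used in the tables of this section.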

Unique

Unique: 28
Unique (%): 93.3%
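
The difference between Distinct (29) and Unique (28) follows from one value occurring twice: distinct counts the different values, unique counts the values seen exactly once. The sketch below uses a hypothetical stand-in column, since the full list of 30 keywords is not shown in this report.

```python
from collections import Counter

# Hypothetical stand-in column: 28 placeholder keywords that occur once,
# plus "가게", which occurs twice (as in the Sample rows of this report).
keywords = [f"keyword_{i}" for i in range(28)] + ["가게", "가게"]

counts = Counter(keywords)
n = len(keywords)
distinct = len(counts)                              # number of different values
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(f"Distinct: {distinct} ({distinct / n:.1%})")
print(f"Unique: {unique} ({unique / n:.1%})")
```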

Sample

1st row: 가가
2nd row: 가감
3rd row: 가건물
4th row: 가게
5th row: 가게

Value | Count | Frequency (%)
가게 | 2 | 6.7%
가가 | 1 | 3.3%
가게안 | 1 | 3.3%
가게하나를 | 1 | 3.3%
가게하나가 | 1 | 3.3%
가게주인 | 1 | 3.3%
가게있음 | 1 | 3.3%
가게인데도 | 1 | 3.3%
가게이전을 | 1 | 3.3%
가게을 | 1 | 3.3%
Other values (19) | 19 | 63.3%

Most occurring characters

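Value | Count | Frequency (%)
(character not preserved) | 32 | 25.6%
(character not preserved) | 27 | 21.6%
(character not preserved) | 4 | 3.2%
(character not preserved) | 3 | 2.4%
(character not preserved) | 3 | 2.4%
(character not preserved) | 3 | 2.4%
(character not preserved) | 3 | 2.4%
(character not preserved) | 3 | 2.4%
(character not preserved) | 3 | 2.4%
(character not preserved) | 3 | 2.4%
Other values (30) | 41 | 32.8%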

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 125 | 100.0%

Most frequent character per category

Other Letter
Identical to the table under "Most occurring characters" above: all 125 characters belong to this category.

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 125 | 100.0%

Most frequent character per script

Hangul
Identical to the table under "Most occurring characters" above: all 125 characters belong to this script.

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 125 | 100.0%

Most frequent character per block

Hangul
Identical to the table under "Most occurring characters" above: all 125 characters belong to this block.

키워드분류 (keyword classification)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 명사
2nd row: 명사
3rd row: 명사
4th row: 명사
5th row: 동사

Common Values

Value | Count | Frequency (%)
동사 (verb) | 26 | 86.7%
명사 (noun) | 4 | 13.3%
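
The count and frequency columns of the table above can be reproduced from the raw column with a counter. The sketch below rebuilds the column from the reported counts rather than from the source file, which is not part of this report.

```python
from collections import Counter

# Column rebuilt from the counts in the table above: 26 x 동사, 4 x 명사.
column = ["동사"] * 26 + ["명사"] * 4

counts = Counter(column)
total = sum(counts.values())

# most_common() sorts by descending count, like the Common Values table.
rows = [(value, count, f"{count / total:.1%}") for value, count in counts.most_common()]
for row in rows:
    print(*row)
```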

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated under "Common Values" above]

키워드길이 (keyword length)
Real number (ℝ)

HIGH CORRELATION

Distinct: 6
Distinct (%): 20.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.1666667
Minimum: 2
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.0 B

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 3
median: 4
Q3: 5
95-th percentile: 6
Maximum: 7
Range: 5
Interquartile range (IQR): 2
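
These quantiles can be reproduced from the frequency table of 키워드길이 listed further down in this section. The sketch below assumes linear interpolation between observed values (the `inclusive` method of Python's `statistics.quantiles`); whether the report used exactly this method is an assumption, although the results agree.

```python
import statistics

# Frequency table of 키워드길이 from this report: {length: count}.
freq = {2: 4, 3: 5, 4: 9, 5: 7, 6: 4, 7: 1}
data = sorted(v for v, c in freq.items() for _ in range(c))

# method="inclusive" interpolates linearly between the observed values.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
ventiles = statistics.quantiles(data, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]  # 5th and 95th percentiles

print(min(data), p5, q1, median, q3, p95, max(data))
print("range:", max(data) - min(data), "IQR:", q3 - q1)
```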

Descriptive statistics

Standard deviation: 1.3412124
Coefficient of variation (CV): 0.32189096
Kurtosis: -0.57874575
Mean: 4.1666667
Median Absolute Deviation (MAD): 1
Skewness: 0.042537317
Sum: 125
Variance: 1.7988506
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
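
Most frequent values
Value | Count | Frequency (%)
4 | 9 | 30.0%
5 | 7 | 23.3%
3 | 5 | 16.7%
2 | 4 | 13.3%
6 | 4 | 13.3%
7 | 1 | 3.3%

Smallest values
Value | Count | Frequency (%)
2 | 4 | 13.3%
3 | 5 | 16.7%
4 | 9 | 30.0%
5 | 7 | 23.3%
6 | 4 | 13.3%
7 | 1 | 3.3%

Largest values
Value | Count | Frequency (%)
7 | 1 | 3.3%
6 | 4 | 13.3%
5 | 7 | 23.3%
4 | 9 | 30.0%
3 | 5 | 16.7%
2 | 4 | 13.3%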
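
The descriptive statistics reported for 키워드길이 follow directly from its frequency table. The sketch below recomputes the sum, mean, sample variance, standard deviation, coefficient of variation and median absolute deviation with the standard library; the variable names are illustrative.

```python
import statistics

# Frequency table of 키워드길이 from the tables above: {length: count}.
freq = {2: 4, 3: 5, 4: 9, 5: 7, 6: 4, 7: 1}
data = [v for v, c in freq.items() for _ in range(c)]

total = sum(data)
mean = statistics.mean(data)
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std = statistics.stdev(data)          # sample standard deviation
cv = std / mean                       # coefficient of variation

# Median absolute deviation around the median.
med = statistics.median(data)
mad = statistics.median(abs(x - med) for x in data)

print(total, round(mean, 7), round(variance, 7), round(std, 7), round(cv, 8), mad)
```

The variance matches the report only with the n - 1 denominator, which indicates that the reported value is the sample variance.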

키워드포함게시글작성비율 (share of posts containing the keyword)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 2
Median length: 1
Mean length: 1.0333333
Min length: 1

Unique

Unique: 1
Unique (%): 3.3%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 34
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 27 | 90.0%
1 | 2 | 6.7%
34 | 1 | 3.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated under "Common Values" above]

게시글내키워드평균빈도 (average keyword frequency within posts)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
1 | 28 | 93.3%
2 | 2 | 6.7%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated under "Common Values" above]

게시글내키워드평균중요도 (average keyword importance within posts)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 13.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 2
Median length: 1
Mean length: 1.0333333
Min length: 1

Unique

Unique: 2
Unique (%): 6.7%

Sample

1st row: 7
2nd row: 8
3rd row: 7
4th row: 5
5th row: 10

Common Values

Value | Count | Frequency (%)
8 | 23 | 76.7%
7 | 5 | 16.7%
5 | 1 | 3.3%
10 | 1 | 3.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated under "Common Values" above]

연월 (year-month)
Date

CONSTANT

Distinct: 1
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Minimum: 2023-01-01 00:00:00
Maximum: 2023-01-01 00:00:00
Histogram with fixed size bins (bins=1)

Interactions


Correlations

Correlation matrix 1

Variable | 키워드 | 키워드분류 | 키워드길이 | 키워드포함게시글작성비율 | 게시글내키워드평균빈도 | 게시글내키워드평균중요도
키워드 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000
키워드분류 | 0.000 | 1.000 | 0.871 | 0.325 | 0.000 | 0.764
키워드길이 | 1.000 | 0.871 | 1.000 | 0.456 | 0.793 | 0.304
키워드포함게시글작성비율 | 0.000 | 0.325 | 0.456 | 1.000 | 0.428 | 0.751
게시글내키워드평균빈도 | 1.000 | 0.000 | 0.793 | 0.428 | 1.000 | 1.000
게시글내키워드평균중요도 | 0.000 | 0.764 | 0.304 | 0.751 | 1.000 | 1.000

Correlation matrix 2

Variable | 키워드분류 | 게시글내키워드평균중요도 | 게시글내키워드평균빈도 | 키워드포함게시글작성비율
키워드분류 | 1.000 | 0.533 | 0.000 | 0.512
게시글내키워드평균중요도 | 0.533 | 1.000 | 0.964 | 0.785
게시글내키워드평균빈도 | 0.000 | 0.964 | 1.000 | 0.656
키워드포함게시글작성비율 | 0.512 | 0.785 | 0.656 | 1.000

Correlation matrix 3

Variable | 키워드길이 | 키워드분류 | 키워드포함게시글작성비율 | 게시글내키워드평균빈도 | 게시글내키워드평균중요도
키워드길이 | 1.000 | 0.628 | 0.185 | 0.550 | 0.176
키워드분류 | 0.628 | 1.000 | 0.512 | 0.000 | 0.533
키워드포함게시글작성비율 | 0.185 | 0.512 | 1.000 | 0.656 | 0.785
게시글내키워드평균빈도 | 0.550 | 0.000 | 0.656 | 1.000 | 0.964
게시글내키워드평균중요도 | 0.176 | 0.533 | 0.785 | 0.964 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

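First rows

Row | 키워드 | 키워드분류 | 키워드길이 | 키워드포함게시글작성비율 | 게시글내키워드평균빈도 | 게시글내키워드평균중요도 | 연월
0 | 가가 | 명사 | 2 | 1 | 1 | 7 | 2023-01
1 | 가감 | 명사 | 2 | 0 | 1 | 8 | 2023-01
2 | 가건물 | 명사 | 3 | 0 | 1 | 7 | 2023-01
3 | 가게 | 명사 | 2 | 34 | 2 | 5 | 2023-01
4 | 가게 | 동사 | 2 | 0 | 2 | 10 | 2023-01
5 | 가게내 | 동사 | 3 | 0 | 1 | 8 | 2023-01
6 | 가게내놓을까 | 동사 | 6 | 0 | 1 | 8 | 2023-01
7 | 가게내놨습니다 | 동사 | 7 | 0 | 1 | 8 | 2023-01
8 | 가게라며 | 동사 | 4 | 0 | 1 | 8 | 2023-01
9 | 가게서도 | 동사 | 4 | 0 | 1 | 8 | 2023-01

Last rows

Row | 키워드 | 키워드분류 | 키워드길이 | 키워드포함게시글작성비율 | 게시글내키워드평균빈도 | 게시글내키워드평균중요도 | 연월
20 | 가게였어요 | 동사 | 5 | 0 | 1 | 8 | 2023-01
21 | 가게오지 | 동사 | 4 | 0 | 1 | 8 | 2023-01
22 | 가게을 | 동사 | 3 | 0 | 1 | 8 | 2023-01
23 | 가게이전을 | 동사 | 5 | 0 | 1 | 8 | 2023-01
24 | 가게인데도 | 동사 | 5 | 0 | 1 | 8 | 2023-01
25 | 가게있음 | 동사 | 4 | 0 | 1 | 8 | 2023-01
26 | 가게주인 | 동사 | 4 | 0 | 1 | 8 | 2023-01
27 | 가게하나가 | 동사 | 5 | 0 | 1 | 8 | 2023-01
28 | 가게하나를 | 동사 | 5 | 0 | 1 | 8 | 2023-01
29 | 가게하나에서 | 동사 | 6 | 0 | 1 | 8 | 2023-01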