Overview

Dataset statistics

Number of variables: 7
Number of observations: 30
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 64.4 B

Variable types

Text: 1
Categorical: 4
Numeric: 1
DateTime: 1

Dataset

Description: Sample data
Author: 한국신용데이터 (Korea Credit Data)
URL: https://bigdata-region.kr/#/dataset/033a5948-c562-403d-815a-480db6c9a6c4

Alerts

연월 (year-month) has constant value "" [Constant]
키워드포함게시글작성비율 (share of posts containing the keyword) is highly overall correlated with 키워드분류 (keyword category) and 2 other fields [High correlation]
게시글내키워드평균중요도 (average keyword importance within posts) is highly overall correlated with 키워드포함게시글작성비율 and 1 other field [High correlation]
게시글내키워드평균빈도 (average keyword frequency within posts) is highly overall correlated with 키워드포함게시글작성비율 and 1 other field [High correlation]
키워드길이 (keyword length) is highly overall correlated with 키워드분류 [High correlation]
키워드분류 is highly overall correlated with 키워드길이 and 1 other field [High correlation]
키워드포함게시글작성비율 is highly imbalanced (68.6%) [Imbalance]
게시글내키워드평균빈도 is highly imbalanced (64.7%) [Imbalance]
게시글내키워드평균중요도 is highly imbalanced (64.1%) [Imbalance]
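
The imbalance percentages in these alerts are consistent with one minus the normalized Shannon entropy of each column's value counts. The sketch below assumes that definition; it is inferred from the reported numbers, not taken from the ydata-profiling source. `imbalance_score` is a hypothetical helper name, and the counts are copied from the Common Values tables of this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): H is the Shannon entropy (bits) of the class
    distribution, k the number of classes. 0 = balanced, 1 = one class only."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single-class column gets score 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Value counts copied from the Common Values tables of this report.
value_counts = {
    "키워드포함게시글작성비율": [27, 1, 1, 1],
    "게시글내키워드평균빈도": [28, 2],
    "게시글내키워드평균중요도": [26, 1, 1, 1, 1],
    "키워드분류": [25, 5],
}

scores = {name: imbalance_score(c) for name, c in value_counts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.1%}")
```

Under this assumption the three flagged columns score 68.6%, 64.7% and 64.1%, matching the alerts, while 키워드분류 scores about 35% and is not flagged.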

Reproduction

Analysis started: 2023-12-10 13:45:15.200942
Analysis finished: 2023-12-10 13:45:16.645022
Duration: 1.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
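
The reported duration is simply the difference between the two timestamps. A minimal check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-10 13:45:15.200942", FMT)
finished = datetime.strptime("2023-12-10 13:45:16.645022", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```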

Variables

키워드 (keyword)
Text

Distinct: 29
Distinct (%): 96.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 8
Median length: 5.5
Mean length: 3.9666667
Min length: 2

Characters and Unicode

Total characters: 119
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
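
As a rough illustration of these character properties, the sketch below looks up the Unicode general category of each character in the first five sample keywords. The script is guessed from the first word of the character name, which is an approximation: Python's standard library has no direct script or block lookup, and this is not necessarily how the report computes them.

```python
import unicodedata
from collections import Counter

# First five sample rows of the 키워드 column, copied from this report.
sample_keywords = ["가가", "가감", "가거나", "가걱", "가건물"]

chars = [ch for word in sample_keywords for ch in word]

# General category: "Lo" is the Unicode abbreviation for "Other Letter".
category_counts = Counter(unicodedata.category(ch) for ch in chars)

# Rough script guess from the first word of the character name,
# e.g. "HANGUL SYLLABLE GA" -> "HANGUL" (an approximation).
script_counts = Counter(unicodedata.name(ch).split()[0] for ch in chars)

print(len(chars), dict(category_counts), dict(script_counts))
```

Every character in these five keywords falls in category `Lo` with a name starting with `HANGUL`, in line with the single "Other Letter" category and single Hangul script reported for the full column.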

Unique

Unique: 28
Unique (%): 93.3%
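
"Distinct" counts how many different values occur, while "Unique" counts values that occur exactly once; 가게 appears twice, which is why the two numbers differ by one. The sketch below illustrates the distinction using only the 20 keywords visible in the Sample section of this report, so its totals are smaller than the reported 29 and 28.

```python
from collections import Counter

# The 20 keywords visible in the Sample section (rows 0-9 and 20-29).
# Rows 10-19 are not shown in the report, so totals differ from 29 / 28.
keywords = [
    "가가", "가감", "가거나", "가걱", "가건물",
    "가게", "가게", "가게가서", "가게거", "가게내놓고",
    "가게안하고싶네요", "가게였는데", "가게을", "가게인데도", "가게인데요",
    "가게인수", "가게인수하고", "가게인수하실", "가게있", "가게주인",
]

counts = Counter(keywords)
n_distinct = len(counts)                                # different values
n_unique = sum(1 for c in counts.values() if c == 1)    # values seen once
print(n_distinct, n_unique)
```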

Sample

1st row: 가가
2nd row: 가감
3rd row: 가거나
4th row: 가걱
5th row: 가건물

Value              Count  Frequency (%)
가게               2      6.7%
가가               1      3.3%
가게안             1      3.3%
가게있             1      3.3%
가게인수하실       1      3.3%
가게인수하고       1      3.3%
가게인수           1      3.3%
가게인데요         1      3.3%
가게인데도         1      3.3%
가게을             1      3.3%
Other values (19)  19     63.3%

Most occurring characters

Value              Count  Frequency (%)
                   32     26.9%
                   25     21.0%
                   6      5.0%
                   5      4.2%
                   4      3.4%
                   4      3.4%
                   4      3.4%
                   3      2.5%
                   3      2.5%
                   3      2.5%
Other values (26)  30     25.2%

Most occurring categories

Value         Count  Frequency (%)
Other Letter  119    100.0%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                   32     26.9%
                   25     21.0%
                   6      5.0%
                   5      4.2%
                   4      3.4%
                   4      3.4%
                   4      3.4%
                   3      2.5%
                   3      2.5%
                   3      2.5%
Other values (26)  30     25.2%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  119    100.0%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                   32     26.9%
                   25     21.0%
                   6      5.0%
                   5      4.2%
                   4      3.4%
                   4      3.4%
                   4      3.4%
                   3      2.5%
                   3      2.5%
                   3      2.5%
Other values (26)  30     25.2%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  119    100.0%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
                   32     26.9%
                   25     21.0%
                   6      5.0%
                   5      4.2%
                   4      3.4%
                   4      3.4%
                   4      3.4%
                   3      2.5%
                   3      2.5%
                   3      2.5%
Other values (26)  30     25.2%

키워드분류 (keyword category)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 명사
2nd row: 명사
3rd row: 동사
4th row: 명사
5th row: 명사

Common Values

Value        Count  Frequency (%)
동사 (verb)  25     83.3%
명사 (noun)  5      16.7%
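
A Common Values table like this one can be reproduced with a plain frequency count. The sketch below rebuilds the 키워드분류 column from the counts shown above (the row order of the real data is not preserved) and derives the percentages.

```python
from collections import Counter

# Rebuild the 키워드분류 column from the counts in the Common Values table.
column = ["동사"] * 25 + ["명사"] * 5

counts = Counter(column)
total = sum(counts.values())

# One (value, count, percentage) row per category, most frequent first.
rows = [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]
for value, n, pct in rows:
    print(f"{value}\t{n}\t{pct}%")
```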

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
동사   25     83.3%
명사   5      16.7%

키워드길이 (keyword length)
Real number (ℝ)

HIGH CORRELATION

Distinct: 6
Distinct (%): 20.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.9666667
Minimum: 2
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.0 B

Quantile statistics

Minimum: 2
5th percentile: 2
Q1: 3
Median: 4
Q3: 5
95th percentile: 6
Maximum: 8
Range: 6
Interquartile range (IQR): 2
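
These quantiles can be checked from the value counts reported for 키워드길이. The sketch below uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points; that it is the same interpolation the report uses is an assumption, although it gives the same values for this data.

```python
import statistics

# 키워드길이 values rebuilt from the value counts in this report.
value_counts = {2: 5, 3: 7, 4: 8, 5: 6, 6: 3, 8: 1}
lengths = sorted(v for v, n in value_counts.items() for _ in range(n))

# Quartiles: three cut points that split the data into four parts.
q1, median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")

# Twenty parts give cut points at 5%, 10%, ..., 95%.
ventiles = statistics.quantiles(lengths, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print(min(lengths), p5, q1, median, q3, p95, max(lengths))
print("range:", max(lengths) - min(lengths), "IQR:", q3 - q1)
```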

Descriptive statistics

Standard deviation: 1.4499306
Coefficient of variation (CV): 0.36552873
Kurtosis: 0.53709605
Mean: 3.9666667
Median Absolute Deviation (MAD): 1
Skewness: 0.64357608
Sum: 119
Variance: 2.1022989
Monotonicity: Not monotonic
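
Most of these figures follow directly from the six value counts of 키워드길이. The sketch below recomputes them with the standard library, assuming the sample (n - 1) definition of variance, which is what the reported numbers are consistent with. Skewness and kurtosis are left out because their exact estimator is not stated in the report.

```python
import math
import statistics

# 키워드길이 values rebuilt from the value counts in this report.
value_counts = {2: 5, 3: 7, 4: 8, 5: 6, 6: 3, 8: 1}
lengths = [v for v, n in value_counts.items() for _ in range(n)]

total = sum(lengths)
mean = statistics.mean(lengths)
variance = statistics.variance(lengths)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation
median = statistics.median(lengths)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(x - median) for x in lengths)

print(total, round(mean, 7), round(variance, 7), round(std, 7), round(cv, 8), mad)
```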
Histogram with fixed size bins (bins=6)

Common Values

Value  Count  Frequency (%)
4      8      26.7%
3      7      23.3%
5      6      20.0%
2      5      16.7%
6      3      10.0%
8      1      3.3%

Extreme values (minimum)

Value  Count  Frequency (%)
2      5      16.7%
3      7      23.3%
4      8      26.7%
5      6      20.0%
6      3      10.0%
8      1      3.3%

Extreme values (maximum)

Value  Count  Frequency (%)
8      1      3.3%
6      3      10.0%
5      6      20.0%
4      8      26.7%
3      7      23.3%
2      5      16.7%

키워드포함게시글작성비율 (share of posts containing the keyword)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 13.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 2
Median length: 1
Mean length: 1.0333333
Min length: 1

Unique

Unique: 3
Unique (%): 10.0%

Sample

1st row: 2
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      27     90.0%
2      1      3.3%
39     1      3.3%
1      1      3.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      27     90.0%
2      1      3.3%
39     1      3.3%
1      1      3.3%

게시글내키워드평균빈도 (average keyword frequency within posts)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      28     93.3%
2      2      6.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1      28     93.3%
2      2      6.7%

게시글내키워드평균중요도 (average keyword importance within posts)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 2
Median length: 1
Mean length: 1.0333333
Min length: 1

Unique

Unique: 4
Unique (%): 13.3%

Sample

1st row: 5
2nd row: 7
3rd row: 7
4th row: 7
5th row: 7

Common Values

Value  Count  Frequency (%)
7      26     86.7%
5      1      3.3%
15     1      3.3%
4      1      3.3%
6      1      3.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
7      26     86.7%
5      1      3.3%
15     1      3.3%
4      1      3.3%
6      1      3.3%

연월 (year-month)
Date

CONSTANT

Distinct: 1
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B
Minimum: 2023-07-01 00:00:00
Maximum: 2023-07-01 00:00:00
Histogram with fixed size bins (bins=1)

Interactions


Correlations

                          키워드  키워드분류  키워드길이  키워드포함게시글작성비율  게시글내키워드평균빈도  게시글내키워드평균중요도
키워드                    1.000   0.000       1.000       0.000                     1.000                   0.000
키워드분류                0.000   1.000       0.903       0.745                     0.000                   0.423
키워드길이                1.000   0.903       1.000       0.000                     0.654                   0.000
키워드포함게시글작성비율  0.000   0.745       0.000       1.000                     0.854                   1.000
게시글내키워드평균빈도    1.000   0.000       0.654       0.854                     1.000                   1.000
게시글내키워드평균중요도  0.000   0.423       0.000       1.000                     1.000                   1.000

                          키워드포함게시글작성비율  키워드분류  게시글내키워드평균중요도  게시글내키워드평균빈도
키워드포함게시글작성비율  1.000                     0.515       0.981                     0.628
키워드분류                0.515                     1.000       0.483                     0.000
게시글내키워드평균중요도  0.981                     0.483       1.000                     0.945
게시글내키워드평균빈도    0.628                     0.000       0.945                     1.000

                          키워드길이  키워드분류  키워드포함게시글작성비율  게시글내키워드평균빈도  게시글내키워드평균중요도
키워드길이                1.000       0.667       0.000                     0.437                   0.000
키워드분류                0.667       1.000       0.515                     0.000                   0.483
키워드포함게시글작성비율  0.000       0.515       1.000                     0.628                   0.981
게시글내키워드평균빈도    0.437       0.000       0.628                     1.000                   0.945
게시글내키워드평균중요도  0.000       0.483       0.981                     0.945                   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

    키워드            키워드분류  키워드길이  키워드포함게시글작성비율  게시글내키워드평균빈도  게시글내키워드평균중요도  연월
0   가가              명사        2           2                         1                       5                         2023-07
1   가감              명사        2           0                         1                       7                         2023-07
2   가거나            동사        3           0                         1                       7                         2023-07
3   가걱              명사        2           0                         1                       7                         2023-07
4   가건물            명사        3           0                         1                       7                         2023-07
5   가게              동사        2           0                         2                       15                        2023-07
6   가게              명사        2           39                        2                       4                         2023-07
7   가게가서          동사        4           0                         1                       7                         2023-07
8   가게거            동사        3           0                         1                       7                         2023-07
9   가게내놓고        동사        5           0                         1                       7                         2023-07

Last rows

    키워드            키워드분류  키워드길이  키워드포함게시글작성비율  게시글내키워드평균빈도  게시글내키워드평균중요도  연월
20  가게안하고싶네요  동사        8           0                         1                       7                         2023-07
21  가게였는데        동사        5           0                         1                       7                         2023-07
22  가게을            동사        3           0                         1                       7                         2023-07
23  가게인데도        동사        5           0                         1                       7                         2023-07
24  가게인데요        동사        5           0                         1                       7                         2023-07
25  가게인수          동사        4           0                         1                       7                         2023-07
26  가게인수하고      동사        6           0                         1                       7                         2023-07
27  가게인수하실      동사        6           0                         1                       7                         2023-07
28  가게있            동사        3           0                         1                       7                         2023-07
29  가게주인          동사        4           0                         1                       7                         2023-07