Overview

Dataset statistics

Number of variables: 9
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.5 KiB
Average record size in memory: 76.3 B

Variable types

Numeric: 2
Categorical: 6
Text: 1

Dataset

Description: Sample data
Author: 성균관대학교 산학협력단
URL: https://www.bigdata-environment.kr/user/data_market/detail.do?id=b9de2350-e842-11ea-a837-83d4a69b8aa7

Alerts

연월일 has constant value "2020-01-02" (Constant)
환경플랫폼 하위 도메인명 has constant value "물환경" (Constant)
도메인 하위 카테고리명 has constant value "상수도" (Constant)
SNS 채널명 has constant value "patent" (Constant)
일간연관어언급량 is highly imbalanced (80.6%) (Imbalance)
일간연관어연번 has unique values (Unique)
연관어명 has unique values (Unique)

Reproduction

Analysis started: 2024-04-19 21:47:51.070934
Analysis finished: 2024-04-19 21:47:51.897842
Duration: 0.83 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
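
The reported duration can be checked directly from the two timestamps above. The snippet below is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamp layout used by the "Analysis started/finished" fields.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-04-19 21:47:51.070934", FMT)
finished = datetime.strptime("2024-04-19 21:47:51.897842", FMT)

# Elapsed wall-clock time in seconds, rounded the way the report shows it.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```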

Variables

일간연관어연번
Real number (ℝ)

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5.95
Q1: 25.75
Median: 50.5
Q3: 75.25
95-th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5
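
The fractional quantiles above are what linear interpolation between order statistics produces for a running index. The sketch below assumes the column holds every integer from 1 to 100, which is consistent with the distinct count, minimum, maximum and sum reported but is not stated outright. The `percentile` helper is hypothetical and is not the profiling tool's own code.

```python
def percentile(sorted_values, q):
    """q-th percentile (0-100) using linear interpolation between neighbours."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

# Assumption: 일간연관어연번 is the sequence 1, 2, ..., 100.
values = list(range(1, 101))

p5, q1, median, q3, p95 = (percentile(values, q) for q in (5, 25, 50, 75, 95))
value_range = values[-1] - values[0]
iqr = q3 - q1

print(round(p5, 2), q1, median, q3, round(p95, 2), value_range, iqr)
```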

Descriptive statistics

Standard deviation: 29.011492
Coefficient of variation (CV): 0.57448499
Kurtosis: -1.2
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 5050
Variance: 841.66667
Monotonicity: Strictly increasing
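
Most of the descriptive statistics above can be reproduced with the standard library, again assuming the column is the integers 1 to 100. The sketch uses the sample variance (dividing by n - 1), which matches the reported 841.66667; skewness and kurtosis are left out.

```python
import statistics

# Assumption: 일간연관어연번 is the sequence 1, 2, ..., 100.
values = list(range(1, 101))

total = sum(values)
mean = statistics.fmean(values)
variance = statistics.variance(values)   # sample variance (n - 1 in the denominator)
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

# Strictly increasing means every value exceeds its predecessor.
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(total, mean, round(variance, 5), round(std, 6), round(cv, 8), mad)
```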
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
1                  1      1.0%
65                 1      1.0%
75                 1      1.0%
74                 1      1.0%
73                 1      1.0%
72                 1      1.0%
71                 1      1.0%
70                 1      1.0%
69                 1      1.0%
68                 1      1.0%
Other values (90)  90     90.0%

Minimum 10 values

Value  Count  Frequency (%)
1      1      1.0%
2      1      1.0%
3      1      1.0%
4      1      1.0%
5      1      1.0%
6      1      1.0%
7      1      1.0%
8      1      1.0%
9      1      1.0%
10     1      1.0%

Maximum 10 values

Value  Count  Frequency (%)
100    1      1.0%
99     1      1.0%
98     1      1.0%
97     1      1.0%
96     1      1.0%
95     1      1.0%
94     1      1.0%
93     1      1.0%
92     1      1.0%
91     1      1.0%

연월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
2020-01-02: 100

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-01-02
2nd row: 2020-01-02
3rd row: 2020-01-02
4th row: 2020-01-02
5th row: 2020-01-02

Common Values

Value       Count  Frequency (%)
2020-01-02  100    100.0%

Length: histogram of lengths of the category (figure not preserved)

Common Values (Plot): figure not preserved

환경플랫폼 하위 도메인명
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
물환경: 100

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 물환경
2nd row: 물환경
3rd row: 물환경
4th row: 물환경
5th row: 물환경

Common Values

Value   Count  Frequency (%)
물환경  100    100.0%

Length: histogram of lengths of the category (figure not preserved)

Common Values (Plot): figure not preserved

도메인 하위 카테고리명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
상수도: 100

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상수도
2nd row: 상수도
3rd row: 상수도
4th row: 상수도
5th row: 상수도

Common Values

Value   Count  Frequency (%)
상수도  100    100.0%

Length: histogram of lengths of the category (figure not preserved)

Common Values (Plot): figure not preserved

SNS 채널명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
patent: 100

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: patent
2nd row: patent
3rd row: patent
4th row: patent
5th row: patent

Common Values

Value   Count  Frequency (%)
patent  100    100.0%

Length: histogram of lengths of the category (figure not preserved)

Common Values (Plot): figure not preserved

단어속성명
Categorical

Distinct: 10
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
속성: 57
기타: 17
라이프: 11
장소: 5
상품: 4
Other values (5)

Length

Max length: 6
Median length: 2
Mean length: 2.17
Min length: 2

Unique

Unique: 4
Unique (%): 4.0%

Sample

1st row: 속성
2nd row: 속성
3rd row: 기타
4th row: 라이프
5th row: 속성

Common Values

Value         Count  Frequency (%)
속성          57     57.0%
기타          17     17.0%
라이프        11     11.0%
장소          5      5.0%
상품          4      4.0%
인물          2      2.0%
단체          1      1.0%
시간          1      1.0%
엔터테인먼트  1      1.0%
사회이슈      1      1.0%

Length: histogram of lengths of the category (figure not preserved)

Common Values (Plot): figure not preserved

연관어명
Text

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 5
Median length: 2
Mean length: 2.28
Min length: 2

Characters and Unicode

Total characters: 228
Distinct characters: 126
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
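
As an illustration of these character properties, the sketch below inspects the five sample values listed for this variable with Python's `unicodedata` module. The standard library does not expose the Unicode script property, so the first word of each character name is used here as a rough stand-in for the script; that shortcut is an assumption of this sketch and not how the report derives its figures.

```python
import unicodedata

# The five sample rows shown for 연관어명 in this report.
sample_words = ["가변", "가스", "가압", "가이드", "간격"]
characters = "".join(sample_words)

total_characters = len(characters)
distinct_characters = len(set(characters))

# General category of each code point; "Lo" is "Other Letter".
categories = {unicodedata.category(ch) for ch in characters}

# Rough script proxy: names look like "HANGUL SYLLABLE GA".
name_prefixes = {unicodedata.name(ch).split()[0] for ch in characters}

print(total_characters, distinct_characters, categories, name_prefixes)
```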

Unique

Unique: 100
Unique (%): 100.0%

Sample

1st row: 가변
2nd row: 가스
3rd row: 가압
4th row: 가이드
5th row: 간격

Value              Count  Frequency (%)
가변               1      1.0%
매설               1      1.0%
발열               1      1.0%
반하다             1      1.0%
반응               1      1.0%
반시               1      1.0%
반복               1      1.0%
바깥쪽             1      1.0%
밀폐               1      1.0%
미생물             1      1.0%
Other values (90)  90     90.0%

Most occurring characters

Top 10 characters (the character glyphs themselves are not preserved):

Count  Frequency (%)
8      3.5%
7      3.1%
7      3.1%
7      3.1%
6      2.6%
6      2.6%
6      2.6%
5      2.2%
5      2.2%
4      1.8%
Other values (116): 167 (73.2%)

Most occurring categories

Value         Count  Frequency (%)
Other Letter  228    100.0%

Most frequent character per category

Other Letter: same counts as under Most occurring characters, since all 228 characters fall in this category.

Most occurring scripts

Value   Count  Frequency (%)
Hangul  228    100.0%

Most frequent character per script

Hangul: same counts as under Most occurring characters, since all 228 characters are in this script.

Most occurring blocks

Value   Count  Frequency (%)
Hangul  228    100.0%

Most frequent character per block

Hangul: same counts as under Most occurring characters, since all 228 characters are in this block.

일간연관어언급량
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
1: 97
2: 3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      97     97.0%
2      3      3.0%

Length: histogram of lengths of the category (figure not preserved)

Common Values (Plot): figure not preserved
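
The 80.6% imbalance figure for this variable is consistent with one minus the normalized Shannon entropy of the class counts (97 and 3). The sketch below shows that calculation; the `imbalance_score` function is hypothetical, and the report itself does not confirm that this is the exact formula the tool applies.

```python
from math import log2

def imbalance_score(counts):
    """1 - H / log2(k): 0 for perfectly balanced classes, near 1 for one dominant class."""
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(n_classes)

# Class counts for 일간연관어언급량: value 1 occurs 97 times, value 2 occurs 3 times.
score = imbalance_score([97, 3])
print(f"{score:.1%}")
```

A 50/50 split would score 0 under this definition, so a value above 0.8 signals that one class carries almost all observations.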

일간연관어단어량
Real number (ℝ)

Distinct: 27
Distinct (%): 27.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.34
Minimum: 1
Maximum: 184
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 6.25
95-th percentile: 46.55
Maximum: 184
Range: 183
Interquartile range (IQR): 5.25

Descriptive statistics

Standard deviation: 27.66839
Coefficient of variation (CV): 2.4398933
Kurtosis: 22.349437
Mean: 11.34
Median Absolute Deviation (MAD): 1
Skewness: 4.4463332
Sum: 1134
Variance: 765.5398
Monotonicity: Not monotonic
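
The full column is not listed in this report, so these statistics cannot be recomputed from raw data. What can be checked is that the reported figures agree with one another. The sketch below does that using only numbers printed in this section; the dictionary keys are illustrative names.

```python
# Figures copied from the summary, quantile and descriptive statistics above.
reported = {
    "n": 100, "mean": 11.34, "sum": 1134,
    "std": 27.66839, "variance": 765.5398, "cv": 2.4398933,
    "min": 1, "max": 184, "range": 183,
    "q1": 1, "q3": 6.25, "iqr": 5.25,
}

checks = {
    "sum = n * mean": abs(reported["n"] * reported["mean"] - reported["sum"]) < 0.5,
    "variance = std^2": abs(reported["std"] ** 2 - reported["variance"]) < 1e-3,
    "cv = std / mean": abs(reported["std"] / reported["mean"] - reported["cv"]) < 1e-6,
    "range = max - min": reported["max"] - reported["min"] == reported["range"],
    "iqr = q3 - q1": abs(reported["q3"] - reported["q1"] - reported["iqr"]) < 1e-9,
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

The mean (11.34) sitting far above the median (2) is in line with the strong positive skewness reported for this variable.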
Histogram with fixed size bins (bins=27)

Common values

Value              Count  Frequency (%)
1                  44     44.0%
2                  15     15.0%
4                  6      6.0%
5                  4      4.0%
3                  4      4.0%
9                  3      3.0%
13                 2      2.0%
6                  2      2.0%
17                 2      2.0%
184                1      1.0%
Other values (17)  17     17.0%

Minimum 10 values

Value  Count  Frequency (%)
1      44     44.0%
2      15     15.0%
3      4      4.0%
4      6      6.0%
5      4      4.0%
6      2      2.0%
7      1      1.0%
8      1      1.0%
9      3      3.0%
11     1      1.0%

Maximum 10 values

Value  Count  Frequency (%)
184    1      1.0%
155    1      1.0%
92     1      1.0%
81     1      1.0%
57     1      1.0%
46     1      1.0%
41     1      1.0%
39     1      1.0%
36     1      1.0%
35     1      1.0%

Interactions

(Four interaction plots; figures not preserved.)

Correlations

Variable  일간연관어연번  단어속성명  연관어명  일간연관어언급량  일간연관어단어량
일간연관어연번  1.000  0.000  1.000  0.000  0.000
단어속성명  0.000  1.000  1.000  0.000  0.068
연관어명  1.000  1.000  1.000  1.000  1.000
일간연관어언급량  0.000  0.000  1.000  1.000  0.000
일간연관어단어량  0.000  0.068  1.000  0.000  1.000

Variable  일간연관어언급량  단어속성명
일간연관어언급량  1.000  0.000
단어속성명  0.000  1.000

Variable  일간연관어연번  일간연관어단어량  단어속성명  일간연관어언급량
일간연관어연번  1.000  0.080  0.000  0.000
일간연관어단어량  0.080  1.000  0.013  0.000
단어속성명  0.000  0.013  1.000  0.000
일간연관어언급량  0.000  0.000  0.000  1.000

Missing values

A simple visualization of nullity by column. (Figure not preserved.)
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion. (Figure not preserved.)
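
The overview reports 0 missing cells and 0 duplicate rows. The sketch below shows one way such counts can be obtained, using the first three rows of this report's sample as stand-in data. Representing a missing cell as `None` is an assumption of this sketch, and the row values are transcribed from the sample rather than read from the original file.

```python
columns = [
    "일간연관어연번", "연월일", "환경플랫폼 하위 도메인명", "도메인 하위 카테고리명",
    "SNS 채널명", "단어속성명", "연관어명", "일간연관어언급량", "일간연관어단어량",
]

# First three sample rows; a missing cell would be represented as None.
rows = [
    (1, "2020-01-02", "물환경", "상수도", "patent", "속성", "가변", 1, 26),
    (2, "2020-01-02", "물환경", "상수도", "patent", "속성", "가스", 1, 3),
    (3, "2020-01-02", "물환경", "상수도", "patent", "기타", "가압", 1, 1),
]

# Nullity by column: how many rows have no value in each column.
missing_per_column = {
    name: sum(row[i] is None for row in rows) for i, name in enumerate(columns)
}
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (len(rows) * len(columns))

# Rows that are exact copies of an earlier row.
duplicate_rows = len(rows) - len(set(rows))

print(missing_cells, f"{missing_pct:.1f}%", duplicate_rows)
```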

Sample

First rows

(index)  일간연관어연번  연월일  환경플랫폼 하위 도메인명  도메인 하위 카테고리명  SNS 채널명  단어속성명  연관어명  일간연관어언급량  일간연관어단어량
0  1  2020-01-02  물환경  상수도  patent  속성  가변  1  26
1  2  2020-01-02  물환경  상수도  patent  속성  가스  1  3
2  3  2020-01-02  물환경  상수도  patent  기타  가압  1  1
3  4  2020-01-02  물환경  상수도  patent  라이프  가이드  1  6
4  5  2020-01-02  물환경  상수도  patent  속성  간격  1  1
5  6  2020-01-02  물환경  상수도  patent  속성  간섭  1  1
6  7  2020-01-02  물환경  상수도  patent  기타  감싸다  1  11
7  8  2020-01-02  물환경  상수도  patent  장소  강원도  1  3
8  9  2020-01-02  물환경  상수도  patent  기타  개구부  1  9
9  10  2020-01-02  물환경  상수도  patent  기타  개략  1  2

Last rows

(index)  일간연관어연번  연월일  환경플랫폼 하위 도메인명  도메인 하위 카테고리명  SNS 채널명  단어속성명  연관어명  일간연관어언급량  일간연관어단어량
90  91  2020-01-02  물환경  상수도  patent  기타  변환  1  2
91  92  2020-01-02  물환경  상수도  patent  속성  병렬  1  2
92  93  2020-01-02  물환경  상수도  patent  라이프  보조  1  1
93  94  2020-01-02  물환경  상수도  patent  속성  본체  1  46
94  95  2020-01-02  물환경  상수도  patent  상품  볼트  1  28
95  96  2020-01-02  물환경  상수도  patent  속성  부대  1  1
96  97  2020-01-02  물환경  상수도  patent  사회이슈  부도  1  1
97  98  2020-01-02  물환경  상수도  patent  장소  부산  1  1
98  99  2020-01-02  물환경  상수도  patent  속성  부산물  1  4
99  100  2020-01-02  물환경  상수도  patent  기타  부적합  1  1