Overview

Dataset statistics

Number of variables: 9
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.5 KiB
Average record size in memory: 76.3 B

Variable types

Numeric: 2
DateTime: 1
Categorical: 5
Text: 1

Dataset

Description: Sample data
Author: 성균관대학교 산학협력단 (Sungkyunkwan University Industry-Academic Cooperation Foundation)
URL: https://www.bigdata-environment.kr/user/data_market/detail.do?id=b9de2350-e842-11ea-a837-83d4a69b8aa7

Alerts

연월일 has constant value "2020-10-05" [Constant]
환경플랫폼 하위 도메인명 has constant value "물환경" [Constant]
SNS 채널명 has constant value "patent" [Constant]
일간연관어연번 is highly overall correlated with 도메인 하위 카테고리명 [High correlation]
일간연관어단어량 is highly overall correlated with 일간연관어언급량 [High correlation]
도메인 하위 카테고리명 is highly overall correlated with 일간연관어연번 [High correlation]
일간연관어언급량 is highly overall correlated with 일간연관어단어량 [High correlation]
일간연관어언급량 is highly imbalanced (59.8%) [Imbalance]
일간연관어연번 has unique values [Unique]
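
Three of the alert types above can be re-derived from the per-column counts that the report itself lists. The following is a minimal sketch, not the profiler's own code: the helper names `classify` and `imbalance_score` are hypothetical, and the imbalance formula (one minus the Shannon entropy normalized by the log of the class count) is an assumption that happens to reproduce the reported 59.8%.

```python
import math

N_ROWS = 100  # "Number of observations" in the overview

# Distinct counts as listed per variable in this report.
distinct_counts = {
    "일간연관어연번": 100,
    "연월일": 1,
    "환경플랫폼 하위 도메인명": 1,
    "도메인 하위 카테고리명": 2,
    "SNS 채널명": 1,
    "단어속성명": 8,
    "연관어명": 77,
    "일간연관어언급량": 2,
    "일간연관어단어량": 11,
}


def classify(distinct, n_rows):
    """Hypothetical helper: label a column by its distinct count."""
    if distinct == 1:
        return "constant"
    if distinct == n_rows:
        return "unique"
    return "-"


def imbalance_score(counts):
    """Assumed formula: 1 - entropy / log2(number of classes).

    0 means perfectly balanced; the score approaches 1 as one class dominates.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class is treated as "not imbalanced" here (assumption)
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c)
    return 1.0 - entropy / math.log2(n_classes)


labels = {name: classify(d, N_ROWS) for name, d in distinct_counts.items()}
score = imbalance_score([92, 8])  # counts of 일간연관어언급량 = 1 and = 2

for name, label in labels.items():
    print(f"{name}: {label}")
print(f"imbalance of 일간연관어언급량: {score:.1%}")
```

With these inputs, the three columns flagged Constant and the one flagged Unique are labeled the same way as in the alert list.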

Reproduction

Analysis started: 2024-04-22 00:29:55.088313
Analysis finished: 2024-04-22 00:29:56.155833
Duration: 1.07 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

일간연관어연번 (daily related-word serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5.95
Q1: 25.75
Median: 50.5
Q3: 75.25
95-th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5

Descriptive statistics

Standard deviation: 29.011492
Coefficient of variation (CV): 0.57448499
Kurtosis: -1.2
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 5050
Variance: 841.66667
Monotonicity: Strictly increasing
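
The quantile and descriptive statistics above are consistent with a column that holds the integers 1 through 100 once each. A minimal sketch in plain Python follows. It rests on three assumptions the report does not state explicitly: that the values are exactly 1..100, that percentiles use linear interpolation, and that the variance is the sample (n - 1) variance.

```python
import statistics

# Assumption: the column holds 1..100 exactly once each
# (100 distinct values, minimum 1, maximum 100, sum 5050).
values = list(range(1, 101))

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

# method="inclusive" interpolates linearly between order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print("mean", mean, "variance", round(variance, 5), "std", round(std, 6))
print("cv", round(cv, 8), "iqr", q3 - q1, "mad", mad)
print("p5", p5, "q1", q1, "median", median, "q3", q3, "p95", p95)
```

Under these assumptions the computed figures agree with the report, for example Q1 = 25.75 and a variance of about 841.67.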
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 1 | 1.0%
65 | 1 | 1.0%
75 | 1 | 1.0%
74 | 1 | 1.0%
73 | 1 | 1.0%
72 | 1 | 1.0%
71 | 1 | 1.0%
70 | 1 | 1.0%
69 | 1 | 1.0%
68 | 1 | 1.0%
Other values (90) | 90 | 90.0%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 1.0%
2 | 1 | 1.0%
3 | 1 | 1.0%
4 | 1 | 1.0%
5 | 1 | 1.0%
6 | 1 | 1.0%
7 | 1 | 1.0%
8 | 1 | 1.0%
9 | 1 | 1.0%
10 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
100 | 1 | 1.0%
99 | 1 | 1.0%
98 | 1 | 1.0%
97 | 1 | 1.0%
96 | 1 | 1.0%
95 | 1 | 1.0%
94 | 1 | 1.0%
93 | 1 | 1.0%
92 | 1 | 1.0%
91 | 1 | 1.0%

연월일 (date)
Date

CONSTANT

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Minimum: 2020-10-05 00:00:00
Maximum: 2020-10-05 00:00:00
[Figure: Histogram with fixed size bins (bins=1)]

환경플랫폼 하위 도메인명 (environment platform subdomain name)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
물환경 (water environment): 100

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 물환경
2nd row: 물환경
3rd row: 물환경
4th row: 물환경
5th row: 물환경

Common Values

Value | Count | Frequency (%)
물환경 | 100 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot: 물환경 100 (100.0%)]

도메인 하위 카테고리명 (domain subcategory name)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
상수도 (water supply): 77
지하수 (groundwater): 23

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상수도
2nd row: 상수도
3rd row: 상수도
4th row: 상수도
5th row: 상수도

Common Values

Value | Count | Frequency (%)
상수도 | 77 | 77.0%
지하수 | 23 | 23.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot: 상수도 77 (77.0%), 지하수 23 (23.0%)]

SNS 채널명 (SNS channel name)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
patent: 100

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: patent
2nd row: patent
3rd row: patent
4th row: patent
5th row: patent

Common Values

Value | Count | Frequency (%)
patent | 100 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot: patent 100 (100.0%)]

단어속성명 (word attribute name)
Categorical

Distinct: 8
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
속성 (attribute): 44
라이프 (life): 21
장소 (place): 12
기타 (other): 11
단체 (organization): 6
Other values (3): 6

Length

Max length: 3
Median length: 2
Mean length: 2.22
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 장소
2nd row: 라이프
3rd row: 속성
4th row: 단체
5th row: 장소

Common Values

Value | Count | Frequency (%)
속성 | 44 | 44.0%
라이프 | 21 | 21.0%
장소 | 12 | 12.0%
기타 | 11 | 11.0%
단체 | 6 | 6.0%
인물 | 3 | 3.0%
상품 | 2 | 2.0%
브랜드 | 1 | 1.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot: 속성 44 (44.0%), 라이프 21 (21.0%), 장소 12 (12.0%), 기타 11 (11.0%), 단체 6 (6.0%), 인물 3 (3.0%), 상품 2 (2.0%), 브랜드 1 (1.0%)]

연관어명 (related word)
Text

Distinct: 77
Distinct (%): 77.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 2
Mean length: 2.54
Min length: 2

Characters and Unicode

Total characters: 254
Distinct characters: 124
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
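
As an illustration of what such character properties look like, the sketch below inspects the five sample values listed for this variable with Python's standard `unicodedata` module. The standard library exposes the general category but no script property, so taking the first word of the character name as the script is a rough proxy assumed here for illustration only.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for this variable.
samples = ["강서구", "건강", "계면활성제", "고려대학교", "공간"]

# NFC keeps each Hangul syllable as a single precomposed code point.
chars = [ch for word in samples for ch in unicodedata.normalize("NFC", word)]

# General category per code point; "Lo" is "Letter, other".
categories = Counter(unicodedata.category(ch) for ch in chars)

# Proxy for the script: first word of the character name,
# e.g. "HANGUL SYLLABLE GANG" -> "HANGUL" (assumption, see lead-in).
scripts = Counter(unicodedata.name(ch).split()[0] for ch in chars)

total_chars = len(chars)
distinct_chars = len(set(chars))

print(total_chars, distinct_chars)
print(dict(categories))
print(dict(scripts))
```

Every character in these samples falls in one category and one script, which matches the single category and single script the report counts for the full column.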

Unique

Unique: 54
Unique (%): 54.0%

Sample

1st row: 강서구
2nd row: 건강
3rd row: 계면활성제
4th row: 고려대학교
5th row: 공간

Value | Count | Frequency (%)
대한민국 | 2 | 2.0%
과제 | 2 | 2.0%
대학 | 2 | 2.0%
도면 | 2 | 2.0%
아미노산 | 2 | 2.0%
성북구 | 2 | 2.0%
서울특별시 | 2 | 2.0%
산학 | 2 | 2.0%
부처 | 2 | 2.0%
범위 | 2 | 2.0%
Other values (67) | 80 | 80.0%

Most occurring characters

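Value | Count | Frequency (%)
(character not preserved) | 9 | 3.5%
(character not preserved) | 8 | 3.1%
(character not preserved) | 7 | 2.8%
(character not preserved) | 7 | 2.8%
(character not preserved) | 6 | 2.4%
(character not preserved) | 6 | 2.4%
(character not preserved) | 6 | 2.4%
(character not preserved) | 5 | 2.0%
(character not preserved) | 4 | 1.6%
(character not preserved) | 4 | 1.6%
Other values (114) | 192 | 75.6%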

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 254 | 100.0%

Most frequent character per category

Other Letter
Same counts as in the Most occurring characters table, since all 254 characters belong to this category.

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 254 | 100.0%

Most frequent character per script

Hangul
Same counts as in the Most occurring characters table, since all 254 characters belong to this script.

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 254 | 100.0%

Most frequent character per block

Hangul
Same counts as in the Most occurring characters table, since all 254 characters belong to this block.

일간연관어언급량 (daily related-word mention count)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
1: 92
2: 8

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 92 | 92.0%
2 | 8 | 8.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot: 1 occurs 92 times (92.0%), 2 occurs 8 times (8.0%)]

일간연관어단어량 (daily related-word word count)
Real number (ℝ)

HIGH CORRELATION

Distinct: 11
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.42
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 3
95-th percentile: 7.05
Maximum: 12
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.1328598
Coefficient of variation (CV): 0.88134702
Kurtosis: 5.6196363
Mean: 2.42
Median Absolute Deviation (MAD): 1
Skewness: 2.2384361
Sum: 242
Variance: 4.5490909
Monotonicity: Not monotonic
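
Several of these figures can be checked from the frequency tables of this variable alone, without the raw data. The sketch below works directly on value/count pairs with weighted sums. It assumes the sample (n - 1) variance, which the report does not state. The count for the value 10 is taken from the extreme-value tables, because the common-values table lists only the ten most frequent values.

```python
import math

# Value -> count, combined from the frequency tables of 일간연관어단어량.
counts = {1: 48, 2: 18, 3: 16, 4: 7, 5: 2, 6: 3, 7: 1, 8: 2, 9: 1, 10: 1, 12: 1}

n = sum(counts.values())                          # number of observations
total = sum(v * c for v, c in counts.items())     # weighted sum of values
mean = total / n

# Weighted sum of squared deviations, then the sample variance (n - 1).
squared_dev = sum(c * (v - mean) ** 2 for v, c in counts.items())
variance = squared_dev / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                   # coefficient of variation

print("n", n, "sum", total, "mean", round(mean, 2))
print("variance", round(variance, 7), "std", round(std, 7), "cv", round(cv, 8))
```

The eleven counts add up to the 100 observations, and the weighted sum reproduces the reported total of 242.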
[Figure: Histogram with fixed size bins (bins=11)]

Common values

Value | Count | Frequency (%)
1 | 48 | 48.0%
2 | 18 | 18.0%
3 | 16 | 16.0%
4 | 7 | 7.0%
6 | 3 | 3.0%
5 | 2 | 2.0%
8 | 2 | 2.0%
12 | 1 | 1.0%
9 | 1 | 1.0%
7 | 1 | 1.0%

Minimum 10 values

Value | Count | Frequency (%)
1 | 48 | 48.0%
2 | 18 | 18.0%
3 | 16 | 16.0%
4 | 7 | 7.0%
5 | 2 | 2.0%
6 | 3 | 3.0%
7 | 1 | 1.0%
8 | 2 | 2.0%
9 | 1 | 1.0%
10 | 1 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
12 | 1 | 1.0%
10 | 1 | 1.0%
9 | 1 | 1.0%
8 | 2 | 2.0%
7 | 1 | 1.0%
6 | 3 | 3.0%
5 | 2 | 2.0%
4 | 7 | 7.0%
3 | 16 | 16.0%
2 | 18 | 18.0%

Interactions

[Figures: four interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap for the first matrix]

Variable | 일간연관어연번 | 도메인 하위 카테고리명 | 단어속성명 | 연관어명 | 일간연관어언급량 | 일간연관어단어량
일간연관어연번 | 1.000 | 0.994 | 0.000 | 0.000 | 0.157 | 0.000
도메인 하위 카테고리명 | 0.994 | 1.000 | 0.000 | 0.000 | 0.097 | 0.000
단어속성명 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000
연관어명 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000
일간연관어언급량 | 0.157 | 0.097 | 0.000 | 0.000 | 1.000 | 0.759
일간연관어단어량 | 0.000 | 0.000 | 0.000 | 0.000 | 0.759 | 1.000

[Figure: correlation heatmap for the second matrix]

Variable | 도메인 하위 카테고리명 | 일간연관어언급량 | 단어속성명
도메인 하위 카테고리명 | 1.000 | 0.061 | 0.000
일간연관어언급량 | 0.061 | 1.000 | 0.000
단어속성명 | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap for the third matrix]

Variable | 일간연관어연번 | 일간연관어단어량 | 도메인 하위 카테고리명 | 단어속성명 | 일간연관어언급량
일간연관어연번 | 1.000 | -0.050 | 0.894 | 0.000 | 0.112
일간연관어단어량 | -0.050 | 1.000 | 0.000 | 0.000 | 0.572
도메인 하위 카테고리명 | 0.894 | 0.000 | 1.000 | 0.000 | 0.061
단어속성명 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
일간연관어언급량 | 0.112 | 0.572 | 0.061 | 0.000 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out visual patterns in data completion.]

Sample

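First rows

Index | 일간연관어연번 | 연월일 | 환경플랫폼 하위 도메인명 | 도메인 하위 카테고리명 | SNS 채널명 | 단어속성명 | 연관어명 | 일간연관어언급량 | 일간연관어단어량
0 | 1 | 2020-10-05 | 물환경 | 상수도 | patent | 장소 | 강서구 | 1 | 3
1 | 2 | 2020-10-05 | 물환경 | 상수도 | patent | 라이프 | 건강 | 1 | 1
2 | 3 | 2020-10-05 | 물환경 | 상수도 | patent | 속성 | 계면활성제 | 1 | 1
3 | 4 | 2020-10-05 | 물환경 | 상수도 | patent | 단체 | 고려대학교 | 1 | 3
4 | 5 | 2020-10-05 | 물환경 | 상수도 | patent | 장소 | 공간 | 1 | 2
5 | 6 | 2020-10-05 | 물환경 | 상수도 | patent | 속성 | 과제 | 1 | 5
6 | 7 | 2020-10-05 | 물환경 | 상수도 | patent | 라이프 | 과학 | 1 | 1
7 | 8 | 2020-10-05 | 물환경 | 상수도 | patent | 라이프 | 관리 | 1 | 1
8 | 9 | 2020-10-05 | 물환경 | 상수도 | patent | 라이프 | 국가 | 1 | 1
9 | 10 | 2020-10-05 | 물환경 | 상수도 | patent | 속성 | 국제 | 2 | 2

Last rows

Index | 일간연관어연번 | 연월일 | 환경플랫폼 하위 도메인명 | 도메인 하위 카테고리명 | SNS 채널명 | 단어속성명 | 연관어명 | 일간연관어언급량 | 일간연관어단어량
90 | 91 | 2020-10-05 | 물환경 | 지하수 | patent | 기타 | 발명 | 1 | 6
91 | 92 | 2020-10-05 | 물환경 | 지하수 | patent | 속성 | 범위 | 1 | 3
92 | 93 | 2020-10-05 | 물환경 | 지하수 | patent | 속성 | 부처 | 1 | 1
93 | 94 | 2020-10-05 | 물환경 | 지하수 | patent | 라이프 | 사업 | 1 | 3
94 | 95 | 2020-10-05 | 물환경 | 지하수 | patent | 속성 | 산학 | 1 | 2
95 | 96 | 2020-10-05 | 물환경 | 지하수 | patent | 장소 | 서울특별시 | 1 | 3
96 | 97 | 2020-10-05 | 물환경 | 지하수 | patent | 장소 | 성북구 | 1 | 3
97 | 98 | 2020-10-05 | 물환경 | 지하수 | patent | 속성 | 심사 | 1 | 2
98 | 99 | 2020-10-05 | 물환경 | 지하수 | patent | 기타 | 심재호 | 1 | 1
99 | 100 | 2020-10-05 | 물환경 | 지하수 | patent | 라이프 | 아미노산 | 1 | 2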