Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
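
The average record size follows from the two figures above it. A minimal Python sketch of that arithmetic, assuming 1 KiB = 1024 bytes and that the average is simply total memory divided by the number of observations (the function name is hypothetical):

```python
# Figures copied from the "Dataset statistics" block.
total_size_kib = 654.3
n_observations = 10_000


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average bytes per row, taking 1 KiB = 1024 bytes (assumption)."""
    return total_kib * 1024 / n_rows


avg_bytes = average_record_size_bytes(total_size_kib, n_observations)
print(f"{avg_bytes:.1f} B")
```

The result rounds to the 67.0 B reported above.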

Variable types

Categorical: 3
Text: 1
Numeric: 3

Dataset

Description: Weekly keyword data, aggregated and analyzed to produce the main keywords and keyword relationship network graphs for the news-based statistics search service.
URL: https://www.data.go.kr/data/15121130/fileData.do

Alerts

(unnamed column) is highly overall correlated with 등록일자 (High correlation)
등록일자 is highly overall correlated with (unnamed column) (High correlation)
주간단어개수 is highly overall correlated with 주간랭크 and 1 other field (High correlation)
주간랭크 is highly overall correlated with 주간단어개수 (High correlation)
주간합계건수 is highly overall correlated with 주간단어개수 (High correlation)
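
The alert between 등록일자 and the variable with no name can be checked against the per-variable Sample rows further down: each week range appears to end on the matching 등록일자 value. A short Python sketch of that check, assuming the unnamed column always has the form YYYYMMDD-YYYYMMDD (the helper name `week_end` is hypothetical):

```python
from datetime import date, datetime

# (week range, 등록일자) pairs taken from the 1st to 5th Sample rows of the two variables.
pairs = [
    ("20220828-20220903", "2022-09-03"),
    ("20220821-20220827", "2022-08-27"),
    ("20220731-20220806", "2022-08-06"),
    ("20220814-20220820", "2022-08-20"),
    ("20220911-20220917", "2022-09-17"),
]


def week_end(week_range: str) -> date:
    """Return the end date of a 'YYYYMMDD-YYYYMMDD' range (assumed format)."""
    _start_text, end_text = week_range.split("-")
    return datetime.strptime(end_text, "%Y%m%d").date()


matches = [week_end(week) == date.fromisoformat(day) for week, day in pairs]
print(all(matches))
```

If the relationship holds for every row, the two columns carry the same information, which would explain a coefficient of 1.000 between them.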

Reproduction

Analysis started: 2023-12-12 12:08:50.193328
Analysis finished: 2023-12-12 12:08:53.478520
Duration: 3.29 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
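
The reported duration is the difference between the two timestamps. A minimal sketch of that subtraction in Python, using only the values printed above:

```python
from datetime import datetime

started = datetime.fromisoformat("2023-12-12 12:08:50.193328")
finished = datetime.fromisoformat("2023-12-12 12:08:53.478520")

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```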

Variables


Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
20220828-20220903  1606
20220821-20220827  1586
20220904-20220910  1579
20220814-20220820  1577
20220807-20220813  1565
Other values (2)   2087

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20220828-20220903
2nd row: 20220821-20220827
3rd row: 20220731-20220806
4th row: 20220814-20220820
5th row: 20220911-20220917

Common Values

Value              Count  Frequency (%)
20220828-20220903  1606   16.1%
20220821-20220827  1586   15.9%
20220904-20220910  1579   15.8%
20220814-20220820  1577   15.8%
20220807-20220813  1565   15.7%
20220731-20220806  1552   15.5%
20220911-20220917  535    5.3%

Length

Histogram of lengths of the category


단어
Text

Distinct: 5007
Distinct (%): 50.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 2
Mean length: 2.6529
Min length: 1

Characters and Unicode

Total characters: 26529
Distinct characters: 779
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
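
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for a few words that appear in this report's sample rows. It is not the profiler's own implementation, and the standard library does not expose script or block, so only the category is shown:

```python
import unicodedata
from collections import Counter

# A few values of 단어 taken from the sample rows of this report.
sample_words = ["시공", "독감유행", "EU", "만t"]

# Two-letter general category per character: Lo = Other Letter,
# Lu = Uppercase Letter, Ll = Lowercase Letter, Nd = Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for word in sample_words for ch in word
)

for category, count in category_counts.most_common():
    print(category, count)
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables that follow.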

Unique

Unique: 2618
Unique (%): 26.2%

Sample

1st row: 시공
2nd row: 개가
3rd row: 가게
4th row: 해남
5th row: 독감유행

Value                Count  Frequency (%)
국가                   10     0.1%
안팎                   9      0.1%
침체                   9      0.1%
연합뉴스                 9      0.1%
물량                   9      0.1%
사실                   9      0.1%
능력                   8      0.1%
인정                   8      0.1%
회복                   8      0.1%
활성화                  8      0.1%
Other values (4996)  9913   99.1%

Most occurring characters

ValueCountFrequency (%)
423
 
1.6%
385
 
1.5%
368
 
1.4%
350
 
1.3%
350
 
1.3%
329
 
1.2%
311
 
1.2%
307
 
1.2%
304
 
1.1%
298
 
1.1%
Other values (769) 23104
87.1%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      25876  97.5%
Uppercase Letter  533    2.0%
Lowercase Letter  110    0.4%
Decimal Number    10     < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
423
 
1.6%
385
 
1.5%
368
 
1.4%
350
 
1.4%
350
 
1.4%
329
 
1.3%
311
 
1.2%
307
 
1.2%
304
 
1.2%
298
 
1.2%
Other values (722) 22451
86.8%
Uppercase Letter
Value              Count  Frequency (%)
C                  52     9.8%
S                  45     8.4%
B                  37     6.9%
T                  34     6.4%
M                  30     5.6%
D                  26     4.9%
P                  26     4.9%
G                  25     4.7%
A                  24     4.5%
I                  24     4.5%
Other values (14)  210    39.4%

Lowercase Letter
Value              Count  Frequency (%)
e                  13     11.8%
o                  11     10.0%
t                  10     9.1%
w                  8      7.3%
s                  8      7.3%
a                  7      6.4%
i                  7      6.4%
n                  6      5.5%
p                  5      4.5%
m                  5      4.5%
Other values (11)  30     27.3%

Decimal Number
Value  Count  Frequency (%)
1      5      50.0%
9      5      50.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  25876  97.5%
Latin   643    2.4%
Common  10     < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
423
 
1.6%
385
 
1.5%
368
 
1.4%
350
 
1.4%
350
 
1.4%
329
 
1.3%
311
 
1.2%
307
 
1.2%
304
 
1.2%
298
 
1.2%
Other values (722) 22451
86.8%
Latin
Value              Count  Frequency (%)
C                  52     8.1%
S                  45     7.0%
B                  37     5.8%
T                  34     5.3%
M                  30     4.7%
D                  26     4.0%
P                  26     4.0%
G                  25     3.9%
A                  24     3.7%
I                  24     3.7%
Other values (35)  320    49.8%

Common
Value  Count  Frequency (%)
1      5      50.0%
9      5      50.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  25876  97.5%
ASCII   653    2.5%

Most frequent character per block

Hangul
ValueCountFrequency (%)
423
 
1.6%
385
 
1.5%
368
 
1.4%
350
 
1.4%
350
 
1.4%
329
 
1.3%
311
 
1.2%
307
 
1.2%
304
 
1.2%
298
 
1.2%
Other values (722) 22451
86.8%
ASCII
Value              Count  Frequency (%)
C                  52     8.0%
S                  45     6.9%
B                  37     5.7%
T                  34     5.2%
M                  30     4.6%
D                  26     4.0%
P                  26     4.0%
G                  25     3.8%
A                  24     3.7%
I                  24     3.7%
Other values (37)  330    50.5%

등록일자
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value             Count
2022-09-03        1606
2022-08-27        1586
2022-09-10        1579
2022-08-20        1577
2022-08-13        1565
Other values (2)  2087

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-09-03
2nd row: 2022-08-27
3rd row: 2022-08-06
4th row: 2022-08-20
5th row: 2022-09-17

Common Values

Value       Count  Frequency (%)
2022-09-03  1606   16.1%
2022-08-27  1586   15.9%
2022-09-10  1579   15.8%
2022-08-20  1577   15.8%
2022-08-13  1565   15.7%
2022-08-06  1552   15.5%
2022-09-17  535    5.3%

Length

Histogram of lengths of the category


주간단어개수
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1605
Distinct (%): 16.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 377.8704
Minimum: 4
Maximum: 20040
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 4
5th percentile: 7
Q1: 24
Median: 87
Q3: 321
95th percentile: 1634.05
Maximum: 20040
Range: 20036
Interquartile range (IQR): 297

Descriptive statistics

Standard deviation: 951.81615
Coefficient of variation (CV): 2.5188958
Kurtosis: 67.440283
Mean: 377.8704
Median Absolute Deviation (MAD): 76
Skewness: 6.7739566
Sum: 3778704
Variance: 905953.98
Monotonicity: Not monotonic
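
Several of the figures for 주간단어개수 are derived from others in the same blocks. The sketch below recomputes them from the reported values as a consistency check. It uses the usual textbook definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared), which are assumed here rather than taken from the report:

```python
# Reported values for 주간단어개수.
minimum, maximum = 4, 20040
q1, q3 = 24, 321
mean = 377.8704
std = 951.81615

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle half
cv = std / mean                   # relative dispersion
variance = std ** 2               # squared standard deviation

print(value_range, iqr, round(cv, 4), round(variance, 2))
```

A CV well above 1, together with a mean far above the median, points to a strongly right-skewed count, in line with the reported skewness.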
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
6                    207    2.1%
16                   163    1.6%
22                   162    1.6%
21                   159    1.6%
7                    159    1.6%
20                   144    1.4%
24                   142    1.4%
17                   142    1.4%
8                    133    1.3%
18                   132    1.3%
Other values (1595)  8457   84.6%

Value  Count  Frequency (%)
4      16     0.2%
5      124    1.2%
6      207    2.1%
7      159    1.6%
8      133    1.3%
9      114    1.1%
10     115    1.1%
11     84     0.8%
12     71     0.7%
13     71     0.7%

Value  Count  Frequency (%)
20040  1      < 0.1%
15541  1      < 0.1%
14522  1      < 0.1%
13207  1      < 0.1%
12290  1      < 0.1%
11175  1      < 0.1%
10787  1      < 0.1%
10512  1      < 0.1%
10179  1      < 0.1%
10070  1      < 0.1%

주간랭크
Real number (ℝ)

HIGH CORRELATION 

Distinct: 2897
Distinct (%): 29.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1490.6225
Minimum: 1
Maximum: 3000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 137
Q1: 737.75
Median: 1499
Q3: 2244
95th percentile: 2841
Maximum: 3000
Range: 2999
Interquartile range (IQR): 1506.25

Descriptive statistics

Standard deviation: 872.95589
Coefficient of variation (CV): 0.58563177
Kurtosis: -1.2214308
Mean: 1490.6225
Median Absolute Deviation (MAD): 753
Skewness: -0.0048208467
Sum: 14906225
Variance: 762051.99
Monotonicity: Not monotonic
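
The near-zero skewness and the kurtosis close to -1.2 resemble what an evenly spread rank would produce. As a rough sanity check only, and not a claim about how the ranks were generated, the sketch below computes the mean, standard deviation, and excess kurtosis of a discrete uniform distribution on 1..3000 for comparison with the figures above:

```python
import math

n = 3000  # reported maximum rank; the reported minimum is 1

# Closed-form moments of a discrete uniform distribution on 1..n.
uniform_mean = (n + 1) / 2
uniform_std = math.sqrt((n * n - 1) / 12)
uniform_excess_kurtosis = -6 * (n * n + 1) / (5 * (n * n - 1))

print(round(uniform_mean, 1), round(uniform_std, 2), round(uniform_excess_kurtosis, 4))
```

The reported mean (1490.6225), standard deviation (872.95589), and kurtosis (-1.2214308) are close to these reference values without matching them exactly.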
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
2211                 11     0.1%
105                  10     0.1%
791                  10     0.1%
2684                 9      0.1%
883                  9      0.1%
872                  9      0.1%
1628                 9      0.1%
2758                 9      0.1%
130                  9      0.1%
1977                 9      0.1%
Other values (2887)  9906   99.1%

Value  Count  Frequency (%)
1      4      < 0.1%
2      6      0.1%
3      4      < 0.1%
4      4      < 0.1%
5      2      < 0.1%
6      3      < 0.1%
7      6      0.1%
8      6      0.1%
9      6      0.1%
11     5      0.1%

Value  Count  Frequency (%)
3000   6      0.1%
2999   2      < 0.1%
2998   5      0.1%
2997   1      < 0.1%
2996   2      < 0.1%
2995   2      < 0.1%
2994   5      0.1%
2993   3      < 0.1%
2992   1      < 0.1%
2991   3      < 0.1%

주간합계건수
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1245
Distinct (%): 12.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 226.455
Minimum: 1
Maximum: 15264
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 15
Median: 52
Q3: 202
95th percentile: 991.05
Maximum: 15264
Range: 15263
Interquartile range (IQR): 187
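
A fractional 95th percentile for an integer-valued count is what linear interpolation between neighbouring sorted values would give. The sketch below shows that interpolation in plain Python. It assumes the profiler uses the linear method (the default in pandas and NumPy), which this report does not state, and the function name is hypothetical:

```python
def linear_quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest order statistics."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)                            # index just at or below the position
    upper = min(lower + 1, len(sorted_values) - 1)   # next index, clamped to the last item
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# With 10000 observations, the 95th percentile falls between two sorted rows.
position_95 = (10_000 - 1) * 0.95
print(position_95)

# Toy data, not taken from the report.
toy_median = linear_quantile([1, 2, 4, 8], 0.5)
print(toy_median)
```

Because the position has a fractional part of about 0.05, the result lies 5% of the way between two adjacent sorted values, which can produce a figure such as 991.05 even when every observation is a whole number.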

Descriptive statistics

Standard deviation: 557.77365
Coefficient of variation (CV): 2.4630662
Kurtosis: 97.342258
Mean: 226.455
Median Absolute Deviation (MAD): 46
Skewness: 7.386956
Sum: 2264550
Variance: 311111.44
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
5                    222    2.2%
6                    205    2.1%
7                    194    1.9%
4                    185    1.8%
12                   183    1.8%
14                   178    1.8%
15                   175    1.8%
13                   174    1.7%
10                   167    1.7%
8                    165    1.7%
Other values (1235)  8152   81.5%

Value  Count  Frequency (%)
1      135    1.4%
2      136    1.4%
3      131    1.3%
4      185    1.8%
5      222    2.2%
6      205    2.1%
7      194    1.9%
8      165    1.7%
9      158    1.6%
10     167    1.7%

Value  Count  Frequency (%)
15264  1      < 0.1%
9680   1      < 0.1%
7735   1      < 0.1%
7067   1      < 0.1%
6721   1      < 0.1%
6446   1      < 0.1%
6382   1      < 0.1%
6209   1      < 0.1%
6046   1      < 0.1%
6028   1      < 0.1%

키워드별코드
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value          Count
COVID_SOC_KWD  2101
COVID_ECO_KWD  2003
INSTITUTE_KWD  1987
FRMPRD_KWD     1972
ECO_KWD        1937

Length

Max length: 13
Median length: 13
Mean length: 11.2462
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: COVID_ECO_KWD
2nd row: COVID_SOC_KWD
3rd row: FRMPRD_KWD
4th row: FRMPRD_KWD
5th row: COVID_SOC_KWD

Common Values

Value          Count  Frequency (%)
COVID_SOC_KWD  2101   21.0%
COVID_ECO_KWD  2003   20.0%
INSTITUTE_KWD  1987   19.9%
FRMPRD_KWD     1972   19.7%
ECO_KWD        1937   19.4%

Length

Histogram of lengths of the category


Interactions

[Figures: 9 interaction plots]

Correlations

Variable    (unnamed)  등록일자  주간단어개수  주간랭크  주간합계건수  키워드별코드
(unnamed)   1.000  1.000  0.044  0.074  0.041  0.100
등록일자      1.000  1.000  0.044  0.074  0.041  0.100
주간단어개수    0.044  0.044  1.000  0.390  0.892  0.225
주간랭크      0.074  0.074  0.390  1.000  0.350  0.000
주간합계건수    0.041  0.041  0.892  0.350  1.000  0.183
키워드별코드    0.100  0.100  0.225  0.000  0.183  1.000

Variable    키워드별코드  (unnamed)  등록일자
키워드별코드    1.000  0.064  0.064
(unnamed)   0.064  1.000  1.000
등록일자      0.064  1.000  1.000

Variable    주간단어개수  주간랭크  주간합계건수  (unnamed)  등록일자  키워드별코드
주간단어개수    1.000  -0.518  0.967  0.023  0.023  0.132
주간랭크      -0.518  1.000  -0.487  0.037  0.037  0.000
주간합계건수    0.967  -0.487  1.000  0.022  0.022  0.113
(unnamed)   0.023  0.037  0.022  1.000  1.000  0.064
등록일자      0.023  0.037  0.022  1.000  1.000  0.064
키워드별코드    0.132  0.000  0.113  0.064  0.064  1.000
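
The alerts near the top of the report are consistent with flagging every pair whose absolute coefficient in the last matrix is at least 0.5. That cutoff is an assumption inferred from the values, not something this report states, and the label `(unnamed)` stands in for the variable that has no name. A short Python sketch of the check:

```python
names = ["주간단어개수", "주간랭크", "주간합계건수", "(unnamed)", "등록일자", "키워드별코드"]

# Coefficients copied from the last correlation matrix, in the order of `names`.
matrix = [
    [1.000, -0.518, 0.967, 0.023, 0.023, 0.132],
    [-0.518, 1.000, -0.487, 0.037, 0.037, 0.000],
    [0.967, -0.487, 1.000, 0.022, 0.022, 0.113],
    [0.023, 0.037, 0.022, 1.000, 1.000, 0.064],
    [0.023, 0.037, 0.022, 1.000, 1.000, 0.064],
    [0.132, 0.000, 0.113, 0.064, 0.064, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, inferred from which pairs are flagged


def flagged_pairs(names, matrix, threshold):
    """Return (a, b, coefficient) for each pair above the cutoff in absolute value."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):   # upper triangle only, skip the diagonal
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


high = flagged_pairs(names, matrix, THRESHOLD)
for a, b, value in high:
    print(f"{a} <-> {b}: {value:+.3f}")
```

Under this assumption the pair 주간랭크 and 주간합계건수 (-0.487) falls just below the cutoff, which matches the absence of an alert for it.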

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  (unnamed)  단어  등록일자  주간단어개수/주간랭크/주간합계건수 (digits fused)  키워드별코드
72054  20220828-20220903  시공  2022-09-03  20294317  COVID_ECO_KWD
49418  20220821-20220827  개가  2022-08-27  21243121  COVID_SOC_KWD
4431  20220731-20220806  가게  2022-08-06  1016644  FRMPRD_KWD
42560  20220814-20220820  해남  2022-08-20  919327  FRMPRD_KWD
93076  20220911-20220917  독감유행  2022-09-17  12439776  COVID_SOC_KWD
48892  20220821-20220827  코딩  2022-08-27  21235313  COVID_SOC_KWD
36042  20220814-20220820  중소기업계  2022-08-20  22273914  COVID_ECO_KWD
8413  20220731-20220806  병의원  2022-08-06  24198317  COVID_SOC_KWD
83349  20220904-20220910  북상  2022-09-10  4251059258  ECO_KWD
63513  20220828-20220903  주방  2022-09-03  1552362132  ECO_KWD

(index)  (unnamed)  단어  등록일자  주간단어개수/주간랭크/주간합계건수 (digits fused)  키워드별코드
1789  20220731-20220806  세종뉴스  2022-08-06  1895918  FRMPRD_KWD
88206  20220904-20220910  만t  2022-09-10  2775011  FRMPRD_KWD
30698  20220814-20220820  구성원  2022-08-20  1902149121  ECO_KWD
1124  20220731-20220806  농촌진흥청  2022-08-06  2572216  FRMPRD_KWD
76206  20220904-20220910  EU  2022-09-10  30221317  COVID_ECO_KWD
31134  20220814-20220820  영세  2022-08-20  172234196  ECO_KWD
23448  20220807-20220813  필수  2022-08-13  3501134318  ECO_KWD
31770  20220814-20220820  마무리  2022-08-20  549811455  ECO_KWD
21073  20220807-20220813  비판  2022-08-13  1003632686  INSTITUTE_KWD
76243  20220904-20220910  호실  2022-09-10  29225021  COVID_ECO_KWD