Overview

Dataset statistics

Number of variables: 4
Number of observations: 1262
Missing cells: 122
Missing cells (%): 2.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 40.8 KiB
Average record size in memory: 33.1 B
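The derived figures in this block can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is the total memory divided by the number of observations; both assumptions reproduce the reported values.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 4
n_observations = 1262
missing_cells = 122
total_size_kib = 40.8

# Per-variable missing counts from the Alerts section (the other two
# variables report 0 missing values).
missing_by_variable = {"최초실시년도": 91, "영문통계조사명": 31}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
print(f"sum of per-variable missing counts: {sum(missing_by_variable.values())}")
```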

Variable types

Text: 3
Numeric: 1

Dataset

Description: Sourced from the Gyeonggi-do Gyeonggi Statistics System (경기통계시스템)
Author: Gyeonggi-do (경기도)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=7Z0IQFV4O0V48NTF5QZJ33541341&infSeq=1

Alerts

최초실시년도 has 91 (7.2%) missing values (Missing)
영문통계조사명 has 31 (2.5%) missing values (Missing)
통계조사ID has unique values (Unique)

Reproduction

Analysis started: 2023-12-10 22:31:45.716552
Analysis finished: 2023-12-10 22:31:46.330016
Duration: 0.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
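A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration actually used here: the CSV file name is hypothetical, the report title is a placeholder, and reading 통계조사ID as text is an assumption based on the report classifying that variable as Text. `ProfileReport` and `to_file` are part of ydata-profiling; pandas and ydata-profiling must be installed before the function is called.

```python
from pathlib import Path


def build_profile(csv_path: Path, html_path: Path) -> None:
    """Profile a CSV export and write the result as an HTML report."""
    # Imported lazily so this module loads even when the third-party
    # packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: keep the ID column as text, since some IDs in the
    # sample start with a letter (for example B21020180713171220).
    df = pd.read_csv(csv_path, dtype={"통계조사ID": "string"})
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(html_path)


# Example call with hypothetical file names:
# build_profile(Path("survey_list.csv"), Path("report.html"))
```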

Variables

통계조사ID
Text

UNIQUE 

Distinct: 1262
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 10.0 KiB

Length

Max length: 18
Median length: 7
Mean length: 7.1568938
Min length: 7

Characters and Unicode

Total characters: 9032
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
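As an illustration of those character properties, the general category of each character can be read with Python's standard `unicodedata` module. The sketch below counts categories for three IDs that appear in this report's samples; the `CATEGORY_NAMES` mapping and the `category_counts` helper are illustrative names, not part of the profiling tool. Script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter

# Long names for the two general-category codes that occur in the IDs.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter"}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Three IDs taken from the sample rows of this report.
sample_ids = ["1991016", "1992001", "B21020180713171220"]
counts = category_counts(sample_ids)
print(counts)
```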

Unique

Unique: 1262
Unique (%): 100.0%

Sample

1st row: 1991016
2nd row: 1992001
3rd row: 1992002
4th row: 1992003
5th row: 1992004
Value                  Count  Frequency (%)
1991016                    1  0.1%
1971003                    1  0.1%
1974002                    1  0.1%
1974001                    1  0.1%
1973002                    1  0.1%
1973001                    1  0.1%
1972002                    1  0.1%
1975004                    1  0.1%
1971004                    1  0.1%
1971002                    1  0.1%
Other values (1252)     1252  99.2%

Most occurring characters

Value                  Count  Frequency (%)
0                       2810  31.1%
1                       1389  15.4%
9                       1288  14.3%
2                        998  11.0%
6                        569  6.3%
7                        433  4.8%
3                        403  4.5%
5                        361  4.0%
8                        354  3.9%
4                        344  3.8%
Other values (2)          83  0.9%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number          8949  99.1%
Uppercase Letter          83  0.9%

Most frequent character per category

Decimal Number
Value                  Count  Frequency (%)
0                       2810  31.4%
1                       1389  15.5%
9                       1288  14.4%
2                        998  11.2%
6                        569  6.4%
7                        433  4.8%
3                        403  4.5%
5                        361  4.0%
8                        354  4.0%
4                        344  3.8%
Uppercase Letter
Value                  Count  Frequency (%)
B                         80  96.4%
A                          3  3.6%

Most occurring scripts

Value                  Count  Frequency (%)
Common                  8949  99.1%
Latin                     83  0.9%

Most frequent character per script

Common
Value                  Count  Frequency (%)
0                       2810  31.4%
1                       1389  15.5%
9                       1288  14.4%
2                        998  11.2%
6                        569  6.4%
7                        433  4.8%
3                        403  4.5%
5                        361  4.0%
8                        354  4.0%
4                        344  3.8%
Latin
Value                  Count  Frequency (%)
B                         80  96.4%
A                          3  3.6%

Most occurring blocks

Value                  Count  Frequency (%)
ASCII                   9032  100.0%

Most frequent character per block

ASCII
Value                  Count  Frequency (%)
0                       2810  31.1%
1                       1389  15.4%
9                       1288  14.3%
2                        998  11.0%
6                        569  6.3%
7                        433  4.8%
3                        403  4.5%
5                        361  4.0%
8                        354  3.9%
4                        344  3.8%
Other values (2)          83  0.9%

최초실시년도
Real number (ℝ)

MISSING 

Distinct: 67
Distinct (%): 5.7%
Missing: 91
Missing (%): 7.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1991.9334
Minimum: 1910
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.2 KiB

Quantile statistics

Minimum: 1910
5-th percentile: 1963
Q1: 1981
Median: 1997
Q3: 2004
95-th percentile: 2007
Maximum: 2018
Range: 108
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 14.8142
Coefficient of variation (CV): 0.007437096
Kurtosis: 2.6032776
Mean: 1991.9334
Median Absolute Deviation (MAD): 9
Skewness: -1.370025
Sum: 2332554
Variance: 219.46052
Monotonicity: Not monotonic
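Several of the figures for 최초실시년도 follow from one another, which gives a quick consistency check. The sketch below uses only values printed in this report and the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1, range = maximum − minimum). It also assumes the mean and sum are taken over non-missing rows only, which matches the reported numbers.

```python
import math

# Values copied from the 최초실시년도 statistics in this report.
mean = 1991.9334
std = 14.8142
total = 2332554
q1, q3 = 1981, 2004
minimum, maximum = 1910, 2018
n_rows, n_missing = 1262, 91

n_valid = n_rows - n_missing      # rows that actually carry a year
cv = std / mean                   # coefficient of variation
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum

print(f"valid rows: {n_valid}")
print(f"CV: {cv:.9f}")
print(f"variance: {variance:.5f}")
print(f"IQR: {iqr}, range: {value_range}")
print(f"mean * valid rows: {mean * n_valid:.1f} (reported sum: {total})")
```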
Histogram with fixed size bins (bins=50)
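To make "fixed size bins" concrete: assuming the 50 bins span the observed minimum to maximum (1910 to 2018) in equal steps, each bin is 108 / 50 = 2.16 years wide. The helper below is a sketch of that binning rule, with the maximum placed in the last bin (the convention NumPy's histogram uses); `fixed_width_bin` is an illustrative name, and the profiling tool's exact bin edges are not shown in this report.

```python
def fixed_width_bin(value, lo, hi, bins):
    """Return the 0-based index of the equal-width bin on [lo, hi] holding value."""
    if not lo <= value <= hi:
        raise ValueError("value outside histogram range")
    width = (hi - lo) / bins
    # The maximum belongs to the last bin instead of opening a new one.
    return min(int((value - lo) / width), bins - 1)


LO, HI, BINS = 1910, 2018, 50
for year in (1910, 2006, 2018):
    print(year, "-> bin", fixed_width_bin(year, LO, HI, BINS))
```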
Common values
Value                  Count  Frequency (%)
2006                     143  11.3%
2007                      75  5.9%
1976                      56  4.4%
2001                      53  4.2%
1999                      51  4.0%
2005                      51  4.0%
1975                      49  3.9%
1998                      49  3.9%
1994                      42  3.3%
1996                      41  3.2%
Other values (57)        561  44.5%
(Missing)                 91  7.2%

Minimum 10 values
Value                  Count  Frequency (%)
1910                       3  0.2%
1925                       1  0.1%
1936                       1  0.1%
1937                       1  0.1%
1938                       1  0.1%
1940                       1  0.1%
1946                       2  0.2%
1948                       3  0.2%
1949                       1  0.1%
1952                       2  0.2%

Maximum 10 values
Value                  Count  Frequency (%)
2018                       1  0.1%
2016                       2  0.2%
2010                       1  0.1%
2007                      75  5.9%
2006                     143  11.3%
2005                      51  4.0%
2004                      37  2.9%
2003                      36  2.9%
2002                      32  2.5%
2001                      53  4.2%

통계조사명
Text

Distinct: 1200
Distinct (%): 95.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.0 KiB

Length

Max length: 40
Median length: 28
Mean length: 10.744057
Min length: 2

Characters and Unicode

Total characters: 13559
Distinct characters: 422
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1171
Unique (%): 92.8%

Sample

1st row: 주요수입상품의경쟁력실태조사
2nd row: 임대공단및장기분할상환공단에대한수요조사
3rd row: 수입의파급효과와기업의대응방안조사
4th row: 고밀주택단지내시설배치에대한의식조사
5th row: 수출산업실태조사
Value                  Count  Frequency (%)
(not shown)               40  2.4%
실태조사                    27  1.6%
조사                        20  1.2%
주민등록인구통계              18  1.1%
교육통계                    15  0.9%
대한                        10  0.6%
기업의                      10  0.6%
중소기업                     8  0.5%
설비투자계획조사               7  0.4%
관한                         6  0.4%
Other values (1407)     1518  90.4%

Most occurring characters

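Value                  Count  Frequency (%)
(not shown)              858  6.3%
(not shown)              773  5.7%
(space)                  417  3.1%
(not shown)              407  3.0%
(not shown)              356  2.6%
(not shown)              345  2.5%
(not shown)              328  2.4%
(not shown)              323  2.4%
(not shown)              292  2.2%
(not shown)              205  1.5%
Other values (412)      9255  68.3%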

Most occurring categories

Value                  Count  Frequency (%)
Other Letter           12847  94.7%
Space Separator          420  3.1%
Decimal Number           154  1.1%
Uppercase Letter          61  0.4%
Close Punctuation         25  0.2%
Open Punctuation          25  0.2%
Other Punctuation         25  0.2%
Lowercase Letter           1  < 0.1%
Dash Punctuation           1  < 0.1%

Most frequent character per category

Other Letter
Value                  Count  Frequency (%)
(not shown)              858  6.7%
(not shown)              773  6.0%
(not shown)              407  3.2%
(not shown)              356  2.8%
(not shown)              345  2.7%
(not shown)              328  2.6%
(not shown)              323  2.5%
(not shown)              292  2.3%
(not shown)              205  1.6%
(not shown)              182  1.4%
Other values (369)      8778  68.3%
Uppercase Letter
Value                  Count  Frequency (%)
I                          7  11.5%
D                          7  11.5%
T                          7  11.5%
B                          5  8.2%
C                          5  8.2%
R                          4  6.6%
P                          4  6.6%
F                          4  6.6%
A                          3  4.9%
G                          3  4.9%
Other values (7)          12  19.7%
Decimal Number
Value                  Count  Frequency (%)
0                         55  35.7%
2                         26  16.9%
9                         20  13.0%
1                         17  11.0%
4                          9  5.8%
5                          8  5.2%
6                          6  3.9%
8                          6  3.9%
3                          5  3.2%
7                          2  1.3%
Other Punctuation
Value                  Count  Frequency (%)
,                          8  32.0%
'                          8  32.0%
.                          4  16.0%
·                          3  12.0%
/                          1  4.0%
&                          1  4.0%
Close Punctuation
Value                  Count  Frequency (%)
)                         20  80.0%
(not shown)                4  16.0%
(not shown)                1  4.0%
Open Punctuation
Value                  Count  Frequency (%)
(                         20  80.0%
(not shown)                4  16.0%
(not shown)                1  4.0%
Space Separator
Value                  Count  Frequency (%)
(space)                  417  99.3%
(non-ASCII space)          3  0.7%
Lowercase Letter
Value                  Count  Frequency (%)
e                          1  100.0%
Dash Punctuation
Value                  Count  Frequency (%)
-                          1  100.0%

Most occurring scripts

Value                  Count  Frequency (%)
Hangul                 12847  94.7%
Common                   650  4.8%
Latin                     62  0.5%

Most frequent character per script

Hangul
Value                  Count  Frequency (%)
(not shown)              858  6.7%
(not shown)              773  6.0%
(not shown)              407  3.2%
(not shown)              356  2.8%
(not shown)              345  2.7%
(not shown)              328  2.6%
(not shown)              323  2.5%
(not shown)              292  2.3%
(not shown)              205  1.6%
(not shown)              182  1.4%
Other values (369)      8778  68.3%
Common
Value                  Count  Frequency (%)
(space)                  417  64.2%
0                         55  8.5%
2                         26  4.0%
)                         20  3.1%
9                         20  3.1%
(                         20  3.1%
1                         17  2.6%
4                          9  1.4%
,                          8  1.2%
5                          8  1.2%
Other values (15)         50  7.7%
Latin
Value                  Count  Frequency (%)
I                          7  11.3%
D                          7  11.3%
T                          7  11.3%
B                          5  8.1%
C                          5  8.1%
R                          4  6.5%
P                          4  6.5%
F                          4  6.5%
A                          3  4.8%
G                          3  4.8%
Other values (8)          13  21.0%

Most occurring blocks

Value                  Count  Frequency (%)
Hangul                 12847  94.7%
ASCII                    696  5.1%
None                      16  0.1%

Most frequent character per block

Hangul
Value                  Count  Frequency (%)
(not shown)              858  6.7%
(not shown)              773  6.0%
(not shown)              407  3.2%
(not shown)              356  2.8%
(not shown)              345  2.7%
(not shown)              328  2.6%
(not shown)              323  2.5%
(not shown)              292  2.3%
(not shown)              205  1.6%
(not shown)              182  1.4%
Other values (369)      8778  68.3%
ASCII
Value                  Count  Frequency (%)
(space)                  417  59.9%
0                         55  7.9%
2                         26  3.7%
)                         20  2.9%
9                         20  2.9%
(                         20  2.9%
1                         17  2.4%
4                          9  1.3%
,                          8  1.1%
5                          8  1.1%
Other values (27)         96  13.8%
None
Value                  Count  Frequency (%)
(not shown)                4  25.0%
(not shown)                4  25.0%
·                          3  18.8%
(non-ASCII space)          3  18.8%
(not shown)                1  6.2%
(not shown)                1  6.2%

영문통계조사명
Text

MISSING 

Distinct: 807
Distinct (%): 65.6%
Missing: 31
Missing (%): 2.5%
Memory size: 10.0 KiB

Length

Max length: 80
Median length: 71
Mean length: 25.760357
Min length: 1

Characters and Unicode

Total characters: 31711
Distinct characters: 72
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
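The reported mean lengths of the three text variables can be recovered from their total character counts. The sketch below assumes the mean is the total number of characters divided by the number of non-missing rows; under that assumption the 31 missing values of 영문통계조사명 explain why its mean is 31711 / 1231 and not 31711 / 1262.

```python
import math

n_rows = 1262

# (total characters, missing values) per text variable, from this report.
text_columns = {
    "통계조사ID": (9032, 0),
    "통계조사명": (13559, 0),
    "영문통계조사명": (31711, 31),
}

# Mean length over the rows that actually hold a value.
mean_lengths = {
    name: total_chars / (n_rows - n_missing)
    for name, (total_chars, n_missing) in text_columns.items()
}

for name, value in mean_lengths.items():
    print(f"{name}: {value:.6f}")
```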

Unique

Unique: 789
Unique (%): 64.1%

Sample

1st row: Survey of the Competitiveness of Major Imported Goods
2nd row: (blank)
3rd row: (blank)
4th row: (blank)
5th row: Export Industry Survey
Value                  Count  Frequency (%)
of                       354  8.1%
survey                   286  6.6%
statistics               185  4.3%
on                       163  3.7%
the                      143  3.3%
and                      123  2.8%
in                        87  2.0%
status                    53  1.2%
for                       46  1.1%
(not shown)               46  1.1%
Other values (1079)     2866  65.9%

Most occurring characters

Value                  Count  Frequency (%)
(space)                 3897  12.3%
e                       2561  8.1%
t                       2475  7.8%
i                       2195  6.9%
n                       2128  6.7%
o                       2040  6.4%
a                       1937  6.1%
s                       1908  6.0%
r                       1676  5.3%
u                       1175  3.7%
Other values (62)       9719  30.6%

Most occurring categories

Value                  Count  Frequency (%)
Lowercase Letter       24484  77.2%
Space Separator         3897  12.3%
Uppercase Letter        3114  9.8%
Other Punctuation        106  0.3%
Dash Punctuation          63  0.2%
Decimal Number            42  0.1%
Open Punctuation           2  < 0.1%
Close Punctuation          2  < 0.1%
Modifier Symbol            1  < 0.1%

Most frequent character per category

Lowercase Letter
Value                  Count  Frequency (%)
e                       2561  10.5%
t                       2475  10.1%
i                       2195  9.0%
n                       2128  8.7%
o                       2040  8.3%
a                       1937  7.9%
s                       1908  7.8%
r                       1676  6.8%
u                       1175  4.8%
c                       1035  4.2%
Other values (16)       5354  21.9%
Uppercase Letter
Value                  Count  Frequency (%)
S                        719  23.1%
C                        283  9.1%
P                        228  7.3%
R                        204  6.6%
I                        186  6.0%
E                        185  5.9%
T                        161  5.2%
M                        155  5.0%
F                        142  4.6%
A                        139  4.5%
Other values (15)        712  22.9%
Decimal Number
Value                  Count  Frequency (%)
0                         12  28.6%
1                         11  26.2%
2                          6  14.3%
9                          4  9.5%
4                          2  4.8%
5                          2  4.8%
3                          2  4.8%
8                          2  4.8%
6                          1  2.4%
Other Punctuation
Value                  Count  Frequency (%)
&                         47  44.3%
'                         25  23.6%
.                         17  16.0%
,                          9  8.5%
(not shown)                4  3.8%
/                          3  2.8%
:                          1  0.9%
Space Separator
Value                  Count  Frequency (%)
(space)                 3897  100.0%
Dash Punctuation
Value                  Count  Frequency (%)
-                         63  100.0%
Open Punctuation
Value                  Count  Frequency (%)
(                          2  100.0%
Close Punctuation
Value                  Count  Frequency (%)
)                          2  100.0%
Modifier Symbol
Value                  Count  Frequency (%)
`                          1  100.0%

Most occurring scripts

Value                  Count  Frequency (%)
Latin                  27598  87.0%
Common                  4113  13.0%

Most frequent character per script

Latin
Value                  Count  Frequency (%)
e                       2561  9.3%
t                       2475  9.0%
i                       2195  8.0%
n                       2128  7.7%
o                       2040  7.4%
a                       1937  7.0%
s                       1908  6.9%
r                       1676  6.1%
u                       1175  4.3%
c                       1035  3.8%
Other values (41)       8468  30.7%
Common
Value                  Count  Frequency (%)
(space)                 3897  94.7%
-                         63  1.5%
&                         47  1.1%
'                         25  0.6%
.                         17  0.4%
0                         12  0.3%
1                         11  0.3%
,                          9  0.2%
2                          6  0.1%
(not shown)                4  0.1%
Other values (11)         22  0.5%

Most occurring blocks

Value                  Count  Frequency (%)
ASCII                  31707  > 99.9%
None                       4  < 0.1%

Most frequent character per block

ASCII
Value                  Count  Frequency (%)
(space)                 3897  12.3%
e                       2561  8.1%
t                       2475  7.8%
i                       2195  6.9%
n                       2128  6.7%
o                       2040  6.4%
a                       1937  6.1%
s                       1908  6.0%
r                       1676  5.3%
u                       1175  3.7%
Other values (61)       9715  30.6%
None
Value                  Count  Frequency (%)
(not shown)                4  100.0%

Interactions

[Figure: interactions plot]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
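Nullity correlation can be sketched as the Pearson correlation between per-column "value present" indicators. The example below is a plain-Python illustration on made-up toy data, not on this dataset; `nullity_correlation` is an illustrative helper, and the exact computation used for the heatmap is not stated in this report.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no
        # variation, so the correlation is undefined.
        return math.nan
    return cov / math.sqrt(var_a * var_b)


# Toy columns (made up for illustration); None marks a missing value.
year = [1991, 1992, None, None, 1985, None]
english_name = ["a", "b", None, None, "c", "d"]
survey_id = ["1", "2", "3", "4", "5", "6"]

r = nullity_correlation(year, english_name)
print(f"year vs english_name: {r:.4f}")
print(f"year vs survey_id: {nullity_correlation(year, survey_id)}")
```

A value near 1 means the two columns tend to be missing in the same rows; a value near -1 means one tends to be present where the other is missing.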

Sample

      통계조사ID            최초실시년도  통계조사명  영문통계조사명
0     1991016             1991  주요수입상품의경쟁력실태조사  Survey of the Competitiveness of Major Imported Goods
1     1992001             1992  임대공단및장기분할상환공단에대한수요조사
2     1992002             1992  수입의파급효과와기업의대응방안조사
3     1992003             1992  고밀주택단지내시설배치에대한의식조사
4     1992004             1985  수출산업실태조사  Export Industry Survey
5     1992005             1992  산업내근로행태의변화와근로의질제고방안조사
6     1992006             1992  주택청약관련저축자의의식조사
7     1992007             1992  폐기물재자원화실태조사
8     1992008             1992  주공이미지조사
9     1992009             1992  세무행정에관한의견조사

      통계조사ID            최초실시년도  통계조사명  영문통계조사명
1252  2020049             <NA>  경기도경기종합지수  <NA>
1253  1993010             <NA>  경기도기본통계  <NA>
1254  2017058             <NA>  경기도청년통계  <NA>
1255  2020040             <NA>  경기도특별사법경찰범죄통계  <NA>
1256  2022001             <NA>  경기도주요관광지방문객실태조사  <NA>
1257  2020032             <NA>  경기도아동가구주거실태조사  Survey on Residential Conditions of Households with children in Gyeonggi-do
1258  B21020180713171220  <NA>  경기도장래인구추계  <NA>
1259  B21020200327123408  <NA>  과학기술정보통신부 통계자료(유선통신서비스 가입자 현황)  <NA>
1260  B21020210209150152  <NA>  경기도영유아통계  <NA>
1261  B21020210218161736  <NA>  경기동행종합지수  <NA>