Overview

Dataset statistics

Number of variables: 3
Number of observations: 8835
Missing cells: 1046
Missing cells (%): 3.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 215.8 KiB
Average record size in memory: 25.0 B
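
The percentages and the average record size can be checked from the raw counts. A minimal sketch of that arithmetic, assuming (the report does not say so) that the missing-cell share is taken over rows × columns and that 1 KiB is 1024 bytes:

```python
# Figures copied from the dataset statistics in this report.
n_variables = 3
n_observations = 8835
missing_cells = 1046
total_size_kib = 215.8

total_cells = n_variables * n_observations            # every cell in the table
missing_pct = 100 * missing_cells / total_cells        # share of all cells
column_missing_pct = 100 * missing_cells / n_observations  # if all gaps sit in one column
record_size_bytes = total_size_kib * 1024 / n_observations  # assumes 1 KiB = 1024 B

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Missing in a single column: {column_missing_pct:.1f}%")
print(f"Average record size: {record_size_bytes:.1f} B")
```

The single-column figure is the one the Alerts section reports for 조사내용설명, since that is the only variable with missing values.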

Variable types

Text: 2
Numeric: 1

Dataset

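Description: Gyeonggi-do Gyeonggi Statistics System, relationships between statistical surveys and survey items (경기도 경기통계시스템 통계조사항목관계)
Author: Gyeonggi-do (경기도)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=3LHWU1U2WHGSPGIY2J8T33564527&infSeq=1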

Alerts

조사내용설명 has 1046 (11.8%) missing values (Missing)

Reproduction

Analysis started: 2023-12-10 21:23:51.997270
Analysis finished: 2023-12-10 21:23:52.628292
Duration: 0.63 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
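
A report like this one can be regenerated with ydata-profiling. The sketch below is not the configuration that produced this report (that lives in config.json); the function name `build_report`, the `csv_path` argument, the report title, and the choice to read 통계조사ID as text are all assumptions made for illustration.

```python
# Dataset metadata as shown in the "Dataset" section of this report.
DATASET_META = {
    "description": "경기도 경기통계시스템 통계조사항목관계",
    "author": "경기도",
    "url": (
        "https://data.gg.go.kr/portal/data/service/selectServicePage.do"
        "?&infId=3LHWU1U2WHGSPGIY2J8T33564527&infSeq=1"
    ),
}


def build_report(csv_path, out_path="report.html"):
    """Profile a CSV export of the dataset and write an HTML report (sketch)."""
    # Third-party imports are kept local so this module loads without them.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: reading the ID column as str keeps values such as
    # "2006083" and "B21020200327123408" in a single text column.
    df = pd.read_csv(csv_path, dtype={"통계조사ID": str})

    # The title is hypothetical; the dataset metadata mirrors the report.
    report = ProfileReport(df, title="Profiling report", dataset=DATASET_META)
    report.to_file(out_path)
    return out_path
```

Calling `build_report("survey_items.csv")` (a hypothetical file name) would write `report.html` next to the script.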

Variables

통계조사ID
Text

Distinct: 1257
Distinct (%): 14.2%
Missing: 0
Missing (%): 0.0%
Memory size: 69.2 KiB

Length

Max length: 18
Median length: 7
Mean length: 7.2016978
Min length: 7

Characters and Unicode

Total characters: 63627
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
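
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. This is a small sketch, not the profiler's own implementation; the helper `category_counts` and the two-entry name mapping are introduced here for the example.

```python
import unicodedata
from collections import Counter

# Readable names for the two general categories seen in this column.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter"}


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)   # e.g. "Nd" for "7", "Lu" for "B"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Two identifiers taken from the sample rows of this report.
sample_ids = ["2006083", "B21020200327123408"]
print(category_counts(sample_ids))
```

The standard library does not expose script or block properties, so those two breakdowns would need a third-party package or a lookup table.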

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2006083
2nd row: 2006083
3rd row: 2006084
4th row: 2006084
5th row: 2006084
Value | Count | Frequency (%)
b21020200327123408 | 9 | 0.1%
b21020130717192614 | 9 | 0.1%
b21020130719075550 | 9 | 0.1%
b21020130719080301 | 9 | 0.1%
b21020130719080847 | 9 | 0.1%
b21020130719081734 | 9 | 0.1%
b21020130719194939 | 9 | 0.1%
b21020130719195210 | 9 | 0.1%
b21020130720112241 | 9 | 0.1%
b21020130719082735 | 9 | 0.1%
Other values (1247) | 8745 | 99.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 19739 | 31.0%
1 | 9859 | 15.5%
9 | 9058 | 14.2%
2 | 7007 | 11.0%
6 | 3981 | 6.3%
7 | 3050 | 4.8%
3 | 2855 | 4.5%
5 | 2558 | 4.0%
8 | 2495 | 3.9%
4 | 2408 | 3.8%
Other values (2) | 617 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 63010 | 99.0%
Uppercase Letter | 617 | 1.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 19739 | 31.3%
1 | 9859 | 15.6%
9 | 9058 | 14.4%
2 | 7007 | 11.1%
6 | 3981 | 6.3%
7 | 3050 | 4.8%
3 | 2855 | 4.5%
5 | 2558 | 4.1%
8 | 2495 | 4.0%
4 | 2408 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
B | 596 | 96.6%
A | 21 | 3.4%

Most occurring scripts

Value | Count | Frequency (%)
Common | 63010 | 99.0%
Latin | 617 | 1.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 19739 | 31.3%
1 | 9859 | 15.6%
9 | 9058 | 14.4%
2 | 7007 | 11.1%
6 | 3981 | 6.3%
7 | 3050 | 4.8%
3 | 2855 | 4.5%
5 | 2558 | 4.1%
8 | 2495 | 4.0%
4 | 2408 | 3.8%

Latin
Value | Count | Frequency (%)
B | 596 | 96.6%
A | 21 | 3.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 63627 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 19739 | 31.0%
1 | 9859 | 15.5%
9 | 9058 | 14.2%
2 | 7007 | 11.0%
6 | 3981 | 6.3%
7 | 3050 | 4.8%
3 | 2855 | 4.5%
5 | 2558 | 4.0%
8 | 2495 | 3.9%
4 | 2408 | 3.8%
Other values (2) | 617 | 1.0%

조사항목ID
Real number (ℝ)

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1212013
Minimum: 1212010
Maximum: 1212018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.8 KiB

Quantile statistics

Minimum: 1212010
5-th percentile: 1212010
Q1: 1212011
Median: 1212013
Q3: 1212015
95-th percentile: 1212016
Maximum: 1212018
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.0167689
Coefficient of variation (CV): 1.6639829 × 10⁻⁶
Kurtosis: -1.2066899
Mean: 1212013
Median Absolute Deviation (MAD): 2
Skewness: 0.019675221
Sum: 1.0708135 × 10¹⁰
Variance: 4.0673568
Monotonicity: Not monotonic
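
Because 조사항목ID takes only nine distinct values, several of these figures can be recomputed from the value counts reported for this variable. A sketch under one assumption: the variance uses the sample divisor n - 1, which is not stated in the report but reproduces its value.

```python
import math

# Value counts copied from the frequency table for 조사항목ID.
counts = {1212010 + k: 1257 for k in range(7)}   # 1212010 .. 1212016
counts[1212017] = 18
counts[1212018] = 18

n = sum(counts.values())                          # number of observations
total = sum(value * c for value, c in counts.items())
mean = total / n

# Assumption: sample variance with divisor n - 1.
variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                   # coefficient of variation

print(f"n={n} sum={total} mean={mean:.3f}")
print(f"variance={variance:.7f} std={std:.7f} cv={cv:.7e}")
```

The tiny coefficient of variation reflects that the values are identifiers clustered within a range of 8 around 1.2 million, not measurements.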
Histogram with fixed size bins (bins=9)
Common values
Value | Count | Frequency (%)
1212015 | 1257 | 14.2%
1212016 | 1257 | 14.2%
1212010 | 1257 | 14.2%
1212011 | 1257 | 14.2%
1212012 | 1257 | 14.2%
1212013 | 1257 | 14.2%
1212014 | 1257 | 14.2%
1212018 | 18 | 0.2%
1212017 | 18 | 0.2%

Extreme values (minimum)
Value | Count | Frequency (%)
1212010 | 1257 | 14.2%
1212011 | 1257 | 14.2%
1212012 | 1257 | 14.2%
1212013 | 1257 | 14.2%
1212014 | 1257 | 14.2%
1212015 | 1257 | 14.2%
1212016 | 1257 | 14.2%
1212017 | 18 | 0.2%
1212018 | 18 | 0.2%

Extreme values (maximum)
Value | Count | Frequency (%)
1212018 | 18 | 0.2%
1212017 | 18 | 0.2%
1212016 | 1257 | 14.2%
1212015 | 1257 | 14.2%
1212014 | 1257 | 14.2%
1212013 | 1257 | 14.2%
1212012 | 1257 | 14.2%
1212011 | 1257 | 14.2%
1212010 | 1257 | 14.2%

조사내용설명
Text

MISSING 

Distinct: 3139
Distinct (%): 40.3%
Missing: 1046
Missing (%): 11.8%
Memory size: 69.2 KiB

Length

Max length: 1024
Median length: 588
Mean length: 27.131853
Min length: 1

Characters and Unicode

Total characters: 211330
Distinct characters: 781
Distinct categories: 17
Distinct scripts: 4
Distinct blocks: 11
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2935
Unique (%): 37.7%

Sample

1st row: 전국 토양에 대한 오염추세를 파악하고 오염 우려지역에 대한 오염실태를 조사하여 토양오염을 예방하고 오염토양을 정화하는 등 토양보전대책을 수립·추진하기 위함
2nd row: ○ 토양측정망(지목별 토양오염도) - 중금속(8) : 카드뮴(Cd), 구리(Cu), 비소(Ag), 수은(Hg), 납(Pb), 6가크롬(Cr6+), 아연(Zn), 니켈(Ni) - 일반항목(8) : PCB, CN, 유기인, 페놀, 유류(BTEX, TPH), 불소, TCE, PCE - 토양산도(pH) ○ 토양오염실태(오염우려지역별 토양오염도) - 토양오염의 가능성이 높은 토양오염물질 및 토양pH
3rd row: 전국
4th row: 사업체
5th row: 기타-기타(현황자료)
Value | Count | Frequency (%)
(unrecovered) | 1769 | 4.3%
(unrecovered) | 1577 | 3.9%
전국 | 930 | 2.3%
(unrecovered) | 796 | 1.9%
사업체 | 481 | 1.2%
1년 | 430 | 1.1%
기타 | 363 | 0.9%
위한 | 347 | 0.8%
관한 | 344 | 0.8%
활용 | 325 | 0.8%
Other values (15568) | 33547 | 82.0%
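
A word-frequency table of this kind can be approximated with a token counter. The sketch below assumes lowercasing and a plain whitespace split; the profiler's actual tokenizer is not shown in this report and may also strip punctuation, so the helper `word_frequencies` is illustrative only.

```python
from collections import Counter


def word_frequencies(texts):
    """Count lowercase, whitespace-separated tokens; None entries are skipped."""
    counts = Counter()
    for text in texts:
        if text is None:              # missing value (<NA> in the sample rows)
            continue
        counts.update(text.lower().split())
    return counts


# Short values taken from the sample rows; None stands in for a missing cell.
sample = ["전국", "사업체", "기타-기타(현황자료)", "1년", "전국", None]
freq = word_frequencies(sample)
total = sum(freq.values())

for word, count in freq.most_common(3):
    print(word, count, f"{100 * count / total:.1f}%")
```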

Most occurring characters

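Value | Count | Frequency (%)
(space) | 36986 | 17.5%
, | 6818 | 3.2%
(unrecovered) | 4927 | 2.3%
(unrecovered) | 4095 | 1.9%
(unrecovered) | 3217 | 1.5%
(unrecovered) | 3110 | 1.5%
(unrecovered) | 2877 | 1.4%
(unrecovered) | 2810 | 1.3%
(unrecovered) | 2462 | 1.2%
(unrecovered) | 2420 | 1.1%
Other values (771) | 141608 | 67.0%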

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 150649 | 71.3%
Space Separator | 36986 | 17.5%
Other Punctuation | 9666 | 4.6%
Decimal Number | 3715 | 1.8%
Dash Punctuation | 3044 | 1.4%
Math Symbol | 2157 | 1.0%
Close Punctuation | 1955 | 0.9%
Open Punctuation | 1955 | 0.9%
Uppercase Letter | 618 | 0.3%
Lowercase Letter | 440 | 0.2%
Other values (7) | 145 | 0.1%

Most frequent character per category

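Other Letter
Value | Count | Frequency (%)
(unrecovered) | 4927 | 3.3%
(unrecovered) | 4095 | 2.7%
(unrecovered) | 3217 | 2.1%
(unrecovered) | 3110 | 2.1%
(unrecovered) | 2877 | 1.9%
(unrecovered) | 2810 | 1.9%
(unrecovered) | 2462 | 1.6%
(unrecovered) | 2420 | 1.6%
(unrecovered) | 2152 | 1.4%
(unrecovered) | 2084 | 1.4%
Other values (643) | 120495 | 80.0%

Lowercase Letter
Value | Count | Frequency (%)
o | 80 | 18.2%
e | 51 | 11.6%
a | 34 | 7.7%
n | 31 | 7.0%
i | 30 | 6.8%
m | 27 | 6.1%
r | 26 | 5.9%
l | 24 | 5.5%
t | 20 | 4.5%
s | 18 | 4.1%
Other values (15) | 99 | 22.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 71 | 11.5%
I | 58 | 9.4%
P | 56 | 9.1%
D | 56 | 9.1%
B | 43 | 7.0%
S | 41 | 6.6%
O | 36 | 5.8%
A | 34 | 5.5%
T | 34 | 5.5%
E | 29 | 4.7%
Other values (15) | 160 | 25.9%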
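
Other Punctuation
Value | Count | Frequency (%)
, | 6818 | 70.5%
· | 920 | 9.5%
. | 804 | 8.3%
: | 646 | 6.7%
(unrecovered) | 146 | 1.5%
/ | 104 | 1.1%
(unrecovered) | 44 | 0.5%
& | 43 | 0.4%
; | 40 | 0.4%
# | 35 | 0.4%
Other values (8) | 66 | 0.7%

Decimal Number
Value | Count | Frequency (%)
1 | 892 | 24.0%
2 | 457 | 12.3%
(unrecovered) | 437 | 11.8%
0 | 404 | 10.9%
3 | 269 | 7.2%
4 | 266 | 7.2%
5 | 245 | 6.6%
6 | 199 | 5.4%
8 | 152 | 4.1%
7 | 149 | 4.0%
Other values (5) | 245 | 6.6%

Math Symbol
Value | Count | Frequency (%)
(unrecovered) | 1840 | 85.3%
> | 202 | 9.4%
< | 64 | 3.0%
(unrecovered) | 24 | 1.1%
~ | 12 | 0.6%
= | 5 | 0.2%
(unrecovered) | 4 | 0.2%
+ | 4 | 0.2%
(unrecovered) | 1 | < 0.1%
× | 1 | < 0.1%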
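
Other Number
Value | Count | Frequency (%)
(unrecovered) | 6 | 25.0%
(unrecovered) | 6 | 25.0%
(unrecovered) | 4 | 16.7%
(unrecovered) | 3 | 12.5%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%

Other Symbol
Value | Count | Frequency (%)
(unrecovered) | 90 | 84.1%
(unrecovered) | 7 | 6.5%
(unrecovered) | 4 | 3.7%
(unrecovered) | 3 | 2.8%
(unrecovered) | 2 | 1.9%
(unrecovered) | 1 | 0.9%

Close Punctuation
Value | Count | Frequency (%)
) | 1904 | 97.4%
(unrecovered) | 42 | 2.1%
(unrecovered) | 4 | 0.2%
] | 4 | 0.2%
(unrecovered) | 1 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 1901 | 97.2%
(unrecovered) | 45 | 2.3%
(unrecovered) | 4 | 0.2%
[ | 4 | 0.2%
(unrecovered) | 1 | 0.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 1901 | 62.5%
(unrecovered) | 1143 | 37.5%

Modifier Symbol
Value | Count | Frequency (%)
` | 4 | 66.7%
(unrecovered) | 2 | 33.3%

Letter Number
Value | Count | Frequency (%)
(unrecovered) | 2 | 66.7%
(unrecovered) | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 36986 | 100.0%

Initial Punctuation
Value | Count | Frequency (%)
(unrecovered) | 3 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Final Punctuation
Value | Count | Frequency (%)
(unrecovered) | 1 | 100.0%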

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 150650 | 71.3%
Common | 59618 | 28.2%
Latin | 1061 | 0.5%
Han | 1 | < 0.1%

Most frequent character per script

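Hangul
Value | Count | Frequency (%)
(unrecovered) | 4927 | 3.3%
(unrecovered) | 4095 | 2.7%
(unrecovered) | 3217 | 2.1%
(unrecovered) | 3110 | 2.1%
(unrecovered) | 2877 | 1.9%
(unrecovered) | 2810 | 1.9%
(unrecovered) | 2462 | 1.6%
(unrecovered) | 2420 | 1.6%
(unrecovered) | 2152 | 1.4%
(unrecovered) | 2084 | 1.4%
Other values (643) | 120496 | 80.0%

Common
Value | Count | Frequency (%)
(space) | 36986 | 62.0%
, | 6818 | 11.4%
) | 1904 | 3.2%
- | 1901 | 3.2%
( | 1901 | 3.2%
(unrecovered) | 1840 | 3.1%
(unrecovered) | 1143 | 1.9%
· | 920 | 1.5%
1 | 892 | 1.5%
. | 804 | 1.3%
Other values (65) | 4509 | 7.6%

Latin
Value | Count | Frequency (%)
o | 80 | 7.5%
C | 71 | 6.7%
I | 58 | 5.5%
P | 56 | 5.3%
D | 56 | 5.3%
e | 51 | 4.8%
B | 43 | 4.1%
S | 41 | 3.9%
O | 36 | 3.4%
A | 34 | 3.2%
Other values (42) | 535 | 50.4%

Han
Value | Count | Frequency (%)
(unrecovered) | 1 | 100.0%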

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 150469 | 71.2%
ASCII | 55758 | 26.4%
None | 2920 | 1.4%
Arrows | 1864 | 0.9%
Compat Jamo | 179 | 0.1%
Geometric Shapes | 104 | < 0.1%
Enclosed Alphanum | 24 | < 0.1%
Punctuation | 7 | < 0.1%
Number Forms | 3 | < 0.1%
CJK Compat | 1 | < 0.1%

Most frequent character per block

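ASCII
Value | Count | Frequency (%)
(space) | 36986 | 66.3%
, | 6818 | 12.2%
) | 1904 | 3.4%
- | 1901 | 3.4%
( | 1901 | 3.4%
1 | 892 | 1.6%
. | 804 | 1.4%
: | 646 | 1.2%
2 | 457 | 0.8%
0 | 404 | 0.7%
Other values (74) | 3045 | 5.5%

Hangul
Value | Count | Frequency (%)
(unrecovered) | 4927 | 3.3%
(unrecovered) | 4095 | 2.7%
(unrecovered) | 3217 | 2.1%
(unrecovered) | 3110 | 2.1%
(unrecovered) | 2877 | 1.9%
(unrecovered) | 2810 | 1.9%
(unrecovered) | 2462 | 1.6%
(unrecovered) | 2420 | 1.6%
(unrecovered) | 2152 | 1.4%
(unrecovered) | 2084 | 1.4%
Other values (639) | 120315 | 80.0%

Arrows
Value | Count | Frequency (%)
(unrecovered) | 1840 | 98.7%
(unrecovered) | 24 | 1.3%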
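
None
Value | Count | Frequency (%)
(unrecovered) | 1143 | 39.1%
· | 920 | 31.5%
(unrecovered) | 437 | 15.0%
(unrecovered) | 146 | 5.0%
(unrecovered) | 51 | 1.7%
(unrecovered) | 45 | 1.5%
(unrecovered) | 44 | 1.5%
(unrecovered) | 42 | 1.4%
(unrecovered) | 27 | 0.9%
(unrecovered) | 27 | 0.9%
Other values (13) | 38 | 1.3%

Compat Jamo
Value | Count | Frequency (%)
(unrecovered) | 172 | 96.1%
(unrecovered) | 6 | 3.4%
(unrecovered) | 1 | 0.6%

Geometric Shapes
Value | Count | Frequency (%)
(unrecovered) | 90 | 86.5%
(unrecovered) | 7 | 6.7%
(unrecovered) | 4 | 3.8%
(unrecovered) | 3 | 2.9%

Enclosed Alphanum
Value | Count | Frequency (%)
(unrecovered) | 6 | 25.0%
(unrecovered) | 6 | 25.0%
(unrecovered) | 4 | 16.7%
(unrecovered) | 3 | 12.5%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%
(unrecovered) | 1 | 4.2%

Punctuation
Value | Count | Frequency (%)
(unrecovered) | 3 | 42.9%
(unrecovered) | 3 | 42.9%
(unrecovered) | 1 | 14.3%

Number Forms
Value | Count | Frequency (%)
(unrecovered) | 2 | 66.7%
(unrecovered) | 1 | 33.3%

CJK Compat
Value | Count | Frequency (%)
(unrecovered) | 1 | 100.0%

CJK
Value | Count | Frequency (%)
(unrecovered) | 1 | 100.0%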

Interactions


Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
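
Both views reduce to asking, cell by cell, whether a value is present. A minimal text-only sketch of that idea, using a handful of rows shaped like the sample tables in this report (None stands in for `<NA>`); it is an illustration, not how the report's figures were drawn:

```python
# A few rows in the shape of the sample tables; None marks a missing cell.
columns = ["통계조사ID", "조사항목ID", "조사내용설명"]
rows = [
    ("2006084", 1212010, "전국"),
    ("2006084", 1212011, "사업체"),
    ("B21020200327123408", 1212017, None),
    ("B21020200327123408", 1212018, None),
]

# Nullity by column: number of non-missing cells in each column.
present = {
    name: sum(row[i] is not None for row in rows)
    for i, name in enumerate(columns)
}

# Nullity matrix: one line per row, "#" for a value and "." for a gap.
matrix = [
    "".join("#" if cell is not None else "." for cell in row)
    for row in rows
]

print(present)
print("\n".join(matrix))
```

In the matrix form, runs of gaps that line up across consecutive rows stand out at a glance, which is the pattern the last rows of this dataset show for 조사내용설명.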

Sample

First rows
Row | 통계조사ID | 조사항목ID | 조사내용설명
0 | 2006083 | 1212015 | 전국 토양에 대한 오염추세를 파악하고 오염 우려지역에 대한 오염실태를 조사하여 토양오염을 예방하고 오염토양을 정화하는 등 토양보전대책을 수립·추진하기 위함
1 | 2006083 | 1212016 | ○ 토양측정망(지목별 토양오염도) - 중금속(8) : 카드뮴(Cd), 구리(Cu), 비소(Ag), 수은(Hg), 납(Pb), 6가크롬(Cr6+), 아연(Zn), 니켈(Ni) - 일반항목(8) : PCB, CN, 유기인, 페놀, 유류(BTEX, TPH), 불소, TCE, PCE - 토양산도(pH) ○ 토양오염실태(오염우려지역별 토양오염도) - 토양오염의 가능성이 높은 토양오염물질 및 토양pH
2 | 2006084 | 1212010 | 전국
3 | 2006084 | 1212011 | 사업체
4 | 2006084 | 1212012 | 기타-기타(현황자료)
5 | 2006084 | 1212013 | 1년
6 | 2006084 | 1212014 | 지방청(국립환경과학원) → 환경부(토양지하수과)
7 | 2006084 | 1212015 | 설치 년도가 오래되어 토양 오염이 우려되는 주유소 등 유류 저장시설을 대상으로 관리실태 및 토양오염도 검사를 실시하여 오염 토양 복원조치 등 토양오염 방지 대책을 추진하고자 함
8 | 2006084 | 1212016 | - 주유소 관리실태 조사 · 토양오염도 검사(정기, 수시) 실시여부 · 토양오염방지시설 적정설치 여부 및 행정처분사항 이행여부 등 - 토양오염도 검사 · 시설부지에 대한 토양오염도조사(시료채취 및 토양오염도 분석(BTEX, TPH)
9 | 2006085 | 1212010 | 전국

Last rows
Row | 통계조사ID | 조사항목ID | 조사내용설명
8825 | B21020180713171220 | 1212015 | <NA>
8826 | B21020200327123408 | 1212010 | <NA>
8827 | B21020200327123408 | 1212011 | <NA>
8828 | B21020200327123408 | 1212012 | <NA>
8829 | B21020200327123408 | 1212013 | <NA>
8830 | B21020200327123408 | 1212014 | <NA>
8831 | B21020200327123408 | 1212015 | <NA>
8832 | B21020200327123408 | 1212016 | <NA>
8833 | B21020200327123408 | 1212017 | <NA>
8834 | B21020200327123408 | 1212018 | <NA>