Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 76
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
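
The two derived figures above can be re-checked from the raw counts. The following is a minimal sketch in Python, assuming (this is an assumption about the tool, not something the report states) that the missing-cell percentage is taken over all cells and that the average record size is the total size divided by the number of observations:

```python
# Figures copied from the Dataset statistics block of this report.
n_variables = 7
n_observations = 10_000
missing_cells = 76
total_size_kib = 644.5

# Assumption: percentage of missing cells is relative to variables x observations.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size is total bytes divided by row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Under those assumptions both values round to the figures shown in the report.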

Variable types

Categorical: 3
Text: 3
Numeric: 1

Dataset

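Description: 검사년도 (inspection year), 검체번호 (sample number), 고유번호 (unique ID), 구명 (district name), 조사결과값 (test result value), 조사항목명 (test item name), 허가신고번호 (permit/registration number)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-22145/S/1/datasetView.do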

Alerts

High correlation: 고유번호 is highly overall correlated with 구명
High correlation: 구명 is highly overall correlated with 고유번호

Reproduction

Analysis started: 2024-05-11 04:45:21.919953
Analysis finished: 2024-05-11 04:45:23.907552
Duration: 1.99 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
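
A report of this kind is typically generated by calling ydata-profiling from Python. The sketch below is an assumption about how that might look, not the command actually used for this report: the function name `build_profile`, the file names, and the title are hypothetical, and the source gives only the dataset URL. The first lines also recompute the reported duration from the two timestamps above.

```python
from datetime import datetime
from pathlib import Path

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2024-05-11 04:45:21.919953")
finished = datetime.fromisoformat("2024-05-11 04:45:23.907552")
duration_seconds = round((finished - started).total_seconds(), 2)
print(f"Duration: {duration_seconds} seconds")


def build_profile(csv_path: str, html_path: str) -> Path:
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Profile report")  # title is illustrative
    report.to_file(html_path)
    return Path(html_path)


# Example call with hypothetical file names:
# build_profile("dataset.csv", "report.html")
```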

Variables

검사년도
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
2008   3284
2011   2673
2009   1936
2010   1453
2012   654

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2010
2nd row: 2011
3rd row: 2008
4th row: 2009
5th row: 2008

Common Values

Value  Count  Frequency (%)
2008   3284   32.8%
2011   2673   26.7%
2009   1936   19.4%
2010   1453   14.5%
2012   654    6.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

검체번호
Text

Distinct: 2535
Distinct (%): 25.4%
Missing: 4
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 14
Mean length: 14.011004
Min length: 14

Characters and Unicode

Total characters: 140054
Distinct characters: 17
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
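
As a rough illustration of these character properties, the following Python sketch uses the standard-library `unicodedata` module to tally general categories for the five 검체번호 sample values reported in this section, and re-derives the mean length from the reported totals. It covers categories only, since scripts and blocks are not exposed by `unicodedata`. Dividing total characters by the non-missing row count is an assumption about how the tool computes the mean.

```python
import unicodedata
from collections import Counter

# The five sample values of this variable, copied from the report.
samples = [
    "2W10-06670-005",
    "2W11-07033-005",
    "2W08-21171-002",
    "2W09-10650-003",
    "2W08-18635-001",
]

# unicodedata.category returns two-letter codes:
# Nd = Decimal Number, Lu = Uppercase Letter, Pd = Dash Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
print(dict(category_counts))

# Assumption: mean length = total characters / non-missing observations.
total_characters = 140054
non_missing = 10_000 - 4
mean_length = total_characters / non_missing
print(f"Mean length: {mean_length:.6f}")
```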

Unique

Unique: 180
Unique (%): 1.8%

Sample

1st row: 2W10-06670-005
2nd row: 2W11-07033-005
3rd row: 2W08-21171-002
4th row: 2W09-10650-003
5th row: 2W08-18635-001

Value                Count  Frequency (%)
2w11-06223-002       11     0.1%
2w11-04999-001       11     0.1%
2w08-20634-010       10     0.1%
2w08-19706-003       10     0.1%
2w08-15145-003       10     0.1%
2w08-10144-004       10     0.1%
2w11-14611-003       10     0.1%
2w11-05727-001       10     0.1%
2w08-15612-001       10     0.1%
2w11-16257-017       10     0.1%
Other values (2525)  9894   99.0%

Most occurring characters

Value             Count  Frequency (%)
0                 34789  24.8%
-                 19992  14.3%
1                 18487  13.2%
2                 17666  12.6%
W                 9996   7.1%
8                 6891   4.9%
9                 5863   4.2%
4                 5768   4.1%
3                 5709   4.1%
5                 5460   3.9%
Other values (7)  9433   6.7%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     109956  78.5%
Dash Punctuation   19992   14.3%
Uppercase Letter   10062   7.2%
Open Punctuation   22      < 0.1%
Close Punctuation  22      < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      34789  31.6%
1      18487  16.8%
2      17666  16.1%
8      6891   6.3%
9      5863   5.3%
4      5768   5.2%
3      5709   5.2%
5      5460   5.0%
6      5072   4.6%
7      4251   3.9%

Uppercase Letter
Value  Count  Frequency (%)
W      9996   99.3%
C      22     0.2%
L      22     0.2%
S      22     0.2%

Dash Punctuation
Value  Count  Frequency (%)
-      19992  100.0%

Open Punctuation
Value  Count  Frequency (%)
(      22     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      22     100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  129992  92.8%
Latin   10062   7.2%

Most frequent character per script

Common
Value             Count  Frequency (%)
0                 34789  26.8%
-                 19992  15.4%
1                 18487  14.2%
2                 17666  13.6%
8                 6891   5.3%
9                 5863   4.5%
4                 5768   4.4%
3                 5709   4.4%
5                 5460   4.2%
6                 5072   3.9%
Other values (3)  4295   3.3%

Latin
Value  Count  Frequency (%)
W      9996   99.3%
C      22     0.2%
L      22     0.2%
S      22     0.2%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  140054  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 34789  24.8%
-                 19992  14.3%
1                 18487  13.2%
2                 17666  12.6%
W                 9996   7.1%
8                 6891   4.9%
9                 5863   4.2%
4                 5768   4.1%
3                 5709   4.1%
5                 5460   3.9%
Other values (7)  9433   6.7%

고유번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 2543
Distinct (%): 25.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1302.4044
Minimum: 1
Maximum: 2591
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 131
Q1: 655.75
Median: 1308.5
Q3: 1942
95th percentile: 2465
Maximum: 2591
Range: 2590
Interquartile range (IQR): 1286.25

Descriptive statistics

Standard deviation: 748.53109
Coefficient of variation (CV): 0.57473016
Kurtosis: -1.1896201
Mean: 1302.4044
Median Absolute Deviation (MAD): 641.5
Skewness: -0.012355124
Sum: 13024044
Variance: 560298.79
Monotonicity: Not monotonic
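
Several of the quantile and descriptive statistics for 고유번호 are simple functions of one another. The sketch below re-derives them in Python from the reported values, using the standard textbook definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × n). That the tool uses exactly these definitions is an assumption.

```python
# Values copied from the statistics reported for this variable.
minimum, maximum = 1, 2591
q1, q3 = 655.75, 1942
mean, std = 1302.4044, 748.53109
n = 10_000  # number of observations, none missing

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.8f}")
print(f"Variance: {variance:.2f}")
print(f"Sum: {total:.0f}")
```

The results agree with the reported figures up to the rounding of the inputs.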
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
1058                 11     0.1%
783                  11     0.1%
1569                 10     0.1%
832                  10     0.1%
11                   10     0.1%
1339                 10     0.1%
763                  10     0.1%
918                  10     0.1%
786                  10     0.1%
2335                 9      0.1%
Other values (2533)  9899   99.0%

Minimum 10 values

Value  Count  Frequency (%)
1      2      < 0.1%
2      2      < 0.1%
3      5      0.1%
4      6      0.1%
5      4      < 0.1%
6      3      < 0.1%
7      5      0.1%
8      9      0.1%
9      6      0.1%
10     3      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
2591   2      < 0.1%
2590   4      < 0.1%
2589   4      < 0.1%
2588   3      < 0.1%
2587   3      < 0.1%
2586   9      0.1%
2585   3      < 0.1%
2584   2      < 0.1%
2583   3      < 0.1%
2582   4      < 0.1%

구명
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
서초구              923
강동구              822
관악구              674
노원구              624
동작구              601
Other values (20)  6356

Length

Max length: 4
Median length: 3
Mean length: 3.0755
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강동구
2nd row: 용산구
3rd row: 마포구
4th row: 은평구
5th row: 서초구

Common Values

Value              Count  Frequency (%)
서초구              923    9.2%
강동구              822    8.2%
관악구              674    6.7%
노원구              624    6.2%
동작구              601    6.0%
송파구              551    5.5%
은평구              539    5.4%
강남구              473    4.7%
구로구              443    4.4%
금천구              437    4.4%
Other values (15)  3913   39.1%

Length

[Figure: Histogram of lengths of the category]

조사결과값
Text

Distinct: 404
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 1
Mean length: 1.9949
Min length: 1

Characters and Unicode

Total characters: 19949
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 229
Unique (%): 2.3%

Sample

1st row: 불검출
2nd row: 불검출
3rd row: -
4th row: -
5th row: -

Value               Count  Frequency (%)
(blank)             4885   48.9%
불검출               3799   38.0%
0                   174    1.7%
1.8미만              47     0.5%
없음                 41     0.4%
무취                 38     0.4%
무미                 33     0.3%
6.7                 28     0.3%
2미만                21     0.2%
7                   21     0.2%
Other values (394)  913    9.1%

Most occurring characters

Value              Count  Frequency (%)
-                  4885   24.5%
불                  3799   19.0%
검                  3799   19.0%
출                  3799   19.0%
.                  650    3.3%
0                  636    3.2%
1                  403    2.0%
7                  257    1.3%
6                  254    1.3%
2                  216    1.1%
Other values (13)  1251   6.3%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       11773  59.0%
Dash Punctuation   4885   24.5%
Decimal Number     2641   13.2%
Other Punctuation  650    3.3%
3.3%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
불      3799   32.3%
검      3799   32.3%
출      3799   32.3%
미      101    0.9%
무      71     0.6%
만      68     0.6%
없      41     0.3%
음      41     0.3%
취      38     0.3%
[?]    8      0.1%

Decimal Number
Value  Count  Frequency (%)
0      636    24.1%
1      403    15.3%
7      257    9.7%
6      254    9.6%
2      216    8.2%
3      197    7.5%
4      191    7.2%
8      183    6.9%
5      182    6.9%
9      122    4.6%

Dash Punctuation
Value  Count  Frequency (%)
-      4885   100.0%

Other Punctuation
Value  Count  Frequency (%)
.      650    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  11773  59.0%
Common  8176   41.0%

Most frequent character per script

Common
Value             Count  Frequency (%)
-                 4885   59.7%
.                 650    8.0%
0                 636    7.8%
1                 403    4.9%
7                 257    3.1%
6                 254    3.1%
2                 216    2.6%
3                 197    2.4%
4                 191    2.3%
8                 183    2.2%
Other values (2)  304    3.7%

Hangul
Value  Count  Frequency (%)
불      3799   32.3%
검      3799   32.3%
출      3799   32.3%
미      101    0.9%
무      71     0.6%
만      68     0.6%
없      41     0.3%
음      41     0.3%
취      38     0.3%
[?]    8      0.1%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  11773  59.0%
ASCII   8176   41.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
-                 4885   59.7%
.                 650    8.0%
0                 636    7.8%
1                 403    4.9%
7                 257    3.1%
6                 254    3.1%
2                 216    2.6%
3                 197    2.4%
4                 191    2.3%
8                 183    2.2%
Other values (2)  304    3.7%

Hangul
Value  Count  Frequency (%)
불      3799   32.3%
검      3799   32.3%
출      3799   32.3%
미      101    0.9%
무      71     0.6%
만      68     0.6%
없      41     0.3%
음      41     0.3%
취      38     0.3%
[?]    8      0.1%

조사항목명
Categorical

Distinct: 43
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value                       Count
1,2-디브로모-3-클로로프로판   300
대장균군수                   297
카드뮴                       283
1,1-디클로로에틸렌            279
분원성대장균군                277
Other values (38)           8564

Length

Max length: 17
Median length: 11
Mean length: 4.6873
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1,2-디브로모-3-클로로프로판
2nd row: 비소
3rd row: 크실렌
4th row: 시안
5th row: 디클로로메탄

Common Values

Value                       Count  Frequency (%)
1,2-디브로모-3-클로로프로판   300    3.0%
대장균군수                   297    3.0%
카드뮴                       283    2.8%
1,1-디클로로에틸렌            279    2.8%
분원성대장균군                277    2.8%
벤젠                         277    2.8%
페놀                         276    2.8%
수은                         276    2.8%
비소                         275    2.8%
과망간산칼륨소비량            272    2.7%
Other values (33)           7188   71.9%

Length

[Figure: Histogram of lengths of the category]

Value                       Count  Frequency (%)
1,2-디브로모-3-클로로프로판   300    3.0%
대장균군수                   297    3.0%
카드뮴                       283    2.8%
1,1-디클로로에틸렌            279    2.8%
분원성대장균군                277    2.8%
벤젠                         277    2.8%
페놀                         276    2.8%
수은                         276    2.8%
비소                         275    2.8%
시안                         272    2.7%
Other values (33)           7188   71.9%

허가신고번호
Text

Distinct: 990
Distinct (%): 10.0%
Missing: 72
Missing (%): 0.7%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 10
Mean length: 9.6175463
Min length: 1

Characters and Unicode

Total characters: 95483
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): 0.2%

Sample

1st row: 1190100007
2nd row: 2200600022
3rd row: 142120009
4th row: 120520003
5th row: 1190100004

Value               Count  Frequency (%)
폐공                 111    1.1%
10320098            68     0.7%
2200600013          60     0.6%
2200600014          58     0.6%
2200600001          54     0.5%
1199700003          46     0.5%
2197700003          45     0.5%
2200100005          43     0.4%
2200800043          42     0.4%
2200900004          40     0.4%
Other values (980)  9361   94.3%

Most occurring characters

Value              Count  Frequency (%)
0                  37179  38.9%
1                  17855  18.7%
2                  14451  15.1%
9                  7371   7.7%
3                  3760   3.9%
4                  2937   3.1%
8                  2921   3.1%
6                  2688   2.8%
7                  2516   2.6%
5                  2468   2.6%
Other values (25)  1337   1.4%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    94146  98.6%
Dash Punctuation  1009   1.1%
Other Letter      305    0.3%
Lowercase Letter  23     < 0.1%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
폐                  119    39.0%
공                  119    39.0%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                4      1.3%
[?]                4      1.3%
[?]                4      1.3%
Other values (12)  30     9.8%

Decimal Number
Value  Count  Frequency (%)
0      37179  39.5%
1      17855  19.0%
2      14451  15.3%
9      7371   7.8%
3      3760   4.0%
4      2937   3.1%
8      2921   3.1%
6      2688   2.9%
7      2516   2.7%
5      2468   2.6%

Lowercase Letter
Value  Count  Frequency (%)
a      12     52.2%
b      11     47.8%

Dash Punctuation
Value  Count  Frequency (%)
-      1009   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  95155  99.7%
Hangul  305    0.3%
Latin   23     < 0.1%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
폐                  119    39.0%
공                  119    39.0%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                4      1.3%
[?]                4      1.3%
[?]                4      1.3%
Other values (12)  30     9.8%

Common
Value  Count  Frequency (%)
0      37179  39.1%
1      17855  18.8%
2      14451  15.2%
9      7371   7.7%
3      3760   4.0%
4      2937   3.1%
8      2921   3.1%
6      2688   2.8%
7      2516   2.6%
5      2468   2.6%

Latin
Value  Count  Frequency (%)
a      12     52.2%
b      11     47.8%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   95178  99.7%
Hangul  305    0.3%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 37179  39.1%
1                 17855  18.8%
2                 14451  15.2%
9                 7371   7.7%
3                 3760   4.0%
4                 2937   3.1%
8                 2921   3.1%
6                 2688   2.8%
7                 2516   2.6%
5                 2468   2.6%
Other values (3)  1032   1.1%

Hangul
Value              Count  Frequency (%)
폐                  119    39.0%
공                  119    39.0%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                5      1.6%
[?]                4      1.3%
[?]                4      1.3%
[?]                4      1.3%
Other values (12)  30     9.8%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

            검사년도  고유번호  구명    조사항목명
검사년도     1.000    0.483    0.609   0.053
고유번호     0.483    1.000    0.989   0.261
구명         0.609    0.989    1.000   0.228
조사항목명   0.053    0.261    0.228   1.000

[Figure: correlation heatmap]

            조사항목명  검사년도  구명
조사항목명   1.000      0.024    0.054
검사년도     0.024      1.000    0.311
구명         0.054      0.311    1.000

[Figure: correlation heatmap]

            고유번호  검사년도  구명    조사항목명
고유번호     1.000    0.221    0.898   0.093
검사년도     0.221    1.000    0.311   0.024
구명         0.898    0.311    1.000   0.054
조사항목명   0.093    0.024    0.054   1.000
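
The report does not state which coefficient each matrix uses. One measure commonly applied to pairs of categorical variables is Cramér's V, which is 0 for independent variables and 1 for perfectly associated ones. The sketch below is an uncorrected, pure-Python version given for illustration only. It is not a reconstruction of the report's method, and the function name `cramers_v` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    x_totals = Counter(xs)         # row totals
    y_totals = Counter(ys)         # column totals

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Toy data (hypothetical): each group maps to exactly one label.
perfect = cramers_v(["a", "b", "a", "b"], ["x", "y", "x", "y"])
# Toy data (hypothetical): every combination occurs equally often.
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```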

Missing values

[Figure: nullity count plot] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: nullity correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
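
Nullity correlation can be sketched as the Pearson correlation between two presence indicators (1 where a value is present, 0 where it is missing). The Python below is a minimal illustration under that assumption. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from this dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation of the presence indicators of two columns."""
    a = [0.0 if value is None else 1.0 for value in col_a]
    b = [0.0 if value is None else 1.0 for value in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a fully present or fully missing column
    return cov / math.sqrt(var_a * var_b)


# Toy columns (hypothetical): None marks a missing value.
col_a = [1, None, 3, None]
col_b = ["x", None, "y", None]   # missing in the same rows as col_a
col_c = [None, "p", None, "q"]   # missing exactly where col_a is present

same_pattern = nullity_correlation(col_a, col_b)
opposite_pattern = nullity_correlation(col_a, col_c)
print(same_pattern, opposite_pattern)
```

A value near 1 means the two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.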

Sample

(index)  검사년도  검체번호         고유번호  구명      조사결과값  조사항목명                  허가신고번호
15442    2010     2W10-06670-005  286      강동구    불검출      1,2-디브로모-3-클로로프로판  1190100007
37789    2011     2W11-07033-005  2259     용산구    불검출      비소                        2200600022
29331    2008     2W08-21171-002  1460     마포구    -          크실렌                      142120009
48127    2009     2W09-10650-003  2330     은평구    -          시안                        120520003
71993    2008     2W08-18635-001  1602     서초구    -          디클로로메탄                 1190100004
87345    2011     2W11-09029-002  2207     영등포구  -          분원성대장균군               2200600001
18481    2010     2W10-02562-001  266      강동구    0          대장균군수                   2200100097
1874     2009     2W09-03445-006  188      강동구    불검출      테트라클로로에틸렌            2200600037
67407    2011     2W11-04093-001  1857     성동구    -          망간                        2200600009
81867    2010     2W10-03413-005  1131     도봉구    불검출      세제                        2199400018

(index)  검사년도  검체번호         고유번호  구명      조사결과값  조사항목명                  허가신고번호
71778    2009     2W09-04089-009  1886     성북구    -          [?]                         2199700033
83814    2009     2W09-04705-003  1659     서초구    -          사염화탄소                   1190100238
79149    2011     2W11-14687-004  924      금천구    -          냄새                        18-07-10729
55927    2009     2W09-03506-007  217      강동구    -          과망간산칼륨소비량            1190100053
69698    2008     2W08-12662-008  414      강서구    불검출      [?]                         1199500025
35749    2008     2W08-15198-001  14       강남구    8.1        수소이온농도                 23140143
77755    2012     2W12-04372-009  2591     중랑구    -          냄새                        2200800011
16256    2011     2W11-05430-006  2363     은평구    불검출      페놀                        121520011
24647    2010     2W10-04459-006  2354     은평구    불검출      6가크롬                     121720010
75939    2010     2W10-12328-006  2454     종로구    -          냄새                        10322251