Overview

Dataset statistics

Number of variables: 5
Number of observations: 918
Missing cells: 322
Missing cells (%): 7.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 36.9 KiB
Average record size in memory: 41.1 B
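
The missing-cell percentage can be checked by hand. The following is a minimal arithmetic sketch in Python, assuming the per-column missing counts reported later in this report (317 for 연락처 and 5 for the address column, taken here to be 사업장 소재지). The variable names are illustrative only.

```python
# Figures copied from this report; the dictionary keys are assumed column names.
n_rows, n_cols = 918, 5
missing_per_column = {"연락처": 317, "사업장 소재지": 5}  # other columns report 0 missing

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)  # share of all cells

print(missing_cells, f"{missing_pct:.1f}%")
```

The denominator is the total number of cells (rows times columns), which is why 34.5% missing in one column becomes roughly 7% overall.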

Variable types

Numeric: 1
Text: 3
Categorical: 1

Dataset

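Description: Information on waste treatment companies in Hwaseong-si, Gyeonggi-do. The data consists of business name (상호), business site location (사업장 소재지), contact number (연락처), and classification (분류). Companies are classified as general recycling companies, business-site waste collection and transport companies, construction-waste transport companies, construction-waste intermediate treatment companies, specialized incineration companies, and final disposal companies.
Author: Hwaseong-si, Gyeonggi-do
URL: https://www.data.go.kr/data/15098440/fileData.do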

Alerts

연번 is highly overall correlated with 분류 (High correlation)
분류 is highly overall correlated with 연번 (High correlation)
분류 is highly imbalanced (52.5%) (Imbalance)
연락처 has 317 (34.5%) missing values (Missing)
연번 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 00:25:21.830831
Analysis finished: 2023-12-12 00:25:22.673561
Duration: 0.84 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 918
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 459.5
Minimum: 1
Maximum: 918
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 46.85
Q1: 230.25
Median: 459.5
Q3: 688.75
95-th percentile: 872.15
Maximum: 918
Range: 917
Interquartile range (IQR): 458.5

Descriptive statistics

Standard deviation: 265.14807
Coefficient of variation (CV): 0.57703606
Kurtosis: -1.2
Mean: 459.5
Median Absolute Deviation (MAD): 229.5
Skewness: 0
Sum: 421821
Variance: 70303.5
Monotonicity: Strictly increasing
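
Because 연번 is strictly increasing with 918 distinct values between 1 and 918, it is simply the sequence 1, 2, ..., 918, and the statistics above can be reproduced from that sequence alone. The sketch below uses Python's standard `statistics` module. The `percentile` helper is a hypothetical illustration that uses linear interpolation, an assumption that happens to match the quantiles reported here.

```python
import statistics

values = list(range(1, 919))  # 연번: 1, 2, ..., 918

def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 1]) of pre-sorted data."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = statistics.stdev(values)
mad = statistics.median([abs(v - median) for v in values])
q1, q3 = percentile(values, 0.25), percentile(values, 0.75)

print(mean, median, variance, round(std, 5), mad, q1, q3, q3 - q1)
```

The column is effectively a row identifier, so these figures describe the numbering scheme rather than any property of the companies.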
Histogram with fixed size bins (bins=50)

Common values
Value    Count    Frequency (%)
1    1    0.1%
605    1    0.1%
607    1    0.1%
608    1    0.1%
609    1    0.1%
610    1    0.1%
611    1    0.1%
612    1    0.1%
613    1    0.1%
614    1    0.1%
Other values (908)    908    98.9%

Minimum 10 values
Value    Count    Frequency (%)
1    1    0.1%
2    1    0.1%
3    1    0.1%
4    1    0.1%
5    1    0.1%
6    1    0.1%
7    1    0.1%
8    1    0.1%
9    1    0.1%
10    1    0.1%

Maximum 10 values
Value    Count    Frequency (%)
918    1    0.1%
917    1    0.1%
916    1    0.1%
915    1    0.1%
914    1    0.1%
913    1    0.1%
912    1    0.1%
911    1    0.1%
910    1    0.1%
909    1    0.1%

상호
Text

Distinct: 821
Distinct (%): 89.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 KiB

Length

Max length: 24
Median length: 22
Mean length: 5.7440087
Min length: 2

Characters and Unicode

Total characters: 5273
Distinct characters: 373
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
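
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module for two of the sample values shown in this report. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count characters by two-letter Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Codes map to the names used in this report, for example:
# Lo = Other Letter, So = Other Symbol, Lu = Uppercase Letter,
# Zs = Space Separator, Po = Other Punctuation.
for sample in ["부광자원㈜", "RAMANAR BRICK KOREA.CO"]:
    print(sample, dict(category_counts(sample)))
```

Hangul syllables fall under Other Letter (Lo) and the ㈜ sign under Other Symbol (So), which is consistent with those two categories dominating the 상호 column.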

Unique

Unique: 741
Unique (%): 80.7%
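
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. The sketch below shows the difference on a small hypothetical list (the names are borrowed from this report, but the list itself is made up for illustration) and then checks the two percentages reported for 상호.

```python
from collections import Counter

# Hypothetical mini-column, for illustration only.
names = ["미래환경", "미래환경", "현민자원", "현민자원", "리라금속", "부광자원㈜"]

counts = Counter(names)
distinct = len(counts)                              # different values present
unique = sum(1 for n in counts.values() if n == 1)  # values seen exactly once

print(distinct, unique)

# Percentages reported for 상호, relative to 918 observations.
distinct_pct = round(100 * 821 / 918, 1)
unique_pct = round(100 * 741 / 918, 1)
print(distinct_pct, unique_pct)
```

Under this reading, the gap between 821 and 741 corresponds to 80 business names that appear more than once.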

Sample

1st row: 부광자원㈜
2nd row: 리라금속
3rd row: ㈜우원유업
4th row: RAMANAR BRICK KOREA.CO
5th row: ㈜제삼섹터

Value    Count    Frequency (%)
주식회사    12    1.3%
삼흥산업개발㈜    5    0.5%
㈜오성개발    5    0.5%
화성지점    5    0.5%
㈜진흥중공업    4    0.4%
미래환경    4    0.4%
㈜하나알씨    3    0.3%
㈜미래환경    3    0.3%
꿈에그린㈜    3    0.3%
현민자원    3    0.3%
Other values (826)    907    95.1%

Most occurring characters

Value    Count    Frequency (%)
㈜    456    8.6%
[?]    164    3.1%
[?]    159    3.0%
[?]    152    2.9%
[?]    148    2.8%
[?]    137    2.6%
[?]    129    2.4%
[?]    114    2.2%
[?]    111    2.1%
[?]    107    2.0%
Other values (363)    3596    68.2%

Most occurring categories

Value    Count    Frequency (%)
Other Letter    4537    86.0%
Other Symbol    456    8.6%
Close Punctuation    67    1.3%
Open Punctuation    67    1.3%
Uppercase Letter    63    1.2%
Space Separator    43    0.8%
Lowercase Letter    21    0.4%
Decimal Number    10    0.2%
Other Punctuation    6    0.1%
Dash Punctuation    3    0.1%

Most frequent character per category

Other Letter
Value    Count    Frequency (%)
[?]    164    3.6%
[?]    159    3.5%
[?]    152    3.4%
[?]    148    3.3%
[?]    137    3.0%
[?]    129    2.8%
[?]    114    2.5%
[?]    111    2.4%
[?]    107    2.4%
[?]    94    2.1%
Other values (321)    3222    71.0%

Uppercase Letter
Value    Count    Frequency (%)
S    6    9.5%
C    6    9.5%
K    6    9.5%
E    6    9.5%
A    5    7.9%
R    5    7.9%
P    5    7.9%
M    4    6.3%
H    3    4.8%
J    3    4.8%
Other values (7)    14    22.2%

Lowercase Letter
Value    Count    Frequency (%)
n    3    14.3%
l    3    14.3%
e    2    9.5%
a    2    9.5%
t    2    9.5%
i    2    9.5%
o    1    4.8%
r    1    4.8%
s    1    4.8%
p    1    4.8%
Other values (3)    3    14.3%

Decimal Number
Value    Count    Frequency (%)
1    4    40.0%
4    3    30.0%
2    2    20.0%
9    1    10.0%

Other Punctuation
Value    Count    Frequency (%)
&    3    50.0%
.    2    33.3%
,    1    16.7%

Other Symbol
Value    Count    Frequency (%)
㈜    456    100.0%

Close Punctuation
Value    Count    Frequency (%)
)    67    100.0%

Open Punctuation
Value    Count    Frequency (%)
(    67    100.0%

Space Separator
Value    Count    Frequency (%)
(space)    43    100.0%

Dash Punctuation
Value    Count    Frequency (%)
-    3    100.0%

Most occurring scripts

Value    Count    Frequency (%)
Hangul    4993    94.7%
Common    196    3.7%
Latin    84    1.6%

Most frequent character per script

Hangul
Value    Count    Frequency (%)
㈜    456    9.1%
[?]    164    3.3%
[?]    159    3.2%
[?]    152    3.0%
[?]    148    3.0%
[?]    137    2.7%
[?]    129    2.6%
[?]    114    2.3%
[?]    111    2.2%
[?]    107    2.1%
Other values (322)    3316    66.4%

Latin
Value    Count    Frequency (%)
S    6    7.1%
C    6    7.1%
K    6    7.1%
E    6    7.1%
A    5    6.0%
R    5    6.0%
P    5    6.0%
M    4    4.8%
H    3    3.6%
n    3    3.6%
Other values (20)    35    41.7%

Common
Value    Count    Frequency (%)
)    67    34.2%
(    67    34.2%
(space)    43    21.9%
1    4    2.0%
&    3    1.5%
-    3    1.5%
4    3    1.5%
2    2    1.0%
.    2    1.0%
9    1    0.5%

Most occurring blocks

Value    Count    Frequency (%)
Hangul    4537    86.0%
None    456    8.6%
ASCII    280    5.3%

Most frequent character per block

None
Value    Count    Frequency (%)
㈜    456    100.0%

Hangul
Value    Count    Frequency (%)
[?]    164    3.6%
[?]    159    3.5%
[?]    152    3.4%
[?]    148    3.3%
[?]    137    3.0%
[?]    129    2.8%
[?]    114    2.5%
[?]    111    2.4%
[?]    107    2.4%
[?]    94    2.1%
Other values (321)    3222    71.0%

ASCII
Value    Count    Frequency (%)
)    67    23.9%
(    67    23.9%
(space)    43    15.4%
S    6    2.1%
C    6    2.1%
K    6    2.1%
E    6    2.1%
A    5    1.8%
R    5    1.8%
P    5    1.8%
Other values (31)    64    22.9%

사업장 소재지
Text

Distinct: 878
Distinct (%): 96.2%
Missing: 5
Missing (%): 0.5%
Memory size: 7.3 KiB

Length

Max length: 59
Median length: 42
Mean length: 23.187295
Min length: 9

Characters and Unicode

Total characters: 21170
Distinct characters: 257
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 845
Unique (%): 92.6%

Sample

1st row: 경기도 화성시 향남읍 서봉로 715-7
2nd row: 경기도 화성시 팔탄면 시청로 788
3rd row: 경기도 화성시 서신면 전곡리 630-83
4th row: 경기도 화성시 비봉면 하저자안로 186-7
5th row: 경기도 화성시 향남읍 솔태상두길 281-14

Value    Count    Frequency (%)
화성시    681    16.1%
경기도    587    13.9%
장안면    122    2.9%
팔탄면    120    2.8%
양감면    99    2.3%
향남읍    92    2.2%
정남면    90    2.1%
남양읍    62    1.5%
우정읍    59    1.4%
마도면    54    1.3%
Other values (1168)    2252    53.4%

Most occurring characters

Value    Count    Frequency (%)
(space)    4906    23.2%
1    917    4.3%
[?]    716    3.4%
[?]    715    3.4%
[?]    704    3.3%
[?]    668    3.2%
[?]    627    3.0%
[?]    618    2.9%
2    607    2.9%
[?]    597    2.8%
Other values (247)    10095    47.7%

Most occurring categories

Value    Count    Frequency (%)
Other Letter    10876    51.4%
Space Separator    4906    23.2%
Decimal Number    4405    20.8%
Dash Punctuation    533    2.5%
Other Punctuation    172    0.8%
Close Punctuation    121    0.6%
Open Punctuation    121    0.6%
Uppercase Letter    34    0.2%
Math Symbol    2    < 0.1%

Most frequent character per category

Other Letter
Value    Count    Frequency (%)
[?]    716    6.6%
[?]    715    6.6%
[?]    704    6.5%
[?]    668    6.1%
[?]    627    5.8%
[?]    618    5.7%
[?]    597    5.5%
[?]    581    5.3%
[?]    541    5.0%
[?]    307    2.8%
Other values (221)    4802    44.2%

Decimal Number
Value    Count    Frequency (%)
1    917    20.8%
2    607    13.8%
3    462    10.5%
5    427    9.7%
4    416    9.4%
0    386    8.8%
6    343    7.8%
7    316    7.2%
8    271    6.2%
9    260    5.9%

Other Punctuation
Value    Count    Frequency (%)
,    149    86.6%
.    16    9.3%
'    4    2.3%
:    3    1.7%

Uppercase Letter
Value    Count    Frequency (%)
B    20    58.8%
A    8    23.5%
C    5    14.7%
G    1    2.9%

Close Punctuation
Value    Count    Frequency (%)
)    115    95.0%
]    6    5.0%

Open Punctuation
Value    Count    Frequency (%)
(    115    95.0%
[    6    5.0%

Math Symbol
Value    Count    Frequency (%)
=    1    50.0%
>    1    50.0%

Space Separator
Value    Count    Frequency (%)
(space)    4906    100.0%

Dash Punctuation
Value    Count    Frequency (%)
-    533    100.0%

Most occurring scripts

Value    Count    Frequency (%)
Hangul    10876    51.4%
Common    10260    48.5%
Latin    34    0.2%

Most frequent character per script

Hangul
Value    Count    Frequency (%)
[?]    716    6.6%
[?]    715    6.6%
[?]    704    6.5%
[?]    668    6.1%
[?]    627    5.8%
[?]    618    5.7%
[?]    597    5.5%
[?]    581    5.3%
[?]    541    5.0%
[?]    307    2.8%
Other values (221)    4802    44.2%

Common
Value    Count    Frequency (%)
(space)    4906    47.8%
1    917    8.9%
2    607    5.9%
-    533    5.2%
3    462    4.5%
5    427    4.2%
4    416    4.1%
0    386    3.8%
6    343    3.3%
7    316    3.1%
Other values (12)    947    9.2%

Latin
Value    Count    Frequency (%)
B    20    58.8%
A    8    23.5%
C    5    14.7%
G    1    2.9%

Most occurring blocks

Value    Count    Frequency (%)
Hangul    10876    51.4%
ASCII    10294    48.6%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
(space)    4906    47.7%
1    917    8.9%
2    607    5.9%
-    533    5.2%
3    462    4.5%
5    427    4.1%
4    416    4.0%
0    386    3.7%
6    343    3.3%
7    316    3.1%
Other values (16)    981    9.5%

Hangul
Value    Count    Frequency (%)
[?]    716    6.6%
[?]    715    6.6%
[?]    704    6.5%
[?]    668    6.1%
[?]    627    5.8%
[?]    618    5.7%
[?]    597    5.5%
[?]    581    5.3%
[?]    541    5.0%
[?]    307    2.8%
Other values (221)    4802    44.2%

연락처
Text

MISSING 

Distinct: 529
Distinct (%): 88.0%
Missing: 317
Missing (%): 34.5%
Memory size: 7.3 KiB

Length

Max length: 29
Median length: 12
Mean length: 12.309484
Min length: 11

Characters and Unicode

Total characters: 7398
Distinct characters: 14
Distinct categories: 5
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 471
Unique (%): 78.4%

Sample

1st row: 031-376-4574
2nd row: 031-352-9342
3rd row: 031-357-3842
4th row: 031-355-6064
5th row: 031-354-0708

Value    Count    Frequency (%)
031-352-0401    4    0.7%
031-355-8060    4    0.7%
031-222-1071    4    0.7%
031-351-8872    3    0.5%
031-298-0388    3    0.5%
031-354-2808    3    0.5%
031-357-8417    3    0.5%
031-352-7696    3    0.5%
031-354-6660    3    0.5%
031-372-2191    3    0.5%
Other values (513)    573    94.6%

Most occurring characters

Value    Count    Frequency (%)
3    1384    18.7%
-    1216    16.4%
0    953    12.9%
1    902    12.2%
5    669    9.0%
2    462    6.2%
6    383    5.2%
8    365    4.9%
7    351    4.7%
4    332    4.5%
Other values (4)    381    5.2%

Most occurring categories

Value    Count    Frequency (%)
Decimal Number    6110    82.6%
Dash Punctuation    1216    16.4%
Space Separator    68    0.9%
Math Symbol    3    < 0.1%
Other Punctuation    1    < 0.1%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
3    1384    22.7%
0    953    15.6%
1    902    14.8%
5    669    10.9%
2    462    7.6%
6    383    6.3%
8    365    6.0%
7    351    5.7%
4    332    5.4%
9    309    5.1%

Dash Punctuation
Value    Count    Frequency (%)
-    1216    100.0%

Space Separator
Value    Count    Frequency (%)
(space)    68    100.0%

Math Symbol
Value    Count    Frequency (%)
~    3    100.0%

Other Punctuation
Value    Count    Frequency (%)
,    1    100.0%

Most occurring scripts

Value    Count    Frequency (%)
Common    7398    100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
3    1384    18.7%
-    1216    16.4%
0    953    12.9%
1    902    12.2%
5    669    9.0%
2    462    6.2%
6    383    5.2%
8    365    4.9%
7    351    4.7%
4    332    4.5%
Other values (4)    381    5.2%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    7398    100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
3    1384    18.7%
-    1216    16.4%
0    953    12.9%
1    902    12.2%
5    669    9.0%
2    462    6.2%
6    383    5.2%
8    365    4.9%
7    351    4.7%
4    332    4.5%
Other values (4)    381    5.2%
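
The length and character statistics for 연락처 (maximum length 29, plus a few spaces, "~" and "," characters) suggest that some cells hold more than one phone number; the last rows of the sample table include one such cell. The sketch below is one possible way to split such cells with a regular expression. The pattern and the `extract_phones` helper are assumptions for illustration, not part of the report, and range notations using "~" are not handled.

```python
import re
from typing import List, Optional

# Assumed pattern: area code, exchange, and line number separated by hyphens.
PHONE_RE = re.compile(r"\d{2,4}-\d{3,4}-\d{4}")

def extract_phones(cell: Optional[str]) -> List[str]:
    """Return every hyphenated phone number found in a cell (empty if missing)."""
    if not cell:
        return []
    return PHONE_RE.findall(cell)

print(extract_phones("031-357-5914 031-357-5871"))  # cell from the sample rows
print(extract_phones("031-8077-2253"))
print(extract_phones(None))                          # missing value
```

Normalizing to one number per entry before counting would change the distinct and unique figures, so any such cleaning should be applied before profiling rather than after.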

분류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 7.3 KiB

종합재활용업    584
사업장 수집 운반업    287
건폐운반업    29
건폐중간처리업    14
소각전문    3

Length

Max length: 10
Median length: 6
Mean length: 7.2309368
Min length: 4

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 종합재활용업
2nd row: 종합재활용업
3rd row: 종합재활용업
4th row: 종합재활용업
5th row: 종합재활용업

Common Values

Value    Count    Frequency (%)
종합재활용업    584    63.6%
사업장 수집 운반업    287    31.3%
건폐운반업    29    3.2%
건폐중간처리업    14    1.5%
소각전문    3    0.3%
최종처분업(매립)    1    0.1%
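
The 52.5% imbalance figure in the Alerts section can be reproduced from these counts. The sketch below computes one minus the normalized Shannon entropy of the class distribution. That this is the formula behind the alert is an assumption, although it does yield the reported value for this column.

```python
import math

# Class counts copied from the Common Values table above.
counts = {
    "종합재활용업": 584,
    "사업장 수집 운반업": 287,
    "건폐운반업": 29,
    "건폐중간처리업": 14,
    "소각전문": 3,
    "최종처분업(매립)": 1,
}

total = sum(counts.values())
probs = [n / total for n in counts.values()]

entropy = -sum(p * math.log(p) for p in probs)  # Shannon entropy in nats
max_entropy = math.log(len(counts))             # entropy of a uniform split
imbalance = 1 - entropy / max_entropy           # 0 = balanced, 1 = single class

print(total, f"{100 * imbalance:.1f}%")
```

A score of 0 would mean six equally sized classes. Here two classes cover about 95% of the rows, which pushes the score above one half.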

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count    Frequency (%)
종합재활용업    584    39.1%
사업장    287    19.2%
수집    287    19.2%
운반업    287    19.2%
건폐운반업    29    1.9%
건폐중간처리업    14    0.9%
소각전문    3    0.2%
최종처분업(매립    1    0.1%

Interactions


Correlations

        연번     분류
연번    1.000    0.738
분류    0.738    1.000

        연번     분류
연번    1.000    0.506
분류    0.506    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

(index)    연번    상호    사업장 소재지    연락처    분류
0    1    부광자원㈜    경기도 화성시 향남읍 서봉로 715-7    031-376-4574    종합재활용업
1    2    리라금속    경기도 화성시 팔탄면 시청로 788    031-352-9342    종합재활용업
2    3    ㈜우원유업    경기도 화성시 서신면 전곡리 630-83    031-357-3842    종합재활용업
3    4    RAMANAR BRICK KOREA.CO    경기도 화성시 비봉면 하저자안로 186-7    031-355-6064    종합재활용업
4    5    ㈜제삼섹터    경기도 화성시 향남읍 솔태상두길 281-14    031-354-0708    종합재활용업
5    6    ㈜전홍개발    경기도 화성시 동탄면 원고매로2번길 34    031-376-3573    종합재활용업
6    7    정우리사이클링㈜    경기도 화성시 팔탄면 노하길454번길 23    031-354-7778    종합재활용업
7    8    태라자원주식회사    경기도 화성시 정남면 괘랑보통길 19    031-354-4800    종합재활용업
8    9    ㈜동우개발    경기도 화성시 은수포북길 89    031-357-6451    종합재활용업
9    10    ㈜하이콘코리아    경기도 화성시 은수포북길 89    031-358-7515    종합재활용업

Last rows

(index)    연번    상호    사업장 소재지    연락처    분류
908    909    ㈜진흥종합환경    향남읍 행정리 57-2번지    031-352-0401    건폐중간처리업
909    910    ㈜태형기업    향남읍 서봉로755번길 37-18    031-354-6660    건폐중간처리업
910    911    (주)삼일아스콘    양감면 정문송산로 65    031-352-7066    건폐중간처리업
911    912    우리아스콘㈜    경기도 화성시 팔탄면 현대기아로 556번길 111    031-357-5914 031-357-5871    건폐중간처리업
912    913    ㈜한결아스콘    정남면 서봉로757    031-8077-2253    건폐중간처리업
913    914    ㈜신성아스콘    경기도 화성시 팔탄면 푸른들판로 576    031-354-8500    건폐중간처리업
914    915    신대한정유산업㈜    정남면 가장로 334-10    031-352-3831    소각전문
915    916    ㈜신승에너지    매바위로 570-10(장덕동)    031-358-8131    소각전문
916    917    신대원에너지㈜    향남읍 발안공단로 139    031-354-4701    소각전문
917    918    ㈜진흥중공업    향남읍 발안공단로 139[본사(사무실): 양감면 정문송산로93번길 10-27]    031-8059-2233    최종처분업(매립)