Overview

Dataset statistics

Number of variables: 5
Number of observations: 3031
Missing cells: 2223
Missing cells (%): 14.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 121.5 KiB
Average record size in memory: 41.0 B
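The derived figures above follow directly from the raw counts. A minimal Python sketch that recomputes them, assuming missing cells are counted over all 5 × 3031 cells and that 1 KiB = 1024 bytes. The report only shows the rounded total size, so the record size here is approximate:

```python
# Recompute the derived dataset statistics from the raw counts above.
n_vars = 5
n_obs = 3031
missing_cells = 2223
total_size_kib = 121.5  # rounded value as displayed in the report

total_cells = n_vars * n_obs  # one cell per (observation, variable) pair
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_obs  # KiB -> bytes, per row

print(f"Missing cells (%): {missing_pct:.1f}%")  # Missing cells (%): 14.7%
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 41.0 B
```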

Variable types

Numeric: 1
Categorical: 2
Text: 2

Dataset

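Description: This dataset lists the companies located in the industrial complexes of Daejeon Metropolitan City: Daedeok Special Zone (대덕특구), Daedeok Industrial Complex (대덕산업단지), Daejeon Industrial Complex (대전산업단지), and Haso Eco-friendly Industrial Complex (하소친환경 산업단지).
Author: 대전광역시 (Daejeon Metropolitan City)
URL: https://www.data.go.kr/data/15063355/fileData.do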

Alerts

지구 is highly correlated with 순번 and 1 other field [High correlation]
지역 is highly correlated with 순번 and 1 other field [High correlation]
순번 is highly correlated with 지역 and 1 other field [High correlation]
지역 is highly imbalanced (63.6%) [Imbalance]
생산품 has 2223 (73.3%) missing values [Missing]
순번 has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 14:22:07.042594
Analysis finished: 2023-12-12 14:22:08.204704
Duration: 1.16 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
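A comparable report can be regenerated with ydata-profiling's `ProfileReport`. This is a sketch, not the original script: the CP949 encoding (common for files from data.go.kr) and the report title are assumptions, and the original loading options are unknown, so counts such as Missing may differ from the figures here.

```python
from pathlib import Path


def build_report(csv_path: str, output_path: str = "report.html") -> Path:
    """Profile the company list with ydata-profiling and save an HTML report."""
    # Imported inside the function so this file loads even without the deps.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the downloaded file is CP949-encoded; use "utf-8" otherwise.
    df = pd.read_csv(csv_path, encoding="cp949")
    # Assumption: illustrative title; the original report's title is unknown.
    report = ProfileReport(df, title="Daejeon industrial complex tenants")
    report.to_file(output_path)
    return Path(output_path)
```

Usage: `build_report("daejeon_companies.csv")`, where `daejeon_companies.csv` is a hypothetical local copy of the file downloaded from the dataset URL.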

Variables

순번 (serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 3031
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1516
Minimum: 1
Maximum: 3031
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 152.5
Q1: 758.5
Median: 1516
Q3: 2273.5
95th percentile: 2879.5
Maximum: 3031
Range: 3030
Interquartile range (IQR): 1515
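These quantiles match linear interpolation between order statistics, the default method in pandas and NumPy. A sketch using only Python's standard `statistics` module, whose `method="inclusive"` option interpolates the same way. It assumes 순번 is exactly the sequence 1..3031, as the Distinct, Minimum, Maximum and Monotonicity figures indicate:

```python
import statistics

# Assumption: 순번 is exactly the serial numbers 1..3031, each once.
values = list(range(1, 3032))

# method="inclusive" interpolates linearly between order statistics,
# the same rule as the pandas/NumPy default ("linear").
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
ventiles = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]  # 5th and 95th percentiles

print(p5, q1, median, q3, p95)  # 152.5 758.5 1516.0 2273.5 2879.5
print("IQR:", q3 - q1)  # IQR: 1515.0
print("Range:", max(values) - min(values))  # Range: 3030
```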

Descriptive statistics

Standard deviation: 875.11866
Coefficient of variation (CV): 0.57725505
Kurtosis: -1.2
Mean: 1516
Median absolute deviation (MAD): 758
Skewness: 0
Sum: 4594996
Variance: 765832.67
Monotonicity: Strictly increasing
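The descriptive statistics can be checked the same way. Under the same assumption that 순번 is 1..3031, the sketch below uses the sample variance (denominator n - 1), which is what reproduces the reported 765832.67, and computes the MAD as the median of absolute deviations from the median:

```python
import math
import statistics

# Assumption: 순번 is exactly the serial numbers 1..3031, each once.
values = list(range(1, 3032))

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, denominator n - 1
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(mean, sum(values))  # 1516 4594996
print(round(variance, 2))  # 765832.67
print(round(std, 5), round(cv, 8))
print(mad, strictly_increasing)  # 758 True
```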
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
1  1  < 0.1%
2026  1  < 0.1%
2017  1  < 0.1%
2018  1  < 0.1%
2019  1  < 0.1%
2020  1  < 0.1%
2021  1  < 0.1%
2022  1  < 0.1%
2023  1  < 0.1%
2024  1  < 0.1%
Other values (3021)  3021  99.7%

Minimum 10 values

Value  Count  Frequency (%)
1  1  < 0.1%
2  1  < 0.1%
3  1  < 0.1%
4  1  < 0.1%
5  1  < 0.1%
6  1  < 0.1%
7  1  < 0.1%
8  1  < 0.1%
9  1  < 0.1%
10  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
3031  1  < 0.1%
3030  1  < 0.1%
3029  1  < 0.1%
3028  1  < 0.1%
3027  1  < 0.1%
3026  1  < 0.1%
3025  1  < 0.1%
3024  1  < 0.1%
3023  1  < 0.1%
3022  1  < 0.1%

지역 (area)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 23.8 KiB
대덕 (Daedeok): 2598
제1산업단지 (Industrial Complex No. 1): 196
편입지역 (incorporated area): 126
제2산업단지 (Industrial Complex No. 2): 76
하소 (Haso): 35

Length

Max length: 6
Median length: 2
Mean length: 2.4420983
Min length: 2
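The mean length is a count-weighted average of the category lengths. A sketch using the 지역 counts from this section; the 2598 rows of 대덕 dominate and pull the mean close to 2:

```python
# Value counts of 지역 as reported in this section.
counts = {"대덕": 2598, "제1산업단지": 196, "편입지역": 126, "제2산업단지": 76, "하소": 35}

n = sum(counts.values())
# Each value contributes len(value) characters once per occurrence.
total_chars = sum(len(value) * c for value, c in counts.items())
mean_length = total_chars / n

print(n, total_chars)  # 3031 7402
print(round(mean_length, 7))  # 2.4420983
print(max(map(len, counts)), min(map(len, counts)))  # 6 2
```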

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대덕
2nd row: 대덕
3rd row: 대덕
4th row: 대덕
5th row: 대덕

Common Values

Value  Count  Frequency (%)
대덕  2598  85.7%
제1산업단지  196  6.5%
편입지역  126  4.2%
제2산업단지  76  2.5%
하소  35  1.2%
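The 63.6% imbalance flagged in the Alerts is reproduced by one minus the Shannon entropy of these counts, normalised by its maximum log(k) for k categories. This sketch assumes that definition; it matches the reported figure but is not taken from the profiler's source:

```python
import math

counts = [2598, 196, 126, 76, 35]  # 지역 value counts from the table above
n = sum(counts)
k = len(counts)

# Shannon entropy of the category distribution (natural log).
entropy = -sum((c / n) * math.log(c / n) for c in counts)
# Normalise by the maximum possible entropy, log(k), and invert:
# 0 means perfectly balanced, 1 means a single category.
imbalance = 1 - entropy / math.log(k)

print(f"{imbalance:.1%}")  # 63.6%
```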

Length

Histogram of lengths of the category

기관명 (organization name)
Text

Distinct: 2866
Distinct (%): 94.6%
Missing: 0
Missing (%): 0.0%
Memory size: 23.8 KiB

Length

Max length: 26
Median length: 21
Mean length: 7.3767733
Min length: 2

Characters and Unicode

Total characters: 22359
Distinct characters: 631
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
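Python's standard `unicodedata.category` returns the two-letter general-category codes behind names such as Other Letter (Lo) and Open Punctuation (Ps). A sketch that tallies categories for the first five 기관명 values in the sample; the name mapping covers only the categories seen in this report, and script or block lookups are not part of the standard library:

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes -> the names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation", "Zs": "Space Separator",
    "So": "Other Symbol", "Cc": "Control",
}


def category_counts(texts):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The first five 기관명 values from the sample rows.
sample = ["(주)디토", "(주)알테오젠", "(주)파멥신", "환경플라즈마(주)", "주식회사디앤티"]
print(category_counts(sample).most_common())
# [('Other Letter', 26), ('Open Punctuation', 4), ('Close Punctuation', 4)]
```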

Unique

Unique: 2712
Unique (%): 89.5%

Sample

1st row: (주)디토
2nd row: (주)알테오젠
3rd row: (주)파멥신
4th row: 환경플라즈마(주)
5th row: 주식회사디앤티

Most common words

Value  Count  Frequency (%)
주식회사  190  5.7%
(not shown)  10  0.3%
제2공장  7  0.2%
주)삼양사  4  0.1%
에이스트로닉스(주  3  0.1%
주)제노포커스  3  0.1%
주)삼양바이오팜  3  0.1%
케이맥(주  3  0.1%
재단법인  3  0.1%
대영금속공업㈜  3  0.1%
Other values (2937)  3131  93.2%
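Tokens such as 주)삼양사 and 케이맥(주 keep only one parenthesis. That pattern is consistent with splitting on whitespace and then stripping ASCII punctuation from each token's ends. A sketch of that tokenisation, assuming this is roughly what the profiler does; the input names are hypothetical, reconstructed from the tokens in the table:

```python
import string
from collections import Counter


def word_counts(texts):
    """Count whitespace-separated tokens, stripping ASCII punctuation from their ends."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(string.punctuation)  # only leading/trailing chars
            if token:
                counts[token] += 1
    return counts


# Hypothetical full names, reconstructed from tokens in the table above.
names = ["(주)삼양사", "케이맥(주)", "에이스트로닉스(주)"]
print(word_counts(names))
```

Inner parentheses survive because `str.strip` only removes characters at the ends of a string, so "(주)삼양사" loses its leading "(" but keeps the ")".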

Most occurring characters

Value  Count  Frequency (%)
(not shown)  1811  8.1%
)  1603  7.2%
(  1597  7.1%
(not shown)  977  4.4%
(not shown)  838  3.7%
(not shown)  412  1.8%
(not shown)  361  1.6%
(space)  338  1.5%
(not shown)  328  1.5%
(not shown)  318  1.4%
Other values (621)  13776  61.6%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  18060  80.8%
Close Punctuation  1611  7.2%
Open Punctuation  1605  7.2%
Space Separator  338  1.5%
Other Symbol  283  1.3%
Uppercase Letter  276  1.2%
Lowercase Letter  103  0.5%
Decimal Number  42  0.2%
Other Punctuation  32  0.1%
Dash Punctuation  8  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not shown)  1811  10.0%
(not shown)  977  5.4%
(not shown)  838  4.6%
(not shown)  412  2.3%
(not shown)  361  2.0%
(not shown)  328  1.8%
(not shown)  318  1.8%
(not shown)  272  1.5%
(not shown)  267  1.5%
(not shown)  265  1.5%
Other values (558)  12211  67.6%

Uppercase Letter
Value  Count  Frequency (%)
E  28  10.1%
T  28  10.1%
N  23  8.3%
S  22  8.0%
I  22  8.0%
C  21  7.6%
G  20  7.2%
L  14  5.1%
K  13  4.7%
D  13  4.7%
Other values (11)  72  26.1%

Lowercase Letter
Value  Count  Frequency (%)
o  14  13.6%
e  12  11.7%
n  10  9.7%
t  8  7.8%
a  8  7.8%
s  8  7.8%
c  6  5.8%
r  6  5.8%
i  5  4.9%
g  4  3.9%
Other values (11)  22  21.4%

Decimal Number
Value  Count  Frequency (%)
2  12  28.6%
1  7  16.7%
5  7  16.7%
3  6  14.3%
4  4  9.5%
8  2  4.8%
7  2  4.8%
6  1  2.4%
9  1  2.4%

Other Punctuation
Value  Count  Frequency (%)
.  19  59.4%
&  7  21.9%
,  5  15.6%
:  1  3.1%

Close Punctuation
Value  Count  Frequency (%)
)  1603  99.5%
]  8  0.5%

Open Punctuation
Value  Count  Frequency (%)
(  1597  99.5%
[  8  0.5%

Space Separator
Value  Count  Frequency (%)
(space)  338  100.0%

Other Symbol
Value  Count  Frequency (%)
(not shown)  283  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  8  100.0%

Control
Value  Count  Frequency (%)
(not shown)  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  18343  82.0%
Common  3637  16.3%
Latin  379  1.7%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not shown)  1811  9.9%
(not shown)  977  5.3%
(not shown)  838  4.6%
(not shown)  412  2.2%
(not shown)  361  2.0%
(not shown)  328  1.8%
(not shown)  318  1.7%
(not shown)  283  1.5%
(not shown)  272  1.5%
(not shown)  267  1.5%
Other values (559)  12476  68.0%

Latin
Value  Count  Frequency (%)
E  28  7.4%
T  28  7.4%
N  23  6.1%
S  22  5.8%
I  22  5.8%
C  21  5.5%
G  20  5.3%
o  14  3.7%
L  14  3.7%
K  13  3.4%
Other values (32)  174  45.9%

Common
Value  Count  Frequency (%)
)  1603  44.1%
(  1597  43.9%
(space)  338  9.3%
.  19  0.5%
2  12  0.3%
-  8  0.2%
]  8  0.2%
[  8  0.2%
1  7  0.2%
&  7  0.2%
Other values (10)  30  0.8%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  18060  80.8%
ASCII  4016  18.0%
None  283  1.3%
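The block breakdown can be approximated by code-point ranges. A coarse sketch, assuming "ASCII" means U+0000 to U+007F and "Hangul" means the Hangul Syllables block U+AC00 to U+D7AF; anything else falls into a catch-all bucket, roughly where the report shows None. The input joins two 기관명 values from the sample rows:

```python
from collections import Counter


def block_of(ch: str) -> str:
    """Coarse Unicode block lookup covering only the blocks seen in this report."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"  # Basic Latin
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Hangul Syllables
    return "Other"  # everything else, e.g. the middle dot "·"


# Two 기관명 values from the sample rows, joined by a space.
text = "(주)에이멕 ks전기제어"
print(Counter(block_of(ch) for ch in text))  # Counter({'Hangul': 8, 'ASCII': 5})
```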

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(not shown)  1811  10.0%
(not shown)  977  5.4%
(not shown)  838  4.6%
(not shown)  412  2.3%
(not shown)  361  2.0%
(not shown)  328  1.8%
(not shown)  318  1.8%
(not shown)  272  1.5%
(not shown)  267  1.5%
(not shown)  265  1.5%
Other values (558)  12211  67.6%

ASCII
Value  Count  Frequency (%)
)  1603  39.9%
(  1597  39.8%
(space)  338  8.4%
E  28  0.7%
T  28  0.7%
N  23  0.6%
S  22  0.5%
I  22  0.5%
C  21  0.5%
G  20  0.5%
Other values (52)  314  7.8%

None
Value  Count  Frequency (%)
(not shown)  283  100.0%

지구 (district)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.8 KiB
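2지구(대덕테크노밸리) (District 2, Daedeok Techno Valley): 1438
1지구(연구단지) (District 1, Research Complex): 788
<NA>: 433
3지구(대덕산업단지) (District 3, Daedeok Industrial Complex): 372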

Length

Max length: 12
Median length: 11
Mean length: 9.9544705
Min length: 4
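지구 reports 0 missing values, yet <NA> appears 433 times and the minimum length is 4. Both fit <NA> being stored as the literal four-character string "<NA>" rather than as a true null: counting it that way reproduces the reported mean length exactly. A sketch of that check:

```python
# 지구 value counts from this section; "<NA>" treated as an ordinary string.
counts = {
    "2지구(대덕테크노밸리)": 1438,
    "1지구(연구단지)": 788,
    "<NA>": 433,
    "3지구(대덕산업단지)": 372,
}

n = sum(counts.values())
mean_length = sum(len(v) * c for v, c in counts.items()) / n

print(n, round(mean_length, 7))  # 3031 9.9544705
print(max(map(len, counts)), min(map(len, counts)))  # 12 4
```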

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1지구(연구단지)
2nd row: 1지구(연구단지)
3rd row: 1지구(연구단지)
4th row: 1지구(연구단지)
5th row: 1지구(연구단지)

Common Values

Value  Count  Frequency (%)
2지구(대덕테크노밸리)  1438  47.4%
1지구(연구단지)  788  26.0%
<NA>  433  14.3%
3지구(대덕산업단지)  372  12.3%
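The 지구 counts line up with the 지역 counts: the three numbered districts sum to the 대덕 total, and the <NA> count equals the combined count of the other four 지역 values. This is consistent with 지구 being recorded only for 대덕 rows (the last sample rows, all 하소, show <NA>), although matching totals alone do not prove the row-level mapping. A quick check:

```python
# Value counts copied from the 지역 and 지구 tables in this report.
region_counts = {"대덕": 2598, "제1산업단지": 196, "편입지역": 126, "제2산업단지": 76, "하소": 35}
district_counts = {"2지구(대덕테크노밸리)": 1438, "1지구(연구단지)": 788, "<NA>": 433, "3지구(대덕산업단지)": 372}

# Rows with a numbered district vs. rows in any region other than 대덕.
numbered = sum(c for d, c in district_counts.items() if d != "<NA>")
outside_daedeok = sum(c for r, c in region_counts.items() if r != "대덕")

print(numbered, region_counts["대덕"])  # 2598 2598
print(district_counts["<NA>"], outside_daedeok)  # 433 433
```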

Length

Histogram of lengths of the category


생산품 (products)
Text

MISSING

Distinct: 710
Distinct (%): 87.9%
Missing: 2223
Missing (%): 73.3%
Memory size: 23.8 KiB
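For 생산품, Distinct (%) and Unique (%) are computed over the 808 non-missing entries, not over all 3031 rows (710 / 3031 would be only 23.4%), while Missing (%) uses all rows. The mean length likewise divides the 7035 total characters by 808. A quick check:

```python
n_rows = 3031
missing = 2223
non_missing = n_rows - missing  # 808 rows with a 생산품 entry

# Missing (%) is relative to all rows; the other figures use non-missing rows.
print(f"Missing (%): {missing / n_rows:.1%}")  # Missing (%): 73.3%
print(f"Distinct (%): {710 / non_missing:.1%}")  # Distinct (%): 87.9%
print(f"Unique (%): {660 / non_missing:.1%}")  # Unique (%): 81.7%
print(f"Mean length: {7035 / non_missing:.7f}")  # Mean length: 8.7066832
```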

Length

Max length: 43
Median length: 30
Mean length: 8.7066832
Min length: 2

Characters and Unicode

Total characters: 7035
Distinct characters: 500
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 660
Unique (%): 81.7%

Sample

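1st row: 디스플레이 (display)
2nd row: 액체비료, 미생물제제, 방제제, 탈취제 등 (liquid fertilizer, microbial agents, pest-control agents, deodorizers, etc.)
3rd row: 모션센서모듈 외 (motion sensor modules, etc.)
4th row: 항공기 부품 (aircraft parts)
5th row: Polarization Scrambler 외 (Polarization Scrambler, etc.)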

Most common words

Value  Count  Frequency (%)
(not shown)  69  4.5%
(not shown)  44  2.8%
제조  38  2.5%
부품  17  1.1%
(not shown)  13  0.8%
소프트웨어  11  0.7%
산업용  8  0.5%
연구개발  8  0.5%
밸브  8  0.5%
자동차  8  0.5%
Other values (1018)  1322  85.5%

Most occurring characters

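Value  Count  Frequency (%)
(space)  747  10.6%
(not shown)  309  4.4%
,  212  3.0%
(not shown)  141  2.0%
(not shown)  137  1.9%
(not shown)  123  1.7%
(not shown)  104  1.5%
(not shown)  102  1.4%
(not shown)  98  1.4%
(not shown)  93  1.3%
Other values (490)  4969  70.6%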

Most occurring categories

Value  Count  Frequency (%)
Other Letter  5475  77.8%
Space Separator  747  10.6%
Uppercase Letter  292  4.2%
Other Punctuation  242  3.4%
Lowercase Letter  179  2.5%
Decimal Number  35  0.5%
Open Punctuation  28  0.4%
Close Punctuation  28  0.4%
Dash Punctuation  8  0.1%
Other Symbol  1  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not shown)  309  5.6%
(not shown)  141  2.6%
(not shown)  137  2.5%
(not shown)  123  2.2%
(not shown)  104  1.9%
(not shown)  102  1.9%
(not shown)  98  1.8%
(not shown)  93  1.7%
(not shown)  91  1.7%
(not shown)  87  1.6%
Other values (428)  4190  76.5%

Uppercase Letter
Value  Count  Frequency (%)
C  37  12.7%
D  32  11.0%
S  22  7.5%
P  21  7.2%
E  20  6.8%
I  19  6.5%
L  19  6.5%
T  19  6.5%
A  14  4.8%
W  11  3.8%
Other values (13)  78  26.7%

Lowercase Letter
Value  Count  Frequency (%)
e  18  10.1%
n  16  8.9%
a  16  8.9%
r  15  8.4%
t  13  7.3%
o  13  7.3%
i  13  7.3%
l  13  7.3%
s  12  6.7%
y  8  4.5%
Other values (12)  42  23.5%

Decimal Number
Value  Count  Frequency (%)
0  9  25.7%
3  9  25.7%
1  8  22.9%
2  6  17.1%
5  1  2.9%
7  1  2.9%
4  1  2.9%

Other Punctuation
Value  Count  Frequency (%)
,  212  87.6%
/  17  7.0%
.  10  4.1%
·  2  0.8%
&  1  0.4%

Space Separator
Value  Count  Frequency (%)
(space)  747  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  28  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  28  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  8  100.0%

Other Symbol
Value  Count  Frequency (%)
(not shown)  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  5476  77.8%
Common  1088  15.5%
Latin  471  6.7%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not shown)  309  5.6%
(not shown)  141  2.6%
(not shown)  137  2.5%
(not shown)  123  2.2%
(not shown)  104  1.9%
(not shown)  102  1.9%
(not shown)  98  1.8%
(not shown)  93  1.7%
(not shown)  91  1.7%
(not shown)  87  1.6%
Other values (429)  4191  76.5%

Latin
Value  Count  Frequency (%)
C  37  7.9%
D  32  6.8%
S  22  4.7%
P  21  4.5%
E  20  4.2%
I  19  4.0%
L  19  4.0%
T  19  4.0%
e  18  3.8%
n  16  3.4%
Other values (35)  248  52.7%

Common
Value  Count  Frequency (%)
(space)  747  68.7%
,  212  19.5%
(  28  2.6%
)  28  2.6%
/  17  1.6%
.  10  0.9%
0  9  0.8%
3  9  0.8%
-  8  0.7%
1  8  0.7%
Other values (6)  12  1.1%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  5475  77.8%
ASCII  1557  22.1%
None  3  < 0.1%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(space)  747  48.0%
,  212  13.6%
C  37  2.4%
D  32  2.1%
(  28  1.8%
)  28  1.8%
S  22  1.4%
P  21  1.3%
E  20  1.3%
I  19  1.2%
Other values (50)  391  25.1%

Hangul
Value  Count  Frequency (%)
(not shown)  309  5.6%
(not shown)  141  2.6%
(not shown)  137  2.5%
(not shown)  123  2.2%
(not shown)  104  1.9%
(not shown)  102  1.9%
(not shown)  98  1.8%
(not shown)  93  1.7%
(not shown)  91  1.7%
(not shown)  87  1.6%
Other values (428)  4190  76.5%

None
Value  Count  Frequency (%)
·  2  66.7%
(not shown)  1  33.3%

Interactions


Correlations

      순번   지역   지구
순번  1.000  0.857  0.993
지역  0.857  1.000  NaN
지구  0.993  NaN    1.000

      지구   지역
지구  1.000  1.000
지역  1.000  1.000

      순번   지역   지구
순번  1.000  0.525  0.900
지역  0.525  1.000  1.000
지구  0.900  1.000  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

   순번  지역  기관명  지구  생산품
0  1  대덕  (주)디토  1지구(연구단지)  <NA>
1  2  대덕  (주)알테오젠  1지구(연구단지)  <NA>
2  3  대덕  (주)파멥신  1지구(연구단지)  <NA>
3  4  대덕  환경플라즈마(주)  1지구(연구단지)  <NA>
4  5  대덕  주식회사디앤티  1지구(연구단지)  디스플레이
5  6  대덕  (주)에이멕  1지구(연구단지)  <NA>
6  7  대덕  (주)엘지하우시스  1지구(연구단지)  <NA>
7  8  대덕  (주)그린플러스  1지구(연구단지)  <NA>
8  9  대덕  (주)모두텍  1지구(연구단지)  <NA>
9  10  대덕  한국원자력환경공단  1지구(연구단지)  <NA>

Last rows

      순번  지역  기관명  지구  생산품
3021  3022  하소  신한물산  <NA>  제조
3022  3023  하소  현오토  <NA>  제조
3023  3024  하소  사모스  <NA>  제조
3024  3025  하소  케이비덴탈  <NA>  제조
3025  3026  하소  성신종합가구  <NA>  제조
3026  3027  하소  그래EAT  <NA>  제조
3027  3028  하소  신코퍼레이션  <NA>  제조
3028  3029  하소  태환산업  <NA>  제조
3029  3030  하소  ㈜페인트인포  <NA>  제조
3030  3031  하소  ks전기제어  <NA>  제조