Overview

Dataset statistics

Number of variables: 19
Number of observations: 10000
Missing cells: 44662
Missing cells (%): 23.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 162.0 B
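
The missing-cell percentage follows from the counts in this block. The snippet below is only a rough check of that arithmetic, not the profiler's own code; the variable names are illustrative.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 19
n_observations = 10_000
missing_cells = 44_662

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```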

Variable types

Categorical: 4
Text: 10
Numeric: 3
Boolean: 1
DateTime: 1

Dataset

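Description: Provides a list of companies in the water industry. Fields provided: 기업분류 (company category), 기업ID (company ID), 기업명 (company name), 기업영문명 (company name in English), 사업자등록번호 (business registration number), 정보공개동의여부 (consent to information disclosure), 대표자명 (representative's name), 대표번호 (main phone number), 설립일 (date of establishment), 주소 (address), 기업규모 (company size), 자본금 (capital), 매출액 (sales revenue), 수출액 (export value), 종업원수 (number of employees), 표준산업분류 (standard industrial classification), 물산업대분류 (water-industry major category), 물산업중분류 (water-industry middle category), 물산업분류 (water-industry category), 주요생산품 (main products), 담당자명 (contact person's name), 담당자전화번호 (contact phone number), 담당자휴대전화번호 (contact mobile number), 담당자이메일 (contact email)
URL: https://www.data.go.kr/data/15118899/fileData.do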

Alerts

기업분류 is highly overall correlated with 수출액 [High correlation]
정보공개동의여부 is highly overall correlated with 수출액 [High correlation]
수출액 is highly overall correlated with 기업분류 and 1 other field [High correlation]
물산업대분류 is highly overall correlated with 물산업중분류 [High correlation]
물산업중분류 is highly overall correlated with 물산업대분류 [High correlation]
기업분류 is highly imbalanced (60.4%) [Imbalance]
정보공개동의여부 is highly imbalanced (90.6%) [Imbalance]
수출액 is highly imbalanced (90.8%) [Imbalance]
기업아이디 has 9218 (92.2%) missing values [Missing]
기업영문명 has 9550 (95.5%) missing values [Missing]
설립일 has 6579 (65.8%) missing values [Missing]
주소 has 112 (1.1%) missing values [Missing]
자본금 has 9672 (96.7%) missing values [Missing]
매출액 has 7349 (73.5%) missing values [Missing]
표준산업분류 has 518 (5.2%) missing values [Missing]
물산업분류 has 519 (5.2%) missing values [Missing]
주요생산품 has 1067 (10.7%) missing values [Missing]
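
The imbalance percentages for the two-class variables are consistent with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy of the class shares and k is the number of classes. That formula is an assumption, checked here only against the counts reported for 기업분류 and 정보공개동의여부. The helper name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/log2(k), with H the base-2 Shannon entropy of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts copied from the report.
company_type = imbalance_score([9218, 782])  # 기업분류: D vs M
consent = imbalance_score([9879, 121])       # 정보공개동의여부: True vs False

print(f"{company_type:.1%}, {consent:.1%}")
```

A score near 0 means the classes are evenly spread. A score near 1 means one class dominates, as in both variables here.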

Reproduction

Analysis started: 2023-12-12 16:52:53.888915
Analysis finished: 2023-12-12 16:53:00.385058
Duration: 6.5 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
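
The reported duration is the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-12-12 16:52:53.888915")
finished = datetime.fromisoformat("2023-12-12 16:53:00.385058")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.1f} seconds")
```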

Variables

기업분류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
D: 9218
M: 782

Length

Max length1
Median length1
Mean length1
Min length1

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowD
2nd rowD
3rd rowD
4th rowD
5th rowD

Common Values

Value  Count  Frequency (%)
D      9218   92.2%
M      782    7.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
d      9218   92.2%
m      782    7.8%

기업아이디
Text

MISSING 

Distinct: 745
Distinct (%): 95.3%
Missing: 9218
Missing (%): 92.2%
Memory size: 156.2 KiB

Length

Max length18
Median length16
Mean length8.056266
Min length5

Characters and Unicode

Total characters6300
Distinct characters62
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
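
As a small illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample values reported for this variable. It shows the idea only and is not the profiler's implementation.

```python
import unicodedata
from collections import Counter

# The five sample values shown for 기업아이디.
samples = ["decaeng", "gslbio", "jlmco269", "ecell0504", "dk0548"]

# unicodedata.category returns a two-letter code,
# e.g. "Ll" (Lowercase Letter) or "Nd" (Decimal Number).
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
print(category_counts)
```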

Unique

Unique715 ?
Unique (%)91.4%
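
"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique counts values that occur exactly once. The sketch below illustrates the difference on a hypothetical toy column. It then recomputes the two percentages for 기업아이디 on the assumption that they are taken over non-missing rows, which matches the reported figures.

```python
from collections import Counter


def distinct_and_unique(values):
    """Count distinct values and values occurring exactly once, ignoring None."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Hypothetical toy column, not taken from the dataset.
toy = ["a1", "b2", "b2", "c3", None, None]
distinct, unique = distinct_and_unique(toy)

# Reported counts for 기업아이디: 10000 rows, 9218 missing.
non_missing = 10_000 - 9_218
distinct_pct = 100 * 745 / non_missing
unique_pct = 100 * 715 / non_missing
print(distinct, unique, f"{distinct_pct:.1f}%", f"{unique_pct:.1f}%")
```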

Sample

1st rowdecaeng
2nd rowgslbio
3rd rowjlmco269
4th rowecell0504
5th rowdk0548
ValueCountFrequency (%)
namuya82 3
 
0.4%
tjswls2274 3
 
0.4%
sheco12 3
 
0.4%
argos88 3
 
0.4%
getblue 3
 
0.4%
thewavetalk 3
 
0.4%
genad123 3
 
0.4%
psglobal6 2
 
0.3%
waterkorea 2
 
0.3%
koreaet 2
 
0.3%
Other values (735) 755
96.5%

Most occurring characters

ValueCountFrequency (%)
e 506
 
8.0%
o 405
 
6.4%
n 401
 
6.4%
a 349
 
5.5%
s 337
 
5.3%
0 289
 
4.6%
c 272
 
4.3%
t 267
 
4.2%
r 260
 
4.1%
i 257
 
4.1%
Other values (52) 2957
46.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4872
77.3%
Decimal Number 1320
 
21.0%
Uppercase Letter 101
 
1.6%
Dash Punctuation 4
 
0.1%
Connector Punctuation 2
 
< 0.1%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 506
 
10.4%
o 405
 
8.3%
n 401
 
8.2%
a 349
 
7.2%
s 337
 
6.9%
c 272
 
5.6%
t 267
 
5.5%
r 260
 
5.3%
i 257
 
5.3%
h 200
 
4.1%
Other values (16) 1618
33.2%
Uppercase Letter
ValueCountFrequency (%)
T 13
12.9%
E 13
12.9%
S 8
 
7.9%
A 7
 
6.9%
H 6
 
5.9%
C 5
 
5.0%
O 5
 
5.0%
K 5
 
5.0%
W 5
 
5.0%
L 5
 
5.0%
Other values (13) 29
28.7%
Decimal Number
ValueCountFrequency (%)
0 289
21.9%
1 238
18.0%
2 207
15.7%
7 94
 
7.1%
5 93
 
7.0%
4 92
 
7.0%
3 86
 
6.5%
8 83
 
6.3%
9 77
 
5.8%
6 61
 
4.6%
Dash Punctuation
ValueCountFrequency (%)
- 4
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4973
78.9%
Common 1327
 
21.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 506
 
10.2%
o 405
 
8.1%
n 401
 
8.1%
a 349
 
7.0%
s 337
 
6.8%
c 272
 
5.5%
t 267
 
5.4%
r 260
 
5.2%
i 257
 
5.2%
h 200
 
4.0%
Other values (39) 1719
34.6%
Common
ValueCountFrequency (%)
0 289
21.8%
1 238
17.9%
2 207
15.6%
7 94
 
7.1%
5 93
 
7.0%
4 92
 
6.9%
3 86
 
6.5%
8 83
 
6.3%
9 77
 
5.8%
6 61
 
4.6%
Other values (3) 7
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 506
 
8.0%
o 405
 
6.4%
n 401
 
6.4%
a 349
 
5.5%
s 337
 
5.3%
0 289
 
4.6%
c 272
 
4.3%
t 267
 
4.2%
r 260
 
4.1%
i 257
 
4.1%
Other values (52) 2957
46.9%

기업명
Text

Distinct: 9534
Distinct (%): 95.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length25
Median length23
Mean length7.4089
Min length1

Characters and Unicode

Total characters74089
Distinct characters702
Distinct categories9 ?
Distinct scripts4 ?
Distinct blocks5 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique9154 ?
Unique (%)91.5%

Sample

1st row장백환경(주)
2nd row(주)중원디앤케이
3rd row윤도건설(주)
4th row(주)정엔지니어링
5th row(주)가나수질환경
ValueCountFrequency (%)
주식회사 221
 
2.1%
19
 
0.2%
신화건설(주 9
 
0.1%
유한회사 6
 
0.1%
선진건설(주 5
 
< 0.1%
주)동진건설 5
 
< 0.1%
동진건설(주 5
 
< 0.1%
대원건설(주 5
 
< 0.1%
상원건설(주 4
 
< 0.1%
청우건설(주 4
 
< 0.1%
Other values (9564) 10016
97.3%

Most occurring characters

ValueCountFrequency (%)
( 7837
 
10.6%
) 7837
 
10.6%
7805
 
10.5%
2506
 
3.4%
2286
 
3.1%
2006
 
2.7%
1243
 
1.7%
1210
 
1.6%
1062
 
1.4%
961
 
1.3%
Other values (692) 39336
53.1%

Most occurring categories

ValueCountFrequency (%)
Other Letter 57790
78.0%
Open Punctuation 7838
 
10.6%
Close Punctuation 7838
 
10.6%
Space Separator 306
 
0.4%
Uppercase Letter 231
 
0.3%
Other Punctuation 40
 
0.1%
Decimal Number 22
 
< 0.1%
Lowercase Letter 22
 
< 0.1%
Dash Punctuation 2
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
7805
 
13.5%
2506
 
4.3%
2286
 
4.0%
2006
 
3.5%
1243
 
2.2%
1210
 
2.1%
1062
 
1.8%
961
 
1.7%
939
 
1.6%
935
 
1.6%
Other values (642) 36837
63.7%
Uppercase Letter
ValueCountFrequency (%)
E 32
13.9%
N 22
 
9.5%
G 21
 
9.1%
S 18
 
7.8%
T 16
 
6.9%
C 14
 
6.1%
M 12
 
5.2%
K 11
 
4.8%
H 10
 
4.3%
O 9
 
3.9%
Other values (13) 66
28.6%
Lowercase Letter
ValueCountFrequency (%)
e 4
18.2%
a 2
9.1%
c 2
9.1%
t 2
9.1%
v 2
9.1%
d 2
9.1%
h 1
 
4.5%
z 1
 
4.5%
x 1
 
4.5%
o 1
 
4.5%
Other values (4) 4
18.2%
Decimal Number
ValueCountFrequency (%)
1 13
59.1%
2 6
27.3%
4 2
 
9.1%
3 1
 
4.5%
Other Punctuation
ValueCountFrequency (%)
. 29
72.5%
& 7
 
17.5%
/ 4
 
10.0%
Open Punctuation
ValueCountFrequency (%)
( 7837
> 99.9%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 7837
> 99.9%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
306
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 57780
78.0%
Common 16046
 
21.7%
Latin 253
 
0.3%
Han 10
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
7805
 
13.5%
2506
 
4.3%
2286
 
4.0%
2006
 
3.5%
1243
 
2.2%
1210
 
2.1%
1062
 
1.8%
961
 
1.7%
939
 
1.6%
935
 
1.6%
Other values (632) 36827
63.7%
Latin
ValueCountFrequency (%)
E 32
 
12.6%
N 22
 
8.7%
G 21
 
8.3%
S 18
 
7.1%
T 16
 
6.3%
C 14
 
5.5%
M 12
 
4.7%
K 11
 
4.3%
H 10
 
4.0%
O 9
 
3.6%
Other values (27) 88
34.8%
Common
ValueCountFrequency (%)
( 7837
48.8%
) 7837
48.8%
306
 
1.9%
. 29
 
0.2%
1 13
 
0.1%
& 7
 
< 0.1%
2 6
 
< 0.1%
/ 4
 
< 0.1%
4 2
 
< 0.1%
- 2
 
< 0.1%
Other values (3) 3
 
< 0.1%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 57780
78.0%
ASCII 16297
 
22.0%
CJK 9
 
< 0.1%
None 2
 
< 0.1%
CJK Compat Ideographs 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
( 7837
48.1%
) 7837
48.1%
306
 
1.9%
E 32
 
0.2%
. 29
 
0.2%
N 22
 
0.1%
G 21
 
0.1%
S 18
 
0.1%
T 16
 
0.1%
C 14
 
0.1%
Other values (38) 165
 
1.0%
Hangul
ValueCountFrequency (%)
7805
 
13.5%
2506
 
4.3%
2286
 
4.0%
2006
 
3.5%
1243
 
2.2%
1210
 
2.1%
1062
 
1.8%
961
 
1.7%
939
 
1.6%
935
 
1.6%
Other values (632) 36827
63.7%
CJK
ValueCountFrequency (%)
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
None
ValueCountFrequency (%)
1
50.0%
1
50.0%
CJK Compat Ideographs
ValueCountFrequency (%)
1
100.0%

기업영문명
Text

MISSING 

Distinct: 423
Distinct (%): 94.0%
Missing: 9550
Missing (%): 95.5%
Memory size: 156.2 KiB

Length

Max length42
Median length29
Mean length14.6
Min length3

Characters and Unicode

Total characters6570
Distinct characters66
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique401 ?
Unique (%)89.1%

Sample

1st rowDECA Eng/ Co./ LTD
2nd rowGSL-BIO Co./ Ltd.
3rd rowLumimax.Co./Ltd.
4th rowECELL INC.
5th rowIssaLab Co./ Ltd.
ValueCountFrequency (%)
co 112
 
11.2%
ltd 105
 
10.5%
inc 47
 
4.7%
co./ltd 37
 
3.7%
korea 12
 
1.2%
system 9
 
0.9%
technology 8
 
0.8%
engineering 8
 
0.8%
tech 8
 
0.8%
co.ltd 8
 
0.8%
Other values (538) 642
64.5%

Most occurring characters

ValueCountFrequency (%)
560
 
8.5%
o 394
 
6.0%
. 359
 
5.5%
e 311
 
4.7%
n 308
 
4.7%
C 279
 
4.2%
t 276
 
4.2%
E 243
 
3.7%
L 217
 
3.3%
d 186
 
2.8%
Other values (56) 3437
52.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2958
45.0%
Uppercase Letter 2478
37.7%
Space Separator 560
 
8.5%
Other Punctuation 545
 
8.3%
Dash Punctuation 16
 
0.2%
Decimal Number 9
 
0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 394
13.3%
e 311
10.5%
n 308
10.4%
t 276
9.3%
d 186
 
6.3%
c 177
 
6.0%
a 173
 
5.8%
r 170
 
5.7%
i 168
 
5.7%
s 113
 
3.8%
Other values (16) 682
23.1%
Uppercase Letter
ValueCountFrequency (%)
C 279
11.3%
E 243
 
9.8%
L 217
 
8.8%
T 181
 
7.3%
O 175
 
7.1%
S 170
 
6.9%
N 168
 
6.8%
I 158
 
6.4%
A 133
 
5.4%
G 97
 
3.9%
Other values (16) 657
26.5%
Other Punctuation
ValueCountFrequency (%)
. 359
65.9%
/ 157
28.8%
& 27
 
5.0%
; 1
 
0.2%
' 1
 
0.2%
Decimal Number
ValueCountFrequency (%)
4 3
33.3%
1 2
22.2%
2 2
22.2%
3 1
 
11.1%
8 1
 
11.1%
Space Separator
ValueCountFrequency (%)
560
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 16
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5436
82.7%
Common 1134
 
17.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 394
 
7.2%
e 311
 
5.7%
n 308
 
5.7%
C 279
 
5.1%
t 276
 
5.1%
E 243
 
4.5%
L 217
 
4.0%
d 186
 
3.4%
T 181
 
3.3%
c 177
 
3.3%
Other values (42) 2864
52.7%
Common
ValueCountFrequency (%)
560
49.4%
. 359
31.7%
/ 157
 
13.8%
& 27
 
2.4%
- 16
 
1.4%
4 3
 
0.3%
1 2
 
0.2%
2 2
 
0.2%
( 2
 
0.2%
) 2
 
0.2%
Other values (4) 4
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 6570
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
560
 
8.5%
o 394
 
6.0%
. 359
 
5.5%
e 311
 
4.7%
n 308
 
4.7%
C 279
 
4.2%
t 276
 
4.2%
E 243
 
3.7%
L 217
 
3.3%
d 186
 
2.8%
Other values (56) 3437
52.3%
Distinct: 9957
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.3070495 × 10^9
Minimum: 0
Maximum: 8.9888001 × 10^9
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.1302773 × 10^9
Q1: 1.3400302 × 10^9
median: 3.0781046 × 10^9
Q3: 5.0581097 × 10^9
95-th percentile: 6.1682197 × 10^9
Maximum: 8.9888001 × 10^9
Range: 8.9888001 × 10^9
Interquartile range (IQR): 3.7180795 × 10^9

Descriptive statistics

Standard deviation: 1.916418 × 10^9
Coefficient of variation (CV): 0.57949481
Kurtosis: -1.0151567
Mean: 3.3070495 × 10^9
Median Absolute Deviation (MAD): 1.7694364 × 10^9
Skewness: 0.44352694
Sum: 3.3070495 × 10^13
Variance: 3.672658 × 10^18
Monotonicity: Not monotonic
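
Several of the derived statistics can be recomputed from the others, which is a quick way to confirm how the exponents read. The sketch below uses only figures from this variable's quantile and descriptive blocks; the variable names are illustrative.

```python
# Values copied from the report (powers of ten written as Python floats).
mean = 3.3070495e9
std = 1.916418e9
q1, q3 = 1.3400302e9, 5.0581097e9
minimum, maximum = 0.0, 8.9888001e9

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(f"CV={cv:.4f}, IQR={iqr:.7e}, range={value_range:.7e}, var={variance:.6e}")
```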
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1238612609 3
 
< 0.1%
7198100549 3
 
< 0.1%
8878601642 3
 
< 0.1%
5058129442 3
 
< 0.1%
4668600994 3
 
< 0.1%
4978701897 3
 
< 0.1%
2198701400 3
 
< 0.1%
6058135638 2
 
< 0.1%
1268619294 2
 
< 0.1%
1248600523 2
 
< 0.1%
Other values (9947) 9973
99.7%
ValueCountFrequency (%)
0 1
< 0.1%
19850771 1
< 0.1%
19861238 1
< 0.1%
19941697 1
< 0.1%
1010141577 1
< 0.1%
1010546041 1
< 0.1%
1011225423 1
< 0.1%
1011793983 1
< 0.1%
1012063868 1
< 0.1%
1012285380 1
< 0.1%
ValueCountFrequency (%)
8988800079 1
< 0.1%
8980600016 1
< 0.1%
8928802050 1
< 0.1%
8928800052 1
< 0.1%
8908800342 1
< 0.1%
8898801870 1
< 0.1%
8898600222 2
< 0.1%
8888801542 1
< 0.1%
8888100433 1
< 0.1%
8878800098 1
< 0.1%
Distinct: 9840
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length10
Median length10
Mean length9.9138
Min length1

Characters and Unicode

Total characters99138
Distinct characters13
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique9806 ?
Unique (%)98.1%

Sample

1st row3158134416
2nd row1278196038
3rd row2248133464
4th row6108142688
5th row4178135254
ValueCountFrequency (%)
비공개 121
 
1.2%
5058129442 3
 
< 0.1%
1238612609 3
 
< 0.1%
4668600994 3
 
< 0.1%
7198100549 3
 
< 0.1%
2198701400 3
 
< 0.1%
4978701897 3
 
< 0.1%
8878601642 3
 
< 0.1%
4108169980 2
 
< 0.1%
3058182231 2
 
< 0.1%
Other values (9830) 9854
98.5%

Most occurring characters

ValueCountFrequency (%)
1 19895
20.1%
8 13980
14.1%
0 11296
11.4%
2 9690
9.8%
6 8902
9.0%
3 8669
8.7%
4 8099
8.2%
5 6907
 
7.0%
7 5915
 
6.0%
9 5422
 
5.5%
Other values (3) 363
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 98775
99.6%
Other Letter 363
 
0.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 19895
20.1%
8 13980
14.2%
0 11296
11.4%
2 9690
9.8%
6 8902
9.0%
3 8669
8.8%
4 8099
8.2%
5 6907
 
7.0%
7 5915
 
6.0%
9 5422
 
5.5%
Other Letter
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%

Most occurring scripts

ValueCountFrequency (%)
Common 98775
99.6%
Hangul 363
 
0.4%

Most frequent character per script

Common
ValueCountFrequency (%)
1 19895
20.1%
8 13980
14.2%
0 11296
11.4%
2 9690
9.8%
6 8902
9.0%
3 8669
8.8%
4 8099
8.2%
5 6907
 
7.0%
7 5915
 
6.0%
9 5422
 
5.5%
Hangul
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 98775
99.6%
Hangul 363
 
0.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 19895
20.1%
8 13980
14.2%
0 11296
11.4%
2 9690
9.8%
6 8902
9.0%
3 8669
8.8%
4 8099
8.2%
5 6907
 
7.0%
7 5915
 
6.0%
9 5422
 
5.5%
Hangul
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%

정보공개동의여부
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 87.9 KiB
True: 9879
False: 121
Value  Count  Frequency (%)
True   9879   98.8%
False  121    1.2%

설립일
Date

MISSING 

Distinct: 2669
Distinct (%): 78.0%
Missing: 6579
Missing (%): 65.8%
Memory size: 156.2 KiB
Minimum: 1941-05-29 00:00:00
Maximum: 2022-03-25 00:00:00
Histogram with fixed size bins (bins=50)

주소
Text

MISSING 

Distinct: 9108
Distinct (%): 92.1%
Missing: 112
Missing (%): 1.1%
Memory size: 156.2 KiB

Length

Max length59
Median length47
Mean length19.561185
Min length2

Characters and Unicode

Total characters193421
Distinct characters615
Distinct categories10 ?
Distinct scripts3 ?
Distinct blocks3 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8768 ?
Unique (%)88.7%

Sample

1st row충북 청주시 상당구 무농정로 42/ 302호
2nd row경기 성남시 중원구 여수울로15번길 8-10
3rd row강원 원주시 치악로 1316
4th row부산 기장군
5th row전남 여수시 돌산읍 강남3길 18-20
ValueCountFrequency (%)
경기 2115
 
4.5%
서울 1158
 
2.5%
경남 781
 
1.7%
전남 644
 
1.4%
경북 630
 
1.3%
부산 576
 
1.2%
충남 532
 
1.1%
전북 491
 
1.0%
2층 402
 
0.9%
인천 401
 
0.9%
Other values (11301) 39133
83.5%

Most occurring characters

ValueCountFrequency (%)
37001
 
19.1%
1 8946
 
4.6%
7320
 
3.8%
2 6192
 
3.2%
5807
 
3.0%
5379
 
2.8%
3 4657
 
2.4%
4298
 
2.2%
4137
 
2.1%
0 4098
 
2.1%
Other values (605) 105586
54.6%

Most occurring categories

ValueCountFrequency (%)
Other Letter 108635
56.2%
Decimal Number 41073
 
21.2%
Space Separator 37002
 
19.1%
Other Punctuation 3157
 
1.6%
Dash Punctuation 2375
 
1.2%
Close Punctuation 457
 
0.2%
Open Punctuation 457
 
0.2%
Uppercase Letter 241
 
0.1%
Lowercase Letter 21
 
< 0.1%
Math Symbol 3
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
7320
 
6.7%
5807
 
5.3%
5379
 
5.0%
4298
 
4.0%
4137
 
3.8%
3501
 
3.2%
3000
 
2.8%
2646
 
2.4%
2602
 
2.4%
2592
 
2.4%
Other values (552) 67353
62.0%
Uppercase Letter
ValueCountFrequency (%)
B 90
37.3%
A 67
27.8%
C 21
 
8.7%
D 15
 
6.2%
E 8
 
3.3%
S 7
 
2.9%
T 6
 
2.5%
K 6
 
2.5%
I 4
 
1.7%
M 3
 
1.2%
Other values (10) 14
 
5.8%
Decimal Number
ValueCountFrequency (%)
1 8946
21.8%
2 6192
15.1%
3 4657
11.3%
0 4098
10.0%
4 3692
9.0%
5 3257
 
7.9%
6 2995
 
7.3%
7 2587
 
6.3%
8 2402
 
5.8%
9 2244
 
5.5%
Other values (2) 3
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
s 4
19.0%
t 3
14.3%
w 3
14.3%
k 2
9.5%
e 2
9.5%
r 2
9.5%
a 1
 
4.8%
i 1
 
4.8%
c 1
 
4.8%
x 1
 
4.8%
Other Punctuation
ValueCountFrequency (%)
/ 3143
99.6%
. 13
 
0.4%
" 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
37001
> 99.9%
  1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 2374
> 99.9%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 457
100.0%
Open Punctuation
ValueCountFrequency (%)
( 457
100.0%
Math Symbol
ValueCountFrequency (%)
~ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 108635
56.2%
Common 84524
43.7%
Latin 262
 
0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
7320
 
6.7%
5807
 
5.3%
5379
 
5.0%
4298
 
4.0%
4137
 
3.8%
3501
 
3.2%
3000
 
2.8%
2646
 
2.4%
2602
 
2.4%
2592
 
2.4%
Other values (552) 67353
62.0%
Latin
ValueCountFrequency (%)
B 90
34.4%
A 67
25.6%
C 21
 
8.0%
D 15
 
5.7%
E 8
 
3.1%
S 7
 
2.7%
T 6
 
2.3%
K 6
 
2.3%
I 4
 
1.5%
s 4
 
1.5%
Other values (21) 34
 
13.0%
Common
ValueCountFrequency (%)
37001
43.8%
1 8946
 
10.6%
2 6192
 
7.3%
3 4657
 
5.5%
0 4098
 
4.8%
4 3692
 
4.4%
5 3257
 
3.9%
/ 3143
 
3.7%
6 2995
 
3.5%
7 2587
 
3.1%
Other values (12) 7956
 
9.4%

Most occurring blocks

ValueCountFrequency (%)
Hangul 108635
56.2%
ASCII 84781
43.8%
None 5
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
37001
43.6%
1 8946
 
10.6%
2 6192
 
7.3%
3 4657
 
5.5%
0 4098
 
4.8%
4 3692
 
4.4%
5 3257
 
3.8%
/ 3143
 
3.7%
6 2995
 
3.5%
7 2587
 
3.1%
Other values (39) 8213
 
9.7%
Hangul
ValueCountFrequency (%)
7320
 
6.7%
5807
 
5.3%
5379
 
5.0%
4298
 
4.0%
4137
 
3.8%
3501
 
3.2%
3000
 
2.8%
2646
 
2.4%
2602
 
2.4%
2592
 
2.4%
Other values (552) 67353
62.0%
None
ValueCountFrequency (%)
2
40.0%
  1
20.0%
1
20.0%
1
20.0%

기업규모
Real number (ℝ)

Distinct: 10
Distinct (%): 0.1%
Missing: 50
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 102019.95
Minimum: 102001
Maximum: 102040
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 102001
5-th percentile: 102020
Q1: 102020
median: 102020
Q3: 102020
95-th percentile: 102020
Maximum: 102040
Range: 39
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.0266849
Coefficient of variation (CV): 1.9865574 × 10^-5
Kurtosis: 50.230053
Mean: 102019.95
Median Absolute Deviation (MAD): 0
Skewness: -2.6326947
Sum: 1.0150985 × 10^9
Variance: 4.1074517
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
102020 9554
95.5%
102015 117
 
1.2%
102030 108
 
1.1%
102010 52
 
0.5%
102001 48
 
0.5%
102026 37
 
0.4%
102022 20
 
0.2%
102035 7
 
0.1%
102040 4
 
< 0.1%
102024 3
 
< 0.1%
(Missing) 50
 
0.5%
ValueCountFrequency (%)
102001 48
 
0.5%
102010 52
 
0.5%
102015 117
 
1.2%
102020 9554
95.5%
102022 20
 
0.2%
102024 3
 
< 0.1%
102026 37
 
0.4%
102030 108
 
1.1%
102035 7
 
0.1%
102040 4
 
< 0.1%
ValueCountFrequency (%)
102040 4
 
< 0.1%
102035 7
 
0.1%
102030 108
 
1.1%
102026 37
 
0.4%
102024 3
 
< 0.1%
102022 20
 
0.2%
102020 9554
95.5%
102015 117
 
1.2%
102010 52
 
0.5%
102001 48
 
0.5%

자본금
Real number (ℝ)

MISSING 

Distinct: 120
Distinct (%): 36.6%
Missing: 9672
Missing (%): 96.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.0709722 × 10^8
Minimum: 0
Maximum: 8 × 10^10
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10000000
Q1: 50000000
median: 1.658165 × 10^8
Q3: 4.184268 × 10^8
95-th percentile: 1.1825 × 10^9
Maximum: 8 × 10^10
Range: 8 × 10^10
Interquartile range (IQR): 3.684268 × 10^8

Descriptive statistics

Standard deviation: 4.4460502 × 10^9
Coefficient of variation (CV): 7.3234567
Kurtosis: 313.68595
Mean: 6.0709722 × 10^8
Median Absolute Deviation (MAD): 1.188165 × 10^8
Skewness: 17.531719
Sum: 1.9912789 × 10^11
Variance: 1.9767362 × 10^19
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
50000000 39
 
0.4%
100000000 34
 
0.3%
200000000 24
 
0.2%
300000000 15
 
0.1%
500000000 12
 
0.1%
150000000 10
 
0.1%
700000000 9
 
0.1%
20000000 8
 
0.1%
30000000 8
 
0.1%
10000000 8
 
0.1%
Other values (110) 161
 
1.6%
(Missing) 9672
96.7%
ValueCountFrequency (%)
0 1
 
< 0.1%
27 1
 
< 0.1%
300 1
 
< 0.1%
2610 1
 
< 0.1%
1000000 3
 
< 0.1%
3000000 2
 
< 0.1%
5000000 4
< 0.1%
10000000 8
0.1%
11111111 1
 
< 0.1%
13000000 4
< 0.1%
ValueCountFrequency (%)
80000000000 1
 
< 0.1%
5200000000 1
 
< 0.1%
4840000000 3
< 0.1%
2910835000 1
 
< 0.1%
2750000000 1
 
< 0.1%
2640560000 1
 
< 0.1%
2300000000 1
 
< 0.1%
2210287500 1
 
< 0.1%
1750810000 1
 
< 0.1%
1650000000 1
 
< 0.1%

매출액
Text

MISSING 

Distinct: 682
Distinct (%): 25.7%
Missing: 7349
Missing (%): 73.5%
Memory size: 156.2 KiB

Length

Max length16
Median length11
Mean length7.7770653
Min length1

Characters and Unicode

Total characters20617
Distinct characters17
Distinct categories5 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique536 ?
Unique (%)20.2%

Sample

1st row3000000000
2nd row4328000000
3rd row450000000
4th row533000000
5th row2300000000
ValueCountFrequency (%)
0 504
 
19.0%
1000000000 142
 
5.4%
비공개 121
 
4.6%
2000000000 102
 
3.8%
500000000 60
 
2.3%
3000000000 60
 
2.3%
1500000000 58
 
2.2%
300000000 46
 
1.7%
800000000 46
 
1.7%
200000000 43
 
1.6%
Other values (672) 1469
55.4%
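
매출액 is profiled as Text because its digit strings are mixed with the marker 비공개 ("not disclosed", 121 rows). The character breakdown for this variable also reports `E`, `+`, `.` and `#`, which suggests some values in scientific notation and some placeholder strings. The sketch below is one possible way to coerce such values to numbers before analysis. The helper `parse_amount` is hypothetical, and the last two inputs are illustrations rather than values copied from the report.

```python
def parse_amount(raw):
    """Convert a raw 매출액 string to float, or None when it is not a usable number."""
    if raw is None:
        return None
    text = raw.strip()
    if not text or text == "비공개":  # "비공개" means "not disclosed"
        return None
    try:
        return float(text)  # handles plain digits and forms such as 1.5E+09
    except ValueError:
        return None  # e.g. placeholder strings made of '#'


# The last two inputs are hypothetical illustrations.
examples = ["3000000000", "비공개", "0", "1.5E+09", "####"]
parsed = [parse_amount(x) for x in examples]
print(parsed)
```

Returning `None` for undisclosed values keeps them distinct from a genuine 0, which the table shows is itself a frequent value (504 rows).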

Most occurring characters

ValueCountFrequency (%)
0 15754
76.4%
1 963
 
4.7%
2 629
 
3.1%
5 556
 
2.7%
3 476
 
2.3%
4 418
 
2.0%
6 342
 
1.7%
7 338
 
1.6%
8 331
 
1.6%
9 266
 
1.3%
Other values (7) 544
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 20073
97.4%
Other Letter 363
 
1.8%
Other Punctuation 71
 
0.3%
Uppercase Letter 55
 
0.3%
Math Symbol 55
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 15754
78.5%
1 963
 
4.8%
2 629
 
3.1%
5 556
 
2.8%
3 476
 
2.4%
4 418
 
2.1%
6 342
 
1.7%
7 338
 
1.7%
8 331
 
1.6%
9 266
 
1.3%
Other Letter
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%
Other Punctuation
ValueCountFrequency (%)
. 55
77.5%
# 16
 
22.5%
Uppercase Letter
ValueCountFrequency (%)
E 55
100.0%
Math Symbol
ValueCountFrequency (%)
+ 55
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 20199
98.0%
Hangul 363
 
1.8%
Latin 55
 
0.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 15754
78.0%
1 963
 
4.8%
2 629
 
3.1%
5 556
 
2.8%
3 476
 
2.4%
4 418
 
2.1%
6 342
 
1.7%
7 338
 
1.7%
8 331
 
1.6%
9 266
 
1.3%
Other values (3) 126
 
0.6%
Hangul
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%
Latin
ValueCountFrequency (%)
E 55
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20254
98.2%
Hangul 363
 
1.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 15754
77.8%
1 963
 
4.8%
2 629
 
3.1%
5 556
 
2.7%
3 476
 
2.4%
4 418
 
2.1%
6 342
 
1.7%
7 338
 
1.7%
8 331
 
1.6%
9 266
 
1.3%
Other values (4) 181
 
0.9%
Hangul
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%

수출액
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 40
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA>
9218 
0
 
619
비공개
 
121
64461300
 
2
80000000
 
2
Other values (35)
 
38

Length

Max length11
Median length4
Mean length3.8227
Min length1

Unique

Unique32 ?
Unique (%)0.3%

Sample

1st row<NA>
2nd row<NA>
3rd row<NA>
4th row<NA>
5th row<NA>

Common Values

ValueCountFrequency (%)
<NA> 9218
92.2%
0 619
 
6.2%
비공개 121
 
1.2%
64461300 2
 
< 0.1%
80000000 2
 
< 0.1%
1540000000 2
 
< 0.1%
486016512 2
 
< 0.1%
200000000 2
 
< 0.1%
599000000 1
 
< 0.1%
22561018 1
 
< 0.1%
Other values (30) 30
 
0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
na 9218
92.2%
0 619
 
6.2%
비공개 121
 
1.2%
64461300 2
 
< 0.1%
80000000 2
 
< 0.1%
1540000000 2
 
< 0.1%
486016512 2
 
< 0.1%
200000000 2
 
< 0.1%
25000000000 1
 
< 0.1%
356132309 1
 
< 0.1%
Other values (30) 30
 
0.3%

종업원수
Text

Distinct: 285
Distinct (%): 2.9%
Missing: 28
Missing (%): 0.3%
Memory size: 156.2 KiB

Length

Max length16
Median length1
Mean length1.3822704
Min length1

Characters and Unicode

Total characters13784
Distinct characters14
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique140 ?
Unique (%)1.4%

Sample

1st row5
2nd row0
3rd row14
4th row8
5th row0
ValueCountFrequency (%)
0 2412
24.2%
5 685
 
6.9%
4 636
 
6.4%
6 513
 
5.1%
3 506
 
5.1%
7 462
 
4.6%
8 437
 
4.4%
10 434
 
4.4%
2 377
 
3.8%
9 345
 
3.5%
Other values (275) 3165
31.7%

Most occurring characters

ValueCountFrequency (%)
0 3425
24.8%
1 2439
17.7%
2 1366
 
9.9%
5 1166
 
8.5%
3 1131
 
8.2%
4 1096
 
8.0%
6 831
 
6.0%
7 720
 
5.2%
8 701
 
5.1%
9 530
 
3.8%
Other values (4) 379
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 13405
97.3%
Other Letter 363
 
2.6%
Other Punctuation 16
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 3425
25.6%
1 2439
18.2%
2 1366
 
10.2%
5 1166
 
8.7%
3 1131
 
8.4%
4 1096
 
8.2%
6 831
 
6.2%
7 720
 
5.4%
8 701
 
5.2%
9 530
 
4.0%
Other Letter
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%
Other Punctuation
ValueCountFrequency (%)
# 16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13421
97.4%
Hangul 363
 
2.6%

Most frequent character per script

Common
ValueCountFrequency (%)
0 3425
25.5%
1 2439
18.2%
2 1366
 
10.2%
5 1166
 
8.7%
3 1131
 
8.4%
4 1096
 
8.2%
6 831
 
6.2%
7 720
 
5.4%
8 701
 
5.2%
9 530
 
3.9%
Hangul
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13421
97.4%
Hangul 363
 
2.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 3425
25.5%
1 2439
18.2%
2 1366
 
10.2%
5 1166
 
8.7%
3 1131
 
8.4%
4 1096
 
8.2%
6 831
 
6.2%
7 720
 
5.4%
8 701
 
5.2%
9 530
 
3.9%
Hangul
ValueCountFrequency (%)
121
33.3%
121
33.3%
121
33.3%

표준산업분류
Text

MISSING 

Distinct: 449
Distinct (%): 4.7%
Missing: 518
Missing (%): 5.2%
Memory size: 156.2 KiB

Length

Max length254
Median length63
Mean length13.238346
Min length3

Characters and Unicode

Total characters125526
Distinct characters305
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks4 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique194 ?
Unique (%)2.0%

Sample

1st row폐기물처리 및 오염방지시설 건설업
2nd row사업시설 및 산업용품 청소업
3rd row철근 및 철근콘크리트 공사업
4th row전자기 측정/ 시험 및 분석기구 제조업
5th row하수 처리업
ValueCountFrequency (%)
4428
 
12.7%
제조업 2926
 
8.4%
공사업 2473
 
7.1%
기타 1688
 
4.9%
건설업 1301
 
3.7%
토목시설물 798
 
2.3%
배관 672
 
1.9%
냉·난방 666
 
1.9%
액체 666
 
1.9%
철근 647
 
1.9%
Other values (701) 18476
53.2%

Most occurring characters

ValueCountFrequency (%)
25259
20.1%
9982
 
8.0%
5642
 
4.5%
4428
 
3.5%
4282
 
3.4%
3655
 
2.9%
3639
 
2.9%
3537
 
2.8%
3488
 
2.8%
2569
 
2.0%
Other values (295) 59045
47.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 96900
77.2%
Space Separator 25259
 
20.1%
Other Punctuation 2159
 
1.7%
Decimal Number 613
 
0.5%
Math Symbol 289
 
0.2%
Close Punctuation 153
 
0.1%
Open Punctuation 153
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9982
 
10.3%
5642
 
5.8%
4428
 
4.6%
4282
 
4.4%
3655
 
3.8%
3639
 
3.8%
3537
 
3.7%
3488
 
3.6%
2569
 
2.7%
2205
 
2.3%
Other values (277) 53473
55.2%
Decimal Number
ValueCountFrequency (%)
4 141
23.0%
3 136
22.2%
1 117
19.1%
0 112
18.3%
7 40
 
6.5%
2 17
 
2.8%
5 17
 
2.8%
6 16
 
2.6%
8 10
 
1.6%
9 7
 
1.1%
Other Punctuation
ValueCountFrequency (%)
/ 1458
67.5%
· 698
32.3%
; 3
 
0.1%
Math Symbol
ValueCountFrequency (%)
~ 151
52.2%
| 138
47.8%
Space Separator
ValueCountFrequency (%)
25259
100.0%
Close Punctuation
ValueCountFrequency (%)
) 153
100.0%
Open Punctuation
ValueCountFrequency (%)
( 153
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 96900
77.2%
Common 28626
 
22.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9982
 
10.3%
5642
 
5.8%
4428
 
4.6%
4282
 
4.4%
3655
 
3.8%
3639
 
3.8%
3537
 
3.7%
3488
 
3.6%
2569
 
2.7%
2205
 
2.3%
Other values (277) 53473
55.2%
Common
ValueCountFrequency (%)
25259
88.2%
/ 1458
 
5.1%
· 698
 
2.4%
) 153
 
0.5%
( 153
 
0.5%
~ 151
 
0.5%
4 141
 
0.5%
| 138
 
0.5%
3 136
 
0.5%
1 117
 
0.4%
Other values (8) 222
 
0.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 96894 | 77.2%
ASCII | 27928 | 22.2%
None | 698 | 0.6%
Compat Jamo | 6 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
25259
90.4%
/ 1458
 
5.2%
) 153
 
0.5%
( 153
 
0.5%
~ 151
 
0.5%
4 141
 
0.5%
| 138
 
0.5%
3 136
 
0.5%
1 117
 
0.4%
0 112
 
0.4%
Other values (7) 110
 
0.4%
Hangul
ValueCountFrequency (%)
9982
 
10.3%
5642
 
5.8%
4428
 
4.6%
4282
 
4.4%
3655
 
3.8%
3639
 
3.8%
3537
 
3.7%
3488
 
3.6%
2569
 
2.7%
2205
 
2.3%
Other values (276) 53467
55.2%
None
ValueCountFrequency (%)
· 698
100.0%
Compat Jamo
ValueCountFrequency (%)
6
100.0%

물산업대분류
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
건설 시공 | 4209
장치 | 3176
시설유지보수 | 697
<NA> | 519
설계/ 컨설팅 및 진단 | 455
Other values (11) | 944

Length

Max length: 13
Median length: 12
Mean length: 4.4474
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 건설 시공
2nd row: 정화 및 청소
3rd row: 건설 시공
4th row: 장치
5th row: 시설 운영

Common Values

Value | Count | Frequency (%)
건설 시공 | 4209 | 42.1%
장치 | 3176 | 31.8%
시설유지보수 | 697 | 7.0%
<NA> | 519 | 5.2%
설계/ 컨설팅 및 진단 | 455 | 4.5%
시설 운영 | 308 | 3.1%
약품 | 234 | 2.3%
정화 및 청소 | 182 | 1.8%
먹는샘물 및 정수기 | 145 | 1.5%
설계/컨설팅 및 진단 | 31 | 0.3%
Other values (6) | 44 | 0.4%
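
The common values for 물산업대분류 list "설계/ 컨설팅 및 진단" (455) and "설계/컨설팅 및 진단" (31) as separate categories, although the two labels differ only by a space after the slash. The following is a minimal sketch of how such variants might be merged before profiling. The helper names `normalize_label` and `merged_counts` are hypothetical, and the normalization rule (dropping spaces around "/") is an assumption about which labels were meant to be the same.

```python
import re
from collections import Counter

def normalize_label(label):
    """Collapse repeated whitespace and drop spaces around '/'."""
    collapsed = re.sub(r"\s+", " ", label.strip())
    return re.sub(r"\s*/\s*", "/", collapsed)

def merged_counts(counts):
    """Sum the counts of labels that share the same normalized form."""
    merged = Counter()
    for label, count in counts.items():
        merged[normalize_label(label)] += count
    return merged

# The two counts are taken from the common-values table above.
reported = {"설계/ 컨설팅 및 진단": 455, "설계/컨설팅 및 진단": 31}
print(merged_counts(reported))  # Counter({'설계/컨설팅 및 진단': 486})
```

The merged total of 486 agrees with the count reported for the token 진단 in the word table for this variable.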

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
건설 | 4209 | 25.3%
시공 | 4209 | 25.3%
장치 | 3176 | 19.1%
(not preserved) | 823 | 4.9%
시설유지보수 | 697 | 4.2%
na | 519 | 3.1%
진단 | 486 | 2.9%
설계 | 455 | 2.7%
컨설팅 | 455 | 2.7%
시설 | 308 | 1.9%
Other values (13) | 1290 | 7.8%

물산업중분류
Categorical

HIGH CORRELATION 

Distinct: 41
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
상하수도 건설시공 | 3601
밸브(Valve) | 625
상하수도 시설 유지보수 | 587
펌프(Pump) | 586
<NA> | 519
Other values (36) | 4082

Length

Max length: 13
Median length: 12
Mean length: 7.9775
Min length: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 상하수도 건설시공
2nd row: 저수조 청소
3rd row: 상하수도 건설시공
4th row: 계측기
5th row: 상하수도 시설 운영

Common Values

Value | Count | Frequency (%)
상하수도 건설시공 | 3601 | 36.0%
밸브(Valve) | 625 | 6.2%
상하수도 시설 유지보수 | 587 | 5.9%
펌프(Pump) | 586 | 5.9%
<NA> | 519 | 5.2%
여과장치 | 516 | 5.2%
감시/제어/주입시스템 | 391 | 3.9%
물에너지 건설시공 | 336 | 3.4%
수자원 건설시공 | 282 | 2.8%
계측기 | 282 | 2.8%
Other values (31) | 2275 | 22.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
상하수도 | 4440 | 25.7%
건설시공 | 4219 | 24.4%
시설 | 1068 | 6.2%
유지보수 | 696 | 4.0%
밸브(valve | 625 | 3.6%
펌프(pump | 586 | 3.4%
na | 519 | 3.0%
여과장치 | 516 | 3.0%
물에너지 | 455 | 2.6%
감시/제어/주입시스템 | 391 | 2.3%
Other values (39) | 3793 | 21.9%

물산업분류
Text

MISSING 

Distinct: 80
Distinct (%): 0.8%
Missing: 519
Missing (%): 5.2%
Memory size: 156.2 KiB

Length

Max length: 30
Median length: 21
Mean length: 7.6224027
Min length: 2

Characters and Unicode

Total characters: 72268
Distinct characters: 168
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
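
As a rough illustration of how per-category character counts like those in this section can be derived, Python's standard `unicodedata` module returns the general category of each character (for example `Lo` for Other Letter, `Ps` for Open Punctuation). This is a sketch and not ydata-profiling's own implementation; `category_counts` is a hypothetical helper, and treating `None` as a missing value is an assumption.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category, skipping missing values."""
    counts = Counter()
    for value in values:
        if value is None:  # missing cells contribute no characters
            continue
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Example with one value from the sample rows and one missing cell.
counts = category_counts(["밸브(Valve)", None])
for category, count in counts.most_common():
    print(category, count)
```

The two-letter codes correspond to the long names used in the report: `Lo` is Other Letter, `Ll` Lowercase Letter, `Lu` Uppercase Letter, `Ps` Open Punctuation, and `Pe` Close Punctuation.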

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 수질오염방지시설건설
2nd row: 저수조 청소
3rd row: 상하수도공사
4th row: 수질측정기
5th row: 하수도시설 운영

Value | Count | Frequency (%)
상하수도공사 | 3378 | 25.3%
유지관리 | 697 | 5.2%
시설 | 633 | 4.7%
밸브(valve | 625 | 4.7%
상하수도 | 587 | 4.4%
펌프(pump | 586 | 4.4%
기타 | 517 | 3.9%
감시/제어/주입시스템 | 391 | 2.9%
물에너지설비공사 | 319 | 2.4%
(not preserved) | 296 | 2.2%
Other values (90) | 5317 | 39.8%

Most occurring characters

ValueCountFrequency (%)
6233
 
8.6%
4947
 
6.8%
4656
 
6.4%
4304
 
6.0%
4008
 
5.5%
3996
 
5.5%
3865
 
5.3%
2634
 
3.6%
2626
 
3.6%
1702
 
2.4%
Other values (158) 33297
46.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 58819 | 81.4%
Lowercase Letter | 4501 | 6.2%
Space Separator | 3865 | 5.3%
Uppercase Letter | 1597 | 2.2%
Close Punctuation | 1296 | 1.8%
Open Punctuation | 1296 | 1.8%
Other Punctuation | 889 | 1.2%
Dash Punctuation | 5 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6233
 
10.6%
4947
 
8.4%
4656
 
7.9%
4304
 
7.3%
4008
 
6.8%
3996
 
6.8%
2634
 
4.5%
2626
 
4.5%
1702
 
2.9%
1333
 
2.3%
Other values (133) 22380
38.0%
Lowercase Letter
ValueCountFrequency (%)
p 673
15.0%
e 664
14.8%
a 656
14.6%
l 625
13.9%
v 625
13.9%
u 586
13.0%
m 586
13.0%
d 31
 
0.7%
i 31
 
0.7%
z 8
 
0.2%
Other values (2) 16
 
0.4%
Uppercase Letter
ValueCountFrequency (%)
V 694
43.5%
P 586
36.7%
C 122
 
7.6%
H 87
 
5.4%
T 61
 
3.8%
M 31
 
1.9%
U 8
 
0.5%
O 8
 
0.5%
Space Separator
ValueCountFrequency (%)
3865
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1296
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1296
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 889
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 58819 | 81.4%
Common | 7351 | 10.2%
Latin | 6098 | 8.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6233
 
10.6%
4947
 
8.4%
4656
 
7.9%
4304
 
7.3%
4008
 
6.8%
3996
 
6.8%
2634
 
4.5%
2626
 
4.5%
1702
 
2.9%
1333
 
2.3%
Other values (133) 22380
38.0%
Latin
ValueCountFrequency (%)
V 694
11.4%
p 673
11.0%
e 664
10.9%
a 656
10.8%
l 625
10.2%
v 625
10.2%
u 586
9.6%
P 586
9.6%
m 586
9.6%
C 122
 
2.0%
Other values (10) 281
4.6%
Common
ValueCountFrequency (%)
3865
52.6%
) 1296
 
17.6%
( 1296
 
17.6%
/ 889
 
12.1%
- 5
 
0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 58819 | 81.4%
ASCII | 13449 | 18.6%

Most frequent character per block

Hangul
ValueCountFrequency (%)
6233
 
10.6%
4947
 
8.4%
4656
 
7.9%
4304
 
7.3%
4008
 
6.8%
3996
 
6.8%
2634
 
4.5%
2626
 
4.5%
1702
 
2.9%
1333
 
2.3%
Other values (133) 22380
38.0%
ASCII
ValueCountFrequency (%)
3865
28.7%
) 1296
 
9.6%
( 1296
 
9.6%
/ 889
 
6.6%
V 694
 
5.2%
p 673
 
5.0%
e 664
 
4.9%
a 656
 
4.9%
l 625
 
4.6%
v 625
 
4.6%
Other values (15) 2166
16.1%

주요생산품
Text

MISSING 

Distinct: 6580
Distinct (%): 73.7%
Missing: 1067
Missing (%): 10.7%
Memory size: 156.2 KiB

Length

Max length: 145
Median length: 76
Mean length: 22.279301
Min length: 1

Characters and Unicode

Total characters: 199021
Distinct characters: 706
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6167
Unique (%): 69.0%

Sample

1st row: 폐수처리시설시공/상하수도공사
2nd row: 저수조청소/청소용역
3rd row: 철근콘크리트공사/금속구조물창호공사/토목공사/토공사/상하수도공사/인테리어공사/석공사/철물공사/주택신축/판매/부동산 임대/매매
4th row: 수질측정기
5th row: 하수 처리

Value | Count | Frequency (%)
제조 | 1426 | 8.6%
도소매 | 373 | 2.2%
상하수도공사 | 330 | 2.0%
임대 | 319 | 1.9%
도매 | 184 | 1.1%
제조/도매 | 168 | 1.0%
제조/도소매 | 161 | 1.0%
상하수도설비공사 | 119 | 0.7%
밸브 | 115 | 0.7%
철근콘크리트공사/상하수도공사 | 91 | 0.5%
Other values (8991) | 13306 | 80.2%

Most occurring characters

ValueCountFrequency (%)
/ 20649
 
10.4%
12787
 
6.4%
12501
 
6.3%
7684
 
3.9%
7202
 
3.6%
5512
 
2.8%
5240
 
2.6%
4851
 
2.4%
4636
 
2.3%
4211
 
2.1%
Other values (696) 113748
57.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 165442 | 83.1%
Other Punctuation | 20711 | 10.4%
Space Separator | 7684 | 3.9%
Uppercase Letter | 1833 | 0.9%
Open Punctuation | 1216 | 0.6%
Close Punctuation | 1213 | 0.6%
Lowercase Letter | 391 | 0.2%
Math Symbol | 340 | 0.2%
Decimal Number | 123 | 0.1%
Dash Punctuation | 67 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
12787
 
7.7%
12501
 
7.6%
7202
 
4.4%
5512
 
3.3%
5240
 
3.2%
4851
 
2.9%
4636
 
2.8%
4211
 
2.5%
4126
 
2.5%
3602
 
2.2%
Other values (625) 100774
60.9%
Uppercase Letter
ValueCountFrequency (%)
C 397
21.7%
T 195
10.6%
V 178
9.7%
P 146
 
8.0%
S 127
 
6.9%
E 103
 
5.6%
D 82
 
4.5%
A 73
 
4.0%
F 66
 
3.6%
R 64
 
3.5%
Other values (14) 402
21.9%
Lowercase Letter
ValueCountFrequency (%)
e 46
11.8%
t 38
9.7%
a 35
 
9.0%
i 30
 
7.7%
r 30
 
7.7%
o 29
 
7.4%
s 27
 
6.9%
l 24
 
6.1%
n 20
 
5.1%
m 18
 
4.6%
Other values (14) 94
24.0%
Decimal Number
ValueCountFrequency (%)
0 63
51.2%
2 15
 
12.2%
3 14
 
11.4%
1 13
 
10.6%
5 6
 
4.9%
4 4
 
3.3%
7 4
 
3.3%
6 2
 
1.6%
9 1
 
0.8%
8 1
 
0.8%
Other Punctuation
ValueCountFrequency (%)
/ 20649
99.7%
. 44
 
0.2%
· 7
 
< 0.1%
: 4
 
< 0.1%
' 3
 
< 0.1%
& 2
 
< 0.1%
% 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
7684
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1216
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1213
100.0%
Math Symbol
ValueCountFrequency (%)
| 340
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 67
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 165442 | 83.1%
Common | 31355 | 15.8%
Latin | 2224 | 1.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
12787
 
7.7%
12501
 
7.6%
7202
 
4.4%
5512
 
3.3%
5240
 
3.2%
4851
 
2.9%
4636
 
2.8%
4211
 
2.5%
4126
 
2.5%
3602
 
2.2%
Other values (625) 100774
60.9%
Latin
ValueCountFrequency (%)
C 397
17.9%
T 195
 
8.8%
V 178
 
8.0%
P 146
 
6.6%
S 127
 
5.7%
E 103
 
4.6%
D 82
 
3.7%
A 73
 
3.3%
F 66
 
3.0%
R 64
 
2.9%
Other values (38) 793
35.7%
Common
ValueCountFrequency (%)
/ 20649
65.9%
7684
 
24.5%
( 1216
 
3.9%
) 1213
 
3.9%
| 340
 
1.1%
- 67
 
0.2%
0 63
 
0.2%
. 44
 
0.1%
2 15
 
< 0.1%
3 14
 
< 0.1%
Other values (13) 50
 
0.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 165442 | 83.1%
ASCII | 33572 | 16.9%
None | 7 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 20649
61.5%
7684
 
22.9%
( 1216
 
3.6%
) 1213
 
3.6%
C 397
 
1.2%
| 340
 
1.0%
T 195
 
0.6%
V 178
 
0.5%
P 146
 
0.4%
S 127
 
0.4%
Other values (60) 1427
 
4.3%
Hangul
ValueCountFrequency (%)
12787
 
7.7%
12501
 
7.6%
7202
 
4.4%
5512
 
3.3%
5240
 
3.2%
4851
 
2.9%
4636
 
2.8%
4211
 
2.5%
4126
 
2.5%
3602
 
2.2%
Other values (625) 100774
60.9%
None
ValueCountFrequency (%)
· 7
100.0%

Interactions

[Nine interaction plots; images not preserved in this text version.]

Correlations

Variable | 기업분류 | 사업자등록번호(화면에는안보임) | 정보공개동의여부 | 기업규모 | 자본금 | 수출액 | 물산업대분류 | 물산업중분류 | 물산업분류
기업분류 | 1.000 | 0.304 | 0.560 | 0.147 | 0.000 | NaN | 0.418 | 0.368 | 0.561
사업자등록번호(화면에는안보임) | 0.304 | 1.000 | 0.038 | 0.424 | 0.000 | 0.000 | 0.295 | 0.340 | 0.375
정보공개동의여부 | 0.560 | 0.038 | 1.000 | 0.066 | 0.072 | 1.000 | 0.105 | 0.103 | 0.244
기업규모 | 0.147 | 0.424 | 0.066 | 1.000 | 0.000 | 0.000 | 0.215 | 0.399 | 0.473
자본금 | 0.000 | 0.000 | 0.072 | 0.000 | 1.000 | 0.000 | NaN | NaN | NaN
수출액 | NaN | 0.000 | 1.000 | 0.000 | 0.000 | 1.000 | 0.357 | 0.000 | 0.624
물산업대분류 | 0.418 | 0.295 | 0.105 | 0.215 | NaN | 0.357 | 1.000 | 0.973 | 0.977
물산업중분류 | 0.368 | 0.340 | 0.103 | 0.399 | NaN | 0.000 | 0.973 | 1.000 | 1.000
물산업분류 | 0.561 | 0.375 | 0.244 | 0.473 | NaN | 0.624 | 0.977 | 1.000 | 1.000

Variable | 물산업중분류 | 기업분류 | 정보공개동의여부 | 수출액 | 물산업대분류
물산업중분류 | 1.000 | 0.292 | 0.082 | 0.000 | 0.758
기업분류 | 0.292 | 1.000 | 0.378 | 1.000 | 0.381
정보공개동의여부 | 0.082 | 0.378 | 1.000 | 0.976 | 0.096
수출액 | 0.000 | 1.000 | 0.976 | 1.000 | 0.135
물산업대분류 | 0.758 | 0.381 | 0.096 | 0.135 | 1.000

Variable | 사업자등록번호(화면에는안보임) | 기업규모 | 자본금 | 기업분류 | 정보공개동의여부 | 수출액 | 물산업대분류 | 물산업중분류
사업자등록번호(화면에는안보임) | 1.000 | 0.009 | -0.031 | 0.233 | 0.029 | 0.000 | 0.114 | 0.115
기업규모 | 0.009 | 1.000 | 0.138 | 0.261 | 0.052 | 0.000 | 0.114 | 0.157
자본금 | -0.031 | 0.138 | 1.000 | 0.231 | 0.253 | 0.191 | 0.260 | 0.157
기업분류 | 0.233 | 0.261 | 0.231 | 1.000 | 0.378 | 1.000 | 0.381 | 0.292
정보공개동의여부 | 0.029 | 0.052 | 0.253 | 0.378 | 1.000 | 0.976 | 0.096 | 0.082
수출액 | 0.000 | 0.000 | 0.191 | 1.000 | 0.976 | 1.000 | 0.135 | 0.000
물산업대분류 | 0.114 | 0.114 | 0.260 | 0.381 | 0.096 | 0.135 | 1.000 | 0.758
물산업중분류 | 0.115 | 0.157 | 0.157 | 0.292 | 0.082 | 0.000 | 0.758 | 1.000
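
This text version does not preserve which coefficient each of the correlation matrices reports. For pairs of categorical variables such as 물산업대분류 and 물산업중분류, one widely used association measure is Cramér's V, which rescales the chi-squared statistic of the contingency table to the range 0 to 1. The sketch below implements the uncorrected form in plain Python. `cramers_v` is a hypothetical helper, and it is not claimed to reproduce the figures above, which may have been computed with a bias-corrected or different statistic.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length sequences of labels."""
    pairs = list(zip(xs, ys))
    n = len(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return float("nan")  # undefined when a variable has a single level
    observed = Counter(pairs)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed.get((r, c), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))

# Toy labels (not taken from the dataset): the first pair is perfectly
# associated, the second pair is independent.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # 0.0
```

A value close to 1, such as the 0.973 reported between 물산업대분류 and 물산업중분류 in the first matrix, indicates that one variable is almost fully determined by the other, which is expected when one is a finer subdivision of the other.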

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
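
A nullity correlation of this kind can be sketched as the Pearson correlation between two presence indicators (1 where a value is present, 0 where it is missing). The code below is a minimal plain-Python illustration. `nullity_correlation` is a hypothetical helper, and using `None` to represent a missing cell is an assumption made for the example.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a fully present or fully missing column
    return cov / sqrt(var_a * var_b)

# Toy columns (not taken from the dataset).
together = nullity_correlation([1, None, 3, None], ["x", None, "y", None])
opposite = nullity_correlation([1, None, 3, None], [None, "x", None, "y"])
print(together, opposite)  # 1.0 -1.0
```

A value of 1 means the two columns are always missing together, and -1 means one is present exactly when the other is missing. Columns with no missing values at all have no variance in their indicator, so the correlation is undefined for them.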

Sample

기업분류기업아이디기업명기업영문명사업자등록번호(화면에는안보임)사업자등록번호정보공개동의여부설립일주소기업규모자본금매출액수출액종업원수표준산업분류물산업대분류물산업중분류물산업분류주요생산품
7084D<NA>장백환경(주)<NA>31581344163158134416Y<NA>충북 청주시 상당구 무농정로 42/ 302호102020<NA><NA><NA>5폐기물처리 및 오염방지시설 건설업건설 시공상하수도 건설시공수질오염방지시설건설폐수처리시설시공/상하수도공사
17097D<NA>(주)중원디앤케이<NA>12781960381278196038Y<NA>경기 성남시 중원구 여수울로15번길 8-10102020<NA><NA><NA>0사업시설 및 산업용품 청소업정화 및 청소저수조 청소저수조 청소저수조청소/청소용역
5712D<NA>윤도건설(주)<NA>22481334642248133464Y2002-02-07강원 원주시 치악로 1316102020<NA><NA><NA>14철근 및 철근콘크리트 공사업건설 시공상하수도 건설시공상하수도공사철근콘크리트공사/금속구조물창호공사/토목공사/토공사/상하수도공사/인테리어공사/석공사/철물공사/주택신축/판매/부동산 임대/매매
12116D<NA>(주)정엔지니어링<NA>61081426886108142688Y<NA>부산 기장군102020<NA><NA><NA>8전자기 측정/ 시험 및 분석기구 제조업장치계측기수질측정기수질측정기
9032D<NA>(주)가나수질환경<NA>41781352544178135254Y<NA>전남 여수시 돌산읍 강남3길 18-20102020<NA><NA><NA>0하수 처리업시설 운영상하수도 시설 운영하수도시설 운영하수 처리
394D<NA>동은전기<NA>20622880952062288095Y2007-10-10서울 광진구 동일로58길 13102020<NA><NA><NA>0운송장비용 조명장치 제조업건설 시공물에너지 건설시공기타 물에너지설비공사<NA>
9271D<NA>(자)경희정화<NA>13881168971388116897Y<NA>경기 의왕시 내손순환로 72-2102020<NA><NA><NA>0폐수 처리업시설 운영상하수도 시설 운영하폐수재이용운영폐수 처리
16764D<NA>성수테크<NA>10902924561090292456Y1999-06-01경기 김포시 월곶면 용강로37번길 76-13102020<NA><NA><NA>0그외 기타 일반목적용 기계 제조업장치표준처리 장치 및 시스템표준처리 장치 및 시스템-
4476D<NA>명진건설(주)<NA>51481140215148114021Y<NA>대구 달성군 화원읍 명천로17길 82102020<NA>3000000000<NA>9토공사업건설 시공상하수도 건설시공상하수도공사토공사/철근콘크리트공사/상하수도공사
9189D<NA>에스알이엔씨(주)<NA>12186133351218613335Y<NA>서울 서초구 신반포로45길 74/ 501호102020<NA><NA><NA>13하수 처리업시설 운영상하수도 시설 운영하수도시설 운영하수시설관리/운영/폐수처리
기업분류기업아이디기업명기업영문명사업자등록번호(화면에는안보임)사업자등록번호정보공개동의여부설립일주소기업규모자본금매출액수출액종업원수표준산업분류물산업대분류물산업중분류물산업분류주요생산품
14357D<NA>인피니티에너지(주)<NA>10587943711058794371Y2013-11-04서울 금천구 가산디지털1로 196/ 1202호102020<NA><NA><NA>9전동기 및 발전기 제조업장치수상태양광설비수상태양광 전용판넬태양광발전설비 제조/공사/신재생에너지사업/에너지절약사업 컨설팅/자문
2279D<NA>(주)신방건설<NA>41581296784158129678Y<NA>전남 해남군 현산면 현산북평로 333-25102020<NA><NA><NA>3철근 및 철근콘크리트 공사업건설 시공상하수도 건설시공상하수도공사철근콘크리트공사/토공사/상하수도공사
4439D<NA>만보엔지니어링(주)<NA>10881192671088119267Y<NA>서울 영등포구 대방천로 138102020<NA><NA><NA>3배관 및 냉·난방 공사업건설 시공상하수도 건설시공상하수도공사상하수도공사
9048Mdoohyun4312(주)두현이엔씨doo-hyun E&C31081068033108106803Y1994-09-12충남 홍성군 홍성읍 의사로72번길 30-141020201000000000231380000000278하수/ 폐수 및 분뇨 처리업시설 운영상하수도 시설 운영하수도시설 운영하수처리공정 시설물
17129D<NA>삼진공사<NA>61015815036101581503Y<NA>울산 남구 신화로 111102020<NA>1000000000<NA>11사업시설 및 산업용품 청소업정화 및 청소저수조 청소저수조 청소물탱크/정화조 청소대행/지정폐기물 수집운반/배관설비공사/상하수도공사
8490D<NA>(주)엠케이에스<NA>42987004824298700482Y<NA>광주 광산구 하남산단3번로 138 B동 207호(장덕동)102020<NA><NA><NA>0항행용 무선기기 및 측량기구 제조업설계/ 컨설팅 및 진단조사 및 측량조사업(CCTV/ 누수탐사 등)<NA>
17766MUOS2022서울시립대학교university of seoul20482072562048207256Y1978-01-01서울특별시 동대문구 서울시립대로 163102026<NA>000교육 서비스업(85)<NA><NA><NA><NA>
16279Mshinhoent신호이앤티 주식회사SHINHO E&T Co./ Ltd.61081735186108173518Y2004-08-06부산광역시 기장군 기장읍 읍내로15번길 910202050000000000측정/ 시험/ 항해/ 제어 및 기타 정밀 기기 제조업|기술 시험/ 검사 및 분석업|기계장비 및 관련 물품 도매업장치펌프(Pump)펌프(Pump)계측기기 및 산업용측정장비|산업용 기계 및 부품|진동 및 산업용 센서
13448D<NA>디에이에스<NA>60814650136081465013Y<NA>경남 마산시 봉암동 391-166102020<NA><NA><NA>1탭/ 밸브 및 유사장치 제조업장치밸브(Valve)밸브(Valve)탭/ 샤프트 외기계공구
17303Mdkeng00(주)디케이엠엔이<NA>21881166452188116645Y1995-07-15서울특별시 성동구 아차산로 84102020<NA>000<NA><NA><NA><NA><NA>