Overview

Dataset statistics

Number of variables: 7
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.7 KiB
Average record size in memory: 58.3 B

Variable types

Numeric: 1
Categorical: 3
Text: 3

Dataset

Description: Sample
Author: 경기대학교 빅데이터센터 (Kyonggi University Big Data Center)
URL: https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KGUENTRPRSINFO000001

Alerts

업종대분류코드 is highly overall correlated with 업종대분류명 (High correlation)
업종대분류명 is highly overall correlated with 업종대분류코드 (High correlation)
기업ID has unique values (Unique)

Reproduction

Analysis started: 2023-12-10 06:46:12.320696
Analysis finished: 2023-12-10 06:46:13.786481
Duration: 1.47 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
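
A report like this one can presumably be regenerated with a short script along the following lines. This is a minimal sketch, not the script that produced this report: the file names in the usage comment are hypothetical placeholders, and it assumes pandas and ydata-profiling are installed. The `dataset` metadata is copied from the Dataset block in the overview.

```python
# Metadata copied from the "Dataset" block of this report.
DATASET_META = {
    "description": "Sample",
    "author": "경기대학교 빅데이터센터",
    "url": "https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KGUENTRPRSINFO000001",
}


def build_report(csv_path: str, output_html: str) -> None:
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so this module loads even without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, dataset=DATASET_META)
    report.to_file(output_html)


# Example call (hypothetical paths):
# build_report("enterprise_sample.csv", "report.html")
```

The exact settings used for this run are in the downloadable config.json and are not reproduced here.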

Variables

기업ID
Real number (ℝ)

UNIQUE 

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.5
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5.95
Q1: 25.75
Median: 50.5
Q3: 75.25
95-th percentile: 95.05
Maximum: 100
Range: 99
Interquartile range (IQR): 49.5

Descriptive statistics

Standard deviation: 29.011492
Coefficient of variation (CV): 0.57448499
Kurtosis: -1.2
Mean: 50.5
Median Absolute Deviation (MAD): 25
Skewness: 0
Sum: 5050
Variance: 841.66667
Monotonicity: Strictly increasing
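
The figures above are consistent with 기업ID being a plain running index. As a sanity check, the sketch below recomputes several of them with the Python standard library, assuming the column holds exactly the integers 1 to 100 (an assumption inferred from the minimum, maximum, distinct count and strict monotonicity). The `percentile` helper is illustrative and uses linear interpolation between neighbouring values.

```python
import statistics

# Assumption: the column is exactly 1, 2, ..., 100.
ids = list(range(1, 101))

mean = statistics.mean(ids)
variance = statistics.variance(ids)   # sample variance (n - 1 denominator)
std = statistics.stdev(ids)
cv = std / mean                       # coefficient of variation
median = statistics.median(ids)
mad = statistics.median(abs(x - median) for x in ids)  # median absolute deviation


def percentile(values, q):
    """Linear-interpolation percentile for sorted values, with q in [0, 1]."""
    pos = q * (len(values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(values) - 1)
    return values[lo] + (values[hi] - values[lo]) * (pos - lo)


q1, q3 = percentile(ids, 0.25), percentile(ids, 0.75)
print(mean, round(variance, 5), round(std, 6), round(cv, 8), mad)
print(round(percentile(ids, 0.05), 2), q1, q3, q3 - q1)
```

The variance of 841.66667 only matches when the n - 1 (sample) denominator is used, which suggests the report computes sample rather than population statistics.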
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1 | 1.0%
65 | 1 | 1.0%
75 | 1 | 1.0%
74 | 1 | 1.0%
73 | 1 | 1.0%
72 | 1 | 1.0%
71 | 1 | 1.0%
70 | 1 | 1.0%
69 | 1 | 1.0%
68 | 1 | 1.0%
Other values (90) | 90 | 90.0%

Value | Count | Frequency (%)
1 | 1 | 1.0%
2 | 1 | 1.0%
3 | 1 | 1.0%
4 | 1 | 1.0%
5 | 1 | 1.0%
6 | 1 | 1.0%
7 | 1 | 1.0%
8 | 1 | 1.0%
9 | 1 | 1.0%
10 | 1 | 1.0%

Value | Count | Frequency (%)
100 | 1 | 1.0%
99 | 1 | 1.0%
98 | 1 | 1.0%
97 | 1 | 1.0%
96 | 1 | 1.0%
95 | 1 | 1.0%
94 | 1 | 1.0%
93 | 1 | 1.0%
92 | 1 | 1.0%
91 | 1 | 1.0%

업종대분류명
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 23
Median length: 3
Mean length: 5.7
Min length: 3

Unique

Unique: 4
Unique (%): 4.0%

Sample

1st row: 도매 및 소매업
2nd row: 도매 및 소매업
3rd row: 도매 및 소매업
4th row: 제조업
5th row: 전문. 과학 및 기술 서비스업

Common Values

Value | Count | Frequency (%)
제조업 | 61 | 61.0%
전문. 과학 및 기술 서비스업 | 14 | 14.0%
도매 및 소매업 | 9 | 9.0%
건설업 | 8 | 8.0%
정보통신업 | 2 | 2.0%
소매업 | 2 | 2.0%
교육 서비스업 | 1 | 1.0%
제조업/도매 및 소매업 | 1 | 1.0%
도매 및 소매업 | 1 | 1.0%
수도. 하수 및 폐기물 처리. 원료 재생업 | 1 | 1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
제조업 | 61 | 33.0%
및 | 26 | 14.1%
서비스업 | 15 | 8.1%
전문 | 14 | 7.6%
과학 | 14 | 7.6%
기술 | 14 | 7.6%
소매업 | 13 | 7.0%
도매 | 10 | 5.4%
건설업 | 8 | 4.3%
정보통신업 | 2 | 1.1%
Other values (8) | 8 | 4.3%

업종대분류코드
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 3
Median length: 1
Mean length: 1.02
Min length: 1

Unique

Unique: 3
Unique (%): 3.0%

Sample

1st row: G
2nd row: G
3rd row: G
4th row: C
5th row: M

Common Values

Value | Count | Frequency (%)
C | 61 | 61.0%
M | 14 | 14.0%
G | 12 | 12.0%
F | 8 | 8.0%
J | 2 | 2.0%
P | 1 | 1.0%
C/G | 1 | 1.0%
E | 1 | 1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
c | 61 | 61.0%
m | 14 | 14.0%
g | 12 | 12.0%
f | 8 | 8.0%
j | 2 | 2.0%
p | 1 | 1.0%
c/g | 1 | 1.0%
e | 1 | 1.0%

세부업종명
Text

Distinct: 79
Distinct (%): 79.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 27
Median length: 18
Mean length: 6.2
Min length: 2

Characters and Unicode

Total characters: 620
Distinct characters: 165
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 73
Unique (%): 73.0%

Sample

1st row: 단미사료/식품판매/축산물/조미료/사업경영및관리자문
2nd row: 디자인개발
3rd row: \N
4th row: 조명기구
5th row: 조경
Value | Count | Frequency (%)
it | 9 | 8.3%
자동차부품 | 6 | 5.5%
n | 5 | 4.6%
금형 | 3 | 2.8%
산업디자인 | 2 | 1.8%
수질 | 2 | 1.8%
자문 | 2 | 1.8%
단미사료/식품판매/축산물/조미료/사업경영및관리자문 | 1 | 0.9%
환경설비 | 1 | 0.9%
건설업재료/계측기 | 1 | 0.9%
Other values (77) | 77 | 70.6%

Most occurring characters

ValueCountFrequency (%)
/ 36
 
5.8%
33
 
5.3%
23
 
3.7%
21
 
3.4%
16
 
2.6%
13
 
2.1%
13
 
2.1%
12
 
1.9%
11
 
1.8%
11
 
1.8%
Other values (155) 431
69.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 547 | 88.2%
Other Punctuation | 41 | 6.6%
Uppercase Letter | 23 | 3.7%
Space Separator | 9 | 1.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
33
 
6.0%
23
 
4.2%
21
 
3.8%
16
 
2.9%
13
 
2.4%
13
 
2.4%
12
 
2.2%
11
 
2.0%
11
 
2.0%
10
 
1.8%
Other values (149) 384
70.2%
Uppercase Letter
ValueCountFrequency (%)
T 9
39.1%
I 9
39.1%
N 5
21.7%
Other Punctuation
ValueCountFrequency (%)
/ 36
87.8%
\ 5
 
12.2%
Space Separator
ValueCountFrequency (%)
9
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 547 | 88.2%
Common | 50 | 8.1%
Latin | 23 | 3.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
33
 
6.0%
23
 
4.2%
21
 
3.8%
16
 
2.9%
13
 
2.4%
13
 
2.4%
12
 
2.2%
11
 
2.0%
11
 
2.0%
10
 
1.8%
Other values (149) 384
70.2%
Common
ValueCountFrequency (%)
/ 36
72.0%
9
 
18.0%
\ 5
 
10.0%
Latin
ValueCountFrequency (%)
T 9
39.1%
I 9
39.1%
N 5
21.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 547 | 88.2%
ASCII | 73 | 11.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
/ 36
49.3%
T 9
 
12.3%
9
 
12.3%
I 9
 
12.3%
\ 5
 
6.8%
N 5
 
6.8%
Hangul
ValueCountFrequency (%)
33
 
6.0%
23
 
4.2%
21
 
3.8%
16
 
2.9%
13
 
2.4%
13
 
2.4%
12
 
2.2%
11
 
2.0%
11
 
2.0%
10
 
1.8%
Other values (149) 384
70.2%

시군구명
Categorical

Distinct: 42
Distinct (%): 42.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 10
Median length: 7
Mean length: 7.46
Min length: 5

Unique

Unique: 27
Unique (%): 27.0%

Sample

1st row: 서울특별시 송파구
2nd row: 경기도 화성시
3rd row: 경기도 성남시
4th row: 경기도 화성시
5th row: 경기도 성남시

Common Values

Value | Count | Frequency (%)
경기도 화성시 | 14 | 14.0%
경기도 수원시 | 9 | 9.0%
경기도 시흥시 | 7 | 7.0%
경기도 부천시 | 5 | 5.0%
경기도 성남시 | 5 | 5.0%
경기도 안산시 | 5 | 5.0%
경기도 용인시 | 4 | 4.0%
경기도 평택시 | 4 | 4.0%
서울특별시 강남구 | 4 | 4.0%
경기도 안양시 | 3 | 3.0%
Other values (32) | 40 | 40.0%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
경기도 | 72 | 36.2%
서울특별시 | 19 | 9.5%
화성시 | 14 | 7.0%
수원시 | 9 | 4.5%
시흥시 | 7 | 3.5%
부천시 | 5 | 2.5%
성남시 | 5 | 2.5%
안산시 | 5 | 2.5%
용인시 | 4 | 2.0%
강남구 | 4 | 2.0%
Other values (37) | 55 | 27.6%
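
The last table for 시군구명 counts whitespace-separated words rather than whole values, which is why 경기도 appears on its own. The sketch below shows how such a word table can be derived from value counts. It uses only the ten values listed under Common Values (the remaining 40 rows are not itemised in the report), so its totals are lower than the report's; the variable names are illustrative.

```python
from collections import Counter

# Top-10 values and counts copied from the Common Values table above.
common_values = {
    "경기도 화성시": 14,
    "경기도 수원시": 9,
    "경기도 시흥시": 7,
    "경기도 부천시": 5,
    "경기도 성남시": 5,
    "경기도 안산시": 5,
    "경기도 용인시": 4,
    "경기도 평택시": 4,
    "서울특별시 강남구": 4,
    "경기도 안양시": 3,
}

# Split each value on whitespace and weight every word by the value's count.
word_counts = Counter()
for value, count in common_values.items():
    for word in value.split():
        word_counts[word] += count

total = sum(word_counts.values())
for word, count in word_counts.most_common(5):
    print(f"{word} | {count} | {count / total:.1%}")
```

With all 100 rows, the same procedure would be expected to reproduce the counts in the report's word table, such as 72 for 경기도.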

개업년월일
Text

Distinct: 99
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 10
Median length: 10
Mean length: 9.92
Min length: 2

Characters and Unicode

Total characters: 992
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 98
Unique (%): 98.0%

Sample

1st row: 1998-03-02
2nd row: 2007-10-25
3rd row: 2008-09-01
4th row: 2005-07-04
5th row: 2008-07-16
Value | Count | Frequency (%)
2006-06-01 | 2 | 2.0%
1990-01-03 | 1 | 1.0%
2011-06-01 | 1 | 1.0%
2009-10-19 | 1 | 1.0%
2004-04-01 | 1 | 1.0%
1993-09-01 | 1 | 1.0%
2009-02-18 | 1 | 1.0%
2006-04-05 | 1 | 1.0%
1999-07-01 | 1 | 1.0%
1993-09-10 | 1 | 1.0%
Other values (89) | 89 | 89.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 287 | 28.9%
- | 198 | 20.0%
1 | 149 | 15.0%
2 | 122 | 12.3%
9 | 82 | 8.3%
8 | 34 | 3.4%
5 | 31 | 3.1%
6 | 24 | 2.4%
7 | 24 | 2.4%
3 | 22 | 2.2%
Other values (3) | 19 | 1.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 792 | 79.8%
Dash Punctuation | 198 | 20.0%
Other Punctuation | 1 | 0.1%
Uppercase Letter | 1 | 0.1%
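
The category names in this table are Unicode general categories. The counts are consistent with 99 values of the form YYYY-MM-DD (99 × 8 = 792 digits and 99 × 2 = 198 dashes) plus a single two-character `\N` value. The sketch below shows how such a tally can be produced with Python's `unicodedata` module; it uses only the five sample rows and one `\N` value as input, so its counts are much smaller than the report's.

```python
import unicodedata
from collections import Counter

# Five sample rows from this variable plus the "\N" placeholder implied by
# the single backslash and single uppercase N in the category table.
values = ["1998-03-02", "2007-10-25", "2008-09-01", "2005-07-04", "2008-07-16", "\\N"]

# Tally the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

# Nd = Decimal Number, Pd = Dash Punctuation,
# Po = Other Punctuation, Lu = Uppercase Letter
for code, count in category_counts.most_common():
    print(code, count)
```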

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 287 | 36.2%
1 | 149 | 18.8%
2 | 122 | 15.4%
9 | 82 | 10.4%
8 | 34 | 4.3%
5 | 31 | 3.9%
6 | 24 | 3.0%
7 | 24 | 3.0%
3 | 22 | 2.8%
4 | 17 | 2.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 198 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
\ | 1 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
N | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 991 | 99.9%
Latin | 1 | 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 287
29.0%
- 198
20.0%
1 149
15.0%
2 122
12.3%
9 82
 
8.3%
8 34
 
3.4%
5 31
 
3.1%
6 24
 
2.4%
7 24
 
2.4%
3 22
 
2.2%
Other values (2) 18
 
1.8%
Latin
ValueCountFrequency (%)
N 1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 992 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 287
28.9%
- 198
20.0%
1 149
15.0%
2 122
12.3%
9 82
 
8.3%
8 34
 
3.4%
5 31
 
3.1%
6 24
 
2.4%
7 24
 
2.4%
3 22
 
2.2%
Other values (3) 19
 
1.9%

매출금액
Text

Distinct: 51
Distinct (%): 51.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 5.5
Mean length: 3.76
Min length: 2

Characters and Unicode

Total characters: 376
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32
Unique (%): 32.0%

Sample

1st row: 25000
2nd row: 150
3rd row: 2200
4th row: 4500
5th row: 2500
Value | Count | Frequency (%)
n | 12 | 12.0%
300 | 5 | 5.0%
2000 | 5 | 5.0%
1000 | 5 | 5.0%
5000 | 4 | 4.0%
1500 | 4 | 4.0%
3000 | 4 | 4.0%
15000 | 3 | 3.0%
1200 | 3 | 3.0%
2500 | 3 | 3.0%
Other values (41) | 52 | 52.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 211 | 56.1%
1 | 35 | 9.3%
5 | 28 | 7.4%
2 | 24 | 6.4%
3 | 15 | 4.0%
8 | 14 | 3.7%
\ | 12 | 3.2%
N | 12 | 3.2%
6 | 9 | 2.4%
4 | 9 | 2.4%
Other values (2) | 7 | 1.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 352 | 93.6%
Other Punctuation | 12 | 3.2%
Uppercase Letter | 12 | 3.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 211
59.9%
1 35
 
9.9%
5 28
 
8.0%
2 24
 
6.8%
3 15
 
4.3%
8 14
 
4.0%
6 9
 
2.6%
4 9
 
2.6%
7 6
 
1.7%
9 1
 
0.3%
Other Punctuation
ValueCountFrequency (%)
\ 12
100.0%
Uppercase Letter
ValueCountFrequency (%)
N 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 364
96.8%
Latin 12
 
3.2%

Most frequent character per script

Common
ValueCountFrequency (%)
0 211
58.0%
1 35
 
9.6%
5 28
 
7.7%
2 24
 
6.6%
3 15
 
4.1%
8 14
 
3.8%
\ 12
 
3.3%
6 9
 
2.5%
4 9
 
2.5%
7 6
 
1.6%
Latin
ValueCountFrequency (%)
N 12
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 376
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 211
56.1%
1 35
 
9.3%
5 28
 
7.4%
2 24
 
6.4%
3 15
 
4.0%
8 14
 
3.7%
\ 12
 
3.2%
N 12
 
3.2%
6 9
 
2.4%
4 9
 
2.4%
Other values (2) 7
 
1.9%

Interactions

[Figure: interactions plot]

Correlations

Variable | 기업ID | 업종대분류명 | 업종대분류코드 | 세부업종명 | 시군구명 | 개업년월일 | 매출금액
기업ID | 1.000 | 0.362 | 0.283 | 0.728 | 0.510 | 1.000 | 0.000
업종대분류명 | 0.362 | 1.000 | 1.000 | 0.893 | 0.694 | 0.988 | 0.000
업종대분류코드 | 0.283 | 1.000 | 1.000 | 0.979 | 0.838 | 0.977 | 0.000
세부업종명 | 0.728 | 0.893 | 0.979 | 1.000 | 0.983 | 0.993 | 0.967
시군구명 | 0.510 | 0.694 | 0.838 | 0.983 | 1.000 | 0.983 | 0.367
개업년월일 | 1.000 | 0.988 | 0.977 | 0.993 | 0.983 | 1.000 | 0.998
매출금액 | 0.000 | 0.000 | 0.000 | 0.967 | 0.367 | 0.998 | 1.000

Variable | 업종대분류코드 | 업종대분류명 | 시군구명
업종대분류코드 | 1.000 | 0.989 | 0.403
업종대분류명 | 0.989 | 1.000 | 0.248
시군구명 | 0.403 | 0.248 | 1.000

Variable | 기업ID | 업종대분류명 | 업종대분류코드 | 시군구명
기업ID | 1.000 | 0.114 | 0.134 | 0.147
업종대분류명 | 0.114 | 1.000 | 0.989 | 0.248
업종대분류코드 | 0.134 | 0.989 | 1.000 | 0.403
시군구명 | 0.147 | 0.248 | 0.403 | 1.000
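
The extracted text does not say which coefficient each matrix uses. One common measure of association between two categorical columns is Cramér's V, and the near-perfect values for 업종대분류명 and 업종대분류코드 are what such a measure gives when one column is almost a relabelling of the other. The sketch below computes the plain (uncorrected) Cramér's V for the 20 rows shown in the Sample section. It is an illustration under those assumptions, not the report's own computation, which may apply a bias correction and uses all 100 rows.

```python
import math
from collections import Counter

# (업종대분류명, 업종대분류코드) pairs tallied from the 20 rows in the Sample section.
pairs = (
    [("도매 및 소매업", "G")] * 4
    + [("제조업", "C")] * 13
    + [("전문. 과학 및 기술 서비스업", "M")] * 1
    + [("건설업", "F")] * 2
)


def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (x, y) category pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k))


print(round(cramers_v(pairs), 3))
```

In these 20 rows every name maps to exactly one code, so the association is perfect. In the full data, values such as 소매업 and the combined code C/G break the one-to-one mapping slightly, which is consistent with the 0.989 reported in the smaller matrices.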

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

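First rows

(index) | 기업ID | 업종대분류명 | 업종대분류코드 | 세부업종명 | 시군구명 | 개업년월일 | 매출금액
0 | 1 | 도매 및 소매업 | G | 단미사료/식품판매/축산물/조미료/사업경영및관리자문 | 서울특별시 송파구 | 1998-03-02 | 25000
1 | 2 | 도매 및 소매업 | G | 디자인개발 | 경기도 화성시 | 2007-10-25 | 150
2 | 3 | 도매 및 소매업 | G | \N | 경기도 성남시 | 2008-09-01 | 2200
3 | 4 | 제조업 | C | 조명기구 | 경기도 화성시 | 2005-07-04 | 4500
4 | 5 | 전문. 과학 및 기술 서비스업 | M | 조경 | 경기도 성남시 | 2008-07-16 | 2500
5 | 6 | 제조업 | C | 정수기 | 경기도 이천시 | 2009-03-03 | 2000
6 | 7 | 제조업 | C | 도자기 | 충청북도 진천군 | 2009-02-06 | 1700
7 | 8 | 제조업 | C | 정밀기기 | 경기도 고양시 | 2004-02-12 | 5000
8 | 9 | 건설업 | F | 전시모형제작 | 전라북도 전주시 | 2002-01-03 | 5000
9 | 10 | 건설업 | F | 전시물제작 | 서울특별시 영등포구 | 2001-10-29 | \N

Last rows

(index) | 기업ID | 업종대분류명 | 업종대분류코드 | 세부업종명 | 시군구명 | 개업년월일 | 매출금액
90 | 91 | 제조업 | C | 금속열처리도금/도장및기타 | 경기도 수원시 | 1987-07-15 | 1200
91 | 92 | 제조업 | C | \N | 경기도 광주시 | 2002-02-06 | 2000
92 | 93 | 제조업 | C | 산업용기계/전기제어 장치 | 경기도 수원시 | 1998-04-20 | 6500
93 | 94 | 도매 및 소매업 | G | 농수축산물/가공식품/장례용품 | 경기도 성남시 | 1991-08-01 | 2000
94 | 95 | 제조업 | C | 와이퍼/브레드암 | 경기도 안산시 | 1987-04-11 | 66000
95 | 96 | 제조업 | C | 금형 | 경기도 부천시 | 1979-04-20 | 15000
96 | 97 | 제조업 | C | 의료용기기 | 경기도 수원시 | 2009-12-14 | 100
97 | 98 | 제조업 | C | 광섬유종합재료 | 충청남도 아산시 | 2002-05-01 | 18000
98 | 99 | 제조업 | C | 비철금속합금주조 | 충청남도 당진군 | 2007-09-01 | \N
99 | 100 | 제조업 | C | 낚시용품 | 경기도 부천시 | 1991-11-20 | 11300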
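
Several cells in the sample rows hold the literal string `\N`, yet the overview reports 0 missing cells, and 매출금액 and 개업년월일 appear to have been profiled as Text rather than as numeric and date columns. `\N` is the NULL marker used by MySQL-style text exports, so these cells are probably missing values that were read as strings; that interpretation is an assumption, not something the report states. The sketch below shows one way to treat the marker as missing while parsing. The tab-separated layout and the choice of four rows are illustrative, since the raw file format is not shown.

```python
import csv
import io

# Four rows copied from the Sample section (a subset of columns).
# Assumption: tab-separated text; the real file layout is not shown.
raw = (
    "기업ID\t세부업종명\t개업년월일\t매출금액\n"
    "1\t단미사료/식품판매/축산물/조미료/사업경영및관리자문\t1998-03-02\t25000\n"
    "3\t\\N\t2008-09-01\t2200\n"
    "10\t전시물제작\t2001-10-29\t\\N\n"
    "99\t비철금속합금주조\t2007-09-01\t\\N\n"
)

NULL_MARKER = "\\N"  # a backslash followed by an uppercase N

rows = []
for record in csv.DictReader(io.StringIO(raw), delimiter="\t"):
    # Replace the placeholder with None so it counts as missing.
    cleaned = {k: (None if v == NULL_MARKER else v) for k, v in record.items()}
    # With the marker gone, the revenue column can be parsed as an integer.
    if cleaned["매출금액"] is not None:
        cleaned["매출금액"] = int(cleaned["매출금액"])
    rows.append(cleaned)

missing = {col: sum(r[col] is None for r in rows) for col in rows[0]}
print(missing)
```

When loading with pandas, passing `na_values=["\\N"]` to `pandas.read_csv` should have a similar effect, after which the profile would be expected to report these cells as missing.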