Overview

Dataset statistics

Number of variables: 5
Number of observations: 1218
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 47.7 KiB
Average record size in memory: 40.1 B
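
The average record size can be cross-checked against the total in-memory size and the number of observations. The sketch below redoes that arithmetic; it assumes 1 KiB = 1024 bytes, and the function name is illustrative rather than part of any profiling API.

```python
def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows

# Figures taken from the dataset statistics in this report.
avg = average_record_size_bytes(47.7, 1218)
print(f"{avg:.1f} B")
```

The result rounds to the 40.1 B reported for the dataset.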

Variable types

Categorical: 3
Text: 2

Dataset

Description: Status of fire protection facility design businesses (2019) (소방시설 설계업 현황(2019))
Author: National Fire Agency (소방청)
URL: https://www.data.go.kr/data/15064103/fileData.do

Alerts

등록구분 is highly imbalanced (94.3%) [Imbalance]
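
The 94.3% figure is consistent with an imbalance score defined as one minus the normalized Shannon entropy of the class counts. The sketch below applies that definition to the two 등록구분 counts listed in this report (1210 and 8). Treat the formula as an assumption about how the score is derived, not as a statement of the profiler's exact implementation; the function name is illustrative.

```python
import math

def imbalance_score(counts):
    """1 - (Shannon entropy / maximum entropy): 0 is balanced, 1 is a single class."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    # Entropy in bits; classes with zero count contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

score = imbalance_score([1210, 8])  # counts of 기존 and 휴업
print(f"{score:.1%}")
```

A perfectly balanced two-class column would score 0 under the same definition.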

Reproduction

Analysis started: 2023-12-12 17:26:55.962029
Analysis finished: 2023-12-12 17:26:56.474386
Duration: 0.51 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
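
The reported duration follows from the two timestamps. A minimal check with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 17:26:55.962029", FMT)
finished = datetime.strptime("2023-12-12 17:26:56.474386", FMT)

# Elapsed wall-clock time between the two report timestamps.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```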

Variables

지역 (region)
Categorical

Distinct: 17
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Value | Count
서울특별시 | 369
경기도 | 231
부산광역시 | 91
경상북도 | 71
경상남도 | 55
Other values (12) | 401

Length

Max length: 7
Median length: 5
Mean length: 4.3850575
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원도
2nd row: 강원도
3rd row: 강원도
4th row: 강원도
5th row: 강원도

Common Values

Value | Count | Frequency (%)
서울특별시 | 369 | 30.3%
경기도 | 231 | 19.0%
부산광역시 | 91 | 7.5%
경상북도 | 71 | 5.8%
경상남도 | 55 | 4.5%
대구광역시 | 53 | 4.4%
전라남도 | 48 | 3.9%
대전광역시 | 44 | 3.6%
인천광역시 | 44 | 3.6%
충청남도 | 41 | 3.4%
Other values (7) | 171 | 14.0%
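
The frequency column is each count divided by the number of observations (1218), and the listed counts plus the "Other values" bucket should add up to that total. A small consistency check using the figures from the table above (variable names are illustrative):

```python
N_OBSERVATIONS = 1218

top_counts = {
    "서울특별시": 369, "경기도": 231, "부산광역시": 91, "경상북도": 71,
    "경상남도": 55, "대구광역시": 53, "전라남도": 48, "대전광역시": 44,
    "인천광역시": 44, "충청남도": 41,
}
other_count = 171  # "Other values (7)"

# The listed values plus the remainder bucket should cover every row.
total = sum(top_counts.values()) + other_count

# Frequency (%) rounded to one decimal place, as shown in the report.
frequency = {k: round(100 * v / N_OBSERVATIONS, 1) for k, v in top_counts.items()}

print(total)
for region, pct in frequency.items():
    print(region, pct)
```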

Length

[Figure: Histogram of lengths of the category]

조회지역 (lookup area)
Text

Distinct: 133
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Length

Max length: 4
Median length: 3
Mean length: 2.9778325
Min length: 2

Characters and Unicode

Total characters: 3627
Distinct characters: 105
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
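
The length and character figures for a text variable can be cross-checked against each other: the mean length should equal total characters divided by the number of observations. The sketch below does this for the two text variables, using only numbers reported in this document; the helper name is illustrative.

```python
def mean_length(total_characters: int, n_values: int) -> float:
    """Mean string length implied by a total character count."""
    return total_characters / n_values

# 조회지역: 3627 characters over 1218 rows.
print(f"{mean_length(3627, 1218):.7f}")
# 상호: 11374 characters over 1218 rows.
print(f"{mean_length(11374, 1218):.7f}")
```

Both results agree with the mean lengths shown for the respective variables (2.9778325 and 9.3382594).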

Unique

Unique: 26
Unique (%): 2.1%
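
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts values that occur exactly once. A minimal sketch of that distinction, run on a made-up handful of values (the list is illustrative, not the dataset):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy input: 강릉시 repeats, the other three appear once each.
sample = ["강릉시", "강릉시", "동해시", "삼척시", "속초시", "강릉시"]
print(distinct_and_unique(sample))
```

In this report, 133 distinct values of which 26 occur only once gives 26 / 1218, or about 2.1%.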

Sample

1st row: 강릉시
2nd row: 강릉시
3rd row: 강릉시
4th row: 강릉시
5th row: 강릉시

Value | Count | Frequency (%)
송파구 | 68 | 5.6%
서초구 | 45 | 3.7%
강남구 | 42 | 3.4%
서구 | 39 | 3.2%
금천구 | 35 | 2.9%
안양시 | 35 | 2.9%
성남시 | 32 | 2.6%
남구 | 31 | 2.5%
창원시 | 30 | 2.5%
성동구 | 29 | 2.4%
Other values (123) | 832 | 68.3%

Most occurring characters

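Value | Count | Frequency (%)
[unreadable] | 694 | 19.1%
[unreadable] | 551 | 15.2%
[unreadable] | 135 | 3.7%
[unreadable] | 123 | 3.4%
[unreadable] | 113 | 3.1%
[unreadable] | 112 | 3.1%
[unreadable] | 110 | 3.0%
[unreadable] | 102 | 2.8%
[unreadable] | 89 | 2.5%
[unreadable] | 79 | 2.2%
Other values (95) | 1519 | 41.9%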

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3627 | 100.0%

Most frequent character per category

Other Letter
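Value | Count | Frequency (%)
[unreadable] | 694 | 19.1%
[unreadable] | 551 | 15.2%
[unreadable] | 135 | 3.7%
[unreadable] | 123 | 3.4%
[unreadable] | 113 | 3.1%
[unreadable] | 112 | 3.1%
[unreadable] | 110 | 3.0%
[unreadable] | 102 | 2.8%
[unreadable] | 89 | 2.5%
[unreadable] | 79 | 2.2%
Other values (95) | 1519 | 41.9%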

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3627 | 100.0%

Most frequent character per script

Hangul
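Value | Count | Frequency (%)
[unreadable] | 694 | 19.1%
[unreadable] | 551 | 15.2%
[unreadable] | 135 | 3.7%
[unreadable] | 123 | 3.4%
[unreadable] | 113 | 3.1%
[unreadable] | 112 | 3.1%
[unreadable] | 110 | 3.0%
[unreadable] | 102 | 2.8%
[unreadable] | 89 | 2.5%
[unreadable] | 79 | 2.2%
Other values (95) | 1519 | 41.9%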

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3627 | 100.0%

Most frequent character per block

Hangul
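Value | Count | Frequency (%)
[unreadable] | 694 | 19.1%
[unreadable] | 551 | 15.2%
[unreadable] | 135 | 3.7%
[unreadable] | 123 | 3.4%
[unreadable] | 113 | 3.1%
[unreadable] | 112 | 3.1%
[unreadable] | 110 | 3.0%
[unreadable] | 102 | 2.8%
[unreadable] | 89 | 2.5%
[unreadable] | 79 | 2.2%
Other values (95) | 1519 | 41.9%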

상호 (business name)
Text

Distinct: 1201
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Length

Max length: 21
Median length: 19
Mean length: 9.3382594
Min length: 2

Characters and Unicode

Total characters: 11374
Distinct characters: 328
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1184
Unique (%): 97.2%

Sample

1st row: 다감이엔지
2nd row: 신호ENG
3rd row: (주)대주기술
4th row: (주)화신엔지니어링
5th row: 인선설계감리

Value | Count | Frequency (%)
주식회사 | 254 | 16.8%
기술사사무소 | 6 | 0.4%
건축사사무소 | 4 | 0.3%
엔지니어링 | 3 | 0.2%
하나이엔지 | 3 | 0.2%
주)대한소방공사 | 2 | 0.1%
주)한빛엔지니어링 | 2 | 0.1%
삼보기술단 | 2 | 0.1%
유한회사 | 2 | 0.1%
제일엔지니어링 | 2 | 0.1%
Other values (1209) | 1229 | 81.4%

Most occurring characters

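Value | Count | Frequency (%)
[unreadable] | 997 | 8.8%
( | 750 | 6.6%
) | 750 | 6.6%
[unreadable] | 515 | 4.5%
[unreadable] | 504 | 4.4%
[unreadable] | 489 | 4.3%
[unreadable] | 470 | 4.1%
[unreadable] | 315 | 2.8%
[unreadable] | 314 | 2.8%
[unreadable] | 310 | 2.7%
Other values (318) | 5960 | 52.4%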

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 9433 | 82.9%
Open Punctuation | 750 | 6.6%
Close Punctuation | 750 | 6.6%
Space Separator | 292 | 2.6%
Uppercase Letter | 130 | 1.1%
Lowercase Letter | 9 | 0.1%
Decimal Number | 6 | 0.1%
Other Punctuation | 4 | < 0.1%
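
The category names in the table above are Unicode general categories. A minimal sketch of how such a tally can be produced with Python's standard `unicodedata` module, applied to the five sample business names shown earlier for this variable. This illustrates the idea only and is not claimed to be the profiler's own code; the `CATEGORY_NAMES` mapping is a hand-written subset covering the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Zs": "Space Separator",
}

def category_counts(strings):
    """Count characters per Unicode general category code."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)

names = ["다감이엔지", "신호ENG", "(주)대주기술", "(주)화신엔지니어링", "인선설계감리"]
counts = category_counts(names)
for code, n in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
```

Hangul syllables fall under "Other Letter" because Hangul has no upper or lower case, which is why that category dominates the full table.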

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[unreadable] | 997 | 10.6%
[unreadable] | 515 | 5.5%
[unreadable] | 504 | 5.3%
[unreadable] | 489 | 5.2%
[unreadable] | 470 | 5.0%
[unreadable] | 315 | 3.3%
[unreadable] | 314 | 3.3%
[unreadable] | 310 | 3.3%
[unreadable] | 264 | 2.8%
[unreadable] | 259 | 2.7%
Other values (286) | 4996 | 53.0%

Uppercase Letter
Value | Count | Frequency (%)
E | 32 | 24.6%
N | 22 | 16.9%
G | 20 | 15.4%
S | 13 | 10.0%
C | 11 | 8.5%
T | 6 | 4.6%
M | 4 | 3.1%
A | 4 | 3.1%
F | 4 | 3.1%
J | 3 | 2.3%
Other values (7) | 11 | 8.5%

Lowercase Letter
Value | Count | Frequency (%)
n | 2 | 22.2%
o | 2 | 22.2%
s | 1 | 11.1%
e | 1 | 11.1%
k | 1 | 11.1%
a | 1 | 11.1%
m | 1 | 11.1%

Decimal Number
Value | Count | Frequency (%)
1 | 4 | 66.7%
9 | 1 | 16.7%
2 | 1 | 16.7%

Other Punctuation
Value | Count | Frequency (%)
& | 2 | 50.0%
. | 2 | 50.0%

Open Punctuation
Value | Count | Frequency (%)
( | 750 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 750 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 292 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 9433 | 82.9%
Common | 1802 | 15.8%
Latin | 139 | 1.2%

Most frequent character per script

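Hangul
Value | Count | Frequency (%)
[unreadable] | 997 | 10.6%
[unreadable] | 515 | 5.5%
[unreadable] | 504 | 5.3%
[unreadable] | 489 | 5.2%
[unreadable] | 470 | 5.0%
[unreadable] | 315 | 3.3%
[unreadable] | 314 | 3.3%
[unreadable] | 310 | 3.3%
[unreadable] | 264 | 2.8%
[unreadable] | 259 | 2.7%
Other values (286) | 4996 | 53.0%

Latin
Value | Count | Frequency (%)
E | 32 | 23.0%
N | 22 | 15.8%
G | 20 | 14.4%
S | 13 | 9.4%
C | 11 | 7.9%
T | 6 | 4.3%
M | 4 | 2.9%
A | 4 | 2.9%
F | 4 | 2.9%
J | 3 | 2.2%
Other values (14) | 20 | 14.4%

Common
Value | Count | Frequency (%)
( | 750 | 41.6%
) | 750 | 41.6%
(space) | 292 | 16.2%
1 | 4 | 0.2%
& | 2 | 0.1%
. | 2 | 0.1%
9 | 1 | 0.1%
2 | 1 | 0.1%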

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 9433 | 82.9%
ASCII | 1941 | 17.1%

Most frequent character per block

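Hangul
Value | Count | Frequency (%)
[unreadable] | 997 | 10.6%
[unreadable] | 515 | 5.5%
[unreadable] | 504 | 5.3%
[unreadable] | 489 | 5.2%
[unreadable] | 470 | 5.0%
[unreadable] | 315 | 3.3%
[unreadable] | 314 | 3.3%
[unreadable] | 310 | 3.3%
[unreadable] | 264 | 2.8%
[unreadable] | 259 | 2.7%
Other values (286) | 4996 | 53.0%

ASCII
Value | Count | Frequency (%)
( | 750 | 38.6%
) | 750 | 38.6%
(space) | 292 | 15.0%
E | 32 | 1.6%
N | 22 | 1.1%
G | 20 | 1.0%
S | 13 | 0.7%
C | 11 | 0.6%
T | 6 | 0.3%
1 | 4 | 0.2%
Other values (22) | 41 | 2.1%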

분야(설계업) (field, design business)
Categorical

Distinct: 5
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Value | Count
일반(전기),일반(기계) | 726
전문 | 203
일반(전기) | 162
일반(기계) | 126
전문,일반(전기),일반(기계) | 1

Length

Max length: 16
Median length: 13
Mean length: 9.5139573
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 일반(전기),일반(기계)
2nd row: 일반(전기),일반(기계)
3rd row: 일반(전기),일반(기계)
4th row: 일반(전기),일반(기계)
5th row: 일반(전기),일반(기계)

Common Values

Value | Count | Frequency (%)
일반(전기),일반(기계) | 726 | 59.6%
전문 | 203 | 16.7%
일반(전기) | 162 | 13.3%
일반(기계) | 126 | 10.3%
전문,일반(전기),일반(기계) | 1 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Plot of common values]

등록구분 (registration status)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 9.6 KiB

Value | Count
기존 | 1210
휴업 | 8

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기존
2nd row: 기존
3rd row: 기존
4th row: 기존
5th row: 기존

Common Values

Value | Count | Frequency (%)
기존 | 1210 | 99.3%
휴업 | 8 | 0.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Plot of common values]

Correlations

Variable | 지역 | 분야(설계업) | 등록구분
지역 | 1.000 | 0.326 | 0.073
분야(설계업) | 0.326 | 1.000 | 0.050
등록구분 | 0.073 | 0.050 | 1.000

Variable | 등록구분 | 지역 | 분야(설계업)
등록구분 | 1.000 | 0.065 | 0.061
지역 | 0.065 | 1.000 | 0.173
분야(설계업) | 0.061 | 0.173 | 1.000

Variable | 지역 | 분야(설계업) | 등록구분
지역 | 1.000 | 0.173 | 0.065
분야(설계업) | 0.173 | 1.000 | 0.061
등록구분 | 0.065 | 0.061 | 1.000
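
All three variables in these matrices are categorical, so ordinary Pearson correlation does not apply. One common association measure for pairs of categorical variables is Cramér's V, sketched below in its plain (uncorrected) form. Which measure, and which correction if any, produced the matrices above is not stated in this report, so treat the link as an assumption; the function name and toy inputs are illustrative.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    joint = Counter(zip(xs, ys))   # contingency table as a sparse mapping
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

# Toy data: the second variable is fully determined by the first.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
# Toy data: the two variables are independent.
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```

Values near 0 indicate little association and values near 1 indicate a strong one, which matches how the small off-diagonal entries above are usually read.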

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

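Index | 지역 | 조회지역 | 상호 | 분야(설계업) | 등록구분
0 | 강원도 | 강릉시 | 다감이엔지 | 일반(전기),일반(기계) | 기존
1 | 강원도 | 강릉시 | 신호ENG | 일반(전기),일반(기계) | 기존
2 | 강원도 | 강릉시 | (주)대주기술 | 일반(전기),일반(기계) | 기존
3 | 강원도 | 강릉시 | (주)화신엔지니어링 | 일반(전기),일반(기계) | 기존
4 | 강원도 | 강릉시 | 인선설계감리 | 일반(전기),일반(기계) | 기존
5 | 강원도 | 강릉시 | 다보이엔지 주식회사 | 일반(전기),일반(기계) | 기존
6 | 강원도 | 강릉시 | 대현설계감리사무소 | 일반(전기),일반(기계) | 기존
7 | 강원도 | 동해시 | 주식회사 동해소방설비 | 일반(전기),일반(기계) | 기존
8 | 강원도 | 삼척시 | (주)한일엔지니어링 | 일반(전기),일반(기계) | 기존
9 | 강원도 | 속초시 | 주식회사 탑설계감리사무소 | 일반(전기),일반(기계) | 기존

Index | 지역 | 조회지역 | 상호 | 분야(설계업) | 등록구분
1208 | 충청북도 | 청주시 | 주식회사 탑테크엔지니어링 | 전문 | 기존
1209 | 충청북도 | 청주시 | 주식회사 원 | 일반(전기),일반(기계) | 기존
1210 | 충청북도 | 청주시 | 씨케이엔지니어링 | 일반(전기),일반(기계) | 기존
1211 | 충청북도 | 청주시 | DS엔지니어링 | 일반(전기),일반(기계) | 기존
1212 | 충청북도 | 청주시 | (주)건사엔지니어링 | 전문 | 기존
1213 | 충청북도 | 충주시 | 주식회사 중앙방재 | 전문 | 기존
1214 | 충청북도 | 충주시 | (주)예원엔지니어링건축사사무소 | 일반(기계) | 기존
1215 | 충청북도 | 충주시 | 예인전기기술사사무소 | 일반(전기),일반(기계) | 기존
1216 | 충청북도 | 충주시 | 주식회사 에이스소방 | 일반(전기),일반(기계) | 기존
1217 | 충청북도 | 충주시 | 주식회사 좋은이엔지 | 일반(전기) | 기존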