Overview

Dataset statistics

Number of variables: 9
Number of observations: 47
Missing cells: 102
Missing cells (%): 24.1%
Duplicate rows: 4
Duplicate rows (%): 8.5%
Total size in memory: 3.4 KiB
Average record size in memory: 74.8 B
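The two percentages can be re-derived from the counts above. This is a minimal sketch, assuming the missing-cell percentage is taken over variables × observations and the duplicate percentage over observations; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 9
n_observations = 47
missing_cells = 102
duplicate_rows = 4

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Both results round to the values shown in the report.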

Variable types

Text: 2
Unsupported: 7

Dataset

Description: Jeollabuk-do construction industry status, November 2017 (전라북도건설업현황201711월)
Author: Jeollabuk-do (전라북도)
URL: https://www.bigdatahub.go.kr/opendata/dataSet/detail.nm?contentId=37&rlik=49451aebf056b486&serviceId=202990

Alerts

Dataset has 4 (8.5%) duplicate rows [Duplicates]
[전북][일반건설업] 등록현황 has 7 (14.9%) missing values [Missing]
Unnamed: 1 has 43 (91.5%) missing values [Missing]
Unnamed: 2 has 8 (17.0%) missing values [Missing]
Unnamed: 3 has 4 (8.5%) missing values [Missing]
Unnamed: 4 has 8 (17.0%) missing values [Missing]
Unnamed: 5 has 8 (17.0%) missing values [Missing]
Unnamed: 6 has 8 (17.0%) missing values [Missing]
Unnamed: 7 has 8 (17.0%) missing values [Missing]
Unnamed: 8 has 8 (17.0%) missing values [Missing]
Unnamed: 2 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 3 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 6 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
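The Unsupported alerts may come from columns that mix header text, gaps, and counts in one column, which is what the sample rows later in this report suggest. The following is a minimal sketch of one possible cleaning step under that assumption; `to_int_or_none` and `raw_column` are hypothetical names and not part of the report or of ydata-profiling.

```python
from typing import Optional


def to_int_or_none(cell: object) -> Optional[int]:
    """Return the cell as an int, or None when it is not a plain integer."""
    if cell is None:
        return None
    try:
        return int(str(cell).strip())
    except ValueError:
        # Header text, NaN, and other non-integer content end up here.
        return None


# Hypothetical column that mixes a header label, gaps, and counts.
raw_column = ["(2017.10)까지누계", None, None, 861, 677, None, "236", "-1"]
cleaned = [to_int_or_none(cell) for cell in raw_column]
print(cleaned)
```

After a step like this the column holds only integers and gaps, which a profiler is more likely to treat as numeric.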

Reproduction

Analysis started: 2024-03-14 01:21:28.527941
Analysis finished: 2024-03-14 01:21:28.977456
Duration: 0.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

[전북][일반건설업] 등록현황
Text

MISSING

Distinct: 37
Distinct (%): 92.5%
Missing: 7
Missing (%): 14.9%
Memory size: 508.0 B

Length

Max length: 16
Median length: 10
Mean length: 7.25
Min length: 1
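The length figures can be illustrated with a short sketch. It assumes lengths are counted in code points over non-missing values only; `length_stats` and the sample list are hypothetical. The last line checks the reported mean against the report's own totals (290 characters over 47 - 7 = 40 non-missing values).

```python
import unicodedata
from statistics import mean, median


def length_stats(values):
    """Length summary over non-missing strings, counted in NFC code points."""
    lengths = [len(unicodedata.normalize("NFC", v)) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# Hypothetical excerpt of the column; None marks a missing value.
sample = ["구 분", None, "토목건축공사업", "토목공사업"]
stats = length_stats(sample)
print(stats)

# Reported mean length: total characters / non-missing values.
reported_mean = 290 / (47 - 7)
print(reported_mean)
```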

Characters and Unicode

Total characters: 290
Distinct characters: 84
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
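As an illustration of how such properties can be read per code point, here is a minimal sketch using Python's standard `unicodedata` module. It covers the general category only (the standard library does not expose script or block); `category_counts` and the sample strings are hypothetical, and this is not necessarily how the report itself computes its figures.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        # Normalise so that Hangul syllables are single code points.
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["구 분", "토목건축공사업", "가스시설시공업 제1종"]
counts = category_counts(sample)
print(counts)
```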

Unique

Unique: 34
Unique (%): 85.0%
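The difference between "distinct" and "unique" is easy to blur. A minimal sketch, assuming distinct means the number of different non-missing values and unique means the values that occur exactly once; `distinct_and_unique` and the toy column are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique, non_missing) counts, ignoring None."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique, len(present)


# Toy column: "a" repeats, "b" and "c" occur once, None is missing.
distinct, unique, n = distinct_and_unique(["a", "a", "b", "c", None])
print(distinct, unique, n)

# The report's percentages, taken over the 47 - 7 = 40 non-missing values.
distinct_pct = 100 * 37 / 40
unique_pct = 100 * 34 / 40
print(distinct_pct, unique_pct)
```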

Sample

1st row: 구 분
2nd row: (blank)
3rd row: (blank)
4th row: 토목건축공사업
5th row: 토목공사업
Value | Count | Frequency (%)
가스시설시공업 | 3 | 6.1%
난방시공업 | 3 | 6.1%
(not shown) | 2 | 4.1%
제2종 | 2 | 4.1%
제3종 | 2 | 4.1%
제1종 | 2 | 4.1%
(not shown) | 2 | 4.1%
(not shown) | 2 | 4.1%
(not shown) | 2 | 4.1%
금속구조물ㆍ창호공사업 | 1 | 2.0%
Other values (28) | 28 | 57.1%

Most occurring characters

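Value | Count | Frequency (%)
(not shown) | 34 | 11.7%
(not shown) | 32 | 11.0%
(not shown) | 26 | 9.0%
(not shown) | 14 | 4.8%
(not shown) | 11 | 3.8%
(not shown) | 9 | 3.1%
(not shown) | 8 | 2.8%
(not shown) | 6 | 2.1%
(not shown) | 6 | 2.1%
(not shown) | 6 | 2.1%
Other values (74) | 138 | 47.6%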

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 267 | 92.1%
Space Separator | 9 | 3.1%
Decimal Number | 6 | 2.1%
Other Punctuation | 4 | 1.4%
Open Punctuation | 2 | 0.7%
Close Punctuation | 2 | 0.7%

Most frequent character per category

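Other Letter
Value | Count | Frequency (%)
(not shown) | 34 | 12.7%
(not shown) | 32 | 12.0%
(not shown) | 26 | 9.7%
(not shown) | 14 | 5.2%
(not shown) | 11 | 4.1%
(not shown) | 8 | 3.0%
(not shown) | 6 | 2.2%
(not shown) | 6 | 2.2%
(not shown) | 6 | 2.2%
(not shown) | 5 | 1.9%
Other values (67) | 119 | 44.6%

Decimal Number
Value | Count | Frequency (%)
2 | 2 | 33.3%
1 | 2 | 33.3%
3 | 2 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 9 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
· | 4 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
[ | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
] | 2 | 100.0%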

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 267 | 92.1%
Common | 23 | 7.9%

Most frequent character per script

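Hangul
Value | Count | Frequency (%)
(not shown) | 34 | 12.7%
(not shown) | 32 | 12.0%
(not shown) | 26 | 9.7%
(not shown) | 14 | 5.2%
(not shown) | 11 | 4.1%
(not shown) | 8 | 3.0%
(not shown) | 6 | 2.2%
(not shown) | 6 | 2.2%
(not shown) | 6 | 2.2%
(not shown) | 5 | 1.9%
Other values (67) | 119 | 44.6%

Common
Value | Count | Frequency (%)
(space) | 9 | 39.1%
· | 4 | 17.4%
2 | 2 | 8.7%
1 | 2 | 8.7%
3 | 2 | 8.7%
[ | 2 | 8.7%
] | 2 | 8.7%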

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 262 | 90.3%
ASCII | 19 | 6.6%
Compat Jamo | 5 | 1.7%
None | 4 | 1.4%

Most frequent character per block

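Hangul
Value | Count | Frequency (%)
(not shown) | 34 | 13.0%
(not shown) | 32 | 12.2%
(not shown) | 26 | 9.9%
(not shown) | 14 | 5.3%
(not shown) | 11 | 4.2%
(not shown) | 8 | 3.1%
(not shown) | 6 | 2.3%
(not shown) | 6 | 2.3%
(not shown) | 6 | 2.3%
(not shown) | 5 | 1.9%
Other values (66) | 114 | 43.5%

ASCII
Value | Count | Frequency (%)
(space) | 9 | 47.4%
2 | 2 | 10.5%
1 | 2 | 10.5%
3 | 2 | 10.5%
[ | 2 | 10.5%
] | 2 | 10.5%

Compat Jamo
Value | Count | Frequency (%)
(not shown) | 5 | 100.0%

None
Value | Count | Frequency (%)
· | 4 | 100.0%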

Unnamed: 1
Text

MISSING 

Distinct: 2
Distinct (%): 50.0%
Missing: 43
Missing (%): 91.5%
Memory size: 508.0 B

Length

Max length: 5
Median length: 4
Mean length: 4
Min length: 3

Characters and Unicode

Total characters: 16
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 등록업종수
2nd row: 업체수
3rd row: 등록업종수
4th row: 업체수

Value | Count | Frequency (%)
등록업종수 | 2 | 50.0%
업체수 | 2 | 50.0%

Most occurring characters

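Value | Count | Frequency (%)
(not shown) | 4 | 25.0%
(not shown) | 4 | 25.0%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%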

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 16 | 100.0%

Most frequent character per category

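Other Letter
Value | Count | Frequency (%)
(not shown) | 4 | 25.0%
(not shown) | 4 | 25.0%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%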

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 16 | 100.0%

Most frequent character per script

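Hangul
Value | Count | Frequency (%)
(not shown) | 4 | 25.0%
(not shown) | 4 | 25.0%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%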

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 16 | 100.0%

Most frequent character per block

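Hangul
Value | Count | Frequency (%)
(not shown) | 4 | 25.0%
(not shown) | 4 | 25.0%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%
(not shown) | 2 | 12.5%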

Unnamed: 2
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 17.0%
Memory size: 508.0 B

Unnamed: 3
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 8.5%
Memory size: 508.0 B

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 17.0%
Memory size: 508.0 B

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 17.0%
Memory size: 508.0 B

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 17.0%
Memory size: 508.0 B

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 17.0%
Memory size: 508.0 B

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 17.0%
Memory size: 508.0 B

Correlations

Variable | [전북][일반건설업] 등록현황 | Unnamed: 1
[전북][일반건설업] 등록현황 | 1.000 | 0.000
Unnamed: 1 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
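A minimal sketch of what a nullity correlation can look like, assuming it is the Pearson correlation between the "is missing" indicators of two columns. `nullity_correlation` and the toy columns are hypothetical and do not reproduce the report's own implementation.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is missing' indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is never missing or always missing.
        return float("nan")
    return cov / sqrt(var_a * var_b)


x = [1, None, 3, None]
y = [None, 2, None, 4]  # missing exactly where x is present
z = [5, None, 7, None]  # missing exactly where x is missing

same_gaps = nullity_correlation(x, z)
opposite_gaps = nullity_correlation(x, y)
print(same_gaps, opposite_gaps)
```

A value near 1 means the two columns tend to be missing together; a value near -1 means one tends to be present when the other is missing.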

Sample

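Index | [전북][일반건설업] 등록현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
0 | 구 분 | <NA> | (2017.10)까지누계 | 조건 : 2017년 11월 ~ 2017년 11월 | NaN | NaN | NaN | NaN | (2017.11)까지누계
1 | <NA> | <NA> | NaN | 변 동 사 항 | NaN | NaN | NaN | NaN | NaN
2 | <NA> | <NA> | NaN | 증감 | 재등록 | 전입 | 전출 | 등록말소 | NaN
3 | (blank) | 등록업종수 | 861 | 5 | 8 | 3 | 2 | 4 | 866
4 | (blank) | 업체수 | 677 | 5 | 6 | 3 | 2 | 2 | 682
5 | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6 | 토목건축공사업 | <NA> | 236 | 1 | 2 | 0 | 0 | 1 | 237
7 | 토목공사업 | <NA> | 227 | 1 | 0 | 2 | 0 | 1 | 228
8 | 건축공사업 | <NA> | 269 | 4 | 6 | 1 | 2 | 1 | 273
9 | 조경공사업 | <NA> | 106 | -1 | 0 | 0 | 0 | 1 | 105

Index | [전북][일반건설업] 등록현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
37 | 가스시설시공업 제1종 | <NA> | 47 | 0 | 0 | 0 | 0 | 0 | 47
38 | 가스시설시공업 제2종 | <NA> | 284 | -2 | 1 | 0 | 0 | 3 | 282
39 | 가스시설시공업 제3종 | <NA> | 131 | 0 | 0 | 0 | 0 | 0 | 131
40 | 난방시공업 제1종 | <NA> | 60 | -2 | 0 | 0 | 0 | 2 | 58
41 | 난방시공업 제2종 | <NA> | 301 | -2 | 1 | 1 | 0 | 4 | 299
42 | 난방시공업 제3종 | <NA> | 2 | 0 | 0 | 0 | 0 | 0 | 2
43 | 시설물유지관리업 | <NA> | 370 | 2 | 1 | 4 | 3 | 0 | 372
44 | 미장ㆍ방수ㆍ조적공사업 | <NA> | 84 | -1 | 0 | 1 | 1 | 1 | 83
45 | 금속구조물ㆍ창호공사업 | <NA> | 381 | -2 | 1 | 1 | 2 | 2 | 379
46 | 지붕판금ㆍ건축물조립공사업 | <NA> | 22 | 0 | 0 | 0 | 0 | 0 | 22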
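The sample rows can be sanity-checked arithmetically. The sketch below assumes each data row carries seven numeric fields in the order: cumulative through 2017.10, net change (증감), re-registration (재등록), transfer in (전입), transfer out (전출), deregistration (등록말소), cumulative through 2017.11. That reading of the columns, the English glosses, and the helper `is_consistent` are assumptions, not something the report states.

```python
# Each tuple: (label, cumulative before, net change, re-registered,
#              transferred in, transferred out, deregistered, cumulative after)
rows = [
    ("토목건축공사업", 236, 1, 2, 0, 0, 1, 237),
    ("건축공사업", 269, 4, 6, 1, 2, 1, 273),
    ("가스시설시공업 제2종", 284, -2, 1, 0, 0, 3, 282),
    ("시설물유지관리업", 370, 2, 1, 4, 3, 0, 372),
]


def is_consistent(row):
    """Check both identities implied by the assumed column meanings."""
    _, before, change, registered, moved_in, moved_out, removed, after = row
    totals_match = before + change == after
    change_matches = registered + moved_in - moved_out - removed == change
    return totals_match and change_matches


results = {row[0]: is_consistent(row) for row in rows}
print(results)
```

Rows that fail either identity would be worth checking against the source file before any further analysis.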

Duplicate rows

Most frequently occurring

Index | [전북][일반건설업] 등록현황 | Unnamed: 1 | # duplicates
3 | <NA> | <NA> | 7
0 | (blank) | 업체수 | 2
1 | 구 분 | <NA> | 2
2 | (blank) | 등록업종수 | 2
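A duplicate table of this shape can be sketched with a counter over row tuples. The example assumes duplicates are judged on the two text columns only, as the table's columns suggest; `duplicate_groups` and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: count} for rows that occur more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical (first column, second column) pairs; None marks a missing value.
rows = [
    ("구 분", None),
    (None, None),
    ("", "업체수"),
    (None, None),
    ("구 분", None),
    ("토목공사업", None),
    ("", "업체수"),
]
groups = duplicate_groups(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```

The table above lists four such groups, which appears consistent with the 4 duplicate rows reported in the overview.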