Overview

Dataset statistics

Number of variables: 9
Number of observations: 48
Missing cells: 103
Missing cells (%): 23.8%
Duplicate rows: 4
Duplicate rows (%): 8.3%
Total size in memory: 3.5 KiB
Average record size in memory: 74.8 B
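The two percentages follow from the counts in this table. The sketch below is a plain arithmetic check, assuming that "Missing cells (%)" is taken over all cells (variables times observations) and "Duplicate rows (%)" over all observations; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 9
n_observations = 48
missing_cells = 103
duplicate_rows = 4

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations  # 432

missing_pct = round(100 * missing_cells / total_cells, 1)
duplicate_pct = round(100 * duplicate_rows / n_observations, 1)

print(missing_pct, duplicate_pct)  # 23.8 8.3
```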

Variable types

Text: 2
Unsupported: 7

Alerts

Dataset has 4 (8.3%) duplicate rows  [Duplicates]
[전북][일반건설업] 등록현황 has 7 (14.6%) missing values  [Missing]
Unnamed: 1 has 44 (91.7%) missing values  [Missing]
Unnamed: 2 has 8 (16.7%) missing values  [Missing]
Unnamed: 3 has 4 (8.3%) missing values  [Missing]
Unnamed: 4 has 8 (16.7%) missing values  [Missing]
Unnamed: 5 has 8 (16.7%) missing values  [Missing]
Unnamed: 6 has 8 (16.7%) missing values  [Missing]
Unnamed: 7 has 8 (16.7%) missing values  [Missing]
Unnamed: 8 has 8 (16.7%) missing values  [Missing]
Unnamed: 2 is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
Unnamed: 3 is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
Unnamed: 5 is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
Unnamed: 6 is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
Unnamed: 7 is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
Unnamed: 8 is an unsupported type; check whether it needs cleaning or further analysis  [Unsupported]
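One plausible reason for the "unsupported" alerts is that these columns mix header text with numeric counts, as the sample rows later in the report suggest. The sketch below shows one way such cells might be normalised before re-profiling. The helper `coerce_count` is hypothetical and is not part of ydata-profiling; the handling of blanks and thousands separators is an assumption about the source spreadsheet.

```python
def coerce_count(value):
    """Return a number for numeric-looking cells, None for blanks, else the original value.

    Hypothetical helper for illustration; not part of any library.
    """
    if value is None:
        return None
    text = str(value).strip()
    # Treat empty cells and common missing-value spellings as missing.
    if text == "" or text.lower() in {"nan", "<na>"}:
        return None
    try:
        number = float(text.replace(",", ""))
    except ValueError:
        return value  # leave header text such as "증감" untouched
    return int(number) if number.is_integer() else number


cells = ["249", " -5 ", "NaN", "증감", 7]
print([coerce_count(c) for c in cells])
```

After a pass like this, a column that still contains text is most likely a header row embedded in the data, which would need to be dropped or promoted to column names.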

Reproduction

Analysis started: 2024-03-14 03:26:54.892115
Analysis finished: 2024-03-14 03:26:55.346514
Duration: 0.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
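The reported duration is the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2024-03-14 03:26:54.892115")
finished = datetime.fromisoformat("2024-03-14 03:26:55.346514")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 0.45 seconds
```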

Variables

[전북][일반건설업] 등록현황
Text

Distinct: 38
Distinct (%): 92.7%
Missing: 7
Missing (%): 14.6%
Memory size: 516.0 B

Length

Max length: 16
Median length: 11
Mean length: 7.2682927
Min length: 1
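The mean length can be cross-checked against other figures in this report: the variable has 298 total characters, and 48 observations minus 7 missing values leaves 41 non-missing entries. The sketch assumes the mean is taken over non-missing values only.

```python
# Figures copied from this report.
n_observations = 48
n_missing = 7
total_characters = 298

non_missing = n_observations - n_missing  # 41
mean_length = total_characters / non_missing

print(round(mean_length, 7))  # 7.2682927
```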

Characters and Unicode

Total characters: 298
Distinct characters: 85
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
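As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the report's own implementation; the `CATEGORY_NAMES` mapping and the two sample strings (taken from values that appear in this dataset) are chosen here for illustration.

```python
import unicodedata
from collections import Counter

# Short category codes mapped to the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["토목건축공사업", "가스시설시공업 제1종"]
print(category_counts(sample))
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for this variable.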

Unique

Unique: 35
Unique (%): 85.4%
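"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique counts the values that occur exactly once. The sketch below illustrates the distinction on a toy list and then re-derives the two percentages, assuming both are relative to the 41 non-missing entries.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


print(distinct_and_unique(["a", "b", "b", None, "c"]))  # (3, 2)

# Percentages reported for this variable, relative to 41 non-missing entries.
non_missing = 48 - 7
distinct_pct = round(100 * 38 / non_missing, 1)
unique_pct = round(100 * 35 / non_missing, 1)
print(distinct_pct, unique_pct)  # 92.7 85.4
```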

Sample

1st row: 구 분
2nd row:
3rd row:
4th row: 토목건축공사업
5th row: 토목공사업
Value               Count   Frequency (%)
난방시공업           3       6.0%
가스시설시공업        3       6.0%
(not shown)         2       4.0%
(not shown)         2       4.0%
(not shown)         2       4.0%
(not shown)         2       4.0%
제2종               2       4.0%
제3종               2       4.0%
제1종               2       4.0%
강구조물공사업        1       2.0%
Other values (29)   29      58.0%
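The percentages in this table are consistent with counting whitespace-separated words: the counts sum to 50, and 3 / 50 = 6.0%. The sketch below reproduces that style of table under the assumption that values are split on whitespace; `word_frequencies` is a hypothetical helper, not the profiler's own code.

```python
from collections import Counter


def word_frequencies(values):
    """Return (word, count, percent) tuples, most frequent first."""
    # Assumption: words are whitespace-separated; missing values are skipped.
    words = [w for v in values if v for w in v.split()]
    counts = Counter(words)
    total = len(words)
    return [(w, n, round(100 * n / total, 1)) for w, n in counts.most_common()]


rows = word_frequencies(["난방시공업 제1종", "난방시공업 제2종", None])
for word, count, pct in rows:
    print(word, count, f"{pct}%")
```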

Most occurring characters

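Value               Count   Frequency (%)
(not shown)         35      11.7%
(not shown)         33      11.1%
(not shown)         27      9.1%
(not shown)         14      4.7%
(not shown)         11      3.7%
(space)             9       3.0%
(not shown)         7       2.3%
(not shown)         6       2.0%
(not shown)         6       2.0%
(not shown)         6       2.0%
Other values (75)   144     48.3%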

Most occurring categories

Value               Count   Frequency (%)
Other Letter        274     91.9%
Space Separator     9       3.0%
Decimal Number      6       2.0%
Other Punctuation   5       1.7%
Open Punctuation    2       0.7%
Close Punctuation   2       0.7%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
(not shown)         35      12.8%
(not shown)         33      12.0%
(not shown)         27      9.9%
(not shown)         14      5.1%
(not shown)         11      4.0%
(not shown)         7       2.6%
(not shown)         6       2.2%
(not shown)         6       2.2%
(not shown)         6       2.2%
(not shown)         5       1.8%
Other values (68)   124     45.3%

Decimal Number
Value               Count   Frequency (%)
3                   2       33.3%
1                   2       33.3%
2                   2       33.3%

Space Separator
Value               Count   Frequency (%)
(space)             9       100.0%

Other Punctuation
Value               Count   Frequency (%)
·                   5       100.0%

Open Punctuation
Value               Count   Frequency (%)
[                   2       100.0%

Close Punctuation
Value               Count   Frequency (%)
]                   2       100.0%

Most occurring scripts

Value               Count   Frequency (%)
Hangul              274     91.9%
Common              24      8.1%

Most frequent character per script

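Hangul
Value               Count   Frequency (%)
(not shown)         35      12.8%
(not shown)         33      12.0%
(not shown)         27      9.9%
(not shown)         14      5.1%
(not shown)         11      4.0%
(not shown)         7       2.6%
(not shown)         6       2.2%
(not shown)         6       2.2%
(not shown)         6       2.2%
(not shown)         5       1.8%
Other values (68)   124     45.3%

Common
Value               Count   Frequency (%)
(space)             9       37.5%
·                   5       20.8%
3                   2       8.3%
1                   2       8.3%
2                   2       8.3%
[                   2       8.3%
]                   2       8.3%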

Most occurring blocks

Value               Count   Frequency (%)
Hangul              269     90.3%
ASCII               19      6.4%
Compat Jamo         5       1.7%
None                5       1.7%

Most frequent character per block

Hangul
Value               Count   Frequency (%)
(not shown)         35      13.0%
(not shown)         33      12.3%
(not shown)         27      10.0%
(not shown)         14      5.2%
(not shown)         11      4.1%
(not shown)         7       2.6%
(not shown)         6       2.2%
(not shown)         6       2.2%
(not shown)         6       2.2%
(not shown)         5       1.9%
Other values (67)   119     44.2%

ASCII
Value               Count   Frequency (%)
(space)             9       47.4%
3                   2       10.5%
1                   2       10.5%
2                   2       10.5%
[                   2       10.5%
]                   2       10.5%

Compat Jamo
Value               Count   Frequency (%)
(not shown)         5       100.0%

None
Value               Count   Frequency (%)
·                   5       100.0%

Unnamed: 1
Text

MISSING 

Distinct: 2
Distinct (%): 50.0%
Missing: 44
Missing (%): 91.7%
Memory size: 516.0 B

Length

Max length: 5
Median length: 4
Mean length: 4
Min length: 3

Characters and Unicode

Total characters: 16
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 등록업종수
2nd row: 업체수
3rd row: 등록업종수
4th row: 업체수
Value               Count   Frequency (%)
등록업종수           2       50.0%
업체수              2       50.0%

Most occurring characters

Value               Count   Frequency (%)
업                  4       25.0%
수                  4       25.0%
등                  2       12.5%
록                  2       12.5%
종                  2       12.5%
체                  2       12.5%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        16      100.0%

Most frequent character per category

Other Letter
Value               Count   Frequency (%)
업                  4       25.0%
수                  4       25.0%
등                  2       12.5%
록                  2       12.5%
종                  2       12.5%
체                  2       12.5%

Most occurring scripts

Value               Count   Frequency (%)
Hangul              16      100.0%

Most frequent character per script

Hangul
Value               Count   Frequency (%)
업                  4       25.0%
수                  4       25.0%
등                  2       12.5%
록                  2       12.5%
종                  2       12.5%
체                  2       12.5%

Most occurring blocks

Value               Count   Frequency (%)
Hangul              16      100.0%

Most frequent character per block

Hangul
Value               Count   Frequency (%)
업                  4       25.0%
수                  4       25.0%
등                  2       12.5%
록                  2       12.5%
종                  2       12.5%
체                  2       12.5%

Unnamed: 2
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 16.7%
Memory size: 516.0 B

Unnamed: 3
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 8.3%
Memory size: 516.0 B

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 16.7%
Memory size: 516.0 B

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 16.7%
Memory size: 516.0 B

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 16.7%
Memory size: 516.0 B

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 16.7%
Memory size: 516.0 B

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 8
Missing (%): 16.7%
Memory size: 516.0 B

Correlations

                              [전북][일반건설업] 등록현황   Unnamed: 1
[전북][일반건설업] 등록현황      1.000                      0.000
Unnamed: 1                    0.000                      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
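One common way to quantify nullity correlation is the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The sketch below implements that idea in plain Python. It is an illustration under that assumption, not the code that produced the heatmap, and `nullity_correlation` is a hypothetical name.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation of presence indicators; None if either column is constant."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a fully present or fully missing column has no defined correlation
    return cov / sqrt(var_a * var_b)


same = nullity_correlation([1, None, 3, None], ["x", None, "y", None])
opposite = nullity_correlation([1, None, 3, None], [None, 1, None, 2])
print(same, opposite)  # 1.0 -1.0
```

A value near 1 means the two columns tend to be missing on the same rows; a value near -1 means one is present where the other is missing.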

Sample

[전북][일반건설업] 등록현황   Unnamed: 1   Unnamed: 2   Unnamed: 3   Unnamed: 4   Unnamed: 5   Unnamed: 6   Unnamed: 7   Unnamed: 8
0구 분<NA>(2018.12)까지누계조건 : 2019년 01월 ~ 2019년 12월NaNNaNNaNNaN(2019.12)까지누계
1<NA><NA>NaN변 동 사 항NaNNaNNaNNaNNaN
2<NA><NA>NaN증감재등록전입전출등록말소NaN
3등록업종수8732851252127901
4업체수6952235232016717
5<NA><NA>NaNNaNNaNNaNNaNNaNNaN
6토목건축공사업<NA>249-54366244
7토목공사업<NA>227712463234
8건축공사업<NA>271243014911295
9조경공사업<NA>10525306107
[전북][일반건설업] 등록현황   Unnamed: 1   Unnamed: 2   Unnamed: 3   Unnamed: 4   Unnamed: 5   Unnamed: 6   Unnamed: 7   Unnamed: 8
38가스시설시공업 제1종<NA>45-1223244
39가스시설시공업 제2종<NA>2911421209305
40가스시설시공업 제3종<NA>12868314134
41난방시공업 제1종<NA>605431165
42난방시공업 제2종<NA>2922142014294
43난방시공업 제3종<NA>3000003
44시설물유지관리업<NA>399242930323423
45습식ㆍ방수공사업<NA>911517442106
46금속구조물ㆍ창호ㆍ온실공사업<NA>4093044222016439
47지붕판금ㆍ건축물조립공사업<NA>222401124

Duplicate rows

Most frequently occurring

     [전북][일반건설업] 등록현황   Unnamed: 1   # duplicates
3    <NA>                       <NA>         7
0                               업체수        2
1    구 분                      <NA>         2
2                               등록업종수     2
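A table like this can be produced by counting identical rows and keeping those that occur more than once. The sketch below uses only the standard library; `duplicate_groups` is a hypothetical helper, and the sample rows are illustrative rather than the dataset's actual contents.

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [
    (None, None),
    ("구 분", None),
    (None, None),
    ("구 분", None),
    ("토목공사업", None),
    (None, None),
]
print(duplicate_groups(rows))
```

Rows are converted to tuples so they can be used as dictionary keys; `None` stands in for a missing cell.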