Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 39972
Missing cells (%): 66.6%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 546.9 KiB
Average record size in memory: 56.0 B
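
The derived figures in this block follow from the raw counts. The sketch below recomputes them as a sanity check; the variable names are illustrative, and the per-column missing count of 9993 is taken from the Alerts section of this report.

```python
# Raw figures copied from the dataset statistics of this report.
n_variables = 6
n_observations = 10_000
missing_cells = 39_972
total_size_kib = 546.9

# Share of missing cells over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size in bytes (1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

# The alerts report 9993 missing values per affected column, so the
# missing cells should be spread over a whole number of columns.
missing_per_column = 9_993
columns_with_missing = missing_cells // missing_per_column

print(f"{missing_pct:.1f}% missing, {avg_record_bytes:.1f} B per record, "
      f"{columns_with_missing} columns with missing values")
```

Four columns with missing values matches the four Missing alerts; the two Categorical columns report 0 missing because their `<NA>` entries are counted as a category.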

Variable types

Categorical: 2
DateTime: 1
Text: 3

Dataset

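Description: Status of freight trucking business operators registered in Uiwang-si, Gyeonggi-do. Provides the company name, business type, road-name address, and lot-number address.
Author: 경기도 의왕시 (Uiwang-si, Gyeonggi-do)
URL: https://www.data.go.kr/data/15113394/fileData.do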

Alerts

Dataset has 1 (< 0.1%) duplicate rows | Duplicates
시군명 is highly overall correlated with 사업의종류 | High correlation
사업의종류 is highly overall correlated with 시군명 | High correlation
시군명 is highly imbalanced (99.2%) | Imbalance
사업의종류 is highly imbalanced (99.2%) | Imbalance
허가연월일 has 9993 (99.9%) missing values | Missing
상호 has 9993 (99.9%) missing values | Missing
주사무소도로명주소 has 9993 (99.9%) missing values | Missing
전화번호 has 9993 (99.9%) missing values | Missing
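
The 99.2% imbalance figure is consistent with an entropy-based score, 1 - H / log2(k), where H is the Shannon entropy of the class shares and k is the number of classes. The sketch below assumes that definition (the report itself does not state the formula) and uses the 9993 / 7 split shown for 시군명 and 사업의종류; `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts.

    H is the Shannon entropy in bits; k is the number of classes.
    0 means perfectly balanced, 1 means a single class dominates entirely.
    """
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts from this report: 9993 "<NA>" entries versus 7 real values.
score = imbalance_score([9993, 7])
print(f"{score:.1%}")
```

Under this assumed definition the score rounds to the 99.2% shown in the alerts.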

Reproduction

Analysis started: 2023-12-12 15:15:10.664023
Analysis finished: 2023-12-12 15:15:11.331507
Duration: 0.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시군명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA>: 9993
의왕시: 7

Length

Max length: 4
Median length: 4
Mean length: 3.9993
Min length: 3
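
The mean length of 3.9993 can be reproduced from the value counts if the missing marker is treated as the literal 4-character string `<NA>`. That treatment is an assumption, but it is consistent with this column reporting 0 missing values. A minimal sketch:

```python
# Value counts for 시군명 as shown in this report.
# Assumption: missing entries are stored as the literal string "<NA>".
value_counts = {"<NA>": 9993, "의왕시": 7}

total_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())

mean_length = total_chars / total_rows
min_length = min(len(value) for value in value_counts)
max_length = max(len(value) for value in value_counts)

print(mean_length, min_length, max_length)
```

The same reasoning explains why 사업의종류 has a mean length of exactly 4: both `<NA>` and 일반화물 are four characters long.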

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9993 | 99.9%
의왕시 | 7 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 9993 | 99.9%
의왕시 | 7 | 0.1%

허가연월일
Date

MISSING 

Distinct: 7
Distinct (%): 100.0%
Missing: 9993
Missing (%): 99.9%
Memory size: 156.2 KiB
Minimum: 2003-01-09 00:00:00
Maximum: 2010-08-24 00:00:00
Histogram with fixed size bins (bins=7)

상호
Text

MISSING 

Distinct: 7
Distinct (%): 100.0%
Missing: 9993
Missing (%): 99.9%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 7
Mean length: 7.1428571
Min length: 5

Characters and Unicode

Total characters: 50
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
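
As an illustration of this kind of character-level analysis, the sketch below tallies Unicode general categories for the seven 상호 values using Python's standard `unicodedata` module. Five names come from the Sample rows of this report; the last two are reconstructed from the word-frequency table by restoring the parentheses, which is an assumption, although the resulting totals agree with the figures reported here.

```python
import unicodedata
from collections import Counter

names = [
    "(주)트랜스틸",
    "(주)세양",
    "(주)씨티엘물류",
    "(주)한원물류",
    "케이에스씨로지스",
    "삼화물류(주)",    # reconstructed from the word table (assumption)
    "(주)장평로지스",  # reconstructed from the word table (assumption)
]

text = "".join(names)

# General category per character: "Lo" = Other Letter,
# "Ps" = Open Punctuation, "Pe" = Close Punctuation.
category_counts = Counter(unicodedata.category(ch) for ch in text)

total_characters = len(text)
distinct_characters = len(set(text))
mean_length = total_characters / len(names)

print(total_characters, distinct_characters, dict(category_counts))
```

With these inputs the tallies line up with the report: 50 characters in total, 25 distinct, and three categories (38 Other Letter, 6 Open Punctuation, 6 Close Punctuation).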

Unique

Unique: 7
Unique (%): 100.0%

Sample

1st row: (주)트랜스틸
2nd row: (주)세양
3rd row: (주)씨티엘물류
4th row: (주)한원물류
5th row: 케이에스씨로지스

Value | Count | Frequency (%)
주)트랜스틸 | 1 | 14.3%
주)세양 | 1 | 14.3%
주)씨티엘물류 | 1 | 14.3%
주)한원물류 | 1 | 14.3%
케이에스씨로지스 | 1 | 14.3%
삼화물류(주 | 1 | 14.3%
주)장평로지스 | 1 | 14.3%

Most occurring characters

ValueCountFrequency (%)
( 6
 
12.0%
) 6
 
12.0%
6
 
12.0%
4
 
8.0%
3
 
6.0%
3
 
6.0%
2
 
4.0%
2
 
4.0%
2
 
4.0%
1
 
2.0%
Other values (15) 15
30.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 38 | 76.0%
Open Punctuation | 6 | 12.0%
Close Punctuation | 6 | 12.0%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6
15.8%
4
 
10.5%
3
 
7.9%
3
 
7.9%
2
 
5.3%
2
 
5.3%
2
 
5.3%
1
 
2.6%
1
 
2.6%
1
 
2.6%
Other values (13) 13
34.2%
Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 38 | 76.0%
Common | 12 | 24.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6
15.8%
4
 
10.5%
3
 
7.9%
3
 
7.9%
2
 
5.3%
2
 
5.3%
2
 
5.3%
1
 
2.6%
1
 
2.6%
1
 
2.6%
Other values (13) 13
34.2%
Common
Value | Count | Frequency (%)
( | 6 | 50.0%
) | 6 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 38 | 76.0%
ASCII | 12 | 24.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
( | 6 | 50.0%
) | 6 | 50.0%
Hangul
ValueCountFrequency (%)
6
15.8%
4
 
10.5%
3
 
7.9%
3
 
7.9%
2
 
5.3%
2
 
5.3%
2
 
5.3%
1
 
2.6%
1
 
2.6%
1
 
2.6%
Other values (13) 13
34.2%

사업의종류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA>: 9993
일반화물: 7

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9993 | 99.9%
일반화물 | 7 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 9993 | 99.9%
일반화물 | 7 | 0.1%

주사무소도로명주소
Text

MISSING

Distinct: 7
Distinct (%): 100.0%
Missing: 9993
Missing (%): 99.9%
Memory size: 156.2 KiB

Length

Max length: 44
Median length: 37
Mean length: 36.142857
Min length: 28

Characters and Unicode

Total characters: 253
Distinct characters: 64
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 100.0%

Sample

1st row: 경기도 의왕시 왕송못동로 79, 천일정기화물자동차 (삼동)
2nd row: 경기도 의왕시 오봉산단3로 25, 더리브비즈원 1동 1323호 (삼동)
3rd row: 경기도 의왕시 창말로 39 (이동, 의왕제1터미널)
4th row: 경기도 의왕시 이미로 40, 인덕원IT밸리 에이동 1020호 (포일동)
5th row: 경기도 의왕시 창말로 39, 4군 의왕제1터미널 2층 (이동)

Value | Count | Frequency (%)
경기도 | 7 | 13.0%
의왕시 | 7 | 13.0%
이동 | 4 | 7.4%
오봉로 | 2 | 3.7%
2층 | 2 | 3.7%
의왕제1터미널 | 2 | 3.7%
39 | 2 | 3.7%
창말로 | 2 | 3.7%
175 | 2 | 3.7%
삼동 | 2 | 3.7%
Other values (22) | 22 | 40.7%

Most occurring characters

ValueCountFrequency (%)
47
 
18.6%
12
 
4.7%
11
 
4.3%
11
 
4.3%
1 9
 
3.6%
8
 
3.2%
2 8
 
3.2%
, 7
 
2.8%
) 7
 
2.8%
( 7
 
2.8%
Other values (54) 126
49.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 143 | 56.5%
Space Separator | 47 | 18.6%
Decimal Number | 40 | 15.8%
Other Punctuation | 7 | 2.8%
Close Punctuation | 7 | 2.8%
Open Punctuation | 7 | 2.8%
Uppercase Letter | 2 | 0.8%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
12
 
8.4%
11
 
7.7%
11
 
7.7%
8
 
5.6%
7
 
4.9%
7
 
4.9%
7
 
4.9%
7
 
4.9%
7
 
4.9%
4
 
2.8%
Other values (39) 62
43.4%
Decimal Number
Value | Count | Frequency (%)
1 | 9 | 22.5%
2 | 8 | 20.0%
0 | 5 | 12.5%
3 | 5 | 12.5%
7 | 4 | 10.0%
9 | 3 | 7.5%
5 | 3 | 7.5%
4 | 2 | 5.0%
6 | 1 | 2.5%
Uppercase Letter
Value | Count | Frequency (%)
T | 1 | 50.0%
I | 1 | 50.0%
Space Separator
Value | Count | Frequency (%)
(space) | 47 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
, | 7 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 7 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 143 | 56.5%
Common | 108 | 42.7%
Latin | 2 | 0.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
12
 
8.4%
11
 
7.7%
11
 
7.7%
8
 
5.6%
7
 
4.9%
7
 
4.9%
7
 
4.9%
7
 
4.9%
7
 
4.9%
4
 
2.8%
Other values (39) 62
43.4%
Common
ValueCountFrequency (%)
47
43.5%
1 9
 
8.3%
2 8
 
7.4%
, 7
 
6.5%
) 7
 
6.5%
( 7
 
6.5%
0 5
 
4.6%
3 5
 
4.6%
7 4
 
3.7%
9 3
 
2.8%
Other values (3) 6
 
5.6%
Latin
Value | Count | Frequency (%)
T | 1 | 50.0%
I | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 143 | 56.5%
ASCII | 110 | 43.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
47
42.7%
1 9
 
8.2%
2 8
 
7.3%
, 7
 
6.4%
) 7
 
6.4%
( 7
 
6.4%
0 5
 
4.5%
3 5
 
4.5%
7 4
 
3.6%
9 3
 
2.7%
Other values (5) 8
 
7.3%
Hangul
ValueCountFrequency (%)
12
 
8.4%
11
 
7.7%
11
 
7.7%
8
 
5.6%
7
 
4.9%
7
 
4.9%
7
 
4.9%
7
 
4.9%
7
 
4.9%
4
 
2.8%
Other values (39) 62
43.4%

전화번호
Text

MISSING 

Distinct: 7
Distinct (%): 100.0%
Missing: 9993
Missing (%): 99.9%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 12
Mean length: 10.428571
Min length: 1

Characters and Unicode

Total characters: 73
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 100.0%

Sample

1st row: 031-462-2131
2nd row: 031-462-4247
3rd row: 031-461-0280
4th row: 031-426-4891
5th row: (blank)

Value | Count | Frequency (%)
031-462-2131 | 1 | 16.7%
031-462-4247 | 1 | 16.7%
031-461-0280 | 1 | 16.7%
031-426-4891 | 1 | 16.7%
031-461-6691 | 1 | 16.7%
031-462-6060 | 1 | 16.7%
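
The figures for 전화번호 (minimum length 1, a single Space Separator character, an empty 5th sample row, and only six entries in the table above) suggest that one of the seven non-missing values is a lone space rather than a phone number. The sketch below flags whitespace-only entries; the position of the blank value and the order of the last two numbers are assumptions, since the report shows only the first five rows.

```python
# Non-missing 전화번호 values as far as this report shows them.
# Assumption: the empty 5th sample row is a single space character.
phones = [
    "031-462-2131",
    "031-462-4247",
    "031-461-0280",
    "031-426-4891",
    " ",
    "031-461-6691",
    "031-462-6060",
]

lengths = [len(p) for p in phones]
total_characters = sum(lengths)
mean_length = total_characters / len(phones)

# Entries that contain nothing but whitespace are effectively missing.
blank_entries = [p for p in phones if not p.strip()]

print(total_characters, round(mean_length, 6), len(blank_entries))
```

Treating such entries as missing before profiling would raise the missing count for this column from 9993 to 9994.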

Most occurring characters

Value | Count | Frequency (%)
1 | 12 | 16.4%
- | 12 | 16.4%
0 | 10 | 13.7%
6 | 10 | 13.7%
4 | 9 | 12.3%
3 | 7 | 9.6%
2 | 7 | 9.6%
8 | 2 | 2.7%
9 | 2 | 2.7%
7 | 1 | 1.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 60 | 82.2%
Dash Punctuation | 12 | 16.4%
Space Separator | 1 | 1.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 12 | 20.0%
0 | 10 | 16.7%
6 | 10 | 16.7%
4 | 9 | 15.0%
3 | 7 | 11.7%
2 | 7 | 11.7%
8 | 2 | 3.3%
9 | 2 | 3.3%
7 | 1 | 1.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 12 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 73 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 12 | 16.4%
- | 12 | 16.4%
0 | 10 | 13.7%
6 | 10 | 13.7%
4 | 9 | 12.3%
3 | 7 | 9.6%
2 | 7 | 9.6%
8 | 2 | 2.7%
9 | 2 | 2.7%
7 | 1 | 1.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 73 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 12 | 16.4%
- | 12 | 16.4%
0 | 10 | 13.7%
6 | 10 | 13.7%
4 | 9 | 12.3%
3 | 7 | 9.6%
2 | 7 | 9.6%
8 | 2 | 2.7%
9 | 2 | 2.7%
7 | 1 | 1.4%

Correlations

Variable | 허가연월일 | 상호 | 주사무소도로명주소 | 전화번호
허가연월일 | 1.000 | 1.000 | 1.000 | 1.000
상호 | 1.000 | 1.000 | 1.000 | 1.000
주사무소도로명주소 | 1.000 | 1.000 | 1.000 | 1.000
전화번호 | 1.000 | 1.000 | 1.000 | 1.000

Variable | 시군명 | 사업의종류
시군명 | 1.000 | 1.000
사업의종류 | 1.000 | 1.000

Variable | 시군명 | 사업의종류
시군명 | 1.000 | 1.000
사업의종류 | 1.000 | 1.000
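
A correlation of 1.000 between 시군명 and 사업의종류 is what a categorical association measure returns when the two columns move together perfectly: the 9993 `<NA>` rows coincide in both columns, and the 7 remaining rows pair 의왕시 with 일반화물. The sketch below uses plain (uncorrected) Cramér's V as one such measure. This is an assumption; the report does not state which estimator it used, and a bias-corrected variant could be applied in practice.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Rows: 시군명 in ("<NA>", "의왕시"); columns: 사업의종류 in ("<NA>", "일반화물").
contingency = [
    [9993, 0],
    [0, 7],
]
v = cramers_v(contingency)
print(round(v, 3))
```

Because the association is driven almost entirely by shared missingness, the High correlation alerts say little about the seven real records.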

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

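Index | 시군명 | 허가연월일 | 상호 | 사업의종류 | 주사무소도로명주소 | 전화번호
19952 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
90982 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
63934 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
66329 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
69214 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
83511 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
72150 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
23154 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
3054 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
72356 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Index | 시군명 | 허가연월일 | 상호 | 사업의종류 | 주사무소도로명주소 | 전화번호
5629 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
7996 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
16807 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
32614 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
55035 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
38425 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
20359 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
37343 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
21732 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
99457 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>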

Duplicate rows

Most frequently occurring

Index | 시군명 | 허가연월일 | 상호 | 사업의종류 | 주사무소도로명주소 | 전화번호 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 9993
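
The overview reports 1 duplicate row while this table shows 9993 duplicates. The two figures agree if the first counts distinct duplicated row patterns and the second counts how often that pattern occurs. A minimal sketch of that reading, using hypothetical placeholder values for the seven populated rows (their real contents are only partly shown in this report):

```python
from collections import Counter

NA = "<NA>"
empty_row = (NA,) * 6

# Hypothetical stand-ins for the seven populated records; only their
# distinctness matters for this illustration.
populated_rows = [
    ("의왕시", f"date-{i}", f"name-{i}", "일반화물", f"address-{i}", f"phone-{i}")
    for i in range(7)
]

rows = [empty_row] * 9993 + populated_rows

# Count every row pattern, then keep the patterns that occur more than once.
pattern_counts = Counter(rows)
duplicated_patterns = {row: n for row, n in pattern_counts.items() if n > 1}

n_patterns = len(duplicated_patterns)
pattern_share = 100 * n_patterns / len(rows)

print(n_patterns, duplicated_patterns[empty_row], f"{pattern_share:.2f}%")
```

One pattern out of 10000 rows is 0.01%, which matches the "< 0.1%" shown for duplicate rows in the overview.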