Overview

Dataset statistics

Number of variables: 6
Number of observations: 87
Missing cells: 16
Missing cells (%): 3.1%
Duplicate rows: 1
Duplicate rows (%): 1.1%
Total size in memory: 4.3 KiB
Average record size in memory: 50.5 B
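
The two percentages in the dataset statistics can be reproduced from the counts. The sketch below assumes the missing share is taken over all cells (observations × variables) and the duplicate share over all rows; the variable names are illustrative and not part of the report.

```python
# Counts copied from the dataset statistics.
n_variables = 6
n_observations = 87
missing_cells = 16
duplicate_rows = 1

# Assumed definitions: missing share over all cells,
# duplicate share over all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

Under these assumptions the results round to the reported 3.1% and 1.1%.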

Variable types

Numeric: 1
Categorical: 2
Text: 3

Dataset

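Description: Data on the status of excursion-ship and ferry (유선 및 도선) operators as of 2022, providing fields such as the competent authority, business name, representative's name, and address.
URL: https://www.data.go.kr/data/15061882/fileData.do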

Alerts

Dataset has 1 (1.1%) duplicate row [Duplicates]
구분 is highly overall correlated with 번호 and 1 other field [High correlation]
면허_신고기관 is highly overall correlated with 번호 and 1 other field [High correlation]
번호 is highly overall correlated with 면허_신고기관 and 1 other field [High correlation]
구분 is highly imbalanced (73.1%) [Imbalance]
번호 has 4 (4.6%) missing values [Missing]
상호 has 4 (4.6%) missing values [Missing]
성명(대표자) has 4 (4.6%) missing values [Missing]
주소 has 4 (4.6%) missing values [Missing]
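
The 73.1% imbalance figure for 구분 is consistent with one minus the normalized Shannon entropy of the class counts (83 rows of 갱신, 4 rows of the literal category <NA>). The sketch below shows that calculation; it is an assumption that the profiler computes the score this way, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits
    of the class shares and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts for 구분 as listed in this report: 갱신 = 83, <NA> = 4.
score = imbalance_score([83, 4])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 and a column with a single dominant class approaches 1, which is why a 95.4% / 4.6% split is flagged.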

Reproduction

Analysis started: 2023-12-12 22:33:17.252290
Analysis finished: 2023-12-12 22:33:18.484303
Duration: 1.23 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 83
Distinct (%): 100.0%
Missing: 4
Missing (%): 4.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 42
Minimum: 1
Maximum: 83
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 915.0 B

Quantile statistics

Minimum: 1
5-th percentile: 5.1
Q1: 21.5
Median: 42
Q3: 62.5
95-th percentile: 78.9
Maximum: 83
Range: 82
Interquartile range (IQR): 41
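
The quantile statistics for 번호 can be checked with a short sketch. It assumes the 83 non-missing values are the integers 1 to 83 (consistent with the reported minimum, maximum, distinct count, and sum) and that quantiles use linear interpolation between neighbouring ranks; `quantile` is a hypothetical helper, not the profiler's own function.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Assumption: the non-missing values of 번호 are exactly 1, 2, ..., 83.
values = list(range(1, 84))

p05 = quantile(values, 0.05)
q1 = quantile(values, 0.25)
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)
p95 = quantile(values, 0.95)
iqr = q3 - q1

print(round(p05, 1), q1, median, q3, round(p95, 1), iqr)
```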

Descriptive statistics

Standard deviation: 24.103942
Coefficient of variation (CV): 0.57390337
Kurtosis: -1.2
Mean: 42
Median Absolute Deviation (MAD): 21
Skewness: 0
Sum: 3486
Variance: 581
Monotonicity: Strictly increasing
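
The descriptive statistics can be reproduced with the Python standard library. The sketch assumes the non-missing values of 번호 are the integers 1 to 83 and that the variance uses the sample (n - 1) denominator, which is the reading that matches the reported 581.

```python
import statistics

# Assumption: the non-missing values of 번호 are exactly 1, 2, ..., 83.
values = list(range(1, 84))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std_dev = statistics.stdev(values)       # square root of the sample variance
cv = std_dev / mean                      # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median of the distances from the median.
mad = statistics.median([abs(v - median) for v in values])

print(total, mean, variance, round(std_dev, 6), round(cv, 8), mad)
```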
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
54 | 1 | 1.1%
62 | 1 | 1.1%
61 | 1 | 1.1%
60 | 1 | 1.1%
59 | 1 | 1.1%
58 | 1 | 1.1%
57 | 1 | 1.1%
56 | 1 | 1.1%
55 | 1 | 1.1%
53 | 1 | 1.1%
Other values (73) | 73 | 83.9%
(Missing) | 4 | 4.6%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 1.1%
2 | 1 | 1.1%
3 | 1 | 1.1%
4 | 1 | 1.1%
5 | 1 | 1.1%
6 | 1 | 1.1%
7 | 1 | 1.1%
8 | 1 | 1.1%
9 | 1 | 1.1%
10 | 1 | 1.1%

Maximum 10 values

Value | Count | Frequency (%)
83 | 1 | 1.1%
82 | 1 | 1.1%
81 | 1 | 1.1%
80 | 1 | 1.1%
79 | 1 | 1.1%
78 | 1 | 1.1%
77 | 1 | 1.1%
76 | 1 | 1.1%
75 | 1 | 1.1%
74 | 1 | 1.1%

면허_신고기관
Categorical

HIGH CORRELATION 

Distinct: 35
Distinct (%): 40.2%
Missing: 0
Missing (%): 0.0%
Memory size: 828.0 B

강원도 춘천시: 16
서울특별시 한강사업본부: 14
경기도 가평군: 4
충청북도 단양군: 4
<NA>: 4
Other values (30): 45

Length

Max length: 12
Median length: 10
Mean length: 8.091954
Min length: 4

Unique

Unique: 19
Unique (%): 21.8%

Sample

1st row: 서울특별시 한강사업본부
2nd row: 서울특별시 한강사업본부
3rd row: 서울특별시 한강사업본부
4th row: 서울특별시 한강사업본부
5th row: 서울특별시 한강사업본부

Common Values

Value | Count | Frequency (%)
강원도 춘천시 | 16 | 18.4%
서울특별시 한강사업본부 | 14 | 16.1%
경기도 가평군 | 4 | 4.6%
충청북도 단양군 | 4 | 4.6%
<NA> | 4 | 4.6%
대구광역시 동구 | 4 | 4.6%
경상북도 안동시 | 3 | 3.4%
충청남도 부여군 | 3 | 3.4%
충청북도 충주시 | 2 | 2.3%
경기도 여주시 | 2 | 2.3%
Other values (25) | 31 | 35.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
강원도 | 23 | 13.7%
춘천시 | 16 | 9.5%
서울특별시 | 15 | 8.9%
한강사업본부 | 14 | 8.3%
충청북도 | 12 | 7.1%
경기도 | 11 | 6.5%
경상북도 | 6 | 3.6%
대구광역시 | 6 | 3.6%
na | 4 | 2.4%
동구 | 4 | 2.4%
Other values (36) | 57 | 33.9%

상호
Text

MISSING 

Distinct: 80
Distinct (%): 96.4%
Missing: 4
Missing (%): 4.6%
Memory size: 828.0 B

Length

Max length: 14
Median length: 12
Mean length: 6.5180723
Min length: 2

Characters and Unicode

Total characters: 541
Distinct characters: 178
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
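
As an illustration of the character properties mentioned here, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The two strings are business names that appear in this report; `category_counts` is a hypothetical helper and is not how the profiler itself is implemented.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lo', 'So')."""
    return Counter(unicodedata.category(ch) for ch in text)

# General category codes and the labels used in the report's tables:
# Lo = Other Letter, So = Other Symbol, Lu = Uppercase Letter,
# Ps = Open Punctuation, Pe = Close Punctuation.
for name in ["㈜서울마리나", "오엔(ON)"]:
    print(name, dict(category_counts(name)))
```

The enclosed symbol ㈜ falls in the Other Symbol category rather than Other Letter, while the Hangul syllables are Other Letter.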

Unique

Unique: 78
Unique (%): 94.0%

Sample

1st row: 현대요트(주)서울지점
2nd row: ㈜세븐마린레저
3rd row: ㈜리우엠앤씨
4th row: ㈜선스톤쉬핑
5th row: ㈜서울마리나

Value | Count | Frequency (%)
㈜남이섬 | 3 | 3.5%
㈜충주호관광선 | 2 | 2.3%
수상레저&바이크 | 1 | 1.2%
㈜남숭 | 1 | 1.2%
이디오피아보트장 | 1 | 1.2%
코리아크루즈㈜ | 1 | 1.2%
별장1호 | 1 | 1.2%
월명호 | 1 | 1.2%
진유선 | 1 | 1.2%
파로호 | 1 | 1.2%
Other values (73) | 73 | 84.9%

Most occurring characters

ValueCountFrequency (%)
28
 
5.2%
17
 
3.1%
14
 
2.6%
13
 
2.4%
13
 
2.4%
13
 
2.4%
) 13
 
2.4%
( 13
 
2.4%
12
 
2.2%
10
 
1.8%
Other values (168) 395
73.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 479 | 88.5%
Other Symbol | 28 | 5.2%
Close Punctuation | 13 | 2.4%
Open Punctuation | 13 | 2.4%
Space Separator | 4 | 0.7%
Uppercase Letter | 2 | 0.4%
Other Punctuation | 1 | 0.2%
Decimal Number | 1 | 0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
17
 
3.5%
14
 
2.9%
13
 
2.7%
13
 
2.7%
13
 
2.7%
12
 
2.5%
10
 
2.1%
10
 
2.1%
9
 
1.9%
9
 
1.9%
Other values (160) 359
74.9%
Uppercase Letter
ValueCountFrequency (%)
N 1
50.0%
O 1
50.0%
Other Symbol
ValueCountFrequency (%)
28
100.0%
Close Punctuation
ValueCountFrequency (%)
) 13
100.0%
Open Punctuation
ValueCountFrequency (%)
( 13
100.0%
Space Separator
ValueCountFrequency (%)
4
100.0%
Other Punctuation
ValueCountFrequency (%)
& 1
100.0%
Decimal Number
ValueCountFrequency (%)
1 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 507 | 93.7%
Common | 32 | 5.9%
Latin | 2 | 0.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
28
 
5.5%
17
 
3.4%
14
 
2.8%
13
 
2.6%
13
 
2.6%
13
 
2.6%
12
 
2.4%
10
 
2.0%
10
 
2.0%
9
 
1.8%
Other values (161) 368
72.6%
Common
ValueCountFrequency (%)
) 13
40.6%
( 13
40.6%
4
 
12.5%
& 1
 
3.1%
1 1
 
3.1%
Latin
ValueCountFrequency (%)
N 1
50.0%
O 1
50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 479 | 88.5%
ASCII | 34 | 6.3%
None | 28 | 5.2%

Most frequent character per block

None
ValueCountFrequency (%)
28
100.0%
Hangul
ValueCountFrequency (%)
17
 
3.5%
14
 
2.9%
13
 
2.7%
13
 
2.7%
13
 
2.7%
12
 
2.5%
10
 
2.1%
10
 
2.1%
9
 
1.9%
9
 
1.9%
Other values (160) 359
74.9%
ASCII
ValueCountFrequency (%)
) 13
38.2%
( 13
38.2%
4
 
11.8%
& 1
 
2.9%
1 1
 
2.9%
N 1
 
2.9%
O 1
 
2.9%

구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 828.0 B

갱신: 83
<NA>: 4

Length

Max length: 4
Median length: 2
Mean length: 2.091954
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 갱신
2nd row: 갱신
3rd row: 갱신
4th row: 갱신
5th row: 갱신

Common Values

Value | Count | Frequency (%)
갱신 | 83 | 95.4%
<NA> | 4 | 4.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
갱신 | 83 | 95.4%
na | 4 | 4.6%

성명(대표자)
Text

MISSING 

Distinct: 81
Distinct (%): 97.6%
Missing: 4
Missing (%): 4.6%
Memory size: 828.0 B

Length

Max length: 10
Median length: 3
Mean length: 3.1927711
Min length: 3

Characters and Unicode

Total characters: 265
Distinct characters: 118
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 79
Unique (%): 95.2%

Sample

1st row: 이철웅
2nd row: 박명철
3rd row: 문덕범
4th row: 신용석
5th row: 임탁기

Value | Count | Frequency (%)
민경혁 | 2 | 2.4%
김철석 | 2 | 2.4%
이덕기 | 1 | 1.2%
노창영 | 1 | 1.2%
평창군수 | 1 | 1.2%
김정욱 | 1 | 1.2%
여석민 | 1 | 1.2%
김상덕 | 1 | 1.2%
홍성진 | 1 | 1.2%
화천군수 | 1 | 1.2%
Other values (72) | 72 | 85.7%

Most occurring characters

ValueCountFrequency (%)
18
 
6.8%
11
 
4.2%
8
 
3.0%
8
 
3.0%
7
 
2.6%
7
 
2.6%
6
 
2.3%
6
 
2.3%
5
 
1.9%
5
 
1.9%
Other values (108) 184
69.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 257 | 97.0%
Other Symbol | 3 | 1.1%
Decimal Number | 2 | 0.8%
Space Separator | 1 | 0.4%
Open Punctuation | 1 | 0.4%
Close Punctuation | 1 | 0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
18
 
7.0%
11
 
4.3%
8
 
3.1%
8
 
3.1%
7
 
2.7%
7
 
2.7%
6
 
2.3%
6
 
2.3%
5
 
1.9%
5
 
1.9%
Other values (103) 176
68.5%
Other Symbol
ValueCountFrequency (%)
3
100.0%
Decimal Number
ValueCountFrequency (%)
0 2
100.0%
Space Separator
ValueCountFrequency (%)
1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 260 | 98.1%
Common | 5 | 1.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
18
 
6.9%
11
 
4.2%
8
 
3.1%
8
 
3.1%
7
 
2.7%
7
 
2.7%
6
 
2.3%
6
 
2.3%
5
 
1.9%
5
 
1.9%
Other values (104) 179
68.8%
Common
ValueCountFrequency (%)
0 2
40.0%
1
20.0%
( 1
20.0%
) 1
20.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 257 | 97.0%
ASCII | 5 | 1.9%
None | 3 | 1.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
18
 
7.0%
11
 
4.3%
8
 
3.1%
8
 
3.1%
7
 
2.7%
7
 
2.7%
6
 
2.3%
6
 
2.3%
5
 
1.9%
5
 
1.9%
Other values (103) 176
68.5%
None
ValueCountFrequency (%)
3
100.0%
ASCII
ValueCountFrequency (%)
0 2
40.0%
1
20.0%
( 1
20.0%
) 1
20.0%

주소
Text

MISSING 

Distinct: 76
Distinct (%): 91.6%
Missing: 4
Missing (%): 4.6%
Memory size: 828.0 B

Length

Max length: 30
Median length: 23
Mean length: 19.975904
Min length: 13

Characters and Unicode

Total characters: 1658
Distinct characters: 174
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 73
Unique (%): 88.0%

Sample

1st row: 서울특별시 서초구 동작대로350 (반포한강공원)
2nd row: 서울특별시 용산구 이촌동 361
3rd row: 서울특별시 마포구 마포나루길 435
4th row: 서울특별시 강남구 압구정로 11길 37-53
5th row: 서울특별시 영등포구 여의도동 81

Value | Count | Frequency (%)
강원도 | 23 | 6.2%
춘천시 | 16 | 4.3%
서울특별시 | 15 | 4.0%
충청북도 | 12 | 3.2%
경기도 | 11 | 3.0%
북산면 | 8 | 2.2%
청평리 | 7 | 1.9%
대구광역시 | 6 | 1.6%
산205-3번지 | 6 | 1.6%
경상북도 | 6 | 1.6%
Other values (213) | 261 | 70.4%

Most occurring characters

ValueCountFrequency (%)
289
 
17.4%
64
 
3.9%
61
 
3.7%
1 42
 
2.5%
38
 
2.3%
2 38
 
2.3%
3 34
 
2.1%
33
 
2.0%
0 32
 
1.9%
31
 
1.9%
Other values (164) 996
60.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1054 | 63.6%
Space Separator | 289 | 17.4%
Decimal Number | 279 | 16.8%
Dash Punctuation | 29 | 1.7%
Open Punctuation | 3 | 0.2%
Close Punctuation | 3 | 0.2%
Other Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
64
 
6.1%
61
 
5.8%
38
 
3.6%
33
 
3.1%
31
 
2.9%
29
 
2.8%
29
 
2.8%
29
 
2.8%
29
 
2.8%
27
 
2.6%
Other values (149) 684
64.9%
Decimal Number
Value | Count | Frequency (%)
1 | 42 | 15.1%
2 | 38 | 13.6%
3 | 34 | 12.2%
0 | 32 | 11.5%
5 | 29 | 10.4%
8 | 28 | 10.0%
4 | 27 | 9.7%
6 | 20 | 7.2%
9 | 17 | 6.1%
7 | 12 | 4.3%
Space Separator
ValueCountFrequency (%)
289
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 29
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1054 | 63.6%
Common | 604 | 36.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
64
 
6.1%
61
 
5.8%
38
 
3.6%
33
 
3.1%
31
 
2.9%
29
 
2.8%
29
 
2.8%
29
 
2.8%
29
 
2.8%
27
 
2.6%
Other values (149) 684
64.9%
Common
ValueCountFrequency (%)
289
47.8%
1 42
 
7.0%
2 38
 
6.3%
3 34
 
5.6%
0 32
 
5.3%
- 29
 
4.8%
5 29
 
4.8%
8 28
 
4.6%
4 27
 
4.5%
6 20
 
3.3%
Other values (5) 36
 
6.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1054 | 63.6%
ASCII | 604 | 36.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
289
47.8%
1 42
 
7.0%
2 38
 
6.3%
3 34
 
5.6%
0 32
 
5.3%
- 29
 
4.8%
5 29
 
4.8%
8 28
 
4.6%
4 27
 
4.5%
6 20
 
3.3%
Other values (5) 36
 
6.0%
Hangul
ValueCountFrequency (%)
64
 
6.1%
61
 
5.8%
38
 
3.6%
33
 
3.1%
31
 
2.9%
29
 
2.8%
29
 
2.8%
29
 
2.8%
29
 
2.8%
27
 
2.6%
Other values (149) 684
64.9%

Interactions


Correlations

Variable | 번호 | 면허_신고기관 | 상호 | 성명(대표자) | 주소
번호 | 1.000 | 0.952 | 0.911 | 0.971 | 0.988
면허_신고기관 | 0.952 | 1.000 | 0.994 | 0.995 | 1.000
상호 | 0.911 | 0.994 | 1.000 | 1.000 | 0.995
성명(대표자) | 0.971 | 0.995 | 1.000 | 1.000 | 0.998
주소 | 0.988 | 1.000 | 0.995 | 0.998 | 1.000

Variable | 구분 | 면허_신고기관
구분 | 1.000 | 1.000
면허_신고기관 | 1.000 | 1.000

Variable | 번호 | 면허_신고기관 | 구분
번호 | 1.000 | 0.605 | 1.000
면허_신고기관 | 0.605 | 1.000 | 1.000
구분 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

(index) | 번호 | 면허_신고기관 | 상호 | 구분 | 성명(대표자) | 주소
0 | 1 | 서울특별시 한강사업본부 | 현대요트(주)서울지점 | 갱신 | 이철웅 | 서울특별시 서초구 동작대로350 (반포한강공원)
1 | 2 | 서울특별시 한강사업본부 | ㈜세븐마린레저 | 갱신 | 박명철 | 서울특별시 용산구 이촌동 361
2 | 3 | 서울특별시 한강사업본부 | ㈜리우엠앤씨 | 갱신 | 문덕범 | 서울특별시 마포구 마포나루길 435
3 | 4 | 서울특별시 한강사업본부 | ㈜선스톤쉬핑 | 갱신 | 신용석 | 서울특별시 강남구 압구정로 11길 37-53
4 | 5 | 서울특별시 한강사업본부 | ㈜서울마리나 | 갱신 | 임탁기 | 서울특별시 영등포구 여의도동 81
5 | 6 | 서울특별시 한강사업본부 | 오엔(ON) | 갱신 | 소문섭 | 서울특별시 강남구 압구정동 380-2
6 | 7 | 서울특별시 한강사업본부 | ㈜이크루즈 | 갱신 | 박동진 | 서울특별시 영등포구 여의도동 290
7 | 8 | 서울특별시 한강사업본부 | ㈜서울메리모나크 | 갱신 | 구길용 | 서울특별시 서초구 잠원동 121-9
8 | 9 | 서울특별시 한강사업본부 | ㈜에프앤에이치인베스트먼트 | 갱신 | 이국보 | 서울특별시 서초구 잠원동149-2
9 | 10 | 서울특별시 한강사업본부 | 아리랑하우스 | 갱신 | 홍정희 | 서울특별시 광진구 강변북로96

Last rows

(index) | 번호 | 면허_신고기관 | 상호 | 구분 | 성명(대표자) | 주소
77 | 78 | 경상북도 포항시 | ㈜포항크루즈 | 갱신 | 최만달 | 경상북도 포항시 남구 희망대로 1040 (송도동222)
78 | 79 | 경상북도 안동시 | ㈜글로벌코리아 | 갱신 | 송진호 | 경상북도 안동시 민속촌길 26
79 | 80 | 경상북도 안동시 | ㈜안동수상레져 | 갱신 | 백민규 | 경상북도 안동시 석주로 383 (글로리호)
80 | 81 | 경상북도 안동시 | 안동시청 | 갱신 | 안동시장 | 경상북도 안동시 퇴계로 115
81 | 82 | 경상북도 구미시 | ㈜남숭 | 갱신 | 김경조 | 경상북도 구미시 금오산로 336-44
82 | 83 | 경상북도 청도군 | 수상레저&바이크 | 갱신 | 박정민 | 경상북도 청도군 청도읍 하지길 46-40
83 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
84 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
85 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
86 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

(index) | 번호 | 면허_신고기관 | 상호 | 구분 | 성명(대표자) | 주소 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 4
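
The only duplicated pattern is a row in which every field is missing, occurring 4 times. A minimal sketch of how such rows could be detected and dropped before further analysis is shown below. The rows are a small illustrative subset of the records listed in this report, `None` stands in for a missing value, and both helper functions are hypothetical.

```python
from collections import Counter

def duplicated_groups(rows):
    """Return each row pattern that occurs more than once, with its count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

def drop_empty_rows(rows):
    """Keep only rows that have at least one non-missing field."""
    return [row for row in rows if any(value is not None for value in row)]

# Illustrative subset: two populated records plus four all-missing rows.
rows = [
    (82, "경상북도 구미시", "㈜남숭", "갱신", "김경조", "경상북도 구미시 금오산로 336-44"),
    (83, "경상북도 청도군", "수상레저&바이크", "갱신", "박정민", "경상북도 청도군 청도읍 하지길 46-40"),
    (None,) * 6,
    (None,) * 6,
    (None,) * 6,
    (None,) * 6,
]

groups = duplicated_groups(rows)
cleaned = drop_empty_rows(rows)
print(len(groups), list(groups.values()), len(cleaned))
```

Here one duplicated pattern is found with 4 occurrences, which lines up with the report listing a single duplicate row entry with a count of 4. Whether the empty rows are blank trailing lines in the source file is an assumption that should be checked against the original data.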