Overview

Dataset statistics

Number of variables: 6
Number of observations: 67
Missing cells: 8
Missing cells (%): 2.0%
Duplicate rows: 1
Duplicate rows (%): 1.5%
Total size in memory: 3.4 KiB
Average record size in memory: 52.0 B
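The percentages above follow from the raw counts. The sketch below recomputes them; the per-column missing counts are taken from the variable sections of this report, and the memory figure uses the rounded 3.4 KiB value, so the last result is only approximate.

```python
# Recompute the overview percentages from the counts given in this report.
n_variables = 6
n_observations = 67

# Missing values per column as listed in the variable sections.
# The two categorical columns report 0 because "<NA>" is counted as a category there.
missing_per_column = {
    "번호": 2,
    "사업장명": 2,
    "소재지": 2,
    "영업상태명": 0,
    "설립일": 2,
    "업체유형명": 0,
}

total_cells = n_variables * n_observations            # 6 * 67 = 402
missing_cells = sum(missing_per_column.values())      # 8
missing_pct = 100 * missing_cells / total_cells       # about 2.0
duplicate_pct = 100 * 1 / n_observations              # about 1.5

# 3.4 KiB is itself rounded, so this only approximates the reported 52.0 B.
avg_record_bytes = 3.4 * 1024 / n_observations

print(round(missing_pct, 1), round(duplicate_pct, 1), round(avg_record_bytes, 1))
```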

Variable types

Numeric: 2
Text: 2
Categorical: 2

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15120/S/1/datasetView.do

Alerts

Dataset has 1 (1.5%) duplicate row [Duplicates]
영업상태명 is highly overall correlated with 번호 and 2 other fields [High correlation]
업체유형명 is highly overall correlated with 번호 and 2 other fields [High correlation]
번호 is highly overall correlated with 영업상태명 and 1 other field [High correlation]
설립일 is highly overall correlated with 영업상태명 and 1 other field [High correlation]
영업상태명 is highly imbalanced (80.6%) [Imbalance]
업체유형명 is highly imbalanced (80.6%) [Imbalance]
번호 has 2 (3.0%) missing values [Missing]
사업장명 has 2 (3.0%) missing values [Missing]
소재지 has 2 (3.0%) missing values [Missing]
설립일 has 2 (3.0%) missing values [Missing]
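The 80.6% imbalance figure is consistent with an entropy-based score: one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy for that number of classes. The sketch below is an assumption about how the score is derived, not a copy of the profiler's code; `imbalance_score` is a hypothetical helper name. With the counts 65 and 2 from the categorical columns it reproduces the reported value.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class counts.

    0.0 means perfectly balanced classes, 1.0 means a single class holds everything.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# 영업상태명 and 업체유형명 each have one value with 65 rows and "<NA>" with 2 rows.
score = imbalance_score([65, 2])
print(f"{score:.1%}")
```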

Reproduction

Analysis started: 2023-12-11 03:47:32.991308
Analysis finished: 2023-12-11 03:47:34.004701
Duration: 1.01 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
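A report of this kind can be regenerated along the following lines. This is a minimal sketch: `build_report`, `csv_path`, and the output file name are hypothetical, and the original run's configuration (the config.json above) is not reproduced here, so the defaults may differ from those used for this report.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported inside the function so that defining it does not require
    # pandas or ydata-profiling to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # csv_path is assumed to be a local export of the dataset.
    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Profiling report")
    profile.to_file(output_path)
    return output_path
```

Calling `build_report("bus_companies.csv")` (a hypothetical file name) would write `report.html` next to the script.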

Variables

번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 65
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33
Minimum: 1
Maximum: 65
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 735.0 B

Quantile statistics

Minimum: 1
5-th percentile: 4.2
Q1: 17
Median: 33
Q3: 49
95-th percentile: 61.8
Maximum: 65
Range: 64
Interquartile range (IQR): 32
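The quantiles above match linear interpolation between order statistics, assuming the 65 non-missing values of 번호 are the integers 1 to 65 (as the minimum, maximum, distinct count, and strict monotonicity suggest). The helper name `percentile` is hypothetical; this is a sketch of the interpolation rule, not the profiler's own code.

```python
def percentile(sorted_values, q):
    """Linear-interpolation quantile for 0 <= q <= 1 on an ascending list."""
    pos = (len(sorted_values) - 1) * q
    lower = int(pos)
    frac = pos - lower
    if lower + 1 < len(sorted_values):
        step = sorted_values[lower + 1] - sorted_values[lower]
        return sorted_values[lower] + frac * step
    return sorted_values[lower]


values = list(range(1, 66))  # assumed contents of 번호 without the 2 missing rows

p05 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
median = percentile(values, 0.50)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)
iqr = q3 - q1
value_range = values[-1] - values[0]
```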

Descriptive statistics

Standard deviation: 18.90767
Coefficient of variation (CV): 0.57295971
Kurtosis: -1.2
Mean: 33
Median Absolute Deviation (MAD): 16
Skewness: 0
Sum: 2145
Variance: 357.5
Monotonicity: Strictly increasing
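Most of these figures can be checked with the standard library, again assuming the non-missing values are the integers 1 to 65. The variance shown is the sample variance (divisor n - 1), which is what gives 357.5; the coefficient of variation is the standard deviation divided by the mean.

```python
import statistics

values = list(range(1, 66))  # assumed contents of 번호 without the 2 missing rows

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, divisor n - 1
std = statistics.stdev(values)
cv = std / mean                          # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)
```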
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
50 | 1 | 1.5%
36 | 1 | 1.5%
37 | 1 | 1.5%
38 | 1 | 1.5%
39 | 1 | 1.5%
40 | 1 | 1.5%
41 | 1 | 1.5%
42 | 1 | 1.5%
43 | 1 | 1.5%
44 | 1 | 1.5%
Other values (55) | 55 | 82.1%
(Missing) | 2 | 3.0%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 1.5%
2 | 1 | 1.5%
3 | 1 | 1.5%
4 | 1 | 1.5%
5 | 1 | 1.5%
6 | 1 | 1.5%
7 | 1 | 1.5%
8 | 1 | 1.5%
9 | 1 | 1.5%
10 | 1 | 1.5%

Maximum 10 values
Value | Count | Frequency (%)
65 | 1 | 1.5%
64 | 1 | 1.5%
63 | 1 | 1.5%
62 | 1 | 1.5%
61 | 1 | 1.5%
60 | 1 | 1.5%
59 | 1 | 1.5%
58 | 1 | 1.5%
57 | 1 | 1.5%
56 | 1 | 1.5%

사업장명
Text

MISSING 

Distinct: 65
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.0%
Memory size: 668.0 B

Length

Max length: 8
Median length: 4
Mean length: 4.1692308
Min length: 3

Characters and Unicode

Total characters: 271
Distinct characters: 76
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
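As a rough illustration of how such character properties can be read, Python's `unicodedata.category` returns the two-letter general category of a code point: `Lo` corresponds to "Other Letter" (which includes Hangul syllables), `Ll` to "Lowercase Letter", and `Sm` to "Math Symbol". The sketch below counts categories for three names taken from the sample rows of this report; `category_counts` is a hypothetical helper, not part of the profiler.

```python
import unicodedata
from collections import Counter


def category_counts(texts):
    """Count Unicode general categories over all characters in the given strings."""
    return Counter(unicodedata.category(ch) for text in texts for ch in text)


# Three company names that appear in this report's sample rows.
names = ["경성여객", "공항버스", "한국brt자동차"]
counts = category_counts(names)

# Hangul syllables fall under "Lo"; the Latin letters b, r, t fall under "Ll".
print(dict(counts))

# U+223C (tilde operator) is in category "Sm" (Math Symbol).
tilde_category = unicodedata.category("\u223c")
```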

Unique

Unique: 65
Unique (%): 100.0%

Sample

1st row: 경성여객
2nd row: 공항버스
3rd row: 관악교통
4th row: 군포교통
5th row: 김포교통
Value | Count | Frequency (%)
동아운수 | 1 | 1.5%
선진운수 | 1 | 1.5%
성원여객 | 1 | 1.5%
세풍운수 | 1 | 1.5%
송파상운 | 1 | 1.5%
신길교통 | 1 | 1.5%
신수교통 | 1 | 1.5%
신인운수 | 1 | 1.5%
신촌교통 | 1 | 1.5%
신흥운수 | 1 | 1.5%
Other values (55) | 55 | 84.6%

Most occurring characters

ValueCountFrequency (%)
26
 
9.6%
25
 
9.2%
19
 
7.0%
19
 
7.0%
10
 
3.7%
10
 
3.7%
10
 
3.7%
8
 
3.0%
7
 
2.6%
7
 
2.6%
Other values (66) 130
48.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 268 | 98.9%
Lowercase Letter | 3 | 1.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
26
 
9.7%
25
 
9.3%
19
 
7.1%
19
 
7.1%
10
 
3.7%
10
 
3.7%
10
 
3.7%
8
 
3.0%
7
 
2.6%
7
 
2.6%
Other values (63) 127
47.4%
Lowercase Letter
ValueCountFrequency (%)
b 1
33.3%
r 1
33.3%
t 1
33.3%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 268 | 98.9%
Latin | 3 | 1.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
26
 
9.7%
25
 
9.3%
19
 
7.1%
19
 
7.1%
10
 
3.7%
10
 
3.7%
10
 
3.7%
8
 
3.0%
7
 
2.6%
7
 
2.6%
Other values (63) 127
47.4%
Latin
ValueCountFrequency (%)
b 1
33.3%
r 1
33.3%
t 1
33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 268 | 98.9%
ASCII | 3 | 1.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
26
 
9.7%
25
 
9.3%
19
 
7.1%
19
 
7.1%
10
 
3.7%
10
 
3.7%
10
 
3.7%
8
 
3.0%
7
 
2.6%
7
 
2.6%
Other values (63) 127
47.4%
ASCII
ValueCountFrequency (%)
b 1
33.3%
r 1
33.3%
t 1
33.3%

소재지
Text

MISSING 

Distinct: 64
Distinct (%): 98.5%
Missing: 2
Missing (%): 3.0%
Memory size: 668.0 B

Length

Max length: 42
Median length: 30
Mean length: 22.353846
Min length: 14

Characters and Unicode

Total characters: 1453
Distinct characters: 129
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 63
Unique (%): 96.9%

Sample

1st row: 서울특별시 중랑구 용마산로 380
2nd row: 서울특별시강서구개화동로8길 17 강서공영차고지 3층
3rd row: 서울특별시 양천구 신정동 1312번지
4th row: 서울특별시 구로구 구로동 145-17 3층
5th row: 서울특별시 강서구 방화3동 820번지
Value | Count | Frequency (%)
서울특별시 | 35 | 11.9%
서울 | 18 | 6.1%
송파구 | 7 | 2.4%
서울시 | 7 | 2.4%
은평구 | 6 | 2.0%
강북구 | 5 | 1.7%
양천구 | 5 | 1.7%
성북구 | 5 | 1.7%
노원구 | 4 | 1.4%
수색동 | 3 | 1.0%
Other values (156) | 199 | 67.7%

Most occurring characters

ValueCountFrequency (%)
233
 
16.0%
76
 
5.2%
72
 
5.0%
1 65
 
4.5%
64
 
4.4%
63
 
4.3%
2 50
 
3.4%
47
 
3.2%
37
 
2.5%
37
 
2.5%
Other values (119) 709
48.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 839 | 57.7%
Decimal Number | 303 | 20.9%
Space Separator | 233 | 16.0%
Dash Punctuation | 32 | 2.2%
Open Punctuation | 21 | 1.4%
Close Punctuation | 21 | 1.4%
Math Symbol | 3 | 0.2%
Other Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
76
 
9.1%
72
 
8.6%
64
 
7.6%
63
 
7.5%
47
 
5.6%
37
 
4.4%
37
 
4.4%
36
 
4.3%
15
 
1.8%
14
 
1.7%
Other values (103) 378
45.1%
Decimal Number
ValueCountFrequency (%)
1 65
21.5%
2 50
16.5%
3 29
9.6%
4 28
9.2%
0 26
 
8.6%
8 24
 
7.9%
7 24
 
7.9%
5 22
 
7.3%
6 18
 
5.9%
9 17
 
5.6%
Space Separator
ValueCountFrequency (%)
233
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 32
100.0%
Open Punctuation
ValueCountFrequency (%)
( 21
100.0%
Close Punctuation
ValueCountFrequency (%)
) 21
100.0%
Math Symbol
ValueCountFrequency (%)
3
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 839 | 57.7%
Common | 614 | 42.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
76
 
9.1%
72
 
8.6%
64
 
7.6%
63
 
7.5%
47
 
5.6%
37
 
4.4%
37
 
4.4%
36
 
4.3%
15
 
1.8%
14
 
1.7%
Other values (103) 378
45.1%
Common
ValueCountFrequency (%)
233
37.9%
1 65
 
10.6%
2 50
 
8.1%
- 32
 
5.2%
3 29
 
4.7%
4 28
 
4.6%
0 26
 
4.2%
8 24
 
3.9%
7 24
 
3.9%
5 22
 
3.6%
Other values (6) 81
 
13.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 839 | 57.7%
ASCII | 611 | 42.1%
Math Operators | 3 | 0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
233
38.1%
1 65
 
10.6%
2 50
 
8.2%
- 32
 
5.2%
3 29
 
4.7%
4 28
 
4.6%
0 26
 
4.3%
8 24
 
3.9%
7 24
 
3.9%
5 22
 
3.6%
Other values (5) 78
 
12.8%
Hangul
ValueCountFrequency (%)
76
 
9.1%
72
 
8.6%
64
 
7.6%
63
 
7.5%
47
 
5.6%
37
 
4.4%
37
 
4.4%
36
 
4.3%
15
 
1.8%
14
 
1.7%
Other values (103) 378
45.1%
Math Operators
ValueCountFrequency (%)
3
100.0%

영업상태명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.0298507
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 운영중
2nd row: 운영중
3rd row: 운영중
4th row: 운영중
5th row: 운영중

Common Values

Value | Count | Frequency (%)
운영중 | 65 | 97.0%
<NA> | 2 | 3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
운영중 | 65 | 97.0%
na | 2 | 3.0%

설립일
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 61
Distinct (%): 93.8%
Missing: 2
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19764519
Minimum: 19491223
Maximum: 20211209
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 735.0 B

Quantile statistics

Minimum: 19491223
5-th percentile: 19611210
Q1: 19660529
Median: 19701101
Q3: 19741008
95-th percentile: 20183086
Maximum: 20211209
Range: 719986
Interquartile range (IQR): 80479

Descriptive statistics

Standard deviation: 174004.12
Coefficient of variation (CV): 0.008803863
Kurtosis: 0.92938221
Mean: 19764519
Median Absolute Deviation (MAD): 40572
Skewness: 1.3890914
Sum: 1.2846937 × 10^9
Variance: 3.0277433 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
19701001 | 2 | 3.0%
20201001 | 2 | 3.0%
20040701 | 2 | 3.0%
19740201 | 2 | 3.0%
19701223 | 1 | 1.5%
19940517 | 1 | 1.5%
19700914 | 1 | 1.5%
20211209 | 1 | 1.5%
19660523 | 1 | 1.5%
19610210 | 1 | 1.5%
Other values (51) | 51 | 76.1%
(Missing) | 2 | 3.0%

Minimum 10 values
Value | Count | Frequency (%)
19491223 | 1 | 1.5%
19540820 | 1 | 1.5%
19610210 | 1 | 1.5%
19611206 | 1 | 1.5%
19611226 | 1 | 1.5%
19620123 | 1 | 1.5%
19620201 | 1 | 1.5%
19620219 | 1 | 1.5%
19620221 | 1 | 1.5%
19641202 | 1 | 1.5%

Maximum 10 values
Value | Count | Frequency (%)
20211209 | 1 | 1.5%
20201001 | 2 | 3.0%
20191202 | 1 | 1.5%
20150623 | 1 | 1.5%
20040701 | 2 | 3.0%
20040623 | 1 | 1.5%
20040618 | 1 | 1.5%
20040614 | 1 | 1.5%
20040609 | 1 | 1.5%
19940517 | 1 | 1.5%
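The values of 설립일 look like dates encoded as YYYYMMDD integers, which is why the profiler treated the column as a real number. Under that assumption, statistics such as the mean (19764519) or the 95-th percentile (20183086) are not valid dates, and the numeric range is not a time span. The sketch below parses the encoding with the standard library; `parse_yyyymmdd` and `is_valid_yyyymmdd` are hypothetical helper names.

```python
from datetime import date, datetime


def parse_yyyymmdd(value):
    """Convert an integer such as 19701001 to a date; raises ValueError if invalid."""
    return datetime.strptime(str(int(value)), "%Y%m%d").date()


def is_valid_yyyymmdd(value):
    """Return True if the value decodes to a real calendar date."""
    try:
        parse_yyyymmdd(value)
        return True
    except ValueError:
        return False


earliest = parse_yyyymmdd(19491223)   # reported minimum
latest = parse_yyyymmdd(20211209)     # reported maximum
span_days = (latest - earliest).days  # a meaningful span, unlike the numeric range

# Interpolated statistics from the report do not decode to dates.
print(is_valid_yyyymmdd(20183086), is_valid_yyyymmdd(19764519))
```

Converting the column to a date type before profiling would let the report describe it as a date rather than a number.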

업체유형명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 668.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 시내버스
2nd row: 시내버스
3rd row: 시내버스
4th row: 시내버스
5th row: 시내버스

Common Values

Value | Count | Frequency (%)
시내버스 | 65 | 97.0%
<NA> | 2 | 3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
시내버스 | 65 | 97.0%
na | 2 | 3.0%

Interactions

[Four interaction plots; images not preserved in this text version]

Correlations

 | 번호 | 사업장명 | 소재지 | 설립일
번호 | 1.000 | 1.000 | 1.000 | 0.231
사업장명 | 1.000 | 1.000 | 1.000 | 1.000
소재지 | 1.000 | 1.000 | 1.000 | 0.000
설립일 | 0.231 | 1.000 | 0.000 | 1.000

 | 영업상태명 | 업체유형명
영업상태명 | 1.000 | 1.000
업체유형명 | 1.000 | 1.000

 | 번호 | 설립일 | 영업상태명 | 업체유형명
번호 | 1.000 | -0.022 | 1.000 | 1.000
설립일 | -0.022 | 1.000 | 1.000 | 1.000
영업상태명 | 1.000 | 1.000 | 1.000 | 1.000
업체유형명 | 1.000 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
 | 번호 | 사업장명 | 소재지 | 영업상태명 | 설립일 | 업체유형명
0 | 1 | 경성여객 | 서울특별시 중랑구 용마산로 380 | 운영중 | 19701001 | 시내버스
1 | 2 | 공항버스 | 서울특별시강서구개화동로8길 17 강서공영차고지 3층 | 운영중 | 19620123 | 시내버스
2 | 3 | 관악교통 | 서울특별시 양천구 신정동 1312번지 | 운영중 | 19700615 | 시내버스
3 | 4 | 군포교통 | 서울특별시 구로구 구로동 145-17 3층 | 운영중 | 19680520 | 시내버스
4 | 5 | 김포교통 | 서울특별시 강서구 방화3동 820번지 | 운영중 | 19701223 | 시내버스
5 | 6 | 남성버스 | 서울특별시 송파구 헌릉로 870 2층 남성교통(주)(성북구 보국문로 188) | 운영중 | 20201001 | 시내버스
6 | 7 | 다모아자동차 | 서울특별시 마포구 상암동 1667 다모아자동차(주) | 운영중 | 20040614 | 시내버스
7 | 8 | 대성운수 | 서울시 송파구 헌릉로 869 (장지동) | 운영중 | 19660707 | 시내버스
8 | 9 | 대원교통 | 서울광진구자양동 769-7 케이앤에스빌딩 | 운영중 | 19740201 | 시내버스
9 | 10 | 대원여객 | 서울시 성동구 왕십리로 125 | 운영중 | 19720125 | 시내버스

Last rows
 | 번호 | 사업장명 | 소재지 | 영업상태명 | 설립일 | 업체유형명
57 | 58 | 태진운수 | 서울특별시 성동구 성수2가동 649-1 | 운영중 | 19710924 | 시내버스
58 | 59 | 한국brt자동차 | 서울특별시 송파구 헌릉로 870(장지동) | 운영중 | 20040609 | 시내버스
59 | 60 | 한남여객 | 서울특별시 관악구 대학동 194∼241 대학동 241-42 | 운영중 | 19620219 | 시내버스
60 | 61 | 한서교통 | 서울특별시 송파구 장지동 579번지 | 운영중 | 19701001 | 시내버스
61 | 62 | 한성여객 | 서울특별시 노원구 상계동 110-8 | 운영중 | 19620201 | 시내버스
62 | 63 | 한성운수 | 서울 강북구 번2동 375 | 운영중 | 19680727 | 시내버스
63 | 64 | 현대교통 | 서울특별시 서대문구 모래내로 289 | 운영중 | 19651126 | 시내버스
64 | 65 | 흥안운수 | 서울특별시 노원구 상계동 110-8 | 운영중 | 19880101 | 시내버스
65 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
66 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

 | 번호 | 사업장명 | 소재지 | 영업상태명 | 설립일 | 업체유형명 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2
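The single duplicate comes from the two trailing rows in which every field is missing: two identical rows are reported as one group occurring twice, which gives one redundant row (1.5% of 67). The sketch below reproduces that counting with the standard library on a reduced set of rows, using `None` as a stand-in for `<NA>`; it illustrates the counting only and is not the profiler's implementation.

```python
from collections import Counter

EMPTY_ROW = (None,) * 6  # stand-in for a row whose six fields are all <NA>

# A few rows from the sample above, plus the two fully missing rows.
rows = [
    (1, "경성여객", "서울특별시 중랑구 용마산로 380", "운영중", 19701001, "시내버스"),
    (62, "한성여객", "서울특별시 노원구 상계동 110-8", "운영중", 19620201, "시내버스"),
    (65, "흥안운수", "서울특별시 노원구 상계동 110-8", "운영중", 19880101, "시내버스"),
    EMPTY_ROW,
    EMPTY_ROW,
]

occurrences = Counter(rows)

# Groups that occur more than once, with their occurrence count ("# duplicates").
duplicate_groups = {row: n for row, n in occurrences.items() if n > 1}

# Rows that would be dropped by de-duplication (all but the first of each group).
n_duplicate_rows = sum(n - 1 for n in duplicate_groups.values())
```

Dropping rows that are entirely empty before profiling would remove both the duplicate alert and the 2 missing values reported for each column.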