Overview

Dataset statistics

Number of variables: 4
Number of observations: 448
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 14.6 KiB
Average record size in memory: 33.3 B

Variable types

Numeric: 1
Categorical: 2
Text: 1

Dataset

Description: IDX, 코드 (code), 자치구 (district), 회사 (company)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15491/S/1/datasetView.do

Alerts

IDX is highly overall correlated with 코드 (High correlation)
코드 is highly overall correlated with IDX and 1 other field (High correlation)
자치구 is highly overall correlated with 코드 (High correlation)

Reproduction

Analysis started: 2024-05-10 22:43:46.511390
Analysis finished: 2024-05-10 22:43:47.757367
Duration: 1.25 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

IDX
Real number (ℝ)

HIGH CORRELATION 

Distinct: 441
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 486.01116
Minimum: 0
Maximum: 1004
Zeros: 1
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 21.35
Q1: 118.75
Median: 576.5
Q3: 710.25
95-th percentile: 821.65
Maximum: 1004
Range: 1004
Interquartile range (IQR): 591.5
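
The range and interquartile range follow directly from the other quantiles listed here. A minimal sketch that recomputes them from the reported values (the variable names are illustrative and not part of the report):

```python
# Quantile values copied from the IDX statistics in this report.
minimum, maximum = 0, 1004
q1, q3 = 118.75, 710.25

# Range is the spread between the extremes; IQR is the spread of the middle 50%.
value_range = maximum - minimum
iqr = q3 - q1

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
```

Both results agree with the Range and Interquartile range (IQR) entries in the report.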

Descriptive statistics

Standard deviation: 282.60121
Coefficient of variation (CV): 0.58147061
Kurtosis: -1.1417554
Mean: 486.01116
Median Absolute Deviation (MAD): 150
Skewness: -0.57366509
Sum: 217733
Variance: 79863.443
Monotonicity: Decreasing
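
Several of these statistics are derived from one another: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A small consistency check, assuming only the rounded figures printed in this report (so tiny rounding differences are expected):

```python
from math import isclose

# Figures copied from the report (rounded as printed there).
n_observations = 448
total = 217733
std_dev = 282.60121

mean = total / n_observations   # expected to be close to 486.01116
variance = std_dev ** 2         # expected to be close to 79863.443
cv = std_dev / mean             # expected to be close to 0.58147061

print(f"mean={mean:.5f} variance={variance:.3f} cv={cv:.8f}")
print(isclose(mean, 486.01116, abs_tol=1e-5))
```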
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
826                 3      0.7%
827                 3      0.7%
828                 2      0.4%
1                   2      0.4%
825                 2      0.4%
1004                1      0.2%
469                 1      0.2%
468                 1      0.2%
467                 1      0.2%
466                 1      0.2%
Other values (431)  431    96.2%

Value  Count  Frequency (%)
0      1      0.2%
1      2      0.4%
2      1      0.2%
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
7      1      0.2%
8      1      0.2%
9      1      0.2%

Value  Count  Frequency (%)
1004   1      0.2%
1003   1      0.2%
1002   1      0.2%
1001   1      0.2%
838    1      0.2%
837    1      0.2%
836    1      0.2%
835    1      0.2%
834    1      0.2%
830    1      0.2%

코드
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 KiB

t1: 258
b2: 123
b1: 65
t2: 1
CODE: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0044643
Min length: 2

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: b1
2nd row: b1
3rd row: b1
4th row: b1
5th row: t1

Common Values

Value  Count  Frequency (%)
t1     258    57.6%
b2     123    27.5%
b1     65     14.5%
t2     1      0.2%
CODE   1      0.2%
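
Each Frequency (%) value is the count divided by the 448 observations, rounded to one decimal place. A minimal sketch that rebuilds the percentages from the counts listed for 코드 (the dictionary and variable names are illustrative):

```python
# Counts copied from the Common Values table for 코드.
counts = {"t1": 258, "b2": 123, "b1": 65, "t2": 1, "CODE": 1}

total = sum(counts.values())  # should equal the number of observations

# Percentage of observations per category, rounded as in the report.
frequency = {value: round(100 * count / total, 1) for value, count in counts.items()}

for value, pct in frequency.items():
    print(f"{value}: {pct}%")
```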

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
t1     258    57.6%
b2     123    27.5%
b1     65     14.5%
t2     1      0.2%
code   1      0.2%

자치구
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 KiB

강서구: 48
도봉구: 41
강동구: 25
노원구: 24
은평구: 24
Other values (21): 286

Length

Max length: 6
Median length: 3
Mean length: 3.1227679
Min length: 3

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: 동대문구
2nd row: 은평구
3rd row: 강서구
4th row: 동작구
5th row: 구로구

Common Values

Value              Count  Frequency (%)
강서구             48     10.7%
도봉구             41     9.2%
강동구             25     5.6%
노원구             24     5.4%
은평구             24     5.4%
중랑구             23     5.1%
서초구             22     4.9%
금천구             19     4.2%
송파구             19     4.2%
양천구             19     4.2%
Other values (16)  184    41.1%

Length

Histogram of lengths of the category

회사
Text

Distinct: 441
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.6 KiB

Length

Max length: 8
Median length: 4
Mean length: 4.1741071
Min length: 2

Characters and Unicode

Total characters: 1870
Distinct characters: 208
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
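
The category names used in this report correspond to Unicode general categories: Other Letter is `Lo`, Uppercase Letter is `Lu`, Lowercase Letter is `Ll`, Decimal Number is `Nd`, Open Punctuation is `Ps` and Close Punctuation is `Pe`. A minimal sketch of counting categories with Python's standard `unicodedata` module, using the five sample values of 회사 from this report (the helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its figures):

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in the given strings."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# The five sample rows of the 회사 column shown in this report.
sample = ["보광운수", "신수교통", "양천운수", "서울매일버스", "우종기업"]
counts = category_counts(sample)

# Hangul syllables fall under "Lo" (Other Letter).
print(counts)
```

Python's standard library does not expose the Unicode script property, so the script and block breakdowns would need a different source of character data.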

Unique

Unique: 434
Unique (%): 96.9%
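
Distinct and Unique measure different things: Distinct counts the different values present, while Unique counts the values that occur exactly once. For 회사 the two figures are consistent with the frequency table, which lists seven names with a count of 2: 434 + 7 = 441 distinct values, and 434 + 2 × 7 = 448 observations. A minimal sketch of the distinction on hypothetical toy data (the helper `distinct_and_unique` is illustrative only):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column: "a" appears twice, "b" and "c" once each.
toy = ["a", "a", "b", "c"]
print(distinct_and_unique(toy))
```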

Sample

1st row: 보광운수
2nd row: 신수교통
3rd row: 양천운수
4th row: 서울매일버스
5th row: 우종기업

Value               Count  Frequency (%)
시온교통            2      0.4%
대진여객            2      0.4%
범일운수            2      0.4%
선진운수            2      0.4%
청록운수            2      0.4%
대영마을버스        2      0.4%
대종상운            2      0.4%
한성운수            1      0.2%
메트로버스          1      0.2%
진아교통            1      0.2%
Other values (431)  431    96.2%

Most occurring characters

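Value               Count  Frequency (%)
[not preserved]     222    11.9%
[not preserved]     181    9.7%
[not preserved]     108    5.8%
[not preserved]     95     5.1%
[not preserved]     56     3.0%
[not preserved]     53     2.8%
[not preserved]     52     2.8%
[not preserved]     43     2.3%
[not preserved]     40     2.1%
[not preserved]     38     2.0%
Other values (198)  982    52.5%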

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1854   99.1%
Uppercase Letter   9      0.5%
Lowercase Letter   3      0.2%
Decimal Number     2      0.1%
Close Punctuation  1      0.1%
Open Punctuation   1      0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
[not preserved]     222    12.0%
[not preserved]     181    9.8%
[not preserved]     108    5.8%
[not preserved]     95     5.1%
[not preserved]     56     3.0%
[not preserved]     53     2.9%
[not preserved]     52     2.8%
[not preserved]     43     2.3%
[not preserved]     40     2.2%
[not preserved]     38     2.0%
Other values (184)  966    52.1%

Uppercase Letter

Value  Count  Frequency (%)
O      2      22.2%
N      1      11.1%
P      1      11.1%
M      1      11.1%
C      1      11.1%
A      1      11.1%
K      1      11.1%
Y      1      11.1%

Lowercase Letter

Value  Count  Frequency (%)
t      1      33.3%
r      1      33.3%
b      1      33.3%

Decimal Number

Value  Count  Frequency (%)
3      2      100.0%

Close Punctuation

Value  Count  Frequency (%)
)      1      100.0%

Open Punctuation

Value  Count  Frequency (%)
(      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1854   99.1%
Latin   12     0.6%
Common  4      0.2%

Most frequent character per script

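Hangul

Value               Count  Frequency (%)
[not preserved]     222    12.0%
[not preserved]     181    9.8%
[not preserved]     108    5.8%
[not preserved]     95     5.1%
[not preserved]     56     3.0%
[not preserved]     53     2.9%
[not preserved]     52     2.8%
[not preserved]     43     2.3%
[not preserved]     40     2.2%
[not preserved]     38     2.0%
Other values (184)  966    52.1%

Latin

Value  Count  Frequency (%)
O      2      16.7%
N      1      8.3%
P      1      8.3%
M      1      8.3%
C      1      8.3%
A      1      8.3%
K      1      8.3%
t      1      8.3%
r      1      8.3%
b      1      8.3%

Common

Value  Count  Frequency (%)
3      2      50.0%
)      1      25.0%
(      1      25.0%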

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1854   99.1%
ASCII   16     0.9%

Most frequent character per block

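Hangul

Value               Count  Frequency (%)
[not preserved]     222    12.0%
[not preserved]     181    9.8%
[not preserved]     108    5.8%
[not preserved]     95     5.1%
[not preserved]     56     3.0%
[not preserved]     53     2.9%
[not preserved]     52     2.8%
[not preserved]     43     2.3%
[not preserved]     40     2.2%
[not preserved]     38     2.0%
Other values (184)  966    52.1%

ASCII

Value             Count  Frequency (%)
O                 2      12.5%
3                 2      12.5%
N                 1      6.2%
P                 1      6.2%
M                 1      6.2%
C                 1      6.2%
A                 1      6.2%
)                 1      6.2%
(                 1      6.2%
K                 1      6.2%
Other values (4)  4      25.0%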

Interactions


Correlations

        IDX    코드   자치구
IDX     1.000  0.778  0.768
코드    0.778  1.000  0.841
자치구  0.768  0.841  1.000

        자치구  코드
자치구  1.000   0.582
코드    0.582   1.000

        IDX    코드   자치구
IDX     1.000  0.625  0.428
코드    0.625  1.000  0.582
자치구  0.428  0.582  1.000
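
The report does not say which measure each matrix uses. For two categorical columns such as 코드 and 자치구, one common association measure is Cramér's V, derived from the chi-squared statistic of the contingency table. The sketch below is the uncorrected textbook form on hypothetical toy data; it is an assumption that a measure of this kind underlies any of the matrices, and it is not claimed to reproduce the 0.582 figure.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of categories."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


# Hypothetical toy data: the first pair is perfectly associated, the second independent.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```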

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     IDX   코드  자치구        회사
0    1004  b1    동대문구      보광운수
1    1003  b1    은평구        신수교통
2    1002  b1    강서구        양천운수
3    1001  b1    동작구        서울매일버스
4    838   t1    구로구        우종기업
5    837   t1    강서구        소망기업
6    836   t1    강동구        천마교통
7    835   t1    노원구        복흥기업
8    834   t1    동대문구      대덕운수
9    830   t1    금천구        강북운수

     IDX   코드  자치구        회사
438  8     b2    강북구        화계운수
439  7     b2    강북구        인수운수
440  6     b2    강북구        수유운수
441  5     b2    강동구        신명운수
442  4     b2    강동구        강동교통
443  3     b2    강남구        포이운수
444  2     b2    강남구        일원교통
445  1     t1    택시운송조합  택시운송조합
446  1     b2    강남구        개포운수
447  0     CODE  GIYUG         COMPANY
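
The last row of the sample (IDX 0, 코드 = CODE, 자치구 = GIYUG, 회사 = COMPANY) looks more like a header or placeholder row than a company record, and it accounts for one of the rare 코드 values. A minimal sketch of filtering such rows before profiling, assuming the rows are available as dictionaries and that `CODE` is the only placeholder marker (both are assumptions, and `drop_placeholder_rows` is a hypothetical helper):

```python
# Two rows copied from the sample above, represented as dictionaries.
rows = [
    {"IDX": 1, "코드": "b2", "자치구": "강남구", "회사": "개포운수"},
    {"IDX": 0, "코드": "CODE", "자치구": "GIYUG", "회사": "COMPANY"},
]

# Assumption: rows whose 코드 equals one of these markers are not real records.
PLACEHOLDER_CODES = {"CODE"}


def drop_placeholder_rows(rows, placeholder_codes=PLACEHOLDER_CODES):
    """Return only the rows whose 코드 value is not a placeholder marker."""
    return [row for row in rows if row["코드"] not in placeholder_codes]


cleaned = drop_placeholder_rows(rows)
print(cleaned)
```

Whether such rows should be dropped depends on how the source dataset defines them, so the marker set is left as a parameter.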