Overview

Dataset statistics

Number of variables: 5
Number of observations: 157
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.6 KiB
Average record size in memory: 42.8 B
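The sketch below shows how missing cells and duplicate rows are counted, in plain Python with no third-party packages. As stand-in data it uses the first three rows from the sample at the end of this report, not the full file. Rows 1 and 2 share the same 상호, 주소 and 우편번호 but differ in 연번, so they are not duplicate rows. Once 연번 is dropped, they become one duplicate. The helper names are hypothetical. The duplicate rule (a row counts if it repeats an earlier row in every column) mirrors pandas `DataFrame.duplicated()` with its default `keep="first"`.

```python
# Stand-in data: first three sample rows (연번, 법정동, 상호, 주소, 우편번호).
rows = [
    (1, "가양동", "에덴고시원", "서울특별시 강서구 양천로 443-36", 7527),
    (2, "가양동", "에덴고시원", "서울특별시 강서구 양천로 443-36", 7527),
    (3, "가양동", "주영고시텔", "서울특별시 강서구 양천로 461", 7526),
]

def missing_cells(rows):
    """Count cells that are None or empty strings (hypothetical helper)."""
    return sum(1 for row in rows for cell in row if cell is None or cell == "")

def duplicate_rows(rows):
    """Count rows that repeat an earlier row in every column."""
    seen = set()
    dups = 0
    for row in rows:
        if row in seen:
            dups += 1
        else:
            seen.add(row)
    return dups

def duplicates_ignoring(rows, drop_index):
    """Count duplicate rows after dropping one column, e.g. the serial number."""
    return duplicate_rows([row[:drop_index] + row[drop_index + 1:] for row in rows])

print(missing_cells(rows), duplicate_rows(rows), duplicates_ignoring(rows, 0))
# prints: 0 0 1
```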

Variable types

Numeric: 2
Categorical: 1
Text: 2

Dataset

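Description: Status of gosiwon (small single-room lodging houses) in Gangseo-gu, Seoul (서울특별시 강서구 고시원 현황). Fields:
- 연번: serial number used to count the gosiwon
- 법정동: legal dong (neighborhood) in which the gosiwon is located
- 상호: name of the gosiwon
- 주소: location (address) of the gosiwon
- 우편번호: postal code of each gosiwon
Author: Gangseo-gu, Seoul (서울특별시 강서구)
URL: https://www.data.go.kr/data/15077684/fileData.do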

Alerts

연번 is highly overall correlated with 우편번호 and 1 other field (High correlation)
우편번호 is highly overall correlated with 연번 and 1 other field (High correlation)
법정동 is highly overall correlated with 연번 and 1 other field (High correlation)
연번 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 09:45:56.927087
Analysis finished: 2023-12-12 09:45:57.707494
Duration: 0.78 seconds
Software version: ydata-profiling v4.5.1

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 157
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79
Minimum: 1
Maximum: 157
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 8.8
Q1: 40
Median: 79
Q3: 118
95th percentile: 149.2
Maximum: 157
Range: 156
Interquartile range (IQR): 78

Descriptive statistics

Standard deviation: 45.466105
Coefficient of variation (CV): 0.57552031
Kurtosis: -1.2
Mean: 79
Median Absolute Deviation (MAD): 39
Skewness: 0
Sum: 12403
Variance: 2067.1667
Monotonicity: Strictly increasing
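The sketch below uses Python's standard `statistics` module to reproduce the 연번 figures above from the series 1 to 157. It rests on two assumptions, both labeled in the code. First, quantiles use linear interpolation between order statistics (the `inclusive` method, which matches NumPy's default percentile rule). Second, kurtosis is taken to be the bias-corrected excess kurtosis that pandas `Series.kurt()` computes. For any run of consecutive integers that estimator equals exactly -1.2, which would explain the round value in the report.

```python
import statistics

# 연번 is the serial number 1..157 (strictly increasing, all distinct).
x = list(range(1, 158))

mean = statistics.fmean(x)
sd = statistics.stdev(x)          # sample standard deviation (n - 1 denominator)
var = statistics.variance(x)      # sample variance
cv = sd / mean                    # coefficient of variation

# Assumption: linear interpolation between order statistics; the "inclusive"
# method gives the same result as NumPy's default percentile rule.
q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
pct = statistics.quantiles(x, n=20, method="inclusive")  # 5%, 10%, ..., 95%
p5, p95 = pct[0], pct[-1]

# Median absolute deviation: median of |x - median|.
mad = statistics.median(abs(v - median) for v in x)

def sample_excess_kurtosis(data):
    """Bias-corrected excess kurtosis (G2); assumed to match pandas Series.kurt()."""
    k = len(data)
    m = statistics.fmean(data)
    s2 = statistics.variance(data)
    m4 = sum((v - m) ** 4 for v in data)
    first = k * (k + 1) / ((k - 1) * (k - 2) * (k - 3)) * m4 / s2 ** 2
    return first - 3 * (k - 1) ** 2 / ((k - 2) * (k - 3))

print(mean, round(sd, 6), round(var, 4), round(cv, 8))
print(p5, q1, median, q3, p95, q3 - q1, mad, sum(x))
print(round(sample_excess_kurtosis(x), 6))
```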
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
1 | 1 | 0.6%
109 | 1 | 0.6%
102 | 1 | 0.6%
103 | 1 | 0.6%
104 | 1 | 0.6%
105 | 1 | 0.6%
106 | 1 | 0.6%
107 | 1 | 0.6%
108 | 1 | 0.6%
110 | 1 | 0.6%
Other values (147) | 147 | 93.6%
Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 0.6%
2 | 1 | 0.6%
3 | 1 | 0.6%
4 | 1 | 0.6%
5 | 1 | 0.6%
6 | 1 | 0.6%
7 | 1 | 0.6%
8 | 1 | 0.6%
9 | 1 | 0.6%
10 | 1 | 0.6%
Maximum 10 values
Value | Count | Frequency (%)
157 | 1 | 0.6%
156 | 1 | 0.6%
155 | 1 | 0.6%
154 | 1 | 0.6%
153 | 1 | 0.6%
152 | 1 | 0.6%
151 | 1 | 0.6%
150 | 1 | 0.6%
149 | 1 | 0.6%
148 | 1 | 0.6%

법정동 (legal dong)
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB
Value | Count
화곡동 | 73
등촌동 | 30
방화동 | 21
공항동 | 13
내발산동 | 7
Other values (2) | 13

Length

Max length: 4
Median length: 3
Mean length: 3.044586
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가양동
2nd row: 가양동
3rd row: 가양동
4th row: 가양동
5th row: 가양동

Common Values

Value | Count | Frequency (%)
화곡동 | 73 | 46.5%
등촌동 | 30 | 19.1%
방화동 | 21 | 13.4%
공항동 | 13 | 8.3%
내발산동 | 7 | 4.5%
염창동 | 7 | 4.5%
가양동 | 6 | 3.8%
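The short standard-library sketch below rederives the 법정동 summary from the category counts in the table above. It covers four figures:
- frequencies as a share of the 157 rows
- the top-five view, with the remaining two categories folded into "Other values (2)"
- Unique, the number of categories seen exactly once
- the mean length, a count-weighted average of character lengths (only 내발산동 has four characters)

The variable names are illustrative.

```python
from collections import Counter

# Counts of 법정동 taken from the Common Values table above.
counts = Counter({
    "화곡동": 73, "등촌동": 30, "방화동": 21, "공항동": 13,
    "내발산동": 7, "염창동": 7, "가양동": 6,
})

total = sum(counts.values())                         # number of observations
distinct = len(counts)                               # distinct categories
unique = sum(1 for c in counts.values() if c == 1)   # categories seen exactly once

# Frequency (%) rounded to one decimal, as displayed in the report.
freq = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Top-5 view with the remaining categories folded into "Other values".
top5 = counts.most_common(5)
other = total - sum(c for _, c in top5)

# Mean string length, weighting each category's length by its count.
mean_len = sum(len(k) * v for k, v in counts.items()) / total

print(total, distinct, unique, round(100 * distinct / total, 1))
print(freq["화곡동"], other, round(mean_len, 6))
```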

Length

[Figure: Histogram of lengths of the category]


상호 (business name)
Text

Distinct: 151
Distinct (%): 96.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Length

Max length: 22
Median length: 20
Mean length: 6.7770701
Min length: 2

Characters and Unicode

Total characters: 1064
Distinct characters: 226
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
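As an illustration, the sketch below tallies characters by Unicode general category using Python's standard `unicodedata` module. It maps the two-letter category codes to the long names this report uses, and takes three business names from the sample rows as input. `unicodedata` does not expose scripts or blocks, so only categories are covered. The profiler's own implementation may differ.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator", "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Po": "Other Punctuation",
}

def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Business names taken from the sample rows of this report.
names = ["휘성 고시원", "02고시원", "기훈하우스(고인실건물)"]
print(category_counts(names))
```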

Unique

Unique: 145
Unique (%): 92.4%
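Distinct counts how many different values occur. Unique counts the values that occur exactly once. The minimal sketch below starts with a hypothetical toy column, then works through the arithmetic for 상호 (157 rows, 151 distinct, 145 unique). 151 - 145 = 6 names occur more than once, and together they fill 157 - 145 = 12 rows, so each of them must occur exactly twice. 주소 has the same three figures, so the same reasoning applies to it.

```python
from collections import Counter

def distinct_and_unique(values):
    """Distinct = number of different values; Unique = values occurring exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy column: one name repeated, two names occurring once.
print(distinct_and_unique(["에덴고시원", "에덴고시원", "주영고시텔", "해피하우스"]))
# prints: (3, 2)

# Arithmetic for 상호: 157 rows, 151 distinct, 145 unique.
rows, distinct, unique = 157, 151, 145
repeated = distinct - unique   # distinct names that occur more than once
extra = rows - unique          # rows taken up by those repeated names
print(repeated, extra)         # prints: 6 12
```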

Sample

1st row: 에덴고시원
2nd row: 에덴고시원
3rd row: 주영고시텔
4th row: 밀레시티타워
5th row: 해피하우스

Most occurring words
Value | Count | Frequency (%)
고시원 | 5 | 2.9%
고시텔 | 3 | 1.8%
에덴고시원 | 2 | 1.2%
그린하우스 | 2 | 1.2%
로얄홈리빙텔(구,여명고시원 | 2 | 1.2%
임탑고시원 | 2 | 1.2%
라임하우스 | 2 | 1.2%
웰빙하우스 | 2 | 1.2%
화곡고시원 | 1 | 0.6%
토마토 | 1 | 0.6%
Other values (148) | 148 | 87.1%

Most occurring characters

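Value | Count | Frequency (%)
[glyph not preserved] | 88 | 8.3%
[glyph not preserved] | 86 | 8.1%
[glyph not preserved] | 74 | 7.0%
[glyph not preserved] | 49 | 4.6%
[glyph not preserved] | 34 | 3.2%
[glyph not preserved] | 34 | 3.2%
[glyph not preserved] | 32 | 3.0%
[glyph not preserved] | 31 | 2.9%
) | 21 | 2.0%
( | 21 | 2.0%
Other values (216) | 594 | 55.8%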

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 942 | 88.5%
Close Punctuation | 21 | 2.0%
Open Punctuation | 21 | 2.0%
Other Punctuation | 20 | 1.9%
Uppercase Letter | 17 | 1.6%
Decimal Number | 14 | 1.3%
Space Separator | 13 | 1.2%
Lowercase Letter | 10 | 0.9%
Dash Punctuation | 6 | 0.6%

Most frequent character per category

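Other Letter
Value | Count | Frequency (%)
[glyph not preserved] | 88 | 9.3%
[glyph not preserved] | 86 | 9.1%
[glyph not preserved] | 74 | 7.9%
[glyph not preserved] | 49 | 5.2%
[glyph not preserved] | 34 | 3.6%
[glyph not preserved] | 34 | 3.6%
[glyph not preserved] | 32 | 3.4%
[glyph not preserved] | 31 | 3.3%
[glyph not preserved] | 20 | 2.1%
[glyph not preserved] | 20 | 2.1%
Other values (189) | 474 | 50.3%

Uppercase Letter
Value | Count | Frequency (%)
B | 2 | 11.8%
A | 2 | 11.8%
O | 2 | 11.8%
U | 2 | 11.8%
E | 2 | 11.8%
W | 1 | 5.9%
S | 1 | 5.9%
H | 1 | 5.9%
G | 1 | 5.9%
N | 1 | 5.9%
Other values (2) | 2 | 11.8%

Lowercase Letter
Value | Count | Frequency (%)
s | 3 | 30.0%
l | 2 | 20.0%
e | 2 | 20.0%
i | 2 | 20.0%
q | 1 | 10.0%

Decimal Number
Value | Count | Frequency (%)
1 | 6 | 42.9%
2 | 6 | 42.9%
0 | 1 | 7.1%
3 | 1 | 7.1%

Other Punctuation
Value | Count | Frequency (%)
. | 15 | 75.0%
, | 5 | 25.0%

Close Punctuation
Value | Count | Frequency (%)
) | 21 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 21 | 100.0%

Space Separator
Value | Count | Frequency (%)
[space] | 13 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 6 | 100.0%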

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 942 | 88.5%
Common | 95 | 8.9%
Latin | 27 | 2.5%

Most frequent character per script

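Hangul
Value | Count | Frequency (%)
[glyph not preserved] | 88 | 9.3%
[glyph not preserved] | 86 | 9.1%
[glyph not preserved] | 74 | 7.9%
[glyph not preserved] | 49 | 5.2%
[glyph not preserved] | 34 | 3.6%
[glyph not preserved] | 34 | 3.6%
[glyph not preserved] | 32 | 3.4%
[glyph not preserved] | 31 | 3.3%
[glyph not preserved] | 20 | 2.1%
[glyph not preserved] | 20 | 2.1%
Other values (189) | 474 | 50.3%

Latin
Value | Count | Frequency (%)
s | 3 | 11.1%
B | 2 | 7.4%
A | 2 | 7.4%
l | 2 | 7.4%
O | 2 | 7.4%
U | 2 | 7.4%
e | 2 | 7.4%
i | 2 | 7.4%
E | 2 | 7.4%
W | 1 | 3.7%
Other values (7) | 7 | 25.9%

Common
Value | Count | Frequency (%)
) | 21 | 22.1%
( | 21 | 22.1%
. | 15 | 15.8%
[space] | 13 | 13.7%
- | 6 | 6.3%
1 | 6 | 6.3%
2 | 6 | 6.3%
, | 5 | 5.3%
0 | 1 | 1.1%
3 | 1 | 1.1%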

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 942 | 88.5%
ASCII | 122 | 11.5%

Most frequent character per block

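Hangul
Value | Count | Frequency (%)
[glyph not preserved] | 88 | 9.3%
[glyph not preserved] | 86 | 9.1%
[glyph not preserved] | 74 | 7.9%
[glyph not preserved] | 49 | 5.2%
[glyph not preserved] | 34 | 3.6%
[glyph not preserved] | 34 | 3.6%
[glyph not preserved] | 32 | 3.4%
[glyph not preserved] | 31 | 3.3%
[glyph not preserved] | 20 | 2.1%
[glyph not preserved] | 20 | 2.1%
Other values (189) | 474 | 50.3%

ASCII
Value | Count | Frequency (%)
) | 21 | 17.2%
( | 21 | 17.2%
. | 15 | 12.3%
[space] | 13 | 10.7%
- | 6 | 4.9%
1 | 6 | 4.9%
2 | 6 | 4.9%
, | 5 | 4.1%
s | 3 | 2.5%
B | 2 | 1.6%
Other values (17) | 24 | 19.7%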

주소 (address)
Text

Distinct: 151
Distinct (%): 96.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Length

Max length: 24
Median length: 23
Mean length: 19.305732
Min length: 16

Characters and Unicode

Total characters: 3031
Distinct characters: 52
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 145
Unique (%): 92.4%

Sample

1st row: 서울특별시 강서구 양천로 443-36
2nd row: 서울특별시 강서구 양천로 443-36
3rd row: 서울특별시 강서구 양천로 461
4th row: 서울특별시 강서구 양천로47가길 25-18
5th row: 서울특별시 강서구 화곡로72길 48

Most occurring words
Value | Count | Frequency (%)
서울특별시 | 157 | 25.0%
강서구 | 157 | 25.0%
강서로 | 9 | 1.4%
강서로17길 | 8 | 1.3%
양천로 | 6 | 1.0%
화곡로68길 | 5 | 0.8%
29 | 5 | 0.8%
화곡로66길 | 5 | 0.8%
방화동로 | 4 | 0.6%
16 | 4 | 0.6%
Other values (195) | 268 | 42.7%
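The 25.0% shares for 서울특별시 and 강서구 follow from how words are counted. The listed counts plus "Other values (195)" add up to 628 words, exactly four per address. Each address contributes one 서울특별시 and one 강서구, giving 157 of 628 = 25.0%. The small sketch below reproduces the pattern on four addresses from the sample rows. It assumes whitespace tokenization, and the profiler's own tokenizer may differ in detail.

```python
from collections import Counter

# Addresses taken from the sample rows of this report.
addresses = [
    "서울특별시 강서구 양천로 443-36",
    "서울특별시 강서구 양천로 461",
    "서울특별시 강서구 양천로47가길 25-18",
    "서울특별시 강서구 화곡로72길 48",
]

# Assumption: words are whitespace-separated tokens.
words = Counter(token for addr in addresses for token in addr.split())
total_words = sum(words.values())

share = {w: round(100 * c / total_words, 1) for w, c in words.items()}
print(total_words, share["서울특별시"], share["강서구"])
# prints: 16 25.0 25.0
```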

Most occurring characters

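Value | Count | Frequency (%)
[space] | 471 | 15.5%
[glyph not preserved] | 355 | 11.7%
[glyph not preserved] | 198 | 6.5%
[glyph not preserved] | 157 | 5.2%
[glyph not preserved] | 157 | 5.2%
[glyph not preserved] | 157 | 5.2%
[glyph not preserved] | 157 | 5.2%
[glyph not preserved] | 157 | 5.2%
[glyph not preserved] | 157 | 5.2%
[glyph not preserved] | 123 | 4.1%
Other values (42) | 942 | 31.1%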

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1931 | 63.7%
Decimal Number | 595 | 19.6%
Space Separator | 471 | 15.5%
Dash Punctuation | 34 | 1.1%

Most frequent character per category

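Other Letter
Value | Count | Frequency (%)
[glyph not preserved] | 355 | 18.4%
[glyph not preserved] | 198 | 10.3%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 123 | 6.4%
[glyph not preserved] | 49 | 2.5%
Other values (30) | 264 | 13.7%

Decimal Number
Value | Count | Frequency (%)
1 | 114 | 19.2%
5 | 79 | 13.3%
3 | 78 | 13.1%
2 | 73 | 12.3%
6 | 65 | 10.9%
4 | 51 | 8.6%
7 | 44 | 7.4%
9 | 34 | 5.7%
8 | 30 | 5.0%
0 | 27 | 4.5%

Space Separator
Value | Count | Frequency (%)
[space] | 471 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 34 | 100.0%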

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1931 | 63.7%
Common | 1100 | 36.3%

Most frequent character per script

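Hangul
Value | Count | Frequency (%)
[glyph not preserved] | 355 | 18.4%
[glyph not preserved] | 198 | 10.3%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 123 | 6.4%
[glyph not preserved] | 49 | 2.5%
Other values (30) | 264 | 13.7%

Common
Value | Count | Frequency (%)
[space] | 471 | 42.8%
1 | 114 | 10.4%
5 | 79 | 7.2%
3 | 78 | 7.1%
2 | 73 | 6.6%
6 | 65 | 5.9%
4 | 51 | 4.6%
7 | 44 | 4.0%
- | 34 | 3.1%
9 | 34 | 3.1%
Other values (2) | 57 | 5.2%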

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1931 | 63.7%
ASCII | 1100 | 36.3%

Most frequent character per block

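ASCII
Value | Count | Frequency (%)
[space] | 471 | 42.8%
1 | 114 | 10.4%
5 | 79 | 7.2%
3 | 78 | 7.1%
2 | 73 | 6.6%
6 | 65 | 5.9%
4 | 51 | 4.6%
7 | 44 | 4.0%
- | 34 | 3.1%
9 | 34 | 3.1%
Other values (2) | 57 | 5.2%

Hangul
Value | Count | Frequency (%)
[glyph not preserved] | 355 | 18.4%
[glyph not preserved] | 198 | 10.3%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 157 | 8.1%
[glyph not preserved] | 123 | 6.4%
[glyph not preserved] | 49 | 2.5%
Other values (30) | 264 | 13.7%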

우편번호 (postal code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 78
Distinct (%): 49.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7659.4522
Minimum: 7516
Maximum: 7786
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 7516
5th percentile: 7537.2
Q1: 7591
Median: 7649
Q3: 7720
95th percentile: 7779
Maximum: 7786
Range: 270
Interquartile range (IQR): 129

Descriptive statistics

Standard deviation: 80.103494
Coefficient of variation (CV): 0.010458123
Kurtosis: -1.1946494
Mean: 7659.4522
Median Absolute Deviation (MAD): 63
Skewness: 0.090920281
Sum: 1202534
Variance: 6416.5698
Monotonicity: Not monotonic
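Korean postal codes have five digits, and those in Gangseo-gu begin with 07. The four-digit values here (7516 to 7786) therefore most likely lost a leading zero when the column was parsed as a number. The sketch below restores the zero for the sample values with a hypothetical helper, `as_postal_code`. It also recomputes the coefficient of variation (standard deviation divided by mean) from the figures above. When loading the file with pandas, reading the column as text would avoid the problem in the first place, for example with `pd.read_csv(path, dtype={"우편번호": str})`, where `path` is a placeholder.

```python
def as_postal_code(value):
    """Render a postal code as a five-digit string, restoring a dropped leading zero."""
    return str(value).zfill(5)

# Postal codes as they appear in the first sample rows (parsed as integers).
codes = [7527, 7527, 7526, 7523, 7534]
print([as_postal_code(c) for c in codes])
# prints: ['07527', '07527', '07526', '07523', '07534']

# Parsing the raw text as an integer is what drops the zero.
print(int("07527"))  # prints: 7527

# Coefficient of variation from the summary statistics above: sd / mean.
cv = 80.103494 / 7659.4522
print(round(cv, 9))
```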
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
7620 | 13 | 8.3%
7770 | 8 | 5.1%
7591 | 5 | 3.2%
7649 | 4 | 2.5%
7773 | 4 | 2.5%
7569 | 4 | 2.5%
7785 | 4 | 2.5%
7622 | 4 | 2.5%
7551 | 3 | 1.9%
7686 | 3 | 1.9%
Other values (68) | 105 | 66.9%
Minimum 10 values
Value | Count | Frequency (%)
7516 | 2 | 1.3%
7523 | 1 | 0.6%
7526 | 1 | 0.6%
7527 | 2 | 1.3%
7534 | 2 | 1.3%
7538 | 2 | 1.3%
7546 | 2 | 1.3%
7550 | 1 | 0.6%
7551 | 3 | 1.9%
7558 | 2 | 1.3%
Maximum 10 values
Value | Count | Frequency (%)
7786 | 1 | 0.6%
7785 | 4 | 2.5%
7781 | 1 | 0.6%
7779 | 3 | 1.9%
7777 | 1 | 0.6%
7776 | 3 | 1.9%
7775 | 2 | 1.3%
7773 | 4 | 2.5%
7770 | 8 | 5.1%
7769 | 1 | 0.6%

Interactions

[Figures: four pairwise interaction plots for the numeric variables 연번 and 우편번호; images not preserved]

Correlations

[Figure: correlation heatmap]
Variable | 연번 | 법정동 | 우편번호
연번 | 1.000 | 0.876 | 0.885
법정동 | 0.876 | 1.000 | 0.870
우편번호 | 0.885 | 0.870 | 1.000

[Figure: correlation heatmap]
Variable | 연번 | 우편번호 | 법정동
연번 | 1.000 | 0.646 | 0.683
우편번호 | 0.646 | 1.000 | 0.656
법정동 | 0.683 | 0.656 | 1.000

Missing values

[Figure: missing-value counts by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows
Index | 연번 | 법정동 | 상호 | 주소 | 우편번호
0 | 1 | 가양동 | 에덴고시원 | 서울특별시 강서구 양천로 443-36 | 7527
1 | 2 | 가양동 | 에덴고시원 | 서울특별시 강서구 양천로 443-36 | 7527
2 | 3 | 가양동 | 주영고시텔 | 서울특별시 강서구 양천로 461 | 7526
3 | 4 | 가양동 | 밀레시티타워 | 서울특별시 강서구 양천로47가길 25-18 | 7523
4 | 5 | 가양동 | 해피하우스 | 서울특별시 강서구 화곡로72길 48 | 7534
5 | 6 | 가양동 | 노블하우스 | 서울특별시 강서구 화곡로72길 52 | 7534
6 | 7 | 공항동 | 샤인미니홈 | 서울특별시 강서구 공항대로2길 52 | 7622
7 | 8 | 공항동 | 궁민오피스텔 | 서울특별시 강서구 공항대로3길 21 | 7619
8 | 9 | 공항동 | 휘성 고시원 | 서울특별시 강서구 공항대로7나길 21 | 7619
9 | 10 | 공항동 | 송정빌리지 | 서울특별시 강서구 공항대로8가길 2-3 | 7624

Last rows
Index | 연번 | 법정동 | 상호 | 주소 | 우편번호
147 | 148 | 화곡동 | 더큰 원룸텔 | 서울특별시 강서구 화곡로25길 7 | 7714
148 | 149 | 화곡동 | 프리미엄텔 | 서울특별시 강서구 화곡로26가길 24 | 7715
149 | 150 | 화곡동 | 기훈하우스(고인실건물) | 서울특별시 강서구 화곡로26길 69 | 7720
150 | 151 | 화곡동 | 화곡삼성고시원 | 서울특별시 강서구 화곡로27길 37 | 7702
151 | 152 | 화곡동 | 힐탑타운 | 서울특별시 강서구 화곡로29길 23 | 7701
152 | 153 | 화곡동 | 로즈펠리스 | 서울특별시 강서구 화곡로35길 5 | 7696
153 | 154 | 화곡동 | 진주타워 | 서울특별시 강서구 화곡로42나길 6-13 | 7678
154 | 155 | 화곡동 | 이모션하우스고시원 | 서울특별시 강서구 화곡로55길 22 | 7685
155 | 156 | 화곡동 | 02고시원 | 서울특별시 강서구 화곡로58길 22-3 | 7657
156 | 157 | 화곡동 | 장안고시원 | 서울특별시 강서구 화곡로60길 22 | 7654
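The first and last sample rows suggest, although the report does not say so, that 연번 was assigned after sorting the records by 법정동. 연번 1 to 6 are 가양동, 7 to 10 are 공항동, and 148 to 157 are 화곡동. If that holds, it would account for the high correlation between 연번 and 법정동 listed under Alerts. Under that hypothetical assumption, the sketch below works out which block of serial numbers each dong would occupy, using the counts from the 법정동 table. The blocks for dongs that do not appear in the sample are inferred, not observed.

```python
from itertools import accumulate

# 법정동 counts from the 법정동 Common Values table, listed in
# Korean (Unicode code point) order.
counts = {
    "가양동": 6, "공항동": 13, "내발산동": 7, "등촌동": 30,
    "방화동": 21, "염창동": 7, "화곡동": 73,
}

# Hypothetical assumption: 연번 was assigned 1..157 after sorting by 법정동,
# so each dong occupies one contiguous block of serial numbers.
ends = list(accumulate(counts.values()))
ranges = {
    dong: (end - n + 1, end)
    for (dong, n), end in zip(counts.items(), ends)
}

for dong, (first, last) in ranges.items():
    print(f"{dong}: 연번 {first}-{last}")

# Consistency with the sample rows: 1-6 가양동, 7-10 공항동, 148-157 화곡동.
print(ranges["가양동"], ranges["공항동"], ranges["화곡동"])
# prints: (1, 6) (7, 19) (85, 157)
```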