Overview

Dataset statistics

Number of variables: 7
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.9 KiB
Average record size in memory: 59.3 B

Variable types

Text: 4
Numeric: 2
Categorical: 1

Dataset

Description: Sample data
Author: 오픈메이트
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=6

Alerts

엑스좌표_값 is highly overall correlated with 행정구역_중복_여부 [High correlation]
와이좌표_값 is highly overall correlated with 행정구역_중복_여부 [High correlation]
행정구역_중복_여부 is highly overall correlated with 엑스좌표_값 and 1 other field [High correlation]
행정구역_중복_여부 is highly imbalanced (83.7%) [Imbalance]
아파트_동_코드 has unique values [Unique]

Reproduction

Analysis started: 2023-12-10 14:58:51.273383
Analysis finished: 2023-12-10 14:58:52.781056
Duration: 1.51 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트_동_코드
Text

UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 5000
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
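
As a rough illustration of the category counts reported for this variable, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module, using the five values listed under Sample. The helper name `category_counts` is made up for this example, and this is not necessarily how the profiling tool computes its figures. The standard library exposes general categories only, so scripts and blocks are left out.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows shown for 아파트_동_코드.
sample = ["B000073363", "B000045014", "B000071436", "U000010740", "A000081751"]
counts = category_counts(sample)

# 'Nd' is Decimal Number, 'Lu' is Uppercase Letter.
for code, count in counts.most_common():
    print(code, count)
```

Each sample value is one uppercase letter followed by nine digits, which matches the 10% / 90% split between Uppercase Letter and Decimal Number in the tables for this variable.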

Unique

Unique: 500
Unique (%): 100.0%

Sample

1st row: B000073363
2nd row: B000045014
3rd row: B000071436
4th row: U000010740
5th row: A000081751
Value               Count  Frequency (%)
b000073363          1      0.2%
a001027259          1      0.2%
b000063377          1      0.2%
b000020746          1      0.2%
u000004143          1      0.2%
b000056248          1      0.2%
b000001552          1      0.2%
a001024947          1      0.2%
b000088761          1      0.2%
b000010493          1      0.2%
Other values (490)  490    98.0%

Most occurring characters

Value             Count  Frequency (%)
0                 2202   44.0%
1                 379    7.6%
B                 287    5.7%
2                 270    5.4%
4                 253    5.1%
5                 247    4.9%
3                 245    4.9%
8                 241    4.8%
7                 230    4.6%
6                 219    4.4%
Other values (4)  427    8.5%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    4500   90.0%
Uppercase Letter  500    10.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2202   48.9%
1      379    8.4%
2      270    6.0%
4      253    5.6%
5      247    5.5%
3      245    5.4%
8      241    5.4%
7      230    5.1%
6      219    4.9%
9      214    4.8%

Uppercase Letter
Value  Count  Frequency (%)
B      287    57.4%
A      187    37.4%
U      18     3.6%
X      8      1.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  4500   90.0%
Latin   500    10.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2202   48.9%
1      379    8.4%
2      270    6.0%
4      253    5.6%
5      247    5.5%
3      245    5.4%
8      241    5.4%
7      230    5.1%
6      219    4.9%
9      214    4.8%

Latin
Value  Count  Frequency (%)
B      287    57.4%
A      187    37.4%
U      18     3.6%
X      8      1.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  5000   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 2202   44.0%
1                 379    7.6%
B                 287    5.7%
2                 270    5.4%
4                 253    5.1%
5                 247    4.9%
3                 245    4.9%
8                 241    4.8%
7                 230    4.6%
6                 219    4.4%
Other values (4)  427    8.5%

아파트_단지_코드
Text

Distinct: 493
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 5000
Distinct characters: 14
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 486
Unique (%): 97.2%
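
The gap between Distinct (493) and Unique (486) is easier to see with a small sketch. It assumes "distinct" means the number of different values and "unique" means the number of values that occur exactly once. That reading fits the figures here, since 486 singletons plus 7 values seen twice give 493 distinct values over 500 rows. The identifiers generated below are made up, and `distinct_and_unique` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy column with the same shape as the report (identifiers are invented):
# 486 values that appear once, plus 7 values that each appear twice.
singles = [f"S{i:09d}" for i in range(486)]
doubles = [f"D{i:09d}" for i in range(7)] * 2
values = singles + doubles

distinct, unique = distinct_and_unique(values)
print(len(values), distinct, unique)
```

The value table for this variable lists seven codes with a count of 2, which is consistent with this interpretation.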

Sample

1st row: B000077122
2nd row: B000002197
3rd row: B000015590
4th row: A000044415
5th row: B000062484
Value               Count  Frequency (%)
x000011476          2      0.4%
a000068640          2      0.4%
a000017357          2      0.4%
u000000678          2      0.4%
u000000113          2      0.4%
a000069718          2      0.4%
a000058015          2      0.4%
a000068231          1      0.2%
b000010874          1      0.2%
a000068698          1      0.2%
Other values (483)  483    96.6%

Most occurring characters

Value             Count  Frequency (%)
0                 2093   41.9%
1                 417    8.3%
5                 305    6.1%
4                 276    5.5%
2                 263    5.3%
7                 254    5.1%
6                 251    5.0%
3                 248    5.0%
B                 239    4.8%
A                 231    4.6%
Other values (4)  423    8.5%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    4500   90.0%
Uppercase Letter  500    10.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2093   46.5%
1      417    9.3%
5      305    6.8%
4      276    6.1%
2      263    5.8%
7      254    5.6%
6      251    5.6%
3      248    5.5%
8      224    5.0%
9      169    3.8%

Uppercase Letter
Value  Count  Frequency (%)
B      239    47.8%
A      231    46.2%
U      23     4.6%
X      7      1.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  4500   90.0%
Latin   500    10.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2093   46.5%
1      417    9.3%
5      305    6.8%
4      276    6.1%
2      263    5.8%
7      254    5.6%
6      251    5.6%
3      248    5.5%
8      224    5.0%
9      169    3.8%

Latin
Value  Count  Frequency (%)
B      239    47.8%
A      231    46.2%
U      23     4.6%
X      7      1.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  5000   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                 2093   41.9%
1                 417    8.3%
5                 305    6.1%
4                 276    5.5%
2                 263    5.3%
7                 254    5.1%
6                 251    5.0%
3                 248    5.0%
B                 239    4.8%
A                 231    4.6%
Other values (4)  423    8.5%

아파트_동_명
Text

Distinct: 73
Distinct (%): 14.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 5
Median length: 5
Mean length: 3.8
Min length: 1

Characters and Unicode

Total characters: 1900
Distinct characters: 26
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 47
Unique (%): 9.4%

Sample

1st row: 동명없음1
2nd row: 동명없음1
3rd row: 동명없음1
4th row: 1
5th row: 동명없음1
Value                  Count  Frequency (%)
동명없음1              304    60.8%
1                      34     6.8%
b                      15     3.0%
a                      12     2.4%
101                    11     2.2%
(value not preserved)  10     2.0%
103                    9      1.8%
(value not preserved)  9      1.8%
2                      7      1.4%
108                    4      0.8%
Other values (63)      85     17.0%

Most occurring characters

Value              Count  Frequency (%)
1                  420    22.1%
동                 304    16.0%
명                 304    16.0%
없                 304    16.0%
음                 304    16.0%
0                  72     3.8%
2                  39     2.1%
3                  23     1.2%
5                  16     0.8%
B                  15     0.8%
Other values (16)  99     5.2%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      1241   65.3%
Decimal Number    628    33.1%
Uppercase Letter  31     1.6%

Most frequent character per category

Other Letter
Value                      Count  Frequency (%)
동                         304    24.5%
명                         304    24.5%
없                         304    24.5%
음                         304    24.5%
(character not preserved)  10     0.8%
(character not preserved)  10     0.8%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%

Decimal Number
Value  Count  Frequency (%)
1      420    66.9%
0      72     11.5%
2      39     6.2%
3      23     3.7%
5      16     2.5%
7      14     2.2%
6      14     2.2%
4      13     2.1%
8      10     1.6%
9      7      1.1%

Uppercase Letter
Value  Count  Frequency (%)
B      15     48.4%
A      12     38.7%
E      2      6.5%
I      1      3.2%
D      1      3.2%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1241   65.3%
Common  628    33.1%
Latin   31     1.6%

Most frequent character per script

Hangul
Value                      Count  Frequency (%)
동                         304    24.5%
명                         304    24.5%
없                         304    24.5%
음                         304    24.5%
(character not preserved)  10     0.8%
(character not preserved)  10     0.8%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%

Common
Value  Count  Frequency (%)
1      420    66.9%
0      72     11.5%
2      39     6.2%
3      23     3.7%
5      16     2.5%
7      14     2.2%
6      14     2.2%
4      13     2.1%
8      10     1.6%
9      7      1.1%

Latin
Value  Count  Frequency (%)
B      15     48.4%
A      12     38.7%
E      2      6.5%
I      1      3.2%
D      1      3.2%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1241   65.3%
ASCII   659    34.7%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
1                 420    63.7%
0                 72     10.9%
2                 39     5.9%
3                 23     3.5%
5                 16     2.4%
B                 15     2.3%
7                 14     2.1%
6                 14     2.1%
4                 13     2.0%
A                 12     1.8%
Other values (5)  21     3.2%

Hangul
Value                      Count  Frequency (%)
동                         304    24.5%
명                         304    24.5%
없                         304    24.5%
음                         304    24.5%
(character not preserved)  10     0.8%
(character not preserved)  10     0.8%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%
(character not preserved)  1      0.1%

엑스좌표_값
Real number (ℝ)

HIGH CORRELATION 

Distinct: 495
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 198940.48
Minimum: 182866
Maximum: 215235
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 182866
5-th percentile: 186439.3
Q1: 192717
Median: 199141.5
Q3: 204794.25
95-th percentile: 211412.45
Maximum: 215235
Range: 32369
Interquartile range (IQR): 12077.25

Descriptive statistics

Standard deviation: 7813.1494
Coefficient of variation (CV): 0.039273804
Kurtosis: -1.0649422
Mean: 198940.48
Median Absolute Deviation (MAD): 6305
Skewness: 0.012637556
Sum: 99470241
Variance: 61045304
Monotonicity: Not monotonic
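
Several of the figures above follow from one another by simple arithmetic. The sketch below recomputes them from the reported values for 엑스좌표_값. It uses only the numbers printed in this report, not the underlying data, so small differences in the last digits are expected where the inputs were already rounded.

```python
# Figures copied from the report for 엑스좌표_값.
n = 500                              # number of observations
total = 99470241                     # Sum
minimum, maximum = 182866, 215235
q1, q3 = 192717, 204794.25
std = 7813.1494                      # Standard deviation (rounded in the report)

mean = total / n                     # Mean
value_range = maximum - minimum      # Range
iqr = q3 - q1                        # Interquartile range
cv = std / mean                      # Coefficient of variation
variance = std ** 2                  # Variance, up to rounding of std

print(f"mean={mean:.2f} range={value_range} iqr={iqr}")
print(f"cv={cv:.9f} variance={variance:.0f}")
```

The recomputed mean, range, IQR and CV agree with the report. The variance differs slightly in its final digits because the standard deviation used here is itself rounded.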
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
192321              2      0.4%
192474              2      0.4%
196750              2      0.4%
192036              2      0.4%
194602              2      0.4%
201484              1      0.2%
203248              1      0.2%
204660              1      0.2%
186287              1      0.2%
201453              1      0.2%
Other values (485)  485    97.0%

Minimum 10 values
Value   Count  Frequency (%)
182866  1      0.2%
183174  1      0.2%
184697  1      0.2%
184987  1      0.2%
185071  1      0.2%
185146  1      0.2%
185154  1      0.2%
185185  1      0.2%
185202  1      0.2%
185216  1      0.2%

Maximum 10 values
Value   Count  Frequency (%)
215235  1      0.2%
214973  1      0.2%
213554  1      0.2%
213134  1      0.2%
213125  1      0.2%
213088  1      0.2%
212959  1      0.2%
212893  1      0.2%
212785  1      0.2%
212652  1      0.2%

와이좌표_값
Real number (ℝ)

HIGH CORRELATION 

Distinct: 495
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 450742.75
Minimum: 439209
Maximum: 464512
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 439209
5-th percentile: 442031.9
Q1: 445025.25
Median: 450361.5
Q3: 455572.5
95-th percentile: 461156
Maximum: 464512
Range: 25303
Interquartile range (IQR): 10547.25

Descriptive statistics

Standard deviation: 6046.4312
Coefficient of variation (CV): 0.013414373
Kurtosis: -0.95141797
Mean: 450742.75
Median Absolute Deviation (MAD): 5226
Skewness: 0.19292169
Sum: 2.2537138 × 10^8
Variance: 36559330
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
449898              2      0.4%
448731              2      0.4%
443328              2      0.4%
441984              2      0.4%
443193              2      0.4%
458217              1      0.2%
442980              1      0.2%
452612              1      0.2%
445144              1      0.2%
443131              1      0.2%
Other values (485)  485    97.0%

Minimum 10 values
Value   Count  Frequency (%)
439209  1      0.2%
439261  1      0.2%
439601  1      0.2%
439755  1      0.2%
440086  1      0.2%
440281  1      0.2%
440554  1      0.2%
440577  1      0.2%
441005  1      0.2%
441015  1      0.2%

Maximum 10 values
Value   Count  Frequency (%)
464512  1      0.2%
464065  1      0.2%
463850  1      0.2%
463834  1      0.2%
463424  1      0.2%
463299  1      0.2%
463069  1      0.2%
463051  1      0.2%
462860  1      0.2%
462777  1      0.2%

행정구역_중복_여부
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value  Count
0      488
<NA>   12

Length

Max length: 4
Median length: 1
Mean length: 1.072
Min length: 1
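
These length figures suggest that the 12 `<NA>` entries are stored as four-character strings rather than as true nulls, which would also explain why the report counts 0 missing values for this column. That reading is an inference from the numbers, not something the report states. A minimal check, using only the counts shown in this section:

```python
from statistics import mean, median

# Counts taken from the report: 488 rows hold "0" and 12 rows hold "<NA>".
# Treating "<NA>" as a literal string is an assumption made for this check.
values = ["0"] * 488 + ["<NA>"] * 12
lengths = [len(v) for v in values]

max_length = max(lengths)
median_length = median(lengths)
mean_length = mean(lengths)
min_length = min(lengths)

print(max_length, median_length, round(mean_length, 3), min_length)
```

Under that assumption the mean length is (488 × 1 + 12 × 4) / 500 = 1.072, the same as the reported value.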

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      488    97.6%
<NA>   12     2.4%
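
The 83.7% imbalance figure in the alert for this variable can be reproduced from the two counts above. The sketch assumes the score is one minus the Shannon entropy of the class frequencies, normalized by the maximum entropy for that number of classes. This definition matches the reported number, but the function below is an illustrative reimplementation, not the profiling tool's own code.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the entropy (in bits) of the class
    frequencies and k is the number of classes. 0 means perfectly balanced;
    values near 1 mean one class dominates."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Class counts for 행정구역_중복_여부: 488 rows with "0", 12 rows with "<NA>".
score = imbalance_score([488, 12])
print(f"{score:.1%}")
```

A perfectly balanced two-class column (for example 250 and 250) would score 0 under the same definition.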

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
0      488    97.6%
na     12     2.4%

블록_코드
Text

Distinct: 327
Distinct (%): 65.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.66
Min length: 2

Characters and Unicode

Total characters: 2830
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 220
Unique (%): 44.0%

Sample

1st row: 1*3*8*
2nd row: 2*7*1*
3rd row: 2*8*6*
4th row: 2*7*7*
5th row: 3*3*1*
Value               Count  Frequency (%)
2*1*3               7      1.4%
2*0*9               6      1.2%
2*2*6               6      1.2%
2*0*4               6      1.2%
2*9*2               5      1.0%
2*1*9               5      1.0%
2*2*5               5      1.0%
2*1*0               5      1.0%
3*3*2               5      1.0%
2*1*5               5      1.0%
Other values (255)  445    89.0%

Most occurring characters

Value  Count  Frequency (%)
*      1353   47.8%
2      336    11.9%
1      227    8.0%
3      197    7.0%
4      134    4.7%
5      111    3.9%
0      102    3.6%
9      101    3.6%
8      93     3.3%
7      90     3.2%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     1477   52.2%
Other Punctuation  1353   47.8%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      336    22.7%
1      227    15.4%
3      197    13.3%
4      134    9.1%
5      111    7.5%
0      102    6.9%
9      101    6.8%
8      93     6.3%
7      90     6.1%
6      86     5.8%

Other Punctuation
Value  Count  Frequency (%)
*      1353   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  2830   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
*      1353   47.8%
2      336    11.9%
1      227    8.0%
3      197    7.0%
4      134    4.7%
5      111    3.9%
0      102    3.6%
9      101    3.6%
8      93     3.3%
7      90     3.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2830   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
*      1353   47.8%
2      336    11.9%
1      227    8.0%
3      197    7.0%
4      134    4.7%
5      111    3.9%
0      102    3.6%
9      101    3.6%
8      93     3.3%
7      90     3.2%

Interactions

[Figures: interaction plots (4 panels)]

Correlations

              아파트_동_명  엑스좌표_값  와이좌표_값
아파트_동_명  1.000  0.000  0.000
엑스좌표_값   0.000  1.000  0.000
와이좌표_값   0.000  0.000  1.000

                    엑스좌표_값  와이좌표_값  행정구역_중복_여부
엑스좌표_값         1.000  0.020  1.000
와이좌표_값         0.020  1.000  1.000
행정구역_중복_여부  1.000  1.000  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
     아파트_동_코드  아파트_단지_코드  아파트_동_명  엑스좌표_값  와이좌표_값  행정구역_중복_여부  블록_코드
0    B000073363  B000077122  동명없음1  186642  443131  0     1*3*8*
1    B000045014  B000002197  동명없음1  194350  453906  0     2*7*1*
2    B000071436  B000015590  동명없음1  196868  450193  0     2*8*6*
3    U000010740  A000044415  1          212652  445020  0     2*7*7*
4    A000081751  B000062484  동명없음1  196046  457856  0     3*3*1*
5    A000024502  A105860897  동명없음1  193854  461155  0     2*1*3*
6    B000057029  U000001154  109        190546  458467  0     3*2*6
7    B000016708  U000000978  동명없음1  204752  462777  0     2*1*0*
8    A002013853  B000038494  906        200489  444104  0     6*4*5
9    U000009344  A000050414  동명없음1  194012  450371  0     2*3*0*

Last rows
     아파트_동_코드  아파트_단지_코드  아파트_동_명  엑스좌표_값  와이좌표_값  행정구역_중복_여부  블록_코드
490  B000065770  X000011476  동명없음1  204666  456182  0     1*7*7*
491  A001048542  B000072973  동명없음1  208148  452012  <NA>  2*2*9*
492  B000050926  B000079111  동명없음1  185071  453464  0     1*3*0
493  A001050924  B000012552  동명없음1  207482  445806  0     1*0*0
494  A001032657  A001004646  동명없음1  198930  448916  0     1*0*8
495  B000004270  U000001904  동명없음1  212302  451324  0     9*8*
496  B000022282  X000010919  동명없음1  193297  444990  0     8*4*
497  A000049258  B000030390  1          187020  461031  0     1*7*4*
498  B000066007  A101480300  1          197713  458128  0     1*3*5*
499  A000060625  U000002837  301        201276  442638  <NA>  2*4*0*