Overview

Dataset statistics

Number of variables: 4
Number of observations: 647
Missing cells: 224
Missing cells (%): 8.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 21.0 KiB
Average record size in memory: 33.2 B
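The two derived figures in this table can be reproduced from the raw counts. This assumes that "Missing cells (%)" divides the missing count by rows × variables and that 1 KiB = 1024 bytes. It is only a sketch: the reported total size is already rounded, so the average record size is approximate.

```python
def missing_cells_pct(missing_cells: int, n_rows: int, n_vars: int) -> float:
    """Percentage of empty cells over the whole table (rows x variables)."""
    return 100.0 * missing_cells / (n_rows * n_vars)


def avg_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average in-memory size per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


# Values from the dataset statistics: 647 rows, 4 variables, 224 missing cells, 21.0 KiB.
print(f"Missing cells (%): {missing_cells_pct(224, 647, 4):.1f}%")       # Missing cells (%): 8.7%
print(f"Average record size: {avg_record_size_bytes(21.0, 647):.1f} B")  # Average record size: 33.2 B
```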

Variable types

Numeric: 1
Categorical: 1
Text: 2

Dataset

Description: Sample data
Author: 서울시(신용보증재단) (Seoul Metropolitan Government, Seoul Credit Guarantee Foundation)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=324

Alerts

지역코드(AREA_CD) is highly correlated overall with 시도명(SIDO_NM) [High correlation]
시도명(SIDO_NM) is highly correlated overall with 지역코드(AREA_CD) [High correlation]
읍면동명(EMD_NM) has 223 (34.5%) missing values [Missing]
지역코드(AREA_CD) has all unique values [Unique]

Reproduction

Analysis started: 2024-04-16 19:18:30.608341
Analysis finished: 2024-04-16 19:18:31.172646
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

지역코드(AREA_CD)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 647
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7506699.2
Minimum: 26110
Maximum: 11740700
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 26110
5th percentile: 28713
Q1: 46750
Median: 11260580
Q3: 11530525
95th percentile: 11710597
Maximum: 11740700
Range: 11714590
Interquartile range (IQR): 11483775
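Range and IQR follow directly from the other rows of this table. The sketch below recomputes both from the reported values. It also shows how quartiles could be obtained from raw data with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics. This is the same convention as the pandas default `quantile`. Treating it as the report's exact method is an assumption.

```python
import statistics

# Quantile summary of 지역코드(AREA_CD), as reported.
minimum, q1, q3, maximum = 26110, 46750, 11530525, 11740700

value_range = maximum - minimum  # spread of the whole column
iqr = q3 - q1                    # spread of the middle 50% of values
print(value_range, iqr)          # 11714590 11483775

# From raw values: "inclusive" = linear interpolation between order statistics.
toy = [1, 2, 3, 4, 5]
print(statistics.quantiles(toy, n=4, method="inclusive"))  # [2.0, 3.0, 4.0]
```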

Descriptive statistics

Standard deviation: 5420632.1
Coefficient of variation (CV): 0.72210595
Kurtosis: -1.5753478
Mean: 7506699.2
Median Absolute Deviation (MAD): 360015
Skewness: -0.65260047
Sum: 4.8568344 × 10^9
Variance: 2.9383253 × 10^13
Monotonicity: Not monotonic
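Several of these figures are tied to each other. CV is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the 647 observations. The sketch checks these relations on the reported (rounded) values, so they agree only to about six significant digits. The `mad` helper is a hypothetical illustration of the unscaled median absolute deviation, median(|x - median(x)|). Whether the report applies a scaling constant is an assumption that is not verified here.

```python
import statistics

n, mean, std = 647, 7506699.2, 5420632.1  # reported values for 지역코드(AREA_CD)

cv = std / mean        # coefficient of variation, ≈ 0.7221
variance = std ** 2    # ≈ 2.93833e13
total = mean * n       # ≈ 4.85683e9
print(f"CV={cv:.6f}  variance={variance:.6e}  sum={total:.6e}")


def mad(values):
    """Unscaled median absolute deviation: median of |x - median(x)|."""
    center = statistics.median(values)
    return statistics.median(abs(x - center) for x in values)


print(mad([1, 2, 3, 4, 100]))  # 1
```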
Figure: Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
11110515 | 1 | 0.2%
27260 | 1 | 0.2%
26200 | 1 | 0.2%
26230 | 1 | 0.2%
26260 | 1 | 0.2%
26290 | 1 | 0.2%
26320 | 1 | 0.2%
26350 | 1 | 0.2%
26380 | 1 | 0.2%
26410 | 1 | 0.2%
Other values (637) | 637 | 98.5%

Minimum 10 values
Value | Count | Frequency (%)
26110 | 1 | 0.2%
26140 | 1 | 0.2%
26170 | 1 | 0.2%
26200 | 1 | 0.2%
26230 | 1 | 0.2%
26260 | 1 | 0.2%
26290 | 1 | 0.2%
26320 | 1 | 0.2%
26350 | 1 | 0.2%
26380 | 1 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
11740700 | 1 | 0.2%
11740690 | 1 | 0.2%
11740685 | 1 | 0.2%
11740660 | 1 | 0.2%
11740650 | 1 | 0.2%
11740640 | 1 | 0.2%
11740620 | 1 | 0.2%
11740610 | 1 | 0.2%
11740600 | 1 | 0.2%
11740590 | 1 | 0.2%
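The two extreme-value tables list the ten smallest values in ascending order and the ten largest in descending order. Below is a minimal sketch using the standard-library `heapq` module. It runs on a handful of codes copied from these tables, not on the full column.

```python
import heapq


def extreme_values(values, k=10):
    """Return the k smallest (ascending) and k largest (descending) values."""
    return heapq.nsmallest(k, values), heapq.nlargest(k, values)


# A few codes copied from the tables (illustrative subset, not the full column).
codes = [11110515, 27260, 26200, 26230, 26260, 11740700, 11740690, 26110, 26140, 11740685]
smallest, largest = extreme_values(codes, k=3)
print(smallest)  # [26110, 26140, 26200]
print(largest)   # [11740700, 11740690, 11740685]
```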

시도명(SIDO_NM)
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Value | Count
서울특별시 | 424
경기도 | 42
경상북도 | 23
전라남도 | 22
경상남도 | 22
Other values (12) | 114

Length

Max length: 7
Median length: 5
Mean length: 4.6522411
Min length: 3
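The length statistics count characters per value. For province names written in precomposed Hangul, Python's `len` counts one code point per syllable. For example, it gives 3 for 경기도 and 7 for 제주특별자치도, both of which appear in this column. This is a minimal sketch of the max, mean, and min lengths only.

```python
import statistics


def length_stats(values):
    """Max, mean, and min character length of a list of strings."""
    lengths = [len(v) for v in values]
    return {"max": max(lengths), "mean": statistics.fmean(lengths), "min": min(lengths)}


print(length_stats(["서울특별시", "경기도", "제주특별자치도"]))
# {'max': 7, 'mean': 5.0, 'min': 3}
```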

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 서울특별시
2nd row: 서울특별시
3rd row: 서울특별시
4th row: 서울특별시
5th row: 서울특별시

Common Values

Value | Count | Frequency (%)
서울특별시 | 424 | 65.5%
경기도 | 42 | 6.5%
경상북도 | 23 | 3.6%
전라남도 | 22 | 3.4%
경상남도 | 22 | 3.4%
강원도 | 18 | 2.8%
부산광역시 | 16 | 2.5%
전라북도 | 15 | 2.3%
충청남도 | 15 | 2.3%
충청북도 | 14 | 2.2%
Other values (7) | 36 | 5.6%
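A table like this one can be built in three steps: count the values, keep the top entries, and fold the rest into an "Other values (k)" row. Here, 7 further provinces account for the remaining 36 rows. The sketch below shows one way to do this with `collections.Counter`. It computes percentages over all observations passed in. The input is a small toy list, not the real column.

```python
from collections import Counter


def common_values(values, top=10):
    """Top-`top` (value, count, percent) rows plus an 'Other values (k)' row."""
    counts = Counter(values)
    n = len(values)
    ranked = counts.most_common()  # sorted by count; ties keep first-seen order
    rows = [(v, c, round(100 * c / n, 1)) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / n, 1)))
    return rows


toy = ["서울특별시"] * 3 + ["경기도"] * 2 + ["부산광역시", "제주특별자치도"]
for row in common_values(toy, top=2):
    print(row)
# ('서울특별시', 3, 42.9)
# ('경기도', 2, 28.6)
# ('Other values (2)', 2, 28.6)
```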

Length

Figure: Histogram of lengths of the category

시군구명(SIGNGU_NM)
Text

Distinct: 225
Distinct (%): 34.8%
Missing: 1
Missing (%): 0.2%
Memory size: 5.2 KiB

Length

Max length: 9
Median length: 3
Mean length: 3.2260062
Min length: 2

Characters and Unicode

Total characters: 2084
Distinct characters: 142
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
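The category counts rest on each character's Unicode general category. Python exposes this through `unicodedata.category` as a two-letter code, for example `Lo` for Other Letter and `Zs` for Space Separator. This is a minimal sketch. The long-name mapping and the `block_name` helper are hypothetical and cover only the categories and the two blocks (ASCII and Hangul Syllables) that appear in this report. Script detection is not available in the standard library, so it is left out.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this report (hypothetical subset).
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter(unicodedata.category(ch) for v in values for ch in v)
    return {CATEGORY_NAMES.get(cat, cat): n for cat, n in counts.items()}


def block_name(ch):
    """Very rough block lookup covering only ASCII and Hangul Syllables."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"
    return "Other"


# "종로구" and "종로1.2.3.4가동" appear in this dataset; "가나시 다라구" is made up.
print(category_counts(["종로구", "가나시 다라구", "종로1.2.3.4가동"]))
# {'Other Letter': 13, 'Space Separator': 1, 'Decimal Number': 4, 'Other Punctuation': 3}
```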

Unique

Unique: 195
Unique (%): 30.2%

Sample

1st row: 종로구
2nd row: 종로구
3rd row: 종로구
4th row: 종로구
5th row: 종로구
Most frequent words
Value | Count | Frequency (%)
송파구 | 27 | 4.0%
강남구 | 22 | 3.2%
관악구 | 21 | 3.1%
강서구 | 21 | 3.1%
중구 | 20 | 2.9%
성북구 | 20 | 2.9%
노원구 | 19 | 2.8%
강동구 | 18 | 2.7%
서초구 | 18 | 2.7%
영등포구 | 18 | 2.7%
Other values (224) | 474 | 69.9%
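The counts in this table add up to 678: 204 in the ten listed rows plus 474 others. That is more than the 646 non-missing values. The difference is consistent with values that contain a space being split into separate words, since 646 values plus 32 space characters gives 678, assuming each space separates exactly two words. Below is a minimal whitespace-splitting sketch with a made-up input.

```python
from collections import Counter


def word_counts(values):
    """Split every value on whitespace and count the resulting words."""
    return Counter(word for v in values for word in v.split())


counts = word_counts(["송파구", "송파구", "가나시 다라구"])  # third value is hypothetical
print(counts.most_common())  # [('송파구', 2), ('가나시', 1), ('다라구', 1)]
print(sum(counts.values()))  # 4 (3 values containing 1 space)
```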

Most occurring characters

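Value | Count | Frequency (%)
(unavailable) | 519 | 24.9%
(unavailable) | 98 | 4.7%
(unavailable) | 84 | 4.0%
(unavailable) | 80 | 3.8%
(unavailable) | 77 | 3.7%
(unavailable) | 65 | 3.1%
(unavailable) | 56 | 2.7%
(unavailable) | 48 | 2.3%
(unavailable) | 42 | 2.0%
(unavailable) | 39 | 1.9%
Other values (132) | 976 | 46.8%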

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2052 | 98.5%
Space Separator | 32 | 1.5%

Most frequent character per category

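Other Letter
Value | Count | Frequency (%)
(unavailable) | 519 | 25.3%
(unavailable) | 98 | 4.8%
(unavailable) | 84 | 4.1%
(unavailable) | 80 | 3.9%
(unavailable) | 77 | 3.8%
(unavailable) | 65 | 3.2%
(unavailable) | 56 | 2.7%
(unavailable) | 48 | 2.3%
(unavailable) | 42 | 2.0%
(unavailable) | 39 | 1.9%
Other values (131) | 944 | 46.0%

Space Separator
Value | Count | Frequency (%)
(space) | 32 | 100.0%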

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2052 | 98.5%
Common | 32 | 1.5%

Most frequent character per script

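Hangul
Value | Count | Frequency (%)
(unavailable) | 519 | 25.3%
(unavailable) | 98 | 4.8%
(unavailable) | 84 | 4.1%
(unavailable) | 80 | 3.9%
(unavailable) | 77 | 3.8%
(unavailable) | 65 | 3.2%
(unavailable) | 56 | 2.7%
(unavailable) | 48 | 2.3%
(unavailable) | 42 | 2.0%
(unavailable) | 39 | 1.9%
Other values (131) | 944 | 46.0%

Common
Value | Count | Frequency (%)
(space) | 32 | 100.0%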

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2052 | 98.5%
ASCII | 32 | 1.5%

Most frequent character per block

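Hangul
Value | Count | Frequency (%)
(unavailable) | 519 | 25.3%
(unavailable) | 98 | 4.8%
(unavailable) | 84 | 4.1%
(unavailable) | 80 | 3.9%
(unavailable) | 77 | 3.8%
(unavailable) | 65 | 3.2%
(unavailable) | 56 | 2.7%
(unavailable) | 48 | 2.3%
(unavailable) | 42 | 2.0%
(unavailable) | 39 | 1.9%
Other values (131) | 944 | 46.0%

ASCII
Value | Count | Frequency (%)
(space) | 32 | 100.0%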

읍면동명(EMD_NM)
Text

MISSING 

Distinct: 423
Distinct (%): 99.8%
Missing: 223
Missing (%): 34.5%
Memory size: 5.2 KiB

Length

Max length: 11
Median length: 7
Mean length: 4.2146226
Min length: 2

Characters and Unicode

Total characters: 1787
Distinct characters: 188
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 422
Unique (%): 99.5%
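Distinct and Unique measure different things. Distinct counts how many different values occur, while Unique counts the values that occur exactly once. Here 신사동 appears twice among the 424 non-missing names, giving 423 distinct and 422 unique values. The percentages (99.8% and 99.5%) match division by 424, that is, by the non-missing rows rather than all 647. The minimal sketch below treats `None` as missing, which is an assumption about how missing values are represented.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique, non_missing) counts, skipping None entries."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)                              # number of different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, unique, len(present)


print(distinct_and_unique(["신사동", "신사동", "사직동", "삼청동", None]))  # (3, 2, 4)

# Percentages as reported for 읍면동명(EMD_NM): 423 distinct and 422 unique out of 424.
print(f"{100 * 423 / 424:.1f}% {100 * 422 / 424:.1f}%")  # 99.8% 99.5%
```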

Sample

1st row: 청운효자동
2nd row: 사직동
3rd row: 삼청동
4th row: 부암동
5th row: 평창동
Most frequent words
Value | Count | Frequency (%)
신사동 | 2 | 0.5%
신대방제1동 | 1 | 0.2%
고척제1동 | 1 | 0.2%
독산제4동 | 1 | 0.2%
당산제2동 | 1 | 0.2%
당산제1동 | 1 | 0.2%
여의동 | 1 | 0.2%
영등포동 | 1 | 0.2%
영등포본동 | 1 | 0.2%
시흥제5동 | 1 | 0.2%
Other values (413) | 413 | 97.4%

Most occurring characters

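Value | Count | Frequency (%)
(unavailable) | 426 | 23.8%
(unavailable) | 183 | 10.2%
1 | 97 | 5.4%
2 | 97 | 5.4%
3 | 43 | 2.4%
(unavailable) | 38 | 2.1%
4 | 26 | 1.5%
(unavailable) | 23 | 1.3%
(unavailable) | 18 | 1.0%
(unavailable) | 17 | 1.0%
Other values (178) | 819 | 45.8%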

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1486 | 83.2%
Decimal Number | 292 | 16.3%
Other Punctuation | 9 | 0.5%

Most frequent character per category

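Other Letter
Value | Count | Frequency (%)
(unavailable) | 426 | 28.7%
(unavailable) | 183 | 12.3%
(unavailable) | 38 | 2.6%
(unavailable) | 23 | 1.5%
(unavailable) | 18 | 1.2%
(unavailable) | 17 | 1.1%
(unavailable) | 17 | 1.1%
(unavailable) | 16 | 1.1%
(unavailable) | 16 | 1.1%
(unavailable) | 16 | 1.1%
Other values (167) | 716 | 48.2%

Decimal Number
Value | Count | Frequency (%)
1 | 97 | 33.2%
2 | 97 | 33.2%
3 | 43 | 14.7%
4 | 26 | 8.9%
5 | 11 | 3.8%
6 | 7 | 2.4%
7 | 6 | 2.1%
8 | 3 | 1.0%
9 | 1 | 0.3%
0 | 1 | 0.3%

Other Punctuation
Value | Count | Frequency (%)
. | 9 | 100.0%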

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1486 | 83.2%
Common | 301 | 16.8%

Most frequent character per script

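Hangul
Value | Count | Frequency (%)
(unavailable) | 426 | 28.7%
(unavailable) | 183 | 12.3%
(unavailable) | 38 | 2.6%
(unavailable) | 23 | 1.5%
(unavailable) | 18 | 1.2%
(unavailable) | 17 | 1.1%
(unavailable) | 17 | 1.1%
(unavailable) | 16 | 1.1%
(unavailable) | 16 | 1.1%
(unavailable) | 16 | 1.1%
Other values (167) | 716 | 48.2%

Common
Value | Count | Frequency (%)
1 | 97 | 32.2%
2 | 97 | 32.2%
3 | 43 | 14.3%
4 | 26 | 8.6%
5 | 11 | 3.7%
. | 9 | 3.0%
6 | 7 | 2.3%
7 | 6 | 2.0%
8 | 3 | 1.0%
9 | 1 | 0.3%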

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1486 | 83.2%
ASCII | 301 | 16.8%

Most frequent character per block

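Hangul
Value | Count | Frequency (%)
(unavailable) | 426 | 28.7%
(unavailable) | 183 | 12.3%
(unavailable) | 38 | 2.6%
(unavailable) | 23 | 1.5%
(unavailable) | 18 | 1.2%
(unavailable) | 17 | 1.1%
(unavailable) | 17 | 1.1%
(unavailable) | 16 | 1.1%
(unavailable) | 16 | 1.1%
(unavailable) | 16 | 1.1%
Other values (167) | 716 | 48.2%

ASCII
Value | Count | Frequency (%)
1 | 97 | 32.2%
2 | 97 | 32.2%
3 | 43 | 14.3%
4 | 26 | 8.6%
5 | 11 | 3.7%
. | 9 | 3.0%
6 | 7 | 2.3%
7 | 6 | 2.0%
8 | 3 | 1.0%
9 | 1 | 0.3%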

Interactions


Correlations

Variable | 지역코드(AREA_CD) | 시도명(SIDO_NM)
지역코드(AREA_CD) | 1.000 | 1.000
시도명(SIDO_NM) | 1.000 | 1.000

Variable | 지역코드(AREA_CD) | 시도명(SIDO_NM)
지역코드(AREA_CD) | 1.000 | 0.988
시도명(SIDO_NM) | 0.988 | 1.000
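Both matrices show 지역코드(AREA_CD) and 시도명(SIDO_NM) as almost perfectly associated. One plausible reason, visible in the sample rows, is that the leading digits of the code identify the province. Codes starting with 11 belong to 서울특별시, 48 to 경상남도, and 50 to 제주특별자치도. The hypothetical check below tests whether a two-digit prefix determines the province name on a few of those sample rows. It is a sketch of that assumption, not how the correlation coefficients were computed.

```python
from collections import defaultdict


def prefix_determines_province(rows, digits=2):
    """True if every code prefix of the given length maps to exactly one province name."""
    seen = defaultdict(set)
    for code, province in rows:
        seen[str(code)[:digits]].add(province)
    return all(len(names) == 1 for names in seen.values())


# (AREA_CD, SIDO_NM) pairs taken from the sample rows of this report.
sample = [
    (11110515, "서울특별시"), (11110530, "서울특별시"),
    (48740, "경상남도"), (48820, "경상남도"),
    (50110, "제주특별자치도"), (50130, "제주특별자치도"),
]
print(prefix_determines_province(sample))  # True
```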

Missing values

Figure: A simple visualization of nullity by column.
Figure: The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
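Nullity correlation, as described in the heatmap caption, can be sketched as the Pearson correlation between two columns' missing-value indicators (1 = missing, 0 = present). This is an assumption about the method, written with the standard library only. The coefficient is undefined for a column with no missing values, because its indicator is then constant. In this dataset only 읍면동명(EMD_NM) (223 missing) and 시군구명(SIGNGU_NM) (1 missing) have gaps. The columns below are toy data.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nullity_correlation(col_a, col_b):
    """Correlate the missing-value indicators of two columns (None = missing)."""
    a = [1 if v is None else 0 for v in col_a]
    b = [1 if v is None else 0 for v in col_b]
    return pearson(a, b)


# Toy columns: the second is missing exactly where the first is.
col_a = [None, None, "사직동", "삼청동"]
col_b = [None, None, "x", "y"]
print(nullity_correlation(col_a, col_b))  # 1.0
```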

Sample

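First rows
Index | 지역코드(AREA_CD) | 시도명(SIDO_NM) | 시군구명(SIGNGU_NM) | 읍면동명(EMD_NM)
0 | 11110515 | 서울특별시 | 종로구 | 청운효자동
1 | 11110530 | 서울특별시 | 종로구 | 사직동
2 | 11110540 | 서울특별시 | 종로구 | 삼청동
3 | 11110550 | 서울특별시 | 종로구 | 부암동
4 | 11110560 | 서울특별시 | 종로구 | 평창동
5 | 11110570 | 서울특별시 | 종로구 | 무악동
6 | 11110580 | 서울특별시 | 종로구 | 교남동
7 | 11110600 | 서울특별시 | 종로구 | 가회동
8 | 11110615 | 서울특별시 | 종로구 | 종로1.2.3.4가동
9 | 11110630 | 서울특별시 | 종로구 | 종로5.6가동

Last rows
Index | 지역코드(AREA_CD) | 시도명(SIDO_NM) | 시군구명(SIGNGU_NM) | 읍면동명(EMD_NM)
637 | 48740 | 경상남도 | 창녕군 | <NA>
638 | 48820 | 경상남도 | 고성군 | <NA>
639 | 48840 | 경상남도 | 남해군 | <NA>
640 | 48850 | 경상남도 | 하동군 | <NA>
641 | 48860 | 경상남도 | 산청군 | <NA>
642 | 48870 | 경상남도 | 함양군 | <NA>
643 | 48880 | 경상남도 | 거창군 | <NA>
644 | 48890 | 경상남도 | 합천군 | <NA>
645 | 50110 | 제주특별자치도 | 제주시 | <NA>
646 | 50130 | 제주특별자치도 | 서귀포시 | <NA>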