Overview

Dataset statistics

Number of variables: 4
Number of observations: 71
Missing cells: 30
Missing cells (%): 10.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.5 KiB
Average record size in memory: 35.9 B
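Missing cells (%) is taken over every cell in the table (observations × variables), not over rows. The following is a minimal Python check. It uses the per-column missing counts reported in the Variables section (0, 15, 0 and 15):

```python
# Per-column missing counts as reported in the Variables section of this report.
missing_per_column = {"C_SIDO_CD": 0, "C_SGG_CD": 15, "C_SIDO_NM": 0, "C_SGG_NM": 15}
n_observations = 71

missing_cells = sum(missing_per_column.values())          # 30
total_cells = n_observations * len(missing_per_column)    # 71 rows x 4 variables = 284
missing_pct = 100 * missing_cells / total_cells

print(f"Missing cells: {missing_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")  # Missing cells (%): 10.6%
```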

Variable types

Numeric: 2
Categorical: 1
Text: 1

Dataset

Description: Sample data (샘플 데이터)
Author: 신한카드 (Shinhan Card)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=50

Alerts

유입지_시도코드(C_SIDO_CD) is highly overall correlated with 유입지_시도코드명(C_SIDO_NM) [High correlation]
유입지_시도코드명(C_SIDO_NM) is highly overall correlated with 유입지_시도코드(C_SIDO_CD) [High correlation]
유입지_시군구코드(C_SGG_CD) has 15 (21.1%) missing values [Missing]
유입지_시군구코드명(C_SGG_NM) has 15 (21.1%) missing values [Missing]
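The high-correlation alert is consistent with each province code corresponding to exactly one province name, and vice versa, so the two columns carry the same information. The sketch below checks that one-to-one mapping on the (C_SIDO_CD, C_SIDO_NM) pairs visible in the Sample section. It covers only those 20 rows, not the full dataset, and `is_one_to_one` is a hypothetical helper:

```python
from collections import defaultdict

# (C_SIDO_CD, C_SIDO_NM) pairs from the first and last sample rows of this report.
pairs = (
    [(11, "서울")] * 10
    + [(41, "경기")] * 2
    + [(42, "강원"), (43, "충북"), (44, "충남"), (45, "전북"),
       (46, "전남"), (47, "경북"), (48, "경남"), (50, "제주")]
)


def is_one_to_one(pairs):
    """Return True if every code maps to exactly one name and every name to one code."""
    code_to_names = defaultdict(set)
    name_to_codes = defaultdict(set)
    for code, name in pairs:
        code_to_names[code].add(name)
        name_to_codes[name].add(code)
    return all(len(names) == 1 for names in code_to_names.values()) and all(
        len(codes) == 1 for codes in name_to_codes.values()
    )


print(is_one_to_one(pairs))                   # True
print(is_one_to_one(pairs + [(11, "경기")]))  # False: code 11 would map to two names
```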

Reproduction

Analysis started: 2024-04-17 14:47:26.899157
Analysis finished: 2024-04-17 14:47:27.511548
Duration: 0.61 seconds
Software version: ydata-profiling v4.5.1
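Duration is the difference between the start and finish timestamps above, rounded to two decimals. Here is a quick check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section above.
started = datetime.fromisoformat("2024-04-17 14:47:26.899157")
finished = datetime.fromisoformat("2024-04-17 14:47:27.511548")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 0.61 seconds
```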

Variables

유입지_시도코드(C_SIDO_CD): inflow region, province-level (si/do) code
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 23.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.830986
Minimum: 11
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 771.0 B

Quantile statistics

Minimum: 11
5th percentile: 11
Q1: 11
Median: 41
Q3: 41
95th percentile: 45.5
Maximum: 50
Range: 39
Interquartile range (IQR): 30
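These quantiles can be reproduced from the value counts in this section. Code 11 occurs 25 times, code 41 occurs 31 times, and 15 other codes occur once each, giving 71 values that sum to 2118. The sketch below assumes the report interpolates linearly between order statistics. That is pandas' default for `quantile()` and corresponds to `method="inclusive"` in Python's `statistics.quantiles`:

```python
import statistics

# C_SIDO_CD rebuilt from the value counts reported for this variable.
singletons = [26, 27, 28, 29, 30, 31, 36, 42, 43, 44, 45, 46, 47, 48, 50]
values = sorted([11] * 25 + [41] * 31 + singletons)

# "inclusive" = linear interpolation between order statistics (pandas' default).
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
ventiles = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print(len(values), sum(values))   # 71 2118
print(p5, q1, median, q3, p95)    # 11.0 11.0 41.0 41.0 45.5
print("IQR:", q3 - q1)            # IQR: 30.0
```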

Descriptive statistics

Standard deviation: 14.562168
Coefficient of variation (CV): 0.48815578
Kurtosis: -1.6783262
Mean: 29.830986
Median absolute deviation (MAD): 5
Skewness: -0.42692695
Sum: 2118
Variance: 212.05674
Monotonicity: Increasing
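The descriptive statistics above can be checked against a reconstruction from the reported value counts: 25 × 11, 31 × 41, and one each of 15 other codes. The sketch uses sample (n - 1) variance, which matches the reported figures. It computes the median absolute deviation as the median of |x - median(x)|. Because code 11 repeats 25 times, "Increasing" can only mean non-decreasing here. The ascending row order is an assumption based on the first and last sample rows:

```python
import statistics

# C_SIDO_CD rebuilt from the reported value counts, in ascending order
# (row order assumed from the first and last sample rows).
singletons = [26, 27, 28, 29, 30, 31, 36, 42, 43, 44, 45, 46, 47, 48, 50]
values = sorted([11] * 25 + [41] * 31 + singletons)

mean = statistics.mean(values)
std = statistics.stdev(values)            # sample standard deviation (n - 1)
variance = statistics.variance(values)    # sample variance (n - 1)
cv = std / mean                           # coefficient of variation
med = statistics.median(values)
mad = statistics.median([abs(x - med) for x in values])  # median absolute deviation

# Non-decreasing check: each value is >= the one before it.
non_decreasing = all(a <= b for a, b in zip(values, values[1:]))

print(f"{mean:.6f} {std:.6f} {variance:.5f}")
print(med, mad, non_decreasing)  # 41 5 True
```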
Histogram with fixed size bins (bins=17)

Common values

Value             Count  Frequency (%)
41                31     43.7%
11                25     35.2%
43                1      1.4%
50                1      1.4%
48                1      1.4%
47                1      1.4%
46                1      1.4%
45                1      1.4%
44                1      1.4%
42                1      1.4%
Other values (7)  7      9.9%

Minimum 10 values

Value  Count  Frequency (%)
11     25     35.2%
26     1      1.4%
27     1      1.4%
28     1      1.4%
29     1      1.4%
30     1      1.4%
31     1      1.4%
36     1      1.4%
41     31     43.7%
42     1      1.4%

Maximum 10 values

Value  Count  Frequency (%)
50     1      1.4%
48     1      1.4%
47     1      1.4%
46     1      1.4%
45     1      1.4%
44     1      1.4%
43     1      1.4%
42     1      1.4%
41     31     43.7%
36     1      1.4%

유입지_시군구코드(C_SGG_CD): inflow region, municipal-level (si/gun/gu) code
Real number (ℝ)

MISSING 

Distinct: 48
Distinct (%): 85.7%
Missing: 15
Missing (%): 21.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.053571
Minimum: 11
Maximum: 83
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 771.0 B

Quantile statistics

Minimum: 11
5th percentile: 13.75
Q1: 25.75
Median: 41
Q3: 57.5
95th percentile: 75.5
Maximum: 83
Range: 72
Interquartile range (IQR): 31.75

Descriptive statistics

Standard deviation: 19.9047
Coefficient of variation (CV): 0.4733177
Kurtosis: -0.91711394
Mean: 42.053571
Median absolute deviation (MAD): 16
Skewness: 0.25907722
Sum: 2355
Variance: 396.19708
Monotonicity: Not monotonic
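For C_SGG_CD, the summary statistics ignore the 15 missing cells and are computed over 56 values. Missing (%) is relative to all 71 rows. The following is a short arithmetic check that uses only figures reported above:

```python
# Figures reported above for C_SGG_CD.
n_rows, n_missing, n_distinct = 71, 15, 48
total, std = 2355, 19.9047
minimum, maximum = 11, 83
q1, q3 = 25.75, 57.5

n_present = n_rows - n_missing   # 56 non-missing values
mean = total / n_present         # Sum / count of non-missing values
cv = std / mean                  # coefficient of variation

print(n_present, f"{mean:.6f}", f"{cv:.4f}")                 # 56 42.053571 0.4733
print("Range:", maximum - minimum, "IQR:", q3 - q1)          # Range: 72 IQR: 31.75
print(f"Distinct (%): {100 * n_distinct / n_present:.1f}%")  # Distinct (%): 85.7%
print(f"Missing (%): {100 * n_missing / n_rows:.1f}%")       # Missing (%): 21.1%
```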
Histogram with fixed size bins (bins=48)

Common values

Value              Count  Frequency (%)
29                 2      2.8%
65                 2      2.8%
59                 2      2.8%
50                 2      2.8%
41                 2      2.8%
11                 2      2.8%
21                 2      2.8%
17                 2      2.8%
26                 1      1.4%
46                 1      1.4%
Other values (38)  38     53.5%
(Missing)          15     21.1%

Minimum 10 values

Value  Count  Frequency (%)
11     2      2.8%
13     1      1.4%
14     1      1.4%
15     1      1.4%
17     2      2.8%
19     1      1.4%
20     1      1.4%
21     2      2.8%
22     1      1.4%
23     1      1.4%

Maximum 10 values

Value  Count  Frequency (%)
83     1      1.4%
82     1      1.4%
80     1      1.4%
74     1      1.4%
71     1      1.4%
68     1      1.4%
67     1      1.4%
65     2      2.8%
63     1      1.4%
62     1      1.4%
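In these tables, the Frequency (%) column divides each count by all 71 rows. The missing cells appear as their own row, so the counts in the Common values table add up to 71. The sketch below rebuilds that column from the counts. `frequency_rows` is a hypothetical helper, not a ydata-profiling function:

```python
# (label, count) rows from the C_SGG_CD "Common values" table.
common_values = [
    ("29", 2), ("65", 2), ("59", 2), ("50", 2), ("41", 2),
    ("11", 2), ("21", 2), ("17", 2), ("26", 1), ("46", 1),
    ("Other values (38)", 38), ("(Missing)", 15),
]


def frequency_rows(rows, n_rows):
    """Attach a percentage of n_rows, rounded to one decimal, to each count."""
    return [(label, count, round(100 * count / n_rows, 1)) for label, count in rows]


n_rows = sum(count for _, count in common_values)  # 71, missing cells included
for label, count, pct in frequency_rows(common_values, n_rows):
    print(f"{label:<18}{count:>3}  {pct}%")
```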

유입지_시도코드명(C_SIDO_NM): inflow region, province-level name
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 23.9%
Missing: 0
Missing (%): 0.0%
Memory size: 700.0 B

Value              Count
경기               31
서울               25
강원               1
경남               1
경북               1
Other values (12)  12

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 15
Unique (%): 21.1%
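Unlike Distinct, Unique counts only values that occur exactly once. All 71 values of this column can be rebuilt: 경기 appears 31 times, 서울 25 times, and 15 names appear once each. The single-occurrence names are listed across the two value tables in this section and the last Sample rows. With that reconstruction, the standard library can check the Distinct, Unique, and Length figures:

```python
from collections import Counter

# C_SIDO_NM rebuilt from the value tables in this section and the Sample rows.
singletons = ["강원", "경남", "경북", "전남", "전북", "충남", "충북", "부산",
              "대구", "인천", "광주", "대전", "울산", "세종", "제주"]
names = ["경기"] * 31 + ["서울"] * 25 + singletons

counts = Counter(names)
distinct = len(counts)                              # different values
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
lengths = sorted({len(name) for name in names})     # character lengths present

print(len(names), distinct, unique)                     # 71 17 15
print(f"Unique (%): {100 * unique / len(names):.1f}%")  # Unique (%): 21.1%
print(counts.most_common(2), lengths)                   # [('경기', 31), ('서울', 25)] [2]
```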

Sample

1st row: 서울
2nd row: 서울
3rd row: 서울
4th row: 서울
5th row: 서울

Common Values

Value              Count  Frequency (%)
경기               31     43.7%
서울               25     35.2%
강원               1      1.4%
경남               1      1.4%
경북               1      1.4%
전남               1      1.4%
전북               1      1.4%
충남               1      1.4%
충북               1      1.4%
부산               1      1.4%
Other values (7)   7      9.9%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
경기               31     43.7%
서울               25     35.2%
부산               1      1.4%
대구               1      1.4%
인천               1      1.4%
광주               1      1.4%
대전               1      1.4%
울산               1      1.4%
세종               1      1.4%
충북               1      1.4%
Other values (7)   7      9.9%

유입지_시군구코드명(C_SGG_NM): inflow region, municipal-level name
Text

MISSING

Distinct: 56
Distinct (%): 100.0%
Missing: 15
Missing (%): 21.1%
Memory size: 700.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.0892857
Min length: 2

Characters and Unicode

Total characters: 173
Distinct characters: 61
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
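Python's `unicodedata` module exposes the general category of each code point. Hangul syllables have category `Lo`, which the report labels Other Letter. The module has no script or block lookup, so the sketch below approximates the block test with the Hangul Syllables code point range (U+AC00 to U+D7AF). It uses only the district names shown in the Sample section, not all 56 values:

```python
import unicodedata
from collections import Counter

# C_SGG_NM values shown in the Sample section (12 of the 56 non-missing names).
names = ["종로구", "중구", "용산구", "성동구", "광진구", "동대문구",
         "중랑구", "성북구", "강북구", "도봉구", "가평군", "양평군"]

chars = "".join(names)
categories = Counter(unicodedata.category(ch) for ch in chars)
# Block approximation: every character lies in the Hangul Syllables range.
in_hangul_syllables = all(0xAC00 <= ord(ch) <= 0xD7AF for ch in chars)

print(len(chars), dict(categories), in_hangul_syllables)  # 36 {'Lo': 36} True
print(Counter(chars).most_common(1))                      # [('구', 10)]
print(unicodedata.name("구"))                             # HANGUL SYLLABLE GU
```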

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: 종로구
2nd row: 중구
3rd row: 용산구
4th row: 성동구
5th row: 광진구
Value              Count  Frequency (%)
의정부시           1      1.8%
마포구             1      1.8%
의왕시             1      1.8%
광명시             1      1.8%
평택시             1      1.8%
동두천시           1      1.8%
안산시             1      1.8%
고양시             1      1.8%
과천시             1      1.8%
구리시             1      1.8%
Other values (46)  46     82.1%
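This table's Frequency (%) column divides by the 56 non-missing names rather than by all 71 rows. That differs from the C_SGG_CD tables, where a single occurrence is 1.4%. Here is a quick arithmetic check:

```python
n_rows, n_missing = 71, 15
n_present = n_rows - n_missing  # 56 non-missing C_SGG_NM values

print(f"{100 * 1 / n_present:.1f}%", f"{100 * 46 / n_present:.1f}%")  # 1.8% 82.1%
print(f"{100 * 1 / n_rows:.1f}%")                                     # 1.4%
```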

Most occurring characters

Value              Count  Frequency (%)
                   29     16.8%
                   27     15.6%
                   8      4.6%
                   6      3.5%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   4      2.3%
                   4      2.3%
Other values (51)  75     43.4%

Most occurring categories

Value         Count  Frequency (%)
Other Letter  173    100.0%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                   29     16.8%
                   27     15.6%
                   8      4.6%
                   6      3.5%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   4      2.3%
                   4      2.3%
Other values (51)  75     43.4%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  173    100.0%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                   29     16.8%
                   27     15.6%
                   8      4.6%
                   6      3.5%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   4      2.3%
                   4      2.3%
Other values (51)  75     43.4%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  173    100.0%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
                   29     16.8%
                   27     15.6%
                   8      4.6%
                   6      3.5%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   5      2.9%
                   4      2.3%
                   4      2.3%
Other values (51)  75     43.4%

Interactions


Correlations

             C_SIDO_CD  C_SGG_CD  C_SIDO_NM  C_SGG_NM
C_SIDO_CD    1.000      0.000     1.000      1.000
C_SGG_CD     0.000      1.000     0.000      1.000
C_SIDO_NM    1.000      0.000     1.000      1.000
C_SGG_NM     1.000      1.000     1.000      1.000
             C_SIDO_CD  C_SGG_CD  C_SIDO_NM
C_SIDO_CD    1.000      0.003     0.926
C_SGG_CD     0.003      1.000     0.000
C_SIDO_NM    0.926      0.000     1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
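Nullity correlation is commonly computed as the correlation between the columns' missing-value indicators. The missingno library's heatmap works this way. This sketch assumes a Pearson correlation of 0/1 masks. In the last ten Sample rows, C_SGG_CD and C_SGG_NM are missing in exactly the same rows, so their masks are identical and the correlation is 1. `pearson` is a hypothetical helper:

```python
# (C_SGG_CD, C_SGG_NM) from the last ten Sample rows; None stands for <NA>.
tail = [(82, "가평군"), (83, "양평군")] + [(None, None)] * 8

cd_missing = [int(cd is None) for cd, _ in tail]
nm_missing = [int(nm is None) for _, nm in tail]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


print(cd_missing == nm_missing)                   # True: missing in the same rows
print(round(pearson(cd_missing, nm_missing), 6))  # 1.0
```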

Sample

First rows

    C_SIDO_CD  C_SGG_CD  C_SIDO_NM  C_SGG_NM
0   11         11        서울       종로구
1   11         14        서울       중구
2   11         17        서울       용산구
3   11         20        서울       성동구
4   11         21        서울       광진구
5   11         23        서울       동대문구
6   11         26        서울       중랑구
7   11         29        서울       성북구
8   11         30        서울       강북구
9   11         32        서울       도봉구

Last rows

    C_SIDO_CD  C_SGG_CD  C_SIDO_NM  C_SGG_NM
61  41         82        경기       가평군
62  41         83        경기       양평군
63  42         <NA>      강원       <NA>
64  43         <NA>      충북       <NA>
65  44         <NA>      충남       <NA>
66  45         <NA>      전북       <NA>
67  46         <NA>      전남       <NA>
68  47         <NA>      경북       <NA>
69  48         <NA>      경남       <NA>
70  50         <NA>      제주       <NA>