Overview

Dataset statistics

Number of variables: 6
Number of observations: 627
Missing cells: 43
Missing cells (%): 1.1%
Duplicate rows: 3
Duplicate rows (%): 0.5%
Total size in memory: 30.7 KiB
Average record size in memory: 50.2 B
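
The two percentages above can be recomputed from the raw counts. The following is a minimal sketch, assuming that missing cells are measured against every cell (variables × observations) and duplicate rows against the number of observations; the variable names are illustrative and not part of the report.

```python
# Counts taken from the dataset statistics above.
n_variables = 6
n_observations = 627
missing_cells = 43
duplicate_rows = 3

# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Under these assumptions both values round to the figures shown in the report.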

Variable types

Numeric: 1
Text: 2
Categorical: 3

Dataset

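Description: Information history (serial number 일렬번호, shop name 상가이름, category 구분, detailed category 상세구분, phone number 전화번호, registration date 등록일자) for the shops in the underground shopping arcade in front of Daejeon Station (Jungang-ro Jiha 200, Dong-gu), operated by the Daejeon Metropolitan City Facilities Management Corporation.
Author: Daejeon Metropolitan City Facilities Management Corporation (대전광역시시설관리공단)
URL: https://www.data.go.kr/data/15123937/fileData.do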

Alerts

Dataset has 3 (0.5%) duplicate rows [Duplicates]
일렬번호 is highly overall correlated with 상세구분 [High correlation]
구분 is highly overall correlated with 상세구분 [High correlation]
상세구분 is highly overall correlated with 일렬번호 and 1 other field [High correlation]
구분 is highly imbalanced (52.7%) [Imbalance]
전화번호 has 43 (6.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-11 23:18:02.023551
Analysis finished: 2023-12-11 23:18:02.475046
Duration: 0.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

일렬번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 216
Distinct (%): 34.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2814.3238
Minimum: 2706
Maximum: 2923
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 2706
5-th percentile: 2716
Q1: 2756
Median: 2815
Q3: 2870
95-th percentile: 2913.7
Maximum: 2923
Range: 217
Interquartile range (IQR): 114

Descriptive statistics

Standard deviation: 64.285645
Coefficient of variation (CV): 0.022842306
Kurtosis: -1.2500318
Mean: 2814.3238
Median Absolute Deviation (MAD): 57
Skewness: -0.0012274107
Sum: 1764581
Variance: 4132.6442
Monotonicity: Increasing
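
Several of the statistics for 일렬번호 are derived from one another, which makes them easy to cross-check. The sketch below recomputes the mean, coefficient of variation, variance, IQR and range from the other reported values; it uses only the standard definitions (CV = standard deviation / mean, variance = standard deviation squared), and the variable names are illustrative.

```python
# Values reported for 일렬번호 in the tables above.
n = 627                      # number of observations
total = 1764581              # Sum
std = 64.285645              # Standard deviation
q1, q3 = 2756, 2870          # Q1 and Q3
minimum, maximum = 2706, 2923

mean = total / n             # mean = sum / number of observations
cv = std / mean              # coefficient of variation
variance = std ** 2          # variance is the squared standard deviation
iqr = q3 - q1                # interquartile range
value_range = maximum - minimum

print(f"mean={mean:.4f} cv={cv:.9f} variance={variance:.4f}")
print(f"iqr={iqr} range={value_range}")
```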
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
2741                7      1.1%
2742                6      1.0%
2706                5      0.8%
2855                5      0.8%
2919                5      0.8%
2876                5      0.8%
2904                4      0.6%
2805                4      0.6%
2782                4      0.6%
2857                4      0.6%
Other values (206)  578    92.2%

Value               Count  Frequency (%)
2706                5      0.8%
2707                3      0.5%
2708                3      0.5%
2709                3      0.5%
2710                3      0.5%
2711                3      0.5%
2712                3      0.5%
2713                2      0.3%
2714                3      0.5%
2715                2      0.3%

Value               Count  Frequency (%)
2923                3      0.5%
2922                3      0.5%
2921                3      0.5%
2920                2      0.3%
2919                5      0.8%
2918                3      0.5%
2917                4      0.6%
2916                4      0.6%
2915                2      0.3%
2914                3      0.5%

상가이름
Text

Distinct: 149
Distinct (%): 23.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.0 KiB

Length

Max length: 9
Median length: 8
Mean length: 4.0031898
Min length: 2

Characters and Unicode

Total characters: 2510
Distinct characters: 232
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 2.1%

Sample

1st row: 몽실
2nd row: 몽실
3rd row: 몽실
4th row: 몽실
5th row: 몽실

Value               Count  Frequency (%)
큰별통신            17     2.7%
천보당안경콘택트    11     1.7%
연예인              11     1.7%
여성크로커          11     1.7%
올포유              10     1.6%
청담동              10     1.6%
밤블비              10     1.6%
자방모드            10     1.6%
현아통신            9      1.4%
흙비                9      1.4%
Other values (142)  531    83.1%

Most occurring characters

ValueCountFrequency (%)
66
 
2.6%
66
 
2.6%
58
 
2.3%
46
 
1.8%
43
 
1.7%
41
 
1.6%
37
 
1.5%
34
 
1.4%
34
 
1.4%
33
 
1.3%
Other values (222) 2052
81.8%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       2437   97.1%
Uppercase Letter   18     0.7%
Space Separator    12     0.5%
Close Punctuation  12     0.5%
Open Punctuation   12     0.5%
Decimal Number     12     0.5%
Other Punctuation  3      0.1%
Other Symbol       2      0.1%
Math Symbol        1      < 0.1%
Letter Number      1      < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
66
 
2.7%
66
 
2.7%
58
 
2.4%
46
 
1.9%
43
 
1.8%
41
 
1.7%
37
 
1.5%
34
 
1.4%
34
 
1.4%
33
 
1.4%
Other values (208) 1979
81.2%
Uppercase Letter
ValueCountFrequency (%)
N 4
22.2%
E 4
22.2%
W 4
22.2%
G 3
16.7%
M 3
16.7%
Decimal Number
ValueCountFrequency (%)
0 6
50.0%
2 6
50.0%
Space Separator
ValueCountFrequency (%)
12
100.0%
Close Punctuation
ValueCountFrequency (%)
) 12
100.0%
Open Punctuation
ValueCountFrequency (%)
( 12
100.0%
Other Punctuation
ValueCountFrequency (%)
. 3
100.0%
Other Symbol
ValueCountFrequency (%)
2
100.0%
Math Symbol
ValueCountFrequency (%)
~ 1
100.0%
Letter Number
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  2439   97.2%
Common  52     2.1%
Latin   19     0.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
66
 
2.7%
66
 
2.7%
58
 
2.4%
46
 
1.9%
43
 
1.8%
41
 
1.7%
37
 
1.5%
34
 
1.4%
34
 
1.4%
33
 
1.4%
Other values (209) 1981
81.2%
Common
ValueCountFrequency (%)
12
23.1%
) 12
23.1%
( 12
23.1%
0 6
11.5%
2 6
11.5%
. 3
 
5.8%
~ 1
 
1.9%
Latin
ValueCountFrequency (%)
N 4
21.1%
E 4
21.1%
W 4
21.1%
G 3
15.8%
M 3
15.8%
1
 
5.3%

Most occurring blocks

Value         Count  Frequency (%)
Hangul        2437   97.1%
ASCII         70     2.8%
None          2      0.1%
Number Forms  1      < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
66
 
2.7%
66
 
2.7%
58
 
2.4%
46
 
1.9%
43
 
1.8%
41
 
1.7%
37
 
1.5%
34
 
1.4%
34
 
1.4%
33
 
1.4%
Other values (208) 1979
81.2%
ASCII
ValueCountFrequency (%)
12
17.1%
) 12
17.1%
( 12
17.1%
0 6
8.6%
2 6
8.6%
N 4
 
5.7%
E 4
 
5.7%
W 4
 
5.7%
G 3
 
4.3%
. 3
 
4.3%
Other values (2) 4
 
5.7%
None
ValueCountFrequency (%)
2
100.0%
Number Forms
ValueCountFrequency (%)
1
100.0%

구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.0 KiB

Value  Count
3      427
<NA>   164
2      15
4      11
5      8

Length

Max length: 4
Median length: 1
Mean length: 1.784689
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 3
3rd row: 3
4th row: <NA>
5th row: 3

Common Values

Value  Count  Frequency (%)
3      427    68.1%
<NA>   164    26.2%
2      15     2.4%
4      11     1.8%
5      8      1.3%
1      2      0.3%
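
The 52.7% imbalance figure reported for 구분 can be approximated from these counts. The sketch below assumes the score is one minus the Shannon entropy of the value counts, normalized by the logarithm of the number of classes. That definition is an assumption about how the profiler computes the alert, and `imbalance_score` is a hypothetical helper name.

```python
import math

# Value counts for 구분 from the Common Values table above.
counts = {"3": 427, "<NA>": 164, "2": 15, "4": 11, "5": 8, "1": 2}


def imbalance_score(value_counts):
    """Return 1 - H / log2(k) for k classes (assumed definition).

    The score is 0 for a perfectly uniform split and approaches 1
    as a single class dominates.
    """
    n = sum(value_counts)
    k = len(value_counts)
    if k < 2:
        return 0.0
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / n) * math.log2(c / n) for c in value_counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(list(counts.values()))
print(f"Imbalance: {score:.1%}")
```

With the six counts above the result lands close to the value in the alert, which suggests the `<NA>` entries are treated as an ordinary category here.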

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
3      427    68.1%
na     164    26.2%
2      15     2.4%
4      11     1.8%
5      8      1.3%
1      2      0.3%

상세구분
Categorical

HIGH CORRELATION 

Distinct: 42
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 5.0 KiB

Value              Count
20                 205
<NA>               184
35                 33
5                  22
29                 18
Other values (37)  165

Length

Max length: 7
Median length: 2
Mean length: 2.7400319
Min length: 1

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 20
2nd row: 20
3rd row: 20
4th row: <NA>
5th row: 20

Common Values

Value              Count  Frequency (%)
20                 205    32.7%
<NA>               184    29.3%
35                 33     5.3%
5                  22     3.5%
29                 18     2.9%
2,20               14     2.2%
3                  14     2.2%
2,14               11     1.8%
22                 8      1.3%
10                 8      1.3%
Other values (32)  110    17.5%

Length

Histogram of lengths of the category
Value              Count  Frequency (%)
20                 205    32.7%
na                 184    29.3%
35                 33     5.3%
5                  22     3.5%
29                 18     2.9%
2,20               14     2.2%
3                  14     2.2%
2,14               11     1.8%
2,12,33            8      1.3%
22                 8      1.3%
Other values (32)  110    17.5%

전화번호
Text

MISSING 

Distinct: 102
Distinct (%): 17.5%
Missing: 43
Missing (%): 6.9%
Memory size: 5.0 KiB

Length

Max length: 18
Median length: 7
Mean length: 7.0753425
Min length: 7

Characters and Unicode

Total characters: 4132
Distinct characters: 17
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
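
The distinct percentage and mean length reported for 전화번호 are consistent with a denominator of non-missing values rather than all 627 observations. The sketch below checks this; the choice of denominator is an assumption inferred from the numbers, not something the report states, and the variable names are illustrative.

```python
# Figures reported for 전화번호 and for the dataset as a whole.
n_observations = 627
missing = 43
distinct = 102
total_characters = 4132

# Assumption: both statistics use only non-missing values as the denominator.
non_missing = n_observations - missing
distinct_pct = 100 * distinct / non_missing
mean_length = total_characters / non_missing

print(f"Non-missing values: {non_missing}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Mean length: {mean_length:.7f}")
```

Dividing by all 627 observations instead would give a distinct percentage of about 16.3%, which does not match the reported 17.5%.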

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 2210893
2nd row: 2210893
3rd row: 2210893
4th row: 2210893
5th row: 2210893

Value              Count  Frequency (%)
2522149            22     3.8%
2233777            17     2.9%
2531904            16     2.7%
2554952            13     2.2%
2579255            12     2.1%
2579064            11     1.9%
2565454            11     1.9%
2269867            10     1.7%
2533670            10     1.7%
2228525            10     1.7%
Other values (92)  452    77.4%

Most occurring characters

Value             Count  Frequency (%)
2                 1143   27.7%
5                 657    15.9%
7                 352    8.5%
4                 343    8.3%
3                 339    8.2%
6                 335    8.1%
9                 259    6.3%
0                 246    6.0%
1                 236    5.7%
8                 208    5.0%
Other values (7)  14     0.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     4118   99.7%
Other Letter       10     0.2%
Open Punctuation   2      < 0.1%
Close Punctuation  2      < 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 1143
27.8%
5 657
16.0%
7 352
 
8.5%
4 343
 
8.3%
3 339
 
8.2%
6 335
 
8.1%
9 259
 
6.3%
0 246
 
6.0%
1 236
 
5.7%
8 208
 
5.1%
Other Letter
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  4122   99.8%
Hangul  10     0.2%

Most frequent character per script

Common
ValueCountFrequency (%)
2 1143
27.7%
5 657
15.9%
7 352
 
8.5%
4 343
 
8.3%
3 339
 
8.2%
6 335
 
8.1%
9 259
 
6.3%
0 246
 
6.0%
1 236
 
5.7%
8 208
 
5.0%
Other values (2) 4
 
0.1%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   4122   99.8%
Hangul  10     0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 1143
27.7%
5 657
15.9%
7 352
 
8.5%
4 343
 
8.3%
3 339
 
8.2%
6 335
 
8.1%
9 259
 
6.3%
0 246
 
6.0%
1 236
 
5.7%
8 208
 
5.0%
Other values (2) 4
 
0.1%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%

등록일자
Categorical

Distinct: 13
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.0 KiB

Value             Count
2021-10-06        228
2019-09-05        216
2019-08-30        159
2022-08-02        6
2020-02-24        4
Other values (8)  14

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 2
Unique (%): 0.3%

Sample

1st row: 2021-10-06
2nd row: 2021-10-06
3rd row: 2021-10-06
4th row: 2019-08-30
5th row: 2019-09-05

Common Values

Value             Count  Frequency (%)
2021-10-06        228    36.4%
2019-09-05        216    34.4%
2019-08-30        159    25.4%
2022-08-02        6      1.0%
2020-02-24        4      0.6%
2022-01-03        2      0.3%
2022-01-20        2      0.3%
2023-09-15        2      0.3%
2023-05-22        2      0.3%
2022-11-01        2      0.3%
Other values (3)  4      0.6%

Length

Histogram of lengths of the category

Interactions


Correlations

          일렬번호  구분      상세구분  등록일자
일렬번호  1.000     0.428     0.867     0.237
구분      0.428     1.000     0.815     0.193
상세구분  0.867     0.815     1.000     0.492
등록일자  0.237     0.193     0.492     1.000

          상세구분  구분      등록일자
상세구분  1.000     0.518     0.184
구분      0.518     1.000     0.106
등록일자  0.184     0.106     1.000

          일렬번호  구분      상세구분  등록일자
일렬번호  1.000     0.191     0.501     0.099
구분      0.191     1.000     0.518     0.106
상세구분  0.501     0.518     1.000     0.184
등록일자  0.099     0.106     0.184     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     일렬번호  상가이름      구분  상세구분  전화번호   등록일자
0    2706      몽실          3     20        2210893    2021-10-06
1    2706      몽실          3     20        2210893    2021-10-06
2    2706      몽실          3     20        2210893    2021-10-06
3    2706      몽실          <NA>  <NA>      2210893    2019-08-30
4    2706      몽실          3     20        2210893    2019-09-05
5    2707      몽실          3     20        2210893    2021-10-06
6    2707      몽실          <NA>  <NA>      2210893    2019-08-30
7    2707      몽실          3     20        2210893    2019-09-05
8    2708      흙비          3     20        2225636    2021-10-06
9    2708      흙비          <NA>  <NA>      2225636    2019-08-30

     일렬번호  상가이름      구분  상세구분  전화번호   등록일자
617  2920      미성건강카페  5     35        2249115    2019-09-05
618  2921      종료          <NA>  <NA>      <NA>       2021-10-06
619  2921      현금출금기    <NA>  <NA>      221256057  2019-08-30
620  2921      현금출금기    4     35        221256057  2019-09-05
621  2922      명품가발      3     35        2266667    2021-10-06
622  2922      명품가발      <NA>  <NA>      2266667    2019-08-30
623  2922      명품가발      3     35        2266667    2019-09-05
624  2923      공예협동조합  4     35        8637686    2021-10-06
625  2923      공예협동조합  <NA>  <NA>      8637686    2019-08-30
626  2923      공예협동조합  4     35        8637686    2019-09-05

Duplicate rows

Most frequently occurring

   일렬번호  상가이름  구분  상세구분  전화번호  등록일자    # duplicates
0  2706      몽실      3     20        2210893   2021-10-06  3
1  2753      밤블비    3     20        2533670   2021-10-06  2
2  2855      단골언니  3     20        2542008   2021-10-06  2
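
Duplicate rows like these can be found by counting identical records. The sketch below is a minimal illustration using only the first five rows of the Sample section, so it recovers only the first duplicate group; it is not the profiler's own implementation, and the tuple layout is an illustrative choice.

```python
from collections import Counter

# First five rows of the Sample section, as
# (일렬번호, 상가이름, 구분, 상세구분, 전화번호, 등록일자) tuples.
rows = [
    (2706, "몽실", "3", "20", "2210893", "2021-10-06"),
    (2706, "몽실", "3", "20", "2210893", "2021-10-06"),
    (2706, "몽실", "3", "20", "2210893", "2021-10-06"),
    (2706, "몽실", "<NA>", "<NA>", "2210893", "2019-08-30"),
    (2706, "몽실", "3", "20", "2210893", "2019-09-05"),
]

# Count identical rows and keep only those that occur more than once.
duplicates = {row: n for row, n in Counter(rows).items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```

Rows that differ only in 등록일자 are not counted as duplicates, which is why the same shop can appear several times in the data without triggering the alert.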