Overview

Dataset statistics

Number of variables: 20
Number of observations: 2003
Missing cells: 2014
Missing cells (%): 5.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 313.1 KiB
Average record size in memory: 160.1 B
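
The derived figures above follow from the raw counts. A minimal sketch, assuming the percentage is missing cells over all cells (variables × observations) and the record size is total memory over observations; the function names are illustrative and not part of ydata-profiling:

```python
def missing_cell_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Share of empty cells among all cells, in percent."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average bytes per row, given the total size in KiB."""
    return total_kib * 1024 / n_obs


# Figures taken from the dataset statistics in this report.
print(f"{missing_cell_pct(2014, 20, 2003):.1f}%")     # reported as 5.0%
print(f"{avg_record_size_bytes(313.1, 2003):.1f} B")  # reported as 160.1 B
```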

Variable types

Unsupported: 12
Text: 2
Categorical: 6

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-21705/F/1/datasetView.do

Alerts

Unnamed: 5 is highly overall correlated with Unnamed: 3 and 4 other fields (High correlation)
Unnamed: 4 is highly overall correlated with Unnamed: 3 and 4 other fields (High correlation)
Unnamed: 3 is highly overall correlated with Unnamed: 4 and 4 other fields (High correlation)
Unnamed: 18 is highly overall correlated with Unnamed: 3 and 4 other fields (High correlation)
Unnamed: 19 is highly overall correlated with Unnamed: 3 and 4 other fields (High correlation)
Unnamed: 17 is highly overall correlated with Unnamed: 3 and 4 other fields (High correlation)
Unnamed: 3 is highly imbalanced (86.0%) (Imbalance)
Unnamed: 17 is highly imbalanced (99.1%) (Imbalance)
Unnamed: 18 is highly imbalanced (99.1%) (Imbalance)
Unnamed: 19 is highly imbalanced (57.8%) (Imbalance)
Unnamed: 16 has 2001 (99.9%) missing values (Missing)
유동인구_관찰조사_2012 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 2 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 11 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 12 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 13 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 14 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 15 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
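
The imbalance percentages in these alerts can be reproduced from the value counts listed later in the report. A minimal sketch, assuming the score is one minus the Shannon entropy of the class frequencies, normalised by the maximum entropy for that number of classes; the report itself does not state the formula, and `imbalance_score` is an illustrative name:

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - H / log2(k): 0 for a uniform column, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single-class column gets no score
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Value counts as listed under "Common Values" for each variable.
print(f"Unnamed: 17: {imbalance_score([2000, 1, 1, 1]):.1%}")       # alert says 99.1%
print(f"Unnamed: 19: {imbalance_score([1223, 777, 1, 1, 1]):.1%}")  # alert says 57.8%
```

In both cases the three single-occurrence classes are the stray values `<NA>`, the Korean label, and the English column code, which count as extra categories alongside the real data.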

Reproduction

Analysis started: 2023-12-11 09:16:24.780178
Analysis finished: 2023-12-11 09:16:26.531139
Duration: 1.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

유동인구_관찰조사_2012
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 1
Text

Distinct: 1002
Distinct (%): 50.0%
Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Length

Max length: 14
Median length: 6
Mean length: 6.3736264
Min length: 6

Characters and Unicode

Total characters: 12760
Distinct characters: 30
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
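
The category counts reported below can be approximated with Python's standard `unicodedata` module, which exposes the general category of each code point. A minimal sketch, using two values from this column's sample; `category_counts` is an illustrative helper, not the profiler's own code:

```python
import unicodedata
from collections import Counter


def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category (e.g. 'Nd', 'Lu', 'Lo')."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# Two sample values from this column: the English column code and a spot code.
counts = category_counts(["EXAMIN_SPOT_CD", "01-029"])
# Lu = Uppercase Letter, Pc = Connector Punctuation,
# Nd = Decimal Number, Pd = Dash Punctuation
print(dict(counts))
```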

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 조사지점코드
2nd row: EXAMIN_SPOT_CD
3rd row: 01-029
4th row: 01-029
5th row: 01-033

Value               Count  Frequency (%)
13-1131             2      0.1%
18-049              2      0.1%
17-3059             2      0.1%
19-019              2      0.1%
17-3060             2      0.1%
17-3073             2      0.1%
17-3078             2      0.1%
17-3085             2      0.1%
17-3101             2      0.1%
17-3107             2      0.1%
Other values (992)  1982   99.0%

Most occurring characters

Value              Count  Frequency (%)
0                  2292   18.0%
1                  2116   16.6%
-                  2000   15.7%
2                  1876   14.7%
3                  1004   7.9%
4                  742    5.8%
5                  648    5.1%
7                  540    4.2%
6                  522    4.1%
9                  504    3.9%
Other values (20)  516    4.0%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         10740  84.2%
Dash Punctuation       2000   15.7%
Uppercase Letter       12     0.1%
Other Letter           6      < 0.1%
Connector Punctuation  2      < 0.1%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
T                 1      8.3%
D                 1      8.3%
C                 1      8.3%
I                 1      8.3%
O                 1      8.3%
P                 1      8.3%
S                 1      8.3%
N                 1      8.3%
M                 1      8.3%
A                 1      8.3%
Other values (2)  2      16.7%

Decimal Number
Value  Count  Frequency (%)
0      2292   21.3%
1      2116   19.7%
2      1876   17.5%
3      1004   9.3%
4      742    6.9%
5      648    6.0%
7      540    5.0%
6      522    4.9%
9      504    4.7%
8      496    4.6%

Other Letter
Value  Count  Frequency (%)
조     1      16.7%
사     1      16.7%
지     1      16.7%
점     1      16.7%
코     1      16.7%
드     1      16.7%

Dash Punctuation
Value  Count  Frequency (%)
-      2000   100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12742  99.9%
Latin   12     0.1%
Hangul  6      < 0.1%

Most frequent character per script

Common
Value             Count  Frequency (%)
0                 2292   18.0%
1                 2116   16.6%
-                 2000   15.7%
2                 1876   14.7%
3                 1004   7.9%
4                 742    5.8%
5                 648    5.1%
7                 540    4.2%
6                 522    4.1%
9                 504    4.0%
Other values (2)  498    3.9%

Latin
Value             Count  Frequency (%)
T                 1      8.3%
D                 1      8.3%
C                 1      8.3%
I                 1      8.3%
O                 1      8.3%
P                 1      8.3%
S                 1      8.3%
N                 1      8.3%
M                 1      8.3%
A                 1      8.3%
Other values (2)  2      16.7%

Hangul
Value  Count  Frequency (%)
조     1      16.7%
사     1      16.7%
지     1      16.7%
점     1      16.7%
코     1      16.7%
드     1      16.7%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   12754  > 99.9%
Hangul  6      < 0.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
0                  2292   18.0%
1                  2116   16.6%
-                  2000   15.7%
2                  1876   14.7%
3                  1004   7.9%
4                  742    5.8%
5                  648    5.1%
7                  540    4.2%
6                  522    4.1%
9                  504    4.0%
Other values (14)  510    4.0%

Hangul
Value  Count  Frequency (%)
조     1      16.7%
사     1      16.7%
지     1      16.7%
점     1      16.7%
코     1      16.7%
드     1      16.7%

Unnamed: 2
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 3
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB
1890 
 
110
<NA>
 
1
조사요일
 
1
EXAMIN_DATE
 
1

Length

Max length: 11
Median length: 1
Mean length: 1.007988
Min length: 1

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row<NA>
2nd row조사요일
3rd rowEXAMIN_DATE
4th row
5th row

Common Values

ValueCountFrequency (%)
1890
94.4%
110
 
5.5%
<NA> 1
 
< 0.1%
조사요일 1
 
< 0.1%
EXAMIN_DATE 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1890
94.4%
110
 
5.5%
na 1
 
< 0.1%
조사요일 1
 
< 0.1%
examin_date 1
 
< 0.1%

Unnamed: 4
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Value             Count
1400              298
0730              297
0900              243
1530              243
1030              237
Other values (6)  685

Length

Max length: 15
Median length: 4
Mean length: 4.0064903
Min length: 4

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: 조사시작시간
3rd row: EXAMIN_START_TM
4th row: 0730
5th row: 1400

Common Values

Value         Count  Frequency (%)
1400          298    14.9%
0730          297    14.8%
0900          243    12.1%
1530          243    12.1%
1030          237    11.8%
1700          236    11.8%
1230          223    11.1%
1830          223    11.1%
<NA>          1      < 0.1%
조사시작시간  1      < 0.1%

Length

Histogram of lengths of the category

Value         Count  Frequency (%)
1400          298    14.9%
0730          297    14.8%
0900          243    12.1%
1530          243    12.1%
1030          237    11.8%
1700          236    11.8%
1230          223    11.1%
1830          223    11.1%
na            1      < 0.1%
조사시작시간  1      < 0.1%

Unnamed: 5
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Value             Count
1500              298
0830              297
1000              243
1630              243
1130              237
Other values (6)  685

Length

Max length: 13
Median length: 4
Mean length: 4.0054918
Min length: 4

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: 조사완료시간
3rd row: EXAMIN_END_TM
4th row: 0830
5th row: 1500

Common Values

Value         Count  Frequency (%)
1500          298    14.9%
0830          297    14.8%
1000          243    12.1%
1630          243    12.1%
1130          237    11.8%
1800          236    11.8%
1330          223    11.1%
1930          223    11.1%
<NA>          1      < 0.1%
조사완료시간  1      < 0.1%

Length

Histogram of lengths of the category

Value         Count  Frequency (%)
1500          298    14.9%
0830          297    14.8%
1000          243    12.1%
1630          243    12.1%
1130          237    11.8%
1800          236    11.8%
1330          223    11.1%
1930          223    11.1%
na            1      < 0.1%
조사완료시간  1      < 0.1%

Unnamed: 6
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 7
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 8
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 9
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 10
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 11
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 12
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 13
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 14
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 15
Unsupported

REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): < 0.1%
Memory size: 15.8 KiB

Unnamed: 16
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 2001
Missing (%): 99.9%
Memory size: 15.8 KiB

Length

Max length: 6
Median length: 4.5
Mean length: 4.5
Min length: 3

Characters and Unicode

Total characters: 9
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: 기타특이사항
2nd row: ETC

Value         Count  Frequency (%)
기타특이사항  1      50.0%
etc           1      50.0%

Most occurring characters

Value  Count  Frequency (%)
기     1      11.1%
타     1      11.1%
특     1      11.1%
이     1      11.1%
사     1      11.1%
항     1      11.1%
E      1      11.1%
T      1      11.1%
C      1      11.1%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      6      66.7%
Uppercase Letter  3      33.3%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
기     1      16.7%
타     1      16.7%
특     1      16.7%
이     1      16.7%
사     1      16.7%
항     1      16.7%

Uppercase Letter
Value  Count  Frequency (%)
E      1      33.3%
T      1      33.3%
C      1      33.3%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  6      66.7%
Latin   3      33.3%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
기     1      16.7%
타     1      16.7%
특     1      16.7%
이     1      16.7%
사     1      16.7%
항     1      16.7%

Latin
Value  Count  Frequency (%)
E      1      33.3%
T      1      33.3%
C      1      33.3%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  6      66.7%
ASCII   3      33.3%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
기     1      16.7%
타     1      16.7%
특     1      16.7%
이     1      16.7%
사     1      16.7%
항     1      16.7%

ASCII
Value  Count  Frequency (%)
E      1      33.3%
T      1      33.3%
C      1      33.3%

Unnamed: 17
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Value  Count
2009   2000
<NA>   1
년도   1
YEAR   1

Length

Max length: 4
Median length: 4
Mean length: 3.9990015
Min length: 2
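
The fractional mean length is explained by the three stray header values mixed into the 2000 data rows. A minimal sketch that recomputes it from the value counts listed under Common Values, assuming lengths are counted in characters and that the missing cell is profiled as the four-character string `<NA>`:

```python
# Value counts for Unnamed: 17 as listed under "Common Values".
value_counts = {"2009": 2000, "<NA>": 1, "년도": 1, "YEAR": 1}

total_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / total_rows

print(total_rows)             # 2003, the number of observations
print(round(mean_length, 7))  # the report lists 3.9990015
```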

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: 년도
3rd row: YEAR
4th row: 2009
5th row: 2009

Common Values

Value  Count  Frequency (%)
2009   2000   99.9%
<NA>   1      < 0.1%
년도   1      < 0.1%
YEAR   1      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2009   2000   99.9%
na     1      < 0.1%
년도   1      < 0.1%
year   1      < 0.1%

Unnamed: 18
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Value             Count
주중              2000
<NA>              1
주중주말구분      1
EXAMIN_WDAY_WEND  1

Length

Max length: 16
Median length: 2
Mean length: 2.009985
Min length: 2

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: 주중주말구분
3rd row: EXAMIN_WDAY_WEND
4th row: 주중
5th row: 주중

Common Values

Value             Count  Frequency (%)
주중              2000   99.9%
<NA>              1      < 0.1%
주중주말구분      1      < 0.1%
EXAMIN_WDAY_WEND  1      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value             Count  Frequency (%)
주중              2000   99.9%
na                1      < 0.1%
주중주말구분      1      < 0.1%
examin_wday_wend  1      < 0.1%

Unnamed: 19
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 KiB

Value         Count
오후          1223
오전          777
<NA>          1
오전오후구분  1
EXAMIN_AM_PM  1

Length

Max length: 12
Median length: 2
Mean length: 2.007988
Min length: 2

Unique

Unique: 3
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: 오전오후구분
3rd row: EXAMIN_AM_PM
4th row: 오전
5th row: 오후

Common Values

Value         Count  Frequency (%)
오후          1223   61.1%
오전          777    38.8%
<NA>          1      < 0.1%
오전오후구분  1      < 0.1%
EXAMIN_AM_PM  1      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count  Frequency (%)
오후          1223   61.1%
오전          777    38.8%
na            1      < 0.1%
오전오후구분  1      < 0.1%
examin_am_pm  1      < 0.1%
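
The perfect association between this variable and the survey start time (Unnamed: 4) is consistent with the AM/PM flag being derived from the start time. A minimal sketch that checks this against the counts in the report, assuming slots starting before 12:00 are labelled 오전 (AM) and the rest 오후 (PM); that rule is inferred from the totals, not stated by the source:

```python
# Start-time counts from the Common Values table of Unnamed: 4 (HHMM strings).
start_time_counts = {
    "0730": 297, "0900": 243, "1030": 237, "1230": 223,
    "1400": 298, "1530": 243, "1700": 236, "1830": 223,
}


def am_pm(hhmm: str) -> str:
    """Label a zero-padded HHMM start time as 오전 (AM) or 오후 (PM)."""
    return "오전" if int(hhmm[:2]) < 12 else "오후"


totals: dict[str, int] = {}
for slot, count in start_time_counts.items():
    label = am_pm(slot)
    totals[label] = totals.get(label, 0) + count

print(totals)  # the report lists 오전 777 and 오후 1223
```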

Correlations

             Unnamed: 3  Unnamed: 4  Unnamed: 5  Unnamed: 16  Unnamed: 17  Unnamed: 18  Unnamed: 19
Unnamed: 3   1.000       0.921       0.921       0.000        1.000        1.000        0.982
Unnamed: 4   0.921       1.000       1.000       0.000        1.000        1.000        1.000
Unnamed: 5   0.921       1.000       1.000       0.000        1.000        1.000        1.000
Unnamed: 16  0.000       0.000       0.000       1.000        0.000        0.000        0.000
Unnamed: 17  1.000       1.000       1.000       0.000        1.000        1.000        1.000
Unnamed: 18  1.000       1.000       1.000       0.000        1.000        1.000        1.000
Unnamed: 19  0.982       1.000       1.000       0.000        1.000        1.000        1.000

             Unnamed: 5  Unnamed: 4  Unnamed: 3  Unnamed: 18  Unnamed: 19  Unnamed: 17
Unnamed: 5   1.000       1.000       0.820       0.998        0.998        0.998
Unnamed: 4   1.000       1.000       0.820       0.998        0.998        0.998
Unnamed: 3   0.820       0.820       1.000       1.000        0.816        1.000
Unnamed: 18  0.998       0.998       1.000       1.000        1.000        1.000
Unnamed: 19  0.998       0.998       0.816       1.000        1.000        1.000
Unnamed: 17  0.998       0.998       1.000       1.000        1.000        1.000

             Unnamed: 3  Unnamed: 4  Unnamed: 5  Unnamed: 17  Unnamed: 18  Unnamed: 19
Unnamed: 3   1.000       0.820       0.820       1.000        1.000        0.816
Unnamed: 4   0.820       1.000       1.000       0.998        0.998        0.998
Unnamed: 5   0.820       1.000       1.000       0.998        0.998        0.998
Unnamed: 17  1.000       0.998       0.998       1.000        1.000        1.000
Unnamed: 18  1.000       0.998       0.998       1.000        1.000        1.000
Unnamed: 19  0.816       0.998       0.998       1.000        1.000        1.000
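
Several of the 1.000 entries involve columns that are constant across the 2000 data rows, such as Unnamed: 17 and Unnamed: 18, and differ only in the three stray header rows. A minimal sketch of why such a pair scores a perfect association, assuming an uncorrected Cramér's V on the contingency table; the report does not state which measure it used, so the function below is illustrative:

```python
import math
from collections import Counter


def cramers_v(xs: list[str], ys: list[str]) -> float:
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals))
    return math.sqrt(chi2 / (n * (k - 1))) if k > 1 else 0.0


# Columns rebuilt from the value counts: three stray header rows, then constant data.
year = ["<NA>", "년도", "YEAR"] + ["2009"] * 2000
week = ["<NA>", "주중주말구분", "EXAMIN_WDAY_WEND"] + ["주중"] * 2000
print(round(cramers_v(year, week), 3))  # 1.0
```

Because every value of one column maps to exactly one value of the other, the score is driven entirely by the header rows rather than by any relationship in the survey data.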

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

유동인구_관찰조사_2012Unnamed: 1Unnamed: 2Unnamed: 3Unnamed: 4Unnamed: 5Unnamed: 6Unnamed: 7Unnamed: 8Unnamed: 9Unnamed: 10Unnamed: 11Unnamed: 12Unnamed: 13Unnamed: 14Unnamed: 15Unnamed: 16Unnamed: 17Unnamed: 18Unnamed: 19
0NaN<NA>NaN<NA><NA><NA>NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN<NA><NA><NA><NA>
1ID관찰조사조사지점코드조사일자조사요일조사시작시간조사완료시간남자유동인구수여성유동인구수20세미만유동인구수20대30대유동인구수40대50대유동인구수60대이상유동인구수정장착용유동인구수캐주얼착용유동인구수물건소지유동인구수빈손통행유동인구수기타특이사항년도주중주말구분오전오후구분
2ID_OBSERV_EXAMINEXAMIN_SPOT_CDEXAMIN_DAYEXAMIN_DATEEXAMIN_START_TMEXAMIN_END_TMMALEFEMALETWYO_BELOTWNT_THRTSFRTS_FFTSSXTS_ABOVESUIT_WEARCSL_WEARTHING_POSSESEMTHD_PASNGETCYEAREXAMIN_WDAY_WENDEXAMIN_AM_PM
3224401-029101607300830148757520171021<NA>2009주중오전
4224401-02910161400150086145645301147852972<NA>2009주중오후
5127601-03310090900100021601543125122<NA>2009주중오전
6127601-03310091530163022180958617315<NA>2009주중오후
7127501-0351009073008309464219118138546115<NA>2009주중오전
8127501-035100914001500717901121363712616110<NA>2009주중오후
9100801-0361009073008304818099635392469<NA>2009주중오전
유동인구_관찰조사_2012Unnamed: 1Unnamed: 2Unnamed: 3Unnamed: 4Unnamed: 5Unnamed: 6Unnamed: 7Unnamed: 8Unnamed: 9Unnamed: 10Unnamed: 11Unnamed: 12Unnamed: 13Unnamed: 14Unnamed: 15Unnamed: 16Unnamed: 17Unnamed: 18Unnamed: 19
1993212625-4801016103011308150579318538<NA>2009주중오전
1994212625-4801016170018001824781312430422<NA>2009주중오후
1995127425-81410090900100012190171734441444<NA>2009주중오전
1996127425-814100915301630131307713223621<NA>2009주중오후
1997212325-818101612301330911113132024614<NA>2009주중오후
1998212325-81810161830193018194191162281126<NA>2009주중오후
1999211925-82010161030113065016141508<NA>2009주중오전
2000211925-8201016170018006838901011113<NA>2009주중오후
2001212125-82210161030113076010102519<NA>2009주중오전
2002212125-8221016170018007136823521510<NA>2009주중오후
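
The first three sample rows suggest that the source file carries a title in its first line, then an empty row, a row of Korean labels, and a row of English column codes, which would explain why most variables are profiled as `Unnamed: N` and polluted by stray header values. A minimal sketch of promoting the English codes to the header before profiling, using the standard `csv` module; the file layout, the inline sample text, and its data values are assumptions for illustration only:

```python
import csv
import io

# Hypothetical miniature of the layout: title line, empty row,
# Korean labels, English codes, then data. Values are illustrative.
RAW = """유동인구_관찰조사_2012,,,
,,,
ID관찰조사,조사지점코드,조사시작시간,조사완료시간
ID_OBSERV_EXAMIN,EXAMIN_SPOT_CD,EXAMIN_START_TM,EXAMIN_END_TM
2244,01-029,0730,0830
2244,01-029,1400,1500
"""


def load_with_english_header(text: str, header_row: int = 3) -> list[dict[str, str]]:
    """Use row `header_row` (0-based) as column names and keep only the rows after it."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[header_row]
    return [dict(zip(header, row)) for row in rows[header_row + 1:] if any(row)]


records = load_with_english_header(RAW)
print(records[0]["EXAMIN_SPOT_CD"], records[0]["EXAMIN_START_TM"])
```

With the header promoted, the numeric columns no longer contain label strings, so a profiler has a chance to infer numeric types instead of flagging them as unsupported.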