Overview

Dataset statistics

Number of variables: 6
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 24.5 KiB
Average record size in memory: 50.3 B

Variable types

Numeric: 2
Categorical: 3
Text: 1

Dataset

Description: Sample data
Author: Daumsoft (다음소프트)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=57

Alerts

세부키워드(KEYWORD_DETAIL) has constant value "디저트" (dessert) [Constant]
수집소스(SOURCE) is highly imbalanced (94.7%) [Imbalance]
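
The 94.7% figure is consistent with an imbalance score defined as one minus the normalized Shannon entropy of the class frequencies. The sketch below assumes that definition (it is not stated in this report) and applies it to the counts reported for 수집소스(SOURCE), 497 and 3. The function name `imbalance_score` is illustrative only.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class counts.

    Assumed definition: 0.0 means perfectly balanced classes,
    values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # with a single class there is nothing to compare against
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_classes)


source_counts = [497, 3]  # 커뮤니티블로그, 트위터 (counts taken from the report)
score = imbalance_score(source_counts)
print(f"{score:.1%}")  # 94.7%
```

Under this definition a 250/250 split would score 0%, so a value close to 100% signals that almost every row comes from a single source.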

Reproduction

Analysis started: 2023-12-10 14:54:12.423766
Analysis finished: 2023-12-10 14:54:13.537381
Duration: 1.11 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

DOC_DATE(DATE)
Real number (ℝ)

Distinct: 392
Distinct (%): 78.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20181310
Minimum: 20170101
Maximum: 20191228
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 20170101
5th percentile: 20170311
Q1: 20171023
Median: 20180818
Q3: 20190406
95th percentile: 20191111
Maximum: 20191228
Range: 21127
Interquartile range (IQR): 19382.5

Descriptive statistics

Standard deviation: 8149.1223
Coefficient of variation (CV): 0.00040379551
Kurtosis: -1.4805278
Mean: 20181310
Median Absolute Deviation (MAD): 9613.5
Skewness: -0.12124703
Sum: 1.0090655 × 10^10
Variance: 66408194
Monotonicity: Not monotonic
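
DOC_DATE(DATE) appears to store calendar dates as YYYYMMDD integers: the minimum 20170101 and maximum 20191228 read as 2017-01-01 and 2019-12-28. The statistics above are therefore computed on the integer encoding, not on dates. The following is a minimal sketch, assuming that encoding, of converting values before doing date arithmetic; `parse_doc_date` is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_doc_date(value: int) -> date:
    """Convert a YYYYMMDD integer such as 20170101 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


first = parse_doc_date(20170101)  # reported minimum
last = parse_doc_date(20191228)   # reported maximum
span_days = (last - first).days   # calendar span covered by the data

print(first, last, span_days)
```

The reported range of 21127 is a difference of encoded integers, whereas `span_days` is the actual number of days between the two dates. The reported mean, 20181310, is not a valid date under this encoding (month 13), so passing it to `parse_doc_date` raises `ValueError`.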
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
20190118            4      0.8%
20191010            3      0.6%
20190114            3      0.6%
20170605            3      0.6%
20180817            3      0.6%
20191214            3      0.6%
20180322            3      0.6%
20181128            3      0.6%
20190205            3      0.6%
20180201            3      0.6%
Other values (382)  469    93.8%

Minimum 10 values

Value     Count  Frequency (%)
20170101  1      0.2%
20170102  1      0.2%
20170111  1      0.2%
20170114  1      0.2%
20170118  1      0.2%
20170120  1      0.2%
20170126  1      0.2%
20170201  1      0.2%
20170205  1      0.2%
20170206  1      0.2%

Maximum 10 values

Value     Count  Frequency (%)
20191228  1      0.2%
20191227  1      0.2%
20191226  1      0.2%
20191222  2      0.4%
20191219  1      0.2%
20191216  2      0.4%
20191215  1      0.2%
20191214  3      0.6%
20191210  1      0.2%
20191207  1      0.2%

수집소스(SOURCE)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
커뮤니티블로그 (community/blog): 497
트위터 (Twitter): 3

Length

Max length: 7
Median length: 7
Mean length: 6.976
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 커뮤니티블로그
2nd row: 커뮤니티블로그
3rd row: 커뮤니티블로그
4th row: 커뮤니티블로그
5th row: 커뮤니티블로그

Common Values

Value           Count  Frequency (%)
커뮤니티블로그  497    99.4%
트위터          3      0.6%

Length

Histogram of lengths of the category


행정동(DONG_NM)
Text

Distinct: 293
Distinct (%): 58.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 10
Median length: 9
Mean length: 3.382
Min length: 2

Characters and Unicode

Total characters: 1691
Distinct characters: 209
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
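
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories over five values of this variable taken from the report's sample rows, plus one made-up mixed string (`"gv3"`, an assumption added only to show non-Hangul categories).

```python
import unicodedata
from collections import Counter

# Five sample values from the report.
values = ["대림", "압구정로데오역", "중구", "보광동", "후암동"]
values.append("gv3")  # hypothetical value, not from the dataset

# Normalize to NFC so each Hangul syllable is a single code point.
values = [unicodedata.normalize("NFC", v) for v in values]

# 'Lo' = Other Letter, 'Ll' = Lowercase Letter, 'Nd' = Decimal Number
category_counts = Counter(
    unicodedata.category(ch) for v in values for ch in v
)
total_chars = sum(len(v) for v in values)

print(category_counts, total_chars)
```

The three category codes seen here correspond to the three categories the report lists for this variable: Other Letter, Lowercase Letter, and Decimal Number.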

Unique

Unique: 165
Unique (%): 33.0%

Sample

1st row: 대림
2nd row: 압구정로데오역
3rd row: 중구
4th row: 보광동
5th row: 후암동

Value               Count  Frequency (%)
서울대입구          6      1.2%
혜화역              5      1.0%
송파구              5      1.0%
광화문              5      1.0%
신사역              5      1.0%
망원                5      1.0%
서울역              5      1.0%
뚝섬                5      1.0%
중구                5      1.0%
한강진              5      1.0%
Other values (283)  449    89.8%

Most occurring characters

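Value                Count  Frequency (%)
(character missing)  117    6.9%
(character missing)  111    6.6%
(character missing)  62     3.7%
(character missing)  53     3.1%
(character missing)  37     2.2%
(character missing)  31     1.8%
(character missing)  31     1.8%
(character missing)  30     1.8%
(character missing)  30     1.8%
(character missing)  26     1.5%
Other values (199)   1163   68.8%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      1657   98.0%
Lowercase Letter  26     1.5%
Decimal Number    8      0.5%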

Most frequent character per category

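Other Letter
Value                Count  Frequency (%)
(character missing)  117    7.1%
(character missing)  111    6.7%
(character missing)  62     3.7%
(character missing)  53     3.2%
(character missing)  37     2.2%
(character missing)  31     1.9%
(character missing)  31     1.9%
(character missing)  30     1.8%
(character missing)  30     1.8%
(character missing)  26     1.6%
Other values (191)   1129   68.1%

Lowercase Letter
Value  Count  Frequency (%)
c      7      26.9%
v      7      26.9%
g      7      26.9%
n      3      11.5%
d      2      7.7%

Decimal Number
Value  Count  Frequency (%)
3      5      62.5%
2      2      25.0%
6      1      12.5%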

Most occurring scripts

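Value   Count  Frequency (%)
Hangul  1657   98.0%
Latin   26     1.5%
Common  8      0.5%

Most frequent character per script

Hangul
Value                Count  Frequency (%)
(character missing)  117    7.1%
(character missing)  111    6.7%
(character missing)  62     3.7%
(character missing)  53     3.2%
(character missing)  37     2.2%
(character missing)  31     1.9%
(character missing)  31     1.9%
(character missing)  30     1.8%
(character missing)  30     1.8%
(character missing)  26     1.6%
Other values (191)   1129   68.1%

Latin
Value  Count  Frequency (%)
c      7      26.9%
v      7      26.9%
g      7      26.9%
n      3      11.5%
d      2      7.7%

Common
Value  Count  Frequency (%)
3      5      62.5%
2      2      25.0%
6      1      12.5%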

Most occurring blocks

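Value   Count  Frequency (%)
Hangul  1657   98.0%
ASCII   34     2.0%

Most frequent character per block

Hangul
Value                Count  Frequency (%)
(character missing)  117    7.1%
(character missing)  111    6.7%
(character missing)  62     3.7%
(character missing)  53     3.2%
(character missing)  37     2.2%
(character missing)  31     1.9%
(character missing)  31     1.9%
(character missing)  30     1.8%
(character missing)  30     1.8%
(character missing)  26     1.6%
Other values (191)   1129   68.1%

ASCII
Value  Count  Frequency (%)
c      7      20.6%
v      7      20.6%
g      7      20.6%
3      5      14.7%
n      3      8.8%
2      2      5.9%
d      2      5.9%
6      1      2.9%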

행정구(GU_NM)
Categorical

Distinct: 26
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
강남구: 64
종로구: 60
마포구: 42
용산구: 41
송파구: 35
Other values (21): 258

Length

Max length: 4
Median length: 3
Mean length: 3.042
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 양천구
2nd row: 성북구
3rd row: 송파구
4th row: 강남구
5th row: 서대문구

Common Values

Value              Count  Frequency (%)
강남구             64     12.8%
종로구             60     12.0%
마포구             42     8.4%
용산구             41     8.2%
송파구             35     7.0%
영등포구           26     5.2%
서초구             26     5.2%
중구               23     4.6%
강서구             17     3.4%
성동구             16     3.2%
Other values (16)  150    30.0%

Length

Histogram of lengths of the category

세부키워드(KEYWORD_DETAIL)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
디저트 (dessert): 500

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 디저트
2nd row: 디저트
3rd row: 디저트
4th row: 디저트
5th row: 디저트

Common Values

Value   Count  Frequency (%)
디저트  500    100.0%

Length

Histogram of lengths of the category


FREQ(FREQ)
Real number (ℝ)

Distinct: 23
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.13
Minimum: 1
Maximum: 72
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 3
95th percentile: 9.05
Maximum: 72
Range: 71
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 5.2863021
Coefficient of variation (CV): 1.6889144
Kurtosis: 71.549791
Mean: 3.13
Median Absolute Deviation (MAD): 0
Skewness: 7.1559491
Sum: 1565
Variance: 27.94499
Monotonicity: Not monotonic
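
Several of the figures above are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A quick consistency check using only values reported for FREQ(FREQ) (variable names are illustrative):

```python
n_obs = 500          # number of observations in the dataset
total = 1565         # reported sum of FREQ
std_dev = 5.2863021  # reported standard deviation

mean = total / n_obs      # compare with the reported mean, 3.13
cv = std_dev / mean       # compare with the reported CV, 1.6889144
variance = std_dev ** 2   # compare with the reported variance, 27.94499

print(round(mean, 2), round(cv, 4), round(variance, 3))
```

The large CV, together with a median of 1 and a maximum of 72, reflects a heavily right-skewed distribution in which a few rows carry most of the total frequency.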
Histogram with fixed size bins (bins=23)
Common values

Value              Count  Frequency (%)
1                  251    50.2%
2                  82     16.4%
3                  53     10.6%
4                  26     5.2%
5                  24     4.8%
6                  14     2.8%
8                  13     2.6%
7                  9      1.8%
10                 5      1.0%
11                 4      0.8%
Other values (13)  19     3.8%

Minimum 10 values

Value  Count  Frequency (%)
1      251    50.2%
2      82     16.4%
3      53     10.6%
4      26     5.2%
5      24     4.8%
6      14     2.8%
7      9      1.8%
8      13     2.6%
9      3      0.6%
10     5      1.0%

Maximum 10 values

Value  Count  Frequency (%)
72     1      0.2%
40     1      0.2%
38     1      0.2%
37     1      0.2%
33     1      0.2%
29     1      0.2%
22     1      0.2%
19     1      0.2%
16     1      0.2%
15     1      0.2%
0.2%

Interactions

[Four interaction plots; images not reproduced]

Correlations

Variable          DOC_DATE(DATE)  수집소스(SOURCE)  행정구(GU_NM)  FREQ(FREQ)
DOC_DATE(DATE)    1.000           0.000             0.000          0.000
수집소스(SOURCE)  0.000           1.000             0.000          0.000
행정구(GU_NM)     0.000           0.000             1.000          0.234
FREQ(FREQ)        0.000           0.000             0.234          1.000

Variable          수집소스(SOURCE)  행정구(GU_NM)
수집소스(SOURCE)  1.000             0.000
행정구(GU_NM)     0.000             1.000

Variable          DOC_DATE(DATE)  FREQ(FREQ)  수집소스(SOURCE)  행정구(GU_NM)
DOC_DATE(DATE)    1.000           -0.054      0.000             0.000
FREQ(FREQ)        -0.054          1.000       0.000             0.099
수집소스(SOURCE)  0.000           0.000       1.000             0.000
행정구(GU_NM)     0.000           0.099       0.000             1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  DOC_DATE(DATE)  수집소스(SOURCE)  행정동(DONG_NM)  행정구(GU_NM)  세부키워드(KEYWORD_DETAIL)  FREQ(FREQ)
0      20191120        커뮤니티블로그    대림             양천구         디저트                      10
1      20190118        커뮤니티블로그    압구정로데오역   성북구         디저트                      2
2      20190501        커뮤니티블로그    중구             송파구         디저트                      1
3      20190216        커뮤니티블로그    보광동           강남구         디저트                      1
4      20180825        커뮤니티블로그    후암동           서대문구       디저트                      1
5      20191130        트위터            마포구           용산구         디저트                      3
6      20180405        커뮤니티블로그    뚝섬역           종로구         디저트                      2
7      20170912        커뮤니티블로그    청계천           용산구         디저트                      1
8      20171024        커뮤니티블로그    상도동           송파구         디저트                      1
9      20170829        커뮤니티블로그    광화문           중구           디저트                      1

Last rows

Index  DOC_DATE(DATE)  수집소스(SOURCE)  행정동(DONG_NM)  행정구(GU_NM)  세부키워드(KEYWORD_DETAIL)  FREQ(FREQ)
490    20180323        커뮤니티블로그    서울             양천구         디저트                      1
491    20190215        커뮤니티블로그    보문             서대문구       디저트                      72
492    20180716        커뮤니티블로그    신사역           중랑구         디저트                      4
493    20180906        커뮤니티블로그    홍대입구         서대문구       디저트                      1
494    20191010        커뮤니티블로그    회현동           강동구         디저트                      1
495    20190807        커뮤니티블로그    뚝섬             용산구         디저트                      1
496    20180619        커뮤니티블로그    공덕             강남구         디저트                      3
497    20170412        커뮤니티블로그    잠실역           구로구         디저트                      2
498    20190810        커뮤니티블로그    서교동           종로구         디저트                      4
499    20190520        커뮤니티블로그    방이동           금천구         디저트                      7