Overview

Dataset statistics

Number of variables: 7
Number of observations: 30
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.8 KiB
Average record size in memory: 62.4 B

Variable types

Numeric: 1
Categorical: 4
Text: 2

Dataset

Description: Sample data
Author: 다음소프트 (Daumsoft)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=57

Alerts

수집소스(SOURCE) has a constant value, "블로그" (Constant)
행정구(GU_NM) is highly overall correlated with FREQ(FREQ) (High correlation)
FREQ(FREQ) is highly overall correlated with 행정구(GU_NM) (High correlation)
FREQ(FREQ) is highly imbalanced (78.9%) (Imbalance)
DOC_DATE(DATE) has unique values (Unique)
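
The 78.9% imbalance figure for FREQ(FREQ) is consistent with one minus the normalized Shannon entropy of the class shares (29 rows with value 1, 1 row with value 2). The report does not state its formula, so treating this as the definition is an assumption; the function name `imbalance_score` is hypothetical. A minimal sketch:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class shares and k is the number of classes.

    Assumed definition, not confirmed by the report itself.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# FREQ(FREQ): value 1 appears 29 times, value 2 appears once.
score = imbalance_score([29, 1])
print(f"{score:.1%}")  # prints 78.9%
```

A perfectly balanced two-class column would score 0.0 under this definition, and the score approaches 1.0 as one class dominates.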

Reproduction

Analysis started: 2023-12-10 14:53:45.673543
Analysis finished: 2023-12-10 14:53:46.746401
Duration: 1.07 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

DOC_DATE(DATE)
Real number (ℝ)

UNIQUE 

Distinct: 30
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20180633
Minimum: 20170426
Maximum: 20191201
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 402.0 B

Quantile statistics

Minimum: 20170426
5-th percentile: 20170510
Q1: 20170843
Median: 20180564
Q3: 20190319
95-th percentile: 20190988
Maximum: 20191201
Range: 20775
Interquartile range (IQR): 19476.5

Descriptive statistics

Standard deviation: 9361.1736
Coefficient of variation (CV): 0.00046386918
Kurtosis: -1.9605081
Mean: 20180633
Median Absolute Deviation (MAD): 9751
Skewness: 0.0062111161
Sum: 6.0541898 × 10^8
Variance: 87631571
Monotonicity: Not monotonic
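
Several of the figures above are tied together by simple identities (range = max - min, variance = std², CV = std / mean, mean = sum / n). The sketch below re-checks them using only the rounded values printed in this report, so the comparisons use a small relative tolerance; the variable names are chosen for illustration.

```python
import math

# Figures copied from the DOC_DATE(DATE) statistics in this report.
n = 30
mean = 20180633
std = 9361.1736
variance = 87631571
minimum, maximum = 20170426, 20191201
total = 6.0541898e8
reported_cv = 0.00046386918

# Each entry maps an identity to whether the reported numbers satisfy it.
checks = {
    "range = max - min": maximum - minimum == 20775,
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "CV = std / mean": math.isclose(std / mean, reported_cv, rel_tol=1e-6),
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Because the column holds dates encoded as YYYYMMDD integers, these statistics describe the encoded numbers rather than calendar time.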
[Figure: Histogram with fixed size bins (bins=30)]

Common values

Value              Count  Frequency (%)
20170726           1      3.3%
20170426           1      3.3%
20190102           1      3.3%
20171012           1      3.3%
20170530           1      3.3%
20190302           1      3.3%
20170522           1      3.3%
20171119           1      3.3%
20170624           1      3.3%
20190707           1      3.3%
Other values (20)  20     66.7%

Smallest 10 values

Value     Count  Frequency (%)
20170426  1      3.3%
20170501  1      3.3%
20170522  1      3.3%
20170530  1      3.3%
20170624  1      3.3%
20170719  1      3.3%
20170726  1      3.3%
20170823  1      3.3%
20170902  1      3.3%
20171012  1      3.3%

Largest 10 values

Value     Count  Frequency (%)
20191201  1      3.3%
20191125  1      3.3%
20190820  1      3.3%
20190728  1      3.3%
20190707  1      3.3%
20190511  1      3.3%
20190423  1      3.3%
20190323  1      3.3%
20190308  1      3.3%
20190302  1      3.3%

수집소스(SOURCE)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 블로그
2nd row: 블로그
3rd row: 블로그
4th row: 블로그
5th row: 블로그

Common Values

Value   Count  Frequency (%)
블로그  30     100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

행정동(DONG_NM)
Text

Distinct: 21
Distinct (%): 70.0%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 9
Median length: 6
Mean length: 3.3
Min length: 2

Characters and Unicode

Total characters: 99
Distinct characters: 57
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 50.0%

Sample

1st row: 상봉
2nd row: 용산
3rd row: 청계천
4th row: 서울
5th row: 서울

Value              Count  Frequency (%)
서울               4      13.3%
용산               3      10.0%
상봉               2      6.7%
강남               2      6.7%
연남동             2      6.7%
코엑스             2      6.7%
용산cgv            1      3.3%
망원동             1      3.3%
왕십리cgv          1      3.3%
대학로             1      3.3%
Other values (11)  11     36.7%

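Most occurring characters

Value              Count  Frequency (%)
(not preserved)    6      6.1%
(not preserved)    5      5.1%
(not preserved)    4      4.0%
(not preserved)    4      4.0%
(not preserved)    4      4.0%
(not preserved)    4      4.0%
(not preserved)    3      3.0%
v                  3      3.0%
g                  3      3.0%
c                  3      3.0%
Other values (47)  60     60.6%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      90     90.9%
Lowercase Letter  9      9.1%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
(not preserved)    6      6.7%
(not preserved)    5      5.6%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    3      3.3%
(not preserved)    3      3.3%
(not preserved)    2      2.2%
(not preserved)    2      2.2%
Other values (44)  53     58.9%

Lowercase Letter
Value  Count  Frequency (%)
v      3      33.3%
g      3      33.3%
c      3      33.3%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  90     90.9%
Latin   9      9.1%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
(not preserved)    6      6.7%
(not preserved)    5      5.6%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    3      3.3%
(not preserved)    3      3.3%
(not preserved)    2      2.2%
(not preserved)    2      2.2%
Other values (44)  53     58.9%

Latin
Value  Count  Frequency (%)
v      3      33.3%
g      3      33.3%
c      3      33.3%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  90     90.9%
ASCII   9      9.1%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
(not preserved)    6      6.7%
(not preserved)    5      5.6%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    4      4.4%
(not preserved)    3      3.3%
(not preserved)    3      3.3%
(not preserved)    2      2.2%
(not preserved)    2      2.2%
Other values (44)  53     58.9%

ASCII
Value  Count  Frequency (%)
v      3      33.3%
g      3      33.3%
c      3      33.3%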

행정구(GU_NM)
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 46.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 4
Median length: 3
Mean length: 2.9333333
Min length: 2

Unique

Unique: 4
Unique (%): 13.3%

Sample

1st row: 동대문구
2nd row: 영등포구
3rd row: 용산구
4th row: 용산구
5th row: 중구

Common Values

Value             Count  Frequency (%)
용산구            5      16.7%
서울              4      13.3%
강남구            3      10.0%
영등포구          2      6.7%
중구              2      6.7%
마포구            2      6.7%
종로구            2      6.7%
광진구            2      6.7%
성동구            2      6.7%
구로구            2      6.7%
Other values (4)  4      13.3%

Length

[Figure: Histogram of lengths of the category]
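
The report distinguishes "Distinct" (how many different values occur) from "Unique" (how many values occur exactly once). The sketch below recomputes both for 행정구(GU_NM) from the Common Values counts. The four values grouped under "Other values (4)" are not all named in the table, so they appear here as placeholder labels (`other_1` to `other_4`), which are assumptions made purely for illustration.

```python
from collections import Counter

# Counts copied from the Common Values table for 행정구(GU_NM).
counts = Counter({
    "용산구": 5, "서울": 4, "강남구": 3,
    "영등포구": 2, "중구": 2, "마포구": 2, "종로구": 2,
    "광진구": 2, "성동구": 2, "구로구": 2,
    # Placeholder labels for the four values under "Other values (4)".
    "other_1": 1, "other_2": 1, "other_3": 1, "other_4": 1,
})

n_rows = sum(counts.values())                        # total observations
distinct = len(counts)                               # different values seen
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

print(n_rows, distinct, unique)
print(f"{distinct / n_rows:.1%} {unique / n_rows:.1%}")
```

The results match the reported figures: 14 distinct values (46.7%) and 4 unique values (13.3%) over 30 rows.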

세부견인요소(KEYWORD_DETAIL)
Text

Distinct: 17
Distinct (%): 56.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 7
Median length: 2
Mean length: 2.5333333
Min length: 2

Characters and Unicode

Total characters: 76
Distinct characters: 39
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 30.0%

Sample

1st row: 팝콘
2nd row: 콤보
3rd row: 무대인사
4th row: 관크
5th row: 화면

Value             Count  Frequency (%)
팝콘              4      13.3%
콤보              3      10.0%
시설              3      10.0%
행사              3      10.0%
무대인사          2      6.7%
화면              2      6.7%
매점              2      6.7%
시사회            2      6.7%
인디              1      3.3%
포스터            1      3.3%
Other values (7)  7      23.3%

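Most occurring characters

Value              Count  Frequency (%)
(not preserved)    7      9.2%
(not preserved)    6      7.9%
(not preserved)    4      5.3%
(not preserved)    4      5.3%
(not preserved)    3      3.9%
(not preserved)    3      3.9%
(not preserved)    3      3.9%
(not preserved)    3      3.9%
(not preserved)    3      3.9%
(not preserved)    3      3.9%
Other values (29)  37     48.7%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      74     97.4%
Decimal Number    1      1.3%
Lowercase Letter  1      1.3%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
(not preserved)    7      9.5%
(not preserved)    6      8.1%
(not preserved)    4      5.4%
(not preserved)    4      5.4%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
Other values (27)  35     47.3%

Decimal Number
Value  Count  Frequency (%)
3      1      100.0%

Lowercase Letter
Value  Count  Frequency (%)
d      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  74     97.4%
Common  1      1.3%
Latin   1      1.3%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
(not preserved)    7      9.5%
(not preserved)    6      8.1%
(not preserved)    4      5.4%
(not preserved)    4      5.4%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
Other values (27)  35     47.3%

Common
Value  Count  Frequency (%)
3      1      100.0%

Latin
Value  Count  Frequency (%)
d      1      100.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  74     97.4%
ASCII   2      2.6%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
(not preserved)    7      9.5%
(not preserved)    6      8.1%
(not preserved)    4      5.4%
(not preserved)    4      5.4%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
(not preserved)    3      4.1%
Other values (27)  35     47.3%

ASCII
Value  Count  Frequency (%)
3      1      50.0%
d      1      50.0%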

견인요소(KEYWORD)
Categorical

Distinct: 3
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.6666667
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 취향
2nd row: 먹거리
3rd row: 먹거리
4th row: 관람환경
5th row: 먹거리

Common Values

Value     Count  Frequency (%)
취향      16     53.3%
먹거리    8      26.7%
관람환경  6      20.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

FREQ(FREQ)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 3.3%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      29     96.7%
2      1      3.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Interactions

[Figure: interaction plot]

Correlations

[Figure: correlation heatmap]

                              DOC_DATE(DATE)  행정동(DONG_NM)  행정구(GU_NM)  세부견인요소(KEYWORD_DETAIL)  견인요소(KEYWORD)  FREQ(FREQ)
DOC_DATE(DATE)                1.000           0.973            0.000          0.000                         0.000              0.000
행정동(DONG_NM)               0.973           1.000            0.000          0.000                         0.000              0.000
행정구(GU_NM)                 0.000           0.000            1.000          0.332                         0.350              1.000
세부견인요소(KEYWORD_DETAIL)  0.000           0.000            0.332          1.000                         0.443              0.000
견인요소(KEYWORD)             0.000           0.000            0.350          0.443                         1.000              0.000
FREQ(FREQ)                    0.000           0.000            1.000          0.000                         0.000              1.000

[Figure: correlation heatmap]

                   견인요소(KEYWORD)  행정구(GU_NM)  FREQ(FREQ)
견인요소(KEYWORD)  1.000              0.101          0.000
행정구(GU_NM)      0.101              1.000          0.756
FREQ(FREQ)         0.000              0.756          1.000

[Figure: correlation heatmap]

                   DOC_DATE(DATE)  행정구(GU_NM)  견인요소(KEYWORD)  FREQ(FREQ)
DOC_DATE(DATE)     1.000           0.000          0.000              0.000
행정구(GU_NM)      0.000           1.000          0.101              0.756
견인요소(KEYWORD)  0.000           0.101          1.000              0.000
FREQ(FREQ)         0.000           0.756          0.000              1.000
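
The labels identifying each correlation measure were not preserved here. The matrix containing only categorical variables (견인요소(KEYWORD), 행정구(GU_NM), FREQ(FREQ)) resembles a Cramér's V table, but that is an assumption. As a rough illustration of how such a categorical association score can be computed, here is a minimal sketch of the plain (uncorrected) Cramér's V in standard-library Python. The function name and toy inputs are hypothetical, and a profiling tool may apply a bias correction, so values from this sketch would not necessarily match the figures above.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts for xs
    col = Counter(ys)              # marginal counts for ys

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, row_total in row.items():
        for y, col_total in col.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Toy data (hypothetical): perfectly associated vs. independent labels.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

With only 30 rows and a FREQ(FREQ) column that differs from 1 in a single row, any association score involving FREQ(FREQ) rests on one observation and should be read with caution.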

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index  DOC_DATE(DATE)  수집소스(SOURCE)  행정동(DONG_NM)  행정구(GU_NM)  세부견인요소(KEYWORD_DETAIL)  견인요소(KEYWORD)  FREQ(FREQ)
0      20170726        블로그            상봉             동대문구       팝콘                          취향               1
1      20190820        블로그            용산             영등포구       콤보                          먹거리             1
2      20180223        블로그            청계천           용산구         무대인사                      먹거리             1
3      20190308        블로그            서울             용산구         관크                          관람환경           1
4      20170501        블로그            서울             중구           화면                          먹거리             1
5      20170902        블로그            상봉             마포구         시야                          관람환경           1
6      20180801        블로그            강남역           마포구         매점                          먹거리             1
7      20170823        블로그            잠실새내         강남구         행사                          취향               1
8      20191201        블로그            국회의사당       종로구         3d                            먹거리             1
9      20180517        블로그            영등포동         강남구         팬덤                          취향               1

Index  DOC_DATE(DATE)  수집소스(SOURCE)  행정동(DONG_NM)     행정구(GU_NM)  세부견인요소(KEYWORD_DETAIL)  견인요소(KEYWORD)  FREQ(FREQ)
20     20171016        블로그            용산cgv             구로구         팝콘                          취향               1
21     20190423        블로그            동대문디자인플라자  용산구         행사                          취향               1
22     20190707        블로그            연남동              용산구         다큐멘터리영화                관람환경           1
23     20170624        블로그            대학로              서울           시사회                        취향               1
24     20171119        블로그            코엑스              서울           포스터                        먹거리             1
25     20170522        블로그            연남동              성동구         인디                          먹거리             1
26     20190302        블로그            왕십리cgv           송파구         매점                          취향               1
27     20170530        블로그            망원동              구로구         시설                          관람환경           1
28     20171012        블로그            서울                광진구         콤보                          취향               1
29     20190102        블로그            신촌                서울           세트메뉴                      취향               1