Overview

Dataset statistics

Number of variables: 6
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.0 KiB
Average record size in memory: 51.3 B

Variable types

Numeric: 2
Categorical: 3
Text: 1

Dataset

Description: Sample data
Author: 다음소프트 (Daumsoft)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=57

Alerts

High correlation: 수집소스(SOURCE) is highly overall correlated with 행정구(GU_NM)
High correlation: 행정구(GU_NM) is highly overall correlated with 수집소스(SOURCE)
Imbalance: 수집소스(SOURCE) is highly imbalanced (85.9%)
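
The 85.9% imbalance figure is consistent with one minus the normalized Shannon entropy of the class counts of 수집소스(SOURCE) (98 and 2). The report does not state its formula, so the Python sketch below assumes that definition; the function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class distribution and k is the number of classes.

    0.0 means perfectly balanced classes, 1.0 means a single class.
    This definition is an assumption, not taken from the report.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts if c > 0
    )
    return 1 - entropy / math.log2(k)


# Class counts of 수집소스(SOURCE) as listed in the report: 98 and 2.
score = imbalance_score([98, 2])
print(f"{score:.1%}")
```

With a balanced split such as `[50, 50]` the same function returns 0.0, which is why a 98/2 split is flagged.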

Reproduction

Analysis started: 2023-12-10 14:54:00.539845
Analysis finished: 2023-12-10 14:54:02.222632
Duration: 1.68 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

DOC_DATE(DATE)
Real number (ℝ)

Distinct: 94
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20180248
Minimum: 20170101
Maximum: 20191231
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB
[Figure: histogram of DOC_DATE(DATE), rendered with Matplotlib v3.7.2, https://matplotlib.org/]

Quantile statistics

Minimum: 20170101
5-th percentile: 20170206
Q1: 20170927
Median: 20180621
Q3: 20190148
95-th percentile: 20191021
Maximum: 20191231
Range: 21130
Interquartile range (IQR): 19220.75

Descriptive statistics

Standard deviation: 7771.832
Coefficient of variation (CV): 0.00038512074
Kurtosis: -1.328081
Mean: 20180248
Median Absolute Deviation (MAD): 9599
Skewness: 0.060382441
Sum: 2.0180248 × 10^9
Variance: 60401372
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 20190318 | 2 | 2.0% |
| 20170927 | 2 | 2.0% |
| 20191217 | 2 | 2.0% |
| 20190816 | 2 | 2.0% |
| 20180220 | 2 | 2.0% |
| 20181204 | 2 | 2.0% |
| 20180609 | 1 | 1.0% |
| 20190526 | 1 | 1.0% |
| 20181221 | 1 | 1.0% |
| 20181120 | 1 | 1.0% |
| Other values (84) | 84 | 84.0% |

Minimum 10 values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 20170101 | 1 | 1.0% |
| 20170113 | 1 | 1.0% |
| 20170116 | 1 | 1.0% |
| 20170118 | 1 | 1.0% |
| 20170204 | 1 | 1.0% |
| 20170206 | 1 | 1.0% |
| 20170214 | 1 | 1.0% |
| 20170226 | 1 | 1.0% |
| 20170315 | 1 | 1.0% |
| 20170320 | 1 | 1.0% |

Maximum 10 values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 20191231 | 1 | 1.0% |
| 20191221 | 1 | 1.0% |
| 20191217 | 2 | 2.0% |
| 20191124 | 1 | 1.0% |
| 20191016 | 1 | 1.0% |
| 20190906 | 1 | 1.0% |
| 20190903 | 1 | 1.0% |
| 20190820 | 1 | 1.0% |
| 20190816 | 2 | 2.0% |
| 20190731 | 1 | 1.0% |
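
DOC_DATE(DATE) is profiled as a real number, but its values look like dates encoded as YYYYMMDD integers (an assumption based on the minimum 20170101 and maximum 20191231). Under that reading, the mean (20180248) and Q3 (20190148) are not valid calendar dates, and the range of 21130 is not a number of days. The following Python sketch decodes such integers with the standard library only; the helper names are illustrative and not part of the report.

```python
from datetime import date


def parse_yyyymmdd(value: int) -> date:
    """Decode an integer such as 20180609 into a calendar date.

    Raises ValueError when the digits do not form a real date.
    """
    year, rest = divmod(value, 10_000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


def is_valid_yyyymmdd(value: int) -> bool:
    """Return True if the integer decodes to a real calendar date."""
    try:
        parse_yyyymmdd(value)
        return True
    except ValueError:
        return False


# Minimum and maximum as listed in the report.
first = parse_yyyymmdd(20170101)
last = parse_yyyymmdd(20191231)
span_days = (last - first).days  # elapsed days between the two dates

print(first, last, span_days)
print(is_valid_yyyymmdd(20180248))  # the reported mean
print(is_valid_yyyymmdd(20190148))  # the reported Q3
```

If the column really holds dates, converting it before profiling would likely give more meaningful summary statistics than treating it as a number.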

수집소스(SOURCE)
Categorical

Alerts: High correlation, Imbalance

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 7
Median length: 7
Mean length: 6.92
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 블로그커뮤니티
2nd row: 블로그커뮤니티
3rd row: 블로그커뮤니티
4th row: 트위터
5th row: 블로그커뮤니티

Common Values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 블로그커뮤니티 | 98 | 98.0% |
| 트위터 | 2 | 2.0% |

Length

Histogram of lengths of the category

행정동(DONG_NM)
Text

Distinct: 61
Distinct (%): 61.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 9
Median length: 7
Mean length: 3.89
Min length: 2

Characters and Unicode

Total characters: 389
Distinct characters: 101
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
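
As an illustration of how such character properties can be read, the Python sketch below counts Unicode general categories with the standard `unicodedata` module for three 행정동(DONG_NM) values that appear in the report's sample rows. It is a minimal sketch, not the profiler's own implementation; the mapping from `Lo` and `Ll` to the labels "Other Letter" and "Lowercase Letter" follows the Unicode general category names.

```python
import unicodedata
from collections import Counter

# Three 행정동(DONG_NM) values taken from the report's sample rows.
samples = ["광화문", "k현대미술관", "d타워"]

category_counts = Counter(
    unicodedata.category(ch)
    for text in samples
    # Normalize to NFC so each Hangul syllable is a single code point.
    for ch in unicodedata.normalize("NFC", text)
)

# "Lo" is Other Letter (Hangul syllables), "Ll" is Lowercase Letter.
for code, count in category_counts.most_common():
    print(code, count)
```

Here the two Latin letters `k` and `d` account for the Lowercase Letter category, matching the two lowercase characters the report lists for this variable.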

Unique

Unique: 40
Unique (%): 40.0%

Sample

1st row: 광화문
2nd row: k현대미술관
3rd row: 서울
4th row: 서울
5th row: 강남

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 서울 | 6 | 6.0% |
| 서울시립미술관 | 5 | 5.0% |
| 용산 | 5 | 5.0% |
| 한가람미술관 | 4 | 4.0% |
| 경복궁 | 3 | 3.0% |
| 동대문디자인플라자 | 3 | 3.0% |
| 국립현대미술관 | 3 | 3.0% |
| 예술의전당 | 3 | 3.0% |
| 광화문 | 3 | 3.0% |
| 국립중앙박물관 | 3 | 3.0% |
| Other values (51) | 62 | 62.0% |

Most occurring characters

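| Value | Count | Frequency (%) |
| --- | --- | --- |
|  | 23 | 5.9% |
|  | 19 | 4.9% |
|  | 18 | 4.6% |
|  | 15 | 3.9% |
|  | 15 | 3.9% |
|  | 14 | 3.6% |
|  | 13 | 3.3% |
|  | 10 | 2.6% |
|  | 9 | 2.3% |
|  | 8 | 2.1% |
| Other values (91) | 245 | 63.0% |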

Most occurring categories

| Value | Count | Frequency (%) |
| --- | --- | --- |
| Other Letter | 387 | 99.5% |
| Lowercase Letter | 2 | 0.5% |

Most frequent character per category

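Other Letter

| Value | Count | Frequency (%) |
| --- | --- | --- |
|  | 23 | 5.9% |
|  | 19 | 4.9% |
|  | 18 | 4.7% |
|  | 15 | 3.9% |
|  | 15 | 3.9% |
|  | 14 | 3.6% |
|  | 13 | 3.4% |
|  | 10 | 2.6% |
|  | 9 | 2.3% |
|  | 8 | 2.1% |
| Other values (89) | 243 | 62.8% |

Lowercase Letter

| Value | Count | Frequency (%) |
| --- | --- | --- |
| d | 1 | 50.0% |
| k | 1 | 50.0% |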

Most occurring scripts

| Value | Count | Frequency (%) |
| --- | --- | --- |
| Hangul | 387 | 99.5% |
| Latin | 2 | 0.5% |

Most frequent character per script

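Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
|  | 23 | 5.9% |
|  | 19 | 4.9% |
|  | 18 | 4.7% |
|  | 15 | 3.9% |
|  | 15 | 3.9% |
|  | 14 | 3.6% |
|  | 13 | 3.4% |
|  | 10 | 2.6% |
|  | 9 | 2.3% |
|  | 8 | 2.1% |
| Other values (89) | 243 | 62.8% |

Latin

| Value | Count | Frequency (%) |
| --- | --- | --- |
| d | 1 | 50.0% |
| k | 1 | 50.0% |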

Most occurring blocks

| Value | Count | Frequency (%) |
| --- | --- | --- |
| Hangul | 387 | 99.5% |
| ASCII | 2 | 0.5% |

Most frequent character per block

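Hangul

| Value | Count | Frequency (%) |
| --- | --- | --- |
|  | 23 | 5.9% |
|  | 19 | 4.9% |
|  | 18 | 4.7% |
|  | 15 | 3.9% |
|  | 15 | 3.9% |
|  | 14 | 3.6% |
|  | 13 | 3.4% |
|  | 10 | 2.6% |
|  | 9 | 2.3% |
|  | 8 | 2.1% |
| Other values (89) | 243 | 62.8% |

ASCII

| Value | Count | Frequency (%) |
| --- | --- | --- |
| d | 1 | 50.0% |
| k | 1 | 50.0% |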

행정구(GU_NM)
Categorical

Alerts: High correlation

Distinct: 17
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 3
Mean length: 2.93
Min length: 2

Unique

Unique: 5
Unique (%): 5.0%

Sample

1st row: 종로구
2nd row: 종로구
3rd row: 중구
4th row: 금천구
5th row: 종로구

Common Values

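| Value | Count | Frequency (%) |
| --- | --- | --- |
| 종로구 | 37 | 37.0% |
| 서초구 | 13 | 13.0% |
| 용산구 | 12 | 12.0% |
| 서울 | 7 | 7.0% |
| 마포구 | 5 | 5.0% |
| 강남구 | 4 | 4.0% |
| 중구 | 4 | 4.0% |
| 영등포구 | 3 | 3.0% |
| 동작구 | 3 | 3.0% |
| 송파구 | 3 | 3.0% |
| Other values (7) | 9 | 9.0% |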

Length

Histogram of lengths of the category

세부키워드(KEYWORD_DETAIL)
Categorical

Distinct: 15
Distinct (%): 15.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 5
Median length: 2
Mean length: 2.57
Min length: 2

Unique

Unique: 6
Unique (%): 6.0%

Sample

1st row: 전시
2nd row: 전시
3rd row: 예술품
4th row: 전시회
5th row: 전시

Common Values

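| Value | Count | Frequency (%) |
| --- | --- | --- |
| 전시 | 48 | 48.0% |
| 전시회 | 28 | 28.0% |
| 개인전 | 4 | 4.0% |
| 예술품 | 3 | 3.0% |
| 명화 | 3 | 3.0% |
| 전시공간 | 2 | 2.0% |
| 사진전 | 2 | 2.0% |
| ddp전시 | 2 | 2.0% |
| 도슨트 | 2 | 2.0% |
| 뒤샹전 | 1 | 1.0% |
| Other values (5) | 5 | 5.0% |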

Length

Histogram of lengths of the category

FREQ(FREQ)
Real number (ℝ)

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.37
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 3
Maximum: 9
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.2685393
Coefficient of variation (CV): 0.92594108
Kurtosis: 26.017572
Mean: 1.37
Median Absolute Deviation (MAD): 0
Skewness: 4.8730352
Sum: 137
Variance: 1.6091919
Monotonicity: Not monotonic
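Histogram with fixed size bins (bins=6)

Common values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 1 | 85 | 85.0% |
| 2 | 8 | 8.0% |
| 3 | 3 | 3.0% |
| 9 | 2 | 2.0% |
| 4 | 1 | 1.0% |
| 5 | 1 | 1.0% |

Minimum 10 values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 1 | 85 | 85.0% |
| 2 | 8 | 8.0% |
| 3 | 3 | 3.0% |
| 4 | 1 | 1.0% |
| 5 | 1 | 1.0% |
| 9 | 2 | 2.0% |

Maximum 10 values

| Value | Count | Frequency (%) |
| --- | --- | --- |
| 9 | 2 | 2.0% |
| 5 | 1 | 1.0% |
| 4 | 1 | 1.0% |
| 3 | 3 | 3.0% |
| 2 | 8 | 8.0% |
| 1 | 85 | 85.0% |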
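
Because FREQ(FREQ) has only six distinct values, its descriptive statistics can be recomputed from the value counts alone. The Python sketch below rebuilds the 100 observations from those counts and uses the standard `statistics` module. It assumes the reported variance is the sample variance (divisor n - 1), which is the reading that matches the reported figures.

```python
import statistics

# Value counts of FREQ(FREQ) as listed in the report: value -> count.
value_counts = {1: 85, 2: 8, 3: 3, 4: 1, 5: 1, 9: 2}

# Expand the counts back into the individual observations.
values = [v for v, n in value_counts.items() for _ in range(n)]

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, divisor n - 1
std_dev = statistics.stdev(values)      # square root of the sample variance
cv = std_dev / mean                     # coefficient of variation

print(len(values), total, mean)
print(round(variance, 7), round(std_dev, 7), round(cv, 8))
```

The large gap between the median of 1 and the maximum of 9 is what drives the high skewness and kurtosis reported for this variable.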

Interactions

[Figures: interaction plots]

Correlations

| | DOC_DATE(DATE) | 수집소스(SOURCE) | 행정동(DONG_NM) | 행정구(GU_NM) | 세부키워드(KEYWORD_DETAIL) | FREQ(FREQ) |
| --- | --- | --- | --- | --- | --- | --- |
| DOC_DATE(DATE) | 1.000 | 0.187 | 0.470 | 0.000 | 0.000 | 0.121 |
| 수집소스(SOURCE) | 0.187 | 1.000 | 0.000 | 0.774 | 0.000 | 0.000 |
| 행정동(DONG_NM) | 0.470 | 0.000 | 1.000 | 0.000 | 0.615 | 0.000 |
| 행정구(GU_NM) | 0.000 | 0.774 | 0.000 | 1.000 | 0.234 | 0.000 |
| 세부키워드(KEYWORD_DETAIL) | 0.000 | 0.000 | 0.615 | 0.234 | 1.000 | 0.000 |
| FREQ(FREQ) | 0.121 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |

| | 세부키워드(KEYWORD_DETAIL) | 수집소스(SOURCE) | 행정구(GU_NM) |
| --- | --- | --- | --- |
| 세부키워드(KEYWORD_DETAIL) | 1.000 | 0.000 | 0.066 |
| 수집소스(SOURCE) | 0.000 | 1.000 | 0.659 |
| 행정구(GU_NM) | 0.066 | 0.659 | 1.000 |

| | DOC_DATE(DATE) | FREQ(FREQ) | 수집소스(SOURCE) | 행정구(GU_NM) | 세부키워드(KEYWORD_DETAIL) |
| --- | --- | --- | --- | --- | --- |
| DOC_DATE(DATE) | 1.000 | -0.013 | 0.115 | 0.000 | 0.000 |
| FREQ(FREQ) | -0.013 | 1.000 | 0.000 | 0.000 | 0.000 |
| 수집소스(SOURCE) | 0.115 | 0.000 | 1.000 | 0.659 | 0.000 |
| 행정구(GU_NM) | 0.000 | 0.000 | 0.659 | 1.000 | 0.066 |
| 세부키워드(KEYWORD_DETAIL) | 0.000 | 0.000 | 0.000 | 0.066 | 1.000 |

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

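First rows

| | DOC_DATE(DATE) | 수집소스(SOURCE) | 행정동(DONG_NM) | 행정구(GU_NM) | 세부키워드(KEYWORD_DETAIL) | FREQ(FREQ) |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 20180609 | 블로그커뮤니티 | 광화문 | 종로구 | 전시 | 1 |
| 1 | 20180823 | 블로그커뮤니티 | k현대미술관 | 종로구 | 전시 | 1 |
| 2 | 20190410 | 블로그커뮤니티 | 서울 | 중구 | 예술품 | 1 |
| 3 | 20170927 | 트위터 | 서울 | 금천구 | 전시회 | 1 |
| 4 | 20190903 | 블로그커뮤니티 | 강남 | 종로구 | 전시 | 1 |
| 5 | 20190219 | 블로그커뮤니티 | 북촌 | 종로구 | 전시 | 1 |
| 6 | 20180109 | 블로그커뮤니티 | 뚝섬유원지 | 광진구 | 전시회 | 1 |
| 7 | 20190416 | 블로그커뮤니티 | 국립중앙박물관 | 용산구 | 전시 | 1 |
| 8 | 20170214 | 블로그커뮤니티 | d타워 | 서초구 | 전시 | 1 |
| 9 | 20180913 | 블로그커뮤니티 | 평창동 | 중구 | 전시회 | 1 |

Last rows

| | DOC_DATE(DATE) | 수집소스(SOURCE) | 행정동(DONG_NM) | 행정구(GU_NM) | 세부키워드(KEYWORD_DETAIL) | FREQ(FREQ) |
| --- | --- | --- | --- | --- | --- | --- |
| 90 | 20180802 | 블로그커뮤니티 | 서초 | 마포구 | 전시 | 1 |
| 91 | 20180523 | 블로그커뮤니티 | 이태원 | 서울 | 전시 | 1 |
| 92 | 20180701 | 블로그커뮤니티 | 청계천 | 종로구 | 전시 | 1 |
| 93 | 20171021 | 블로그커뮤니티 | 용산구 | 용산구 | 현대미술 | 1 |
| 94 | 20170514 | 블로그커뮤니티 | 어린이미술관 | 서초구 | 전시 | 1 |
| 95 | 20180304 | 블로그커뮤니티 | 서울미술관 | 송파구 | 전시 | 1 |
| 96 | 20181204 | 블로그커뮤니티 | 신촌 | 강남구 | 전시회 | 1 |
| 97 | 20181102 | 블로그커뮤니티 | 한강 | 용산구 | 전시 | 9 |
| 98 | 20190820 | 블로그커뮤니티 | 익선동 | 서초구 | 전시회 | 1 |
| 99 | 20180328 | 블로그커뮤니티 | 동대문역사문화공원 | 종로구 | 미술품 | 1 |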