Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 1
Missing cells (%): < 0.1%
Duplicate rows: 77
Duplicate rows (%): 0.8%
Total size in memory: 732.4 KiB
Average record size in memory: 75.0 B
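
The percentages and the average record size follow from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that non-zero values rounding to 0.0% are displayed as "< 0.1%". The helper name `format_pct` is hypothetical and is not part of ydata-profiling.

```python
def format_pct(fraction: float) -> str:
    """Format a fraction as a percentage with one decimal place.

    Non-zero values that would round to 0.0% are shown as "< 0.1%".
    This display rule is inferred from the figures in the report,
    not taken from a documented API.
    """
    pct = fraction * 100
    text = f"{pct:.1f}"
    if text == "0.0" and pct > 0:
        return "< 0.1%"
    return f"{text}%"


# Figures copied from the dataset statistics.
n_variables = 8
n_observations = 10_000
missing_cells = 1
duplicate_rows = 77
total_kib = 732.4

missing_pct = format_pct(missing_cells / (n_variables * n_observations))
duplicate_pct = format_pct(duplicate_rows / n_observations)
avg_record_bytes = total_kib * 1024 / n_observations

print(missing_pct, duplicate_pct, f"{avg_record_bytes:.1f} B")
```

Under these assumptions the three printed values agree with the reported "< 0.1%", "0.8%" and "75.0 B".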

Variable types

Categorical: 4
Text: 1
Numeric: 3

Dataset

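Description: Data provided by Pocheon-si, Gyeonggi-do, covering building permits (건축허가), building notifications (건축신고), temporary building permits and similar filings from January 1, 2015 to March 31, 2022. Fields include construction type, location, building area, total floor area, maximum number of above-ground floors, and use.
Author: Pocheon-si, Gyeonggi-do (경기도 포천시)
URL: https://www.data.go.kr/data/15100071/fileData.do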

Alerts

데이터기준일 has constant value "2022-04-27" [Constant]
Dataset has 77 (0.8%) duplicate rows [Duplicates]
건축면적(제곱미터) is highly overall correlated with 연면적(제곱미터) [High correlation]
연면적(제곱미터) is highly overall correlated with 건축면적(제곱미터) [High correlation]
종류 is highly overall correlated with 건축구분 and 1 other field [High correlation]
건축구분 is highly overall correlated with 종류 [High correlation]
주용도 is highly overall correlated with 종류 [High correlation]
건축구분 is highly imbalanced (62.6%) [Imbalance]
건축면적(제곱미터) is highly skewed (γ1 = 22.78520993) [Skewed]
연면적(제곱미터) is highly skewed (γ1 = 33.55307021) [Skewed]
최대지상층수 is highly skewed (γ1 = 28.69755767) [Skewed]

Reproduction

Analysis started: 2023-10-09 19:14:08.203382
Analysis finished: 2023-10-09 19:14:13.012277
Duration: 4.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
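
The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-10-09 19:14:08.203382", FMT)
finished = datetime.strptime("2023-10-09 19:14:13.012277", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 4.81 seconds
```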

Variables

종류
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
건축신고: 6134
건축허가: 3864
기설건축물축조: 2

Length

Max length: 7
Median length: 4
Mean length: 4.0006
Min length: 4
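
The length statistics can be reproduced from the category counts alone. The sketch below assumes that "length" means the number of Unicode code points in each value, which is what Python's `len` returns for these strings.

```python
from statistics import median

# Category counts for 종류, copied from the report.
counts = {"건축신고": 6134, "건축허가": 3864, "기설건축물축조": 2}

# Expand to one length per observation so the median can be taken directly.
lengths = [len(value) for value, count in counts.items() for _ in range(count)]

mean_length = sum(lengths) / len(lengths)
median_length = median(lengths)
min_length = min(lengths)
max_length = max(lengths)

print(mean_length, median_length, min_length, max_length)
```

The two 7-character values among 10,000 rows are what lift the mean slightly above 4.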

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 건축신고
2nd row: 건축허가
3rd row: 건축신고
4th row: 건축허가
5th row: 건축신고

Common Values

Value | Count | Frequency (%)
건축신고 | 6134 | 61.3%
건축허가 | 3864 | 38.6%
기설건축물축조 | 2 | < 0.1%

Length

Histogram of lengths of the category


건축구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
신축: 6976
증축: 2265
용도변경: 678
대수선: 29
재축: 20
Other values (4): 32

Length

Max length: 9
Median length: 2
Mean length: 2.149
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 신축
2nd row: 신축
3rd row: 신축
4th row: 신축
5th row: 신축

Common Values

Value | Count | Frequency (%)
신축 | 6976 | 69.8%
증축 | 2265 | 22.7%
용도변경 | 678 | 6.8%
대수선 | 29 | 0.3%
재축 | 20 | 0.2%
가설건축물축조허가 | 15 | 0.1%
개축 | 14 | 0.1%
허가 | 2 | < 0.1%
이전 | 1 | < 0.1%
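
The 62.6% imbalance figure reported for 건축구분 is consistent with one minus the normalized Shannon entropy of these counts. The sketch below works under that assumption; the report does not state the formula, and `imbalance_score` is a hypothetical name.

```python
import math

# Value counts for 건축구분, copied from the Common Values table.
counts = [6976, 2265, 678, 29, 20, 15, 14, 2, 1]


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    The score is 0 for a uniform distribution over k classes and
    approaches 1 as the mass concentrates in a single class.
    With fewer than two classes it returns 0.0 (a convention chosen
    for this sketch).
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"{score:.1%}")
```

A uniform distribution over the same nine classes would score 0, so the high value here mostly reflects that 신축 alone accounts for about 70% of rows.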

Length

Histogram of lengths of the category


대지위치
Text

Distinct: 8900
Distinct (%): 89.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 34
Median length: 28
Mean length: 22.1007
Min length: 13

Characters and Unicode

Total characters: 221007
Distinct characters: 128
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
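
As an illustration of these character properties, the general category of each code point can be looked up with Python's standard `unicodedata` module. The sketch uses the first sample row of this variable; the mapping from two-letter codes to the category names used in this report covers only the five categories that appear here. Script and block lookups are not shown because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes and the names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Lu": "Uppercase Letter",
}

address = "경기도 포천시 내촌면 마명리 253-7"  # first sample row

category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for ch in address
)

for name, count in category_counts.most_common():
    print(name, count)
```

Summing such counts over every row is one way the "Most occurring categories" totals could be obtained.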

Unique

Unique: 8006
Unique (%): 80.1%

Sample

1st row: 경기도 포천시 내촌면 마명리 253-7
2nd row: 경기도 포천시 신읍동 308-14
3rd row: 경기도 포천시 창수면 오가리 670-1
4th row: 경기도 포천시 영북면 산정리 659
5th row: 경기도 포천시 군내면 용정리 28

Words

Value | Count | Frequency (%)
경기도 | 10000 | 19.0%
포천시 | 10000 | 19.0%
외1필지 | 1996 | 3.8%
소흘읍 | 1785 | 3.4%
가산면 | 1355 | 2.6%
군내면 | 977 | 1.9%
신북면 | 864 | 1.6%
외2필지 | 835 | 1.6%
내촌면 | 815 | 1.6%
일동면 | 618 | 1.2%
Other values (6183) | 23331 | 44.4%
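
The counts in this table sum to 52,576, which equals the 42,576 space characters reported for this variable plus one token per row, so the table appears to count whitespace-separated tokens. A minimal sketch of that tokenization, applied to the five sample rows only:

```python
from collections import Counter

# The five sample rows listed for this variable.
sample_addresses = [
    "경기도 포천시 내촌면 마명리 253-7",
    "경기도 포천시 신읍동 308-14",
    "경기도 포천시 창수면 오가리 670-1",
    "경기도 포천시 영북면 산정리 659",
    "경기도 포천시 군내면 용정리 28",
]

# Split each address on whitespace and count every token.
word_counts = Counter(
    word for address in sample_addresses for word in address.split()
)

print(word_counts.most_common(2))
```

On the full column the same approach would rank 경기도 and 포천시 first, since both occur in every row.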

Most occurring characters

Value | Count | Frequency (%)
 | 42576 | 19.3%
 | 10392 | 4.7%
 | 10132 | 4.6%
 | 10088 | 4.6%
 | 10001 | 4.5%
 | 10000 | 4.5%
 | 10000 | 4.5%
1 | 9105 | 4.1%
 | 8675 | 3.9%
- | 7862 | 3.6%
Other values (118) | 92176 | 41.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 128092 | 58.0%
Space Separator | 42576 | 19.3%
Decimal Number | 42476 | 19.2%
Dash Punctuation | 7862 | 3.6%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 10392 | 8.1%
 | 10132 | 7.9%
 | 10088 | 7.9%
 | 10001 | 7.8%
 | 10000 | 7.8%
 | 10000 | 7.8%
 | 8675 | 6.8%
 | 6890 | 5.4%
 | 3947 | 3.1%
 | 3685 | 2.9%
Other values (105) | 44282 | 34.6%

Decimal Number
Value | Count | Frequency (%)
1 | 9105 | 21.4%
2 | 6104 | 14.4%
3 | 4884 | 11.5%
4 | 4236 | 10.0%
5 | 3936 | 9.3%
6 | 3374 | 7.9%
7 | 3077 | 7.2%
8 | 2787 | 6.6%
0 | 2503 | 5.9%
9 | 2470 | 5.8%

Space Separator
Value | Count | Frequency (%)
 | 42576 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 7862 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
H | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 128092 | 58.0%
Common | 92914 | 42.0%
Latin | 1 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 10392 | 8.1%
 | 10132 | 7.9%
 | 10088 | 7.9%
 | 10001 | 7.8%
 | 10000 | 7.8%
 | 10000 | 7.8%
 | 8675 | 6.8%
 | 6890 | 5.4%
 | 3947 | 3.1%
 | 3685 | 2.9%
Other values (105) | 44282 | 34.6%

Common
Value | Count | Frequency (%)
 | 42576 | 45.8%
1 | 9105 | 9.8%
- | 7862 | 8.5%
2 | 6104 | 6.6%
3 | 4884 | 5.3%
4 | 4236 | 4.6%
5 | 3936 | 4.2%
6 | 3374 | 3.6%
7 | 3077 | 3.3%
8 | 2787 | 3.0%
Other values (2) | 4973 | 5.4%

Latin
Value | Count | Frequency (%)
H | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 128092 | 58.0%
ASCII | 92915 | 42.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
 | 42576 | 45.8%
1 | 9105 | 9.8%
- | 7862 | 8.5%
2 | 6104 | 6.6%
3 | 4884 | 5.3%
4 | 4236 | 4.6%
5 | 3936 | 4.2%
6 | 3374 | 3.6%
7 | 3077 | 3.3%
8 | 2787 | 3.0%
Other values (3) | 4974 | 5.4%

Hangul
Value | Count | Frequency (%)
 | 10392 | 8.1%
 | 10132 | 7.9%
 | 10088 | 7.9%
 | 10001 | 7.8%
 | 10000 | 7.8%
 | 10000 | 7.8%
 | 8675 | 6.8%
 | 6890 | 5.4%
 | 3947 | 3.1%
 | 3685 | 2.9%
Other values (105) | 44282 | 34.6%

건축면적(제곱미터)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 7145
Distinct (%): 71.5%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 640.38716
Minimum: 0
Maximum: 59529.47
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 60.359
Q1: 108.28
Median: 253.53
Q3: 663.835
95-th percentile: 2180.172
Maximum: 59529.47
Range: 59529.47
Interquartile range (IQR): 555.555

Descriptive statistics

Standard deviation: 1799.0837
Coefficient of variation (CV): 2.8093689
Kurtosis: 705.91909
Mean: 640.38716
Median Absolute Deviation (MAD): 169.02
Skewness: 22.78521
Sum: 6403231.2
Variance: 3236702.3
Monotonicity: Not monotonic
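
Several of these figures are derived from others, which gives a quick consistency check. The sketch below assumes the usual definitions (IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of non-missing values) and uses the rounded values printed in the report, so small differences in the last digits are expected.

```python
# Values copied from the 건축면적(제곱미터) statistics.
mean = 640.38716
std = 1799.0837
q1, q3 = 108.28, 663.835
n_non_missing = 10_000 - 1  # one missing value is reported

iqr = q3 - q1
cv = std / mean
variance = std ** 2
total = mean * n_non_missing

print(f"IQR      = {iqr:.3f}")
print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.1f}")
print(f"Sum      = {total:.1f}")
```

A CV well above 1 together with a mean far above the median (640.4 versus 253.5) is consistent with the strong right skew flagged for this variable.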
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
198.0 | 189 | 1.9%
396.0 | 137 | 1.4%
594.0 | 56 | 0.6%
792.0 | 37 | 0.4%
99.0 | 35 | 0.4%
66.0 | 34 | 0.3%
84.51 | 29 | 0.3%
96.0 | 27 | 0.3%
196.0 | 23 | 0.2%
330.0 | 23 | 0.2%
Other values (7135) | 9409 | 94.1%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 3 | < 0.1%
6.25 | 3 | < 0.1%
7.66 | 1 | < 0.1%
8.0 | 2 | < 0.1%
10.24 | 1 | < 0.1%
11.03 | 1 | < 0.1%
12.0 | 1 | < 0.1%
13.0 | 1 | < 0.1%
13.68 | 1 | < 0.1%
13.8 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
59529.47 | 1 | < 0.1%
59435.62 | 4 | < 0.1%
59378.02 | 1 | < 0.1%
38303.6 | 1 | < 0.1%
15800.59 | 1 | < 0.1%
14931.22 | 1 | < 0.1%
14467.87 | 1 | < 0.1%
13388.51 | 1 | < 0.1%
13287.49 | 1 | < 0.1%
11847.85 | 1 | < 0.1%

연면적(제곱미터)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 7316
Distinct (%): 73.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1026.0157
Minimum: 6.25
Maximum: 310028
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 6.25
5-th percentile: 66.471
Q1: 136.09
Median: 342.92
Q3: 851.2825
95-th percentile: 2906.362
Maximum: 310028
Range: 310021.75
Interquartile range (IQR): 715.1925

Descriptive statistics

Standard deviation: 7118.775
Coefficient of variation (CV): 6.9382707
Kurtosis: 1221.2614
Mean: 1026.0157
Median Absolute Deviation (MAD): 243.92
Skewness: 33.55307
Sum: 10260157
Variance: 50676958
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
198.0 | 190 | 1.9%
396.0 | 137 | 1.4%
594.0 | 56 | 0.6%
99.0 | 42 | 0.4%
792.0 | 37 | 0.4%
66.0 | 32 | 0.3%
84.51 | 29 | 0.3%
196.0 | 27 | 0.3%
330.0 | 22 | 0.2%
96.0 | 20 | 0.2%
Other values (7306) | 9408 | 94.1%

Minimum 10 values

Value | Count | Frequency (%)
6.25 | 3 | < 0.1%
7.66 | 1 | < 0.1%
8.0 | 2 | < 0.1%
9.0 | 1 | < 0.1%
11.03 | 1 | < 0.1%
12.0 | 1 | < 0.1%
13.0 | 1 | < 0.1%
13.68 | 1 | < 0.1%
13.8 | 1 | < 0.1%
14.0 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
310028.0 | 1 | < 0.1%
247837.07 | 1 | < 0.1%
247743.22 | 4 | < 0.1%
247685.62 | 1 | < 0.1%
66829.66 | 1 | < 0.1%
66823.39 | 1 | < 0.1%
59416.38 | 1 | < 0.1%
47437.38 | 1 | < 0.1%
45805.76 | 1 | < 0.1%
42631.92 | 1 | < 0.1%

최대지상층수
Real number (ℝ)

SKEWED 

Distinct: 20
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.585
Minimum: 0
Maximum: 98
Zeros: 18
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 4
Maximum: 98
Range: 98
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.5155213
Coefficient of variation (CV): 0.95616483
Kurtosis: 1668.0981
Mean: 1.585
Median Absolute Deviation (MAD): 0
Skewness: 28.697558
Sum: 15850
Variance: 2.2968047
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)

Common Values

Value | Count | Frequency (%)
1 | 6312 | 63.1%
2 | 2737 | 27.4%
3 | 361 | 3.6%
4 | 320 | 3.2%
5 | 165 | 1.7%
7 | 28 | 0.3%
6 | 20 | 0.2%
0 | 18 | 0.2%
8 | 12 | 0.1%
10 | 5 | 0.1%
Other values (10) | 22 | 0.2%

Minimum 10 values

Value | Count | Frequency (%)
0 | 18 | 0.2%
1 | 6312 | 63.1%
2 | 2737 | 27.4%
3 | 361 | 3.6%
4 | 320 | 3.2%
5 | 165 | 1.7%
6 | 20 | 0.2%
7 | 28 | 0.3%
8 | 12 | 0.1%
9 | 4 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
98 | 1 | < 0.1%
26 | 2 | < 0.1%
24 | 2 | < 0.1%
18 | 1 | < 0.1%
16 | 1 | < 0.1%
15 | 4 | < 0.1%
13 | 4 | < 0.1%
12 | 1 | < 0.1%
11 | 2 | < 0.1%
10 | 5 | 0.1%
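
Because 최대지상층수 has only 20 distinct values, the lowest and highest value tables together list its complete distribution, so the mean, variance and skewness can be recomputed from the counts. The sketch below assumes the reported skewness is the adjusted Fisher-Pearson coefficient and the reported variance is the sample variance (n − 1 denominator); neither is stated in the report.

```python
import math

# Complete value counts for 최대지상층수, assembled from the tables
# of lowest and highest values in the report.
floor_counts = {
    0: 18, 1: 6312, 2: 2737, 3: 361, 4: 320, 5: 165, 6: 20, 7: 28,
    8: 12, 9: 4, 10: 5, 11: 2, 12: 1, 13: 4, 15: 4, 16: 1, 18: 1,
    24: 2, 26: 2, 98: 1,
}

n = sum(floor_counts.values())
mean = sum(v * c for v, c in floor_counts.items()) / n

# Population central moments of order 2 and 3.
m2 = sum(c * (v - mean) ** 2 for v, c in floor_counts.items()) / n
m3 = sum(c * (v - mean) ** 3 for v, c in floor_counts.items()) / n

sample_variance = m2 * n / (n - 1)
g1 = m3 / m2 ** 1.5                              # biased skewness
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)  # adjusted for sample size

print(n, mean, round(sample_variance, 7), round(skewness, 4))
```

Most of the skewness comes from the single value of 98: with 95% of rows at four floors or fewer, one extreme observation dominates the third moment.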

주용도
Categorical

HIGH CORRELATION 

Distinct: 31
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
단독주택: 2703
공장: 2074
제2종근린생활시설: 1908
제1종근린생활시설: 1356
동.식물관련시설: 746
Other values (26): 1213

Length

Max length: 18
Median length: 14
Mean length: 5.6132
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 공장
2nd row: 업무시설
3rd row: 단독주택
4th row: 제1종근린생활시설
5th row: 단독주택

Common Values

Value | Count | Frequency (%)
단독주택 | 2703 | 27.0%
공장 | 2074 | 20.7%
제2종근린생활시설 | 1908 | 19.1%
제1종근린생활시설 | 1356 | 13.6%
동.식물관련시설 | 746 | 7.5%
창고시설 | 304 | 3.0%
공동주택 | 287 | 2.9%
야영장시설 | 120 | 1.2%
노유자시설 | 86 | 0.9%
자동차관련시설 | 73 | 0.7%
Other values (21) | 343 | 3.4%

Length

Histogram of lengths of the category

데이터기준일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
2022-04-27: 10000

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-04-27
2nd row: 2022-04-27
3rd row: 2022-04-27
4th row: 2022-04-27
5th row: 2022-04-27

Common Values

Value | Count | Frequency (%)
2022-04-27 | 10000 | 100.0%

Length

Histogram of lengths of the category


Interactions

[Figure: 9 interaction plots]

Correlations

 | 종류 | 건축구분 | 건축면적(제곱미터) | 연면적(제곱미터) | 최대지상층수 | 주용도
종류 | 1.000 | 0.941 | 0.068 | 0.033 | 0.037 | 0.914
건축구분 | 0.941 | 1.000 | 0.069 | 0.021 | 0.034 | 0.801
건축면적(제곱미터) | 0.068 | 0.069 | 1.000 | 0.892 | 0.055 | 0.526
연면적(제곱미터) | 0.033 | 0.021 | 0.892 | 1.000 | 0.259 | 0.473
최대지상층수 | 0.037 | 0.034 | 0.055 | 0.259 | 1.000 | 0.296
주용도 | 0.914 | 0.801 | 0.526 | 0.473 | 0.296 | 1.000

 | 주용도 | 건축구분 | 종류
주용도 | 1.000 | 0.454 | 0.770
건축구분 | 0.454 | 1.000 | 0.714
종류 | 0.770 | 0.714 | 1.000
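
This matrix covers only the three categorical variables. The report does not say which association measure it uses; one common choice for categorical pairs is Cramér's V. The sketch below implements the plain version without bias correction, purely as an illustration, on a tiny hypothetical sample rather than the real data.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain Cramér's V (no bias correction) for two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: the first pair is perfectly associated,
# the second pair is independent.
kinds = ["건축신고", "건축신고", "건축허가", "건축허가"]
uses_dependent = ["단독주택", "단독주택", "공장", "공장"]
uses_independent = ["단독주택", "공장", "단독주택", "공장"]

print(cramers_v(kinds, uses_dependent))
print(cramers_v(kinds, uses_independent))
```

Values range from 0 (no association) to 1 (one variable fully determines the other), which matches the scale of the matrix.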

 | 건축면적(제곱미터) | 연면적(제곱미터) | 최대지상층수 | 종류 | 건축구분 | 주용도
건축면적(제곱미터) | 1.000 | 0.951 | 0.030 | 0.051 | 0.040 | 0.279
연면적(제곱미터) | 0.951 | 1.000 | 0.264 | 0.025 | 0.012 | 0.244
최대지상층수 | 0.030 | 0.264 | 1.000 | 0.035 | 0.022 | 0.157
종류 | 0.051 | 0.025 | 0.035 | 1.000 | 0.714 | 0.770
건축구분 | 0.040 | 0.012 | 0.022 | 0.714 | 1.000 | 0.454
주용도 | 0.279 | 0.244 | 0.157 | 0.770 | 0.454 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
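
The per-column nullity counts behind such a chart can be sketched in a few lines. The rows below are hypothetical and only mimic the situation in this dataset, where a single 건축면적(제곱미터) value is missing; `missing_per_column` is an illustrative helper, not a library function.

```python
def missing_per_column(rows):
    """Count None values per column in a list of dict records."""
    columns = rows[0].keys()
    return {
        column: sum(1 for row in rows if row.get(column) is None)
        for column in columns
    }


# Hypothetical records; None marks a missing cell.
rows = [
    {"건축면적(제곱미터)": 594.0, "연면적(제곱미터)": 594.0, "최대지상층수": 1},
    {"건축면적(제곱미터)": None, "연면적(제곱미터)": 12.0, "최대지상층수": 1},
    {"건축면적(제곱미터)": 185.14, "연면적(제곱미터)": 185.14, "최대지상층수": 1},
]

print(missing_per_column(rows))
```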

Sample

종류건축구분대지위치건축면적(제곱미터)연면적(제곱미터)최대지상층수주용도데이터기준일
563건축신고신축경기도 포천시 내촌면 마명리 253-7594.0594.01공장2022-04-27
7825건축허가신축경기도 포천시 신읍동 308-14162.921994.8813업무시설2022-04-27
3770건축신고신축경기도 포천시 창수면 오가리 670-168.2465.241단독주택2022-04-27
8728건축허가신축경기도 포천시 영북면 산정리 659342.92498.042제1종근린생활시설2022-04-27
6082건축신고신축경기도 포천시 군내면 용정리 28185.14185.141단독주택2022-04-27
2180건축신고신축경기도 포천시 가산면 금현리 76-3268.0268.01제2종근린생활시설2022-04-27
826건축신고신축경기도 포천시 영중면 성동리 313-1212.012.01제2종근린생활시설2022-04-27
2967건축신고신축경기도 포천시 영중면 금주리 225-1352.78103.252단독주택2022-04-27
2048건축신고증축경기도 포천시 내촌면 진목리 210-1 외1필지673.68673.681공장2022-04-27
7409건축허가증축경기도 포천시 이동면 노곡리 746-6 외1필지924.87924.871동.식물관련시설2022-04-27
종류건축구분대지위치건축면적(제곱미터)연면적(제곱미터)최대지상층수주용도데이터기준일
7078건축허가신축경기도 포천시 선단동 394-29 외1필지476.821305.044공동주택2022-04-27
2640건축신고증축경기도 포천시 군내면 하성북리 681-35183.2266.642단독주택2022-04-27
5933건축신고증축경기도 포천시 신북면 만세교리 85-2 외4필지1474.231474.231동.식물관련시설2022-04-27
1370건축신고신축경기도 포천시 이동면 장암리 464-6165.3591.142단독주택2022-04-27
10291건축허가신축경기도 포천시 소흘읍 직동리 249-7 외2필지348.38493.012제1종근린생활시설2022-04-27
9667건축허가증축경기도 포천시 창수면 오가리 99-2 외1필지1004.08995.131동.식물관련시설2022-04-27
3152건축신고증축경기도 포천시 내촌면 진목리 915-9 외1필지1078.511078.511공장2022-04-27
2716건축신고신축경기도 포천시 내촌면 신팔리 66-2 외4필지549.5594.52제1종근린생활시설2022-04-27
7765건축허가증축경기도 포천시 내촌면 소학리 179-16200.18340.072제2종근린생활시설2022-04-27
9725건축허가증축경기도 포천시 가산면 마산리 183-2422.231070.862제1종근린생활시설2022-04-27

Duplicate rows

Most frequently occurring

종류건축구분대지위치건축면적(제곱미터)연면적(제곱미터)최대지상층수주용도데이터기준일# duplicates
33건축신고신축경기도 포천시 이동면 장암리 464-184.5184.511단독주택2022-04-2710
32건축신고신축경기도 포천시 이동면 장암리 463-284.5184.511단독주택2022-04-279
30건축신고신축경기도 포천시 영북면 문암리 301-890.6990.691단독주택2022-04-276
8건축신고신축경기도 포천시 선단동 235-255.599.832단독주택2022-04-274
10건축신고신축경기도 포천시 설운동 산 34-2373.5199.632단독주택2022-04-274
14건축신고신축경기도 포천시 소흘읍 이곡리 504-156.6496.12단독주택2022-04-274
29건축신고신축경기도 포천시 영북면 문암리 30191.4191.411단독주택2022-04-274
4건축신고신축경기도 포천시 관인면 중리 1159198.99198.991단독주택2022-04-273
6건축신고신축경기도 포천시 내촌면 내리 180-791.291.21노유자시설2022-04-273
15건축신고신축경기도 포천시 소흘읍 이곡리 504-179.41122.212단독주택2022-04-273
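
A table like this one can be produced by grouping identical rows and counting them. The sketch assumes that "# duplicates" is the number of times each identical row occurs. The records are hypothetical placeholders, not rows from the dataset.

```python
from collections import Counter

# Hypothetical records as (종류, 건축구분, 대지위치) tuples.
records = [
    ("건축신고", "신축", "lot A"),
    ("건축신고", "신축", "lot A"),
    ("건축신고", "신축", "lot A"),
    ("건축허가", "증축", "lot B"),
    ("건축신고", "신축", "lot C"),
    ("건축신고", "신축", "lot C"),
]

occurrences = Counter(records)

# Keep only rows that occur more than once, most frequent first.
duplicates = [
    (row, count) for row, count in occurrences.most_common() if count > 1
]

for row, count in duplicates:
    print(row, count)
```

Rows can only be grouped this way when every field matches exactly, so near-duplicates such as addresses with different spacing would not be caught.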