Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 10718
Missing cells (%): 21.4%
Duplicate rows: 1071
Duplicate rows (%): 10.7%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
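The percentages and the average record size follow from the raw counts. The sketch below recomputes them from the figures in this block; the variable names are illustrative, and the assumption that the missing-cell percentage is taken over all cells (variables × observations) is inferred from the numbers, not stated by the report.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 5
n_observations = 10_000
missing_cells = 10_718
duplicate_rows = 1_071
total_size_kib = 488.3

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")         # 21.4%
print(f"{duplicate_pct:.1f}%")       # 10.7%
print(f"{avg_record_bytes:.1f} B")   # 50.0 B
```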

Variable types

Numeric: 2
Categorical: 1
Text: 2

Dataset

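Description: Road-name address change requests for each factory (identified by the factory management number, 공장관리번호) in the Factory Establishment Online Support System (FactoryOn, 팩토리온).
Author: Korea Industrial Complex Corporation (한국산업단지공단)
URL: https://www.data.go.kr/data/15127209/fileData.do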

Alerts

Dataset has 1071 (10.7%) duplicate rows [Duplicates]
공장관리번호 is highly overall correlated with 시도명 [High correlation]
시도명 is highly overall correlated with 공장관리번호 [High correlation]
행정동명 has 724 (7.2%) missing values [Missing]
변경주소 has 9994 (99.9%) missing values [Missing]
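The duplicate and missing-value alerts can be checked by hand on any table with the same columns. The following is a minimal standard-library sketch: the rows are a hypothetical miniature, the helper names are made up for illustration, and counting only the repeats (not the first occurrence) as duplicates is an assumption, since the report does not say how it counts.

```python
from collections import Counter

columns = ["공장관리번호", "최초등록일시", "시도명", "행정동명", "변경주소"]

# Hypothetical miniature of the table; None marks a missing cell.
rows = [
    (115452000000000, 20221200000000, "서울특별시", "서초2동", None),
    (115452000000000, 20221200000000, "서울특별시", "서초2동", None),
    (413902000000000, 20220800000000, "경기도", "정왕1동", None),
    (412732000000000, 20221100000000, "경기도", None, None),
]

def duplicate_row_count(rows):
    """Count rows that repeat an earlier row (first occurrence not counted)."""
    counts = Counter(rows)
    return sum(n - 1 for n in counts.values())

def missing_fraction(rows, columns):
    """Return {column name: share of rows where that cell is None}."""
    total = len(rows)
    return {
        name: sum(row[i] is None for row in rows) / total
        for i, name in enumerate(columns)
    }

print(duplicate_row_count(rows))               # 1
print(missing_fraction(rows, columns)["변경주소"])  # 1.0
```

A column such as 변경주소, which is empty in 99.9% of rows, is usually a candidate for being dropped or analysed separately.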

Reproduction

Analysis started: 2024-03-23 05:48:29.339629
Analysis finished: 2024-03-23 05:48:31.257947
Duration: 1.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

공장관리번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 652
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.6401404 × 10^14
Minimum: 503
Maximum: 9.52051 × 10^14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 503
5th percentile: 1.15452 × 10^14
Q1: 2.82602 × 10^14
Median: 4.13902 × 10^14
Q3: 4.37452 × 10^14
95th percentile: 4.82502 × 10^14
Maximum: 9.52051 × 10^14
Range: 9.52051 × 10^14
Interquartile range (IQR): 1.5485 × 10^14

Descriptive statistics

Standard deviation: 1.1692644 × 10^14
Coefficient of variation (CV): 0.32121408
Kurtosis: 0.93276174
Mean: 3.6401404 × 10^14
Median Absolute Deviation (MAD): 4.35505 × 10^13
Skewness: -1.1403366
Sum: 3.6401404 × 10^18
Variance: 1.3671791 × 10^28
Monotonicity: Not monotonic
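Several of these figures are tied together by simple identities: the CV is the standard deviation divided by the mean, the variance is the squared standard deviation, the IQR is Q3 minus Q1, and the sum is the mean times the number of observations. The sketch below checks the reported values against those identities; it uses only numbers printed in the report, and the tolerances are arbitrary choices to absorb rounding.

```python
import math

# Values copied from the 공장관리번호 statistics.
mean = 3.6401404e14
std = 1.1692644e14
q1, q3 = 2.82602e14, 4.37452e14
n_observations = 10_000

cv = std / mean                 # coefficient of variation
variance = std ** 2
iqr = q3 - q1
total = mean * n_observations

# Compare with the figures printed in the report (rounded there).
assert math.isclose(cv, 0.32121408, rel_tol=1e-5)
assert math.isclose(variance, 1.3671791e28, rel_tol=1e-5)
assert math.isclose(iqr, 1.5485e14, rel_tol=1e-9)
assert math.isclose(total, 3.6401404e18, rel_tol=1e-9)
print("reported statistics are mutually consistent")
```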
Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
115452000000000 | 504 | 5.0%
415902000000000 | 493 | 4.9%
413902000000000 | 467 | 4.7%
412732000000000 | 413 | 4.1%
282002000000000 | 338 | 3.4%
282602000000000 | 254 | 2.5%
415702000000000 | 242 | 2.4%
414802000000000 | 222 | 2.2%
482502000000000 | 166 | 1.7%
412202000000000 | 164 | 1.6%
Other values (642) | 6737 | 67.4%

Smallest values

Value | Count | Frequency (%)
503 | 1 | < 0.1%
869 | 1 | < 0.1%
2016 | 1 | < 0.1%
3832 | 1 | < 0.1%
3988 | 1 | < 0.1%
4022 | 1 | < 0.1%
5695 | 1 | < 0.1%
5946 | 1 | < 0.1%
2003059510 | 1 | < 0.1%
2009000012 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
952051000000000 | 1 | < 0.1%
920561000000000 | 1 | < 0.1%
917481000000000 | 1 | < 0.1%
913411000000000 | 1 | < 0.1%
911011000000000 | 1 | < 0.1%
671101000000000 | 1 | < 0.1%
660208000000000 | 1 | < 0.1%
651030000000000 | 1 | < 0.1%
630815000000000 | 1 | < 0.1%
630504000000000 | 1 | < 0.1%

최초등록일시
Real number (ℝ)

Distinct: 21
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0229937 × 10^13
Minimum: 2.02207 × 10^13
Maximum: 2.02403 × 10^13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2.02207 × 10^13
5th percentile: 2.02209 × 10^13
Q1: 2.02301 × 10^13
Median: 2.02306 × 10^13
Q3: 2.02311 × 10^13
95th percentile: 2.02402 × 10^13
Maximum: 2.02403 × 10^13
Range: 1.96 × 10^10
Interquartile range (IQR): 1 × 10^9

Descriptive statistics

Standard deviation: 6.0527312 × 10^9
Coefficient of variation (CV): 0.00029919674
Kurtosis: -0.51878915
Mean: 2.0229937 × 10^13
Median Absolute Deviation (MAD): 5 × 10^8
Skewness: 0.039323107
Sum: 2.0229937 × 10^17
Variance: 3.6635556 × 10^19
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)

Most frequent values

Value | Count | Frequency (%)
20240100000000 | 678 | 6.8%
20240200000000 | 628 | 6.3%
20231100000000 | 608 | 6.1%
20221200000000 | 581 | 5.8%
20231200000000 | 538 | 5.4%
20230100000000 | 533 | 5.3%
20230300000000 | 528 | 5.3%
20230800000000 | 504 | 5.0%
20230400000000 | 490 | 4.9%
20221100000000 | 485 | 4.9%
Other values (11) | 4427 | 44.3%

Smallest values

Value | Count | Frequency (%)
20220700000000 | 53 | 0.5%
20220800000000 | 435 | 4.3%
20220900000000 | 399 | 4.0%
20221000000000 | 421 | 4.2%
20221100000000 | 485 | 4.9%
20221200000000 | 581 | 5.8%
20230100000000 | 533 | 5.3%
20230200000000 | 470 | 4.7%
20230300000000 | 528 | 5.3%
20230400000000 | 490 | 4.9%

Largest values

Value | Count | Frequency (%)
20240300000000 | 345 | 3.5%
20240200000000 | 628 | 6.3%
20240100000000 | 678 | 6.8%
20231200000000 | 538 | 5.4%
20231100000000 | 608 | 6.1%
20231000000000 | 456 | 4.6%
20230900000000 | 416 | 4.2%
20230800000000 | 504 | 5.0%
20230700000000 | 485 | 4.9%
20230600000000 | 480 | 4.8%
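The report treats 최초등록일시 as a plain real number, but the values look like timestamps written as 14-digit integers, with only the year and month populated. A minimal sketch for recovering the year and month is shown below; the `yyyymmddHHMMSS` layout is an assumption inferred from values such as 20240100000000, and `split_timestamp` is a hypothetical helper name.

```python
def split_timestamp(value: int) -> tuple[int, int]:
    """Split a 14-digit timestamp-like integer into (year, month).

    Assumes a yyyymmddHHMMSS layout; the day and time digits are all
    zero in the values listed in the report, so they are ignored here.
    """
    text = f"{value:014d}"
    year, month = int(text[:4]), int(text[4:6])
    if not 1 <= month <= 12:
        raise ValueError(f"unexpected month in {value}")
    return year, month

# Three of the most frequent values and their counts, from the report.
counts = {20240100000000: 678, 20240200000000: 628, 20231100000000: 608}
by_month = {split_timestamp(v): c for v, c in counts.items()}
print(by_month)  # {(2024, 1): 678, (2024, 2): 628, (2023, 11): 608}
```

Treating the column as a date instead of a number would make statistics such as the mean and variance above easier to interpret.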

시도명
Categorical

HIGH CORRELATION 

Distinct: 22
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

경기도: 3391
서울특별시: 1059
인천광역시: 728
<NA>: 723
경상남도: 585
Other values (17): 3514

Length

Max length: 7
Median length: 6
Mean length: 4.1035
Min length: 3

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 경상북도
2nd row: 충청남도
3rd row: 전라남도
4th row: 경기도
5th row: 경기도

Common Values

Value | Count | Frequency (%)
경기도 | 3391 | 33.9%
서울특별시 | 1059 | 10.6%
인천광역시 | 728 | 7.3%
<NA> | 723 | 7.2%
경상남도 | 585 | 5.9%
경상북도 | 522 | 5.2%
부산광역시 | 480 | 4.8%
충청북도 | 376 | 3.8%
대구광역시 | 362 | 3.6%
충청남도 | 360 | 3.6%
Other values (12) | 1414 | 14.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
경기도 | 3391 | 33.9%
서울특별시 | 1059 | 10.6%
인천광역시 | 728 | 7.3%
na | 723 | 7.2%
경상남도 | 585 | 5.9%
경상북도 | 522 | 5.2%
부산광역시 | 480 | 4.8%
충청북도 | 376 | 3.8%
대구광역시 | 362 | 3.6%
충청남도 | 360 | 3.6%
Other values (12) | 1414 | 14.1%

행정동명
Text

MISSING 

Distinct: 1910
Distinct (%): 20.6%
Missing: 724
Missing (%): 7.2%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 3
Mean length: 3.4886805
Min length: 2

Characters and Unicode

Total characters: 32361
Distinct characters: 315
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 731
Unique (%): 7.9%
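The distinct and unique percentages for 행정동명 do not match the counts divided by all 10000 observations. They do match when the denominator is the number of non-missing values, which is the assumption the sketch below makes; the mean length is recomputed the same way from the total character count.

```python
# Figures copied from the 행정동명 block and the dataset overview.
n_observations = 10_000
n_missing = 724
n_distinct = 1_910
n_unique = 731
total_characters = 32_361

# Assumption: percentages and mean length use non-missing values only.
n_present = n_observations - n_missing

distinct_pct = 100 * n_distinct / n_present
unique_pct = 100 * n_unique / n_present
mean_length = total_characters / n_present

print(n_present)                 # 9276
print(f"{distinct_pct:.1f}%")    # 20.6%
print(f"{unique_pct:.1f}%")      # 7.9%
print(f"{mean_length:.4f}")      # 3.4887
```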

Sample

1st row: 쌍림면
2nd row: 정미면
3rd row: 금천면
4th row: 송탄동
5th row: 파장동

Value | Count | Frequency (%)
초지동 | 256 | 2.8%
정왕1동 | 166 | 1.8%
가산동 | 161 | 1.7%
논현고잔동 | 143 | 1.5%
상대원1동 | 98 | 1.1%
녹산동 | 91 | 1.0%
평동 | 78 | 0.8%
정왕2동 | 70 | 0.8%
구로제3동 | 64 | 0.7%
논현2동 | 55 | 0.6%
Other values (1900) | 8094 | 87.3%

Most occurring characters

ValueCountFrequency (%)
6732
 
20.8%
1688
 
5.2%
1229
 
3.8%
1 1114
 
3.4%
2 1007
 
3.1%
754
 
2.3%
596
 
1.8%
571
 
1.8%
455
 
1.4%
3 451
 
1.4%
Other values (305) 17764
54.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 29402 | 90.9%
Decimal Number | 2847 | 8.8%
Other Punctuation | 112 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6732
22.9%
1688
 
5.7%
1229
 
4.2%
754
 
2.6%
596
 
2.0%
571
 
1.9%
455
 
1.5%
396
 
1.3%
396
 
1.3%
393
 
1.3%
Other values (295) 16192
55.1%
Decimal Number
ValueCountFrequency (%)
1 1114
39.1%
2 1007
35.4%
3 451
15.8%
4 103
 
3.6%
5 94
 
3.3%
6 45
 
1.6%
7 27
 
0.9%
8 4
 
0.1%
9 2
 
0.1%
Other Punctuation
ValueCountFrequency (%)
. 112
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 29402 | 90.9%
Common | 2959 | 9.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6732
22.9%
1688
 
5.7%
1229
 
4.2%
754
 
2.6%
596
 
2.0%
571
 
1.9%
455
 
1.5%
396
 
1.3%
396
 
1.3%
393
 
1.3%
Other values (295) 16192
55.1%
Common
ValueCountFrequency (%)
1 1114
37.6%
2 1007
34.0%
3 451
15.2%
. 112
 
3.8%
4 103
 
3.5%
5 94
 
3.2%
6 45
 
1.5%
7 27
 
0.9%
8 4
 
0.1%
9 2
 
0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 29402 | 90.9%
ASCII | 2959 | 9.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
6732
22.9%
1688
 
5.7%
1229
 
4.2%
754
 
2.6%
596
 
2.0%
571
 
1.9%
455
 
1.5%
396
 
1.3%
396
 
1.3%
393
 
1.3%
Other values (295) 16192
55.1%
ASCII
ValueCountFrequency (%)
1 1114
37.6%
2 1007
34.0%
3 451
15.2%
. 112
 
3.8%
4 103
 
3.5%
5 94
 
3.2%
6 45
 
1.5%
7 27
 
0.9%
8 4
 
0.1%
9 2
 
0.1%

변경주소
Text

MISSING 

Distinct: 6
Distinct (%): 100.0%
Missing: 9994
Missing (%): 99.9%
Memory size: 156.2 KiB

Length

Max length: 27
Median length: 22.5
Mean length: 21.666667
Min length: 19

Characters and Unicode

Total characters: 130
Distinct characters: 56
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): 100.0%

Sample

1st row: 경기도 여주시 흥천면 흥천로 63-2
2nd row: 충청북도 청원군 오창읍 2산단로 81
3rd row: 경상남도 김해시 생림면 인제로694번길 19-52
4th row: 경북 경산시 하양읍 지식산업로 36
5th row: 경상북도 영천시 화산면 납이길 69

Value | Count | Frequency (%)
김해시 | 2 | 6.7%
경기도 | 1 | 3.3%
경북 | 1 | 3.3%
테크노밸리로 | 1 | 3.3%
진례면 | 1 | 3.3%
경남 | 1 | 3.3%
69 | 1 | 3.3%
납이길 | 1 | 3.3%
화산면 | 1 | 3.3%
영천시 | 1 | 3.3%
Other values (19) | 19 | 63.3%

Most occurring characters

ValueCountFrequency (%)
25
 
19.2%
6
 
4.6%
9 5
 
3.8%
5
 
3.8%
5
 
3.8%
4
 
3.1%
4
 
3.1%
6 4
 
3.1%
2 4
 
3.1%
4
 
3.1%
Other values (46) 64
49.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 80 | 61.5%
Space Separator | 25 | 19.2%
Decimal Number | 22 | 16.9%
Dash Punctuation | 3 | 2.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6
 
7.5%
5
 
6.2%
5
 
6.2%
4
 
5.0%
4
 
5.0%
4
 
5.0%
3
 
3.8%
3
 
3.8%
2
 
2.5%
2
 
2.5%
Other values (36) 42
52.5%
Decimal Number
ValueCountFrequency (%)
9 5
22.7%
6 4
18.2%
2 4
18.2%
3 3
13.6%
1 3
13.6%
5 1
 
4.5%
4 1
 
4.5%
8 1
 
4.5%
Space Separator
ValueCountFrequency (%)
25
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 80 | 61.5%
Common | 50 | 38.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6
 
7.5%
5
 
6.2%
5
 
6.2%
4
 
5.0%
4
 
5.0%
4
 
5.0%
3
 
3.8%
3
 
3.8%
2
 
2.5%
2
 
2.5%
Other values (36) 42
52.5%
Common
ValueCountFrequency (%)
25
50.0%
9 5
 
10.0%
6 4
 
8.0%
2 4
 
8.0%
3 3
 
6.0%
- 3
 
6.0%
1 3
 
6.0%
5 1
 
2.0%
4 1
 
2.0%
8 1
 
2.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 80 | 61.5%
ASCII | 50 | 38.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
25
50.0%
9 5
 
10.0%
6 4
 
8.0%
2 4
 
8.0%
3 3
 
6.0%
- 3
 
6.0%
1 3
 
6.0%
5 1
 
2.0%
4 1
 
2.0%
8 1
 
2.0%
Hangul
ValueCountFrequency (%)
6
 
7.5%
5
 
6.2%
5
 
6.2%
4
 
5.0%
4
 
5.0%
4
 
5.0%
3
 
3.8%
3
 
3.8%
2
 
2.5%
2
 
2.5%
Other values (36) 42
52.5%

Interactions

[Figures: four interaction plots]

Correlations

 | 공장관리번호 | 최초등록일시 | 시도명 | 변경주소
공장관리번호 | 1.000 | 0.079 | 0.833 | 1.000
최초등록일시 | 0.079 | 1.000 | 0.080 | 1.000
시도명 | 0.833 | 0.080 | 1.000 | 1.000
변경주소 | 1.000 | 1.000 | 1.000 | 1.000

 | 공장관리번호 | 최초등록일시 | 시도명
공장관리번호 | 1.000 | 0.031 | 0.511
최초등록일시 | 0.031 | 1.000 | 0.043
시도명 | 0.511 | 0.043 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
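Nullity correlation can be illustrated with a few lines of standard-library Python. The sketch below assumes the heatmap is a Pearson correlation between per-column "value is present" indicators, which the report does not state explicitly; `nullity_correlation` and the sample columns are hypothetical.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns.

    Returns None when either column is entirely present or entirely
    missing, because its indicator then has zero variance.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)

# Hypothetical columns: missing together (+1) and missing alternately (-1).
left = ["a", None, "c", None]
together = ["w", None, "y", None]
alternating = [None, "x", None, "z"]
print(nullity_correlation(left, together))     # 1.0
print(nullity_correlation(left, alternating))  # -1.0
```

A value near 1 means two columns tend to be missing in the same rows; a column with no missing values has no defined nullity correlation.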

Sample

 | 공장관리번호 | 최초등록일시 | 시도명 | 행정동명 | 변경주소
87777 | 2022929119 | 20221000000000 | 경상북도 | 쌍림면 | <NA>
53470 | 442702000000000 | 20230500000000 | 충청남도 | 정미면 | <NA>
98478 | 292002000000000 | 20220800000000 | 전라남도 | 금천면 | <NA>
46991 | 412202000000000 | 20230600000000 | 경기도 | 송탄동 | <NA>
27688 | 411112000000000 | 20231100000000 | 경기도 | 파장동 | <NA>
41173 | 272302000000000 | 20230800000000 | 대구광역시 | 노원동 | <NA>
8345 | 271702000000000 | 20240200000000 | 대구광역시 | 용산1동 | <NA>
24909 | 414802000000000 | 20231100000000 | 경기도 | 조리읍 | <NA>
15700 | 415702000000000 | 20240100000000 | 경기도 | 고촌읍 | <NA>
95419 | 411312000000000 | 20220800000000 | 서울특별시 | 서교동 | <NA>

 | 공장관리번호 | 최초등록일시 | 시도명 | 행정동명 | 변경주소
32236 | 2022943700 | 20231000000000 | 경기도 | 향남읍 | <NA>
1036 | 115302000000000 | 20240300000000 | 서울특별시 | 세곡동 | <NA>
33463 | 441312000000000 | 20230900000000 | <NA> | <NA> | <NA>
29581 | 414802000000000 | 20231000000000 | 경기도 | 법원읍 | <NA>
84374 | 115452000000000 | 20221100000000 | 경기도 | 별내동 | <NA>
23453 | 417302000000000 | 20231100000000 | 경기도 | 가남읍 | <NA>
44033 | 412852000000000 | 20230700000000 | 경기도 | 백석1동 | <NA>
32831 | 411502000000000 | 20230900000000 | 경기도 | 송산1동 | <NA>
36340 | 414302000000000 | 20230900000000 | 경기도 | 부곡동 | <NA>
31624 | 414612000000000 | 20231000000000 | 경기도 | 모현읍 | <NA>

Duplicate rows

Most frequently occurring

 | 공장관리번호 | 최초등록일시 | 시도명 | 행정동명 | 변경주소 | # duplicates
37 | 115452000000000 | 20221200000000 | 서울특별시 | 서초2동 | <NA> | 44
361 | 411132000000000 | 20231100000000 | 경기도 | 평동 | <NA> | 30
33 | 115452000000000 | 20221100000000 | 경기도 | 별내동 | <NA> | 24
461 | 412732000000000 | 20221100000000 | 경기도 | 초지동 | <NA> | 19
492 | 412732000000000 | 20240100000000 | 경기도 | 초지동 | <NA> | 19
36 | 115452000000000 | 20221200000000 | 서울특별시 | 가산동 | <NA> | 18
518 | 413902000000000 | 20220800000000 | 경기도 | 정왕1동 | <NA> | 17
454 | 412732000000000 | 20220800000000 | 경기도 | 초지동 | <NA> | 16
532 | 413902000000000 | 20221100000000 | 경기도 | 정왕1동 | <NA> | 13
538 | 413902000000000 | 20221200000000 | 경기도 | 정왕1동 | <NA> | 13