Overview

Dataset statistics

Number of variables: 5
Number of observations: 38
Missing cells: 66
Missing cells (%): 34.7%
Duplicate rows: 1
Duplicate rows (%): 2.6%
Total size in memory: 1.7 KiB
Average record size in memory: 46.5 B
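
The percentages and sizes in this table follow from the raw counts. As a quick consistency check, the short sketch below recomputes them. The variable names are illustrative only, and it assumes the total size is simply the average record size times the number of observations.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 38
missing_cells = 66
duplicate_rows = 1
avg_record_bytes = 46.5

total_cells = n_variables * n_observations            # 5 * 38 = 190 cells
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: total size = average record size * number of observations.
total_kib = avg_record_bytes * n_observations / 1024

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```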

Variable types

Numeric: 2
Text: 2
Unsupported: 1

Dataset

Description: Sample data
Author: (재)전남정보문화산업진흥원
URL: https://kadx.co.kr/opmk/frn/pmumkproductDetail/PMU_cea40bbf-90c0-4cc7-8c5a-3e2f392c18b1/5

Alerts

Dataset has 1 (2.6%) duplicate rows [Duplicates]
FAMP_ID is highly overall correlated with PHT_DT [High correlation]
PHT_DT is highly overall correlated with FAMP_ID [High correlation]
FAMP_ID has 7 (18.4%) missing values [Missing]
FMLD_ADDR has 7 (18.4%) missing values [Missing]
PHT_DT has 7 (18.4%) missing values [Missing]
FILE_NM has 7 (18.4%) missing values [Missing]
IMG_URL has 38 (100.0%) missing values [Missing]
IMG_URL is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
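
The missing-value alerts add up to the 66 missing cells reported in the overview, provided the 7 values missing from each of the four populated columns sit in the same 7 rows. The Sample section supports this, since rows 31 to 37 are entirely <NA>. The sketch below works through that arithmetic and shows what would remain after dropping those rows and the empty IMG_URL column. This is a hypothetical cleaning step, not something the report performs.

```python
n_rows, n_cols = 38, 5
empty_rows = 7            # rows where every column is missing (rows 31 to 37)
populated_cols = n_cols - 1   # FAMP_ID, FMLD_ADDR, PHT_DT, FILE_NM

# IMG_URL is missing in all 38 rows; the other columns only in the empty rows.
missing_cells = empty_rows * populated_cols + n_rows

# Hypothetical cleaning: drop the empty rows, then drop IMG_URL.
rows_after = n_rows - empty_rows
missing_after_row_drop = rows_after          # only IMG_URL is still missing
missing_after_col_drop = 0

print(missing_cells, rows_after, missing_after_row_drop, missing_after_col_drop)
```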

Reproduction

Analysis started: 2023-12-11 20:15:10.668122
Analysis finished: 2023-12-11 20:15:12.826303
Duration: 2.16 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

FAMP_ID
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 31
Distinct (%): 100.0%
Missing: 7
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 8030339.5
Minimum: 3150146
Maximum: 12928994
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 474.0 B

Quantile statistics

Minimum: 3150146
5-th percentile: 3394067.5
Q1: 4900834.5
Median: 7960020
Q3: 10582810
95-th percentile: 12923684
Maximum: 12928994
Range: 9778848
Interquartile range (IQR): 5681975.5

Descriptive statistics

Standard deviation: 3324959.2
Coefficient of variation (CV): 0.41404964
Kurtosis: -1.3421651
Mean: 8030339.5
Median Absolute Deviation (MAD): 2642375
Skewness: 0.1275724
Sum: 2.4894052 × 10^8
Variance: 1.1055354 × 10^13
Monotonicity: Not monotonic
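
Several of the FAMP_ID figures are derived from others in the same tables. The sketch below recomputes the range, IQR, coefficient of variation, sum, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation. It assumes n = 31 non-missing values (38 observations minus 7 missing), and results agree only up to the rounding of the printed inputs.

```python
# Values copied from the FAMP_ID statistics.
minimum, maximum = 3150146, 12928994
q1, q3 = 4900834.5, 10582810
mean, std = 8030339.5, 3324959.2
n = 38 - 7                       # non-missing observations

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
total = mean * n                 # Sum
variance = std ** 2              # Variance

print(value_range, iqr)
print(f"CV = {cv:.6f}, Sum = {total:.7e}, Variance = {variance:.7e}")
```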
Histogram with fixed size bins (bins=31)
Value              Count  Frequency (%)
4880046            1      2.6%
10596670           1      2.6%
6346639            1      2.6%
12582204           1      2.6%
10568950           1      2.6%
6382180            1      2.6%
3666984            1      2.6%
3150146            1      2.6%
12920875           1      2.6%
12928994           1      2.6%
Other values (21)  21     55.3%
(Missing)          7      18.4%

Value              Count  Frequency (%)
3150146            1      2.6%
3169863            1      2.6%
3618272            1      2.6%
3666984            1      2.6%
3939339            1      2.6%
4880046            1      2.6%
4880052            1      2.6%
4900543            1      2.6%
4901126            1      2.6%
6338062            1      2.6%

Value              Count  Frequency (%)
12928994           1      2.6%
12926492           1      2.6%
12920875           1      2.6%
12920266           1      2.6%
12584706           1      2.6%
12582204           1      2.6%
10602395           1      2.6%
10596670           1      2.6%
10568950           1      2.6%
10557859           1      2.6%

FMLD_ADDR
Text

MISSING 

Distinct: 31
Distinct (%): 100.0%
Missing: 7
Missing (%): 18.4%
Memory size: 436.0 B

Length

Max length: 27
Median length: 22
Mean length: 22.322581
Min length: 20

Characters and Unicode

Total characters: 692
Distinct characters: 93
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
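
As an illustration of those character properties, the sketch below classifies the characters of the first sample address with Python's standard `unicodedata` module. The two-letter codes it returns correspond to the category names used in this report (Lo = Other Letter, Nd = Decimal Number, Zs = Space Separator, Pd = Dash Punctuation). The profiler may compute its categories differently, so treat this as an approximation of the idea only.

```python
import unicodedata
from collections import Counter

# Map Unicode general-category codes to the names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
}

address = "경상북도 고령군 대가야읍 외리 859-7"   # first FMLD_ADDR sample row

# Count how many characters fall into each general category.
categories = Counter(unicodedata.category(ch) for ch in address)

for code, count in categories.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}: {count}")
```

Every address in the column appears to follow the same shape: the column totals of 124 spaces and 31 dashes over 31 values correspond to four spaces and one dash per address.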

Unique

Unique: 31
Unique (%): 100.0%

Sample

1st row: 경상북도 고령군 대가야읍 외리 859-7
2nd row: 경상북도 영천시 화산면 용평리 331-12
3rd row: 경상북도 영천시 화산면 용평리 331-1
4th row: 경상남도 창녕군 남지읍 칠현리 4-2
5th row: 충청남도 서산시 성연면 해성리 184-22

Value              Count  Frequency (%)
전라남도           9      5.8%
경상남도           8      5.2%
충청남도           7      4.5%
서산시             7      4.5%
경상북도           5      3.2%
고흥군             5      3.2%
화산면             4      2.6%
영천시             4      2.6%
창녕군             4      2.6%
합천군             4      2.6%
Other values (79)  98     63.2%

Most occurring characters

Value              Count  Frequency (%)
(space)            124    17.9%
(character lost)   32     4.6%
(character lost)   31     4.5%
(character lost)   31     4.5%
-                  31     4.5%
(character lost)   24     3.5%
0                  20     2.9%
(character lost)   18     2.6%
2                  17     2.5%
1                  16     2.3%
Other values (83)  348    50.3%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      410    59.2%
Decimal Number    127    18.4%
Space Separator   124    17.9%
Dash Punctuation  31     4.5%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
(character lost)   32     7.8%
(character lost)   31     7.6%
(character lost)   31     7.6%
(character lost)   24     5.9%
(character lost)   18     4.4%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
Other values (71)  209    51.0%

Decimal Number
Value  Count  Frequency (%)
0      20     15.7%
2      17     13.4%
1      16     12.6%
4      16     12.6%
5      13     10.2%
3      12     9.4%
7      12     9.4%
8      8      6.3%
9      7      5.5%
6      6      4.7%

Space Separator
Value    Count  Frequency (%)
(space)  124    100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      31     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  410    59.2%
Common  282    40.8%

Most frequent character per script

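Hangul
Value              Count  Frequency (%)
(character lost)   32     7.8%
(character lost)   31     7.6%
(character lost)   31     7.6%
(character lost)   24     5.9%
(character lost)   18     4.4%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
Other values (71)  209    51.0%

Common
Value             Count  Frequency (%)
(space)           124    44.0%
-                 31     11.0%
0                 20     7.1%
2                 17     6.0%
1                 16     5.7%
4                 16     5.7%
5                 13     4.6%
3                 12     4.3%
7                 12     4.3%
8                 8      2.8%
Other values (2)  13     4.6%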

Most occurring blocks

Value   Count  Frequency (%)
Hangul  410    59.2%
ASCII   282    40.8%

Most frequent character per block

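ASCII
Value             Count  Frequency (%)
(space)           124    44.0%
-                 31     11.0%
0                 20     7.1%
2                 17     6.0%
1                 16     5.7%
4                 16     5.7%
5                 13     4.6%
3                 12     4.3%
7                 12     4.3%
8                 8      2.8%
Other values (2)  13     4.6%

Hangul
Value              Count  Frequency (%)
(character lost)   32     7.8%
(character lost)   31     7.6%
(character lost)   31     7.6%
(character lost)   24     5.9%
(character lost)   18     4.4%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
Other values (71)  209    51.0%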

PHT_DT
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 31
Distinct (%): 100.0%
Missing: 7
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0230107 × 10^13
Minimum: 2.0230105 × 10^13
Maximum: 2.0230108 × 10^13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 474.0 B

Quantile statistics

Minimum: 2.0230105 × 10^13
5-th percentile: 2.0230105 × 10^13
Q1: 2.0230106 × 10^13
Median: 2.0230107 × 10^13
Q3: 2.0230107 × 10^13
95-th percentile: 2.0230108 × 10^13
Maximum: 2.0230108 × 10^13
Range: 3099208
Interquartile range (IQR): 1069177

Descriptive statistics

Standard deviation: 953394.03
Coefficient of variation (CV): 4.7127484 × 10^-8
Kurtosis: -0.38340394
Mean: 2.0230107 × 10^13
Median Absolute Deviation (MAD): 976305
Skewness: -0.56483503
Sum: 6.2713331 × 10^14
Variance: 9.0896018 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Value              Count  Frequency (%)
20230106044912     1      2.6%
20230108095558     1      2.6%
20230105043701     1      2.6%
20230107100831     1      2.6%
20230107023803     1      2.6%
20230107012504     1      2.6%
20230107023658     1      2.6%
20230106040941     1      2.6%
20230108110739     1      2.6%
20230108012238     1      2.6%
Other values (21)  21     55.3%
(Missing)          7      18.4%

Value              Count  Frequency (%)
20230105012522     1      2.6%
20230105043701     1      2.6%
20230105101831     1      2.6%
20230105103317     1      2.6%
20230106013426     1      2.6%
20230106031410     1      2.6%
20230106040941     1      2.6%
20230106044713     1      2.6%
20230106044912     1      2.6%
20230107010633     1      2.6%

Value              Count  Frequency (%)
20230108111730     1      2.6%
20230108110739     1      2.6%
20230108095558     1      2.6%
20230108091004     1      2.6%
20230108085346     1      2.6%
20230108023608     1      2.6%
20230108012238     1      2.6%
20230107115435     1      2.6%
20230107112544     1      2.6%
20230107111933     1      2.6%
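
The PHT_DT values look like timestamps packed into 14 digits rather than genuine real numbers, which would make statistics such as the mean and the range of 3099208 hard to interpret. The sketch below assumes the layout is YYYYMMDDHHMMSS. That layout is an inference from the values and from the matching file names, not something the report states, and the helper name `parse_pht_dt` is hypothetical. It parses the smallest and largest values and compares the numeric range with the elapsed time.

```python
from datetime import datetime

def parse_pht_dt(value):
    """Interpret a PHT_DT number such as 20230108023608 as YYYYMMDDHHMMSS (assumed layout)."""
    return datetime.strptime(str(int(value)), "%Y%m%d%H%M%S")

lowest_raw, highest_raw = 20230105012522, 20230108111730   # extreme values from the tables

lowest = parse_pht_dt(lowest_raw)
highest = parse_pht_dt(highest_raw)

numeric_range = highest_raw - lowest_raw   # what the report calls "Range"
elapsed = highest - lowest                 # actual time between the two photos

print(numeric_range)   # difference of the packed integers, not a duration
print(elapsed)         # timedelta between the two parsed timestamps
```

Under that assumption, converting the column to a datetime type before profiling would likely give more meaningful summary statistics.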

FILE_NM
Text

MISSING 

Distinct: 31
Distinct (%): 100.0%
Missing: 7
Missing (%): 18.4%
Memory size: 436.0 B

Length

Max length: 53
Median length: 48
Mean length: 48.322581
Min length: 46

Characters and Unicode

Total characters: 1498
Distinct characters: 98
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 100.0%

Sample

1st row: 03939339_경상북도 고령군 대가야읍 외리 859-7_230108023608.jpg
2nd row: 04880046_경상북도 영천시 화산면 용평리 331-12_230106044912.jpg
3rd row: 04880052_경상북도 영천시 화산면 용평리 331-1_230106044713.jpg
4th row: 12920266_경상남도 창녕군 남지읍 칠현리 4-2_230108111730.jpg
5th row: 10557859_충청남도 서산시 성연면 해성리 184-22_230107010633.jpg
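
The sample file names appear to be composed of the other three columns: a zero-padded 8-digit FAMP_ID, the FMLD_ADDR text, and the last 12 digits of PHT_DT, joined by underscores and followed by `.jpg`. The sketch below splits a name on that assumed pattern. The pattern, the helper name `split_file_nm`, and the "20" century prefix are all inferred from the five sample rows and may not hold for every record.

```python
import re

# Assumed layout: <8-digit id>_<address>_<yymmddHHMMSS>.jpg
FILE_NM_PATTERN = re.compile(
    r"^(?P<famp_id>\d{8})_(?P<address>.+)_(?P<stamp>\d{12})\.jpg$"
)

def split_file_nm(file_nm):
    """Split a FILE_NM value into its assumed FAMP_ID, FMLD_ADDR and PHT_DT parts."""
    match = FILE_NM_PATTERN.match(file_nm)
    if match is None:
        raise ValueError(f"unexpected FILE_NM layout: {file_nm!r}")
    return {
        "famp_id": int(match["famp_id"]),        # drops the zero padding
        "address": match["address"],
        "pht_dt": int("20" + match["stamp"]),    # assumed century prefix
    }

file_nm = "03939339_경상북도 고령군 대가야읍 외리 859-7_230108023608.jpg"
parts = split_file_nm(file_nm)
print(parts)
```

If the pattern holds, each file name is 26 characters longer than its address, which agrees with the reported lengths: minimum 46 against 20, maximum 53 against 27, and mean 48.322581 against 22.322581.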

Value               Count  Frequency (%)
서산시              7      4.5%
고흥군              5      3.2%
영천시              4      2.6%
화산면              4      2.6%
창녕군              4      2.6%
합천군              4      2.6%
해남군              3      1.9%
송지면              3      1.9%
율곡면              3      1.9%
점암면              3      1.9%
Other values (105)  115    74.2%

Most occurring characters

Value              Count  Frequency (%)
0                  171    11.4%
(space)            124    8.3%
1                  106    7.1%
3                  91     6.1%
2                  90     6.0%
_                  62     4.1%
6                  55     3.7%
4                  53     3.5%
7                  49     3.3%
5                  47     3.1%
Other values (88)  650    43.4%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         747    49.9%
Other Letter           410    27.4%
Space Separator        124    8.3%
Lowercase Letter       93     6.2%
Connector Punctuation  62     4.1%
Other Punctuation      31     2.1%
Dash Punctuation       31     2.1%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
(character lost)   32     7.8%
(character lost)   31     7.6%
(character lost)   31     7.6%
(character lost)   24     5.9%
(character lost)   18     4.4%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
Other values (71)  209    51.0%

Decimal Number
Value  Count  Frequency (%)
0      171    22.9%
1      106    14.2%
3      91     12.2%
2      90     12.0%
6      55     7.4%
4      53     7.1%
7      49     6.6%
5      47     6.3%
9      43     5.8%
8      42     5.6%

Lowercase Letter
Value  Count  Frequency (%)
g      31     33.3%
p      31     33.3%
j      31     33.3%

Space Separator
Value    Count  Frequency (%)
(space)  124    100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      62     100.0%

Other Punctuation
Value  Count  Frequency (%)
.      31     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      31     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  995    66.4%
Hangul  410    27.4%
Latin   93     6.2%

Most frequent character per script

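Hangul
Value              Count  Frequency (%)
(character lost)   32     7.8%
(character lost)   31     7.6%
(character lost)   31     7.6%
(character lost)   24     5.9%
(character lost)   18     4.4%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
Other values (71)  209    51.0%

Common
Value             Count  Frequency (%)
0                 171    17.2%
(space)           124    12.5%
1                 106    10.7%
3                 91     9.1%
2                 90     9.0%
_                 62     6.2%
6                 55     5.5%
4                 53     5.3%
7                 49     4.9%
5                 47     4.7%
Other values (4)  147    14.8%

Latin
Value  Count  Frequency (%)
g      31     33.3%
p      31     33.3%
j      31     33.3%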

Most occurring blocks

Value   Count  Frequency (%)
ASCII   1088   72.6%
Hangul  410    27.4%

Most frequent character per block

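ASCII
Value             Count  Frequency (%)
0                 171    15.7%
(space)           124    11.4%
1                 106    9.7%
3                 91     8.4%
2                 90     8.3%
_                 62     5.7%
6                 55     5.1%
4                 53     4.9%
7                 49     4.5%
5                 47     4.3%
Other values (7)  240    22.1%

Hangul
Value              Count  Frequency (%)
(character lost)   32     7.8%
(character lost)   31     7.6%
(character lost)   31     7.6%
(character lost)   24     5.9%
(character lost)   18     4.4%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
(character lost)   13     3.2%
Other values (71)  209    51.0%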

IMG_URL
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 38
Missing (%): 100.0%
Memory size: 474.0 B

Interactions

[Four scatter plots of pairwise interactions between the numeric variables FAMP_ID and PHT_DT; images not preserved]

Correlations

           FAMP_ID  FMLD_ADDR  PHT_DT  FILE_NM
FAMP_ID    1.000    1.000      0.536   1.000
FMLD_ADDR  1.000    1.000      1.000   1.000
PHT_DT     0.536    1.000      1.000   1.000
FILE_NM    1.000    1.000      1.000   1.000

         FAMP_ID  PHT_DT
FAMP_ID  1.000    0.536
PHT_DT   0.536    1.000
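
The report does not say which coefficient produced the 0.536 between FAMP_ID and PHT_DT. For numeric pairs, a rank-based measure such as Spearman's rho is a common choice, so the sketch below implements it in plain Python as an assumption, not as the profiler's confirmed method. It uses only the 13 complete rows shown in the Sample section, so its result should not be expected to reproduce 0.536, which was computed on all 31 non-missing rows. The simple formula applies only when there are no tied values, which holds here because both columns are 100% distinct.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# (FAMP_ID, PHT_DT) pairs from the 13 populated rows of the Sample section.
pairs = [
    (3939339, 20230108023608), (4880046, 20230106044912),
    (4880052, 20230106044713), (12920266, 20230108111730),
    (10557859, 20230107010633), (7156762, 20230106031410),
    (12926492, 20230107030924), (10602395, 20230107035933),
    (9646141, 20230107111933), (6370737, 20230107105635),
    (12582204, 20230107100831), (6346639, 20230105043701),
    (10596670, 20230108095558),
]
famp_id = [p[0] for p in pairs]
pht_dt = [p[1] for p in pairs]

rho = spearman(famp_id, pht_dt)
print(f"Spearman's rho on the 13 sample rows: {rho:.3f}")
```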

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

    FAMP_ID   FMLD_ADDR  PHT_DT  FILE_NM  IMG_URL
0   3939339   경상북도 고령군 대가야읍 외리 859-7  20230108023608  03939339_경상북도 고령군 대가야읍 외리 859-7_230108023608.jpg  <NA>
1   4880046   경상북도 영천시 화산면 용평리 331-12  20230106044912  04880046_경상북도 영천시 화산면 용평리 331-12_230106044912.jpg  <NA>
2   4880052   경상북도 영천시 화산면 용평리 331-1  20230106044713  04880052_경상북도 영천시 화산면 용평리 331-1_230106044713.jpg  <NA>
3   12920266  경상남도 창녕군 남지읍 칠현리 4-2  20230108111730  12920266_경상남도 창녕군 남지읍 칠현리 4-2_230108111730.jpg  <NA>
4   10557859  충청남도 서산시 성연면 해성리 184-22  20230107010633  10557859_충청남도 서산시 성연면 해성리 184-22_230107010633.jpg  <NA>
5   7156762   전라남도 신안군 안좌면 대척리 830-0  20230106031410  07156762_전라남도 신안군 안좌면 대척리 830-0_230106031410.jpg  <NA>
6   12926492  경상남도 합천군 율곡면 율진리 423-0  20230107030924  12926492_경상남도 합천군 율곡면 율진리 423-0_230107030924.jpg  <NA>
7   10602395  충청남도 서산시 대산읍 오지리 607-2  20230107035933  10602395_충청남도 서산시 대산읍 오지리 607-2_230107035933.jpg  <NA>
8   9646141   제주특별자치도 서귀포시 대정읍 무릉리 2249-3  20230107111933  09646141_제주특별자치도 서귀포시 대정읍 무릉리 2249-3_230107111933.jpg  <NA>
9   6370737   전라남도 고흥군 대서면 안남리 495-1  20230107105635  06370737_전라남도 고흥군 대서면 안남리 495-1_230107105635.jpg  <NA>

Last rows

    FAMP_ID   FMLD_ADDR  PHT_DT  FILE_NM  IMG_URL
28  12582204  충청남도 서산시 지곡면 무장리 145-0  20230107100831  12582204_충청남도 서산시 지곡면 무장리 145-0_230107100831.jpg  <NA>
29  6346639   전라남도 고흥군 점암면 화계리 567-0  20230105043701  06346639_전라남도 고흥군 점암면 화계리 567-0_230105043701.jpg  <NA>
30  10596670  충청남도 서산시 대산읍 대로리 407-0  20230108095558  10596670_충청남도 서산시 대산읍 대로리 407-0_230108095558.jpg  <NA>
31  <NA>      <NA>  <NA>  <NA>  <NA>
32  <NA>      <NA>  <NA>  <NA>  <NA>
33  <NA>      <NA>  <NA>  <NA>  <NA>
34  <NA>      <NA>  <NA>  <NA>  <NA>
35  <NA>      <NA>  <NA>  <NA>  <NA>
36  <NA>      <NA>  <NA>  <NA>  <NA>
37  <NA>      <NA>  <NA>  <NA>  <NA>

Duplicate rows

Most frequently occurring

   FAMP_ID  FMLD_ADDR  PHT_DT  FILE_NM  # duplicates
0  <NA>     <NA>       <NA>    <NA>     7
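
This table lists one row pattern occurring 7 times, while the overview reports 1 duplicate row (2.6%). One plausible reading is that the overview counts distinct duplicated patterns rather than repeated rows; 1 of 38 is indeed 2.6%. The sketch below illustrates that reading with the standard library, using the last ten rows of the Sample section restricted to the four columns shown in this table. How the profiler actually counts duplicates is an assumption here.

```python
from collections import Counter

NA = None  # stand-in for <NA>

# Rows 28 to 37 of the sample: (FAMP_ID, FMLD_ADDR, PHT_DT, FILE_NM).
rows = [
    (12582204, "충청남도 서산시 지곡면 무장리 145-0", 20230107100831,
     "12582204_충청남도 서산시 지곡면 무장리 145-0_230107100831.jpg"),
    (6346639, "전라남도 고흥군 점암면 화계리 567-0", 20230105043701,
     "06346639_전라남도 고흥군 점암면 화계리 567-0_230105043701.jpg"),
    (10596670, "충청남도 서산시 대산읍 대로리 407-0", 20230108095558,
     "10596670_충청남도 서산시 대산읍 대로리 407-0_230108095558.jpg"),
] + [(NA, NA, NA, NA)] * 7

# Keep only the row patterns that occur more than once.
counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

print(len(duplicated))            # number of distinct duplicated patterns
print(list(duplicated.values()))  # occurrences of each pattern
```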