Overview

Dataset statistics

Number of variables: 5
Number of observations: 22
Missing cells: 11
Missing cells (%): 10.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1012.0 B
Average record size in memory: 46.0 B
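
The derived figures in this block follow from the raw counts. The sketch below re-derives them as a sanity check. The variable names are illustrative and are not taken from the profiling tool.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 5
n_observations = 22
missing_cells = 11
total_bytes = 1012.0

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_bytes / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing-cell percentage is taken over all 110 cells, not over the 22 rows.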

Variable types

Text: 3
Categorical: 2

Dataset

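Description: Status of the personal information files held by Yongin Urban Corporation (용인도시공사). The list changes whenever a file's retention period expires or a new file is added.
URL: https://www.data.go.kr/data/15060015/fileData.do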

Alerts

보유기간 is highly overall correlated with 개인정보 처리방법 (High correlation)
개인정보 처리방법 is highly overall correlated with 보유기간 (High correlation)
개인정보의항목(선택) has 11 (50.0%) missing values (Missing)
파일명 has unique values (Unique)

Reproduction

Analysis started: 2023-12-11 23:58:05.542461
Analysis finished: 2023-12-11 23:58:05.925000
Duration: 0.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

파일명
Text

UNIQUE 

Distinct: 22
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 23
Median length: 19
Mean length: 14.363636
Min length: 10

Characters and Unicode

Total characters: 316
Distinct characters: 102
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
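
As a rough illustration of how such per-category counts can be obtained, the sketch below uses Python's standard `unicodedata` module on the 3rd sample row of 파일명. Using `unicodedata` is an assumption made for illustration, and the report's own implementation may differ. In the two-letter codes, `Lo` is "Other Letter", `Zs` is "Space Separator" and `Po` is "Other Punctuation".

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Lo', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# 3rd sample row of the 파일명 variable, copied from this report.
sample = "광교, 흥덕 분양 입주자 정보"
counts = category_counts(sample)

for code, count in counts.most_common():
    print(code, count)
```

Applying the same function to every value in a column and summing the counters would give a column-level table of the kind shown in this section.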

Unique

Unique: 22
Unique (%): 100.0%

Sample

1st row: 기술자문위원회 위원
2nd row: 기존주택 전세임대사업
3rd row: 광교, 흥덕 분양 입주자 정보
4th row: 역분 분양입주자 정보
5th row: 주차미납압류 대상자 정보

Value | Count | Frequency (%)
정보 | 11 | 15.7%
회원 | 4 | 5.7%
관계인 | 3 | 4.3%
(not recovered) | 3 | 4.3%
보상관련 | 3 | 4.3%
소유자 | 3 | 4.3%
홈페이지 | 3 | 4.3%
회원관리 | 2 | 2.9%
교통약자 | 2 | 2.9%
회원정보 | 2 | 2.9%
Other values (34) | 34 | 48.6%

Most occurring characters

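Value | Count | Frequency (%)
(space) | 48 | 15.2%
(not recovered) | 21 | 6.6%
(not recovered) | 18 | 5.7%
(not recovered) | 11 | 3.5%
(not recovered) | 11 | 3.5%
(not recovered) | 10 | 3.2%
(not recovered) | 10 | 3.2%
(not recovered) | 6 | 1.9%
(not recovered) | 5 | 1.6%
(not recovered) | 5 | 1.6%
Other values (92) | 171 | 54.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 260 | 82.3%
Space Separator | 48 | 15.2%
Open Punctuation | 3 | 0.9%
Close Punctuation | 3 | 0.9%
Decimal Number | 1 | 0.3%
Other Punctuation | 1 | 0.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not recovered) | 21 | 8.1%
(not recovered) | 18 | 6.9%
(not recovered) | 11 | 4.2%
(not recovered) | 11 | 4.2%
(not recovered) | 10 | 3.8%
(not recovered) | 10 | 3.8%
(not recovered) | 6 | 2.3%
(not recovered) | 5 | 1.9%
(not recovered) | 5 | 1.9%
(not recovered) | 5 | 1.9%
Other values (87) | 158 | 60.8%

Space Separator
Value | Count | Frequency (%)
(space) | 48 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%

Decimal Number
Value | Count | Frequency (%)
1 | 1 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 260 | 82.3%
Common | 56 | 17.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not recovered) | 21 | 8.1%
(not recovered) | 18 | 6.9%
(not recovered) | 11 | 4.2%
(not recovered) | 11 | 4.2%
(not recovered) | 10 | 3.8%
(not recovered) | 10 | 3.8%
(not recovered) | 6 | 2.3%
(not recovered) | 5 | 1.9%
(not recovered) | 5 | 1.9%
(not recovered) | 5 | 1.9%
Other values (87) | 158 | 60.8%

Common
Value | Count | Frequency (%)
(space) | 48 | 85.7%
( | 3 | 5.4%
) | 3 | 5.4%
1 | 1 | 1.8%
, | 1 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 260 | 82.3%
ASCII | 56 | 17.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 48 | 85.7%
( | 3 | 5.4%
) | 3 | 5.4%
1 | 1 | 1.8%
, | 1 | 1.8%

Hangul
Value | Count | Frequency (%)
(not recovered) | 21 | 8.1%
(not recovered) | 18 | 6.9%
(not recovered) | 11 | 4.2%
(not recovered) | 11 | 4.2%
(not recovered) | 10 | 3.8%
(not recovered) | 10 | 3.8%
(not recovered) | 6 | 2.3%
(not recovered) | 5 | 1.9%
(not recovered) | 5 | 1.9%
(not recovered) | 5 | 1.9%
Other values (87) | 158 | 60.8%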

보유기간
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 45.5%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
Most frequent categories: 회원탈퇴시 까지; 영구; 준영구; 5년; 3년; Other values (5)

Length

Max length: 13
Median length: 8
Mean length: 4.9545455
Min length: 2

Unique

Unique: 5
Unique (%): 22.7%

Sample

1st row: 2년
2nd row: 영구
3rd row: 준영구
4th row: 준영구
5th row: 영구

Common Values

Value | Count | Frequency (%)
회원탈퇴시 까지 | 8 | 36.4%
영구 | 3 | 13.6%
준영구 | 2 | 9.1%
5년 | 2 | 9.1%
3년 | 2 | 9.1%
2년 | 1 | 4.5%
협약해지시 | 1 | 4.5%
10년 | 1 | 4.5%
1년 | 1 | 4.5%
지정판매소 해지(폐업)시 | 1 | 4.5%
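
The "Distinct" and "Unique" figures for 보유기간 can be re-derived from these value counts: distinct counts every different value, while unique counts only values that occur exactly once. The sketch below rebuilds the column from the counts in this table. The variable names are illustrative.

```python
from collections import Counter

# Value counts copied from the "Common Values" table for 보유기간.
value_counts = {
    "회원탈퇴시 까지": 8,
    "영구": 3,
    "준영구": 2,
    "5년": 2,
    "3년": 2,
    "2년": 1,
    "협약해지시": 1,
    "10년": 1,
    "1년": 1,
    "지정판매소 해지(폐업)시": 1,
}

# Expand the counts back into a 22-element column (row order is not preserved).
column = [value for value, count in value_counts.items() for _ in range(count)]

counts = Counter(column)
n = len(column)
distinct = len(counts)                                 # different values
unique = sum(1 for c in counts.values() if c == 1)     # values seen exactly once

print(f"distinct: {distinct} ({100 * distinct / n:.1f}%)")
print(f"unique:   {unique} ({100 * unique / n:.1f}%)")
```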

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
회원탈퇴시 | 8 | 25.8%
까지 | 8 | 25.8%
영구 | 3 | 9.7%
준영구 | 2 | 6.5%
5년 | 2 | 6.5%
3년 | 2 | 6.5%
2년 | 1 | 3.2%
협약해지시 | 1 | 3.2%
10년 | 1 | 3.2%
1년 | 1 | 3.2%
Other values (2) | 2 | 6.5%

개인정보의항목(필수)
Text

Distinct: 19
Distinct (%): 86.4%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B

Length

Max length: 45
Median length: 32
Mean length: 23.545455
Min length: 7

Characters and Unicode

Total characters: 518
Distinct characters: 51
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 77.3%

Sample

1st row: 이름, 생년월일, 집연락처, 집주소, 핸드폰, E-Mail, 직장연락처, 직장주소
2nd row: 이름, 핸드폰, 주민번호, 집주소
3rd row: 이름, 핸드폰, 주민등록번호
4th row: 이름, 집주소, 핸드폰, 주민번호
5th row: 이름, 집주소, 차량번호

Value | Count | Frequency (%)
이름 | 22 | 21.8%
집주소 | 17 | 16.8%
핸드폰 | 17 | 16.8%
생년월일 | 13 | 12.9%
e-mail | 7 | 6.9%
주민번호 | 7 | 6.9%
외국인등록번호 | 3 | 3.0%
직장주소 | 3 | 3.0%
차량번호 | 2 | 2.0%
핸드폰(연락처 | 2 | 2.0%
Other values (7) | 8 | 7.9%

Most occurring characters

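Value | Count | Frequency (%)
(space) | 80 | 15.4%
, | 78 | 15.1%
(not recovered) | 28 | 5.4%
(not recovered) | 22 | 4.2%
(not recovered) | 22 | 4.2%
(not recovered) | 20 | 3.9%
(not recovered) | 19 | 3.7%
(not recovered) | 19 | 3.7%
(not recovered) | 19 | 3.7%
(not recovered) | 19 | 3.7%
Other values (41) | 192 | 37.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 314 | 60.6%
Space Separator | 80 | 15.4%
Other Punctuation | 78 | 15.1%
Lowercase Letter | 24 | 4.6%
Uppercase Letter | 11 | 2.1%
Dash Punctuation | 7 | 1.4%
Open Punctuation | 2 | 0.4%
Close Punctuation | 2 | 0.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not recovered) | 28 | 8.9%
(not recovered) | 22 | 7.0%
(not recovered) | 22 | 7.0%
(not recovered) | 20 | 6.4%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 14 | 4.5%
(not recovered) | 14 | 4.5%
Other values (30) | 118 | 37.6%

Lowercase Letter
Value | Count | Frequency (%)
l | 7 | 29.2%
i | 7 | 29.2%
a | 7 | 29.2%
m | 3 | 12.5%

Uppercase Letter
Value | Count | Frequency (%)
E | 7 | 63.6%
M | 4 | 36.4%

Space Separator
Value | Count | Frequency (%)
(space) | 80 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 78 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 7 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 314 | 60.6%
Common | 169 | 32.6%
Latin | 35 | 6.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not recovered) | 28 | 8.9%
(not recovered) | 22 | 7.0%
(not recovered) | 22 | 7.0%
(not recovered) | 20 | 6.4%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 14 | 4.5%
(not recovered) | 14 | 4.5%
Other values (30) | 118 | 37.6%

Latin
Value | Count | Frequency (%)
E | 7 | 20.0%
l | 7 | 20.0%
i | 7 | 20.0%
a | 7 | 20.0%
M | 4 | 11.4%
m | 3 | 8.6%

Common
Value | Count | Frequency (%)
(space) | 80 | 47.3%
, | 78 | 46.2%
- | 7 | 4.1%
( | 2 | 1.2%
) | 2 | 1.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 314 | 60.6%
ASCII | 204 | 39.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 80 | 39.2%
, | 78 | 38.2%
E | 7 | 3.4%
l | 7 | 3.4%
i | 7 | 3.4%
a | 7 | 3.4%
- | 7 | 3.4%
M | 4 | 2.0%
m | 3 | 1.5%
( | 2 | 1.0%

Hangul
Value | Count | Frequency (%)
(not recovered) | 28 | 8.9%
(not recovered) | 22 | 7.0%
(not recovered) | 22 | 7.0%
(not recovered) | 20 | 6.4%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 19 | 6.1%
(not recovered) | 14 | 4.5%
(not recovered) | 14 | 4.5%
Other values (30) | 118 | 37.6%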

개인정보의항목(선택)
Text

MISSING

Distinct: 9
Distinct (%): 81.8%
Missing: 11
Missing (%): 50.0%
Memory size: 308.0 B

Length

Max length: 68
Median length: 47
Mean length: 20.454545
Min length: 4

Characters and Unicode

Total characters: 225
Distinct characters: 69
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 72.7%

Sample

1st row: E-mail, 집연락처, 직장연락처 ,여권번호, 외국인등록번호
2nd row: 집연락처, 집주소, E-Mail, 직장연락처, 직장주소, 운전면허번호, 외국인등록번호
3rd row: 감면정보 등록사항(차량등록증, 장애인 복지카드, 고엽제후유의중환자 여부, 국가유공자 여부, 주민등록등본, 병역명문가 여부)
4th row: 집연락처
5th row: 집연락처, 핸드폰, E-Mail

Value | Count | Frequency (%)
직장주소 | 5 | 13.5%
집연락처 | 5 | 13.5%
e-mail | 5 | 13.5%
직장연락처 | 4 | 10.8%
여부 | 3 | 8.1%
외국인등록번호 | 2 | 5.4%
고엽제후유의중환자 | 1 | 2.7%
핸드폰 | 1 | 2.7%
병역명문가 | 1 | 2.7%
주민등록등본 | 1 | 2.7%
Other values (9) | 9 | 24.3%

Most occurring characters

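Value | Count | Frequency (%)
(space) | 27 | 12.0%
, | 21 | 9.3%
(not recovered) | 10 | 4.4%
(not recovered) | 9 | 4.0%
(not recovered) | 9 | 4.0%
(not recovered) | 9 | 4.0%
(not recovered) | 9 | 4.0%
(not recovered) | 7 | 3.1%
(not recovered) | 6 | 2.7%
(not recovered) | 6 | 2.7%
Other values (59) | 112 | 49.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 145 | 64.4%
Space Separator | 27 | 12.0%
Other Punctuation | 21 | 9.3%
Lowercase Letter | 18 | 8.0%
Uppercase Letter | 7 | 3.1%
Dash Punctuation | 5 | 2.2%
Close Punctuation | 1 | 0.4%
Open Punctuation | 1 | 0.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not recovered) | 10 | 6.9%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 7 | 4.8%
(not recovered) | 6 | 4.1%
(not recovered) | 6 | 4.1%
(not recovered) | 6 | 4.1%
(not recovered) | 5 | 3.4%
Other values (48) | 69 | 47.6%

Lowercase Letter
Value | Count | Frequency (%)
i | 5 | 27.8%
l | 5 | 27.8%
a | 5 | 27.8%
m | 3 | 16.7%

Uppercase Letter
Value | Count | Frequency (%)
E | 5 | 71.4%
M | 2 | 28.6%

Space Separator
Value | Count | Frequency (%)
(space) | 27 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 21 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 145 | 64.4%
Common | 55 | 24.4%
Latin | 25 | 11.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not recovered) | 10 | 6.9%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 7 | 4.8%
(not recovered) | 6 | 4.1%
(not recovered) | 6 | 4.1%
(not recovered) | 6 | 4.1%
(not recovered) | 5 | 3.4%
Other values (48) | 69 | 47.6%

Latin
Value | Count | Frequency (%)
E | 5 | 20.0%
i | 5 | 20.0%
l | 5 | 20.0%
a | 5 | 20.0%
m | 3 | 12.0%
M | 2 | 8.0%

Common
Value | Count | Frequency (%)
(space) | 27 | 49.1%
, | 21 | 38.2%
- | 5 | 9.1%
) | 1 | 1.8%
( | 1 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 145 | 64.4%
ASCII | 80 | 35.6%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 27 | 33.8%
, | 21 | 26.2%
E | 5 | 6.2%
i | 5 | 6.2%
l | 5 | 6.2%
- | 5 | 6.2%
a | 5 | 6.2%
m | 3 | 3.8%
M | 2 | 2.5%
) | 1 | 1.2%

Hangul
Value | Count | Frequency (%)
(not recovered) | 10 | 6.9%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 9 | 6.2%
(not recovered) | 7 | 4.8%
(not recovered) | 6 | 4.1%
(not recovered) | 6 | 4.1%
(not recovered) | 6 | 4.1%
(not recovered) | 5 | 3.4%
Other values (48) | 69 | 47.6%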

개인정보 처리방법
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 40.9%
Missing: 0
Missing (%): 0.0%
Memory size: 308.0 B
Most frequent categories: 개인정보처리시스템; 개인정보처리시스템, 업무용PC, 종이문서; 종이문서; 개인정보처리시스템, 업무용 PC; 업무용 PC; Other values (4)

Length

Max length: 22
Median length: 17
Mean length: 12.181818
Min length: 4

Unique

Unique: 3
Unique (%): 13.6%

Sample

1st row: 업무용 PC
2nd row: 개인정보처리시스템, 업무용PC, 종이문서
3rd row: 종이문서
4th row: 종이문서
5th row: 개인정보처리시스템, 업무용 PC

Common Values

Value | Count | Frequency (%)
개인정보처리시스템 | 6 | 27.3%
개인정보처리시스템, 업무용PC, 종이문서 | 3 | 13.6%
종이문서 | 3 | 13.6%
개인정보처리시스템, 업무용 PC | 3 | 13.6%
업무용 PC | 2 | 9.1%
개인정보시스템, 업무용 PC, 종이문서 | 2 | 9.1%
업무용PC, 종이문서 | 1 | 4.5%
<NA> | 1 | 4.5%
개인정보처리시스템, 업무용PC | 1 | 4.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
개인정보처리시스템 | 13 | 29.5%
종이문서 | 9 | 20.5%
업무용 | 7 | 15.9%
pc | 7 | 15.9%
업무용pc | 5 | 11.4%
개인정보시스템 | 2 | 4.5%
na | 1 | 2.3%

Correlations

Variable | 파일명 | 보유기간 | 개인정보의항목(필수) | 개인정보의항목(선택) | 개인정보 처리방법
파일명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
보유기간 | 1.000 | 1.000 | 0.924 | 0.820 | 0.857
개인정보의항목(필수) | 1.000 | 0.924 | 1.000 | 1.000 | 0.930
개인정보의항목(선택) | 1.000 | 0.820 | 1.000 | 1.000 | 0.931
개인정보 처리방법 | 1.000 | 0.857 | 0.930 | 0.931 | 1.000

Variable | 개인정보 처리방법 | 보유기간
개인정보 처리방법 | 1.000 | 0.572
보유기간 | 0.572 | 1.000

Variable | 보유기간 | 개인정보 처리방법
보유기간 | 1.000 | 0.572
개인정보 처리방법 | 0.572 | 1.000
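
The extracted tables do not say which association measure they show. Cramér's V is one common choice for two categorical columns, so the sketch below implements the plain (uncorrected) form in pure Python as an illustration only. Treating it as the measure behind the tables above is an assumption, and it is not expected to reproduce the 0.572 value: the function name and toy data are hypothetical, and profiling tools may apply a bias correction that this version omits.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length categorical sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return float("nan")  # undefined when a column has fewer than 2 levels
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical toy data, not taken from the dataset.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

With only 22 rows and up to 10 levels per column, most cells of the contingency table are empty or hold a single row, so any such coefficient for this dataset should be read with caution.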

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
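
The charts themselves did not survive extraction. As a rough stand-in, the sketch below counts missing values per column for the first five rows listed in the Sample section, restricted to two columns for brevity. Representing `<NA>` as `None` and the helper name `missing_by_column` are assumptions made for illustration.

```python
# First five rows of the Sample section, two columns only; <NA> is stored as None.
rows = [
    {"파일명": "기술자문위원회 위원", "개인정보의항목(선택)": None},
    {"파일명": "기존주택 전세임대사업",
     "개인정보의항목(선택)": "E-mail, 집연락처, 직장연락처 ,여권번호, 외국인등록번호"},
    {"파일명": "광교, 흥덕 분양 입주자 정보",
     "개인정보의항목(선택)": "집연락처, 집주소, E-Mail, 직장연락처, 직장주소, 운전면허번호, 외국인등록번호"},
    {"파일명": "역분 분양입주자 정보", "개인정보의항목(선택)": None},
    {"파일명": "주차미납압류 대상자 정보", "개인정보의항목(선택)": None},
]


def missing_by_column(records):
    """Return the number of None values per column, using the first record's keys."""
    columns = list(records[0])
    return {col: sum(1 for rec in records if rec.get(col) is None) for col in columns}


missing = missing_by_column(rows)
for column, count in missing.items():
    print(f"{column}: {count} of {len(rows)} missing")
```

Over the full 22 rows the report gives 11 missing values for 개인정보의항목(선택) and none for 파일명.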

Sample

First rows

Index | 파일명 | 보유기간 | 개인정보의항목(필수) | 개인정보의항목(선택) | 개인정보 처리방법
0 | 기술자문위원회 위원 | 2년 | 이름, 생년월일, 집연락처, 집주소, 핸드폰, E-Mail, 직장연락처, 직장주소 | <NA> | 업무용 PC
1 | 기존주택 전세임대사업 | 영구 | 이름, 핸드폰, 주민번호, 집주소 | E-mail, 집연락처, 직장연락처 ,여권번호, 외국인등록번호 | 개인정보처리시스템, 업무용PC, 종이문서
2 | 광교, 흥덕 분양 입주자 정보 | 준영구 | 이름, 핸드폰, 주민등록번호 | 집연락처, 집주소, E-Mail, 직장연락처, 직장주소, 운전면허번호, 외국인등록번호 | 종이문서
3 | 역분 분양입주자 정보 | 준영구 | 이름, 집주소, 핸드폰, 주민번호 | <NA> | 종이문서
4 | 주차미납압류 대상자 정보 | 영구 | 이름, 집주소, 차량번호 | <NA> | 개인정보처리시스템, 업무용 PC
5 | 용인시공영주차장 홈페이지 회원 정보 | 회원탈퇴시 까지 | 이름, 집주소, 핸드폰, E-Mail, 차량번호, 감면관련 자료 | 감면정보 등록사항(차량등록증, 장애인 복지카드, 고엽제후유의중환자 여부, 국가유공자 여부, 주민등록등본, 병역명문가 여부) | 개인정보처리시스템, 업무용 PC
6 | 교통약자 이용자 정보 | 회원탈퇴시 까지 | 이름, 생년월일, 핸드폰, 건강정보 | 집연락처 | 개인정보시스템, 업무용 PC, 종이문서
7 | 교통약자 개인콜택시 기사관리 | 협약해지시 | 이름, 생년월일, 집주소, 핸드폰, 운전면허번호 | <NA> | 개인정보시스템, 업무용 PC, 종이문서
8 | 남사스포츠센터 회원관리 | 회원탈퇴시 까지 | 이름, 생년월일, 집주소 | 집연락처, 핸드폰, E-Mail | 개인정보처리시스템
9 | 정보공개모니터단 정보 | 10년 | 이름, 생년월일, 집주소, 핸드폰, E-mail, 주민번호 | 집연락처, 직장연락처, 직장주소 | 업무용PC, 종이문서

Last rows

Index | 파일명 | 보유기간 | 개인정보의항목(필수) | 개인정보의항목(선택) | 개인정보 처리방법
12 | 미르스타디움 대관자 정보 | 5년 | 이름, 핸드폰 | 직장주소 | 종이문서
13 | 보상관련 소유자 및 관계인 정보(보상1팀) | 영구 | 이름, 생년월일, 집주소, 핸드폰, 주민번호, 외국인등록번호 | <NA> | 개인정보처리시스템, 업무용 PC
14 | 보상관련 소유자 및 관계인 정보(반도체) | 3년 | 이름, 생년월일, 집주소, 핸드폰, 주민번호, 외국인등록번호 | 직장주소 | 개인정보처리시스템, 업무용PC, 종이문서
15 | 보상관련 소유자 및 관계인 정보(플랫폼) | 3년 | 이름, 생년월일, 집주소, 핸드폰, 주민번호, 외국인등록번호 | 직장주소 | 개인정보처리시스템, 업무용PC, 종이문서
16 | 모현복지회관 회원 정보 | 5년 | 이름, 생년월일, 집주소, 핸드폰 | <NA> | <NA>
17 | 종량제물품 지정판매소 정보 | 지정판매소 해지(폐업)시 | 이름, 직장주소 | 생년월일, E-mail, 직장연락처 | 개인정보처리시스템
18 | 시민체육센터 회원 정보 | 회원탈퇴시 까지 | 이름, 집주소, 직장주소, E-Mail, 핸드폰(연락처), 생년월일 | <NA> | 개인정보처리시스템
19 | 생활체육실 회원 정보 | 회원탈퇴시 까지 | 이름, 생년월일, 집주소, 핸드폰 | <NA> | 개인정보처리시스템
20 | 아르피아스포츠센터 회원관리 | 회원탈퇴시 까지 | 이름, 생년월일, 집연락처, 집주소, 핸드폰 | E-mail | 개인정보처리시스템
21 | 용인평온의숲 홈페이지 회원정보 | 회원탈퇴시 까지 | 이름, 집주소, 핸드폰(연락처), E-Mail | <NA> | 개인정보처리시스템, 업무용PC