Overview

Dataset statistics

Number of variables: 9
Number of observations: 3065
Missing cells: 15348
Missing cells (%): 55.6%
Duplicate rows: 4
Duplicate rows (%): 0.1%
Total size in memory: 233.6 KiB
Average record size in memory: 78.0 B
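
The headline figures can be cross-checked with simple arithmetic. The sketch below is only a sanity check: it assumes the missing cells come from the five columns flagged as 100% missing plus the 23 missing values reported for one text variable later in the report. The variable names are illustrative.

```python
# Figures copied from the report; variable names are illustrative.
n_rows = 3065
n_cols = 9
fully_missing_columns = 5       # columns flagged as 100% missing in the alerts
partially_missing_cells = 23    # the one variable reported with 23 missing values

missing_cells = fully_missing_columns * n_rows + partially_missing_cells
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Average record size: total memory (KiB converted to bytes) divided by the row count.
total_bytes = 233.6 * 1024
avg_record_bytes = total_bytes / n_rows

print(missing_cells)               # 15348
print(f"{missing_pct:.1f}%")       # 55.6%
print(f"{avg_record_bytes:.1f} B") # 78.0 B
```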

Variable types

Numeric: 1
Text: 2
Unsupported: 5
DateTime: 1

Alerts

Dataset has 4 (0.1%) duplicate rows [Duplicates]
전화번호 (phone number) has 3065 (100.0%) missing values [Missing]
정제도로명주소 (cleaned road-name address) has 3065 (100.0%) missing values [Missing]
정제지번주소 (cleaned lot-number address) has 3065 (100.0%) missing values [Missing]
정제WGS84위도 (cleaned WGS84 latitude) has 3065 (100.0%) missing values [Missing]
정제WGS84경도 (cleaned WGS84 longitude) has 3065 (100.0%) missing values [Missing]
전화번호 (phone number) is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
정제도로명주소 (cleaned road-name address) is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
정제지번주소 (cleaned lot-number address) is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
정제WGS84위도 (cleaned WGS84 latitude) is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
정제WGS84경도 (cleaned WGS84 longitude) is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-03-12 23:35:17.389573
Analysis finished: 2024-03-12 23:35:17.947551
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

등록번호 (registration number)
Real number (ℝ)

Distinct: 3060
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 270747.84
Minimum: 110003
Maximum: 640110
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 27.1 KiB

Quantile statistics

Minimum: 110003
5-th percentile: 110823.2
Q1: 201252
Median: 311188
Q3: 312127
95-th percentile: 350465.8
Maximum: 640110
Range: 530107
Interquartile range (IQR): 110875
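
The range and IQR follow directly from the quantiles listed above. The sketch below recomputes them from the reported values. The Tukey-style 1.5 × IQR fences are an illustrative addition, not something the report itself computes.

```python
# Quantiles copied from the report.
minimum, q1, q3, maximum = 110003, 201252, 312127, 640110

value_range = maximum - minimum   # 530107
iqr = q3 - q1                     # 110875

# Assumption: Tukey's 1.5 * IQR rule, shown only as one common way to read the IQR.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(value_range, iqr)
print(lower_fence, upper_fence)
print(maximum > upper_fence)      # the reported maximum lies above the upper fence
```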

Descriptive statistics

Standard deviation: 98750.692
Coefficient of variation (CV): 0.36473307
Kurtosis: 0.65648261
Mean: 270747.84
Median Absolute Deviation (MAD): 1161
Skewness: -0.055993395
Sum: 8.2984213 × 10^8
Variance: 9.7516991 × 10^9
Monotonicity: Not monotonic
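
Several of these statistics are derived from one another, which gives a quick consistency check. The sketch below is a rough verification from the rounded figures in the report, assuming 3065 observations with none missing for this variable. Small differences in the last digits are expected because the inputs are rounded.

```python
# Rounded inputs copied from the report.
n = 3065
mean = 270747.84
std = 98750.692

cv = std / mean        # coefficient of variation = standard deviation / mean
variance = std ** 2    # variance = standard deviation squared
total = mean * n       # sum = mean * number of observations

print(f"CV       ~ {cv:.6f}")
print(f"Variance ~ {variance:.6e}")
print(f"Sum      ~ {total:.6e}")
```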
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
310590 | 3 | 0.1%
202912 | 2 | 0.1%
550186 | 2 | 0.1%
530173 | 2 | 0.1%
311656 | 1 | < 0.1%
311659 | 1 | < 0.1%
203072 | 1 | < 0.1%
203071 | 1 | < 0.1%
311658 | 1 | < 0.1%
350348 | 1 | < 0.1%
Other values (3050) | 3050 | 99.5%

Minimum 10 values

Value | Count | Frequency (%)
110003 | 1 | < 0.1%
110013 | 1 | < 0.1%
110018 | 1 | < 0.1%
110019 | 1 | < 0.1%
110022 | 1 | < 0.1%
110026 | 1 | < 0.1%
110031 | 1 | < 0.1%
110037 | 1 | < 0.1%
110041 | 1 | < 0.1%
110045 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
640110 | 1 | < 0.1%
640107 | 1 | < 0.1%
640086 | 1 | < 0.1%
640044 | 1 | < 0.1%
640011 | 1 | < 0.1%
630231 | 1 | < 0.1%
630187 | 1 | < 0.1%
630169 | 1 | < 0.1%
630010 | 1 | < 0.1%
620293 | 1 | < 0.1%

상호명 (business name)
Text

Distinct: 3047
Distinct (%): 99.4%
Missing: 0
Missing (%): 0.0%
Memory size: 24.1 KiB

Length

Max length: 58
Median length: 53
Mean length: 9.5960848
Min length: 2

Characters and Unicode

Total characters: 29412
Distinct characters: 551
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
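
As a rough illustration of how such per-character properties can be tallied, the sketch below uses Python's standard `unicodedata` module on three business names taken from the report's sample rows. It returns two-letter general-category codes (`Lo` = Other Letter, `Ps` = Open Punctuation, `Pe` = Close Punctuation, `Po` = Other Punctuation). The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its figures.

```python
import unicodedata
from collections import Counter

# Three business names copied from the report's sample rows.
names = ["(주)아이티언", "(주)성문텔레콤", "중앙아이.엔.티.(주)"]

def category_counts(values):
    """Count Unicode general categories over every character in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

counts = category_counts(names)
for code, count in counts.most_common():
    print(code, count)
```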

Unique

Unique: 3030
Unique (%): 98.9%

Sample

1st row: (주)아이티언
2nd row: (주)성문텔레콤
3rd row: (주)메인
4th row: (주)호맘코리아
5th row: 중앙아이.엔.티.(주)
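
Common words

Value | Count | Frequency (%)
주식회사 (stock company) | 1150 | 25.4%
ltd | 46 | 1.0%
co | 45 | 1.0%
co.,ltd | 40 | 0.9%
inc | 16 | 0.4%
사회적협동조합 (social cooperative) | 7 | 0.2%
사단법인 (incorporated association) | 4 | 0.1%
system | 4 | 0.1%
(not preserved) | 3 | 0.1%
가온정보통신 | 3 | 0.1%
Other values (3163) | 3201 | 70.8%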

Most occurring characters

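Value | Count | Frequency (%)
(not preserved) | 2956 | 10.1%
( | 1878 | 6.4%
) | 1877 | 6.4%
(space) | 1454 | 4.9%
(not preserved) | 1239 | 4.2%
(not preserved) | 1200 | 4.1%
(not preserved) | 1181 | 4.0%
(not preserved) | 1098 | 3.7%
(not preserved) | 931 | 3.2%
(not preserved) | 511 | 1.7%
Other values (541) | 15087 | 51.3%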

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 22307 | 75.8%
Open Punctuation | 1878 | 6.4%
Close Punctuation | 1877 | 6.4%
Space Separator | 1454 | 4.9%
Uppercase Letter | 867 | 2.9%
Lowercase Letter | 729 | 2.5%
Other Punctuation | 294 | 1.0%
Dash Punctuation | 4 | < 0.1%
Other Symbol | 1 | < 0.1%
Decimal Number | 1 | < 0.1%

Most frequent character per category

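Other Letter
Value | Count | Frequency (%)
(not preserved) | 2956 | 13.3%
(not preserved) | 1239 | 5.6%
(not preserved) | 1200 | 5.4%
(not preserved) | 1181 | 5.3%
(not preserved) | 1098 | 4.9%
(not preserved) | 931 | 4.2%
(not preserved) | 511 | 2.3%
(not preserved) | 474 | 2.1%
(not preserved) | 375 | 1.7%
(not preserved) | 372 | 1.7%
Other values (485) | 11970 | 53.7%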

Uppercase Letter
Value | Count | Frequency (%)
C | 135 | 15.6%
L | 110 | 12.7%
N | 68 | 7.8%
O | 65 | 7.5%
T | 62 | 7.2%
I | 56 | 6.5%
S | 52 | 6.0%
E | 52 | 6.0%
A | 42 | 4.8%
D | 38 | 4.4%
Other values (15) | 187 | 21.6%

Lowercase Letter
Value | Count | Frequency (%)
o | 129 | 17.7%
t | 110 | 15.1%
d | 81 | 11.1%
n | 70 | 9.6%
e | 53 | 7.3%
i | 47 | 6.4%
c | 39 | 5.3%
a | 37 | 5.1%
r | 35 | 4.8%
s | 24 | 3.3%
Other values (12) | 104 | 14.3%

Other Punctuation
Value | Count | Frequency (%)
. | 189 | 64.3%
, | 94 | 32.0%
& | 11 | 3.7%

Open Punctuation
Value | Count | Frequency (%)
( | 1878 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1877 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1454 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

Decimal Number
Value | Count | Frequency (%)
2 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 22292 | 75.8%
Common | 5508 | 18.7%
Latin | 1596 | 5.4%
Han | 16 | 0.1%

Most frequent character per script

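Hangul
Value | Count | Frequency (%)
(not preserved) | 2956 | 13.3%
(not preserved) | 1239 | 5.6%
(not preserved) | 1200 | 5.4%
(not preserved) | 1181 | 5.3%
(not preserved) | 1098 | 4.9%
(not preserved) | 931 | 4.2%
(not preserved) | 511 | 2.3%
(not preserved) | 474 | 2.1%
(not preserved) | 375 | 1.7%
(not preserved) | 372 | 1.7%
Other values (470) | 11955 | 53.6%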

Latin
Value | Count | Frequency (%)
C | 135 | 8.5%
o | 129 | 8.1%
t | 110 | 6.9%
L | 110 | 6.9%
d | 81 | 5.1%
n | 70 | 4.4%
N | 68 | 4.3%
O | 65 | 4.1%
T | 62 | 3.9%
I | 56 | 3.5%
Other values (37) | 710 | 44.5%
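
Han
Value | Count | Frequency (%)
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
Other values (6) | 6 | 37.5%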

Common
Value | Count | Frequency (%)
( | 1878 | 34.1%
) | 1877 | 34.1%
(space) | 1454 | 26.4%
. | 189 | 3.4%
, | 94 | 1.7%
& | 11 | 0.2%
- | 4 | 0.1%
2 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 22291 | 75.8%
ASCII | 7104 | 24.2%
CJK | 16 | 0.1%
None | 1 | < 0.1%

Most frequent character per block

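Hangul
Value | Count | Frequency (%)
(not preserved) | 2956 | 13.3%
(not preserved) | 1239 | 5.6%
(not preserved) | 1200 | 5.4%
(not preserved) | 1181 | 5.3%
(not preserved) | 1098 | 4.9%
(not preserved) | 931 | 4.2%
(not preserved) | 511 | 2.3%
(not preserved) | 474 | 2.1%
(not preserved) | 375 | 1.7%
(not preserved) | 372 | 1.7%
Other values (469) | 11954 | 53.6%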

ASCII
Value | Count | Frequency (%)
( | 1878 | 26.4%
) | 1877 | 26.4%
(space) | 1454 | 20.5%
. | 189 | 2.7%
C | 135 | 1.9%
o | 129 | 1.8%
t | 110 | 1.5%
L | 110 | 1.5%
, | 94 | 1.3%
d | 81 | 1.1%
Other values (45) | 1047 | 14.7%
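
CJK
Value | Count | Frequency (%)
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
(not preserved) | 1 | 6.2%
Other values (6) | 6 | 37.5%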

None
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

전화번호 (phone number)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 3065
Missing (%): 100.0%
Memory size: 27.1 KiB

팩스번호 (fax number)
Text

Distinct: 2939
Distinct (%): 96.6%
Missing: 23
Missing (%): 0.8%
Memory size: 24.1 KiB

Length

Max length: 14
Median length: 12
Mean length: 12.092373
Min length: 11

Characters and Unicode

Total characters: 36785
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2840
Unique (%): 93.4%

Sample

1st row: 031-8040-9980
2nd row: 031-503-0099
3rd row: 02-6499-0833
4th row: 031-520-6793
5th row: 031-420-4429
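
The sample values follow a digits-hyphen-digits-hyphen-digits layout, and the reported lengths (11 to 14 characters) fit that shape. The sketch below shows one way to flag values that deviate from it. The pattern is an assumption inferred from the five samples, not a rule stated by the report, and `is_well_formed` is a hypothetical helper name. Since the report's character counts for this variable also list 11 asterisks, some values are probably masked and would fail this check.

```python
import re

# Assumed layout: 2-4 digit area code, 3-4 digit exchange, 4 digit line number.
FAX_PATTERN = re.compile(r"\d{2,4}-\d{3,4}-\d{4}")

# Sample values copied from the report.
samples = ["031-8040-9980", "031-503-0099", "02-6499-0833", "031-520-6793", "031-420-4429"]

def is_well_formed(value):
    """Return True when the whole string matches the assumed number layout."""
    return FAX_PATTERN.fullmatch(value) is not None

for value in samples:
    print(value, len(value), is_well_formed(value))

# Hypothetical masked value, for illustration only (not taken from the dataset).
print(is_well_formed("031-***-1234"))
```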

Common values

Value | Count | Frequency (%)
031-235-6336 | 3 | 0.1%
031-658-4904 | 3 | 0.1%
031-541-8255 | 3 | 0.1%
031-871-2771 | 3 | 0.1%
031-339-9191 | 2 | 0.1%
031-904-8981 | 2 | 0.1%
031-968-3748 | 2 | 0.1%
050-5304-2020 | 2 | 0.1%
031-477-3407 | 2 | 0.1%
031-921-0933 | 2 | 0.1%
Other values (2929) | 3018 | 99.2%

Most occurring characters

Value | Count | Frequency (%)
- | 6084 | 16.5%
0 | 5687 | 15.5%
3 | 4534 | 12.3%
1 | 4055 | 11.0%
2 | 2979 | 8.1%
7 | 2424 | 6.6%
4 | 2342 | 6.4%
5 | 2318 | 6.3%
8 | 2134 | 5.8%
9 | 2114 | 5.7%
Other values (2) | 2114 | 5.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 30690 | 83.4%
Dash Punctuation | 6084 | 16.5%
Other Punctuation | 11 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 5687 | 18.5%
3 | 4534 | 14.8%
1 | 4055 | 13.2%
2 | 2979 | 9.7%
7 | 2424 | 7.9%
4 | 2342 | 7.6%
5 | 2318 | 7.6%
8 | 2134 | 7.0%
9 | 2114 | 6.9%
6 | 2103 | 6.9%

Dash Punctuation
Value | Count | Frequency (%)
- | 6084 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
* | 11 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 36785 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 6084 | 16.5%
0 | 5687 | 15.5%
3 | 4534 | 12.3%
1 | 4055 | 11.0%
2 | 2979 | 8.1%
7 | 2424 | 6.6%
4 | 2342 | 6.4%
5 | 2318 | 6.3%
8 | 2134 | 5.8%
9 | 2114 | 5.7%
Other values (2) | 2114 | 5.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 36785 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 6084 | 16.5%
0 | 5687 | 15.5%
3 | 4534 | 12.3%
1 | 4055 | 11.0%
2 | 2979 | 8.1%
7 | 2424 | 6.6%
4 | 2342 | 6.4%
5 | 2318 | 6.3%
8 | 2134 | 5.8%
9 | 2114 | 5.7%
Other values (2) | 2114 | 5.7%

정제도로명주소 (cleaned road-name address)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 3065
Missing (%): 100.0%
Memory size: 27.1 KiB

정제지번주소 (cleaned lot-number address)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 3065
Missing (%): 100.0%
Memory size: 27.1 KiB

정제WGS84위도 (cleaned WGS84 latitude)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 3065
Missing (%): 100.0%
Memory size: 27.1 KiB

정제WGS84경도 (cleaned WGS84 longitude)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 3065
Missing (%): 100.0%
Memory size: 27.1 KiB

등록일자 (registration date)
DateTime

Distinct: 2150
Distinct (%): 70.1%
Missing: 0
Missing (%): 0.0%
Memory size: 24.1 KiB
Minimum: 1971-11-27 00:00:00
Maximum: 2023-10-06 00:00:00
Histogram with fixed size bins (bins=50)

Interactions

[Figure: interactions plot]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
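
To make the two views concrete, the sketch below computes a per-column null fraction and a text-only nullity matrix for the first three rows of the report's sample table, with `<NA>` represented as `None`. It is a minimal stand-in using plain Python. The function names are hypothetical, and the report's own plots are produced by the profiling tool, not by this code.

```python
columns = [
    "등록번호", "상호명", "전화번호", "팩스번호", "정제도로명주소",
    "정제지번주소", "정제WGS84위도", "정제WGS84경도", "등록일자",
]

# First three sample rows from the report; <NA> is represented as None.
rows = [
    (112761, "(주)아이티언", None, "031-8040-9980", None, None, None, None, "2002-10-10"),
    (112742, "(주)성문텔레콤", None, "031-503-0099", None, None, None, None, "2002-09-27"),
    (112744, "(주)메인", None, "02-6499-0833", None, None, None, None, "2002-09-27"),
]

def null_fraction_by_column(columns, rows):
    """Fraction of missing values per column (the 'nullity by column' view)."""
    return {
        col: sum(row[i] is None for row in rows) / len(rows)
        for i, col in enumerate(columns)
    }

def nullity_matrix(rows):
    """One string per row: '#' marks a present value, '.' marks a missing one."""
    return ["".join("." if value is None else "#" for value in row) for row in rows]

fractions = null_fraction_by_column(columns, rows)
matrix = nullity_matrix(rows)

for col, frac in fractions.items():
    print(f"{col}: {frac:.0%} missing")
for line in matrix:
    print(line)
```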

Sample

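First rows

Columns: 등록번호 (registration number), 상호명 (business name), 전화번호 (phone number), 팩스번호 (fax number), 정제도로명주소 (cleaned road-name address), 정제지번주소 (cleaned lot-number address), 정제WGS84위도 (cleaned WGS84 latitude), 정제WGS84경도 (cleaned WGS84 longitude), 등록일자 (registration date)

Index | 등록번호 | 상호명 | 전화번호 | 팩스번호 | 정제도로명주소 | 정제지번주소 | 정제WGS84위도 | 정제WGS84경도 | 등록일자
0 | 112761 | (주)아이티언 | <NA> | 031-8040-9980 | <NA> | <NA> | <NA> | <NA> | 2002-10-10
1 | 112742 | (주)성문텔레콤 | <NA> | 031-503-0099 | <NA> | <NA> | <NA> | <NA> | 2002-09-27
2 | 112744 | (주)메인 | <NA> | 02-6499-0833 | <NA> | <NA> | <NA> | <NA> | 2002-09-27
3 | 112731 | (주)호맘코리아 | <NA> | 031-520-6793 | <NA> | <NA> | <NA> | <NA> | 2002-09-19
4 | 112732 | 중앙아이.엔.티.(주) | <NA> | 031-420-4429 | <NA> | <NA> | <NA> | <NA> | 2002-09-19
5 | 112726 | (주)천일정보통신 | <NA> | 031-785-4303 | <NA> | <NA> | <NA> | <NA> | 2002-09-19
6 | 150580 | (주)대륜통신 | <NA> | 031-658-3796 | <NA> | <NA> | <NA> | <NA> | 2002-09-18
7 | 112716 | (주)위드텍 | <NA> | 031-321-6009 | <NA> | <NA> | <NA> | <NA> | 2002-09-09
8 | 112700 | (주)이지넷콤 | <NA> | 031-681-2926 | <NA> | <NA> | <NA> | <NA> | 2002-08-26
9 | 112684 | (주)에스에이티 | <NA> | 031-450-1300 | <NA> | <NA> | <NA> | <NA> | 2002-08-20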

Last rows

Index | 등록번호 | 상호명 | 전화번호 | 팩스번호 | 정제도로명주소 | 정제지번주소 | 정제WGS84위도 | 정제WGS84경도 | 등록일자
3055 | 120449 | 주식회사 빛이라 | <NA> | 031-323-3681 | <NA> | <NA> | <NA> | <NA> | 2000-08-28
3056 | 111559 | 지에스앤티(주) | <NA> | 031-768-6503 | <NA> | <NA> | <NA> | <NA> | 2000-08-21
3057 | 111553 | (주)삼정보안시스템 | <NA> | 031-211-1643 | <NA> | <NA> | <NA> | <NA> | 2000-08-21
3058 | 111560 | 삼일씨티에스(주) | <NA> | 031-720-5166 | <NA> | <NA> | <NA> | <NA> | 2000-08-21
3059 | 111551 | 주식회사 경도 | <NA> | 031-901-5506 | <NA> | <NA> | <NA> | <NA> | 2000-08-11
3060 | 111540 | 건영네트웍스(주) | <NA> | 031-211-2100 | <NA> | <NA> | <NA> | <NA> | 2000-08-07
3061 | 111507 | 우주미디어정보통신(주) | <NA> | 031-207-4104 | <NA> | <NA> | <NA> | <NA> | 2000-08-02
3062 | 111502 | (주)쏠리드(Solid, Inc.) | <NA> | 031-627-6009 | <NA> | <NA> | <NA> | <NA> | 2000-07-25
3063 | 350024 | (주)경기방재 | <NA> | 031-871-2771 | <NA> | <NA> | <NA> | <NA> | 2000-07-24
3064 | 111482 | (주)승진정보 | <NA> | 031-872-4202 | <NA> | <NA> | <NA> | <NA> | 2000-07-05

Duplicate rows

Most frequently occurring

Index | 등록번호 | 상호명 | 팩스번호 | 등록일자 | # duplicates
0 | 202912 | 주식회사 엔씨씨디지탈 | 031-553-9317 | 2018-06-15 | 2
1 | 310590 | 주식회사 태원 | 031-541-8255 | 2009-09-29 | 2
2 | 530173 | 영한산업 주식회사 | 02-922-3533 | 2009-07-17 | 2
3 | 550186 | 주식회사 이에스(ES Co., Ltd.) | 053-715-1372 | 2012-03-29 | 2
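
Exact duplicate rows like these can be found by counting identical records. The sketch below is a minimal illustration in plain Python. The `records` list is a hand-assembled toy input built from rows shown in this report, not the real dataset, and `duplicate_counts` is a hypothetical helper name.

```python
from collections import Counter

# Toy input: (등록번호, 상호명, 팩스번호, 등록일자) tuples copied from the report,
# with two of them deliberately repeated to mimic the duplicates listed above.
records = [
    (202912, "주식회사 엔씨씨디지탈", "031-553-9317", "2018-06-15"),
    (310590, "주식회사 태원", "031-541-8255", "2009-09-29"),
    (202912, "주식회사 엔씨씨디지탈", "031-553-9317", "2018-06-15"),
    (112761, "(주)아이티언", "031-8040-9980", "2002-10-10"),
    (310590, "주식회사 태원", "031-541-8255", "2009-09-29"),
]

def duplicate_counts(rows):
    """Return each record that occurs more than once, with its occurrence count."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

duplicates = duplicate_counts(records)
for row, n in duplicates.items():
    print(row, n)
```

Whether such rows should be dropped depends on how the registry was compiled; the count alone does not say which copy is authoritative.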