Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 9998
Missing cells (%): 16.7%
Duplicate rows: 350
Duplicate rows (%): 3.5%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
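The derived figures in this table follow from the raw counts. The sketch below recomputes them as a quick consistency check. It assumes that "Missing cells (%)" divides by all cells (variables × observations) and that 1 KiB is 1024 bytes. The function names are illustrative only.

```python
def missing_cell_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Share of empty cells in the whole table, as a percentage (assumed definition)."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def duplicate_row_pct(duplicate_rows: int, n_obs: int) -> float:
    """Share of rows reported as duplicates, as a percentage."""
    return 100.0 * duplicate_rows / n_obs


def avg_record_bytes(total_kib: float, n_obs: int) -> float:
    """Average in-memory size of one row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_obs


if __name__ == "__main__":
    print(f"{missing_cell_pct(9998, 6, 10000):.1f}%")  # Missing cells (%)
    print(f"{duplicate_row_pct(350, 10000):.1f}%")     # Duplicate rows (%)
    print(f"{avg_record_bytes(566.4, 10000):.1f} B")   # Average record size
```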

Variable types

Categorical: 2
Text: 2
Numeric: 2

Dataset

Description: 용달화물(운송사업)업체 현황 (status of small-truck freight transport businesses)
Author: 행정안전부 (Ministry of the Interior and Safety)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=8Z5W0HZY4678V34SM3731032087&infSeq=1

Alerts

영업상태명 has the constant value "운영중" ("in operation") [Constant]
Dataset has 350 (3.5%) duplicate rows [Duplicates]
폐업일자 has a high overall correlation with 시군명 [High correlation]
시군명 has a high overall correlation with 폐업일자 [High correlation]
폐업일자 has 9992 (99.9%) missing values [Missing]
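The Constant and Missing alerts can be approximated with a per-column scan. This is a hypothetical sketch, not ydata-profiling's implementation. Columns are plain Python lists, and `None` marks a missing cell. `MISSING_THRESHOLD = 0.5` is an assumed cut-off chosen for illustration, because the report does not state the threshold behind its Missing alert.

```python
from typing import Dict, List, Optional

# Assumed cut-off, for illustration only; not taken from the report.
MISSING_THRESHOLD = 0.5


def column_alerts(columns: Dict[str, List[Optional[object]]]) -> List[str]:
    """Return Constant/Missing alert lines for each column."""
    alerts = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        n_missing = len(values) - len(present)
        distinct = set(present)
        # Constant: exactly one distinct non-missing value.
        if len(distinct) == 1:
            (only_value,) = distinct
            alerts.append(f'{name} has constant value "{only_value}" [Constant]')
        # Missing: share of empty cells above the assumed threshold.
        share = n_missing / len(values) if values else 0.0
        if share > MISSING_THRESHOLD:
            alerts.append(f"{name} has {n_missing} ({share:.1%}) missing values [Missing]")
    return alerts


if __name__ == "__main__":
    toy = {
        "영업상태명": ["운영중"] * 6,
        "폐업일자": [None, None, None, None, 20100813, 20110621],
    }
    for line in column_alerts(toy):
        print(line)
```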

Reproduction

Analysis started: 2023-12-10 21:19:13.594599
Analysis finished: 2023-12-10 21:19:14.629160
Duration: 1.03 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시군명 (city/county name)
Categorical

HIGH CORRELATION 

Distinct: 32
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

성남시 | 792
남양주시 | 782
부천시 | 774
수원시 | 743
고양시 | 726
Other values (27) | 6183

Length

Max length: 4
Median length: 3
Mean length: 3.1188
Min length: 3
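The length statistics count characters per value. The minimal standard-library sketch below runs on the five sample rows only, not the full column. Python's `len` counts code points, so each precomposed Hangul syllable counts as one character. That is consistent with the maximum length of 4 for names such as 남양주시.

```python
import statistics
from typing import Dict, List


def length_summary(values: List[str]) -> Dict[str, float]:
    """Max, median, mean and min character length of a list of strings."""
    lengths = [len(v) for v in values]  # code points per value
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


if __name__ == "__main__":
    sample = ["김포시", "광주시", "남양주시", "고양시", "고양시"]  # sample rows of 시군명
    print(length_summary(sample))
```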

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 김포시
2nd row: 광주시
3rd row: 남양주시
4th row: 고양시
5th row: 고양시

Common Values

Value | Count | Frequency (%)
성남시 | 792 | 7.9%
남양주시 | 782 | 7.8%
부천시 | 774 | 7.7%
수원시 | 743 | 7.4%
고양시 | 726 | 7.3%
시흥시 | 562 | 5.6%
안산시 | 547 | 5.5%
용인시 | 529 | 5.3%
화성시 | 492 | 4.9%
안양시 | 454 | 4.5%
Other values (22) | 3599 | 36.0%
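The Common Values table lists the ten most frequent values. It folds the rest into a single "Other values (n)" row, where n is the number of remaining distinct values (32 - 10 = 22 here). The sketch below shows that folding with `collections.Counter`. The function name and the `k` parameter are illustrative, not part of the report.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_k_with_other(values: Iterable[str], k: int = 10) -> List[Tuple[str, int, float]]:
    """Return (label, count, percent) rows: the k most common values plus an 'Other values (n)' row."""
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(k)  # ties keep first-seen order
    rows = [(value, count, 100.0 * count / total) for value, count in top]
    n_other = len(counts) - len(top)  # distinct values not shown individually
    if n_other > 0:
        other_count = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_other})", other_count, 100.0 * other_count / total))
    return rows


if __name__ == "__main__":
    data = ["성남시"] * 3 + ["부천시"] * 2 + ["수원시", "고양시"]
    for label, count, pct in top_k_with_other(data, k=2):
        print(f"{label} | {count} | {pct:.1f}%")
```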

Length

[Figure: Histogram of lengths of the category]

사업장명 (business name)
Text

Distinct: 83
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 32
Median length: 3
Mean length: 3.0458
Min length: 3

Characters and Unicode

Total characters: 30458
Distinct characters: 134
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
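The "Distinct categories" figure refers to Unicode General Categories. Python's `unicodedata.category` returns the two-letter code for a character, for example `Lo` for a Hangul syllable and `Po` for `*`. The sketch below maps the codes seen in this report to the long names the report uses and tallies them. The standard library exposes no script or block property, so those breakdowns are not covered. The input strings are illustrative, modelled on values in this column.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the General Category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "So": "Other Symbol",
    "Pd": "Dash Punctuation",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters per Unicode General Category (long name where known, else the code)."""
    counts: Counter = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    print(category_counts(["***", "(주)진산물류", "현대종합물류㈜"]))
```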

Unique

Unique: 82
Unique (%): 0.8%

Sample

1st row: ***
2nd row: ***
3rd row: ***
4th row: ***
5th row: ***

Most frequent words

Value | Count | Frequency (%)
[?] | 9919 | 99.2%
주식회사 | 2 | < 0.1%
주)진산물류 | 1 | < 0.1%
주)영훈운수 | 1 | < 0.1%
주)케이알 | 1 | < 0.1%
주)디앤아이 | 1 | < 0.1%
이주운수(주 | 1 | < 0.1%
주)삼마통운 | 1 | < 0.1%
현대종합물류㈜ | 1 | < 0.1%
주)이스테이지 | 1 | < 0.1%
Other values (75) | 75 | 0.7%

Most occurring characters

Value | Count | Frequency (%)
* | 29754 | 97.7%
( | 79 | 0.3%
) | 79 | 0.3%
[?] | 77 | 0.3%
[?] | 29 | 0.1%
[?] | 28 | 0.1%
[?] | 23 | 0.1%
(space) | 22 | 0.1%
[?] | 17 | 0.1%
[?] | 17 | 0.1%
Other values (124) | 333 | 1.1%

Most occurring categories

Value | Count | Frequency (%)
Other Punctuation | 29755 | 97.7%
Other Letter | 496 | 1.6%
Open Punctuation | 79 | 0.3%
Close Punctuation | 79 | 0.3%
Decimal Number | 24 | 0.1%
Space Separator | 22 | 0.1%
Other Symbol | 2 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 77 | 15.5%
[?] | 29 | 5.8%
[?] | 28 | 5.6%
[?] | 23 | 4.6%
[?] | 17 | 3.4%
[?] | 17 | 3.4%
[?] | 16 | 3.2%
[?] | 15 | 3.0%
[?] | 11 | 2.2%
[?] | 10 | 2.0%
Other values (108) | 253 | 51.0%

Decimal Number
Value | Count | Frequency (%)
4 | 5 | 20.8%
9 | 4 | 16.7%
7 | 3 | 12.5%
5 | 3 | 12.5%
1 | 2 | 8.3%
8 | 2 | 8.3%
2 | 2 | 8.3%
6 | 2 | 8.3%
0 | 1 | 4.2%

Other Punctuation
Value | Count | Frequency (%)
* | 29754 | > 99.9%
: | 1 | < 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 79 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 79 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 22 | 100.0%

Other Symbol
Value | Count | Frequency (%)
㈜ | 2 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 29960 | 98.4%
Hangul | 498 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 77 | 15.5%
[?] | 29 | 5.8%
[?] | 28 | 5.6%
[?] | 23 | 4.6%
[?] | 17 | 3.4%
[?] | 17 | 3.4%
[?] | 16 | 3.2%
[?] | 15 | 3.0%
[?] | 11 | 2.2%
[?] | 10 | 2.0%
Other values (109) | 255 | 51.2%

Common
Value | Count | Frequency (%)
* | 29754 | 99.3%
( | 79 | 0.3%
) | 79 | 0.3%
(space) | 22 | 0.1%
4 | 5 | < 0.1%
9 | 4 | < 0.1%
7 | 3 | < 0.1%
5 | 3 | < 0.1%
1 | 2 | < 0.1%
8 | 2 | < 0.1%
Other values (5) | 7 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 29960 | 98.4%
Hangul | 496 | 1.6%
None | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
* | 29754 | 99.3%
( | 79 | 0.3%
) | 79 | 0.3%
(space) | 22 | 0.1%
4 | 5 | < 0.1%
9 | 4 | < 0.1%
7 | 3 | < 0.1%
5 | 3 | < 0.1%
1 | 2 | < 0.1%
8 | 2 | < 0.1%
Other values (5) | 7 | < 0.1%

Hangul
Value | Count | Frequency (%)
[?] | 77 | 15.5%
[?] | 29 | 5.8%
[?] | 28 | 5.6%
[?] | 23 | 4.6%
[?] | 17 | 3.4%
[?] | 17 | 3.4%
[?] | 16 | 3.2%
[?] | 15 | 3.0%
[?] | 11 | 2.2%
[?] | 10 | 2.0%
Other values (108) | 253 | 51.0%

None
Value | Count | Frequency (%)
㈜ | 2 | 100.0%

인허가일자 (license/permit date)
Real number (ℝ)

Distinct: 3997
Distinct (%): 40.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20032030
Minimum: 19000102
Maximum: 20170507
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 19000102
5th percentile: 19910531
Q1: 20001019
Median: 20030217
Q3: 20070221
95th percentile: 20160627
Maximum: 20170507
Range: 1170405
Interquartile range (IQR): 69202.25
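인허가일자 is profiled as a real number, but the sample rows (e.g. 20050908) suggest that each value is a calendar date written as YYYYMMDD. Under that assumption, figures such as the mean (20032030) or the range (1170405) are not durations. Converting to `datetime.date` first gives meaningful differences. The sketch below does that conversion; `parse_yyyymmdd` is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20050908 as a calendar date (assumed YYYYMMDD encoding)."""
    return datetime.strptime(str(value), "%Y%m%d").date()


if __name__ == "__main__":
    earliest = parse_yyyymmdd(19000102)  # Minimum in the report
    latest = parse_yyyymmdd(20170507)    # Maximum in the report
    print(earliest, latest, (latest - earliest).days, "days")
```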

Descriptive statistics

Standard deviation: 86768.884
Coefficient of variation (CV): 0.0043315073
Kurtosis: 54.508529
Mean: 20032030
Median Absolute Deviation (MAD): 30005
Skewness: -4.8085504
Sum: 2.003203 × 10^11
Variance: 7.5288392 × 10^9
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
20040421 | 25 | 0.2%
20170501 | 24 | 0.2%
20161125 | 21 | 0.2%
20170502 | 19 | 0.2%
19801115 | 15 | 0.1%
20160627 | 15 | 0.1%
20021107 | 15 | 0.1%
20170504 | 15 | 0.1%
19990705 | 14 | 0.1%
19801113 | 13 | 0.1%
Other values (3987) | 9824 | 98.2%

Smallest values

Value | Count | Frequency (%)
19000102 | 1 | < 0.1%
19000129 | 1 | < 0.1%
19000221 | 1 | < 0.1%
19000305 | 1 | < 0.1%
19000320 | 1 | < 0.1%
19000329 | 2 | < 0.1%
19000428 | 1 | < 0.1%
19000520 | 1 | < 0.1%
19000710 | 1 | < 0.1%
19000714 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
20170507 | 2 | < 0.1%
20170504 | 15 | 0.1%
20170502 | 19 | 0.2%
20170501 | 24 | 0.2%
20170428 | 10 | 0.1%
20170427 | 4 | < 0.1%
20170426 | 10 | 0.1%
20170425 | 8 | 0.1%
20170424 | 1 | < 0.1%
20170421 | 7 | 0.1%

영업상태명 (business status)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

운영중 | 10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 운영중
2nd row: 운영중
3rd row: 운영중
4th row: 운영중
5th row: 운영중

Common Values

Value | Count | Frequency (%)
운영중 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

폐업일자 (closure date)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 8
Distinct (%): 100.0%
Missing: 9992
Missing (%): 99.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 20129494
Minimum: 20100813
Maximum: 20160504
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20100813
5th percentile: 20104246
Q1: 20110772
Median: 20125678
Q3: 20150818
95th percentile: 20157221
Maximum: 20160504
Range: 59691
Interquartile range (IQR): 40045.75

Descriptive statistics

Standard deviation: 23499.106
Coefficient of variation (CV): 0.0011673967
Kurtosis: -2.1998546
Mean: 20129494
Median Absolute Deviation (MAD): 19960.5
Skewness: 0.11220968
Sum: 1.6103596 × 10^8
Variance: 5.5220798 × 10^8
Monotonicity: Not monotonic
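폐업일자 has only 8 non-missing values, and all of them are listed in this section, so its quantile and spread statistics can be recomputed directly. The sketch below uses `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics. With that method it gives:

- Q1 = 20110771.75 (shown rounded as 20110772)
- median = 20125677.5
- Q3 = 20150817.5
- IQR = 40045.75
- sample standard deviation of about 23499.106

Treating the YYYYMMDD integers as plain numbers mirrors the report, not a date-aware analysis.

```python
import statistics

# The 8 non-missing closure dates listed in the report (YYYYMMDD integers).
CLOSURE_DATES = [
    20100813, 20110621, 20110822, 20111031,
    20140324, 20150715, 20151125, 20160504,
]

# "inclusive" treats min and max as the 0th and 100th percentiles (linear interpolation).
q1, median, q3 = statistics.quantiles(CLOSURE_DATES, n=4, method="inclusive")
iqr = q3 - q1
mean = statistics.mean(CLOSURE_DATES)
stdev = statistics.stdev(CLOSURE_DATES)  # sample standard deviation (divides by n - 1)
value_range = max(CLOSURE_DATES) - min(CLOSURE_DATES)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"mean={mean}, stdev={stdev:.3f}, CV={stdev / mean:.10f}, range={value_range}")
```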
[Figure: Histogram with fixed size bins (bins=8)]
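A fixed-size-bin histogram splits the range between the minimum and maximum into equal-width intervals. The minimal standard-library sketch below applies it to the eight closure dates. Every bin is half-open except the last, which also includes the maximum. That convention follows NumPy's `histogram` and is an assumption about how the plot was built. `fixed_width_histogram` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(values: Sequence[float], bins: int) -> Tuple[List[float], List[int]]:
    """Split [min, max] into `bins` equal-width intervals and count values per interval.

    Intervals are half-open [left, right) except the last, which also includes the maximum.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum lands in the last bin; all values go to bin 0 if width is 0.
        idx = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        counts[idx] += 1
    return edges, counts


if __name__ == "__main__":
    dates = [20100813, 20110621, 20110822, 20111031, 20140324, 20150715, 20151125, 20160504]
    edges, counts = fixed_width_histogram(dates, bins=8)
    print(counts)
```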
Common values

Value | Count | Frequency (%)
20110822 | 1 | < 0.1%
20110621 | 1 | < 0.1%
20100813 | 1 | < 0.1%
20160504 | 1 | < 0.1%
20111031 | 1 | < 0.1%
20151125 | 1 | < 0.1%
20150715 | 1 | < 0.1%
20140324 | 1 | < 0.1%
(Missing) | 9992 | 99.9%

Smallest values

Value | Count | Frequency (%)
20100813 | 1 | < 0.1%
20110621 | 1 | < 0.1%
20110822 | 1 | < 0.1%
20111031 | 1 | < 0.1%
20140324 | 1 | < 0.1%
20150715 | 1 | < 0.1%
20151125 | 1 | < 0.1%
20160504 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
20160504 | 1 | < 0.1%
20151125 | 1 | < 0.1%
20150715 | 1 | < 0.1%
20140324 | 1 | < 0.1%
20111031 | 1 | < 0.1%
20110822 | 1 | < 0.1%
20110621 | 1 | < 0.1%
20100813 | 1 | < 0.1%

소재지지번주소 (lot-number address of the business location)
Text

Distinct: 426
Distinct (%): 4.3%
Missing: 6
Missing (%): 0.1%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 11
Mean length: 11.138183
Min length: 10

Characters and Unicode

Total characters: 111315
Distinct characters: 208
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 71
Unique (%): 0.7%

Sample

1st row: 경기도 김포시 대곶면
2nd row: 경기도 광주시 오포읍
3rd row: 경기도 남양주시 오남읍
4th row: 경기도 고양시 덕양구
5th row: 경기도 고양시 일산동구

Most frequent words

Value | Count | Frequency (%)
경기도 | 9991 | 33.3%
성남시 | 792 | 2.6%
남양주시 | 782 | 2.6%
부천시 | 773 | 2.6%
수원시 | 743 | 2.5%
고양시 | 726 | 2.4%
시흥시 | 558 | 1.9%
안산시 | 547 | 1.8%
용인시 | 529 | 1.8%
화성시 | 492 | 1.6%
Other values (442) | 14049 | 46.9%
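The Value/Count table above counts whitespace-separated words, not whole addresses. 경기도 (Gyeonggi-do), the province name that begins almost every address, comes first. City names such as 성남시 appear with the same or nearly the same counts as in 시군명. The minimal sketch below produces such a count and assumes plain whitespace splitting. Entries such as 주)진산물류 in the 사업장명 table suggest that the report's own tokenizer also strips some punctuation, which this sketch does not do. The addresses are taken from the sample rows.

```python
from collections import Counter
from typing import Iterable, Optional


def word_counts(values: Iterable[Optional[str]]) -> Counter:
    """Count whitespace-separated words across all non-missing values."""
    counts: Counter = Counter()
    for text in values:
        if text is None:  # skip missing addresses
            continue
        counts.update(text.split())
    return counts


if __name__ == "__main__":
    addresses = ["경기도 김포시 대곶면", "경기도 고양시 덕양구", "경기도 고양시 일산동구", None]
    for word, n in word_counts(addresses).most_common(3):
        print(word, n)
```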

Most occurring characters

Value | Count | Frequency (%)
(space) | 19988 | 18.0%
[?] | 10471 | 9.4%
[?] | 10292 | 9.2%
[?] | 10276 | 9.2%
[?] | 10004 | 9.0%
[?] | 4957 | 4.5%
[?] | 3993 | 3.6%
[?] | 2703 | 2.4%
[?] | 2090 | 1.9%
[?] | 1963 | 1.8%
Other values (198) | 34578 | 31.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 91327 | 82.0%
Space Separator | 19988 | 18.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 10471 | 11.5%
[?] | 10292 | 11.3%
[?] | 10276 | 11.3%
[?] | 10004 | 11.0%
[?] | 4957 | 5.4%
[?] | 3993 | 4.4%
[?] | 2703 | 3.0%
[?] | 2090 | 2.3%
[?] | 1963 | 2.1%
[?] | 1677 | 1.8%
Other values (197) | 32901 | 36.0%

Space Separator
Value | Count | Frequency (%)
(space) | 19988 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 91327 | 82.0%
Common | 19988 | 18.0%

Most frequent character per script

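Hangul
Value | Count | Frequency (%)
[?] | 10471 | 11.5%
[?] | 10292 | 11.3%
[?] | 10276 | 11.3%
[?] | 10004 | 11.0%
[?] | 4957 | 5.4%
[?] | 3993 | 4.4%
[?] | 2703 | 3.0%
[?] | 2090 | 2.3%
[?] | 1963 | 2.1%
[?] | 1677 | 1.8%
Other values (197) | 32901 | 36.0%

Common
Value | Count | Frequency (%)
(space) | 19988 | 100.0%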

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 91327 | 82.0%
ASCII | 19988 | 18.0%

Most frequent character per block

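ASCII
Value | Count | Frequency (%)
(space) | 19988 | 100.0%

Hangul
Value | Count | Frequency (%)
[?] | 10471 | 11.5%
[?] | 10292 | 11.3%
[?] | 10276 | 11.3%
[?] | 10004 | 11.0%
[?] | 4957 | 5.4%
[?] | 3993 | 4.4%
[?] | 2703 | 3.0%
[?] | 2090 | 2.3%
[?] | 1963 | 2.1%
[?] | 1677 | 1.8%
Other values (197) | 32901 | 36.0%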

Interactions

[Figure: interaction plots]

Correlations

Variable | 시군명 | 사업장명 | 인허가일자 | 폐업일자
시군명 | 1.000 | 0.266 | 0.245 | 1.000
사업장명 | 0.266 | 1.000 | 0.000 | NaN
인허가일자 | 0.245 | 0.000 | 1.000 | 0.711
폐업일자 | 1.000 | NaN | 0.711 | 1.000

Variable | 인허가일자 | 폐업일자 | 시군명
인허가일자 | 1.000 | -0.238 | 0.117
폐업일자 | -0.238 | 1.000 | 0.816
시군명 | 0.117 | 0.816 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]
[Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.]
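Nullity correlation can be computed by turning each column into a 0/1 missingness indicator and correlating the indicators. The sketch below assumes Pearson correlation on those indicators, which is a common choice but is not confirmed by the report. A column with no missing values, such as 시군명 in this dataset, has a constant indicator, so its correlation is undefined and the function returns `None`. The data are toy values.

```python
import math
from typing import List, Optional, Sequence


def missing_indicator(values: Sequence[Optional[object]]) -> List[int]:
    """1 where a value is missing (None), 0 otherwise."""
    return [1 if v is None else 0 for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation coefficient; None when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def nullity_correlation(col_a: Sequence[Optional[object]],
                        col_b: Sequence[Optional[object]]) -> Optional[float]:
    """Correlation between the missingness patterns of two columns."""
    return pearson(missing_indicator(col_a), missing_indicator(col_b))


if __name__ == "__main__":
    closure = [None, None, 20100813, None]
    address = [None, "경기도 김포시 대곶면", "경기도 고양시 덕양구", None]
    city = ["김포시"] * 4
    print(nullity_correlation(closure, address), nullity_correlation(closure, city))
```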

Sample

Index | 시군명 | 사업장명 | 인허가일자 | 영업상태명 | 폐업일자 | 소재지지번주소
6828 | 김포시 | *** | 20050908 | 운영중 | <NA> | 경기도 김포시 대곶면
4376 | 광주시 | *** | 20020521 | 운영중 | <NA> | 경기도 광주시 오포읍
9659 | 남양주시 | *** | 20010716 | 운영중 | <NA> | 경기도 남양주시 오남읍
1909 | 고양시 | *** | 20030925 | 운영중 | <NA> | 경기도 고양시 덕양구
1161 | 고양시 | *** | 20020318 | 운영중 | <NA> | 경기도 고양시 일산동구
7236 | 남양주시 | *** | 19990723 | 운영중 | <NA> | 경기도 남양주시 진접읍
14326 | 성남시 | *** | 20020618 | 운영중 | <NA> | 경기도 성남시 수정구
15756 | 수원시 | *** | 20050404 | 운영중 | <NA> | 경기도 수원시 권선구
21386 | 안성시 | *** | 20020822 | 운영중 | <NA> | 경기도 안성시 금석동
13515 | 성남시 | *** | 19980213 | 운영중 | <NA> | 경기도 성남시 중원구

Index | 시군명 | 사업장명 | 인허가일자 | 영업상태명 | 폐업일자 | 소재지지번주소
22058 | 안양시 | *** | 20071120 | 운영중 | <NA> | 경기도 안양시 만안구
22993 | 안양시 | *** | 20070726 | 운영중 | <NA> | 경기도 안양시 동안구
21600 | 안성시 | *** | 20030730 | 운영중 | <NA> | 경기도 안성시 미양면
17489 | 수원시 | *** | 20050601 | 운영중 | <NA> | 경기도 수원시 팔달구
14019 | 성남시 | *** | 20020221 | 운영중 | <NA> | 경기도 성남시 수정구
28906 | 파주시 | *** | 20010828 | 운영중 | <NA> | 경기도 파주시 법원읍
5201 | 구리시 | *** | 20030122 | 운영중 | <NA> | 경기도 구리시 교문동
9506 | 남양주시 | *** | 20070704 | 운영중 | <NA> | 경기도 남양주시 호평동
21612 | 안성시 | *** | 20030421 | 운영중 | <NA> | 경기도 안성시 공도읍
17827 | 시흥시 | *** | 20040821 | 운영중 | <NA> | 경기도 시흥시 논곡동

Duplicate rows

Most frequently occurring

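Index | 시군명 | 사업장명 | 인허가일자 | 영업상태명 | 폐업일자 | 소재지지번주소 | # duplicates
285 | 안양시 | *** | 20170504 | 운영중 | <NA> | 경기도 안양시 만안구 | 8
315 | 용인시 | *** | 20150521 | 운영중 | <NA> | 경기도 용인시 처인구 | 6
325 | 용인시 | *** | 20170501 | 운영중 | <NA> | 경기도 용인시 기흥구 | 6
34 | 고양시 | *** | 20150423 | 운영중 | <NA> | 경기도 고양시 덕양구 | 5
46 | 고양시 | *** | 20161205 | 운영중 | <NA> | 경기도 고양시 일산동구 | 5
48 | 고양시 | *** | 20161206 | 운영중 | <NA> | 경기도 고양시 덕양구 | 5
210 | 수원시 | *** | 20130624 | 운영중 | <NA> | 경기도 수원시 권선구 | 5
284 | 안양시 | *** | 20170504 | 운영중 | <NA> | 경기도 안양시 동안구 | 5
115 | 부천시 | *** | 20141218 | 운영중 | <NA> | 경기도 부천시 고강동 | 4
123 | 성남시 | *** | 19880128 | 운영중 | <NA> | 경기도 성남시 중원구 | 4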
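The "Most frequently occurring" table groups identical rows and reports how many times each one appears (# duplicates). The sketch below does that grouping with `collections.Counter`. Each row is a tuple, and `None` is compared like any other value, so rows with a missing 폐업일자 still match. The sketch stops at the groups. It does not claim to reproduce the exact definition behind the headline "Duplicate rows: 350" figure. The rows are modelled on entries in the table above, and `duplicate_groups` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def duplicate_groups(rows: Sequence[Row]) -> List[Tuple[Row, int]]:
    """Return (row, count) for every row value occurring more than once, most frequent first."""
    counts = Counter(rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


if __name__ == "__main__":
    rows = [
        ("안양시", "***", 20170504, "운영중", None, "경기도 안양시 만안구"),
        ("안양시", "***", 20170504, "운영중", None, "경기도 안양시 만안구"),
        ("용인시", "***", 20150521, "운영중", None, "경기도 용인시 처인구"),
        ("안양시", "***", 20170504, "운영중", None, "경기도 안양시 만안구"),
    ]
    for row, n in duplicate_groups(rows):
        print(n, row)
```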