"TKG102_���������_������_������_������_���������_������.csv"의 파일명이 "TKG102_시스템_관리_공통_코드_마스터_이력.csv"으로 변경 됨.

Overview

Dataset statistics

Number of variables: 10
Number of observations: 500
Missing cells: 2500
Missing cells (%): 50.0%
Duplicate rows: 3
Duplicate rows (%): 0.6%
Total size in memory: 39.2 KiB
Average record size in memory: 80.3 B
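
The percentages and the average record size above follow from the raw counts. The short check below is only a sketch of that arithmetic; the variable names are illustrative and are not taken from pandas-profiling itself.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 10
n_observations = 500
missing_cells = 2500
duplicate_rows = 3
total_size_kib = 39.2

# Every cell is one (row, column) pair.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

The 2500 missing cells are consistent with the five columns that the alerts report as 100.0% missing (5 x 500 rows).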

Variable types

Categorical: 3
Unsupported: 5
Numeric: 2

Dataset

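Description: This file contains information on the system management common code master history (시스템관리공통코드마스터이력) of the Korea Credit Guarantee Fund (신용보증기금); please refer to it when using the data.
Author: 신용보증기금 (Korea Credit Guarantee Fund)
URL: https://www.data.go.kr/data/15093316/fileData.do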

Alerts

코드유형구분코드 has constant value "C" [Constant]
Dataset has 3 (0.6%) duplicate rows [Duplicates]
코드값 has a high cardinality: 462 distinct values [High cardinality]
주제영역코드 is highly correlated with 코드유형구분코드 [High correlation]
코드유형구분코드 is highly correlated with 주제영역코드 [High correlation]
목록코드테이블명 has 500 (100.0%) missing values [Missing]
목록코드테이블논리명 has 500 (100.0%) missing values [Missing]
목록코드컬럼명 has 500 (100.0%) missing values [Missing]
목록코드컬럼논리명 has 500 (100.0%) missing values [Missing]
처리시각 has 500 (100.0%) missing values [Missing]
목록코드테이블명 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
목록코드테이블논리명 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
목록코드컬럼명 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
목록코드컬럼논리명 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
처리시각 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2022-07-17 16:59:00.538327
Analysis finished: 2022-07-17 16:59:01.955462
Duration: 1.42 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

코드값
Categorical

HIGH CARDINALITY

Distinct: 462
Distinct (%): 92.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value               Count
V4                  4
736                 4
1157                4
2946                3
G203                3
Other values (457)  482

Length

Max length: 9
Median length: 8
Mean length: 5.456
Min length: 1

Unique

Unique: 435
Unique (%): 87.0%

Sample

1st row: 28
2nd row: 1137
3rd row: 27
4th row: 40
5th row: 1137

Common Values

Value               Count  Frequency (%)
V4                  4      0.8%
736                 4      0.8%
1157                4      0.8%
2946                3      0.6%
G203                3      0.6%
158                 3      0.6%
G184                3      0.6%
2                   3      0.6%
9939                2      0.4%
99039               2      0.4%
Other values (452)  469    93.8%

Length

[Figure: Histogram of lengths of the category]

Value               Count  Frequency (%)
v4                  4      0.8%
1157                4      0.8%
736                 4      0.8%
2946                3      0.6%
g203                3      0.6%
158                 3      0.6%
g184                3      0.6%
2                   3      0.6%
3                   2      0.4%
4                   2      0.4%
Other values (452)  469    93.8%

주제영역코드
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value             Count
B                 463
A                 16
K                 12
T                 4
G                 2
Other values (2)  3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: B
2nd row: B
3rd row: B
4th row: B
5th row: B

Common Values

Value  Count  Frequency (%)
B      463    92.6%
A      16     3.2%
K      12     2.4%
T      4      0.8%
G      2      0.4%
Z      2      0.4%
I      1      0.2%

Length

[Figure: Histogram of lengths of the category]
[Figure: Category Frequency Plot]

Value  Count  Frequency (%)
b      463    92.6%
a      16     3.2%
k      12     2.4%
t      4      0.8%
g      2      0.4%
z      2      0.4%
i      1      0.2%

코드유형구분코드
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value  Count
C      500

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: C
3rd row: C
4th row: C
5th row: C

Common Values

Value  Count  Frequency (%)
C      500    100.0%

Length

[Figure: Histogram of lengths of the category]
[Figure: Category Frequency Plot]

Value  Count  Frequency (%)
c      500    100.0%

목록코드테이블명
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 500
Missing (%): 100.0%
Memory size: 4.0 KiB

목록코드테이블논리명
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 500
Missing (%): 100.0%
Memory size: 4.0 KiB

목록코드컬럼명
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 500
Missing (%): 100.0%
Memory size: 4.0 KiB

목록코드컬럼논리명
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 500
Missing (%): 100.0%
Memory size: 4.0 KiB

최종수정수
Real number (ℝ≥0)

Distinct: 20
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.37
Minimum: 1
Maximum: 36
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 2
95th percentile: 4
Maximum: 36
Range: 35
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 5.240613328
Coefficient of variation (CV): 2.211229252
Kurtosis: 25.6009528
Mean: 2.37
Median Absolute Deviation (MAD): 0
Skewness: 5.145566325
Sum: 1185
Variance: 27.46402806
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=20)]
Most frequent values

Value              Count  Frequency (%)
1                  324    64.8%
2                  142    28.4%
3                  8      1.6%
4                  4      0.8%
5                  3      0.6%
24                 2      0.4%
31                 2      0.4%
30                 2      0.4%
33                 2      0.4%
23                 1      0.2%
Other values (10)  10     2.0%

Smallest 10 values

Value  Count  Frequency (%)
1      324    64.8%
2      142    28.4%
3      8      1.6%
4      4      0.8%
5      3      0.6%
7      1      0.2%
10     1      0.2%
23     1      0.2%
24     2      0.4%
25     1      0.2%

Largest 10 values

Value  Count  Frequency (%)
36     1      0.2%
35     1      0.2%
34     1      0.2%
33     2      0.4%
32     1      0.2%
31     2      0.4%
30     2      0.4%
29     1      0.2%
28     1      0.2%
27     1      0.2%
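
Taken together, the frequency tables for 최종수정수 list all 20 distinct values, so the descriptive statistics can be re-derived from them. The sketch below does this with the standard library only; it assumes the report's variance is the sample variance (divisor n - 1), which is what reproduces the figure shown here.

```python
import math

# Value -> count for 최종수정수, merged from the frequency tables of this report.
counts = {
    1: 324, 2: 142, 3: 8, 4: 4, 5: 3, 7: 1, 10: 1, 23: 1, 24: 2, 25: 1,
    27: 1, 28: 1, 29: 1, 30: 2, 31: 2, 32: 1, 33: 2, 34: 1, 35: 1, 36: 1,
}

n = sum(counts.values())                          # number of observations
total = sum(v * c for v, c in counts.items())     # "Sum"
mean = total / n                                  # "Mean"

# Sample variance (ddof = 1); an assumption that matches the reported value.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)                         # "Standard deviation"
cv = std / mean                                   # "Coefficient of variation (CV)"
value_range = max(counts) - min(counts)           # "Range"

print(n, total, mean, variance, std, cv, value_range)
```

The long right tail (a handful of records revised 23 to 36 times, against a median of 1) explains why the standard deviation is more than twice the mean.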

처리시각
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 500
Missing (%): 100.0%
Memory size: 4.0 KiB

처리직원번호
Real number (ℝ≥0)

Distinct: 16
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5329.58
Minimum: 4509
Maximum: 6105
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 4509
5th percentile: 5099
Q1: 5099
Median: 5099
Q3: 5220
95th percentile: 6105
Maximum: 6105
Range: 1596
Interquartile range (IQR): 121

Descriptive statistics

Standard deviation: 377.0279881
Coefficient of variation (CV): 0.07074253283
Kurtosis: 0.1238210734
Mean: 5329.58
Median Absolute Deviation (MAD): 0
Skewness: 1.280898591
Sum: 2664790
Variance: 142150.1038
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=16)]
Most frequent values

Value             Count  Frequency (%)
5099              255    51.0%
5220              126    25.2%
6105              51     10.2%
6099              24     4.8%
5803              16     3.2%
5823              6      1.2%
5742              6      1.2%
6009              3      0.6%
4509              3      0.6%
5544              2      0.4%
Other values (6)  8      1.6%

Smallest 10 values

Value  Count  Frequency (%)
4509   3      0.6%
4917   1      0.2%
5099   255    51.0%
5113   1      0.2%
5220   126    25.2%
5222   1      0.2%
5476   1      0.2%
5544   2      0.4%
5742   6      1.2%
5803   16     3.2%

Largest 10 values

Value  Count  Frequency (%)
6105   51     10.2%
6099   24     4.8%
6009   3      0.6%
5873   2      0.4%
5870   2      0.4%
5823   6      1.2%
5803   16     3.2%
5742   6      1.2%
5544   2      0.4%
5476   1      0.2%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
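
The calculation just described can be sketched in a few lines of plain Python. The function name `pearson_r` and the toy data are illustrative assumptions and are not taken from the profiled dataset or from pandas-profiling.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations (the n - 1 divisors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)

# Toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

print(pearson_r(x, y))                        # perfectly linear, increasing
print(pearson_r(x, [3 * v + 7 for v in y]))   # shifting and rescaling y leaves r unchanged
print(pearson_r(x, [-v for v in y]))          # perfectly linear, decreasing
```

Note that r is undefined when either variable is constant, because its standard deviation is zero; this sketch would raise a ZeroDivisionError in that case.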

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
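
As a minimal sketch of this rank-based calculation: replace each variable by its ranks, then apply the covariance-over-standard-deviations formula to the ranks. Tied values receive the average of their positions, which is a common convention assumed here; the helper names and toy data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy input the rank correlation is perfect while Pearson's r falls below 1, which illustrates the difference between monotonic and linear association.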

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
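
The pair-counting definition can be sketched directly. This is the plain form sometimes called tau-a, in which tied pairs count as neither concordant nor discordant; whether the report applies a tie adjustment is not stated here, so treat the function below as an illustration of the definition only. The name `kendall_tau_a` and the toy data are assumptions.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy data: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop visits every pair, so the cost grows quadratically with the number of observations; that is fine for 500 rows but slow for large datasets.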

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
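
A minimal sketch of a bias-corrected Cramér's V follows. It uses the commonly cited form of Bergsma's correction (shrinking φ² and the effective numbers of rows and columns); it is not pandas-profiling's own code, and the function name and toy data are assumptions.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two nominal sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return float("nan")  # undefined when either variable is constant
    return math.sqrt(phi2_corr / denom)

# Toy data: ys_same is determined by xs, ys_indep is independent of xs.
xs = ["a", "a", "b", "b"] * 25
ys_same = ["x", "x", "y", "y"] * 25
ys_indep = ["x", "y", "x", "y"] * 25
print(cramers_v_corrected(xs, ys_same), cramers_v_corrected(xs, ys_indep))
```

A constant column such as 코드유형구분코드 has a single category, so this sketch returns NaN for it rather than a coefficient.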

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

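Index | 코드값 | 주제영역코드 | 코드유형구분코드 | 목록코드테이블명 | 목록코드테이블논리명 | 목록코드컬럼명 | 목록코드컬럼논리명 | 최종수정수 | 처리시각 | 처리직원번호
0 | 28 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
1 | 1137 | B | C | NaN | NaN | NaN | NaN | 4 | NaN | 5099
2 | 27 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
3 | 40 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
4 | 1137 | B | C | NaN | NaN | NaN | NaN | 3 | NaN | 5099
5 | 45 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
6 | 44 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
7 | 43 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
8 | 42 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
9 | 41 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099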

Last rows

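Index | 코드값 | 주제영역코드 | 코드유형구분코드 | 목록코드테이블명 | 목록코드테이블논리명 | 목록코드컬럼명 | 목록코드컬럼논리명 | 최종수정수 | 처리시각 | 처리직원번호
490 | 110101 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
491 | 100305 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
492 | 100304 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
493 | 100303 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
494 | 100302 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
495 | 100301 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
496 | 100204 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
497 | 100203 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
498 | 100202 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099
499 | 100201 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099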

Duplicate rows

Most frequently occurring

Index | 코드값 | 주제영역코드 | 코드유형구분코드 | 최종수정수 | 처리직원번호 | # duplicates
0 | G184 | B | C | 1 | 6105 | 3
1 | G203 | B | C | 1 | 5099 | 3
2 | G204 | B | C | 1 | 5099 | 2