Dataset statistics
Number of variables | 10 |
---|---|
Number of observations | 500 |
Missing cells | 2500 |
Missing cells (%) | 50.0% |
Duplicate rows | 3 |
Duplicate rows (%) | 0.6% |
Total size in memory | 39.2 KiB |
Average record size in memory | 80.3 B |
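The overview percentages above follow from simple ratios. The sketch below re-derives them from the counts reported in this document (500 rows, 10 columns, five columns with 500 missing values each, 3 duplicate rows). The function name `dataset_overview` and its parameters are illustrative assumptions, not part of pandas-profiling.

```python
def dataset_overview(n_rows, n_cols, missing_per_column, n_duplicate_rows):
    """Re-derive the overview ratios from raw counts (hypothetical helper)."""
    total_cells = n_rows * n_cols
    missing_cells = sum(missing_per_column)
    return {
        "missing_cells": missing_cells,
        "missing_pct": 100 * missing_cells / total_cells,
        "duplicate_pct": 100 * n_duplicate_rows / n_rows,
    }


# Five columns are entirely empty (500 missing each); the other five have none.
overview = dataset_overview(500, 10, [500] * 5 + [0] * 5, 3)
print(overview)
```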
Variable types
Categorical | 3 |
---|---|
Unsupported | 5 |
Numeric | 2 |
Dataset
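Description | This file provides information on the system management common code master history (시스템관리공통코드마스터이력) of 신용보증기금 (Korea Credit Guarantee Fund); please refer to it when using the data. |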
---|---|
Author | 신용보증기금 (Korea Credit Guarantee Fund) |
URL | https://www.data.go.kr/data/15093316/fileData.do |
코드유형구분코드 has constant value "C" | Constant |
Dataset has 3 (0.6%) duplicate rows | Duplicates |
코드값 has a high cardinality: 462 distinct values | High cardinality |
주제영역코드 is highly correlated with 코드유형구분코드 | High correlation |
코드유형구분코드 is highly correlated with 주제영역코드 | High correlation |
목록코드테이블명 has 500 (100.0%) missing values | Missing |
목록코드테이블논리명 has 500 (100.0%) missing values | Missing |
목록코드컬럼명 has 500 (100.0%) missing values | Missing |
목록코드컬럼논리명 has 500 (100.0%) missing values | Missing |
처리시각 has 500 (100.0%) missing values | Missing |
목록코드테이블명 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
목록코드테이블논리명 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
목록코드컬럼명 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
목록코드컬럼논리명 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
처리시각 is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
Reproduction
Analysis started | 2022-07-17 16:59:00.538327 |
---|---|
Analysis finished | 2022-07-17 16:59:01.955462 |
Duration | 1.42 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
Distinct | 462 |
---|---|
Distinct (%) | 92.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
V4 | 4 |
---|---|
736 | 4 |
1157 | 4 |
2946 | 3 |
G203 | 3 |
Other values (457) | 482 |
Length
Max length | 9 |
---|---|
Median length | 8 |
Mean length | 5.456 |
Min length | 1 |
Unique
Unique | 435 |
---|---|
Unique (%) | 87.0% |
Sample
1st row | 28 |
---|---|
2nd row | 1137 |
3rd row | 27 |
4th row | 40 |
5th row | 1137 |
Common Values
Value | Count | Frequency (%) |
V4 | 4 | 0.8% |
736 | 4 | 0.8% |
1157 | 4 | 0.8% |
2946 | 3 | 0.6% |
G203 | 3 | 0.6% |
158 | 3 | 0.6% |
G184 | 3 | 0.6% |
2 | 3 | 0.6% |
9939 | 2 | 0.4% |
99039 | 2 | 0.4% |
Other values (452) | 469 | 93.8% |
Length
Histogram of lengths of the category
Value | Count | Frequency (%) |
v4 | 4 | 0.8% |
1157 | 4 | 0.8% |
736 | 4 | 0.8% |
2946 | 3 | 0.6% |
g203 | 3 | 0.6% |
158 | 3 | 0.6% |
g184 | 3 | 0.6% |
2 | 3 | 0.6% |
3 | 2 | 0.4% |
4 | 2 | 0.4% |
Other values (452) | 469 |
Distinct | 7 |
---|---|
Distinct (%) | 1.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
B | 463 |
---|---|
A | 16 |
K | 12 |
T | 4 |
G | 2 |
Other values (2) | 3 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | 0.2% |
Sample
1st row | B |
---|---|
2nd row | B |
3rd row | B |
4th row | B |
5th row | B |
Common Values
Value | Count | Frequency (%) |
B | 463 | 92.6% |
A | 16 | 3.2% |
K | 12 | 2.4% |
T | 4 | 0.8% |
G | 2 | 0.4% |
Z | 2 | 0.4% |
I | 1 | 0.2% |
Length
Histogram of lengths of the category
Category Frequency Plot
Value | Count | Frequency (%) |
b | 463 | 92.6% |
a | 16 | 3.2% |
k | 12 | 2.4% |
t | 4 | 0.8% |
g | 2 | 0.4% |
z | 2 | 0.4% |
i | 1 | 0.2% |
Distinct | 1 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
C |
---|
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | C |
---|---|
2nd row | C |
3rd row | C |
4th row | C |
5th row | C |
Common Values
Value | Count | Frequency (%) |
C | 500 | 100.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
Value | Count | Frequency (%) |
c | 500 | 100.0% |
최종수정수
Real number (ℝ≥0)
Distinct | 20 |
---|---|
Distinct (%) | 4.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.37 |
Minimum | 1 |
---|---|
Maximum | 36 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 1 |
Q3 | 2 |
95-th percentile | 4 |
Maximum | 36 |
Range | 35 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 5.240613328 |
---|---|
Coefficient of variation (CV) | 2.211229252 |
Kurtosis | 25.6009528 |
Mean | 2.37 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 5.145566325 |
Sum | 1185 |
Variance | 27.46402806 |
Monotonicity | Not monotonic |
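Several of the descriptive statistics above are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A minimal sketch that checks those identities against the reported figures for 최종수정수 (variable names are illustrative):

```python
import math

# Values copied from the report for 최종수정수.
n_observations = 500
reported_sum = 1185
reported_std = 5.240613328

mean = reported_sum / n_observations   # Mean = Sum / n
variance = reported_std ** 2           # Variance = std^2
cv = reported_std / mean               # CV = std / mean

print(mean, variance, cv)
```

The computed values agree with the reported Mean (2.37), Variance (27.46402806) and CV (2.211229252) up to rounding.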
Histogram with fixed size bins (bins=20)
Value | Count | Frequency (%) |
1 | 324 | 64.8% |
2 | 142 | 28.4% |
3 | 8 | 1.6% |
4 | 4 | 0.8% |
5 | 3 | 0.6% |
24 | 2 | 0.4% |
31 | 2 | 0.4% |
30 | 2 | 0.4% |
33 | 2 | 0.4% |
23 | 1 | 0.2% |
Other values (10) | 10 | 2.0% |
Value | Count | Frequency (%) |
1 | 324 | 64.8% |
2 | 142 | 28.4% |
3 | 8 | 1.6% |
4 | 4 | 0.8% |
5 | 3 | 0.6% |
7 | 1 | 0.2% |
10 | 1 | 0.2% |
23 | 1 | 0.2% |
24 | 2 | 0.4% |
25 | 1 | 0.2% |
Value | Count | Frequency (%) |
36 | 1 | 0.2% |
35 | 1 | 0.2% |
34 | 1 | 0.2% |
33 | 2 | 0.4% |
32 | 1 | 0.2% |
31 | 2 | 0.4% |
30 | 2 | 0.4% |
29 | 1 | 0.2% |
28 | 1 | 0.2% |
27 | 1 | 0.2% |
처리직원번호
Real number (ℝ≥0)
Distinct | 16 |
---|---|
Distinct (%) | 3.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 5329.58 |
Minimum | 4509 |
---|---|
Maximum | 6105 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.0 KiB |
Quantile statistics
Minimum | 4509 |
---|---|
5-th percentile | 5099 |
Q1 | 5099 |
median | 5099 |
Q3 | 5220 |
95-th percentile | 6105 |
Maximum | 6105 |
Range | 1596 |
Interquartile range (IQR) | 121 |
Descriptive statistics
Standard deviation | 377.0279881 |
---|---|
Coefficient of variation (CV) | 0.07074253283 |
Kurtosis | 0.1238210734 |
Mean | 5329.58 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 1.280898591 |
Sum | 2664790 |
Variance | 142150.1038 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=16)
Value | Count | Frequency (%) |
5099 | 255 | 51.0% |
5220 | 126 | 25.2% |
6105 | 51 | 10.2% |
6099 | 24 | 4.8% |
5803 | 16 | 3.2% |
5823 | 6 | 1.2% |
5742 | 6 | 1.2% |
6009 | 3 | 0.6% |
4509 | 3 | 0.6% |
5544 | 2 | 0.4% |
Other values (6) | 8 | 1.6% |
Value | Count | Frequency (%) |
4509 | 3 | 0.6% |
4917 | 1 | 0.2% |
5099 | 255 | 51.0% |
5113 | 1 | 0.2% |
5220 | 126 | 25.2% |
5222 | 1 | 0.2% |
5476 | 1 | 0.2% |
5544 | 2 | 0.4% |
5742 | 6 | 1.2% |
5803 | 16 | 3.2% |
Value | Count | Frequency (%) |
6105 | 51 | 10.2% |
6099 | 24 | 4.8% |
6009 | 3 | 0.6% |
5873 | 2 | 0.4% |
5870 | 2 | 0.4% |
5823 | 6 | 1.2% |
5803 | 16 | 3.2% |
5742 | 6 | 1.2% |
5544 | 2 | 0.4% |
5476 | 1 | 0.2% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
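The calculation described above (covariance divided by the product of the standard deviations) can be sketched in plain Python. This is a minimal illustration, not the implementation used by the profiling tool; the function name `pearson_r` is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of the covariance and standard deviations cancel,
    # so unnormalised sums are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r_original = pearson_r(xs, ys)
# Shifting and rescaling each variable separately leaves r unchanged.
r_rescaled = pearson_r([2 * x + 5 for x in xs], [3 * y - 1 for y in ys])
```

The function is undefined when either variable is constant (zero standard deviation). That case applies to 코드유형구분코드 in this dataset.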
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
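A minimal sketch of that definition: replace each value by its rank, then apply the Pearson formula to the ranks. Assigning tied values the average of their positions is an assumption here (a common convention), and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# y = x**2 is nonlinear but monotonic on positive x, so rho is 1.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```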
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
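The pair-counting rule above corresponds to the variant usually called tau-a. The sketch below implements exactly that rule; it makes no correction for ties, whereas library implementations often use the tie-adjusted tau-b, so results can differ on data with many repeated values. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts towards neither group
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


tau = kendall_tau_a([1, 2, 3], [1, 3, 2])  # 2 concordant, 1 discordant, 3 pairs
```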
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
First rows
 | 코드값 | 주제영역코드 | 코드유형구분코드 | 목록코드테이블명 | 목록코드테이블논리명 | 목록코드컬럼명 | 목록코드컬럼논리명 | 최종수정수 | 처리시각 | 처리직원번호 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 28 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
1 | 1137 | B | C | NaN | NaN | NaN | NaN | 4 | NaN | 5099 |
2 | 27 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
3 | 40 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
4 | 1137 | B | C | NaN | NaN | NaN | NaN | 3 | NaN | 5099 |
5 | 45 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
6 | 44 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
7 | 43 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
8 | 42 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
9 | 41 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
Last rows
 | 코드값 | 주제영역코드 | 코드유형구분코드 | 목록코드테이블명 | 목록코드테이블논리명 | 목록코드컬럼명 | 목록코드컬럼논리명 | 최종수정수 | 처리시각 | 처리직원번호 |
---|---|---|---|---|---|---|---|---|---|---|
490 | 110101 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
491 | 100305 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
492 | 100304 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
493 | 100303 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
494 | 100302 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
495 | 100301 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
496 | 100204 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
497 | 100203 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
498 | 100202 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
499 | 100201 | B | C | NaN | NaN | NaN | NaN | 2 | NaN | 5099 |
Most frequently occurring
 | 코드값 | 주제영역코드 | 코드유형구분코드 | 최종수정수 | 처리직원번호 | # duplicates |
---|---|---|---|---|---|---|
0 | G184 | B | C | 1 | 6105 | 3 |
1 | G203 | B | C | 1 | 5099 | 3 |
2 | G204 | B | C | 1 | 5099 | 2 |
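A table like the one above can be produced by counting identical rows and keeping those that occur more than once. The sketch below uses only the standard library; the function name and the sample rows are illustrative assumptions modelled on the table, not the dataset itself or the tool's implementation.

```python
from collections import Counter


def most_frequent_duplicates(rows):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(rows)
    duplicates = [(row, count) for row, count in counts.items() if count > 1]
    return sorted(duplicates, key=lambda item: item[1], reverse=True)


# Hypothetical rows: (코드값, 주제영역코드, 코드유형구분코드, 최종수정수, 처리직원번호)
sample_rows = [
    ("G204", "B", "C", 1, 5099),
    ("G184", "B", "C", 1, 6105),
    ("G184", "B", "C", 1, 6105),
    ("G204", "B", "C", 1, 5099),
    ("G184", "B", "C", 1, 6105),
    ("28", "B", "C", 2, 5099),
]
duplicate_table = most_frequent_duplicates(sample_rows)
```

Rows must be hashable (tuples here), and a row that occurs only once is left out of the result.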