Overview

Dataset statistics

Number of variables: 8
Number of observations: 6104
Missing cells: 6919
Missing cells (%): 14.2%
Duplicate rows: 98
Duplicate rows (%): 1.6%
Total size in memory: 393.6 KiB
Average record size in memory: 66.0 B
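
The percentages and the average record size follow from the raw counts. The sketch below redoes that arithmetic, assuming the missing-cell percentage is taken over variables × observations and that 1 KiB is 1024 B. The variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 8
n_observations = 6104
missing_cells = 6919
duplicate_rows = 98
total_size_kib = 393.6

total_cells = n_variables * n_observations                  # every cell in the table
missing_pct = 100 * missing_cells / total_cells             # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations       # share of duplicate rows
avg_record_bytes = total_size_kib * 1024 / n_observations   # assumes 1 KiB = 1024 B

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the three results agree with the reported 14.2%, 1.6% and 66.0 B.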

Variable types

Text: 1
Unsupported: 1
Categorical: 1
DateTime: 4
Numeric: 1

Dataset

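Description: Provides accounting-item information on the revenue generated at culture and sports facilities operated by 수원도시공사, such as the Jangan-gu Community Center (장안구민회관), the Sports Complex (종합운동장), and the Family and Women's Center (가족여성회관).
Author: 수원도시공사
URL: https://www.data.go.kr/data/15123871/fileData.do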

Alerts

Dataset has 98 (1.6%) duplicate rows [Duplicates]
총금액 is highly overall correlated with 거래상태 [High correlation]
거래상태 is highly overall correlated with 총금액 [High correlation]
거래상태 is highly imbalanced (72.3%) [Imbalance]
현금 has 6104 (100.0%) missing values [Missing]
수입일자 has 815 (13.4%) missing values [Missing]
현금 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2023-12-12 14:06:48.866809
Analysis finished: 2023-12-12 14:06:50.235190
Duration: 1.37 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

카드 (card)
Text

Distinct: 58
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 47.8 KiB

Length

Max length: 11
Median length: 2
Mean length: 2.525557
Min length: 2

Characters and Unicode

Total characters: 15416
Distinct characters: 78
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
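
As a rough illustration of such character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using three values from this variable's sample rows. The helper name `category_counts` is hypothetical. The codes `Lo`, `Lu` and `Zs` stand for Other Letter, Uppercase Letter and Space Separator. This is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

# Three values copied from the sample rows listed for this variable.
samples = ["삼성 마스타", "KB국민카드", "현대비자개인"]

def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Lu', 'Zs') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

counts = category_counts(samples)
for code, n in counts.most_common():
    print(code, n)
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category table for this column.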

Unique

Unique: 12
Unique (%): 0.2%

Sample

1st row: 삼성 마스타
2nd row: KB국민카드
3rd row: 현대비자개인
4th row: KB국민카드
5th row: KB국민카드
| Value | Count | Frequency (%) |
| 현대 | 1637 | 26.6% |
| 국민 | 935 | 15.2% |
| 신한 | 934 | 15.2% |
| 삼성 | 893 | 14.5% |
| 구하나 | 438 | 7.1% |
| 비씨 | 302 | 4.9% |
| 롯데(신 | 188 | 3.1% |
| 농협 | 139 | 2.3% |
| 하나(외환 | 122 | 2.0% |
| kb국민카드 | 80 | 1.3% |
| Other values (49) | 489 | 7.9% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 1698 | 11.0% |
|  | 1697 | 11.0% |
|  | 1208 | 7.8% |
|  | 1020 | 6.6% |
|  | 1016 | 6.6% |
|  | 1016 | 6.6% |
|  | 950 | 6.2% |
|  | 950 | 6.2% |
|  | 584 | 3.8% |
|  | 584 | 3.8% |
| Other values (68) | 4693 | 30.4% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 14388 | 93.3% |
| Uppercase Letter | 348 | 2.3% |
| Open Punctuation | 310 | 2.0% |
| Close Punctuation | 310 | 2.0% |
| Space Separator | 53 | 0.3% |
| Lowercase Letter | 7 | < 0.1% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 1698 | 11.8% |
|  | 1697 | 11.8% |
|  | 1208 | 8.4% |
|  | 1020 | 7.1% |
|  | 1016 | 7.1% |
|  | 1016 | 7.1% |
|  | 950 | 6.6% |
|  | 950 | 6.6% |
|  | 584 | 4.1% |
|  | 584 | 4.1% |
| Other values (48) | 3665 | 25.5% |

Uppercase Letter
| Value | Count | Frequency (%) |
| B | 102 | 29.3% |
| K | 96 | 27.6% |
| N | 60 | 17.2% |
| H | 60 | 17.2% |
| C | 10 | 2.9% |
| I | 7 | 2.0% |
| J | 6 | 1.7% |
| S | 4 | 1.1% |
| P | 2 | 0.6% |
| V | 1 | 0.3% |

Lowercase Letter
| Value | Count | Frequency (%) |
| u | 1 | 14.3% |
| n | 1 | 14.3% |
| i | 1 | 14.3% |
| t | 1 | 14.3% |
| a | 1 | 14.3% |
| l | 1 | 14.3% |
| m | 1 | 14.3% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 310 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 310 | 100.0% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 53 | 100.0% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 14388 | 93.3% |
| Common | 673 | 4.4% |
| Latin | 355 | 2.3% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|  | 1698 | 11.8% |
|  | 1697 | 11.8% |
|  | 1208 | 8.4% |
|  | 1020 | 7.1% |
|  | 1016 | 7.1% |
|  | 1016 | 7.1% |
|  | 950 | 6.6% |
|  | 950 | 6.6% |
|  | 584 | 4.1% |
|  | 584 | 4.1% |
| Other values (48) | 3665 | 25.5% |

Latin
| Value | Count | Frequency (%) |
| B | 102 | 28.7% |
| K | 96 | 27.0% |
| N | 60 | 16.9% |
| H | 60 | 16.9% |
| C | 10 | 2.8% |
| I | 7 | 2.0% |
| J | 6 | 1.7% |
| S | 4 | 1.1% |
| P | 2 | 0.6% |
| u | 1 | 0.3% |
| Other values (7) | 7 | 2.0% |

Common
| Value | Count | Frequency (%) |
| ( | 310 | 46.1% |
| ) | 310 | 46.1% |
| (space) | 53 | 7.9% |

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 14388 | 93.3% |
| ASCII | 1028 | 6.7% |

Most frequent character per block

Hangul
| Value | Count | Frequency (%) |
|  | 1698 | 11.8% |
|  | 1697 | 11.8% |
|  | 1208 | 8.4% |
|  | 1020 | 7.1% |
|  | 1016 | 7.1% |
|  | 1016 | 7.1% |
|  | 950 | 6.6% |
|  | 950 | 6.6% |
|  | 584 | 4.1% |
|  | 584 | 4.1% |
| Other values (48) | 3665 | 25.5% |
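
ASCII
| Value | Count | Frequency (%) |
| ( | 310 | 30.2% |
| ) | 310 | 30.2% |
| B | 102 | 9.9% |
| K | 96 | 9.3% |
| N | 60 | 5.8% |
| H | 60 | 5.8% |
| (space) | 53 | 5.2% |
| C | 10 | 1.0% |
| I | 7 | 0.7% |
| J | 6 | 0.6% |
| Other values (10) | 14 | 1.4% |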

현금 (cash)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6104
Missing (%): 100.0%
Memory size: 53.8 KiB

거래상태 (transaction status)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 47.8 KiB
결제완료 (payment completed): 5570
신용승인 (credit approved): 525
신용취소 (credit cancelled): 9

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 신용승인
2nd row: 신용승인
3rd row: 신용승인
4th row: 신용승인
5th row: 신용취소

Common Values

| Value | Count | Frequency (%) |
| 결제완료 | 5570 | 91.3% |
| 신용승인 | 525 | 8.6% |
| 신용취소 | 9 | 0.1% |
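
The imbalance alert for this variable reports 72.3%. The source does not state the formula, so the sketch below assumes an entropy-based score: one minus the Shannon entropy of the class shares divided by log2 of the number of classes. The function name `imbalance_score` is hypothetical. With the three counts from the table this assumption reproduces the reported figure.

```python
import math

# Category counts copied from the Common Values table above.
counts = {"결제완료": 5570, "신용승인": 525, "신용취소": 9}

def imbalance_score(class_counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    Assumed definition, for illustration only: 0 for a perfectly uniform
    split, approaching 1 as a single class dominates.
    """
    class_counts = [c for c in class_counts if c > 0]
    k = len(class_counts)
    if k < 2:
        return 0.0  # convention chosen here: one class leaves nothing to compare
    total = sum(class_counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts)
    return 1 - entropy / math.log2(k)

score = imbalance_score(counts.values())
print(f"Imbalance: {score:.1%}")  # prints "Imbalance: 72.3%"
```

A score this high mostly reflects that 결제완료 accounts for more than nine in ten rows.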

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| 결제완료 | 5570 | 91.3% |
| 신용승인 | 525 | 8.6% |
| 신용취소 | 9 | 0.1% |

징수결의일자 (collection resolution date)
Date

Distinct: 84
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 47.8 KiB
Minimum: 2023-07-01 00:00:00
Maximum: 2023-09-22 00:00:00
Histogram with fixed size bins (bins=50)

수입일자 (revenue receipt date)
Date

MISSING 

Distinct: 55
Distinct (%): 1.0%
Missing: 815
Missing (%): 13.4%
Memory size: 47.8 KiB
Minimum: 2023-07-07 00:00:00
Maximum: 2023-09-22 00:00:00
Histogram with fixed size bins (bins=50)

총금액 (total amount)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 120
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24574.723
Minimum: -90000
Maximum: 180000
Zeros: 0
Zeros (%): 0.0%
Negative: 9
Negative (%): 0.1%
Memory size: 53.8 KiB

Quantile statistics

Minimum: -90000
5-th percentile: 400
Q1: 2400
Median: 4000
Q3: 45000
95-th percentile: 90000
Maximum: 180000
Range: 270000
Interquartile range (IQR): 42600

Descriptive statistics

Standard deviation: 31928.922
Coefficient of variation (CV): 1.2992587
Kurtosis: 3.1751652
Mean: 24574.723
Median Absolute Deviation (MAD): 3200
Skewness: 1.5575097
Sum: 1.5000411 × 10^8
Variance: 1.0194561 × 10^9
Monotonicity: Not monotonic
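
Several of these figures can be derived from the others. The sketch below checks the range, IQR, coefficient of variation, variance and sum against the reported values, using the figures above and the 6104 non-missing observations. The variable names are illustrative.

```python
# Figures copied from the quantile and descriptive statistics of 총금액.
n = 6104
minimum, maximum = -90000, 180000
q1, q3 = 2400, 45000
mean, std = 24574.723, 31928.922

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
total = mean * n                  # Sum = mean * number of observations

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.4f}")
print(f"Variance: {variance:.4e}")
print(f"Sum: {total:.4e}")
```

A CV above 1 together with a median (4000) far below the mean points to a right-skewed distribution, which matches the reported skewness of about 1.56.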
Histogram with fixed size bins (bins=50)
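
Common values

| Value | Count | Frequency (%) |
| 4000 | 1297 | 21.2% |
| 2400 | 349 | 5.7% |
| 2000 | 266 | 4.4% |
| 45000 | 251 | 4.1% |
| 35000 | 245 | 4.0% |
| 90000 | 222 | 3.6% |
| 3000 | 208 | 3.4% |
| 900 | 178 | 2.9% |
| 1800 | 159 | 2.6% |
| 60000 | 151 | 2.5% |
| Other values (110) | 2778 | 45.5% |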
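
Minimum 10 values

| Value | Count | Frequency (%) |
| -90000 | 5 | 0.1% |
| -15000 | 1 | < 0.1% |
| -10000 | 2 | < 0.1% |
| -5000 | 1 | < 0.1% |
| 50 | 2 | < 0.1% |
| 100 | 87 | 1.4% |
| 200 | 56 | 0.9% |
| 300 | 34 | 0.6% |
| 350 | 1 | < 0.1% |
| 400 | 141 | 2.3% |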
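
Maximum 10 values

| Value | Count | Frequency (%) |
| 180000 | 38 | 0.6% |
| 150000 | 14 | 0.2% |
| 125810 | 5 | 0.1% |
| 120000 | 24 | 0.4% |
| 108000 | 2 | < 0.1% |
| 106000 | 1 | < 0.1% |
| 100650 | 10 | 0.2% |
| 100000 | 68 | 1.1% |
| 90000 | 222 | 3.6% |
| 84000 | 35 | 0.6% |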

거래일자 (transaction date)
Date

Distinct: 84
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 47.8 KiB
Minimum: 2023-07-01 00:00:00
Maximum: 2023-09-22 00:00:00
Histogram with fixed size bins (bins=50)

거래시간 (transaction time)
Date

Distinct: 1492
Distinct (%): 24.4%
Missing: 0
Missing (%): 0.0%
Memory size: 47.8 KiB
Minimum: 2023-12-12 00:04:00
Maximum: 2023-12-12 23:54:00
Histogram with fixed size bins (bins=50)

Interactions


Correlations

|  | 카드 | 거래상태 | 징수결의일자 | 수입일자 | 총금액 | 거래일자 |
| 카드 | 1.000 | 0.901 | 0.829 | 0.830 | 0.647 | 0.829 |
| 거래상태 | 0.901 | 1.000 | 0.923 | 1.000 | 0.841 | 0.923 |
| 징수결의일자 | 0.829 | 0.923 | 1.000 | 1.000 | 0.775 | 1.000 |
| 수입일자 | 0.830 | 1.000 | 1.000 | 1.000 | 0.736 | 1.000 |
| 총금액 | 0.647 | 0.841 | 0.775 | 0.736 | 1.000 | 0.775 |
| 거래일자 | 0.829 | 0.923 | 1.000 | 1.000 | 0.775 | 1.000 |

|  | 총금액 | 거래상태 |
| 총금액 | 1.000 | 0.830 |
| 거래상태 | 0.830 | 1.000 |

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
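
Both views can be approximated in a few lines. The sketch below counts missing values per column and prints a small text nullity matrix, using three columns from the first ten rows of the report's Sample section, with `None` standing for `<NA>`. The helper names `nullity_by_column` and `nullity_matrix` are hypothetical, and the output only describes these ten rows, not the full dataset.

```python
# First ten rows of the report's sample, restricted to three columns.
# None stands for <NA>.
income_dates = ["2023-07-10"] * 3 + [None] * 2 + ["2023-07-10"] * 3 + [None] * 2
amounts = [10000, 10000, 10000, 10000, -10000, 10000, 10000, 10000, -10000, 10000]
rows = [{"현금": None, "수입일자": d, "총금액": a} for d, a in zip(income_dates, amounts)]

def nullity_by_column(records):
    """Return {column: (missing_count, missing_share)} for a list of dict rows."""
    result = {}
    for col in records[0]:
        missing = sum(1 for rec in records if rec[col] is None)
        result[col] = (missing, missing / len(records))
    return result

def nullity_matrix(records):
    """Render one text line per row: '#' for a present value, '.' for a missing one."""
    return ["".join("." if v is None else "#" for v in rec.values()) for rec in records]

summary = nullity_by_column(rows)
for col, (missing, share) in summary.items():
    print(f"{col}: {missing} missing ({share:.0%})")
print("\n".join(nullity_matrix(rows)))
```

In this small slice 현금 is empty in every row, in line with the 100% missing alert, while 수입일자 is empty only in some rows.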

Sample

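First rows

|  | 카드 | 현금 | 거래상태 | 징수결의일자 | 수입일자 | 총금액 | 거래일자 | 거래시간 |
| 0 | 삼성 마스타 | <NA> | 신용승인 | 2023-07-03 | 2023-07-10 | 10000 | 2023-07-03 | 9:05:40 |
| 1 | KB국민카드 | <NA> | 신용승인 | 2023-07-03 | 2023-07-10 | 10000 | 2023-07-03 | 9:06:23 |
| 2 | 현대비자개인 | <NA> | 신용승인 | 2023-07-03 | 2023-07-10 | 10000 | 2023-07-03 | 9:08:37 |
| 3 | KB국민카드 | <NA> | 신용승인 | 2023-07-03 | <NA> | 10000 | 2023-07-03 | 9:08:49 |
| 4 | KB국민카드 | <NA> | 신용취소 | 2023-07-03 | <NA> | -10000 | 2023-07-03 | 9:08:49 |
| 5 | 신한카드체크 | <NA> | 신용승인 | 2023-07-03 | 2023-07-10 | 10000 | 2023-07-03 | 9:09:23 |
| 6 | 신한카드 | <NA> | 신용승인 | 2023-07-03 | 2023-07-10 | 10000 | 2023-07-03 | 9:10:57 |
| 7 | KB국민카드 | <NA> | 신용승인 | 2023-07-03 | 2023-07-10 | 10000 | 2023-07-03 | 9:12:12 |
| 8 | 현대 카드 | <NA> | 신용취소 | 2023-07-03 | <NA> | -10000 | 2023-07-03 | 9:13:12 |
| 9 | 현대 카드 | <NA> | 신용승인 | 2023-07-03 | <NA> | 10000 | 2023-07-03 | 9:13:14 |

Last rows

|  | 카드 | 현금 | 거래상태 | 징수결의일자 | 수입일자 | 총금액 | 거래일자 | 거래시간 |
| 6094 | 삼성 | <NA> | 결제완료 | 20230720 | 20230727 | 63000 | 20230720 | 9:21 |
| 6095 | 삼성 | <NA> | 결제완료 | 20230920 | <NA> | 45290 | 20230920 | 10:34 |
| 6096 | 삼성 | <NA> | 결제완료 | 20230727 | 20230803 | 2400 | 20230727 | 19:54 |
| 6097 | 삼성 | <NA> | 결제완료 | 20230811 | 20230821 | 3000 | 20230811 | 11:42 |
| 6098 | 삼성 | <NA> | 결제완료 | 20230719 | 20230726 | 58500 | 20230719 | 18:10 |
| 6099 | 삼성 | <NA> | 결제완료 | 20230830 | 20230906 | 900 | 20230830 | 9:58 |
| 6100 | 삼성 | <NA> | 결제완료 | 20230903 | 20230908 | 4000 | 20230903 | 11:47 |
| 6101 | 삼성 | <NA> | 결제완료 | 20230814 | 20230822 | 4000 | 20230814 | 7:45 |
| 6102 | 삼성 | <NA> | 결제완료 | 20230803 | 20230810 | 4000 | 20230803 | 7:45 |
| 6103 | 삼성 | <NA> | 결제완료 | 20230822 | 20230829 | 4000 | 20230822 | 8:16 |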

Duplicate rows

Most frequently occurring

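|  | 카드 | 거래상태 | 징수결의일자 | 수입일자 | 총금액 | 거래일자 | 거래시간 | # duplicates |
| 4 | 국민 | 결제완료 | 20230703 | 20230710 | 2400 | 20230703 | 18:04 | 3 |
| 66 | 신한 | 결제완료 | 20230828 | 20230904 | 54000 | 20230828 | 6:01 | 3 |
| 87 | 현대 | 결제완료 | 20230828 | 20230904 | 54000 | 20230828 | 6:03 | 3 |
| 89 | 현대 | 결제완료 | 20230828 | 20230904 | 65000 | 20230828 | 6:03 | 3 |
| 0 | 구하나 | 결제완료 | 20230724 | 20230731 | 45000 | 20230724 | 15:21 | 2 |
| 1 | 구하나 | 결제완료 | 20230801 | 20230808 | 900 | 20230801 | 15:41 | 2 |
| 2 | 구하나 | 결제완료 | 20230906 | 20230913 | 900 | 20230906 | 13:36 | 2 |
| 3 | 국민 | 결제완료 | 20230701 | 20230707 | 4000 | 20230701 | 15:02 | 2 |
| 5 | 국민 | 결제완료 | 20230703 | 20230710 | 2400 | 20230703 | 18:05 | 2 |
| 6 | 국민 | 결제완료 | 20230703 | 20230710 | 4000 | 20230703 | 17:53 | 2 |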
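
The "# duplicates" column can be read as the number of times an identical row occurs. The sketch below shows one way to derive such counts, assuming rows are compared on all seven listed columns. It uses two rows from the table above, repeated as often as their "# duplicates" value, plus one row from the Sample section for contrast. The helper name `duplicate_counts` is hypothetical.

```python
from collections import Counter

# Rows are tuples of (카드, 거래상태, 징수결의일자, 수입일자, 총금액, 거래일자, 거래시간).
row_a = ("국민", "결제완료", "20230703", "20230710", 2400, "20230703", "18:04")
row_b = ("국민", "결제완료", "20230703", "20230710", 2400, "20230703", "18:05")
row_c = ("삼성", "결제완료", "20230822", "20230829", 4000, "20230822", "8:16")
rows = [row_a, row_b, row_a, row_c, row_b, row_a]

def duplicate_counts(records):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(records)
    return {row: n for row, n in counts.items() if n > 1}

dupes = duplicate_counts(rows)
# List the most frequently occurring rows first, as the table does.
for row, n in sorted(dupes.items(), key=lambda item: -item[1]):
    print(n, row)
```

Rows that differ in any column, even by one minute in 거래시간, count as distinct here, which is why `row_a` and `row_b` are reported separately.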