Overview

Dataset statistics

Number of variables: 15
Number of observations: 500
Missing cells: 1018
Missing cells (%): 13.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 46.9 KiB
Average record size in memory: 96.1 B
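The missing-cell figures above can be cross-checked against the per-variable missing counts listed in the Variables section. The following is a minimal sketch, not part of the profiling tool: the dictionary and the helper name `missing_cell_summary` are illustrative, and the counts are copied by hand from this report. The record-size check uses the already rounded 46.9 KiB, so it is only approximate.

```python
# Missing-value counts per variable, copied from the variable sections of this report.
missing_per_variable = {
    "Timestamp": 0, "Kaupunki": 5, "Ikä": 3, "Sukupuoli": 35, "Työkokemus": 5,
    "Työsuhteen luonne": 1, "Työaika": 19, "Rooli": 13, "Etä": 3,
    "Kuukausipalkka": 44, "Vuositulot": 13, "Kilpailukykyinen": 15,
    "Työpaikka": 387, "Vapaa sana": 462, "Kk-tulot": 13,
}
n_observations = 500


def missing_cell_summary(missing, n_rows):
    """Return (total missing cells, missing share in percent of all cells)."""
    total_cells = len(missing) * n_rows
    total_missing = sum(missing.values())
    return total_missing, 100.0 * total_missing / total_cells


total_missing, pct_missing = missing_cell_summary(missing_per_variable, n_observations)

# Average record size, derived from the rounded total of 46.9 KiB.
avg_record_bytes = 46.9 * 1024 / n_observations

print(f"{total_missing} missing cells ({pct_missing:.1f}%)")
print(f"about {avg_record_bytes:.1f} B per record")
```

The per-variable counts add up to the reported total, which suggests that the overview and the variable sections describe the same data.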

Variable types

DateTime: 1
Categorical: 8
Numeric: 5
Boolean: 1

Alerts

Rooli has a high cardinality: 261 distinct values (High cardinality)
Työpaikka has a high cardinality: 73 distinct values (High cardinality)
Työkokemus is highly correlated with Kuukausipalkka and 2 other fields (High correlation)
Kuukausipalkka is highly correlated with Työkokemus and 2 other fields (High correlation)
Vuositulot is highly correlated with Työkokemus and 2 other fields (High correlation)
Kk-tulot is highly correlated with Työkokemus and 2 other fields (High correlation)
Työkokemus is highly correlated with Kuukausipalkka (High correlation)
Kuukausipalkka is highly correlated with Työkokemus and 2 other fields (High correlation)
Vuositulot is highly correlated with Kuukausipalkka and 1 other field (High correlation)
Kk-tulot is highly correlated with Kuukausipalkka and 1 other field (High correlation)
Kuukausipalkka is highly correlated with Vuositulot and 1 other field (High correlation)
Vuositulot is highly correlated with Kuukausipalkka and 1 other field (High correlation)
Kk-tulot is highly correlated with Kuukausipalkka and 1 other field (High correlation)
Vapaa sana is highly correlated with Työpaikka and 1 other field (High correlation)
Kaupunki is highly correlated with Työpaikka (High correlation)
Työpaikka is highly correlated with Vapaa sana and 2 other fields (High correlation)
Kilpailukykyinen is highly correlated with Vapaa sana (High correlation)
Työsuhteen luonne is highly correlated with Työpaikka (High correlation)
Kaupunki is highly correlated with Työsuhteen luonne and 5 other fields (High correlation)
Ikä is highly correlated with Työkokemus and 2 other fields (High correlation)
Sukupuoli is highly correlated with Vapaa sana (High correlation)
Työkokemus is highly correlated with Ikä and 5 other fields (High correlation)
Työsuhteen luonne is highly correlated with Kaupunki and 4 other fields (High correlation)
Työaika is highly correlated with Työpaikka and 1 other field (High correlation)
Etä is highly correlated with Vapaa sana (High correlation)
Kuukausipalkka is highly correlated with Kaupunki and 6 other fields (High correlation)
Vuositulot is highly correlated with Kaupunki and 6 other fields (High correlation)
Kilpailukykyinen is highly correlated with Kuukausipalkka and 1 other field (High correlation)
Työpaikka is highly correlated with Kaupunki and 8 other fields (High correlation)
Vapaa sana is highly correlated with Kaupunki and 11 other fields (High correlation)
Kk-tulot is highly correlated with Kaupunki and 6 other fields (High correlation)
Sukupuoli has 35 (7.0%) missing values (Missing)
Työaika has 19 (3.8%) missing values (Missing)
Rooli has 13 (2.6%) missing values (Missing)
Kuukausipalkka has 44 (8.8%) missing values (Missing)
Vuositulot has 13 (2.6%) missing values (Missing)
Kilpailukykyinen has 15 (3.0%) missing values (Missing)
Työpaikka has 387 (77.4%) missing values (Missing)
Vapaa sana has 462 (92.4%) missing values (Missing)
Kk-tulot has 13 (2.6%) missing values (Missing)
Vapaa sana is uniformly distributed (Uniform)
Timestamp has unique values (Unique)

Reproduction

Analysis started: 2022-09-27 11:15:11.996317
Analysis finished: 2022-09-27 11:15:18.416059
Duration: 6.42 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Timestamp
Date

UNIQUE

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Minimum: 2021-02-15 11:57:08.316000
Maximum: 2021-02-27 17:49:24.789000
Histogram with fixed size bins (bins=50)

Kaupunki
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 28
Distinct (%): 5.7%
Missing: 5
Missing (%): 1.0%
Memory size: 1.9 KiB

PK-Seutu: 250
Tampere: 117
Turku: 47
Oulu: 26
Jyväskylä: 18
Other values (23): 37

Length

Max length: 15
Median length: 8
Mean length: 7.234343434
Min length: 2

Characters and Unicode

Total characters: 3581
Distinct characters: 40
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 2.8%

Sample

1st row: PK-Seutu
2nd row: Turku
3rd row: PK-Seutu
4th row: Tampere
5th row: PK-Seutu

Common Values

Value | Count | Frequency (%)
PK-Seutu | 250 | 50.0%
Tampere | 117 | 23.4%
Turku | 47 | 9.4%
Oulu | 26 | 5.2%
Jyväskylä | 18 | 3.6%
Kuopio | 7 | 1.4%
Lontoo | 2 | 0.4%
Vaasa | 2 | 0.4%
Tallinna | 2 | 0.4%
Pori | 2 | 0.4%
Other values (18) | 22 | 4.4%
(Missing) | 5 | 1.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
pk-seutu250
50.1%
tampere117
23.4%
turku47
 
9.4%
oulu26
 
5.2%
jyväskylä18
 
3.6%
kuopio7
 
1.4%
eu2
 
0.4%
hämeenlinna2
 
0.4%
kouvola2
 
0.4%
lahti2
 
0.4%
Other values (22)26
 
5.2%

Most occurring characters

ValueCountFrequency (%)
u661
18.5%
e496
13.9%
K261
 
7.3%
t257
 
7.2%
P253
 
7.1%
-252
 
7.0%
S252
 
7.0%
r170
 
4.7%
T166
 
4.6%
a145
 
4.0%
Other values (30)668
18.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2323
64.9%
Uppercase Letter1001
28.0%
Dash Punctuation252
 
7.0%
Space Separator4
 
0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u661
28.5%
e496
21.4%
t257
 
11.1%
r170
 
7.3%
a145
 
6.2%
p125
 
5.4%
m123
 
5.3%
k70
 
3.0%
l58
 
2.5%
ä44
 
1.9%
Other values (10)174
 
7.5%
Uppercase Letter
ValueCountFrequency (%)
K261
26.1%
P253
25.3%
S252
25.2%
T166
16.6%
O26
 
2.6%
J19
 
1.9%
L5
 
0.5%
E4
 
0.4%
H3
 
0.3%
V3
 
0.3%
Other values (7)9
 
0.9%
Dash Punctuation
ValueCountFrequency (%)
-252
100.0%
Space Separator
ValueCountFrequency (%)
4
100.0%
Other Punctuation
ValueCountFrequency (%)
,1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3324
92.8%
Common257
 
7.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
u661
19.9%
e496
14.9%
K261
 
7.9%
t257
 
7.7%
P253
 
7.6%
S252
 
7.6%
r170
 
5.1%
T166
 
5.0%
a145
 
4.4%
p125
 
3.8%
Other values (27)538
16.2%
Common
ValueCountFrequency (%)
-252
98.1%
4
 
1.6%
,1
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII3537
98.8%
None44
 
1.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
u661
18.7%
e496
14.0%
K261
 
7.4%
t257
 
7.3%
P253
 
7.2%
-252
 
7.1%
S252
 
7.1%
r170
 
4.8%
T166
 
4.7%
a145
 
4.1%
Other values (29)624
17.6%
None
ValueCountFrequency (%)
ä44
100.0%

Ikä
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): 1.4%
Missing: 3
Missing (%): 0.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.77464789
Minimum: 23
Maximum: 53
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 23
5-th percentile: 23
Q1: 28
Median: 33
Q3: 38
95-th percentile: 43
Maximum: 53
Range: 30
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 6.053651351
Coefficient of variation (CV): 0.1792365496
Kurtosis: 0.2306290239
Mean: 33.77464789
Median Absolute Deviation (MAD): 5
Skewness: 0.480434113
Sum: 16786
Variance: 36.64669468
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
33 | 170 | 34.0%
28 | 121 | 24.2%
38 | 106 | 21.2%
43 | 54 | 10.8%
23 | 32 | 6.4%
48 | 8 | 1.6%
53 | 6 | 1.2%
(Missing) | 3 | 0.6%

Value | Count | Frequency (%)
23 | 32 | 6.4%
28 | 121 | 24.2%
33 | 170 | 34.0%
38 | 106 | 21.2%
43 | 54 | 10.8%
48 | 8 | 1.6%
53 | 6 | 1.2%

Value | Count | Frequency (%)
53 | 6 | 1.2%
48 | 8 | 1.6%
43 | 54 | 10.8%
38 | 106 | 21.2%
33 | 170 | 34.0%
28 | 121 | 24.2%
23 | 32 | 6.4%

Sukupuoli
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.6%
Missing: 35
Missing (%): 7.0%
Memory size: 760.0 B

mies: 419
nainen: 37
muu: 9

Length

Max length: 6
Median length: 4
Mean length: 4.139784946
Min length: 3

Characters and Unicode

Total characters: 1925
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mies
2nd row: mies
3rd row: mies
4th row: mies
5th row: mies

Common Values

Value | Count | Frequency (%)
mies | 419 | 83.8%
nainen | 37 | 7.4%
muu | 9 | 1.8%
(Missing) | 35 | 7.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
mies | 419 | 90.1%
nainen | 37 | 8.0%
muu | 9 | 1.9%
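The two Sukupuoli frequency tables report different percentages for the same counts. The figures are consistent with the Common Values table dividing by all 500 rows and the second table dividing only by the 465 non-missing answers. The sketch below reproduces both sets of percentages under that assumption; the helper name `shares` is illustrative.

```python
# Counts copied from the Sukupuoli section of this report.
counts = {"mies": 419, "nainen": 37, "muu": 9}
n_missing = 35

n_answers = sum(counts.values())   # rows with a value
n_total = n_answers + n_missing    # all rows, including missing


def shares(value_counts, denominator):
    """Percentage share of each value, rounded to one decimal."""
    return {k: round(100.0 * v / denominator, 1) for k, v in value_counts.items()}


share_of_all_rows = shares(counts, n_total)    # matches the Common Values table
share_of_answers = shares(counts, n_answers)   # matches the second table

print(share_of_all_rows)
print(share_of_answers)
```

When quoting a share from this report it is worth stating which denominator is meant, since the gap grows with the missing rate of the variable.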

Most occurring characters

ValueCountFrequency (%)
i456
23.7%
e456
23.7%
m428
22.2%
s419
21.8%
n111
 
5.8%
a37
 
1.9%
u18
 
0.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1925
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i456
23.7%
e456
23.7%
m428
22.2%
s419
21.8%
n111
 
5.8%
a37
 
1.9%
u18
 
0.9%

Most occurring scripts

ValueCountFrequency (%)
Latin1925
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i456
23.7%
e456
23.7%
m428
22.2%
s419
21.8%
n111
 
5.8%
a37
 
1.9%
u18
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII1925
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i456
23.7%
e456
23.7%
m428
22.2%
s419
21.8%
n111
 
5.8%
a37
 
1.9%
u18
 
0.9%

Työkokemus
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 27
Distinct (%): 5.5%
Missing: 5
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.523232323
Minimum: 0
Maximum: 30
Zeros: 4
Zeros (%): 0.8%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
Median: 9
Q3: 13
95-th percentile: 21
Maximum: 30
Range: 30
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.053319568
Coefficient of variation (CV): 0.6356370781
Kurtosis: -0.03938790912
Mean: 9.523232323
Median Absolute Deviation (MAD): 4
Skewness: 0.7271444909
Sum: 4714
Variance: 36.64267779
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)

Value | Count | Frequency (%)
5 | 54 | 10.8%
10 | 40 | 8.0%
4 | 31 | 6.2%
7 | 30 | 6.0%
15 | 29 | 5.8%
2 | 29 | 5.8%
20 | 28 | 5.6%
3 | 28 | 5.6%
6 | 27 | 5.4%
13 | 25 | 5.0%
Other values (17) | 174 | 34.8%

Value | Count | Frequency (%)
0 | 4 | 0.8%
1 | 17 | 3.4%
2 | 29 | 5.8%
3 | 28 | 5.6%
4 | 31 | 6.2%
5 | 54 | 10.8%
6 | 27 | 5.4%
7 | 30 | 6.0%
8 | 25 | 5.0%
9 | 22 | 4.4%

Value | Count | Frequency (%)
30 | 2 | 0.4%
25 | 6 | 1.2%
24 | 3 | 0.6%
23 | 4 | 0.8%
22 | 5 | 1.0%
21 | 7 | 1.4%
20 | 28 | 5.6%
19 | 1 | 0.2%
18 | 2 | 0.4%
17 | 3 | 0.6%

Työsuhteen luonne
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.6%
Missing: 1
Missing (%): 0.2%
Memory size: 4.0 KiB

Työntekijä / palkollinen: 446
Freelancer: 27
Yrittäjä: 26

Length

Max length: 24
Median length: 24
Mean length: 22.40881764
Min length: 8

Characters and Unicode

Total characters: 11182
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Työntekijä / palkollinen
2nd row: Työntekijä / palkollinen
3rd row: Työntekijä / palkollinen
4th row: Yrittäjä
5th row: Työntekijä / palkollinen

Common Values

Value | Count | Frequency (%)
Työntekijä / palkollinen | 446 | 89.2%
Freelancer | 27 | 5.4%
Yrittäjä | 26 | 5.2%
(Missing) | 1 | 0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
työntekijä446
32.1%
446
32.1%
palkollinen446
32.1%
freelancer27
 
1.9%
yrittäjä26
 
1.9%

Most occurring characters

ValueCountFrequency (%)
n1365
12.2%
l1365
12.2%
e973
 
8.7%
i918
 
8.2%
892
 
8.0%
k892
 
8.0%
t498
 
4.5%
ä498
 
4.5%
a473
 
4.2%
j472
 
4.2%
Other values (10)2836
25.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter9345
83.6%
Space Separator892
 
8.0%
Uppercase Letter499
 
4.5%
Other Punctuation446
 
4.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n1365
14.6%
l1365
14.6%
e973
10.4%
i918
9.8%
k892
9.5%
t498
 
5.3%
ä498
 
5.3%
a473
 
5.1%
j472
 
5.1%
p446
 
4.8%
Other values (5)1445
15.5%
Uppercase Letter
ValueCountFrequency (%)
T446
89.4%
F27
 
5.4%
Y26
 
5.2%
Space Separator
ValueCountFrequency (%)
892
100.0%
Other Punctuation
ValueCountFrequency (%)
/446
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9844
88.0%
Common1338
 
12.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n1365
13.9%
l1365
13.9%
e973
9.9%
i918
9.3%
k892
9.1%
t498
 
5.1%
ä498
 
5.1%
a473
 
4.8%
j472
 
4.8%
p446
 
4.5%
Other values (8)1944
19.7%
Common
ValueCountFrequency (%)
892
66.7%
/446
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII10238
91.6%
None944
 
8.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n1365
13.3%
l1365
13.3%
e973
9.5%
i918
9.0%
892
8.7%
k892
8.7%
t498
 
4.9%
a473
 
4.6%
j472
 
4.6%
p446
 
4.4%
Other values (8)1944
19.0%
None
ValueCountFrequency (%)
ä498
52.8%
ö446
47.2%

Työaika
Categorical

HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): 1.0%
Missing: 19
Missing (%): 3.8%
Memory size: 4.0 KiB

1.0: 452
0.8: 23
0.5: 4
0.7: 1
0.6: 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1443
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 452 | 90.4%
0.8 | 23 | 4.6%
0.5 | 4 | 0.8%
0.7 | 1 | 0.2%
0.6 | 1 | 0.2%
(Missing) | 19 | 3.8%
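Työaika is profiled as a categorical variable even though its five labels ("1.0", "0.8", and so on) look like working-time fractions. The sketch below, which assumes that reading of the labels, converts them to numbers and computes the count-weighted mean over the non-missing rows. The helper name `weighted_mean` is illustrative and not part of the profiling tool.

```python
# Label counts copied from the Työaika Common Values table (missing rows excluded).
label_counts = {"1.0": 452, "0.8": 23, "0.5": 4, "0.7": 1, "0.6": 1}


def weighted_mean(counts):
    """Mean of the numeric labels, weighted by how often each label occurs."""
    total = sum(counts.values())
    return sum(float(label) * n for label, n in counts.items()) / total


n_answers = sum(label_counts.values())
mean_fraction = weighted_mean(label_counts)

print(f"{n_answers} answers, mean working-time fraction {mean_fraction:.3f}")
```

Treating the column as numeric would also make the profiler report quantiles and correlations for it instead of character statistics.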

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
1.0452
94.0%
0.823
 
4.8%
0.54
 
0.8%
0.71
 
0.2%
0.61
 
0.2%

Most occurring characters

ValueCountFrequency (%)
.481
33.3%
0481
33.3%
1452
31.3%
823
 
1.6%
54
 
0.3%
71
 
0.1%
61
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number962
66.7%
Other Punctuation481
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0481
50.0%
1452
47.0%
823
 
2.4%
54
 
0.4%
71
 
0.1%
61
 
0.1%
Other Punctuation
ValueCountFrequency (%)
.481
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1443
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
.481
33.3%
0481
33.3%
1452
31.3%
823
 
1.6%
54
 
0.3%
71
 
0.1%
61
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1443
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
.481
33.3%
0481
33.3%
1452
31.3%
823
 
1.6%
54
 
0.3%
71
 
0.1%
61
 
0.1%

Rooli
Categorical

HIGH CARDINALITY
MISSING

Distinct: 261
Distinct (%): 53.6%
Missing: 13
Missing (%): 2.6%
Memory size: 4.0 KiB

Ohjelmistokehittäjä: 42
full-stack: 36
Full-stack: 25
ohjelmistokehittäjä: 17
Arkkitehti: 16
Other values (256): 351

Length

Max length: 67
Median length: 52
Mean length: 19.23408624
Min length: 2

Characters and Unicode

Total characters: 9367
Distinct characters: 58
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 213
Unique (%): 43.7%

Sample

1st row: Arkkitehti
2nd row: full-stack
3rd row: Full-stack ohjelmistokehittäjä
4th row: web-arkkitehti
5th row: Ohjelmistokehittäjä

Common Values

Value | Count | Frequency (%)
Ohjelmistokehittäjä | 42 | 8.4%
full-stack | 36 | 7.2%
Full-stack | 25 | 5.0%
ohjelmistokehittäjä | 17 | 3.4%
Arkkitehti | 16 | 3.2%
Full-stack ohjelmistokehittäjä | 8 | 1.6%
full-stack ohjelmistokehittäjä | 7 | 1.4%
arkkitehti | 6 | 1.2%
Frontend | 6 | 1.2%
frontend | 6 | 1.2%
Other values (251) | 318 | 63.6%
(Missing) | 13 | 2.6%
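Several of the most common Rooli values differ only in capitalisation (for example "Ohjelmistokehittäjä" and "ohjelmistokehittäjä"), which inflates the distinct count behind the high-cardinality alert. The following sketch shows one possible clean-up, assuming case and surrounding whitespace carry no meaning in this column; the helper name `merge_case_variants` is illustrative, and only the ten listed values are used.

```python
from collections import Counter

# Top ten Rooli values and counts, copied from the Common Values table.
common_values = {
    "Ohjelmistokehittäjä": 42,
    "full-stack": 36,
    "Full-stack": 25,
    "ohjelmistokehittäjä": 17,
    "Arkkitehti": 16,
    "Full-stack ohjelmistokehittäjä": 8,
    "full-stack ohjelmistokehittäjä": 7,
    "arkkitehti": 6,
    "Frontend": 6,
    "frontend": 6,
}


def merge_case_variants(value_counts):
    """Sum counts of labels that are equal after trimming and case folding."""
    merged = Counter()
    for label, n in value_counts.items():
        merged[label.strip().casefold()] += n
    return merged


merged = merge_case_variants(common_values)

for label, n in merged.most_common():
    print(f"{label}: {n}")
```

On these ten rows the normalisation leaves five distinct labels. How far it would reduce the full set of 261 distinct values cannot be read from the report.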

Length

Histogram of lengths of the category
ValueCountFrequency (%)
full-stack145
 
16.1%
ohjelmistokehittäjä115
 
12.8%
developer61
 
6.8%
arkkitehti36
 
4.0%
35
 
3.9%
lead33
 
3.7%
frontend28
 
3.1%
senior21
 
2.3%
backend17
 
1.9%
kehittäjä16
 
1.8%
Other values (196)393
43.7%

Most occurring characters

ValueCountFrequency (%)
t975
 
10.4%
e862
 
9.2%
l683
 
7.3%
i679
 
7.2%
k517
 
5.5%
o489
 
5.2%
s449
 
4.8%
a448
 
4.8%
419
 
4.5%
h374
 
4.0%
Other values (48)3472
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8134
86.8%
Uppercase Letter474
 
5.1%
Space Separator420
 
4.5%
Dash Punctuation177
 
1.9%
Other Punctuation99
 
1.1%
Open Punctuation27
 
0.3%
Close Punctuation27
 
0.3%
Math Symbol8
 
0.1%
Decimal Number1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t975
12.0%
e862
 
10.6%
l683
 
8.4%
i679
 
8.3%
k517
 
6.4%
o489
 
6.0%
s449
 
5.5%
a448
 
5.5%
h374
 
4.6%
j355
 
4.4%
Other values (16)2303
28.3%
Uppercase Letter
ValueCountFrequency (%)
F107
22.6%
O99
20.9%
S52
11.0%
D42
 
8.9%
T28
 
5.9%
A28
 
5.9%
L21
 
4.4%
C18
 
3.8%
E12
 
2.5%
P11
 
2.3%
Other values (11)56
11.8%
Other Punctuation
ValueCountFrequency (%)
,53
53.5%
/42
42.4%
&3
 
3.0%
.1
 
1.0%
Space Separator
ValueCountFrequency (%)
419
99.8%
 1
 
0.2%
Dash Punctuation
ValueCountFrequency (%)
-177
100.0%
Open Punctuation
ValueCountFrequency (%)
(27
100.0%
Close Punctuation
ValueCountFrequency (%)
)27
100.0%
Math Symbol
ValueCountFrequency (%)
+8
100.0%
Decimal Number
ValueCountFrequency (%)
11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8608
91.9%
Common759
 
8.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
t975
 
11.3%
e862
 
10.0%
l683
 
7.9%
i679
 
7.9%
k517
 
6.0%
o489
 
5.7%
s449
 
5.2%
a448
 
5.2%
h374
 
4.3%
j355
 
4.1%
Other values (37)2777
32.3%
Common
ValueCountFrequency (%)
419
55.2%
-177
23.3%
,53
 
7.0%
/42
 
5.5%
(27
 
3.6%
)27
 
3.6%
+8
 
1.1%
&3
 
0.4%
11
 
0.1%
 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII9013
96.2%
None354
 
3.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t975
 
10.8%
e862
 
9.6%
l683
 
7.6%
i679
 
7.5%
k517
 
5.7%
o489
 
5.4%
s449
 
5.0%
a448
 
5.0%
419
 
4.6%
h374
 
4.1%
Other values (45)3118
34.6%
None
ValueCountFrequency (%)
ä337
95.2%
ö16
 
4.5%
 1
 
0.3%

Etä
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.6%
Missing: 3
Missing (%): 0.6%
Memory size: 760.0 B

Etä: 208
Toimisto: 173
50/50: 116

Length

Max length: 8
Median length: 5
Mean length: 5.207243461
Min length: 3

Characters and Unicode

Total characters: 2588
Distinct characters: 11
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50/50
2nd row: Etä
3rd row: Etä
4th row: Etä
5th row: Etä

Common Values

Value | Count | Frequency (%)
Etä | 208 | 41.6%
Toimisto | 173 | 34.6%
50/50 | 116 | 23.2%
(Missing) | 3 | 0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
etä208
41.9%
toimisto173
34.8%
50/50116
23.3%

Most occurring characters

ValueCountFrequency (%)
t381
14.7%
o346
13.4%
i346
13.4%
5232
9.0%
0232
9.0%
E208
8.0%
ä208
8.0%
T173
6.7%
m173
6.7%
s173
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1627
62.9%
Decimal Number464
 
17.9%
Uppercase Letter381
 
14.7%
Other Punctuation116
 
4.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t381
23.4%
o346
21.3%
i346
21.3%
ä208
12.8%
m173
10.6%
s173
10.6%
Decimal Number
ValueCountFrequency (%)
5232
50.0%
0232
50.0%
Uppercase Letter
ValueCountFrequency (%)
E208
54.6%
T173
45.4%
Other Punctuation
ValueCountFrequency (%)
/116
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2008
77.6%
Common580
 
22.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
t381
19.0%
o346
17.2%
i346
17.2%
E208
10.4%
ä208
10.4%
T173
8.6%
m173
8.6%
s173
8.6%
Common
ValueCountFrequency (%)
5232
40.0%
0232
40.0%
/116
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2380
92.0%
None208
 
8.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t381
16.0%
o346
14.5%
i346
14.5%
5232
9.7%
0232
9.7%
E208
8.7%
T173
7.3%
m173
7.3%
s173
7.3%
/116
 
4.9%
None
ValueCountFrequency (%)
ä208
100.0%

Kuukausipalkka
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 130
Distinct (%): 28.5%
Missing: 44
Missing (%): 8.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 4671.388158
Minimum: 1081
Maximum: 15000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1081
5-th percentile: 2792.5
Q1: 3800
Median: 4500
Q3: 5477.5
95-th percentile: 7000
Maximum: 15000
Range: 13919
Interquartile range (IQR): 1677.5

Descriptive statistics

Standard deviation: 1443.054453
Coefficient of variation (CV): 0.3089134117
Kurtosis: 7.900697718
Mean: 4671.388158
Median Absolute Deviation (MAD): 765.5
Skewness: 1.62359699
Sum: 2130153
Variance: 2082406.154
Monotonicity: Not monotonic
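The range and interquartile range above follow directly from the reported quantiles. As an illustration of how these could be used, the sketch below also derives outlier fences with Tukey's 1.5 × IQR rule. That rule is an addition for illustration and is not something the report itself computes; the helper name `tukey_fences` is hypothetical.

```python
# Quantile statistics copied from the Kuukausipalkka section of this report.
minimum, q1, q3, maximum = 1081.0, 3800.0, 5477.5, 15000.0


def tukey_fences(first_quartile, third_quartile, k=1.5):
    """Lower and upper fences at k times the IQR beyond the quartiles."""
    spread = third_quartile - first_quartile
    return first_quartile - k * spread, third_quartile + k * spread


iqr = q3 - q1                  # matches the reported IQR
value_range = maximum - minimum  # matches the reported range
lower_fence, upper_fence = tukey_fences(q1, q3)

print(f"IQR = {iqr}, range = {value_range}")
print(f"fences = ({lower_fence}, {upper_fence})")
print("minimum below lower fence:", minimum < lower_fence)
print("maximum above upper fence:", maximum > upper_fence)
```

Under this rule both the reported minimum and maximum fall outside the fences, which fits the positive skewness and high kurtosis listed above.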
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
4000 | 26 | 5.2%
4500 | 24 | 4.8%
5000 | 18 | 3.6%
6000 | 17 | 3.4%
5500 | 17 | 3.4%
4800 | 13 | 2.6%
4300 | 13 | 2.6%
3000 | 12 | 2.4%
4200 | 12 | 2.4%
3800 | 12 | 2.4%
Other values (120) | 292 | 58.4%
(Missing) | 44 | 8.8%

Value | Count | Frequency (%)
1081 | 1 | 0.2%
1100 | 1 | 0.2%
1666 | 1 | 0.2%
1700 | 1 | 0.2%
1800 | 1 | 0.2%
2100 | 1 | 0.2%
2200 | 1 | 0.2%
2275 | 1 | 0.2%
2300 | 1 | 0.2%
2400 | 3 | 0.6%

Value | Count | Frequency (%)
15000 | 1 | 0.2%
12000 | 2 | 0.4%
9300 | 1 | 0.2%
8500 | 2 | 0.4%
8200 | 1 | 0.2%
8000 | 6 | 1.2%
7500 | 3 | 0.6%
7200 | 1 | 0.2%
7000 | 11 | 2.2%
6956 | 1 | 0.2%

Vuositulot
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 185
Distinct (%): 38.0%
Missing: 13
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 65593.46304
Minimum: 0
Maximum: 300000
Zeros: 2
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 34020
Q1: 49562.5
Median: 58750
Q3: 75000
95-th percentile: 123500
Maximum: 300000
Range: 300000
Interquartile range (IQR): 25437.5

Descriptive statistics

Standard deviation: 31817.79458
Coefficient of variation (CV): 0.4850756937
Kurtosis: 11.75121598
Mean: 65593.46304
Median Absolute Deviation (MAD): 11750
Skewness: 2.645875828
Sum: 31944016.5
Variance: 1012372052
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
55000 | 18 | 3.6%
50000 | 18 | 3.6%
75000 | 17 | 3.4%
60000 | 14 | 2.8%
70000 | 11 | 2.2%
85000 | 11 | 2.2%
65000 | 10 | 2.0%
62500 | 10 | 2.0%
54000 | 10 | 2.0%
37500 | 10 | 2.0%
Other values (175) | 358 | 71.6%
(Missing) | 13 | 2.6%

Value | Count | Frequency (%)
0 | 2 | 0.4%
4000 | 1 | 0.2%
6100 | 1 | 0.2%
7500 | 1 | 0.2%
13750 | 1 | 0.2%
14000 | 1 | 0.2%
20000 | 1 | 0.2%
22000 | 1 | 0.2%
22500 | 1 | 0.2%
25000 | 1 | 0.2%

Value | Count | Frequency (%)
300000 | 1 | 0.2%
250000 | 1 | 0.2%
220000 | 1 | 0.2%
200000 | 4 | 0.8%
190000 | 1 | 0.2%
180000 | 1 | 0.2%
165000 | 1 | 0.2%
157300 | 1 | 0.2%
155000 | 1 | 0.2%
150000 | 1 | 0.2%

Kilpailukykyinen
Boolean

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.4%
Missing: 15
Missing (%): 3.0%
Memory size: 4.0 KiB

Value | Count | Frequency (%)
True | 329 | 65.8%
False | 156 | 31.2%
(Missing) | 15 | 3.0%

Työpaikka
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 73
Distinct (%): 64.6%
Missing: 387
Missing (%): 77.4%
Memory size: 4.0 KiB

Gofore: 12
Vincit: 8
Futurice: 5
Fraktio: 4
Mavericks: 4
Other values (68): 80

Length

Max length: 132
Median length: 28
Mean length: 10.15044248
Min length: 2

Characters and Unicode

Total characters: 1147
Distinct characters: 54
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 59
Unique (%): 52.2%

Sample

1st row: Questrade
2nd row: Digiaj
3rd row: Gofore
4th row: Oura Health
5th row: Wirepas

Common Values

Value | Count | Frequency (%)
Gofore | 12 | 2.4%
Vincit | 8 | 1.6%
Futurice | 5 | 1.0%
Fraktio | 4 | 0.8%
Mavericks | 4 | 0.8%
Pankki | 3 | 0.6%
Siili | 3 | 0.6%
Arado | 3 | 0.6%
Qvik | 2 | 0.4%
KVTES-alainen kunnan omistama | 2 | 0.4%
Other values (63) | 67 | 13.4%
(Missing) | 387 | 77.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
gofore12
 
7.5%
vincit8
 
5.0%
mavericks6
 
3.7%
futurice5
 
3.1%
siili5
 
3.1%
fraktio4
 
2.5%
if3
 
1.9%
pankki3
 
1.9%
arado3
 
1.9%
konsulttitalo3
 
1.9%
Other values (96)109
67.7%

Most occurring characters

ValueCountFrequency (%)
i128
 
11.2%
a89
 
7.8%
o89
 
7.8%
e86
 
7.5%
t82
 
7.1%
r63
 
5.5%
n59
 
5.1%
51
 
4.4%
k49
 
4.3%
l47
 
4.1%
Other values (44)404
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter955
83.3%
Uppercase Letter135
 
11.8%
Space Separator51
 
4.4%
Dash Punctuation3
 
0.3%
Other Punctuation3
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i128
13.4%
a89
9.3%
o89
9.3%
e86
 
9.0%
t82
 
8.6%
r63
 
6.6%
n59
 
6.2%
k49
 
5.1%
l47
 
4.9%
u45
 
4.7%
Other values (16)218
22.8%
Uppercase Letter
ValueCountFrequency (%)
S15
 
11.1%
G15
 
11.1%
V14
 
10.4%
F10
 
7.4%
K8
 
5.9%
A7
 
5.2%
M7
 
5.2%
P6
 
4.4%
T6
 
4.4%
C6
 
4.4%
Other values (15)41
30.4%
Space Separator
ValueCountFrequency (%)
51
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3
100.0%
Other Punctuation
ValueCountFrequency (%)
.3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1090
95.0%
Common57
 
5.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i128
 
11.7%
a89
 
8.2%
o89
 
8.2%
e86
 
7.9%
t82
 
7.5%
r63
 
5.8%
n59
 
5.4%
k49
 
4.5%
l47
 
4.3%
u45
 
4.1%
Other values (41)353
32.4%
Common
ValueCountFrequency (%)
51
89.5%
-3
 
5.3%
.3
 
5.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1135
99.0%
None12
 
1.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i128
 
11.3%
a89
 
7.8%
o89
 
7.8%
e86
 
7.6%
t82
 
7.2%
r63
 
5.6%
n59
 
5.2%
51
 
4.5%
k49
 
4.3%
l47
 
4.1%
Other values (42)392
34.5%
None
ValueCountFrequency (%)
ä11
91.7%
ö1
 
8.3%

Vapaa sana
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING
UNIFORM

Distinct: 37
Distinct (%): 97.4%
Missing: 462
Missing (%): 92.4%
Memory size: 4.0 KiB

palkan lisänä lounas- ja virkistysetu: 2
it-ala 10+v koodaus 6v: 1
Opiskelija: 1
Teen 80% työaikaa jotta ehtisin harrastaa kaikenlaista työnteon lisäksi: 1
Halpaa freelancer laskutusta oman tuotekehityksen sivussa: 1
Other values (32): 32

Length

Max length: 286
Median length: 104.5
Mean length: 95.57894737
Min length: 7

Characters and Unicode

Total characters: 3632
Distinct characters: 56
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 36
Unique (%): 94.7%

Sample

1st row: Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
2nd row: Työskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
3rd row: palkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
4th row: Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
5th row: Osittain laskutukseen perustuva palkka joten vaihtelee.

Common Values

Value | Count | Frequency (%)
palkan lisänä lounas- ja virkistysetu | 2 | 0.4%
it-ala 10+v koodaus 6v | 1 | 0.2%
Opiskelija | 1 | 0.2%
Teen 80% työaikaa jotta ehtisin harrastaa kaikenlaista työnteon lisäksi | 1 | 0.2%
Halpaa freelancer laskutusta oman tuotekehityksen sivussa | 1 | 0.2%
Palkka riippuu osittain firman tuloksesta, joten vaikea sanoa tarkkaan. | 1 | 0.2%
Vaikea vastata henkilönä joka tekee yrityksen kautta yhdelle ulkomaalaiselle yritykselle töitä (jolla ei ole entiteettiä suomessa). Vastasin nyt ikään kuin olisin yrittäjä vaikka käytännössä tämä on sama kuin olisin palkkaduunissa. | 1 | 0.2%
Pakettiin kuuluu reilu määrä optioita ja palkka nousee (ja laskee) firman liikevaihdon myötä. | 1 | 0.2%
Vaikka merkitsin, että palkkani ei ole mielestäni kilpailukykyinen, se ei tarkoita ettenkö olisi siihen tyytyväinen. Tilanne yrittäjillä ei yleensä vastaa samaa kuin palkansaajilla, joten palkka ei ole yrittäjille monestikaan niin mustavalkoinen asia vaan kysymys on isommasta kuviosta. | 1 | 0.2%
Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi. | 1 | 0.2%
Other values (27) | 27 | 5.4%
(Missing) | 462 | 92.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ja11
 
2.4%
ei11
 
2.4%
palkka10
 
2.2%
on10
 
2.2%
mutta9
 
2.0%
ole6
 
1.3%
nyt5
 
1.1%
palkan4
 
0.9%
ihan4
 
0.9%
joten4
 
0.9%
Other values (321)383
83.8%

Most occurring characters

ValueCountFrequency (%)
422
11.6%
a383
 
10.5%
i311
 
8.6%
t284
 
7.8%
n245
 
6.7%
s237
 
6.5%
e228
 
6.3%
k206
 
5.7%
l183
 
5.0%
o169
 
4.7%
Other values (46)964
26.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3025
83.3%
Space Separator422
 
11.6%
Other Punctuation85
 
2.3%
Uppercase Letter53
 
1.5%
Decimal Number28
 
0.8%
Dash Punctuation8
 
0.2%
Close Punctuation4
 
0.1%
Open Punctuation4
 
0.1%
Math Symbol3
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a383
12.7%
i311
10.3%
t284
9.4%
n245
 
8.1%
s237
 
7.8%
e228
 
7.5%
k206
 
6.8%
l183
 
6.0%
o169
 
5.6%
u140
 
4.6%
Other values (14)639
21.1%
Uppercase Letter
ValueCountFrequency (%)
P9
17.0%
O7
13.2%
T7
13.2%
E6
11.3%
V6
11.3%
K5
9.4%
S4
7.5%
H2
 
3.8%
J2
 
3.8%
I2
 
3.8%
Other values (3)3
 
5.7%
Decimal Number
ValueCountFrequency (%)
015
53.6%
13
 
10.7%
52
 
7.1%
22
 
7.1%
82
 
7.1%
62
 
7.1%
31
 
3.6%
71
 
3.6%
Other Punctuation
ValueCountFrequency (%)
.44
51.8%
,28
32.9%
/5
 
5.9%
%4
 
4.7%
"2
 
2.4%
?2
 
2.4%
Space Separator
ValueCountFrequency (%)
422
100.0%
Dash Punctuation
ValueCountFrequency (%)
-8
100.0%
Close Punctuation
ValueCountFrequency (%)
)4
100.0%
Open Punctuation
ValueCountFrequency (%)
(4
100.0%
Math Symbol
ValueCountFrequency (%)
+3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3078
84.7%
Common554
 
15.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a383
12.4%
i311
10.1%
t284
9.2%
n245
 
8.0%
s237
 
7.7%
e228
 
7.4%
k206
 
6.7%
l183
 
5.9%
o169
 
5.5%
u140
 
4.5%
Other values (27)692
22.5%
Common
ValueCountFrequency (%)
422
76.2%
.44
 
7.9%
,28
 
5.1%
015
 
2.7%
-8
 
1.4%
/5
 
0.9%
)4
 
0.7%
(4
 
0.7%
%4
 
0.7%
13
 
0.5%
Other values (9)17
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII3479
95.8%
None153
 
4.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 422 | 12.1%
a | 383 | 11.0%
i | 311 | 8.9%
t | 284 | 8.2%
n | 245 | 7.0%
s | 237 | 6.8%
e | 228 | 6.6%
k | 206 | 5.9%
l | 183 | 5.3%
o | 169 | 4.9%
Other values (44) | 811 | 23.3%

None
Value | Count | Frequency (%)
ä | 126 | 82.4%
ö | 27 | 17.6%

Kk-tulot
Real number (ℝ≥0)

HIGH CORRELATION (4 alerts)
MISSING

Distinct | 185
Distinct (%) | 38.0%
Missing | 13
Missing (%) | 2.6%
Infinite | 0
Infinite (%) | 0.0%
Mean | 5466.12192
Minimum | 0
Maximum | 25000
Zeros | 2
Zeros (%) | 0.4%
Negative | 0
Negative (%) | 0.0%
Memory size | 4.0 KiB

Quantile statistics

Minimum | 0
5th percentile | 2835
Q1 | 4130.208333
Median | 4895.833333
Q3 | 6250
95th percentile | 10291.66667
Maximum | 25000
Range | 25000
Interquartile range (IQR) | 2119.791667

Descriptive statistics

Standard deviation | 2651.482882
Coefficient of variation (CV) | 0.4850756937
Kurtosis | 11.75121598
Mean | 5466.12192
Median Absolute Deviation (MAD) | 979.1666667
Skewness | 2.645875828
Sum | 2662001.375
Variance | 7030361.474
Monotonicity | Not monotonic
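
Several of the figures above are derived from one another, which gives a quick way to sanity-check the report. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum from the other reported values. The variable names are illustrative, and the non-missing count is assumed to be the 500 observations minus the 13 missing cells.

```python
# Values copied from the Kk-tulot statistics reported above.
minimum, maximum = 0.0, 25000.0
q1, q3 = 4130.208333, 6250.0
mean = 5466.12192
std = 2651.482882
n_observations, n_missing = 500, 13

value_range = maximum - minimum          # compare with the reported Range
iqr = q3 - q1                            # compare with the reported IQR
cv = std / mean                          # compare with the reported CV
variance = std ** 2                      # compare with the reported Variance
n_present = n_observations - n_missing   # assumption: rows that have a Kk-tulot value
total = mean * n_present                 # compare with the reported Sum

print(value_range, iqr, cv, variance, total)
```

Because the inputs are rounded as printed in the report, the recomputed values agree with the reported ones only up to rounding.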
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
4583.333333 | 18 | 3.6%
4166.666667 | 18 | 3.6%
6250 | 17 | 3.4%
5000 | 14 | 2.8%
5833.333333 | 11 | 2.2%
7083.333333 | 11 | 2.2%
5416.666667 | 10 | 2.0%
5208.333333 | 10 | 2.0%
4500 | 10 | 2.0%
3125 | 10 | 2.0%
Other values (175) | 358 | 71.6%
(Missing) | 13 | 2.6%
Smallest 10 values
Value | Count | Frequency (%)
0 | 2 | 0.4%
333.3333333 | 1 | 0.2%
508.3333333 | 1 | 0.2%
625 | 1 | 0.2%
1145.833333 | 1 | 0.2%
1166.666667 | 1 | 0.2%
1666.666667 | 1 | 0.2%
1833.333333 | 1 | 0.2%
1875 | 1 | 0.2%
2083.333333 | 1 | 0.2%
Largest 10 values
Value | Count | Frequency (%)
25000 | 1 | 0.2%
20833.33333 | 1 | 0.2%
18333.33333 | 1 | 0.2%
16666.66667 | 4 | 0.8%
15833.33333 | 1 | 0.2%
15000 | 1 | 0.2%
13750 | 1 | 0.2%
13108.33333 | 1 | 0.2%
12916.66667 | 1 | 0.2%
12500 | 1 | 0.2%

Interactions

[25 interaction plots not reproduced]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
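
As a minimal sketch of the calculation described above, the pure-Python code below ranks both variables (ties receive the mean of their positions) and then applies Pearson's formula to the ranks. The function names and the sample data are illustrative and are not taken from the report.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

For this monotonic but nonlinear example, ρ reaches its maximum while r stays below it, which is the difference the description above refers to.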

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
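
As a minimal sketch of the definition above, the pure-Python function below divides the covariance by the product of the standard deviations and then checks the invariance claim. The function name and the sample values are illustrative. The rescaling uses positive scale factors; a negative factor would flip the sign of r.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Made-up values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([3 * v + 10 for v in x], [0.5 * v - 7 for v in y])
print(r, r_rescaled)
```

Both calls give the same coefficient up to floating-point rounding.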

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
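
The pair counting described above can be sketched as follows. This is the simple form of the coefficient (often called τ-a), in which tied pairs count as neither concordant nor discordant. Library implementations often use a tie-adjusted variant (τ-b), so their results can differ when the data contain ties. The function name and inputs are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables order the pair the same way
        elif product < 0:
            discordant += 1      # the variables order the pair in opposite ways
        # product == 0 means a tie in x or y; it contributes to neither count
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Made-up data: one of the six pairs is out of order.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.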

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
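
As a minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013), the function below works on a contingency table given as a list of rows of counts. It is an illustration of the formula, not necessarily the exact code used to produce this report. The function name is hypothetical, and the table is assumed to have no empty rows or columns.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Made-up tables: the first has independent rows and columns,
# the second has a perfect association.
print(cramers_v_corrected([[10, 20], [20, 40]]))
print(cramers_v_corrected([[50, 0], [0, 50]]))
```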

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The Phik project provides extensive documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
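
The nullity correlation shown in the heatmap can be sketched as the Pearson correlation between per-column presence indicators. This is a minimal illustration under that assumption, not the report's own code. The function name and the small columns are made up, with None standing in for a missing cell.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    None marks a missing cell. Returns None when either column is fully
    present or fully missing, because the correlation is then undefined.
    """
    present_a = [0.0 if v is None else 1.0 for v in col_a]
    present_b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(present_a)
    mean_a = sum(present_a) / n
    mean_b = sum(present_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(present_a, present_b))
    var_a = sum((a - mean_a) ** 2 for a in present_a)
    var_b = sum((b - mean_b) ** 2 for b in present_b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)

# Made-up columns for illustration only.
annual = [83000.0, None, 30000.0, None, 37500.0, 100000.0]
monthly = [6916.7, None, 2500.0, None, 3125.0, 8333.3]   # missing in the same rows as annual
comment = [None, "ok", None, "ok", None, None]           # present only where annual is missing
complete = [1, 2, 3, 4, 5, 6]                            # never missing

print(nullity_correlation(annual, monthly))
print(nullity_correlation(annual, comment))
print(nullity_correlation(annual, complete))
```

Columns that are missing in exactly the same rows score +1, columns where one is present exactly when the other is missing score -1, and a column with no missing values has no defined nullity correlation.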

Sample

First rows

(index) | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot
0 | 2021-02-15 11:57:08.316 | PK-Seutu | 33 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN | 6916.666667
1 | 2021-02-15 11:57:19.676 | Turku | 33 | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN | 5208.333333
2 | 2021-02-15 11:58:03.592 | PK-Seutu | 28 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN | 2500.000000
3 | 2021-02-15 11:58:15.261 | Tampere | 33 | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN | 8333.333333
4 | 2021-02-15 11:58:16.983 | PK-Seutu | 28 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN | 3125.000000
5 | 2021-02-15 11:58:49.454 | PK-Seutu | 43 | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 8000.0 | 100000.0 | True | NaN | NaN | 8333.333333
6 | 2021-02-15 12:00:03.771 | PK-Seutu | 33 | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN | 11666.666667
7 | 2021-02-15 12:00:04.655 | Tampere | 33 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 4250.0 | 54000.0 | True | NaN | NaN | 4500.000000
8 | 2021-02-15 12:01:00.769 | Tampere | 33 | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | Toimisto | 4000.0 | 50000.0 | False | NaN | NaN | 4166.666667
9 | 2021-02-15 12:02:03.577 | Tallinna | 33 | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN | 16666.666667

Last rows

(index) | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot
490 | 2021-02-25 21:17:36.323 | PK-Seutu | 33 | mies | 10.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjemistokehittäjä | Toimisto | 4600.0 | 58000.0 | True | NaN | NaN | 4833.333333
491 | 2021-02-26 09:32:59.778 | Oulu | 48 | mies | 21.0 | Työntekijä / palkollinen | 1.0 | Backend-koodari | Etä | 5000.0 | 70000.0 | True | Nokia | NaN | 5833.333333
492 | 2021-02-26 12:16:19.696 | Tampere | 38 | mies | 15.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistosuunnittelija | Toimisto | 4300.0 | 53750.0 | False | Gofore | NaN | 4479.166667
493 | 2021-02-26 12:21:52.296 | Tampere | 33 | mies | 11.0 | Freelancer | 1.0 | frontend | Etä | NaN | 157300.0 | True | NaN | NaN | 13108.333333
494 | 2021-02-26 12:46:37.404 | PK-Seutu | 33 | mies | 11.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | Toimisto | 6500.0 | 81250.0 | True | Siili | NaN | 6770.833333
495 | 2021-02-26 12:47:26.116 | PK-Seutu | 33 | nainen | 3.0 | Työntekijä / palkollinen | 1.0 | Full-stack | 50/50 | 3800.0 | NaN | False | NaN | NaN | NaN
496 | 2021-02-26 13:24:35.647 | PK-Seutu | 33 | mies | NaN | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | 50/50 | NaN | 75000.0 | True | Vincit | NaN | 6250.000000
497 | 2021-02-26 16:28:30.010 | Tampere | 43 | mies | 20.0 | Työntekijä / palkollinen | 1.0 | full-stack | Toimisto | 4800.0 | 61000.0 | True | NaN | NaN | 5083.333333
498 | 2021-02-27 12:38:00.760 | Tampere | 33 | mies | 9.0 | Työntekijä / palkollinen | 1.0 | backend ja devops | Etä | 4270.0 | 54000.0 | False | NaN | NaN | 4500.000000
499 | 2021-02-27 17:49:24.789 | Kouvola | 33 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack Ohjelmistosuunnittelija | Etä | 2800.0 | 35000.0 | False | NaN | NaN | 2916.666667
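
In the sample rows, Kk-tulot appears to equal Vuositulot divided by 12, and it is missing whenever Vuositulot is missing. This is an observation from the rows shown, not a documented definition. The sketch below checks it on a few (Vuositulot, Kk-tulot) pairs copied from the sample rows; the helper name is hypothetical.

```python
import math

# (Vuositulot, Kk-tulot) pairs copied from sample rows 0, 3, 9, 493 and 495.
# None stands in for NaN.
sample_pairs = [
    (83000.0, 6916.666667),
    (100000.0, 8333.333333),
    (200000.0, 16666.666667),
    (157300.0, 13108.333333),
    (None, None),
]

def monthly_from_annual(annual):
    """Assumed derivation: annual income spread evenly over 12 months."""
    return None if annual is None else annual / 12

for annual, reported in sample_pairs:
    derived = monthly_from_annual(annual)
    if reported is None:
        matches = derived is None
    else:
        matches = math.isclose(derived, reported, abs_tol=1e-6)
    print(annual, reported, matches)
```

If the relation holds for the whole dataset, it would also account for the high correlation alerts between Vuositulot and Kk-tulot, since one column would be a rescaled copy of the other.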