Winsorization truncates the extremes of a numeric range by replacing values beyond a predetermined minimum or maximum with that minimum or maximum. winsorize() returns the input vector with values less than the provided minimum replaced by that minimum and values greater than the provided maximum replaced by that maximum. win_mae() and win_rmse() return MAE and RMSE, respectively, computed on winsorized predictions.
The fundamental idea underlying the winsorization of predictions is that if the actual data has well-defined bounds, then models should not be penalized for predicting beyond the extremes of the data. Models that overshoot at the boundaries might sometimes be superior within normal ranges, and their out-of-range predictions can be easily corrected by winsorization.
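The replacement rule described above can be sketched in base R with pmin() and pmax(). This is only an illustration under the assumption that winsorization is a simple element-wise clamp; clamp_sketch() is a hypothetical name and is not the package's implementation of winsorize().

```r
# Hypothetical helper for illustration only (not the package's code):
# clamp each value to the closed interval [win_range[1], win_range[2]].
clamp_sketch <- function(x, win_range) {
  # pmax() raises values below the minimum; pmin() lowers values above the maximum.
  pmin(pmax(x, win_range[1]), win_range[2])
}

a <- c(3, 5, 2, 7, 9, 4, 6, 8, 2, 10)
clamped <- clamp_sketch(a, c(2, 8))
clamped
# 9 and 10 become 8; all other values already lie within [2, 8]
```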
Arguments
- x
numeric vector.
- win_range
numeric(2). The minimum and maximum allowable values for the pred predictions or for x. For functions with pred, win_range defaults to the minimum and maximum values of the provided actual values. For functions with x, there is no default.
- actual
numeric vector. Actual (true) values of target outcome data.
- pred
numeric vector. Predictions corresponding to each respective element in actual.
- na.rm
logical(1). TRUE if missing values should be removed; FALSE if they should be retained. If TRUE, then if any element of either actual or pred is missing, its paired element will also be removed.
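To make the interaction of the arguments concrete, the following base R sketch shows one way a winsorized MAE could combine the win_range default with pairwise removal of missing values. It is a hypothetical re-implementation, not the package's code: the name win_mae_sketch(), the argument order, the na.rm default, and the choice to compute the default range after missing pairs are dropped are all assumptions.

```r
# Hypothetical re-implementation for illustration only.
# Assumption: the default win_range is the range of `actual`, taken after
# any missing pairs have been removed.
win_mae_sketch <- function(actual, pred, win_range = range(actual), na.rm = FALSE) {
  if (na.rm) {
    # Drop a pair whenever either of its two elements is missing.
    keep <- !is.na(actual) & !is.na(pred)
    actual <- actual[keep]
    pred <- pred[keep]
  }
  # Clamp predictions to the allowable range, then take the mean absolute error.
  pred_w <- pmin(pmax(pred, win_range[1]), win_range[2])
  mean(abs(actual - pred_w))
}

# Pairs 2 and 3 each contain a missing value, so both are dropped.
a_na <- c(3, NA, 2, 10, 6)
p_na <- c(2.5, 5.5, NA, 11.5, 7)
win_mae_sketch(a_na, p_na, na.rm = TRUE)
```

Because R evaluates default arguments lazily, range(actual) in this sketch is computed on the filtered vector when na.rm is TRUE. With na.rm = FALSE and missing values present, the sketch returns NA.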
Value
winsorize() returns a winsorized vector.
win_mae() returns the mean absolute error (MAE) of winsorized predicted values pred compared to the actual values. See mae() for details.
win_rmse() returns the root mean squared error (RMSE) of winsorized predicted values pred compared to the actual values. See rmse() for details.
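As a rough check on what the winsorized RMSE represents, the calculation can be reproduced by hand in base R: clamp the predictions to the range of the actual values, then compute an ordinary RMSE. This sketch assumes the default bounds are range(actual) and does not call the package functions; the variable names are illustrative.

```r
actual <- c(3, 5, 2, 7, 9, 4, 6, 8, 2, 10)
pred <- c(2.5, 5.5, 1.5, 6.5, 10.5, 3.5, 6, 7.5, 0.5, 11.5)

# Clamp predictions to the observed range of the actual values (here 2 to 10).
bounds <- range(actual)
pred_w <- pmin(pmax(pred, bounds[1]), bounds[2])

# Ordinary RMSE versus RMSE computed on the clamped predictions.
rmse_plain <- sqrt(mean((actual - pred)^2))
rmse_win <- sqrt(mean((actual - pred_w)^2))
round(c(plain = rmse_plain, winsorized = rmse_win), 7)
# plain is 0.9082951 and winsorized is 0.4743416
```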
Examples
a <- c(3, 5, 2, 7, 9, 4, 6, 8, 2, 10)
p <- c(2.5, 5.5, 1.5, 6.5, 10.5, 3.5, 6, 7.5, 0.5, 11.5)
a # the original data
#> [1] 3 5 2 7 9 4 6 8 2 10
winsorize(a, c(2, 8)) # a winsorized on defined boundaries
#> [1] 3 5 2 7 8 4 6 8 2 8
# range of the original data
a
#> [1] 3 5 2 7 9 4 6 8 2 10
range(a)
#> [1] 2 10
# some overzealous predictions
p
#> [1] 2.5 5.5 1.5 6.5 10.5 3.5 6.0 7.5 0.5 11.5
range(p)
#> [1] 0.5 11.5
# MAE penalizes overzealous predictions
mae(a, p)
#> [1] 0.75
# Winsorized MAE forgives overzealous predictions
win_mae(a, p)
#> [1] 0.35
# RMSE penalizes overzealous predictions
rmse(a, p)
#> [1] 0.9082951
# Winsorized RMSE forgives overzealous predictions
win_rmse(a, p)
#> [1] 0.4743416