Statistical tests for the differences between standardized accuracies (staccuracies)
Source: R/staccuracy.R
Because the distribution of staccuracies is uncertain (and indeed, different staccuracies likely have different distributions), bootstrapping is used to empirically estimate the distributions and calculate the p-values. See the return value description for details on what the function provides.
Usage
sa_diff(
  actual,
  preds,
  ...,
  na.rm = FALSE,
  sa = NULL,
  pct = c(0.01, 0.02, 0.03, 0.04, 0.05),
  boot_alpha = 0.05,
  boot_it = 1000,
  seed = 0
)
Arguments
- actual
numeric vector. The actual (true) labels.
- preds
named list of at least two numeric vectors. Each element is a vector of the same length as actual with predictions for each row corresponding to each element of actual. The names of the list elements should be the names of the models that produced each respective prediction; these names will be used to distinguish the results.
- ...
not used. Forces explicit naming of subsequent arguments.
- na.rm
See documentation for staccuracy().
- sa
list of functions. Each element is the unquoted name of a valid staccuracy function (see staccuracy() for the required function signature). If an element is named, the name will be displayed as the value of the sa column of the result. Otherwise, the function name will be displayed. If NULL (default), staccuracy functions will be automatically selected based on the datatypes of actual and preds.
- pct
numeric vector with values in (0, 1). The percentage thresholds (expressed as proportions, e.g., 0.01 for 1%) against which the difference in staccuracies will be tested.
- boot_alpha
numeric(1) from 0 to 1. Alpha for the percentile-based confidence interval of the bootstrapped means; the lower and upper bounds of the interval are the boot_alpha / 2 and 1 - boot_alpha / 2 quantiles. For example, if boot_alpha = 0.05 (default), the bounds will be at the 2.5 and 97.5 percentiles.
- boot_it
positive integer(1). The number of bootstrap iterations.
- seed
integer(1). Random seed for the bootstrap sampling. Supply the same value across runs to ensure identical results.
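The roles of boot_alpha, boot_it, and seed can be illustrated with a minimal base R sketch of a percentile bootstrap. This is not the package's implementation: the score function below is a hypothetical stand-in for a staccuracy measure, and the resampling scheme (rows drawn with replacement) is an assumption made for illustration.

```r
# Illustrative percentile bootstrap; NOT the internals of sa_diff().
set.seed(0)                       # a fixed seed makes the resamples reproducible
actual <- attitude$rating         # built-in dataset shipped with R
pred   <- predict(lm(rating ~ ., data = attitude))

boot_alpha <- 0.05
boot_it    <- 1000

# Hypothetical stand-in score (assumption): 1 - MAE / mean absolute deviation
score <- function(a, p) 1 - mean(abs(a - p)) / mean(abs(a - mean(a)))

# Resample row indices with replacement and recompute the score each time
boot_scores <- replicate(boot_it, {
  i <- sample(length(actual), replace = TRUE)
  score(actual[i], pred[i])
})

# Interval bounds at the boot_alpha / 2 and 1 - boot_alpha / 2 quantiles
probs <- c(boot_alpha / 2, 1 - boot_alpha / 2)
ci <- quantile(boot_scores, probs = probs)
ci
```

With boot_alpha = 0.05 the two quantiles are named 2.5% and 97.5%, matching the percentiles described for the default setting.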
Value
tibble with staccuracy difference results:
- staccuracy: name of staccuracy measure.
- pred: Each named element (model name) in the input preds. The row values give the staccuracy for that prediction. When pred is NA, the row represents the difference between prediction staccuracies (diff) instead of the staccuracies themselves.
- diff: When diff takes the form 'model1-model2', the row values give the difference in staccuracies between two named elements (model names) in the input preds. When diff is NA, the row instead represents the staccuracy of a specific model prediction (pred).
- lo, mean, hi: The lower bound, mean, and upper bound of the bootstrapped staccuracy. The lower and upper bounds are the limits of the confidence interval specified by the input boot_alpha.
- p__: p-values for testing whether the difference in staccuracies is at least the specified percentage amount. E.g., for the default input pct = c(0.01, 0.02, 0.03, 0.04, 0.05), these columns would be p01, p02, p03, p04, and p05. As they apply only to differences between staccuracies, they are provided only for diff rows and are NA for pred rows. As an example of their meaning, if the mean difference for 'model1-model2' is 0.0832 with p01 of 0.012 and p02 of 0.035, then 1.2% of bootstrapped staccuracies had a model1 - model2 difference of less than 0.01 and 3.5% were less than 0.02. (That is, 98.8% of differences were greater than 0.01 and 96.5% were greater than 0.02.)
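The interpretation of the p__ columns can be sketched in base R as the share of bootstrapped differences that fall below each threshold in pct. This is only an approximation of the idea described above: the score function is a hypothetical stand-in for a staccuracy measure, and the package may apply a different estimator or a small-sample correction.

```r
# Illustrative reading of the p__ columns; NOT the internals of sa_diff().
set.seed(0)
actual <- attitude$rating
pred1  <- predict(lm(rating ~ ., data = attitude))
pred2  <- predict(lm(rating ~ . - complaints, data = attitude))

boot_it <- 1000
pct     <- c(0.01, 0.02, 0.03, 0.04, 0.05)

# Hypothetical stand-in score (assumption): 1 - MAE / mean absolute deviation
score <- function(a, p) 1 - mean(abs(a - p)) / mean(abs(a - mean(a)))

# Bootstrapped model1 - model2 differences, using the same resample for both
diffs <- replicate(boot_it, {
  i <- sample(length(actual), replace = TRUE)
  score(actual[i], pred1[i]) - score(actual[i], pred2[i])
})

# Proportion of bootstrapped differences smaller than each threshold
p <- vapply(pct, function(threshold) mean(diffs < threshold), numeric(1))
names(p) <- sprintf("p%02d", round(pct * 100))
p
```

Because the thresholds increase, the proportions can only stay the same or grow from p01 to p05, which matches the pattern in the worked example (0.012 for p01, 0.035 for p02).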
Examples
lm_attitude_all <- lm(rating ~ ., data = attitude)
lm_attitude__a <- lm(rating ~ . - advance, data = attitude)
lm_attitude__c <- lm(rating ~ . - complaints, data = attitude)
sdf <- sa_diff(
  attitude$rating,
  list(
    all = predict(lm_attitude_all),
    madv = predict(lm_attitude__a),
    mcmp = predict(lm_attitude__c)
  ),
  boot_it = 10
)
sdf
#> # A tibble: 12 × 11
#> staccuracy pred diff lo mean hi p01 p02 p03
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 WinMAE on MAD all NA 0.672 0.719 0.776 NA NA NA
#> 2 WinMAE on MAD madv NA 0.640 0.705 0.767 NA NA NA
#> 3 WinMAE on MAD mcmp NA 0.586 0.635 0.692 NA NA NA
#> 4 WinMAE on MAD NA all-madv -0.00660 0.0139 0.0369 0.455 0.727 0.818
#> 5 WinMAE on MAD NA all-mcmp 0.0440 0.0840 0.133 0.0909 0.0909 0.0909
#> 6 WinMAE on MAD NA madv-mcmp 0.0291 0.0702 0.122 0.0909 0.0909 0.182
#> 7 WinRMSE on SD all NA 0.684 0.737 0.781 NA NA NA
#> 8 WinRMSE on SD madv NA 0.670 0.732 0.782 NA NA NA
#> 9 WinRMSE on SD mcmp NA 0.616 0.670 0.723 NA NA NA
#> 10 WinRMSE on SD NA all-madv -0.00781 0.00529 0.0272 0.636 0.909 0.909
#> 11 WinRMSE on SD NA all-mcmp 0.0335 0.0666 0.107 0.0909 0.0909 0.182
#> 12 WinRMSE on SD NA madv-mcmp 0.0273 0.0613 0.108 0.0909 0.0909 0.182
#> # ℹ 2 more variables: p04 <dbl>, p05 <dbl>
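One way to read the comparison rows is to check whether each bootstrap interval excludes zero. The sketch below copies the three 'WinMAE on MAD' difference rows from the printed output into a plain data frame so that it runs without the package. The excludes_zero column is a hypothetical helper, not something sa_diff() returns, and with only boot_it = 10 the intervals are illustrative rather than reliable.

```r
# Difference rows for "WinMAE on MAD", copied from the printed output above
res <- data.frame(
  diff = c("all-madv", "all-mcmp", "madv-mcmp"),
  lo   = c(-0.00660, 0.0440, 0.0291),
  mean = c(0.0139, 0.0840, 0.0702),
  hi   = c(0.0369, 0.133, 0.122),
  p01  = c(0.455, 0.0909, 0.0909)
)

# Hypothetical helper column: TRUE when the interval lies entirely on one side of zero
res$excludes_zero <- res$lo > 0 | res$hi < 0
res[, c("diff", "lo", "hi", "excludes_zero")]
```

On the actual result, the comparison rows could be selected first with something like sdf[!is.na(sdf$diff), ], since pred rows carry NA in the diff column.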