Area under the ROC curve (AUCROC) is a classification measure. By dichotomizing the range of actual values, reg_aucroc() turns regression evaluation into classification evaluation for any regression model. Note that the model that generates the predictions is assumed to be a regression model; however, any numeric inputs are allowed for the pred argument, so there is no check for the nature of the source model.
Usage
reg_aucroc(
  actual,
  pred,
  num_quants = 100,
  ...,
  cuts = NULL,
  imbalance = 0.05,
  na.rm = FALSE,
  sample_size = 10000,
  seed = 0
)
Arguments
- actual
numeric vector. Actual label values from a dataset. They must be numeric.
- pred
numeric vector. Predictions corresponding to each respective element in actual.
- num_quants
scalar positive integer. If cuts is NULL (default), actual will be dichotomized into num_quants quantiles and that many ROCs will be returned in the rocs element. However, if cuts is specified, then num_quants is ignored.
- ...
Not used. Forces explicit naming of the arguments that follow.
- cuts
numeric vector. If cuts is provided, it overrides num_quants to specify the cut points for dichotomization of actual for the creation of cuts + 1 ROCs.
- imbalance
numeric(1) in (0, 0.5]. The result element mean_auc averages the AUCs over three regions (see details of the return value). imbalance is the supposed proportion of the less frequent class in the data. If not provided, defaults to 0.05 (5%).
- na.rm
See documentation for aucroc().
- sample_size
See documentation for aucroc(). In addition to those notes, for reg_aucroc(), any sampling is conducted before the dichotomization of actual so that all classification ROCs are based on identical data.
- seed
See documentation for aucroc().
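The note on sample_size says that sampling happens once, before dichotomization. A minimal base R sketch of that ordering follows. It is not the package's code: the variable names, the simulated data, the rule of sampling only when the data exceed sample_size, and the use of >= at each cut point are all assumptions made for illustration.

```r
set.seed(0)                              # plays the role of `seed`
n <- 50000
actual <- runif(n)                       # simulated actual values
pred   <- actual + rnorm(n, sd = 0.2)    # simulated regression predictions

sample_size <- 10000
# Assumption: subsample only when the data are larger than sample_size
idx <- if (n > sample_size) sample.int(n, sample_size) else seq_len(n)

# Subsample once, before any dichotomization
actual_s <- actual[idx]
pred_s   <- pred[idx]

# Every cut point is then applied to the same subsample,
# so each binary problem is built from identical rows
cut_points <- quantile(actual_s, probs = c(0.25, 0.5, 0.75))
labels <- lapply(cut_points, function(cp) actual_s >= cp)
lengths(labels)
```

Because idx is drawn a single time, the ROC for each cut point would be computed from the same pairs of actual and predicted values, which keeps the AUCs comparable across cut points.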
Value
List with the following elements:
- rocs: List of results from aucroc() for each dichotomized segment of actual.
- auc: named numeric vector of AUC extracted from each element of rocs. Named by the percentile that the AUC represents.
- mean_auc: named numeric(3). The average AUC over the low, middle, and high quantiles of dichotomization:
  - lo: average AUC with imbalance% (e.g., 5%) or less of the actual target values;
  - mid: average AUC in between lo and hi;
  - hi: average AUC with (1 - imbalance)% (e.g., 95%) or more of the actual target values.
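The three-region average in mean_auc can be sketched in base R as follows. The AUC values are made up, and the assumption that the names of auc parse directly as percentiles is for illustration only; the actual naming scheme used by reg_aucroc() may differ.

```r
# Hypothetical AUC values, named by the percentile of each cut point
auc <- c("1" = 0.80, "5" = 0.84, "25" = 0.90, "50" = 0.93,
         "75" = 0.95, "95" = 0.98, "99" = 0.99)
imbalance <- 0.05

# Convert the percentile names to proportions in (0, 1)
pct <- as.numeric(names(auc)) / 100

mean_auc <- c(
  lo  = mean(auc[pct <= imbalance]),                       # imbalance or less
  mid = mean(auc[pct > imbalance & pct < 1 - imbalance]),  # strictly in between
  hi  = mean(auc[pct >= 1 - imbalance])                    # (1 - imbalance) or more
)
mean_auc
```

Reporting the three regions separately shows whether the predictions discriminate as well in the tails of the target distribution as they do around its middle.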
Details
The ROC data and AUCROC values are calculated with aucroc().
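To illustrate the idea of dichotomizing the actual values, here is a minimal base R sketch that does not use aucroc(). The helper auc_rank() is hypothetical and computes AUC with the Mann-Whitney rank-sum identity; the simulated data, the three cut points, and the use of >= are likewise assumptions rather than the package's implementation.

```r
# Hypothetical helper (not part of the package): AUC from the
# Mann-Whitney rank-sum identity; tied scores receive average ranks.
auc_rank <- function(label, score) {
  n_pos <- sum(label)
  n_neg <- sum(!label)
  r <- rank(score)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
actual <- rnorm(200)                       # simulated actual values
pred   <- actual + rnorm(200, sd = 0.5)    # noisy stand-in for regression predictions

# Illustrative cut points: the 10th, 50th, and 90th percentiles of `actual`
cut_points <- quantile(actual, probs = c(0.1, 0.5, 0.9))

# One binary problem per cut point: is the actual value at or above the cut?
auc_by_cut <- vapply(
  cut_points,
  function(cp) auc_rank(actual >= cp, pred),
  numeric(1)
)
auc_by_cut
```

Each element of auc_by_cut answers the question of how well the continuous predictions rank observations above versus below one particular threshold of the target.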
Examples
# Remove rows with missing values from airquality dataset
airq <- airquality |>
  na.omit()
# Create binary version where the target variable 'Ozone' is dichotomized based on its median
airq_bin <- airq
airq_bin$Ozone <- airq_bin$Ozone >= median(airq_bin$Ozone)
# Create a generic regression model; use autogam
req_aq <- autogam::autogam(airq, 'Ozone', family = gaussian())
#> Warning: basis dimension, k, increased to minimum possible
req_aq$perf$sa_wmae_mad # Standardized accuracy for regression
#> NULL
# Create a generic classification model; use autogam
class_aq <- autogam::autogam(airq_bin, 'Ozone', family = binomial())
#> Warning: basis dimension, k, increased to minimum possible
class_aq$perf$auc # AUC (standardized accuracy for classification)
#> NULL
# Compute AUC for regression predictions
reg_auc_aq <- reg_aucroc(
  airq$Ozone,
  predict(req_aq)
)
# Average AUC over the lo, mid, and hi quantiles of dichotomization:
reg_auc_aq$mean_auc
#>        lo       mid        hi
#> 0.8541380 0.9398248 0.9876410