Area under the ROC curve (AUCROC) is a classification measure. By dichotomizing the range of actual values, reg_aucroc() turns regression evaluation into classification evaluation for any regression model. Note that the model that generates the predictions is assumed to be a regression model; however, any numeric inputs are allowed for the pred argument, so there is no check for the nature of the source model.
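To make the idea of dichotomization concrete, here is a minimal base R sketch that does not use the package. The helper auc_rank() is hypothetical and computes AUC with the rank-sum (Mann-Whitney) identity; treating values at or above the threshold as the positive class is an assumption, not necessarily the convention that reg_aucroc() follows.

```r
# Hypothetical helper (not part of the package): AUC via the rank-sum identity
auc_rank <- function(label, score) {
  n_pos <- sum(label)
  n_neg <- sum(!label)
  r <- rank(score)  # average ranks, so tied scores count as half
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up regression data for illustration
actual <- c(1, 2, 3, 4, 5, 6, 7, 8)
pred   <- c(1.2, 1.9, 3.5, 5.2, 5.5, 5.0, 7.2, 8.4)

# Dichotomize the actual values at a single threshold (here, the median).
# Assumption: values at or above the threshold form the positive class.
label <- actual >= median(actual)

# AUC of the numeric predictions against the dichotomized actual values
auc_one_cut <- auc_rank(label, pred)
auc_one_cut
```

Repeating this over many thresholds gives one AUC per threshold, which is the kind of collection the function's result describes.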
Usage
reg_aucroc(
  actual,
  pred,
  num_quants = 100,
  ...,
  cuts = NULL,
  imbalance = 0.05,
  na.rm = FALSE,
  sample_size = 10000,
  seed = 0
)
Arguments
- actual
numeric vector. Actual label values from a dataset. They must be numeric.
- pred
numeric vector. Predictions corresponding to each respective element in actual.
- num_quants
scalar positive integer. If cuts is NULL (default), actual will be dichotomized into num_quants quantiles and that many ROCs will be returned in the rocs element. However, if cuts is specified, then num_quants is ignored.
- ...
Not used. Forces explicit naming of the arguments that follow.
- cuts
numeric vector. If cuts is provided, it overrides num_quants to specify the cut points for dichotomization of actual for the creation of cuts + 1 ROCs.
- imbalance
numeric(1) in (0, 0.5]. The result element mean_auc averages the AUCs over three regions (see details of the return value). imbalance is the assumed proportion of the less frequent class in the data. If not provided, defaults to 0.05 (5%).
- na.rm
See documentation for aucroc().
- sample_size
See documentation for aucroc(). In addition to those notes, for reg_aucroc(), any sampling is conducted before the dichotomization of actual so that all classification ROCs are based on identical data.
- seed
See documentation for aucroc().
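The note on sample_size (sampling happens before dichotomization) can be illustrated with a rough base R sketch. It does not reproduce the package internals: the data are simulated, and using the interior quantiles of the sampled actual values as cut points is an assumption made only for this illustration.

```r
set.seed(0)

# Simulated regression data (illustrative only)
actual <- rnorm(500)
pred <- actual + rnorm(500, sd = 0.5)

# Draw the row sample once, before any dichotomization
sample_size <- 200
idx <- sample.int(length(actual), sample_size)
actual_s <- actual[idx]
pred_s <- pred[idx]

# Assumed cut points: interior quantiles of the sampled actual values
num_quants <- 4
cut_points <- quantile(
  actual_s,
  probs = seq_len(num_quants - 1) / num_quants,
  names = FALSE
)

# One logical label vector per cut point, all built from the same sampled rows
labels <- lapply(cut_points, function(cut_point) actual_s >= cut_point)

# Every label vector lines up with the same pred_s values
lengths(labels)
```

Because idx is drawn a single time, every dichotomized label vector is paired with the same predictions, so the resulting ROCs differ only in where actual is cut.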
Value
List with the following elements:
- rocs: List of results of aucroc() for each dichotomized segment of actual.
- auc: named numeric vector of AUC extracted from each element of rocs. Named by the percentile that the AUC represents.
- mean_auc: named numeric(3). The average AUC over the low, middle, and high quantiles of dichotomization:
  - lo: average AUC with imbalance% (e.g., 5%) or less of the actual target values;
  - mid: average AUC in between lo and hi;
  - hi: average AUC with (1 - imbalance)% (e.g., 95%) or more of the actual target values.
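As a rough illustration of how the three regions of mean_auc relate to imbalance, the following base R sketch averages a made-up AUC vector by percentile region. The AUC values are invented, and the way percentiles are encoded (integers 1 to 99) is an assumption; only the "or less" and "or more" boundaries come from the description above.

```r
# Invented AUC values, one per percentile of dichotomization (1% to 99%)
pct <- 1:99
auc <- setNames(0.80 + 0.0015 * pct, paste0(pct, "%"))

imbalance <- 0.05
lo_max <- round(100 * imbalance)  # 5: percentiles at or below this are "lo"
hi_min <- 100 - lo_max            # 95: percentiles at or above this are "hi"

# Assign each percentile to one of the three regions
region <- ifelse(pct <= lo_max, "lo", ifelse(pct >= hi_min, "hi", "mid"))
region <- factor(region, levels = c("lo", "mid", "hi"))

# Average AUC within each region
mean_auc <- tapply(auc, region, mean)
mean_auc
```

With the default imbalance of 0.05, the lo and hi regions each cover the five most extreme percentiles, where the dichotomized classes are most unbalanced.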
Details
The ROC data and AUCROC values are calculated with aucroc().
Examples
# Remove rows with missing values from airquality dataset
airq <- airquality |>
  na.omit()
# Create binary version where the target variable 'Ozone' is dichotomized based on its median
airq_bin <- airq
airq_bin$Ozone <- airq_bin$Ozone >= median(airq_bin$Ozone)
# Create a generic regression model; use autogam
req_aq <- autogam::autogam(airq, 'Ozone', family = gaussian())
#> Warning: basis dimension, k, increased to minimum possible
req_aq$perf$sa_wmae_mad # Standardized accuracy for regression
#> NULL
# Create a generic classification model; use autogam
class_aq <- autogam::autogam(airq_bin, 'Ozone', family = binomial())
#> Warning: basis dimension, k, increased to minimum possible
class_aq$perf$auc # AUC (standardized accuracy for classification)
#> NULL
# Compute AUC for regression predictions
reg_auc_aq <- reg_aucroc(
  airq$Ozone,
  predict(req_aq)
)
# Average AUC over the lo, mid, and hi quantiles of dichotomization:
reg_auc_aq$mean_auc
#>        lo       mid        hi
#> 0.8541380 0.9398248 0.9876410