Returns the area under the ROC curve based on comparing the predicted scores to the actual binary values. Tied predictions are handled by calculating the optimistic AUC (positive cases sorted first, resulting in higher AUC) and the pessimistic AUC (positive cases sorted last, resulting in lower AUC) and then returning the average of the two. For the ROC, a "tie" means at least one pair of pred predictions whose values are identical yet whose corresponding values of actual are different. (If the values of actual are the same for identical predictions, then these are unproblematic and are not considered "ties".)
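The optimistic/pessimistic averaging described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: auc_sorted() is a hypothetical helper, actual is assumed to be a logical vector already, and the curve is built by stepping through the observations one at a time after sorting.

```r
# Hypothetical helper (not part of the package): AUC after sorting by
# descending prediction, breaking ties by the actual label.
auc_sorted <- function(actual, pred, positives_first) {
  # Within tied predictions, FALSE sorts before TRUE, so use !actual
  # to place positives first and actual to place positives last.
  tie_break <- if (positives_first) !actual else actual
  ord <- order(-pred, tie_break)
  a <- actual[ord]

  # Cumulative true-positive and false-positive rates, starting at (0, 0)
  tpr <- c(0, cumsum(a) / sum(a))
  fpr <- c(0, cumsum(!a) / sum(!a))

  # Trapezoidal area under the resulting curve
  n <- length(tpr)
  sum(diff(fpr) * (tpr[-n] + tpr[-1]) / 2)
}

# One tie: the prediction 0.5 appears with both a positive and a negative
actual <- c(TRUE, FALSE, TRUE, FALSE)
pred   <- c(0.9, 0.5, 0.5, 0.1)

opt  <- auc_sorted(actual, pred, positives_first = TRUE)   # optimistic
pess <- auc_sorted(actual, pred, positives_first = FALSE)  # pessimistic
avg  <- mean(c(opt, pess))                                 # averaged AUC
```

With this input the optimistic area is 1, the pessimistic area is 0.75, and their mean is 0.875, which is also the value obtained by counting the tied positive/negative pair as half a win.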
Usage
aucroc(
  actual,
  pred,
  na.rm = FALSE,
  binary_true_value = NULL,
  sample_size = 10000,
  seed = 0
)
Arguments
- actual
any atomic vector. Actual label values from a dataset. They must be binary; that is, there must be exactly two distinct values (other than missing values, which are allowed). The "true" or "positive" class is determined by coercing actual to logical TRUE and FALSE following the rules of as.logical(). If this is not the intended meaning of "positive", then specify which of the two values should be considered TRUE with the argument binary_true_value.
- pred
numeric vector. Predictions corresponding to each respective element in actual. Any numeric value (not only probabilities) is permissible.
- na.rm
logical(1). TRUE if missing values should be removed; FALSE if they should be retained. If TRUE, then if any element of either actual or pred is missing, its paired element will also be removed.
- binary_true_value
any single atomic value. The value of actual that is considered TRUE; any other value of actual is considered FALSE. For example, if 2 means TRUE and 1 means FALSE, then set binary_true_value = 2.
- sample_size
single positive integer. To keep the computation relatively rapid, when actual and pred are longer than sample_size elements, a random sample of sample_size paired elements of actual and pred is selected, and the ROC and AUC are calculated on this sample. To disable random sampling for long inputs, set sample_size = NA.
- seed
numeric(1). Random seed used only if length(actual) > sample_size.
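The argument handling described above can be sketched in base R as follows. This is only an illustration: prepare_inputs() is a hypothetical helper, and the order of the steps (coercion, then removal of missing pairs, then sampling) is an assumption rather than something the documentation states.

```r
# Hypothetical helper illustrating the documented input handling;
# it is not the package's own code.
prepare_inputs <- function(actual, pred, na.rm = FALSE,
                           binary_true_value = NULL,
                           sample_size = 10000, seed = 0) {
  # Map actual to logical: an explicit positive value if given,
  # otherwise the rules of as.logical()
  actual <- if (is.null(binary_true_value)) {
    as.logical(actual)
  } else {
    actual == binary_true_value
  }

  # Drop a pair whenever either of its members is missing
  if (na.rm) {
    keep <- !is.na(actual) & !is.na(pred)
    actual <- actual[keep]
    pred <- pred[keep]
  }

  # Sample paired elements only when the input exceeds sample_size;
  # sample_size = NA disables sampling
  if (!is.na(sample_size) && length(actual) > sample_size) {
    set.seed(seed)  # note: this changes the global RNG state
    idx <- sample.int(length(actual), sample_size)
    actual <- actual[idx]
    pred <- pred[idx]
  }

  list(actual = actual, pred = pred)
}

# 2 is the positive class; two pairs contain a missing value
prepped <- prepare_inputs(
  actual = c(2, 1, 2, NA, 1),
  pred = c(0.8, 0.3, NA, 0.5, 0.1),
  na.rm = TRUE,
  binary_true_value = 2
)
```

Here prepped keeps only the three complete pairs, and because sampling draws indices rather than values, each retained element of actual stays matched with its own prediction.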
Value
List with the following elements:
- roc_opt: tibble with optimistic ROC data. "Optimistic" means that when predictions are tied, the TRUE/positive actual values are ordered before the FALSE/negative ones.
- roc_pess: tibble with pessimistic ROC data. "Pessimistic" means that when predictions are tied, the FALSE/negative actual values are ordered before the TRUE/positive ones. Note that this difference is not merely in the sort order: when there are ties, the way that true positives, true negatives, etc. are counted is different for optimistic and pessimistic approaches. If there are no tied predictions, then roc_opt and roc_pess are identical.
- auc_opt: area under the ROC curve for optimistic ROC.
- auc_pess: area under the ROC curve for pessimistic ROC.
- auc: mean of auc_opt and auc_pess. If there are no tied predictions, then auc_opt, auc_pess, and auc are identical.
- ties: TRUE if there are two or more tied predictions; FALSE if there are no ties.
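
The definition of a tie used on this page (identical predictions whose actual values differ) can be checked with a short base R sketch. has_ties() is a hypothetical helper written for illustration; how the package itself computes the ties element is not shown here.

```r
# Hypothetical helper: TRUE when any group of identical predictions
# contains more than one distinct actual value.
has_ties <- function(actual, pred) {
  labels_per_pred <- tapply(actual, pred, function(a) length(unique(a)))
  any(labels_per_pred > 1)
}

# The prediction 0.5 is shared by a positive and a negative case
mixed <- has_ties(c(TRUE, FALSE, TRUE, FALSE), c(0.9, 0.5, 0.5, 0.1))

# Identical predictions always share the same actual value
clean <- has_ties(c(TRUE, TRUE, FALSE, FALSE), c(0.9, 0.9, 0.1, 0.1))
```

In this sketch mixed is TRUE and clean is FALSE: repeated prediction values alone do not count as ties unless their actual values disagree.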