Returns the area under the ROC curve (AUC), computed by comparing the predicted scores with the actual binary values. Tied predictions are handled by calculating the optimistic AUC (positive cases sorted first, resulting in a higher AUC) and the pessimistic AUC (positive cases sorted last, resulting in a lower AUC) and then returning the average of the two. For the ROC, a "tie" means at least one pair of pred values that are identical while their corresponding values of actual differ. (If the values of actual are the same for identical predictions, these are unproblematic and are not considered "ties".)
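The averaging of optimistic and pessimistic AUC can be illustrated with a small base R sketch. It counts positive/negative pairs directly: a tied pair is credited to the positive case in the optimistic count and to the negative case in the pessimistic count. The helper name auc_tie_sketch is hypothetical, the approach is a pairwise approximation of the idea described above rather than the function's actual implementation, and it assumes a logical actual with no missing values.

```r
# Hypothetical helper (not part of the documented function): pairwise sketch
# of optimistic, pessimistic, and averaged AUC in the presence of ties.
# Assumes `actual` is logical and neither vector contains missing values.
auc_tie_sketch <- function(actual, pred) {
  pos <- pred[actual]    # scores of the TRUE/positive cases
  neg <- pred[!actual]   # scores of the FALSE/negative cases

  # Difference for every (positive, negative) pair of predictions
  diffs <- outer(pos, neg, "-")
  n_pairs <- length(pos) * length(neg)

  concordant <- sum(diffs > 0)   # positive scored strictly higher
  tied <- sum(diffs == 0)        # identical scores, different actual values

  auc_opt <- (concordant + tied) / n_pairs   # ties resolved in favour of positives
  auc_pess <- concordant / n_pairs           # ties resolved in favour of negatives

  list(
    auc_opt = auc_opt,
    auc_pess = auc_pess,
    auc = (auc_opt + auc_pess) / 2,
    ties = tied > 0
  )
}

actual <- c(TRUE, FALSE, TRUE, FALSE)
pred <- c(0.9, 0.5, 0.5, 0.1)   # the two 0.5 scores form a tie
res <- auc_tie_sketch(actual, pred)
res$auc_opt    # 1
res$auc_pess   # 0.75
res$auc        # 0.875
```

With no tied pairs the three values coincide, which matches the behaviour described for auc_opt, auc_pess, and auc.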
Usage
aucroc(
  actual,
  pred,
  na.rm = FALSE,
  binary_true_value = NULL,
  sample_size = 10000,
  seed = 0
)
Arguments
- actual
any atomic vector. Actual label values from a dataset. They must be binary; that is, there must be exactly two distinct values (other than missing values, which are allowed). The "true" or "positive" class is determined by coercing actual to logical TRUE and FALSE following the rules of as.logical(). If this is not the intended meaning of "positive", then specify which of the two values should be considered TRUE with the argument binary_true_value.
- pred
numeric vector. Predictions corresponding to each respective element in actual. Any numeric value (not only probabilities) is permissible.
- na.rm
logical(1). TRUE if missing values should be removed; FALSE if they should be retained. If TRUE, then whenever an element of either actual or pred is missing, its paired element is also removed.
- binary_true_value
any single atomic value. The value of actual that is considered TRUE; any other value of actual is considered FALSE. For example, if 2 means TRUE and 1 means FALSE, then set binary_true_value = 2.
- sample_size
single positive integer. To keep the computation relatively rapid, when actual and pred are longer than sample_size elements, a random sample of sample_size elements of actual and pred is selected, and the ROC and AUC are calculated on this sample. To disable random sampling for long inputs, set sample_size = NA.
- seed
numeric(1). Random seed used only if length(actual) > sample_size.
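The argument semantics above (coercion to logical, paired removal of missing values, and seeded sampling of long inputs) can be sketched in base R as follows. The helper prep_sketch is hypothetical and only mirrors the documented behaviour; how aucroc() performs these steps internally, including how it manages the random seed, is not stated here and is assumed for illustration.

```r
# Hypothetical helper illustrating the documented argument semantics;
# it is an assumption-based sketch, not the function's own code.
prep_sketch <- function(actual, pred, na.rm = FALSE,
                        binary_true_value = NULL,
                        sample_size = 10000, seed = 0) {
  # Determine the positive class: as.logical() by default,
  # otherwise compare against binary_true_value
  actual <- if (is.null(binary_true_value)) {
    as.logical(actual)
  } else {
    actual == binary_true_value
  }

  # Remove a pair whenever either of its elements is missing
  if (na.rm) {
    keep <- !is.na(actual) & !is.na(pred)
    actual <- actual[keep]
    pred <- pred[keep]
  }

  # Sample only when the input is longer than sample_size;
  # sample_size = NA disables sampling
  if (!is.na(sample_size) && length(actual) > sample_size) {
    set.seed(seed)  # note: this changes the global RNG state
    idx <- sample(length(actual), sample_size)
    actual <- actual[idx]
    pred <- pred[idx]
  }

  list(actual = actual, pred = pred)
}

# 2 means TRUE and 1 means FALSE; the third and fourth pairs each
# contain a missing value and are dropped
x <- prep_sketch(c(2, 1, NA, 2), c(0.8, 0.3, 0.5, NA),
                 na.rm = TRUE, binary_true_value = 2)
x$actual   # TRUE FALSE
x$pred     # 0.8 0.3
```

Because the seed is applied only when sampling actually occurs, inputs no longer than sample_size pass through unchanged regardless of seed.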
Value
List with the following elements:
- roc_opt: tibble with optimistic ROC data. "Optimistic" means that when predictions are tied, the TRUE/positive actual values are ordered before the FALSE/negative ones.
- roc_pess: tibble with pessimistic ROC data. "Pessimistic" means that when predictions are tied, the FALSE/negative actual values are ordered before the TRUE/positive ones. Note that this difference is not merely in the sort order: when there are ties, the way that true positives, true negatives, etc. are counted is different for optimistic and pessimistic approaches. If there are no tied predictions, then roc_opt and roc_pess are identical.
- auc_opt: area under the ROC curve for optimistic ROC.
- auc_pess: area under the ROC curve for pessimistic ROC.
- auc: mean of auc_opt and auc_pess. If there are no tied predictions, then auc_opt, auc_pess, and auc are identical.
- ties: TRUE if there are two or more tied predictions; FALSE if there are no ties.