Resolve x_cols and exclude_cols to their standardized format

Resolves x_cols and exclude_cols into the standardized x_cols format, which specifies which 1D and 2D ALE elements are required. This specification is used throughout the ALE package. x_cols specifies the desired columns or interactions, whereas exclude_cols optionally specifies any columns or interactions to remove from x_cols. The result is x_cols minus exclude_cols, which gives considerable flexibility in specifying precisely the columns desired.
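
The subtraction can be pictured as a plain set difference over term names. The following base R sketch is only an illustration with made-up term vectors (requested and excluded are hypothetical names); it is not the package's internal code, which also validates names and accepts the other input formats.

```r
# Hypothetical term vectors, for illustration only
requested <- c("a", "b", "a:b", "a:c")  # plays the role of x_cols
excluded  <- c("b", "a:c")              # plays the role of exclude_cols

# Keep every requested term that is not excluded
remaining <- setdiff(requested, excluded)
print(remaining)
```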

Usage

resolve_x_cols(x_cols, col_names, y_col, exclude_cols = NULL, silent = FALSE)

Arguments

x_cols: character, list, or formula. Columns and interactions requested in one of the special x_cols formats. Variable names in x_cols that are not found in col_names raise an error. See examples.
col_names: character. All the column names from a dataset. All values in x_cols must be contained among the values in col_names. For interaction terms in x_cols, e.g., "a:b", the individual variable names must be contained in col_names, e.g., c("a", "b").
y_col: character(1). The y outcome column. If found in any x_cols value, it will be silently removed.
exclude_cols: Same possible formats as x_cols. Columns and interactions to exclude from those requested in x_cols. exclude_cols values not found in col_names will be ignored with a message (which can be silenced with silent).
silent: logical(1). If TRUE, no message will be given; in particular, exclude_cols values not found in col_names will be silently ignored. Default is FALSE. Regardless, warnings and errors are never silenced (e.g., invalid x_cols formats will still report errors).

Value

x_cols in canonical format, which is always a list with two elements, d1 and d2. Each element is a character vector containing the requested columns for 1D ALE (d1) or the requested interaction pairs for 2D ALE (d2). If either dimension is empty, its value is an empty character vector, character().
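
As a rough picture of that shape, the sketch below builds such a list by hand in base R from a vector of term names, assuming that interaction terms are exactly those containing ":". The helper to_canonical() is hypothetical and does not call resolve_x_cols().

```r
# Illustrative helper: split term names into 1D terms and 2D interaction terms
to_canonical <- function(term_names) {
  is_2d <- grepl(":", term_names, fixed = TRUE)
  list(d1 = term_names[!is_2d], d2 = term_names[is_2d])
}

both    <- to_canonical(c("a", "b", "a:b"))
only_1d <- to_canonical(c("a", "b"))

str(both)
# With no interactions requested, d2 is an empty character vector
print(identical(only_1d$d2, character()))
```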

See examples for details.

`x_cols` format options

The x_cols argument determines which predictor variables and interactions are included in the analysis. It supports multiple input formats:

Character vector: Users can explicitly specify 1D terms and 2D ALE interactions, e.g., c("a", "b", "a:b", "a:c").
Formula (~): Allows specifying variables and interactions in formula notation (e.g., ~ a + b + a:b), which is automatically converted into a structured format. The outcome term on the left-hand side is optional and is ignored if present. So, ~ a + b + a:b produces results identical to those of whatever ~ a + b + a:b, where whatever stands for any outcome term.
List format:
- The basic list format is a list of character vectors named d1 for 1D ALE terms, d2 for 2D interactions, or both. For example, list(d1 = c("a", "b"), d2 = c("a:b", "a:c")).
- Boolean selection for an entire dimension:
  - list(d1 = TRUE) selects all available variables for 1D ALE, excluding y_col.
  - list(d2 = TRUE) selects all possible 2D interactions among all columns in col_names, excluding y_col.
- A character vector named d2_all, containing 1D terms only, may be used to include all 2D interactions that involve the specified terms. For example, list(d2_all = "a") would select c("a:b", "a:c", "a:d"), etc. This is in addition to any terms requested in the d1 or d2 elements.
NULL (or unspecified): If x_cols = NULL, no variables are selected.
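
The formats above can be approximated in base R to see what each one expands to. This is a minimal sketch under stated assumptions, not the package's implementation: the column names are made up, and the ordering of names within each interaction pair simply follows the order of col_names here.

```r
# Hypothetical dataset columns; "y" is the outcome
col_names  <- c("a", "b", "c", "d", "y")
y_col      <- "y"
predictors <- setdiff(col_names, y_col)

# Formula format: the term labels are the same with or without an outcome term
labels_rhs_only <- attr(stats::terms(~ a + b + a:b), "term.labels")
labels_with_lhs <- attr(stats::terms(whatever ~ a + b + a:b), "term.labels")
print(identical(labels_rhs_only, labels_with_lhs))

# list(d1 = TRUE): every predictor as a 1D term
all_d1 <- predictors

# list(d2 = TRUE): every pairwise interaction among the predictors
all_d2 <- utils::combn(predictors, 2, FUN = paste, collapse = ":")
print(all_d2)

# list(d2_all = "a"): only the interactions that involve "a"
involves_a <- vapply(
  strsplit(all_d2, ":", fixed = TRUE),
  function(pair) "a" %in% pair,
  logical(1)
)
d2_with_a <- all_d2[involves_a]
print(d2_with_a)
```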

The function ensures that all variables are valid and present in col_names, providing informative messages unless silent = TRUE. Regardless of the specification format, the result is always standardized in the format described under Value. Note that y_col is not removed if included in x_cols. However, a message alerts when it is included, in case it is a mistake.

Run examples for details.

Examples