Resolve x_cols and exclude_cols into the standardized x_cols format, which specifies which 1D and 2D ALE elements are required. This specification is used throughout the ALE package. x_cols specifies the desired columns or interactions, whereas exclude_cols optionally specifies any columns or interactions to remove from x_cols. The result is x_cols minus exclude_cols (a set difference), which gives considerable flexibility in specifying precisely the columns desired.
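As a rough illustration of the subtraction described above, the following base R sketch removes excluded terms dimension by dimension. It assumes both specifications are already in the canonical d1/d2 list form and uses made-up values; it is a simplified stand-in, not the package's implementation.

```r
# Hypothetical, simplified illustration; not the package's actual code.
# Both specifications are assumed to already be in canonical d1/d2 form.
x_cols       <- list(d1 = c("a", "b", "c"), d2 = c("a:b", "a:c"))
exclude_cols <- list(d1 = "a",              d2 = "a:b")

# Subtract the excluded terms separately for each dimension
result <- list(
  d1 = setdiff(x_cols$d1, exclude_cols$d1),
  d2 = setdiff(x_cols$d2, exclude_cols$d2)
)

result$d1
#> [1] "b" "c"
result$d2
#> [1] "a:c"
```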
Arguments
- x_cols
character, list, or formula. Columns and interactions requested in one of the special x_cols formats. x_cols variable names not found in col_names will error. See examples.
- col_names
character. All the column names from a dataset. All values in x_cols must be contained among the values in col_names. For interaction terms in x_cols, e.g., "a:b", the individual variable names must be contained in col_names, e.g., c("a", "b").
- y_col
character(1). The y outcome column. It is excluded when an entire dimension is selected (e.g., list(d1 = TRUE)). If it is explicitly requested in x_cols, it is kept and a message is given.
- exclude_cols
Same possible formats as x_cols. Columns and interactions to exclude from those requested in x_cols. exclude_cols values not found in col_names will be ignored with a message (which can be silenced with silent).
- silent
logical(1). If TRUE, no message will be given; in particular, exclude_cols values not found in col_names will be silently ignored. Default is FALSE. Regardless, warnings and errors are never silenced (e.g., invalid x_cols formats will still report errors).
Value
x_cols in canonical format, which is always a list with two elements, d1 and d2. Each element is a character vector with each requested column for 1D ALE (d1) or 2D ALE interaction pair (d2). If either dimension is empty, its value is an empty character, character().
See examples for details.
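To make the canonical shape concrete, the following base R sketch sorts a mixed character vector into d1 and d2 by checking for the ":" separator. It is a simplification for illustration only (no validation, no exclusions) and the variable names are hypothetical; it is not the package's actual code.

```r
# Hypothetical sketch of the canonical d1/d2 shape; not the package's code.
terms_requested <- c("a:b", "c", "c:a", "b")

# A term containing ":" is treated as a 2D interaction
is_2d <- grepl(":", terms_requested, fixed = TRUE)

canonical <- list(
  d1 = terms_requested[!is_2d],
  d2 = terms_requested[is_2d]
)

canonical$d1
#> [1] "c" "b"
canonical$d2
#> [1] "a:b" "c:a"

# An empty dimension is represented by a zero-length character vector
only_1d <- list(d1 = "a", d2 = character())
length(only_1d$d2)
#> [1] 0
```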
x_cols format options
The x_cols argument determines which predictor variables and interactions are included in the analysis. It supports multiple input formats:
- Character vector: Users can explicitly specify 1D terms and 2D ALE interactions, e.g., c("a", "b", "a:b", "a:c").
- Formula (~): Allows specifying variables and interactions in formula notation (e.g., ~ a + b + a:b), which is automatically converted into a structured format. The outcome term is optional and will be ignored regardless. So, ~ a + b + a:b produces results identical to whatever ~ a + b + a:b.
- List format:
  - The basic list format is a list of character vectors named d1 for 1D ALE terms, d2 for 2D interactions, or both. For example, list(d1 = c("a", "b"), d2 = c("a:b", "a:c")).
  - Boolean selection for an entire dimension: list(d1 = TRUE) selects all available variables for 1D ALE, excluding y_col. list(d2 = TRUE) selects all possible 2D interactions among all columns in col_names, excluding y_col.
  - A character vector of 1D terms only, named d2_all, may be used to include all 2D interactions that include the specified 1D terms. For example, specifying list(d2_all = "a") would select c("a:b", "a:c", "a:d"), etc. This is in addition to any terms requested in the d1 or d2 elements.
- NULL (or unspecified): If x_cols = NULL, no variables are selected.
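The expansions described in these formats can be approximated in base R. The sketch below is only an illustration under stated assumptions: it uses terms() to read formula term labels and combn() to enumerate pairs, and the variable names (predictors, all_d2, involves) are hypothetical. Whether the package works this way internally is not claimed here.

```r
# Hypothetical base R illustration; not the package's actual code.
col_names <- c("y", "a", "b", "c")
y_col <- "y"

# Formula: the right-hand-side term labels are the same with or without an outcome
term_labels <- attr(terms(~ a + b + a:b), "term.labels")
term_labels
#> [1] "a"   "b"   "a:b"
identical(term_labels, attr(terms(whatever ~ a + b + a:b), "term.labels"))
#> [1] TRUE

# d1 = TRUE: every column except the outcome
predictors <- setdiff(col_names, y_col)
predictors
#> [1] "a" "b" "c"

# d2 = TRUE: every pair of predictors, joined with ":"
all_d2 <- combn(predictors, 2, FUN = paste, collapse = ":")
all_d2
#> [1] "a:b" "a:c" "b:c"

# d2_all = "a": keep only the pairs that involve the named variable(s)
d2_all <- "a"
involves <- vapply(
  strsplit(all_d2, ":", fixed = TRUE),
  function(pair) any(pair %in% d2_all),
  logical(1)
)
all_d2[involves]
#> [1] "a:b" "a:c"
```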
The function ensures all variables are valid and in col_names, providing informative messages unless silent = TRUE. And regardless of the specification format, the result will always be standardized in the format specified in the return value. Note that y_col is not removed if included in x_cols. However, a message alerts when it is included, in case it is a mistake.
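The validity requirement can be pictured with a small base R sketch: split each requested term on ":" and compare the resulting variable names against col_names. The names requested and missing_vars are hypothetical, and the snippet only detects unknown names; how the package reports them is described in the Arguments section.

```r
# Hypothetical validation sketch; not the package's actual code.
col_names <- c("y", "a", "b", "c")
requested <- c("a", "a:b", "a:z")

# Break interaction terms into their individual variable names
vars <- unique(unlist(strsplit(requested, ":", fixed = TRUE)))

# Any variable not present in col_names is invalid
missing_vars <- setdiff(vars, col_names)
missing_vars
#> [1] "z"
```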
Run examples for details.
Examples
## Sample data
set.seed(0)
df <- data.frame(
  y = runif(10),
  a = sample(letters[1:3], 10, replace = TRUE),
  b = rnorm(10),
  c = sample(1:5, 10, replace = TRUE)
)
col_names <- names(df)
y_col <- "y" # Assume 'y' is the outcome variable
## Examples with just x_cols to show different formats for specifying x_cols
## (same format for exclude_cols)
# Character vector: Simple ALE with no interactions
resolve_x_cols(c("a", "b"), col_names, y_col)
#> $d1
#> [1] "a" "b"
#>
#> $d2
#> character(0)
#>
# Character string: Select just one 1D element
resolve_x_cols("c", col_names, y_col)
#> $d1
#> [1] "c"
#>
#> $d2
#> character(0)
#>
# Character vector mixing 1D terms and 2D interactions: specify precise 1D and 2D elements desired
resolve_x_cols(c('a:b', "c", 'c:a', "b"), col_names, y_col)
#> $d1
#> [1] "c" "b"
#>
#> $d2
#> [1] "a:b" "c:a"
#>
# Formula: Converts to a list of individual elements
resolve_x_cols(~ a + b, col_names, y_col)
#> $d1
#> [1] "a" "b"
#>
#> $d2
#> character(0)
#>
# Formula with interactions (1D and 2D).
# This format is probably more convenient if you know precisely which terms you want.
# Note that the outcome on the left-hand-side is always silently ignored.
resolve_x_cols(whatever ~ a + b + a:b + c:b, col_names, y_col)
#> $d1
#> [1] "a" "b"
#>
#> $d2
#> [1] "a:b" "b:c"
#>
# List specifying d1 (1D ALE)
resolve_x_cols(list(d1 = c("a", "b")), col_names, y_col)
#> $d1
#> [1] "a" "b"
#>
#> $d2
#> character(0)
#>
# List specifying d2 (2D ALE)
resolve_x_cols(list(d2 = 'a:b'), col_names, y_col)
#> $d1
#> character(0)
#>
#> $d2
#> [1] "a:b"
#>
# List specifying both d1 and d2
resolve_x_cols(list(d1 = c("a", "b"), d2 = 'a:b'), col_names, y_col)
#> $d1
#> [1] "a" "b"
#>
#> $d2
#> [1] "a:b"
#>
# d1 as TRUE (select all columns except y_col)
resolve_x_cols(list(d1 = TRUE), col_names, y_col)
#> $d1
#> [1] "a" "b" "c"
#>
#> $d2
#> character(0)
#>
# d2 as TRUE (select all possible 2D interactions)
resolve_x_cols(list(d2 = TRUE), col_names, y_col)
#> $d1
#> character(0)
#>
#> $d2
#> [1] "a:b" "a:c" "b:c"
#>
# d2_all: Request all 2D interactions involving a specific variable
resolve_x_cols(list(d2_all = "a"), col_names, y_col)
#> $d1
#> character(0)
#>
#> $d2
#> [1] "a:b" "a:c"
#>
# NULL: No variables selected
resolve_x_cols(NULL, col_names, y_col)
#> $d1
#> character(0)
#>
#> $d2
#> character(0)
#>
## Examples of how exclude_cols are removed from x_cols to obtain various desired results
# Exclude one column from a simple character vector
resolve_x_cols(
  x_cols = c("a", "b", "c"),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = "b"
)
#> $d1
#> [1] "a" "c"
#>
#> $d2
#> character(0)
#>
# Exclude multiple columns
resolve_x_cols(
  x_cols = c("a", "b", "c"),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = c("a", "c")
)
#> $d1
#> [1] "b"
#>
#> $d2
#> character(0)
#>
# Exclude an interaction term from a formula input
resolve_x_cols(
  x_cols = ~ a + b + a:b,
  col_names = col_names,
  y_col = y_col,
  exclude_cols = ~ a:b
)
#> $d1
#> [1] "a" "b"
#>
#> $d2
#> character(0)
#>
# Exclude all columns from x_cols
resolve_x_cols(
  x_cols = c("a", "b", "c"),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = c("a", "b", "c")
)
#> $d1
#> character(0)
#>
#> $d2
#> character(0)
#>
# Exclude non-existent columns (should be ignored)
resolve_x_cols(
  x_cols = c("a", "b"),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = "z"
)
#> ℹ The following columns in exclude_cols were not found in `col_names`: z
#> $d1
#> [1] "a" "b"
#>
#> $d2
#> character(0)
#>
# Exclude one column from a list-based input
resolve_x_cols(
  x_cols = list(d1 = c("a", "b"), d2 = c("a:b", "a:c")),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = list(d1 = "a")
)
#> $d1
#> [1] "b"
#>
#> $d2
#> [1] "a:b" "a:c"
#>
# Exclude interactions only
resolve_x_cols(
  x_cols = list(d1 = c("a", "b", "c"), d2 = c("a:b", "a:c")),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = list(d2 = 'a:b')
)
#> $d1
#> [1] "a" "b" "c"
#>
#> $d2
#> [1] "a:c"
#>
# Exclude everything, including interactions
resolve_x_cols(
  x_cols = list(d1 = c("a", "b", "c"), d2 = c("a:b", "a:c")),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = list(d1 = c("a", "b", "c"), d2 = c("a:b", "a:c"))
)
#> $d1
#> character(0)
#>
#> $d2
#> character(0)
#>
# y_col explicitly requested in x_cols is kept (with a message); exclude_cols still applies
resolve_x_cols(
  x_cols = c("y", "a", "b"),
  col_names = col_names,
  y_col = "y",
  exclude_cols = "a"
)
#> ℹ `y_col` (y) was requested in x_cols.
#> $d1
#> [1] "y" "b"
#>
#> $d2
#> character(0)
#>
# Exclude entire 2D dimension from x_cols with d2 = TRUE
resolve_x_cols(
  x_cols = list(d1 = TRUE, d2 = c("a:b", "a:c")),
  col_names = col_names,
  y_col = y_col,
  exclude_cols = list(d1 = c("a"), d2 = TRUE)
)
#> $d1
#> [1] "b" "c"
#>
#> $d2
#> character(0)
#>
