Create a character string that wraps appropriate variables in a dataframe with s() smooth functions. Whether a variable is smoothed depends on its datatype and, for numeric variables, on its number of unique values:

  • Non-numeric: no smoothing.

  • Numeric: determine knots based on the number of unique values for that variable:

    • <= 4: no smoothing

    • 5 to 19 (inclusive): smooth function with the number of knots set to half the number of unique values, rounded down. E.g., 6 unique values receive 3 knots, 7 receive 3 knots, and 8 receive 4 knots.

    • >= 20: smooth function with no specified number of knots, leaving the choice to the default used by the gam() function.
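The thresholds above can be sketched for a single column as follows. This is a minimal illustration, not the package's implementation: smooth_term_sketch is a hypothetical helper, and it assumes that unique values are counted with unique() and that the numeric check is is.numeric().

```r
# Hypothetical helper (not part of the package): applies the documented
# unique-value thresholds to one column and returns its formula term.
smooth_term_sketch <- function(x, name, smooth_fun = "s") {
  # Non-numeric variables are never smoothed
  if (!is.numeric(x)) {
    return(name)
  }

  # Assumption: unique values are counted with unique()
  n_unique <- length(unique(x))

  if (n_unique <= 4) {
    # Too few distinct values: keep as a plain (parametric) term
    name
  } else if (n_unique <= 19) {
    # Knots = half the number of unique values, rounded down
    k <- n_unique %/% 2
    paste0(smooth_fun, "(", name, ",k=", k, ")")
  } else {
    # No k specified: the smooth function's default applies
    paste0(smooth_fun, "(", name, ")")
  }
}

smooth_term_sketch(mtcars$carb, "carb")  # 6 unique values: "s(carb,k=3)"
smooth_term_sketch(mtcars$cyl, "cyl")    # 3 unique values: "cyl"
smooth_term_sketch(1:8, "x")             # 8 unique values: "s(x,k=4)"
```

Integer division (%/%) gives the floored half directly, which matches the 6, 7, and 8 unique-value cases listed above.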

Usage

smooth_formula_string(data, y_col, smooth_fun = "s", expand_parametric = TRUE)

Arguments

data

dataframe. All the variables in data except y_col will be listed in the resulting formula string. To exclude any variables, pass only the desired subset of columns as data.

y_col

character(1). Name of the y outcome variable.

smooth_fun

character(1). Function to use for smooth wraps; default is 's' for the s() function.

expand_parametric

logical(1). If TRUE (default), explicitly list each non-smooth (parametric) term. If FALSE, use . to lump together all non-smooth terms.

Value

Returns a single character string that represents a formula with y_col on the left and all other variables in data on the right, each wrapped in the smooth function (s() by default) when applicable.

Examples

smooth_formula_string(mtcars, 'mpg')
#> [1] "mpg ~ cyl + s(disp) + s(hp) + s(drat) + s(wt) + s(qsec) + vs + am + gear + s(carb,k=3)"
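Because the return value is a plain character string, it needs to be converted before being handed to a modelling function. The sketch below uses base R's as.formula() on the string copied from the example output above; the names fs and f are illustrative only, and no model is fitted here.

```r
# Formula string copied verbatim from the example output above
fs <- "mpg ~ cyl + s(disp) + s(hp) + s(drat) + s(wt) + s(qsec) + vs + am + gear + s(carb,k=3)"

# Convert the string to a formula object; s() is not evaluated at this point
f <- stats::as.formula(fs)

class(f)     # "formula"
all.vars(f)  # variable names referenced by the formula
```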