I am trying to create a new model for the parsnip package from an existing modeling function foo.
I have followed the tutorial on building new models in parsnip and the README on GitHub, but I still cannot figure out some things.
How does the fit function in parsnip know how to assign its input data (e.g. a matrix) to my idiosyncratic function call?
Imagine an idiosyncratic model function foo in which the conventional roles of the x and y arguments are reversed: i.e. foo(x, y), where x should be an outcome vector and y should be a predictor matrix, bizarrely.
For example: suppose a is a matrix of predictors and b is a vector of outcomes. Then I call fit_xy(object = my_model, x = a, y = b). Internally, how does fit_xy() know to call foo(x = y, y = x)?
The way we do this is through the set_fit() function. Most models are pretty sensible and we can use default mappings (for example, from data argument to data argument or x to x), but you are right that some models use different norms. One example is the Spark models, which use x to mean what we might normally call data with a formula method.
The random forest set_fit() function for Spark looks like this:
set_fit(
  model = "rand_forest",
  eng = "spark",
  mode = "classification",
  value = list(
    interface = "formula",
    data = c(formula = "formula", data = "x"),
    protect = c("x", "formula", "type"),
    func = c(pkg = "sparklyr", fun = "ml_random_forest"),
    defaults = list(seed = expr(sample.int(10 ^ 5, 1)))
  )
)
Notice especially the data element of the value argument. You can read a bit more here.
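
To make the mapping idea concrete, here is a minimal base-R sketch of what a name mapping like the data element does conceptually. It is not parsnip's actual implementation: foo, call_with_mapping, and the toy data are all hypothetical, and foo simply wraps stats::lm.fit() so there is something to run. The names of the mapping vector are the standard argument names and the values are the names the target function expects, mirroring c(formula = "formula", data = "x") in the Spark example.

```r
# Hypothetical model function with reversed roles:
# `x` is the outcome vector, `y` is the predictor matrix.
foo <- function(x, y) {
  stats::lm.fit(x = cbind(intercept = 1, y), y = x)
}

# Hypothetical helper (not part of parsnip): rename the supplied arguments
# according to `mapping`, then call `fun` with the renamed arguments.
# Names of `mapping` are the standard argument names; values are the
# argument names that `fun` expects.
call_with_mapping <- function(fun, mapping, ...) {
  args <- list(...)
  names(args) <- unname(mapping[names(args)])
  do.call(fun, args)
}

a <- cbind(p1 = c(1, 2, 3, 4, 5), p2 = c(2, 1, 4, 3, 6))  # predictors
b <- c(1.1, 1.9, 3.2, 3.9, 5.1)                           # outcomes

# The standard `x` (predictors) is passed as foo's `y`,
# and the standard `y` (outcome) is passed as foo's `x`.
fit <- call_with_mapping(foo, mapping = c(x = "y", y = "x"), x = a, y = b)
fit$coefficients
```

By the same reading, the set_fit() entry for foo would presumably use interface = "matrix" together with data = c(x = "y", y = "x"). That is an untested assumption based on the Spark example, so check it against the parsnip documentation before relying on it.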