r tidymodels r-parsnip

How does parsnip know how to match `fit` arguments to function arguments for a model?


I am trying to create a new model for the parsnip package from an existing modeling function foo.

I have followed the tutorial on building new models in parsnip and the README on GitHub, but I still cannot figure out some things.

How does the fit function in parsnip know how to assign its input data (e.g. a matrix) to my idiosyncratic function call?

Imagine an idiosyncratic model function foo in which the conventional roles of the x and y arguments are reversed: foo(x, y), where x is, bizarrely, the outcome vector and y is the predictor matrix.

For example: suppose a is a matrix of predictors and b is a vector of outcomes. Then I call fit_xy(object = my_model, x = a, y = b). Internally, how does fit_xy() know to swap the arguments and call foo(x = b, y = a)?


Solution

  • The way we do this is through the set_fit() function. Most models are pretty sensible and we can use default mappings (for example, from data argument to data argument, or x to x), but you are right that some models use different norms. One example is the Spark models, which use x to mean what we might normally call data with a formula method.

    The random forest set_fit() function for Spark looks like this:

    set_fit(
      model = "rand_forest",
      eng = "spark",
      mode = "classification",
      value = list(
        interface = "formula",
        data = c(formula = "formula", data = "x"),
        protect = c("x", "formula", "type"),
        func = c(pkg = "sparklyr", fun = "ml_random_forest"),
        defaults = list(seed = expr(sample.int(10 ^ 5, 1)))
      )
    )
    

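    Notice especially the data element of the value argument. Its names are parsnip's own argument names and its values are the names of the matching arguments in the underlying function, so data = "x" hands parsnip's data to ml_random_forest() as x. You can read a bit more here.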
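
Applied to the question's reversed foo(x, y), the same idea might look like the sketch below. Everything named here is hypothetical: the stand-in foo(), the model name "foo_model", and the engine name "foo" are placeholders, and the sketch assumes that the matrix interface honors the data element in the same way the formula interface does in the Spark example. A complete model would also need a constructor function plus set_encoding() and set_pred() registrations, which are omitted.

```r
library(parsnip)

# Hypothetical stand-in for the asker's function: x is the outcome vector
# and y is the predictor matrix (the reverse of the usual convention).
foo <- function(x, y, ...) {
  stats::lm.fit(x = cbind(1, y), y = x)
}

# Register a placeholder model type, mode, and engine.
set_new_model("foo_model")
set_model_mode(model = "foo_model", mode = "regression")
set_model_engine(model = "foo_model", mode = "regression", eng = "foo")

set_fit(
  model = "foo_model",
  eng = "foo",
  mode = "regression",
  value = list(
    interface = "matrix",
    # names are parsnip's arguments, values are foo()'s arguments:
    # parsnip's x (predictors) goes to foo's y, parsnip's y (outcome) to foo's x
    data = c(x = "y", y = "x"),
    protect = c("x", "y"),
    # a locally defined function needs only the `fun` element
    func = c(fun = "foo"),
    defaults = list()
  )
)
```

With this registration, fit_xy(object = my_model, x = a, y = b) would be expected to build a call equivalent to foo(x = b, y = a). Listing x and y under protect keeps users from overriding those two arguments through set_engine().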