Extracting Model from GBM in R


Is anyone familiar with how to figure out what's going on inside a gbm model in R?

Let's say we wanted to see how to predict the Petal.Length in iris. Just to keep it simple I ran:

library(gbm)
tg = gbm(Petal.Length ~ ., data = iris)

This works and when you run:

summary(tg)

Then you get:

Hit <Return> to see next plot: 
                      var rel.inf
Petal.Width   Petal.Width   67.39
Species           Species   32.61
Sepal.Length Sepal.Length    0.00
Sepal.Width   Sepal.Width    0.00

This makes sense intuitively. When you run pretty.gbm.tree(tg) you get:

  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
0        2  0.8000000000        1         2           3       184.6764     75  0.0001366667
1       -1 -0.0022989091       -1        -1          -1         0.0000     22 -0.0022989091
2       -1  0.0011476604       -1        -1          -1         0.0000     53  0.0011476604
3       -1  0.0001366667       -1        -1          -1         0.0000     75  0.0001366667

So clearly gbm thinks that you split by Variable #2 and get back three separate regressions. I assume that SplitVar==2 is Petal.Width since the order you see in str(iris) makes sense.

But what data is missing? iris has no missing data. And then how do we see what is going on in each of the three nodes that were created?

Let's say I wanted to code this up in C++ for production. I don't see how one would know what to code, beyond knowing that you should do something different depending on whether Petal.Width > 0.8.

Thanks,

Josh


Solution

  • See the function gbm2sas in the package mlmeta, which uses metaprogramming to convert the R object to SAS format.

    The generated SAS code is similar in structure to C++, so it is both easy to read and easy to hack into C++.
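
To make the table in the question concrete, here is a minimal C++ sketch of how one tree could be evaluated by hand. It assumes the usual gbm conventions: SplitVar is a 0-based index into the predictors (so 2 is Petal.Width once the response is excluded), a value below SplitCodePred goes to LeftNode and anything else goes to RightNode, MissingNode is only used when the split variable is NA (gbm creates it for every split, even when the training data has no missing values), and for terminal nodes (SplitVar == -1) SplitCodePred is that node's contribution with shrinkage already applied. The names Node, kTree0, eval_tree and predict are made up for this illustration, and categorical splits (such as Species) are not handled.

```cpp
#include <cmath>
#include <vector>

// One row of the pretty.gbm.tree() table (hypothetical struct for illustration).
struct Node {
    int split_var;           // 0-based predictor index, -1 for a terminal node
    double split_code_pred;  // split threshold, or the node's output if terminal
    int left, right, missing;
};

// First tree from the question, copied row by row.
const std::vector<Node> kTree0 = {
    { 2,  0.8000000000,  1,  2,  3},
    {-1, -0.0022989091, -1, -1, -1},
    {-1,  0.0011476604, -1, -1, -1},
    {-1,  0.0001366667, -1, -1, -1},
};

// Walk one tree. x holds the predictors in model order
// (Sepal.Length, Sepal.Width, Petal.Width, Species); NaN marks a missing value.
// Assumption: continuous splits only.
double eval_tree(const std::vector<Node>& tree, const std::vector<double>& x) {
    int i = 0;
    while (tree[i].split_var >= 0) {
        const Node& n = tree[i];
        const double v = x[n.split_var];
        if (std::isnan(v)) {
            i = n.missing;
        } else if (v < n.split_code_pred) {
            i = n.left;
        } else {
            i = n.right;
        }
    }
    return tree[i].split_code_pred;
}

// Assumed gaussian model: the initial value plus the sum over all trees.
double predict(double init_f,
               const std::vector<std::vector<Node>>& trees,
               const std::vector<double>& x) {
    double f = init_f;
    for (const auto& tree : trees) {
        f += eval_tree(tree, x);
    }
    return f;
}
```

For a gaussian model, init_f would come from tg$initF, and the remaining trees can be dumped one at a time with pretty.gbm.tree(tg, i.tree = i). A row with Petal.Width below 0.8 ends in node 1, any other non-missing value ends in node 2, and node 3 is reached only when Petal.Width is NA.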