When plotting the first tree from a regression using create_tree_digraph, the leaf values make no sense to me. For example:
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
import lightgbm as lgb
data = lgb.Dataset(X, label=y)
bst = lgb.train({}, data, num_boost_round=1)
lgb.create_tree_digraph(bst)
Gives the following tree:
Focusing on leaf 3, for example, it seems like these are the fitted values:
bst.predict(X, num_iteration=0)[X[:,5]>7.437]
array([24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238])
But these seem like terrible predictions compared to the obvious and trivial method of taking the mean:
y[X[:,5]>7.437]
array([38.7, 43.8, 50. , 50. , 50. , 50. , 39.8, 50. , 50. , 42.3, 48.5,
50. , 44.8, 50. , 37.6, 46.7, 41.7, 48.3, 42.8, 44. , 50. , 43.1,
48.8, 50. , 43.5, 35.2, 45.4, 46. , 50. , 21.9])
y[X[:,5]>7.437].mean()
45.09666666666667
What am I missing here?
LightGBM's leaf node output values show the prediction from that leaf node, which includes multiplying by the learning rate. The default learning rate is 0.1 (https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning_rate). If you change it to 1.0, you should see that the output value for leaf 3 is 45.097 (exactly the mean of y for all observations that fall into that leaf node).
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
import lightgbm as lgb
data = lgb.Dataset(X, label=y)
bst = lgb.train({"learning_rate": 1.0}, data, num_boost_round=1)
lgb.create_tree_digraph(bst)
Similarly, if you set the learning_rate to something extremely small, you should see that most of the leaf nodes from the first tree have values very close to the global mean of y. The global mean of y (y.mean()) in your example data is 22.532.
bst = lgb.train({"learning_rate": 0.0000000000001}, data, num_boost_round=1)
lgb.create_tree_digraph(bst)
I don't recommend setting learning_rate=1.0 in practice, as it can lead to worse accuracy. For gradient boosting libraries like LightGBM, it's preferred to use a learning rate < 1.0 and a higher num_boost_round (try 100), so that each individual tree only has a limited impact on the final prediction.
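To see why a small learning rate pairs with many boosting rounds, here is a toy sketch (pure Python, not LightGBM's actual algorithm): each "tree" is a single split separating two groups, so its leaf values are just the learning-rate-scaled means of the current residuals. The numbers below are made up for illustration.

```python
# Toy sketch of gradient boosting with shrinkage (squared-error loss).
# Each "tree" is one split separating the two groups; its leaf values
# are the learning-rate-scaled means of the current residuals.
# This is an illustration only, not LightGBM's actual algorithm.

low = [20.0, 22.0, 24.0]       # records falling into one leaf
high = [44.0, 46.0, 48.0]      # records falling into the other leaf
learning_rate = 0.1

all_y = low + high
pred_low = pred_high = sum(all_y) / len(all_y)   # boost from the average (34.0)

for _ in range(100):
    # each leaf adds a shrunken correction toward its own residual mean
    resid_low = sum(low) / len(low) - pred_low
    resid_high = sum(high) / len(high) - pred_high
    pred_low += learning_rate * resid_low
    pred_high += learning_rate * resid_high

print(round(pred_high, 3))   # converges toward the group mean, 46.0
```

After a single round, pred_high is 35.2: better than the global mean of 34.0, but still far from the group mean of 46.0. Only with many rounds does it converge, which is why a small learning rate is combined with a larger num_boost_round.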
If you do that, you'll find that each subsequent tree added to the model contributes a small incremental improvement in accuracy. This is what happened in your original example: for a group of records with local mean 45.097, a global mean of 22.532, and the learning rate set to 0.1, the first tree predicted 24.789. Not a great prediction by itself, but a better prediction for that group than the global mean.
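You can check that arithmetic directly. This sketch assumes LightGBM's default boost_from_average=True for regression, which starts predictions at the global mean; the two means below are the values from your example.

```python
# With boost_from_average (LightGBM's default for regression), the first
# prediction is the global mean plus the learning-rate-scaled residual
# fitted by the first tree's leaf.
global_mean = 22.5328    # y.mean() over the full dataset
leaf_mean = 45.0967      # y[X[:, 5] > 7.437].mean(), i.e. leaf 3's records
learning_rate = 0.1

first_tree_pred = global_mean + learning_rate * (leaf_mean - global_mean)
print(round(first_tree_pred, 3))   # 24.789, matching bst.predict above
```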