When plotting a ctree model from partykit, I understand that it chooses defaults that prevent overfitting with overgrown trees. These defaults sometimes result in an overly simple tree. To use a post-pruning technique, I want to grow an overfitted tree, potentially fully grown, using ctree and then work on the pruning later. I have tried many different things, but my code throws an error.
This Stack Overflow answer on using all variables to build the tree is not what I want. I don't necessarily want all variables; I want the tree to reach maximum depth and be as overgrown as possible.
Basically, how do I make the tree grow as deep as possible?
See code and output below:
library(partykit)
treemodel <- ctree(Species ~ ., iris)
plot(treemodel)
I have read the help pages and documentation for the package but don't see many options for customizing this. The most promising one is the control parameter, but the documentation isn't very detailed. From searching other forums, I gave the following a try:
treemodel <- ctree(Species ~ ., iris, control=mincriterion)
I also tried:
treemodel <- ctree(Species ~ ., iris, control="mincriterion")
But both attempts throw an error. The error:
Error in if (sum(weights) < ctrl$minsplit) return(partynode(as.integer(id))) : argument is of length zero
I am using partykit 1.1-1 and R on macOS.
ctree from partykit accepts a ctree_control object through the control argument, which you can use to control aspects of the tree fit.
Passing control=mincriterion or control="mincriterion" is not correct, hence the error. control expects a list of control parameters, as returned by ctree_control(), not a bare name or a character value.
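As a minimal sketch of the difference, the snippet below builds the control object with ctree_control() and passes it to ctree on the iris data from the question. The value 0.8 is only an illustrative choice, not a recommendation.

```r
library(partykit)

# ctree_control() builds the settings object that `control` expects
ctrl <- ctree_control(mincriterion = 0.8)
is.list(ctrl)   # check that the control object is a list of settings

# Pass the object itself, not a bare name or a string
treemodel <- ctree(Species ~ ., data = iris, control = ctrl)
plot(treemodel)
```

Any setting not named in the ctree_control() call keeps its package default.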
In particular, you want to pass the following into ctree_control:
mincriterion: Acts as a "regulator" for the depth of the tree; smaller values result in larger trees. When mincriterion is 0.8, the p-value must be smaller than 0.2 for a node to split.
minsplit and minbucket: Set these to 0 so that the minimum node-size requirements are always met and never stop the splitting.
From the package's authors themselves:
A split is implemented when the criterion exceeds the value given by mincriterion as specified in ctree_control. For example, when mincriterion = 0.95, the p-value must be smaller than 0.05 in order to split this node. This statistical approach ensures that the right-sized tree is grown without additional (post-)pruning or cross-validation
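To see how mincriterion regulates tree size, a rough sketch is to fit the same model at several thresholds and compare the number of terminal nodes and the depth. The thresholds below are arbitrary illustrative values, and the iris data is taken from the question. width() and depth() are the partykit methods for party objects.

```r
library(partykit)

# Illustrative thresholds only; smaller values make splits easier to accept
thresholds <- c(0.95, 0.5, 0.05)

sizes <- sapply(thresholds, function(mc) {
  fit <- ctree(Species ~ ., data = iris,
               control = ctree_control(mincriterion = mc))
  # width() = number of terminal nodes, depth() = depth of the tree
  c(mincriterion = mc, terminal_nodes = width(fit), depth = depth(fit))
})

print(t(sizes))
```

How much the tree grows depends on the data; if minsplit and minbucket are left at their defaults, they can still stop the splitting even when mincriterion is low.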
So with that, the final code using control=ctree_control():
diab_model <- ctree(diabetes ~ ., diab_train, control = ctree_control(mincriterion=0.005, minsplit=0, minbucket=0))
plot(diab_model)
The first line creates your decision tree by overriding the defaults, and the second line plots the ctree object. You'll get a fully grown tree with maximum depth. Experiment with the values of mincriterion, minsplit, and minbucket. They can also be treated as hyperparameters.
[Figure: output of plot(diab_model)]
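For the iris model from the question, the same idea might look like the sketch below. It keeps mincriterion = 0.005 from the code above but uses minsplit = 2 and minbucket = 1 instead of 0. That is a conservative assumption on my part (the smallest node sizes that are still non-empty), not something the package requires.

```r
library(partykit)

# Default fit, for comparison
default_tree <- ctree(Species ~ ., data = iris)

# Overgrown fit: very permissive split criterion and tiny node-size limits
# (minsplit = 2 and minbucket = 1 are assumed values, see the note above)
overgrown_tree <- ctree(
  Species ~ ., data = iris,
  control = ctree_control(mincriterion = 0.005, minsplit = 2, minbucket = 1)
)

# Compare the number of terminal nodes and the depth of both trees
cat("default:   ", width(default_tree), "leaves, depth", depth(default_tree), "\n")
cat("overgrown: ", width(overgrown_tree), "leaves, depth", depth(overgrown_tree), "\n")

plot(overgrown_tree)
```

The overgrown tree can then serve as the starting point for whatever pruning procedure you apply afterwards.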