I wonder if the partykit::ctree function ignores variables with non-syntactic names, or am I missing something?
Toy example:
myData<-data.frame(
Y = factor(rep(LETTERS[1:3], each=10)),
x1 = 1:30,
x2 = c(1:10,2:11,3:12)
)
Clearly x1 is the best "predictor" of Y:
ctree(Y~., data=myData)
Model formula:
Y ~ x1 + x2
Fitted party:
[1] root
| [2] x1 <= 10: A (n = 10, err = 0,0%)
| [3] x1 > 10
| | [4] x1 <= 20: B (n = 10, err = 0,0%)
| | [5] x1 > 20: C (n = 10, err = 0,0%)
Number of inner nodes: 2
Number of terminal nodes: 3
But when I change its name to a non-syntactic one, it seems to be ignored in the tree construction process:
myData<-data.frame(
Y = factor(rep(LETTERS[1:3], each=10)),
`x 1` = 1:30,
x2 = c(1:10,2:11,3:12),
check.names = FALSE
)
ctree(Y~., data=myData)
Model formula:
Y ~ `x 1` + x2
Fitted party:
[1] root: A (n = 30, err = 66,7%)
Number of inner nodes: 0
Number of terminal nodes: 1
Can you suggest any way to overcome this behaviour ('cos I really-really-really wish to use "x 1" as a name, don't ask why)?
Thanks for pointing this out. This was indeed a bug in partykit::ctree, but it has now been fixed in version 1.2-11 (the current development version on R-Forge).
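To see whether your installation already includes the fix, you could compare the installed version against 1.2-11. This is only a minimal sketch: the variable name fixedIn is made up for illustration, and the check is skipped if partykit is not installed.

```r
# Version that contains the fix, as stated above.
fixedIn <- package_version("1.2-11")

if (requireNamespace("partykit", quietly = TRUE)) {
  installed <- utils::packageVersion("partykit")
  if (installed >= fixedIn) {
    message("partykit ", installed, " should already contain the fix.")
  } else {
    message("partykit ", installed, " predates the fix; update or use a workaround.")
  }
} else {
  message("partykit is not installed.")
}
```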
Furthermore, if you just want the non-syntactic label to be used in printing and plotting, you can use the following quick & dirty workaround. First fit the tree using nice syntactic names:
myData <- data.frame(
Y = factor(rep(LETTERS[1:3], each = 10)),
x1 = 1:30,
x2 = c(1:10, 2:11, 3:12)
)
ct <- ctree(Y ~ ., data = myData)
Then, after fitting the tree, change the name of the variable in the $data stored in the tree:
names(ct$data)[2] <- "x 1"
The new name is then used in printing and plotting:
print(ct)
## Model formula:
## Y ~ x1 + x2
##
## Fitted party:
## [1] root
## | [2] x 1 <= 10: A (n = 10, err = 0.0%)
## | [3] x 1 > 10
## | | [4] x 1 <= 20: B (n = 10, err = 0.0%)
## | | [5] x 1 > 20: C (n = 10, err = 0.0%)
plot(ct)
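If you have several non-syntactic names, the same trick can be applied without hard-coding column positions. The following is only a sketch under a few assumptions: the helper variables origNames, safeNames and safeData are made up for illustration, make.names() is used to derive the temporary syntactic names, and the names in ct$data are assumed to match the column names of the data passed to ctree (as in the example above).

```r
# Original data with a non-syntactic column name.
myData <- data.frame(
  Y = factor(rep(LETTERS[1:3], each = 10)),
  `x 1` = 1:30,
  x2 = c(1:10, 2:11, 3:12),
  check.names = FALSE
)

# Derive temporary syntactic names ("x 1" becomes "x.1").
origNames <- names(myData)
safeNames <- make.names(origNames, unique = TRUE)
safeData <- setNames(myData, safeNames)

if (requireNamespace("partykit", quietly = TRUE)) {
  # Fit on the syntactic names.
  ct <- partykit::ctree(Y ~ ., data = safeData)

  # Restore the original labels by matching names rather than positions.
  idx <- match(names(ct$data), safeNames)
  names(ct$data)[!is.na(idx)] <- origNames[idx[!is.na(idx)]]

  print(ct)
}
```

As in the workaround above, only the labels used for printing and plotting change; the model formula stored in the tree still refers to the syntactic names.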