
Does ctree ignore variables with non-syntactic names?


I wonder whether the partykit::ctree function ignores variables with non-syntactic names, or am I missing something?

Toy example:

library(partykit)

myData <- data.frame(
  Y = factor(rep(LETTERS[1:3], each = 10)),
  x1 = 1:30,
  x2 = c(1:10, 2:11, 3:12)
)

Clearly x1 is the best "predictor" of Y:

ctree(Y~., data=myData)

Model formula:
Y ~ x1 + x2

Fitted party:
[1] root
|   [2] x1 <= 10: A (n = 10, err = 0,0%)
|   [3] x1 > 10
|   |   [4] x1 <= 20: B (n = 10, err = 0,0%)
|   |   [5] x1 > 20: C (n = 10, err = 0,0%)

Number of inner nodes:    2
Number of terminal nodes: 3

But when I change its name to a non-syntactic one, it seems to be ignored when the tree is built:

myData <- data.frame(
  Y = factor(rep(LETTERS[1:3], each = 10)),
  `x 1` = 1:30,
  x2 = c(1:10, 2:11, 3:12),
  check.names = FALSE
)
 
ctree(Y~., data=myData)

Model formula:
Y ~ `x 1` + x2

Fitted party:
[1] root: A (n = 30, err = 66,7%) 

Number of inner nodes:    0
Number of terminal nodes: 1

Can you suggest any way to work around this behaviour (I really, really do want to use "x 1" as a name, don't ask why)?


Solution

  • Thanks for pointing this out. This was indeed a bug in partykit::ctree, but it has now been fixed in version 1.2-11 (the current development version on R-Forge).
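
    To see whether an installed copy already includes the fix, a check along these lines should work. This is only a sketch: has_fix is a name chosen for illustration, and the commented install line assumes the usual R-Forge repository URL.

    ```r
    # TRUE only if partykit is installed and at least version 1.2-11
    has_fix <- requireNamespace("partykit", quietly = TRUE) &&
      packageVersion("partykit") >= "1.2-11"
    print(has_fix)

    # If FALSE, the development version could be installed with (assumed URL):
    # install.packages("partykit", repos = "http://R-Forge.R-project.org")
    ```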

    Furthermore, if you just want the non-syntactic label to be used in printing and plotting, you can use the following quick & dirty workaround. First fit the tree to data with syntactic names.

    myData <- data.frame(
      Y = factor(rep(LETTERS[1:3], each = 10)),
      x1 = 1:30,
      x2 = c(1:10, 2:11, 3:12)
    )
    ct <- ctree(Y ~ ., data = myData)
    

    Then, after fitting the tree, change the name of the variable in the $data stored in the tree.

    names(ct$data)[2] <- "x 1"
    

    This is then used in printing and plotting.

    print(ct)
    ## Model formula:
    ## Y ~ x1 + x2
    ## 
    ## Fitted party:
    ## [1] root
    ## |   [2] x 1 <= 10: A (n = 10, err = 0.0%)
    ## |   [3] x 1 > 10
    ## |   |   [4] x 1 <= 20: B (n = 10, err = 0.0%)
    ## |   |   [5] x 1 > 20: C (n = 10, err = 0.0%)
    plot(ct)
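
    The same workaround can be written so that it does not depend on the column position. The following is a sketch rather than an official recipe: orig_names, safe and idx are illustrative names, and make.names() is used here to derive the syntactic names. Only the labels stored in ct$data change; the model formula keeps the syntactic names, so predictions on new data will most likely still expect those.

    ```r
    library(partykit)

    myData <- data.frame(
      Y = factor(rep(LETTERS[1:3], each = 10)),
      `x 1` = 1:30,
      x2 = c(1:10, 2:11, 3:12),
      check.names = FALSE
    )

    # Fit on a copy whose names have been made syntactic ("x 1" becomes "x.1")
    orig_names <- names(myData)
    safe <- myData
    names(safe) <- make.names(orig_names, unique = TRUE)
    ct <- ctree(Y ~ ., data = safe)

    # Restore the original labels by name, skipping any column of ct$data
    # that does not come from the input data
    idx <- match(names(ct$data), names(safe))
    names(ct$data)[!is.na(idx)] <- orig_names[idx[!is.na(idx)]]

    print(ct)
    ```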
    
