I'm trying to analyse some tennis data and I'm hitting a problem with the code:
library(rpart)
library(rpart.plot)
library(ggplot2)
library(wesanderson)
train=read.csv("/ags_test.csv",header=TRUE, na.strings=c("","NA"))
Please note this is the complete data set, not one I've cobbled together in the code. All the gaps are read in as NA.
control=rpart.control(cp=0.007)
train$res=as.factor(train$res)
tree=rpart(res~Tournament+Surface+Round+J1Rank+J2Rank+J1Pts+J2Pts+DRank+DPts,data=train,control=control,parms=list(split="gini"))
All good until the last line when it kicks out:
Error in cbind(yval2, yprob, nodeprob) :
number of rows of matrices must match (see arg 2)
The data set isn't massive: it comprises 17 columns and 50 rows.
Any ideas would be much appreciated.
Turns out that the problem is that the data is too certain, i.e. the pros are all in the same columns and the cons follow a similar structure.
Therefore, there's little for the decision tree to work with.
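
For anyone who hits the same error, one rough way to check whether the data is "too certain" is to look at the class balance of the response and cross-tabulate a predictor against it. The sketch below uses a made-up toy data frame (the column names are borrowed from the question, the values are invented) and base R only. It is a diagnostic under those assumptions, not a fix for the error itself.

```r
# Toy stand-in for the real file (assumption: invented values, same column names)
toy <- data.frame(
  res     = factor(c("W", "W", "L", "L", "W", "L")),
  Surface = c("Clay", "Clay", "Grass", "Grass", "Clay", "Grass"),
  J1Rank  = c(5, 12, NA, 40, 8, 33),
  stringsAsFactors = TRUE
)

# Class balance of the response, counting missing values if there are any
class_counts <- table(toy$res, useNA = "ifany")

# Distinct non-missing values and NA count for every predictor
predictors <- setdiff(names(toy), "res")
col_summary <- data.frame(
  column    = predictors,
  distinct  = sapply(predictors, function(p) length(unique(na.omit(toy[[p]])))),
  n_missing = sapply(predictors, function(p) sum(is.na(toy[[p]]))),
  row.names = NULL
)

# Cross-tabulate a categorical predictor against the response. If each row has
# only one non-zero cell, that predictor separates the classes perfectly.
surface_by_res <- table(toy$Surface, toy$res)

print(class_counts)
print(col_summary)
print(surface_by_res)
```

In the toy data every Clay match is a "W" and every Grass match is an "L", so the cross-table has a zero in each row. Seeing the same pattern across most columns of the real data would be consistent with the explanation above.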