r microsoft-r revoscaler

rxCrossTabs: transformation doesn't work


I am doing this exercise and I can't find the error.

The data is a subsample of the New York taxi dataset (mht_lab2.zip on GitHub).

In this exercise I am supposed to tabulate short- and long-duration taxi trips against short- and long-distance taxi trips without using rxDataStep. So I tried this:

mht_xdf <- RxXdfData('mht_lab2.xdf') # make sure the xdf-file is in your directory
rxCrossTabs(~dist_rule:dur_rule, mht_xdf,
  transformFunc = function(datalist){
    datalist$dist_rule = as.factor(ifelse(datalist$trip_distance>5,'long','short'), levels=c('short','long'))
    datalist$dur_rule = as.factor(ifelse(datalist$trip_duration>10,'long','short'), levels=c('short','long'))
    return(datalist)
  },
  transformVars = c('trip_distance','trip_duration')
)

However, it returns an error:

Error in doTryCatch(return(expr), name, parentenv, handler) : 
  ERROR: The sample data set for the analysis has no variables.

I also tried the transformObjects argument, the transforms argument, and combinations of the two. Every attempt returned an error.


Solution

  • The above error message is misleading. The problem is that the as.factor() function does not have a 'levels' argument - you need to use the factor() function.

    The following will work:

     myTransform <- function(dataList)
     {
        dataList$dist_rule <- factor(ifelse(dataList$trip_distance>5, 'long', 'short'), 
              levels = c('short', 'long')) 
        dataList$dur_rule <- factor(ifelse(dataList$trip_duration>10, 'long', 'short'),
              levels = c('short', 'long')) 
        dataList
     }
    
     rxCrossTabs(~dist_rule:dur_rule, data = mht_xdf, transformFunc = myTransform, 
            transformVars = c("trip_distance", "trip_duration"))
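
The difference between as.factor() and factor() can be checked in plain R, without RevoScaleR or the xdf file. This is only a minimal sketch: the trip_distance values below are made up for illustration and are not taken from the taxi data.

```r
# Toy vector standing in for trip_distance (hypothetical values, not from the dataset)
trip_distance <- c(1.2, 7.5, 3.0, 12.4)
labels <- ifelse(trip_distance > 5, "long", "short")

# as.factor() only takes x, so passing 'levels' raises an error;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(
  as.factor(labels, levels = c("short", "long")),
  error = function(e) conditionMessage(e)
)
print(res)

# factor() accepts 'levels' and keeps them in the order given
dist_rule <- factor(labels, levels = c("short", "long"))
print(levels(dist_rule))
print(table(dist_rule))
```

Running the transform body on a small vector like this is a quick way to see the real error before handing the function to rxCrossTabs.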