
Changing the column positions of an xdf dataset


I would like to know whether there is a way to reorder the columns of an xdf dataset. For example, given an xdf dataset with columns [,a],[,c],[,b], I would like to reorder them to [,a],[,b],[,c] without creating a data frame, reordering its columns, and converting it back to an xdf file with rxImport or rxDataFrameToXdf. The xdf file potentially has hundreds of millions of rows, and I don't want to load the whole dataset into memory.

One potential solution I see is the rxSetVarInfoXdf function, whose variable information includes the column position.

Something like this, to swap the positions of columns 2 and 3:

    varInfo <- list(list(position = 2, position = 3), list(position = 3, position = 2))

But this does not work, because position only identifies which column an entry refers to; it cannot be used to move that column.


Solution

  • You can use the varsToKeep argument of rxDataStep to reorder your columns, which keeps everything in XDF. The reordered data is written to a new XDF file given by outFile. I'm not totally certain about this, but I believe the work happens in C++, so it should be relatively quick.

    # First, set up pointers to the source XDF file
    sourcePath <- file.path(rxGetOption("sampleDataDir"), "mortDefaultSmall.xdf")
    
    # Look at the top several rows
    rxDataStep(sourcePath, numRows = 10)
    
    # Create a new path for the reordered dataset
    reorderPath <- paste0(tempfile(), ".xdf")
    
    
    # If you've got a lot of columns and only want to move one, you probably
    # don't want to type them all out. Try this instead:
    varNames <- names(rxGetVarInfo(sourcePath))
    varToMove <- "creditScore"
    otherVars <- varNames[!varNames %in% varToMove]
    
    
    # Reorder them using varsToKeep - just put varToMove at the end
    rxDataStep(inData = sourcePath,
               outFile = reorderPath,
               varsToKeep = c(otherVars, varToMove),
               overwrite = TRUE
    )
    
    
    # Check that the order has changed
    rxDataStep(reorderPath, numRows = 10)
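
The example above moves a column to the end. To place a column at an arbitrary position, as in the a, c, b case from the question, the name vector can be built with base R before it is passed to varsToKeep. The following is only a sketch: moveVar is a hypothetical helper, not part of RevoScaleR, and it assumes (as the answer above does) that the output column order follows the order of varsToKeep.

```r
# Hypothetical helper (not part of RevoScaleR): returns varNames with
# varToMove placed at the 1-based index newPosition.
moveVar <- function(varNames, varToMove, newPosition) {
  stopifnot(varToMove %in% varNames,
            newPosition >= 1, newPosition <= length(varNames))
  otherVars <- varNames[!varNames %in% varToMove]
  # append() inserts after the given index; after = 0 puts the value first
  append(otherVars, varToMove, after = newPosition - 1)
}

# The column order from the question: a, c, b
varNames <- c("a", "c", "b")
newOrder <- moveVar(varNames, "b", 2)
print(newOrder)

# The result would then be passed on as, for example:
# rxDataStep(inData = sourcePath, outFile = reorderPath,
#            varsToKeep = newOrder, overwrite = TRUE)
```

With a real file, varNames would come from names(rxGetVarInfo(sourcePath)) as shown earlier, so only the column being moved has to be typed out.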