I would like to know if there is a way to reorder the column positions of an xdf dataset. For example, if I have an xdf dataset with columns [,a],[,c],[,b], I would like to reorder the columns to [,a],[,b],[,c] without having to create a data frame, reorder the columns, and use rxImport or rxDataFrameToXdf to convert it back to an xdf file (the xdf file potentially has hundreds of millions of rows, and I don't want to load the dataset into memory).
One potential solution I see is using the rxSetVarInfoXdf function, which has information on the column position.
Something like this, to swap the positions of columns 2 and 3:
varInfo <- list(list(position = 2, position = 3), list(position = 3, position = 2))
But this will not work, because position is a value you use to reference a column, not one you can set to move it.
You can use the varsToKeep argument of rxDataStep to reorder your columns, which keeps everything in XDF. I'm not totally certain about this, but I believe the work happens in C++, so it should be relatively quick.
# First, set up pointers to the source XDF file
sourcePath <- file.path(rxGetOption("sampleDataDir"), "mortDefaultSmall.xdf")
# Look at the top several rows
rxDataStep(sourcePath, numRows = 10)
# Create a new path for the reordered dataset
reorderPath <- paste0(tempfile(), ".xdf")
# If you've got a lot of columns and only want to move one, you probably
# don't want to type them all out. Try this instead:
varNames <- names(rxGetVarInfo(sourcePath))
varToMove <- "creditScore"
otherVars <- varNames[!varNames %in% varToMove]
# Reorder them using varsToKeep - just put varToMove at the end
rxDataStep(inData = sourcePath,
           outFile = reorderPath,
           varsToKeep = c(otherVars, varToMove),
           overwrite = TRUE)
# Check that the order has changed
rxDataStep(reorderPath, numRows = 10)
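If you want the column at a specific position rather than at the end, the order vector can be built in base R before it is passed to varsToKeep. The sketch below is only an illustration: moveVar is a hypothetical helper, not a RevoScaleR function, and it assumes (as above) that rxDataStep writes columns in the order given in varsToKeep. It uses the a, c, b example from the question.

```r
# Hypothetical helper (not part of RevoScaleR): return the column names with
# `varToMove` placed at position `newPos`, leaving the other columns in order.
moveVar <- function(varNames, varToMove, newPos) {
  stopifnot(varToMove %in% varNames,
            newPos >= 1, newPos <= length(varNames))
  otherVars <- varNames[!varNames %in% varToMove]
  # append() inserts after index `after`; after = 0 puts the value first
  append(otherVars, varToMove, after = newPos - 1)
}

# The question's example: columns stored as a, c, b; move "b" to position 2
newOrder <- moveVar(c("a", "c", "b"), "b", 2)
print(newOrder)

# Assumed usage with the variables defined above (requires RevoScaleR):
# rxDataStep(inData = sourcePath,
#            outFile = reorderPath,
#            varsToKeep = moveVar(varNames, "creditScore", 1),
#            overwrite = TRUE)
```

Only the vector of names is manipulated in memory here; the data itself would still go from one XDF file to another through rxDataStep.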