In SAS its possible to go through a dataset and used lagged values.
The way I would do it is to use a function that does a "lag", but this presumably would produce a wrong value at the beginning of a chunk. For example if a chunk starts at row 200,000, then it will assume an NA for a lagged value that should come instead from row 199,999.
Is there a solution for this?
Here's another approach for lagging: self-merging using a shifted date. This is dramatically simpler to code and can lag several variables at once. The downsides are that it takes 2-3 times longer to run than my answer using transformFunc
, and requires a second copy of the dataset.
# Get a sample dataset
sourcePath <- file.path(rxGetOption("sampleDataDir"), "DJIAdaily.xdf")
# Set up paths for two copies of it
xdfPath <- tempfile(fileext = ".xdf")
xdfPathShifted <- tempfile(fileext = ".xdf")
# Convert "Date" to be Date-classed
rxDataStep(inData = sourcePath,
outFile = xdfPath,
transforms = list(Date = as.Date(Date)),
overwrite = TRUE
)
# Then make the second copy, but shift all the dates up
# one (or however much you want to lag)
# Use varsToKeep to subset to just the date and
# the variables you want to lag
rxDataStep(inData = xdfPath,
outFile = xdfPathShifted,
varsToKeep = c("Date", "Open", "Close"),
transforms = list(Date = as.Date(Date) + 1),
overwrite = TRUE
)
# Create an output XDF (or just overwrite xdfPath)
xdfLagged2 <- tempfile(fileext = ".xdf")
# Use that incremented date to merge variables back on.
# duplicateVarExt will automatically tag variables from the
# second dataset as "Lagged".
# Note that there's no need to sort manually in this one -
# rxMerge does it automatically.
rxMerge(inData1 = xdfPath,
inData2 = xdfPathShifted,
outFile = xdfLagged2,
matchVars = "Date",
type = "left",
duplicateVarExt = c("", "Lagged")
)