I often run into the same issue when building quantitative trading models: how to handle NA values. The example below concerns a stock with EOD data since 1997-01-01, stored in an xts object with four columns named "High", "Low", "Close", "Volume". The data is from Bloomberg. When I try to calculate the rolling 20-day average volume, I get this error:
SMA(stock$Volume, 20)
Error in runSum(x, n) : Series contains non-leading NAs
I quickly located the problem (which I knew was NA values, since I have run into this a thousand times) and found the two days where volume data is missing. I have reproduced those days' data below. As a quick observation, the SMA, EMA etc. functions in TTR cannot handle NAs that are both preceded and followed by numbers.
stock <- as.xts(matrix(c(94.46,92.377,94.204,NA,71.501,70.457,70.979,NA), 2, 4,
byrow = TRUE, dimnames = list(NULL, c("High","Low","Close","Volume"))),
as.Date(c("1998-07-07", "1999-02-22")))
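For locating such rows, here is a minimal sketch on a made-up series (vol, first_obs and non_leading are hypothetical names, and the data is invented). It separates leading NAs, which the wording of the error message implies are tolerated, from the non-leading ones that trigger it.

```r
library(xts)

# Made-up volume series: one leading NA and one non-leading NA
dates <- as.Date("2020-01-01") + 0:5
vol <- xts(c(NA, 10, 20, NA, 40, 50), order.by = dates)

v <- as.numeric(vol)
first_obs <- which(!is.na(v))[1]          # position of the first real value
na_rows <- which(is.na(v))                # positions of all NAs
non_leading <- na_rows[na_rows > first_obs]

# Dates of the NAs that sit after the first observation
index(vol)[non_leading]
```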
What is the best way to handle this issue? Is it to store stock$Volume as a temporary object with the NA values removed, calculate the rolling volume, and then merge it back in with merge.xts while adding fill = NA so the NA values are inserted again? But is that correct, since you then take the last 20 trading days with data and not just the 19 observations available in the 20-day window?
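The two readings of the window can be put side by side. This is only a minimal sketch on simulated data, not a recommendation. The names vol, sma_obs, sma_cal and out are hypothetical, and the window is shortened to 3 so the difference is visible. The first variant drops the NA rows, so each window covers the last n days that have data. The second keeps every row and averages whatever is available inside each n-row window.

```r
library(xts)   # also attaches zoo, which provides rollapplyr()
library(TTR)

# Simulated volume series with two non-leading NAs (invented data)
dates <- as.Date("2020-01-01") + 0:9
vol <- xts(c(10, 20, NA, 40, 50, 60, NA, 80, 90, 100), order.by = dates)
colnames(vol) <- "Volume"
n <- 3  # short window for illustration; the question uses 20

# Variant 1: drop NA rows and average the last n observed values
sma_obs <- SMA(na.omit(vol), n)
colnames(sma_obs) <- "sma_obs"

# Variant 2: keep every row and average the non-NA values
# inside each n-row window
sma_cal <- xts(
  rollapplyr(as.numeric(vol), n, mean, na.rm = TRUE, fill = NA),
  order.by = index(vol)
)
colnames(sma_cal) <- "sma_cal"

# Merge back on the full index; days missing from sma_obs come back as NA
out <- merge(vol, sma_obs, sma_cal, fill = NA)
out
```

Which variant is "correct" depends on what the indicator is meant to measure: the first answers "average volume over the last n days that traded", the second "average volume over the last n calendar rows, ignoring gaps".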
It is my hope that some sort of "best practice" can come out of this post, as I assume other R users in finance run into this issue too, whether they get their data from Bloomberg, Yahoo Finance or another source.
I don't know about "best practice" but one alternative might be what are called "inhomogeneous time series operators", as presented in Operators on Inhomogeneous Time Series.
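The general idea can be sketched as an exponential moving average whose decay depends on the elapsed time between observations rather than on the row count. Below is a minimal sketch in base R, assuming the NA days have already been dropped. The names ema_irregular and tau are hypothetical, and the update rule (weighting the newest observation by 1 - exp(-dt / tau)) is one simple choice, not necessarily the scheme used in the linked paper.

```r
# EMA with a time-dependent decay for irregularly spaced observations.
# times:  Date vector of observation dates (NA days already removed)
# values: numeric vector of observations, same length as times
# tau:    time constant in days (hypothetical parameter name)
ema_irregular <- function(times, values, tau) {
  stopifnot(length(times) == length(values), length(values) >= 1, tau > 0)
  out <- numeric(length(values))
  out[1] <- values[1]                      # seed with the first observation
  for (i in seq_along(values)[-1]) {
    dt <- as.numeric(difftime(times[i], times[i - 1], units = "days"))
    mu <- exp(-dt / tau)                   # weight kept by the previous EMA
    out[i] <- mu * out[i - 1] + (1 - mu) * values[i]
  }
  out
}

# Usage on invented data: the four-day gap before the last point
# lets the old EMA decay more than the one-day gap does
dates <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-06"))
vol <- c(100, 200, 300)
ema <- ema_irregular(dates, vol, tau = 5)
```

With equally spaced data this reduces to an ordinary EMA with a constant smoothing factor, so the gap handling only matters where observations are missing.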
This type of question is a good fit for the Quantitative Finance stack exchange site (e.g. see How to update an exponential moving average with missing values?).