I have a OHLC time series for some stock prices:
library(quantmod)
library(mnormt)
library(MASS)
download.file("http://dl.dropbox.com/u/25747565/941.RData", destfile="test.RData")
load("test.RData")
chartSeries(p)
As you can see from the plot, there are two downward spikes, most likely due to some sort of data error. I want to use a multivariate Gaussian to detect the rows which contain these two offending data points.
> x[122,]
941.Open 941.High 941.Low 941.Close
85.60 86.65 5.36 86.20
> x[136,]
941.Open 941.High 941.Low 941.Close
84.15 85.60 54.20 85.45
Here is my code to fit the distribution and calculate the probabilities of each data point:
x <- coredata(p[,1:4])
mu <- apply(x, 2, mean)
sigma <- cov.rob(x)$cov
prob <- apply(x, 1, dmnorm, mean = mu, varcov = sigma, log = TRUE)
However, this code throws up the following error:
Error in pd.solve(varcov, log.det = TRUE) : x appears to be not symmetric
This error did not come up when I used the standard cov()
function to calculate the covariance matrix, but only with the Robust covariance matrix function. The covariance matrix itself looks quite benign to me so I'm not sure what is going on. The reasons I want to use a robust estimation of the covariance matrix is because the standard covariance matrix was giving a few false positives as I was including anomalies in my training set.
Can someone tell me:
how to fix this
if this approach even makes sense
You are going right. I don't know the how to fix the error but I can suggest you how to find the indices of novel points.
Calculate mean and covariance in MATLAB as
mu=mean(traindata);
sigma=cov(traindata);
set epsilon as per your requirement
count=0;
Actual=[];
for j=1:size(cv,1)
p=mvnpdf(cv(j,:),mu,sigma);
if p<eplison
count=count+1;
Actual=[Actual;j];
fprintf('j=%d \t p=%e\n',j,p);
end
end
tune threshold if results are not satisfactory