I am trying to use universal kriging on some irregular data. I keep getting an error:
Error in chol.default(A) : the leading minor of order 2 is not positive definite
When I run the krigST command. Here is a sample of my data:
long | lat | date_from | date_to | var1 | var2 | var3 | var4 | resp | tid | spid |
---|---|---|---|---|---|---|---|---|---|---|
144.12 | -33.73 | 2019-01-03 | 2019-02-02 | 700 | 103 | 48.6 | 4.93 | -4.04 | 1 | 1 |
144.46 | -32.59 | 2019-01-03 | 2019-02-02 | 698 | 89 | 65.8 | 16.69 | -2.93 | 1 | 2 |
144.98 | -31.05 | 2019-01-03 | 2019-02-02 | 1947 | 145 | 145.4 | 8.22 | -7.05 | 1 | 3 |
144.46 | -32.59 | 2019-01-07 | 2019-02-06 | 686 | 95 | 61.4 | 7.03 | -3.94 | 2 | 2 |
145.44 | -33.25 | 2019-01-07 | 2019-02-06 | 637 | 151 | 25 | 7.76 | -5.09 | 2 | 4 |
146.15 | -34.01 | 2019-01-07 | 2019-02-06 | 849 | 75 | 11 | 9.86 | -5.46 | 2 | 6 |
145.8 | -33.45 | 2019-02-03 | 2019-03-02 | 704 | 90 | 14.2 | 10.91 | -5.28 | 3 | 5 |
146.15 | -34.01 | 2019-02-03 | 2019-03-02 | 15 | 1 | 12.9 | 8.4 | -3.41 | 3 | 6 |
144.98 | -31.05 | 2019-03-13 | 2019-04-12 | 975 | 153 | 87.3 | 13.09 | -6.1 | 4 | 3 |
145.44 | -33.25 | 2019-03-13 | 2019-04-12 | 1025 | 124 | 33.4 | 22.13 | -6.89 | 4 | 4 |
144.46 | -32.59 | 2019-04-22 | 2019-05-21 | 264 | 192 | 20 | 5.52 | -4.28 | 5 | 2 |
145.44 | -33.25 | 2019-04-22 | 2019-05-21 | 530 | 136 | 17.2 | 9.27 | -5.48 | 5 | 4 |
145.8 | -33.45 | 2019-04-22 | 2019-05-21 | 790 | 87 | 16.6 | 15.63 | -6.57 | 5 | 5 |
145.8 | -33.45 | 2019-05-03 | 2019-06-02 | 1074 | 67 | 19.7 | 9.35 | -6.94 | 6 | 5 |
146.15 | -34.01 | 2019-05-03 | 2019-06-02 | 140 | 7 | 49.1 | 13.98 | -3.12 | 6 | 6 |
144.98 | -31.05 | 2019-06-02 | 2019-07-01 | 1104 | 126 | 21.2 | 17.88 | -6.96 | 7 | 3 |
144.46 | -32.59 | 2019-06-03 | 2019-07-02 | 351 | 174 | 61 | 7.45 | -5.14 | 8 | 2 |
145.44 | -33.25 | 2019-06-03 | 2019-07-02 | 573 | 122 | 1.4 | 13.98 | -6.42 | 8 | 4 |
146.15 | -34.01 | 2019-06-03 | 2019-07-02 | 651 | 53 | 12.4 | 9.06 | -6.21 | 8 | 6 |
145.8 | -33.45 | 2019-07-15 | 2019-08-14 | 1058 | 75 | 11.8 | 12.73 | -6.77 | 9 | 5 |
This is the code I'm using:
dat<- read.csv("mydat.csv")
#load libraries
library(gstat)
library(sp)
library(spacetime)
#Format lat and long & remove the duplicates
COOR=unique(cbind(dat$long,dat$lat))
#create SpatialPoints object from coordinates CRS stands for Coordinate reference system
COOR.formatted=SpatialPoints(COOR,proj4string=CRS("+proj=longlat +datum=WGS84"))
#create index for STSDF object
(index <- matrix(cbind(dat$spid, dat$tid), nrow= length(dat$spid), ncol=2, byrow = FALSE ))
dataframe<-data.frame(var1 = dat$var1,var2=dat$var2,
var3 = dat$var3,var4 = dat$var4, resp = dat$resp,
lat=dat$lat, lat.sq=dat$lat^2, ID=paste("ID",1:nrow(dat)))
DATE.formatted <- unique(as.Date(dat$date_from))
DATE.formatted.end <- as.POSIXct(unique(as.Date(dat$date_to)))
datST = STSDF(COOR.formatted, DATE.formatted, dataframe,
index, endTime=DATE.formatted.end )
summary(datST)
#################################create some theoretical models###################
var.uk.o=variogramST(resp~ var1+var2+var3+var4,
datST,tlags=0:7,
na.omit=TRUE,)
plot(var.uk.o)
type <-c("Exp", "Sph")
psill=2
spatialVgm = vgm(psill=psill,type[1], range=120, nugget=0)
temporalVgm=vgm(psill=psill,type[1], range=150, nugget=0)
jointVgm = vgm(psill=psill,type[2], range=120, nugget=0)
fit.ssum.o = vgmST("sumMetric", space = spatialVgm,
time = temporalVgm,
joint = jointVgm,
nugget = 0,
stAni = 25)
fit.ssum.o=fit.StVariogram(var.uk.o,
fit.ssum.o ,fit.method = 8)
#this is where it goes wrong.
pred.a <- krigeST(resp~1,
data=datST, newdata=datST,
modelList=fit.ssum.o)
pred.b <- krigeST(resp~var1+var2+var3+var4,
data=datST, newdata=datST,
modelList=fit.ssum.o)
#I tried increasing the value of the response variable but I still get the error.
dat$resp= dat$resp+10000
I tried to increase the response variable to move it away from zero, because my data is very noisy, but it doesn't work (I've tried small increments 10, 20, 50....1000, 10000 nothing works. I added the optional endTime option in the STSDF object but that didn't work either.
I'm able to answer this question at last. It turns out you also get this error if there are 'duplicate' entries in your data. My data was too noisy so I grouped the dates so they all started on the 1st of each month. This smoothed the data, however I was still getting an error because my smoothing had resulted in some peusdo-duplicate data i.e.
144.46 -32.59 2019-01-03 2019-02-02 698 89 65.8 16.69 -2.93 1 2
144.46 -32.59 2019-01-07 2019-02-06 686 95 61.4 7.03 -3.94 2 2
became
144.46 -32.59 2019-01-01 2019-02-02 698 89 65.8 16.69 -2.93 1 2
144.46 -32.59 2019-01-01 2019-02-06 686 95 61.4 7.03 -3.94 1 2
So I had to calculate the mean values of my response and predictor variables and now the code works correctly.
This is a fine fix for this data, as I want to interpolate monthly values for several years. This is the little bit of code that fixed it
p<- length(colnames(data))
avg=aggregate(data[,5:p],by=list(data$long, data$lat, data$Date_from),FUN=mean,simplify=TRUE,na.rm=TRUE)
The date_to data, was no longer relevant as I fixed that when I grouped the data into monthly bags.