rspatialr-spkriginggstat

Different outcomes from autokriging and manual kriging


Can someone help me understand why I am getting such different results from auto- and manual kriging? I see the two algorithms are using different variogram models, but is that alone the reason for all the discrepancy? I am also uncomfortable with the fact that autokrige predicts significantly higher values at the locations where there is no data, e.g. bottom left and bottom right of the grid. Does it have to do with transforming the (log-transformed) kriged output back via exponentiation? Moreover, both methods predict much lower values than the data. It seems I am doing something wrong with either or both algorithm(s), any help is greatly appreciated.

EDIT: GitHub link to the data used here (I used this suggestion for sharing csv file).

library(sp)
library(gstat)
library(automap)

# get data
df <- read.csv("data.csv")
---------------------------
> head(df)
      long     lat n
1 -76.7116 39.2992 1
2 -76.7113 39.3553 1
3 -76.7113 39.3607 1
4 -76.7112 39.3126 2
5 -76.7110 39.2973 1
6 -76.7110 39.3364 1
---------------------------

# generate regular grid
coordinates(df) <- ~ long + lat
grd <- spsample(df, type = "regular", n = 10000)
colnames(grd@coords) <- c("long", "lat")        # rename coords as in data
gridded(grd) <- TRUE

# manual kriging
form <- as.formula("log(n) ~ long + lat")       # universal kriging formula
v <- variogram(object = form, locations = df)
vmfit <- fit.variogram(v,
                     model = vgm(model = c("Sph", "Exp", "Mat", "Gau", "Ste")))
-----------------------------------
> vmfit
  model     psill       range kappa
1   Ste 0.2886451 0.001786237   0.5
-----------------------------------
krg <- krige(formula = form, locations = df, newdata = grd, model = vmfit)
krg_df <- data.frame(krg@coords,                # extract kriged data
                     pred = exp(krg@data$var1.pred))

# auto kriging
krg2 <- autoKrige(formula = form, input_data = df, new_data = grd)
krg2_df <- data.frame(krg2$krige_output@coords,
                      pred = exp(krg2$krige_output@data$var1.pred))
-----------------------------------
> krg2$var_model
  model      psill      range
1   Nug 0.26473893 0.00000000
2   Sph 0.02574092 0.01657061
-----------------------------------

enter image description here


Solution

  • Robert's answer above gave me the idea of trying the formula n ~ long + lat (instead of log(n) ~ long + lat I used before). Now the two methods seem to behave consistently with reasonable difference. I wonder if this is because of the skewness of my data, 70% of which has n = 1. Using log(n) in the earlier formula left only 30% non-zero values, leading to a sparse field with very little spatial correlation for krige to work with.

    meuse data in the example above does not have this problem: log(zinc) > 0 for all zinc values.

    enter image description here