I am attempting to compute variograms in R with the gstat package of biomass data across management areas. The biomass data is a raster dataset with a 3.5 ft resolution or 1.0668m. The size of the spatialpointsDataFrame I am passing to the variogram function is 18.6 Mb (814223 elements). (I have also tried the spatialpixelsDataFrame, but it does not like the 1.0668m pixel size). When I run the code:
v = variogram(ras.grid1@data[[1]]~1, data = ras.grid1)
and look at output "v", I get distance values that are much larger than the management area (and much larger than 1/3 of the diagonal length).
When I run the variogram function on smaller management units (40 ha) it gives me results that I would expect (this is using a SpatialPointsDataFrame with the size of 7.9 Mb and 344259 elements).
If I hard code the cutoff to be smaller, with the initial larger raster dataset to 200m, it again provides distance values I expect. If I try upping the distance let's say 600m again it provides distance values much larger than the 600m cutoff specified. 300m also provides unexpected results. For example:
####variogram computation with 200m cutoff....It works
v = variogram(ras.grid1@data[[1]]~1, data = ras.grid1, cutoff=200)
v
np dist gamma dir.hor dir.ver id
1 195954282 8.874169 4990.504 0 0 var1
2 572500880 20.621626 5627.534 0 0 var1
3 958185761 33.701344 5996.423 0 0 var1
4 1288501796 46.920392 6264.396 0 0 var1
5 1652274803 60.198360 6472.187 0 0 var1
6 1947750363 73.502011 6642.960 0 0 var1
7 2282469596 86.807781 6802.124 0 0 var1
8 2551355646 100.131946 6942.277 0 0 var1
9 2849678492 113.441335 7049.838 0 0 var1
10 3093057361 126.751400 7149.102 0 0 var1
11 3375989515 140.081110 7240.848 0 0 var1
12 3585116223 153.418095 7322.990 0 0 var1
13 3821495516 166.721460 7394.616 0 0 var1
14 4036375072 180.053643 7443.040 0 0 var1
15 4235205167 193.389119 7476.061 0 0 var1
####variogram computation with 600m cutoff....It returns unexpected
####distance values
v2 = variogram(ras.grid1@data[[1]]~1, data = ras.grid1, cutoff=600)
v2
np dist gamma dir.hor dir.ver id
1 1726640923 26.54691 5759.951 0 0 var1
2 593559666 510.62232 53413.914 0 0 var1
3 3388536438 229.26702 15737.659 0 0 var1
4 1464228507 966.36789 49726.788 0 0 var1
5 3503141163 623.13559 25680.965 0 0 var1
6 878031648 3454.21122 117680.266 0 0 var1
7 2233138601 1761.91799 50996.719 0 0 var1
8 3266098834 1484.40162 37369.451 0 0 var1
9 4056578316 1420.49358 31556.527 0 0 var1
10 254561085 26030.66780 517601.669 0 0 var1
11 562144107 13256.59985 239163.649 0 0 var1
12 557621435 14631.84504 243476.857 0 0 var1
13 385648032 22771.12890 352898.971 0 0 var1
14 4285655256 2163.11091 31213.201 0 0 var1
15 3744542323 2575.19496 34709.529 0 0 var1
Also if I scale the data up to 3m I again get the expected distance values.
I am not sure if the large size of raster dataset is causing the issue and what I am trying to do is not possible, or if I doing something wrong or if there is another way?
Thank you for the help and interest.
After exploring this in more detail, it does seem to be the size of the SpatialPointsDataFrame causing the issue. On my machine keeping the size under 10 Mb seemed to do the trick. To reduce the size of the SpatialPointsDataFrame I sampled the original raster using:
ras.grid<-ras.grid[sample(1:length(ras.grid), 350000),]