[SOLVED] Weird predictions from well fitted GAM model

I am modeling biomass (AGB) from Sentinel-2 vegetation indices, calibrated with field measurements. After feature selection, the following bands and indices were retained: GNDVI, B2, LAI, and STVI2. I succeeded in fitting a good GAM using the Gamma(link='inverse') family, because the AGB data are right-skewed and all values are positive.

library(mgcv) #for gam modeling
library(Metrics) #for prediction validation
set.seed(2)  # For reproducibility
# split into train and test datasets
train_ind <- sample(1:nrow(data_gams), 0.9 * nrow(data_gams))
gams_train <- data_gams[train_ind, ]
gams_test<- data_gams[-train_ind, ]
# model
model<- gam(AGB~s(GNDVI)+s(B2)+s(LAI)+s(STVI2), data=gams_train, family=Gamma("inverse"))
summary(model)
gam.check(model)
plot.gam(model, pages=1)
# predict & validate
pred<- predict.gam(model, newdata = gams_test)
rmse <- rmse(pred, gams_test$AGB)
rmse
mae<-  mae(pred, gams_test$AGB)
mae
plot(pred, gams_test$AGB)

The R² and adjusted R² are good (~70%) and the plots don't look overfitted. The fitted partial curves and fitted values look reasonable. However, the RMSE and MAE values are very high. Predicted AGB values are very low (0.001-0.05, and sometimes negative!) while all measured AGB values are 18-140. The dataset is quite small, with only 49 observations, but the predictions still make no sense.

I tried different train/test splits, all resulting in similar predictions. I checked that the VI values in the test set fall within the range of the training-set values shown on the partial plots. I also standardized the dataset and tried leave-one-out cross-validation to make the model more robust. I don't understand what goes wrong with the prediction.

Solution

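I suspect you need to use type = 'response' in your call to predict.gam(). The default behaviour is to return predictions on the link scale (see ?predict.gam for more details). With the inverse link, the link scale is 1/AGB, so measured values of 18-140 correspond to roughly 0.007-0.056 there, which is close to the range you are seeing.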
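
To illustrate the difference between the two scales, here is a minimal sketch. It uses simulated data as a stand-in, since the original data_gams is not shown; the variable names (d, x, y, m, newd) and all numeric values are made up for the example and are not from the question.

```r
library(mgcv)

set.seed(2)
# Simulated stand-in data (hypothetical values, NOT the asker's dataset)
n  <- 200
x  <- runif(n)
mu <- 1 / (0.01 + 0.03 * x)                 # true mean on the response scale, about 25-100
y  <- rgamma(n, shape = 20, rate = 20 / mu) # positive, right-skewed response
d  <- data.frame(x = x, y = y)

# Same family and link as in the question
m <- gam(y ~ s(x), data = d, family = Gamma(link = "inverse"))

newd <- data.frame(x = c(0.1, 0.5, 0.9))

# Default is type = "link": values of the linear predictor, i.e. 1 / mean here
p_link <- predict(m, newdata = newd)

# type = "response": back-transformed to the scale of y
p_resp <- predict(m, newdata = newd, type = "response")

# For the inverse link the two are related by a simple reciprocal
all.equal(as.numeric(p_resp), as.numeric(1 / p_link))
```

Applied to the code in the question, the change would presumably be pred <- predict(model, newdata = gams_test, type = "response"), after which the RMSE and MAE are computed on the same scale as gams_test$AGB.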