rgisrasternetcdf

How to extract point data for sea surface temperature for specific dates over multiple years from multiple netcdf files using terra::extract() in R


I have downloaded multiple Lm3 4 km netcdf files (n = 4640) for sea surface temperature from the Ocean Colour Project, NASA.

class       : SpatRaster 
dimensions  : 766, 709, 1  (nrow, ncol, nlyr)
resolution  : 0.04165021, 0.04165796  (x, y)
extent      : 24.61, 54.14, 5.8, 37.71  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
source      : AQUA_MODIS.20120101.L3m.DAY.SST.x_sst.nc:sst 
varname     : sst (Sea Surface Temperature) 
name        :      sst 
unit        : degree_C 

My data spans from 2012 to 2024 for an 82 km area in The Red Sea, and I want to extract point data for sea surface temperature for specific dates.

I have successfully extracted values for sea surface temperature from the netcdf files using the code below, but I'm not sure if I'm just extracting values from the first netcdf layer rather than looping through all the files (1:4640), and I'm unsure how to incorporate the temporal element (the column 'Dates') with the point data over multiple years on specific dates.

I can't share the data because of ownership issues.

library(terra)
library(lubricate)

#Download Sea Surface Temperature 
folder_SST<-"~/Documents/GIS_Data/SST/requested_files"
files_SST <-list.files(folder_SST, pattern='*.nc', full.names ="TRUE")

#Loop through all the file paths and the AQUA MODIS netcdf4 files to gain access to them:
for (file in files_SST) {
  nc_SST=open.nc(files_SST)
  print.nc(nc_SST)
}

#Select the sea surface temperature values from AQUA MODIS files 
SSTs <- terra::rast(files_SST[1], "sst")

#Make the dataframe a spatial object of class = "sf" with a CRS of 4326
Ds_Points <- st_as_sf(x=MyDf, 
                      coords = c("Longitude_E_DD", "Latitude_N_DD"), 
                      crs = 4326)

#Format the dates in the dataframe 
#Get dates
MyDf <- MyDf %>% mutate(Dates = as.Date(as.character(MyDf$Date), format = "%d/%m/%Y"))

#Extract the SST values from the AQUA MODIS files 
SSTs_Data <- terra::extract(SSTs, Ds_Points)

Desired output:

   Survey_Number       Dates Longitude_E_DD Latitude_N_DD    SST
              1   2012-08-01       33.89083      27.26778 23.635
              2   2012-06-02       33.86782      27.40854 23.640
              3   2012-02-07       33.86230      27.44623 23.690
              4   2012-02-12       33.88653      27.26957 23.635
              5   2012-02-13       33.88766      27.26848 23.635
              6   2012-02-14       33.85000      27.36111 23.780
              7   2012-02-15       33.86177      27.41302 23.640

Solution

  • In the code you have, you are indeed only extracting values from the first netcdf file. This line in your script:

    SSTs <- terra::rast(files_SST[1], "sst")
    

    means you are loading the first file in your files_SST list and only using that file in the rest of your script. You will need to use a "for loop" to iterate over each file. I do not see a reference to MyDf above your Ds_Points definition, so I manually created MyDf using the lat/long values of the first 4 points that show up in your desired results. I also downloaded 3 of the more recent SST files from the given source to use for this.

    This is my solution below:

    library(tidyverse)
    library(terra)
    library(tidyterra)
    library(sf)
    library(ncdf4)
    library(lubridate)
    library(stringr)
    
    # create vectors of longs and lats for sample points
    # add these to MyDf, then convert MyDf to sf and save sf to Ds_Points
    longs <- c(33.89083, 33.86782, 33.86230, 33.88653)
    lats <- c(27.26778, 27.40854, 27.44623, 27.26957)
    
    MyDf <- data.frame(longs, lats) %>%
      rename(
        Longitude_E_DD = longs,
        Latitude_N_DD = lats
      )
    
    Ds_Points <- st_as_sf(x = MyDf, 
                          coords = c("Longitude_E_DD", "Latitude_N_DD"), 
                          crs = 4326)
    
    ######################################################################
    
    # set path to data folder
    # get sea surface temperature data files
    folder_SST <- paste(here::here(), '/data', sep = '')
    files_SST <- list.files(folder_SST, full.names ="TRUE")
    
    # set column names in expected output df,
    # then add them to new sst_df
    cols <- c('Survey Number', 'Date', 'Longitude_E_DD', 'Latitude_N_DD', 'SST')
    
    sst_df <- data.frame()
    
    for (col in cols) {
      sst_df[1, col] <- NA 
    }
    
    ######################################################################
    
    # initialize index and survey_num
    index <- 1
    survey_num <- 1
    
    # loop through all the AQUA MODIS netcdf4 files
    for (file in files_SST) {
      # open file as raster and get sst layer
      nc_SST = rast(file, "sst")
      
      # get formatted date from file name
      date <- as.Date(as.character(str_split(tail(str_split(file, '/')[[1]], 1), '\\.')[[1]][2]), format = "%Y%m%d")
      
      # iterate over each point
      for (i in 1:nrow(Ds_Points)) {
        # get point
        d_point <- Ds_Points[i,]
        
        # get point's lat and long
        lon <- st_coordinates(d_point)[,'X']
        lat <- st_coordinates(d_point)[,'Y']
        
        # get SST at that point from SST file
        # must convert point geometry to spatvector to use in terra::extract function
        SST_data <- terra::extract(nc_SST, as_spatvector(d_point, geom = 'geometry', crs = st_crs(Ds_Points)))
        
        # insert data into sst_df at row index
        sst_df[index, 'Survey Number'] <- survey_num
        sst_df[index, 'Date'] <- as.character(date)
        sst_df[index, 'Longitude_E_DD'] <- lon
        sst_df[index, 'Latitude_N_DD'] <- lat
        sst_df[index, 'SST'] <- SST_data$sst
        
        # next row index
        index <- index + 1
      }
    
      # next survey number/file
      survey_num <- survey_num + 1
    }
    
    # show completed sst_df
    sst_df
    

    This is the output:

       Survey Number       Date Longitude_E_DD Latitude_N_DD    SST
                   1 2024-10-28       33.89083      27.26778 26.980
                   1 2024-10-28       33.86782      27.40854 27.000
                   1 2024-10-28       33.86230      27.44623 27.075
                   1 2024-10-28       33.88653      27.26957 26.980
                   2 2024-10-30       33.89083      27.26778 27.055
                   2 2024-10-30       33.86782      27.40854 27.005
                   2 2024-10-30       33.86230      27.44623 27.060
                   2 2024-10-30       33.88653      27.26957 27.055
                   3 2024-11-01       33.89083      27.26778 27.265
                   3 2024-11-01       33.86782      27.40854 27.220
                   3 2024-11-01       33.86230      27.44623 27.165
                   3 2024-11-01       33.88653      27.26957 27.265
    

    This script finds the SST values at each of the points in Ds_Points for each of the netcdf files, then adds the data to sst_df. There are 4 points, so there are 4 SST values for each of the 3 dates/SST files. The date was extracted from the file name of the netcdf file. I figured the survey number was unique for each date.

    Keep in mind, this particular solution will be a bit slow given how many files you have and even slower depending on how many points you have (since it will iterate over each point for each file).