rnetcdfopendap

How to extract OPeNDAP data for a single location in R?


I am trying to download vegetation index data for a single point location, and I want it to be automated in R so I can do this for multiple points in different runs. So I found the opendap.catalog package which seems to have the most straightforward method of subsetting the data prior to download. This is what I have tried:

library(opendap.catalog)
library(sf)

lat <- 42.443962
lon <- -76.501884
city <- data.frame(Lat = lat, Lon = lon)
sf <- st_as_sf(x = city,
               coords = c("Lon", "Lat"),
               crs = "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
bbox <- st_buffer(sf, 10)
# i created a small bounding box around my point because the download wasn't working with a single point

nc <- dap(URL = "https://www.ncei.noaa.gov/thredds/dodsC/ncFC/cdr/ndvi-fc/AVHRR_and_VIIRS_NDVI:_aggregation_best.ncd",
    AOI = bbox,
    startDate = "2011-01-01",
    endDate = "2020-12-31",
    varname = "NDVI",
    verbose = T)

This seems to run fine and I get this printout while running:

source:  https://www.ncei.noaa.gov/thredds/dodsC/ncFC/cdr/ndvi-fc/AVH... 
varname(s):
   > NDVI [1] (NOAA Climate Data Record of Normalized Difference Vegetation Index)
==================================================
diminsions:  1, 1, 3653 (names: longitude,latitude,time)
resolution:  0.05, 0.05, 1 days
extent:      -76.53, -76.53, 42.43, 42.43 (xmin, xmax, ymin, ymax)
crs:         +proj=longlat +a=6378137 +f=0.00335281066474748 +p...
time:        2011-01-01 to 2020-12-31
==================================================
values: 3,653 (vars*X*Y*T)

But then when I check my nc object, it is not in a format I expect. It is a list of 1, type 'glue'.

print(nc)

## $NDVI
https://www.ncei.noaa.gov/thredds/dodsC/ncFC/cdr/ndvi-fc/AVHRR_and_VIIRS_NDVI:_aggregation_best.ncd?NDVI[10957:1:14609][951:1:951][2069:1:2069]

Is my data in this glue object or is there an error somewhere. When I try it just for a few days, instead of this long timespan, it seems to be just a double list of the NDVI values. My end goal is to put the NDVI results into a dataframe against time so I can plot it.


Solution

  • Files on OPeNDAP servers tend to be large so you indeed need to subset before you download. Access is also relatively "expensive" over the internet and there may be rate limits set by the administrator of the server. So prepare well.

    One typical problem is a time-out when there is too much processing involved in generating the data. With time-series this can happen easily because the data for a single location is scattered throughout the file(s), generating lots of I/O on the server. Reduce the length of your time series if you get any errors.

    First off, the glue object that you are receiving is produced by the glue package, a dependency of opendap.catalog. That looks like a bug in the latter package so you might want to flag that to the developers through an issue.

    Secondly, use package ncdfCF for a 1-stop solution:

    library(ncdfCF)
    
    # Opening the NetCDF file will only read the metadata from the OPeNDAP
    # server and is quick
    url <- "https://www.ncei.noaa.gov/thredds/dodsC/ncFC/cdr/ndvi-fc/AVHRR_and_VIIRS_NDVI:_aggregation_best.ncd"
    ds <- open_ncdf(url)
    names(ds)
    #> [1] "time_offset" "NDVI"        "TIMEOFDAY"   "QA" 
    
    # Subset the NDVI data variable to a suitably small array. Here I use your
    # temporal range and a box around your location. If you know all of your
    # locations in advance, adjust the box accordingly. Smaller is better.
    subset <- ds[["NDVI"]]$subset(time = c("2011-01-01", "2021-01-01"),
                                  longitude = c(-76.6, -76.4),
                                  latitude = c(42.3, 42.5))
    
    # Extract a profile of NDVI values for a single location. You can extract
    # multiple locations at once by turning the first 3 arguments into vectors.
    ndvi <- subset$profile(.names = "my_location",
                           longitude = -76.501884,
                           latitude = 42.443962,
                           .as_table = T)
    

    The ndvi object is a data.table (must have that package installed as well) which you can use directly with ggplot2 or the base::plot() function.

    If you used shorter time series, such as a sequence of years, you can append the data.tables in the usual fashion. That would look like this, in the form of a function:

    get_NDVI <- function() {
      cat("Connecting...\n")
      url <- "https://www.ncei.noaa.gov/thredds/dodsC/ncFC/cdr/ndvi-fc/AVHRR_and_VIIRS_NDVI:_aggregation_best.ncd"
      ds <- open_ncdf(url)
      
      out <- data.table()
      for (y in 2011:2020) {
        cat("Processing", y, "\n")
        subset <- ds[["NDVI"]]$subset(time = c(paste0(y, "-01-01"), paste0(y+1, "-01-01")),
                                      longitude = c(-76.6, -76.4),
                                      latitude = c(42.3, 42.5))
      
        ndvi <- subset$profile(.names = "my_location",
                               longitude = -76.501884,
                               latitude = 42.443962,
                               .as_table = T)
        
        out <- rbind(out, ndvi)
      }
      out
    }