Tags: geospatial, python-xarray, netcdf4, opendap

Fastest way to slice and download hundreds of NetCDF files from a THREDDS/OPeNDAP server


I am working with NASA NEX-GDDP-CMIP6 data. I currently have working code that opens and slices each file individually, but it takes days to download one variable for all model outputs and scenarios. My goal is to obtain all temperature and precipitation data for every model output and scenario, then apply climate indicators and build an ensemble with xclim.

import xarray as xr

url = 'https://ds.nccs.nasa.gov/thredds2/dodsC/AMES/NEX/GDDP-CMIP6/UKESM1-0-LL/ssp585/r1i1p1f2/tasmax/tasmax_day_UKESM1-0-LL_ssp585_r1i1p1f2_gn_2098.nc'
lat = 53
lon = 0

try:
    # Open the remote file over OPeNDAP, interpolate to the point of
    # interest, and save the result locally under the original file name.
    with xr.open_dataset(url) as ds:
        ds.interp(lat=lat, lon=lon).to_netcdf(url.split('/')[-1])
except Exception as e:
    print(e)

This code works but is very slow: days for one variable at a single location. Is there a better, faster way? I'd rather not download the whole files, as each one is 240 MB!

Update:

I have also tried the following to take advantage of dask's parallel tasks. It is slightly faster, but still takes on the order of days to complete one full variable output (see the dask.delayed sketch after the code):

import xarray as xr

def interp_one_url(path, lat, lon):
    # Open one remote file and interpolate it to the target point;
    # the result is small enough to hold in memory.
    with xr.open_dataset(path) as ds:
        return ds.interp(lat=lat, lon=lon)

urls = ['https://ds.nccs.nasa.gov/thredds2/dodsC/AMES/NEX/GDDP-CMIP6/UKESM1-0-LL/ssp585/r1i1p1f2/tasmax/tasmax_day_UKESM1-0-LL_ssp585_r1i1p1f2_gn_2100.nc',
        'https://ds.nccs.nasa.gov/thredds2/dodsC/AMES/NEX/GDDP-CMIP6/UKESM1-0-LL/ssp585/r1i1p1f2/tasmax/tasmax_day_UKESM1-0-LL_ssp585_r1i1p1f2_gn_2099.nc']
lat = 53
lon = 0

# Local output names taken from the tail of each URL.
paths = [url.split('/')[-1] for url in urls]
datasets = [interp_one_url(url, lat, lon) for url in urls]
xr.save_mfdataset(datasets, paths=paths)
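
Note that the list comprehension above still runs the per-file calls serially; to actually fan them out with dask, each call can be wrapped in dask.delayed. A minimal sketch, assuming dask is installed and reusing interp_one_url, urls, lat, lon, and paths from above:

import dask

# Each delayed call only records the work; dask.compute then runs all
# of the tasks concurrently.
tasks = [dask.delayed(interp_one_url)(url, lat, lon) for url in urls]

# The underlying netCDF-C library is generally not thread-safe, so use
# the process-based scheduler rather than the default threaded one.
datasets = list(dask.compute(*tasks, scheduler='processes'))
xr.save_mfdataset(datasets, paths=paths)

How much this helps depends on how much of the wait is server-side per-request overhead rather than local work.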

Solution

  • One way is to download via the NetCDF Subset Service (NCSS) portal that NASA also exposes, instead of OPeNDAP. The URL pattern is different, but it can be iterated over in the same way.

    e.g.

    import wget

    lat = 53
    lon = 0

    URL = "https://ds.nccs.nasa.gov/thredds/ncss/AMES/NEX/GDDP-CMIP6/ACCESS-CM2/historical/r1i1p1f1/pr/pr_day_ACCESS-CM2_historical_r1i1p1f1_gn_2014.nc?var=pr&north={}&west={}&east={}&south={}&disableProjSubset=on&horizStride=1&time_start=2014-01-01T12%3A00%3A00Z&time_end=2014-12-31T12%3A00%3A00Z&timeStride=1&addLatLon=true"

    # Request a 1-degree box around the point: north, west, east, south.
    wget.download(URL.format(lat, lon, lon + 1, lat - 1))
    

    This accomplishes the slicing and the download in one step. Once you have the URL template, you can use something like wget and run the downloads in parallel, which is far faster than selecting and saving one file at a time; a sketch follows.
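
    As a rough illustration of that parallel approach (the year range, the build_ncss_url helper, and the thread-pool settings below are illustrative assumptions, not part of the original answer):

    import wget
    from concurrent.futures import ThreadPoolExecutor

    lat = 53
    lon = 0

    def build_ncss_url(year):
        # Hypothetical helper: fill the year into the NCSS URL template.
        # Adjust the model, scenario, and variable path pieces as needed.
        return (
            "https://ds.nccs.nasa.gov/thredds/ncss/AMES/NEX/GDDP-CMIP6/"
            "ACCESS-CM2/historical/r1i1p1f1/pr/"
            f"pr_day_ACCESS-CM2_historical_r1i1p1f1_gn_{year}.nc"
            f"?var=pr&north={lat}&west={lon}&east={lon + 1}&south={lat - 1}"
            "&disableProjSubset=on&horizStride=1"
            f"&time_start={year}-01-01T12%3A00%3A00Z"
            f"&time_end={year}-12-31T12%3A00%3A00Z"
            "&timeStride=1&addLatLon=true"
        )

    def fetch(year):
        # 'out' fixes the local file name (wget would otherwise derive it
        # from the URL, query string and all); bar=None disables the
        # progress bar, which garbles output when downloads run at once.
        wget.download(build_ncss_url(year),
                      out=f"pr_day_ACCESS-CM2_historical_r1i1p1f1_gn_{year}.nc",
                      bar=None)

    years = range(1950, 2015)  # assumed span of the historical runs

    # Download several server-sliced subset files concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(fetch, years))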