
Python extract multiple lat/lon from NETCDF files using xarray


I have a NetCDF file with dimensions (time, lat, lon) and I am trying to extract time series for multiple stations (lat/lon points). I tried reading the station coordinates from a CSV and extracting the nearest grid values from the NC file like this:

import pandas as pd
import xarray as xr
nc_file = r"C:\Users\lab\Desktop\harvey\example.nc"
NC = xr.open_dataset(nc_file)
csv = r"C:\Users\lab\Desktop\harvey\stations.csv"
df = pd.read_csv(csv,delimiter=',')
Newdf = pd.DataFrame([])
# grid point lists
lat = df["Lat"]
lon = df["Lon"]
point_list = zip(lat,lon)
for i, j in point_list:
    dsloc = NC.sel(lat=i,lon=j,method='nearest')
    DT=dsloc.to_dataframe()
    Newdf=Newdf.append(DT,sort=True)

The code works fine and returns this:

                        EVP     lat      lon
time                                        
2019-01-01 19:00:00  0.0546  40.063  -88.313
2019-01-01 23:00:00  0.0049  40.063  -88.313
2019-01-01 19:00:00  0.0052  41.938  -93.688
2019-01-01 23:00:00  0.0029  41.938  -93.688
2019-01-01 19:00:00  0.0101  52.938 -124.938
2019-01-01 23:00:00  0.0200  52.938 -124.938
2019-01-01 19:00:00  0.1644  39.063  -79.438
2019-01-01 23:00:00 -0.0027  39.063  -79.438

However, I need to associate the station ID (from my original lat/lon file) with each of the coordinates, like this:

  Station-ID       Lat        Lon            time     EVP     lat      lon
0        Bo1  40.00620  -88.29040  1/1/2019 19:00  0.0546  40.063  -88.313
1                                  1/1/2019 23:00  0.0049  40.063  -88.313
2        Br1  41.97490  -93.69060  1/1/2019 19:00  0.0052  41.938  -93.688
3                                  1/1/2019 23:00  0.0029  41.938  -93.688
4        Brw  71.32250 -156.60917  1/1/2019 19:00  0.0101  52.938 -124.938
5                                  1/1/2019 23:00  0.0200  52.938 -124.938
6        CaV  39.06333  -79.42083  1/1/2019 19:00  0.1644  39.063  -79.438
7                                  1/1/2019 23:00 -0.0027  39.063  -79.438

Any thoughts on how I can merge my data frames so they look like the example above?


Solution

  • One option is to include the station name in your zip call and insert the ID into each DataFrame before combining them, like this. (I couldn't access your CSV file, so I simplified the example slightly with dummy lists.)

    import pandas as pd
    import xarray as xr
    nc_file = "example.nc"
    NC = xr.open_dataset(nc_file)
    
    #dummy locations and station id as I can't access the CSV
    lat=[40,42,41]
    lon=[-100,-105,-99]
    name=["a","b","c"]
    
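    frames = []  # one DataFrame per station
    
    for i, j, station_id in zip(lat, lon, name):
        # select the grid cell nearest to the station coordinates
        dsloc = NC.sel(lat=i, lon=j, method='nearest')
        DT = dsloc.to_dataframe()
    
        # insert the name with your preferred column title:
        DT.insert(loc=0, column="station", value=station_id)
        frames.append(DT)
    
    # DataFrame.append was removed in pandas 2.0, so concatenate once instead;
    # sort_index(axis=1) keeps the columns in alphabetical order
    Newdf = pd.concat(frames).sort_index(axis=1)
    
    print(Newdf)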
    

    This gives me:

                            EVP     lat      lon station
    time                                                
    2019-01-01 19:00:00  0.0527  39.938  -99.938       a
    2019-01-01 23:00:00  0.0232  39.938  -99.938       a
    2019-01-01 19:00:00  0.0125  41.938 -104.938       b
    2019-01-01 23:00:00  0.0055  41.938 -104.938       b
    2019-01-01 19:00:00  0.0527  40.938  -98.938       c
    2019-01-01 23:00:00  0.0184  40.938  -98.938       c
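
To get closer to the layout asked for in the question (Station-ID, Lat, Lon next to the extracted values), a possible alternative is to select all stations in one call and then merge with the station table. The sketch below is only an illustration under assumptions: the small synthetic dataset stands in for example.nc, the station table stands in for stations.csv (with the column names "Station-ID", "Lat" and "Lon" taken from the question), and the dimension name "station" is made up for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for example.nc (assumption): EVP on a (time, lat, lon) grid.
times = pd.to_datetime(["2019-01-01 19:00", "2019-01-01 23:00"])
grid_lat = np.arange(38.0, 44.0, 1.0)
grid_lon = np.arange(-106.0, -97.0, 1.0)
rng = np.random.default_rng(0)
NC = xr.Dataset(
    {"EVP": (("time", "lat", "lon"),
             rng.random((times.size, grid_lat.size, grid_lon.size)))},
    coords={"time": times, "lat": grid_lat, "lon": grid_lon},
)

# Stand-in for stations.csv (assumption), using the question's column names.
df = pd.DataFrame({
    "Station-ID": ["a", "b", "c"],
    "Lat": [40.2, 41.9, 41.1],
    "Lon": [-100.3, -104.8, -99.2],
})

# Indexers that share a new "station" dimension make xarray pick one
# (lat, lon) pair per station instead of the full lat x lon cross product.
lat_idx = xr.DataArray(df["Lat"].to_numpy(), dims="station")
lon_idx = xr.DataArray(df["Lon"].to_numpy(), dims="station")
points = NC.sel(lat=lat_idx, lon=lon_idx, method="nearest")

# Label the new dimension with the station IDs.
points = points.assign_coords(station=("station", df["Station-ID"].to_numpy()))

# Flatten to a table and attach the original station coordinates.
result = (
    points.to_dataframe()
    .reset_index()
    .rename(columns={"station": "Station-ID"})
    .merge(df, on="Station-ID")
    .sort_values(["Station-ID", "time"])
    .reset_index(drop=True)
)
result = result[["Station-ID", "Lat", "Lon", "time", "EVP", "lat", "lon"]]

print(result)
```

Here "Lat"/"Lon" are the station coordinates from the table and "lat"/"lon" are the grid cell that was actually picked. Note that method='nearest' also returns a value for stations outside the grid (in the question, Brw at 71.3 N is matched to 52.938 N), so it may be worth comparing the two coordinate pairs, or passing the tolerance argument of sel, before trusting the result.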