I have a NetCDF (NC) file with dimensions (time, lat, lon), and I am trying to extract time series for multiple stations (lat/lon points). I tried the following to read the coordinates and extract the nearest values from the NC file:
import pandas as pd
import xarray as xr
nc_file = r"C:\Users\lab\Desktop\harvey\example.nc"
NC = xr.open_dataset(nc_file)
csv = r"C:\Users\lab\Desktop\harvey\stations.csv"
df = pd.read_csv(csv,delimiter=',')
Newdf = pd.DataFrame([])
# grid point lists
lat = df["Lat"]
lon = df["Lon"]
point_list = zip(lat,lon)
for i, j in point_list:
    dsloc = NC.sel(lat=i,lon=j,method='nearest')
    DT=dsloc.to_dataframe()
    Newdf=Newdf.append(DT,sort=True)
The code works fine and returns this:
EVP lat lon
time
2019-01-01 19:00:00 0.0546 40.063 -88.313
2019-01-01 23:00:00 0.0049 40.063 -88.313
2019-01-01 19:00:00 0.0052 41.938 -93.688
2019-01-01 23:00:00 0.0029 41.938 -93.688
2019-01-01 19:00:00 0.0101 52.938 -124.938
2019-01-01 23:00:00 0.0200 52.938 -124.938
2019-01-01 19:00:00 0.1644 39.063 -79.438
2019-01-01 23:00:00 -0.0027 39.063 -79.438
However, I need to associate the station ID (from my original lat/lon file) with each of the coordinates, like this:
Station-ID Lat Lon time EVP lat lon
0 Bo1 40.00620 -88.29040 1/1/2019 19:00 0.0546 40.063 -88.313
1 1/1/2019 23:00 0.0049 40.063 -88.313
2 Br1 41.97490 -93.69060 1/1/2019 19:00 0.0052 41.938 -93.688
3 1/1/2019 23:00 0.0029 41.938 -93.688
4 Brw 71.32250 -156.60917 1/1/2019 19:00 0.0101 52.938 -124.938
5 1/1/2019 23:00 0.0200 52.938 -124.938
6 CaV 39.06333 -79.42083 1/1/2019 19:00 0.1644 39.063 -79.438
7 1/1/2019 23:00 -0.0027 39.063 -79.438
Any thoughts on how I can merge my data frames so they look like the provided example?
How about including the station name in your zip call and then inserting the ID into the pandas DataFrame, like this? (By the way, I couldn't access your CSV file, so I simplified the example slightly with dummy lists.)
import pandas as pd
import xarray as xr
nc_file = "example.nc"
NC = xr.open_dataset(nc_file)
#dummy locations and station id as I can't access the CSV
lat=[40,42,41]
lon=[-100,-105,-99]
name=["a","b","c"]
Newdf = pd.DataFrame([])
for i,j,id in zip(lat,lon,name):
    dsloc = NC.sel(lat=i,lon=j,method='nearest')
    DT=dsloc.to_dataframe()
    # insert the name with your preferred column title:
    DT.insert(loc=0,column="station",value=id)
    Newdf=Newdf.append(DT,sort=True)
print(Newdf)
This gives me:
EVP lat lon station
time
2019-01-01 19:00:00 0.0527 39.938 -99.938 a
2019-01-01 23:00:00 0.0232 39.938 -99.938 a
2019-01-01 19:00:00 0.0125 41.938 -104.938 b
2019-01-01 23:00:00 0.0055 41.938 -104.938 b
2019-01-01 19:00:00 0.0527 40.938 -98.938 c
2019-01-01 23:00:00 0.0184 40.938 -98.938 c
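One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent install the loop needs pd.concat instead. Below is a minimal sketch of the same idea that collects the per-station frames in a list and concatenates them once, while also keeping the original Station-ID, Lat and Lon columns next to the matched grid lat/lon. The dataset and the station table here are synthetic stand-ins (the grid, the random EVP values, and the two-row station table are assumptions for illustration, not the real example.nc or stations.csv).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for example.nc (hypothetical grid and random values).
times = pd.to_datetime(["2019-01-01 19:00", "2019-01-01 23:00"])
grid_lat = np.arange(39.0, 43.0, 1.0)     # 39, 40, 41, 42
grid_lon = np.arange(-94.0, -87.0, 1.0)   # -94, -93, ..., -88
rng = np.random.default_rng(0)
NC = xr.Dataset(
    {"EVP": (("time", "lat", "lon"),
             rng.random((times.size, grid_lat.size, grid_lon.size)))},
    coords={"time": times, "lat": grid_lat, "lon": grid_lon},
)

# Synthetic stand-in for stations.csv (column names taken from the question).
df = pd.DataFrame({
    "Station-ID": ["Bo1", "Br1"],
    "Lat": [40.00620, 41.97490],
    "Lon": [-88.29040, -93.69060],
})

frames = []
for station, lat, lon in zip(df["Station-ID"], df["Lat"], df["Lon"]):
    # Nearest grid cell for this station; time becomes an ordinary column.
    point = NC.sel(lat=lat, lon=lon, method="nearest").to_dataframe().reset_index()
    # Prepend the original station metadata to every row of this station.
    point.insert(0, "Station-ID", station)
    point.insert(1, "Lat", lat)
    point.insert(2, "Lon", lon)
    frames.append(point)

# A single concat replaces the repeated append calls.
result = pd.concat(frames, ignore_index=True)
print(result)
```

Building the list first and calling pd.concat once also avoids copying the growing frame on every iteration. With the real data, replace the synthetic NC and df with xr.open_dataset(nc_file) and pd.read_csv(csv).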