I am working with a nc file that has been created by merging single monthly nc files.
The actual file ('precip_combined.nc
') contains global monthly precipitation data from 1979-2020 (size: 3.79 GB):
<xarray.Dataset>
Dimensions: (lon: 3600, lat: 1800, time: 504)
Coordinates:
* lon (lon) float32 -179.9 -179.8 -179.8 ... 179.8 179.9 179.9
* lat (lat) float32 89.95 89.85 89.75 ... -89.75 -89.85 -89.95
* time (time) datetime64[ns] 1979-01-01 1979-02-01 ... 2020-12-01
Data variables:
precipitation (time, lat, lon) float32 ...
Attributes:
history: Created on 2022-10-09 13:23
input_data_hash: 1ff7f07166074240030172bae541afd3ecbc27ef941e50a65b04a26...type here
The goal is:
I used this code:
import xarray as xr
# Open the netCDF file
data = xr.open_dataset('.../precip_combined.nc')
# Define the start and end months of the hydrological year
start_month = 10
end_month = 9
# Initialize an empty list to store the sums of precipitation
precipitation_sums = []
# Loop through each year from 1979 to 2020
for year in range(1979, 2021):
# Define the start and end dates of the hydrological year
start_date = f'{year}-{start_month:02d}-01'
end_date = f'{year+1}-{end_month:02d}-30'
# Select the data for the hydrological year
hydro_year_data = data.sel(time=slice(start_date, end_date))
# Calculate the sum of precipitation for each lon and lat coordinate
precipitation_sum = hydro_year_data['precipitation'].sum(dim=['lon', 'lat'])
# Append the sum to the list
precipitation_sums.append(precipitation_sum)
# Convert the list of sums to a new xarray dataset
precipitation_sums_dataset = xr.concat(precipitation_sums, dim='time')
# Add the precipitation sums dataset to the original dataset
data['precipitation_sums'] = precipitation_sums_dataset
# Save the updated dataset as a new netCDF file
data.to_netcdf('.../precip_annual_sums.nc')
and got as an output a nc file with following shapes:
<xarray.Dataset>
Dimensions: (lon: 3600, lat: 1800, time: 504)
Coordinates:
* lon (lon) float32 -179.9 -179.8 -179.8 ... 179.8 179.9 179.9
* lat (lat) float32 89.95 89.85 89.75 ... -89.75 -89.85 -89.95
* time (time) datetime64[ns] 1979-01-01 ... 2020-12-01
Data variables:
precipitation (time, lat, lon) float32 ...
precipitation_sums (time) float32 ...
Attributes:
history: Created on 2022-10-09 13:23
input_data_hash: 1ff7f07166074240030172bae541afd3ecbc27ef941e50a65b04a26...type here
As you see, the Data variable precipitation_sums does not contain any coordinates. For example, if I read the created nc file and extract data for London, I get no output.
Note that I need the global dataset, not for a specific location as asked here https://stackoverflow.com/questions/65072907/calculate-monthly-average-and-annual-sum-of-precipitation-from-netcdf-with-cdo
I would appreciate any help. There is no comprehensible example available on GitHub etc., probably the task is too easy for pros.
As you tagged the question with cdo
I presume a cdo
answer is also acceptable for you, in this case you can shift the time index back 9 months, so October is now January, sum over the year, and then shift back so the timestamp is correct:
cdo shifttime,9months -yearsum -shifttime,-9months in.nc out.nc
Note:
--timestat_date
with options last
and first
to choose where the timestamp is defined (default is mid, so the data will April)