I'm rather new to Python, so my question may well have been asked online already, but when I find things that seem relevant I don't always know how to use them in my code (especially when it's a function definition), so I apologise for any redundancy.
I work with daily temperature data from the Copernicus website (https://marine.copernicus.eu/).
Since the netCDF files are too large to download if I want every day of every month over several years, I'm trying to access the data without downloading it so that I can still work with it.
The data comes as one array per day (for a given day, month and year).
I want to sum, element by element, the arrays for all the days of a given month in a given year.
To make things clearer, here's an example:
Simplified arrays:
array1 = [[1, 4, 3, 9],
          [7, 5, 2, 3]]
array2 = [[3, 8, 6, 1],
          [6, 4, 7, 2]]
# ... etc. until day 28, 29, 30 or 31
The result I want:
array1 + array2 = [[1+3, 4+8, 3+6, 9+1],
                   [7+6, 5+4, 2+7, 3+2]]
                = [[4, 12, 9, 10],
                   [13, 9, 9, 5]]
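In NumPy, `+` between two arrays of the same shape already adds them cell by cell, which is exactly the operation described above. A minimal sketch with the simplified arrays (the names `array1` and `array2` are just the example's):

```python
import numpy as np

# The two simplified daily arrays from the example
array1 = np.array([[1, 4, 3, 9],
                   [7, 5, 2, 3]])
array2 = np.array([[3, 8, 6, 1],
                   [6, 4, 7, 2]])

# Element-wise addition: each cell is array1[i, j] + array2[i, j]
days_sum = array1 + array2
print(days_sum)
# [[ 4 12  9 10]
#  [13  9  9  5]]
```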
I first tried it without a loop, using the data for 2 days, and it works.
My code :
import os
import xarray as xr
import numpy as np
import netCDF4 as nc
import copernicusmarine
# Access the data
DS = copernicusmarine.open_dataset(dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m")
# Get only thetao (temperature) variable for 1 day
subset = DS[['thetao']].sel(time = slice("2014-01-01", "2014-01-01"))
# Obtain only data of a certain depth
target_depth = 0 #surface
subset_T = subset.thetao.isel(depth=target_depth)
# To view my data in array
thetao_depth0 = subset_T.data
thetao_depth0
# Same thing for next day of the same month and year
subset2 = DS[['thetao']].sel(time = slice("2014-01-02", "2014-01-02"))
subset_T2 = subset2.thetao.isel(depth=target_depth)
thetao_depth0_2 = subset_T2.data
thetao_depth0_2
# The sum of my arrays
days_sum = thetao_depth0 + thetao_depth0_2
days_sum
My thetao_depth0 arrays look like this:
For 01/01/2014:
array([[[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
...,
[-1.70870081, -1.70870081, -1.70870081, ..., -1.70870081,
-1.70870081, -1.70870081],
[-1.71016569, -1.71016569, -1.71016569, ..., -1.71016569,
-1.71016569, -1.71016569],
[ nan, nan, nan, ..., nan,
nan, nan]]])
For 02/01/2014:
array([[[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
...,
[-1.70870081, -1.70870081, -1.70870081, ..., -1.70870081,
-1.70870081, -1.70870081],
[-1.71016569, -1.71016569, -1.71016569, ..., -1.71016569,
-1.71016569, -1.71016569],
[ nan, nan, nan, ..., nan,
nan, nan]]])
And I get days_sum:
array([[[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., nan,
nan, nan],
...,
[-3.41740161, -3.41740161, -3.41740161, ..., -3.41740161,
-3.41740161, -3.41740161],
[-3.42033139, -3.42033139, -3.42033139, ..., -3.42033139,
-3.42033139, -3.42033139],
[ nan, nan, nan, ..., nan,
nan, nan]]])
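The `nan` cells (presumably land or missing values in this product) stay `nan` in the sum, because any arithmetic involving NaN returns NaN, which is usually what you want for a mask. A small sketch with made-up values showing why swapping in `np.nansum` would be a trap here: it treats NaN as 0, so a cell that is NaN on every day becomes 0.0 instead of staying masked.

```python
import numpy as np

# Two made-up "days": cell 0 is NaN on both days (e.g. land), cell 1 is ocean
day1 = np.array([np.nan, -1.70870081])
day2 = np.array([np.nan, -1.70870081])

# Plain addition: NaN propagates, so cell 0 stays NaN
plain_sum = day1 + day2

# nansum treats NaN as 0, so the all-NaN cell silently becomes 0.0
nan_sum = np.nansum(np.stack([day1, day2]), axis=0)

print(plain_sum)
print(nan_sum)
```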
Now here's where it gets complicated.
I'd like to create a loop that does the same thing with all the arrays for every day of a month in a year (from 01/01/2014 to 31/01/2014 for example).
So far I've done this :
day = ['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31']
month = ['01']
year = ['2014']
DS = copernicusmarine.open_dataset(dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m")
target_depth = 0  # surface
for y in year:
    for m in month:
        for d in day:
            start_date = f"{y}-{m}-{d}"
            end_date = start_date
            subset_thetao = DS[['thetao']].sel(time=slice(start_date, end_date))
            subset_depth = subset_thetao.thetao.isel(depth=target_depth)
            thetao_depth0 = subset_depth.data
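A side note on the date list: hard-coding days '01' to '31' only works for 31-day months. A sketch, assuming you want one 'YYYY-MM-DD' string per day, using the standard library's `calendar.monthrange`, which returns the weekday of the first day and the number of days in the month (the helper name `month_dates` is hypothetical):

```python
import calendar

def month_dates(year, month):
    """Return 'YYYY-MM-DD' strings for every day of the month (hypothetical helper)."""
    # monthrange returns (weekday of day 1, number of days in the month)
    _, n_days = calendar.monthrange(year, month)
    return [f"{year:04d}-{month:02d}-{day:02d}" for day in range(1, n_days + 1)]

print(month_dates(2014, 2)[-1])   # 2014-02-28
print(len(month_dates(2014, 1)))  # 31
```

Each string could then be used as both ends of `slice(start_date, end_date)`, as in the loop above.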
But I'm having trouble adding up the arrays across the iterations of the loop.
I first tried np.sum, but either it isn't meant for what I want to do or I'm using it wrong, especially when it comes to storing the summed array in a variable.
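For reference, `np.sum` can do this, but only once the daily arrays are stacked along a new axis: with no `axis` argument it collapses everything into a single number, whereas `axis=0` sums across the stacked days cell by cell. A sketch with the simplified arrays; collecting the days in a Python list first is one possible approach, not the only one:

```python
import numpy as np

# Collect one 2-D array per day in a plain Python list
daily_arrays = [
    np.array([[1, 4, 3, 9], [7, 5, 2, 3]]),
    np.array([[3, 8, 6, 1], [6, 4, 7, 2]]),
]

stacked = np.stack(daily_arrays)    # shape (n_days, 2, 4)
total = np.sum(stacked)             # no axis: one number for everything (71 here)
per_cell = np.sum(stacked, axis=0)  # axis=0: sum over the days, shape (2, 4)
print(per_cell)
```

With the real data each day's array already has a leading time axis of length 1 (the `[[[` in the output above), so `np.concatenate(daily_arrays, axis=0)` would give the same (n_days, lat, lon) layout before summing over `axis=0`.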
I've added empty_array = np.array([]) before my for loop, but I don't know what to do next inside the loop.
This is the first time I've handled arrays in Python, so maybe I'm going about it the wrong way.
In the end, what I'd like to do is average the values of my different arrays over a month.
A simplified example with 3 days of a month:
array1 = [[1, 4, 3, 9],
          [7, 5, 2, 3]]
array2 = [[3, 8, 6, 1],
          [6, 4, 7, 2]]
array3 = [[3, 2, 6, 1],
          [1, 4, 5, 2]]
To get:
[[(1+3+3)/3, (4+8+2)/3, ... etc.],
 [... etc.]]
≈ [[2.33, 4.67, 5, 3.67],
   [4.67, 4.33, 4.67, 2.33]]
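The monthly mean is the same stack-then-reduce idea, using `np.mean(..., axis=0)` instead of a sum. A sketch with the three simplified arrays:

```python
import numpy as np

daily_arrays = [
    np.array([[1, 4, 3, 9], [7, 5, 2, 3]]),
    np.array([[3, 8, 6, 1], [6, 4, 7, 2]]),
    np.array([[3, 2, 6, 1], [1, 4, 5, 2]]),
]

# Mean over the day axis, cell by cell
monthly_mean = np.mean(np.stack(daily_arrays), axis=0)
# values ≈ [[2.33, 4.67, 5.0, 3.67], [4.67, 4.33, 4.67, 2.33]]
print(np.round(monthly_mean, 2))
```

With the real data, NaN cells stay NaN under `np.mean`; `np.nanmean` would instead skip NaNs, which only makes a difference for a cell that is missing on some days but not others (and it emits a RuntimeWarning for cells that are NaN on every day).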
I've added empty_array = np.array([]) before my for loop but I don't know what to do next in the loop.
Almost right. You need to create an array with the same shape as the arrays you will sum: NumPy adds arrays element by element, which requires matching shapes (or shapes that broadcast together, which an empty np.array([]) of shape (0,) does not). You can check a shape with arr.shape.
The array should initially be filled with zeros. This way, you end up with the sum after adding all your other arrays.
import numpy as np

# Create a zero-filled array with the same shape as the arrays you need to sum
sum_arr = np.zeros_like(thetao_depth0)
# alternative: sum_arr = np.zeros(thetao_depth0.shape)
for d in day:
    # get the array subset_depth.data for the current day (same selection as in the question)
    start_date = f"{year[0]}-{month[0]}-{d}"
    subset_depth = DS['thetao'].sel(time=slice(start_date, start_date)).isel(depth=target_depth)
    # add the data of this day to the sum
    sum_arr += subset_depth.data
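Combining this with the monthly average from the question: a self-contained sketch that replaces the Copernicus access with a hypothetical `load_day` function returning random data of shape (1, 2, 4) (one time step, like the question's output, with one NaN "land" cell). It creates the zero array from the first day so its shape always matches, and divides by the number of days at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

def load_day(date_str):
    """Hypothetical stand-in for the per-day selection (subset_depth.data).
    Returns a (1, 2, 4) float array with one NaN 'land' cell."""
    arr = rng.normal(loc=-1.7, scale=0.1, size=(1, 2, 4))
    arr[0, 0, 0] = np.nan
    return arr

dates = [f"2014-01-{d:02d}" for d in range(1, 32)]

sum_arr = None
n_days = 0
for date_str in dates:
    day_data = load_day(date_str)
    if sum_arr is None:
        # First day: start from zeros with the same shape and dtype as the data
        sum_arr = np.zeros_like(day_data)
    sum_arr += day_data  # NaN cells stay NaN
    n_days += 1

mean_arr = sum_arr / n_days  # monthly mean, cell by cell
```

Creating the zeros from the first day's array avoids having to know the grid shape in advance.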