python google-colaboratory google-earth-engine

How to aggregate monthly average rainfall by district in Python/Google Colab

I'm trying to retrieve monthly average rainfall from 2004 to 2021 based on CHIRPS data by district, using a shapefile I imported from my drive. So far, I am using the following code in Google Colab:

path = "/content/drive/.../x.shp"
districts = gpd.read_file(path) 

startDate = ee.Date('2004-01-01')
endDate = ee.Date('2021-12-31')

chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation")

# Reduce the rainfall data to the district polygons
def reduce_image(img):
    img_reduced = img.reduceRegions(
        collection=districts,
        reducer=ee.Reducer.mean(),
        scale=5500
    )
    return img_reduced

rainfall_reduced = chirps.map(reduce_image).flatten()

... but I get an error message saying

EEException: Unrecognized argument type to convert to a FeatureCollection

Also, when I try adding

.featureBounds(districts)

to the chirps import, I get an error message saying

EEException: Invalid GeoJSON geometry.

I have tried changing the code for hours but don't seem to be able to make it work.

Could anyone tell me how I can calculate monthly average precipitation for each district, and ultimately download them as a .csv file?

Thank you very much in advance!

Solution

We need to convert the GeoDataFrame to GeoJSON with to_json(), take each entry of its 'features' list, and wrap it in an ee.Feature so that the districts can be passed as an ee.FeatureCollection:

import json

file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shx'
districts = gpd.read_file(file_name)

fc = []
for i in range(districts.shape[0]):
    g = districts.iloc[i:i + 1, :]  # one-row GeoDataFrame
    # json.loads handles JSON literals such as null, which eval() cannot parse
    json_dict = json.loads(g.to_json())
    geo_json_dict = json_dict['features'][0]
    fc.append(ee.Feature(geo_json_dict))

districts = ee.FeatureCollection(fc)
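
The parsing step can be checked without Earth Engine. The following is a minimal sketch that uses a small hand-written GeoJSON string as a stand-in for the output of districts.to_json(); the helper name features_from_geojson and the sample data are made up for illustration. It parses the text with json.loads, which accepts JSON literals such as null that are not valid Python expressions.

```python
import json

# Hypothetical stand-in for districts.to_json(): a GeoJSON FeatureCollection
# with two point features; one property is JSON null.
SAMPLE_GEOJSON = '''
{"type": "FeatureCollection",
 "features": [
  {"type": "Feature", "id": "0",
   "properties": {"NAME_2": "A", "CC_2": null},
   "geometry": {"type": "Point", "coordinates": [-56.28, -24.75]}},
  {"type": "Feature", "id": "1",
   "properties": {"NAME_2": "B", "CC_2": "12"},
   "geometry": {"type": "Point", "coordinates": [-57.10, -25.30]}}
 ]}
'''


def features_from_geojson(geojson_text):
    """Return the list of GeoJSON feature dicts contained in the text."""
    return json.loads(geojson_text)["features"]


features = features_from_geojson(SAMPLE_GEOJSON)
for feature in features:
    # Each of these dicts has the shape that the loop above hands to ee.Feature().
    print(feature["id"], feature["properties"])
```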

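We also need to call mosaic() on the chirps ee.ImageCollection so that it becomes a single ee.Image, because reduceRegions() is an ee.Image method. Note that mosaic() composites the images with the most recent one on top, so the result reflects the latest daily values in the date range rather than an average over it: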

chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()
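
Since the question asks for monthly values, one possible variation is to build one image per calendar month instead of a single mosaic. The sketch below is an assumption about how that could be arranged, not part of the code in this answer: month_windows and monthly_mean_images are hypothetical helper names, and the collection argument is expected to be the un-mosaicked CHIRPS ee.ImageCollection. It relies only on filterDate(), mean() and set(), and on the end date of filterDate() being exclusive.

```python
from datetime import date


def month_windows(first_year, last_year):
    """Return (start, end) ISO date strings for every calendar month.

    `end` is the first day of the following month, which suits filters
    whose end date is exclusive.
    """
    windows = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            start = date(year, month, 1)
            end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
            windows.append((start.isoformat(), end.isoformat()))
    return windows


def monthly_mean_images(collection, windows):
    """Build one mean image per window, tagged with its 'YYYY-MM' label.

    `collection` is assumed to behave like an ee.ImageCollection.
    """
    return [
        collection.filterDate(start, end).mean().set("month", start[:7])
        for start, end in windows
    ]


windows = month_windows(2004, 2021)
print(len(windows), windows[0], windows[-1])
```

With the real collection this could be called as monthly_mean_images(ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').select('precipitation'), windows), and each returned image could then be passed to reduceRegions(). mean() gives the average daily rainfall within each month; sum() could be swapped in if monthly totals are wanted.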

For reduceRegions() we wrap the input in ee.Image() and call getInfo() to bring the result back to the notebook as a Python dictionary:

def reduce_image(img):
    img_reduced = ee.Image(img).reduceRegions(
        reducer=ee.Reducer.mean(),
        collection=districts,
        scale=5500,
    ).getInfo()
    return img_reduced

Altogether we have:

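import json

import ee
import geopandas as gpd

# Assumes Earth Engine has already been authenticated and initialized
# (ee.Authenticate() / ee.Initialize()) earlier in the notebook.

file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shx'
districts = gpd.read_file(file_name)

fc = []
for i in range(districts.shape[0]):
    g = districts.iloc[i:i + 1, :]  # one-row GeoDataFrame
    # json.loads handles JSON literals such as null, which eval() cannot parse
    json_dict = json.loads(g.to_json())
    geo_json_dict = json_dict['features'][0]
    fc.append(ee.Feature(geo_json_dict))

districts = ee.FeatureCollection(fc)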

#startDate = ee.Date('2004-01-01')
startDate = ee.Date('2020-01-01')
endDate = ee.Date('2021-12-31')
chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()

def reduce_image(img):
    img_reduced = ee.Image(img).reduceRegions(
        reducer=ee.Reducer.mean(),
        collection=districts,
        scale=5500,
    ).getInfo()
    return img_reduced

rainfall_reduced = reduce_image(chirps)

print(rainfall_reduced) outputs the following (only a subset is shown, since the full output is extremely long):

...
      [-56.286949156999924, -24.767101287999935],
      [-56.28482055699993, -24.76206016599997],
      [-56.28269958499993, -24.757202148999852],
      [-56.28142547599998, -24.754322050999917],
      [-56.28020477399997, -24.751232146999882],
      [-56.28010559099994, -24.751117705999945]]]},
   'id': '217',
   'properties': {'CC_2': 'NA',
    'COUNTRY': 'Paraguay',
    'ENGTYPE_2': 'District',
    'GID_0': 'PRY',
    'GID_1': 'PRY.18_1',
    'GID_2': 'PRY.18.15_1',
    'HASC_2': 'PY.SP.YN',
    'NAME_1': 'San Pedro',
    'NAME_2': 'Yataity del Norte',
    'NL_NAME_1': 'NA',
    'NL_NAME_2': 'NA',
    'TYPE_2': 'Distrito',
    'VARNAME_2': 'NA',
    'mean': 0}}]}
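
The printed result is a GeoJSON-style dictionary with a 'features' list, so the per-district values can be written to a .csv file with the standard library. This is a sketch under assumptions: features_to_csv is a hypothetical helper, the sample dictionary only imitates the structure shown above with the geometry left out, and its second feature is invented for illustration.

```python
import csv
import io

# Hypothetical sample shaped like the getInfo() result above.
sample_result = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "217", "geometry": None,
         "properties": {"NAME_1": "San Pedro",
                        "NAME_2": "Yataity del Norte",
                        "mean": 0}},
        {"type": "Feature", "id": "218", "geometry": None,
         "properties": {"NAME_1": "Example",
                        "NAME_2": "Made-up district",
                        "mean": 1.5}},
    ],
}


def features_to_csv(result, columns, out):
    """Write one CSV row per feature, keeping only the given property columns."""
    writer = csv.DictWriter(out, fieldnames=["id"] + list(columns))
    writer.writeheader()
    for feature in result["features"]:
        props = feature.get("properties", {})
        row = {"id": feature.get("id")}
        row.update({name: props.get(name) for name in columns})
        writer.writerow(row)


buffer = io.StringIO()
features_to_csv(sample_result, ["NAME_1", "NAME_2", "mean"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file opened with open(..., 'w', newline='') instead of the StringIO buffer would write the same rows to disk, from where the file can be downloaded.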

Note: I had to limit the date range (i.e. startDate) because it's a lot of data, and otherwise I get this error/warning message:

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
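
This message comes from the notebook's output stream, and most of the text printed here is polygon coordinates. One possible way to keep the printed output small, sketched below with a hypothetical helper strip_geometry and a made-up sample dictionary, is to print only each feature's id and properties.

```python
def strip_geometry(result):
    """Return (id, properties) pairs, leaving the coordinates out."""
    return [(f.get("id"), f.get("properties", {})) for f in result["features"]]


# Hypothetical sample shaped like the dictionary returned by getInfo().
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "217",
         "geometry": {"type": "Polygon",
                      "coordinates": [[[-56.28, -24.75], [-56.29, -24.76],
                                       [-56.27, -24.77], [-56.28, -24.75]]]},
         "properties": {"NAME_2": "Yataity del Norte", "mean": 0}},
    ],
}

summary = strip_geometry(sample)
for feature_id, props in summary[:5]:  # print only the first few districts
    print(feature_id, props.get("NAME_2"), props.get("mean"))
```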