python google-colaboratory google-earth-engine

How to aggregate monthly average rainfall by district in Python/Google Colab


I'm trying to compute the monthly average rainfall for each district from 2004 to 2021, based on CHIRPS data and a shapefile I imported from my Google Drive. So far, I am using the following code in Google Colab:

import ee
import geopandas as gpd

path = "/content/drive/.../x.shp"
districts = gpd.read_file(path)

startDate = ee.Date('2004-01-01')
endDate = ee.Date('2021-12-31')

chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation")

# Reduce the rainfall data to the district polygons
def reduce_image(img):
    img_reduced = img.reduceRegions(
        collection=districts,
        reducer=ee.Reducer.mean(),
        scale=5500
    )
    return img_reduced

rainfall_reduced = chirps.map(reduce_image).flatten()

... but I get an error message saying

EEException: Unrecognized argument type to convert to a FeatureCollection

Also, when I try adding

.filterBounds(districts)

to the chirps import, I get an error message saying

EEException: Invalid GeoJSON geometry.

I have tried changing the code for hours but don't seem to be able to make it work.

Could anyone tell me how I can calculate the monthly average precipitation for each district, and ultimately download the results as a .csv file?

Thank you very much in advance!


Solution

  • We need to convert the shapefile's GeoDataFrame into an ee.FeatureCollection: serialize each row with to_json(), take its first entry under 'features' (a GeoJSON Feature), and wrap that in an ee.Feature:

    import json

    file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shp'
    districts = gpd.read_file(file_name)
    
    fc = []
    for i in range(districts.shape[0]):
        g = districts.iloc[i:i + 1, :]            # one-row GeoDataFrame
        json_dict = json.loads(g.to_json())       # parse the GeoJSON text (safer than eval)
        geo_json_dict = json_dict['features'][0]  # the row as a GeoJSON Feature
        fc.append(ee.Feature(geo_json_dict))
    
    districts = ee.FeatureCollection(fc)
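Parsing the GeoJSON text with json.loads() is safer than eval(): eval() executes arbitrary code, and it fails with a NameError as soon as any property is null, because JSON's null is not a Python name. Below is a minimal sketch that converts the whole GeoDataFrame in one to_json() call instead of row by row; geojson_features() is a hypothetical helper name, not part of geopandas or Earth Engine.

```python
import json


def geojson_features(geojson_text):
    """Return the list of GeoJSON Feature dicts in a FeatureCollection string,
    such as the text produced by GeoDataFrame.to_json()."""
    data = json.loads(geojson_text)  # understands null/true/false and runs no code
    if data.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return data["features"]
```

In Colab this would be used roughly as `features = geojson_features(districts.to_crs(epsg=4326).to_json())` followed by `districts_fc = ee.FeatureCollection([ee.Feature(f) for f in features])`. The to_crs() call assumes Earth Engine should receive longitude/latitude coordinates; GADM files already use them, so it is only a safeguard for other shapefiles.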
    

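    We also need a single ee.Image rather than an ee.ImageCollection, because reduceRegions() is an ee.Image method; calling mosaic() on chirps does that. Note that mosaic() keeps, at each pixel, the value of the last image in the collection, i.e. a single day's rainfall rather than an average over the date range: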

    chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()
    

    For reduceRegions(), we wrap the argument in ee.Image() and call getInfo() to bring the result back to the notebook as a Python dict:

    def reduce_image(img):
        img_reduced = ee.Image(img).reduceRegions(
            reducer=ee.Reducer.mean(),
            collection=districts,
            scale=5500,
        ).getInfo()
        return img_reduced
    

    Altogether we have:

    import json
    import ee
    import geopandas as gpd

    file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shp'
    districts = gpd.read_file(file_name)
    
    fc = []
    for i in range(districts.shape[0]):
        g = districts.iloc[i:i + 1, :]
        json_dict = json.loads(g.to_json())
        geo_json_dict = json_dict['features'][0]
        fc.append(ee.Feature(geo_json_dict))
    
    districts = ee.FeatureCollection(fc)
    
    #startDate = ee.Date('2004-01-01')
    startDate = ee.Date('2020-01-01')
    endDate = ee.Date('2021-12-31')
    chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()
    
    def reduce_image(img):
        img_reduced = ee.Image(img).reduceRegions(
            reducer=ee.Reducer.mean(),
            collection=districts,
            scale=5500,
        ).getInfo()
        return img_reduced
    
    rainfall_reduced = reduce_image(chirps)
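The code above reduces a single mosaic(), which holds only the last day's rainfall at each pixel, so it does not yet give monthly values. One way to get them (a sketch, not the original answer's method) is to build one image per month from the daily collection and reduce each against the districts. Assumptions: the helper names are hypothetical, "monthly rainfall" is taken as the monthly *total* in mm (swap .sum() for .mean() to get the average daily rate), and the kept properties are the GADM fields GID_2, NAME_1 and NAME_2. The `ee` module is imported inside the Earth Engine functions because they need an authenticated session, while month_ranges() is plain Python. Exporting to Drive avoids getInfo() and the notebook output limit.

```python
from datetime import date


def month_ranges(start_year, end_year):
    """Return (start, end) ISO date strings for every month of the given years.

    Each end date is the first day of the following month, because
    ee.ImageCollection.filterDate() excludes its end date.
    """
    ranges = []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            start = date(year, month, 1)
            if month == 12:
                end = date(year + 1, 1, 1)
            else:
                end = date(year, month + 1, 1)
            ranges.append((start.isoformat(), end.isoformat()))
    return ranges


def monthly_rainfall_by_district(districts, start_year=2004, end_year=2021):
    """Spatial mean of each month's CHIRPS rainfall total (mm) per district.

    `districts` must already be an ee.FeatureCollection. Returns an
    ee.FeatureCollection with one geometry-free row per district and month.
    """
    import ee  # needs an authenticated, initialised Earth Engine session

    daily = ee.ImageCollection("UCSB-CHG/CHIRPS/DAILY").select("precipitation")

    # One image per month: summing the daily images gives the monthly total.
    monthly = ee.ImageCollection([
        daily.filterDate(start, end).sum().set("month", start[:7])
        for start, end in month_ranges(start_year, end_year)
    ])

    def reduce_month(img):
        # Average the monthly total over each district polygon.
        stats = img.reduceRegions(
            collection=districts, reducer=ee.Reducer.mean(), scale=5500
        )
        # Tag every district row with the month it belongs to.
        return stats.map(lambda f: f.set("month", img.get("month")))

    table = ee.FeatureCollection(monthly.map(reduce_month).flatten())
    # Keep only these properties; select() drops the geometry by default.
    return table.select(["GID_2", "NAME_1", "NAME_2", "month", "mean"])


def export_table_to_drive(table, description="chirps_monthly_by_district"):
    """Start an Earth Engine export of `table` as a CSV file in Google Drive."""
    import ee

    task = ee.batch.Export.table.toDrive(
        collection=table, description=description, fileFormat="CSV"
    )
    task.start()
    return task
```

After converting the shapefile, a call such as `export_table_to_drive(monthly_rainfall_by_district(districts))` would start the export, and `task.status()` reports its progress. If "monthly average" means the long-term mean of each calendar month (e.g. all Januaries 2004-2021), the exported rows can be averaged by district and month-of-year afterwards.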
    

    print(rainfall_reduced) outputs the following (only the tail is included here; the full output is about a million lines because it contains every district's polygon coordinates):

    ...
          [-56.286949156999924, -24.767101287999935],
          [-56.28482055699993, -24.76206016599997],
          [-56.28269958499993, -24.757202148999852],
          [-56.28142547599998, -24.754322050999917],
          [-56.28020477399997, -24.751232146999882],
          [-56.28010559099994, -24.751117705999945]]]},
       'id': '217',
       'properties': {'CC_2': 'NA',
        'COUNTRY': 'Paraguay',
        'ENGTYPE_2': 'District',
        'GID_0': 'PRY',
        'GID_1': 'PRY.18_1',
        'GID_2': 'PRY.18.15_1',
        'HASC_2': 'PY.SP.YN',
        'NAME_1': 'San Pedro',
        'NAME_2': 'Yataity del Norte',
        'NL_NAME_1': 'NA',
        'NL_NAME_2': 'NA',
        'TYPE_2': 'Distrito',
        'VARNAME_2': 'NA',
        'mean': 0}}]}
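Since the getInfo() result is a plain Python dict in GeoJSON FeatureCollection form, its properties can be written to CSV with the standard library. A minimal sketch; feature_collection_to_csv() is a hypothetical helper, and the column names should match whatever properties the result actually carries (GID_2, NAME_2 and mean appear in the output above).

```python
import csv
import io


def feature_collection_to_csv(fc_dict, columns):
    """Turn a getInfo()'d FeatureCollection dict into CSV text.

    Only the listed properties are written; geometries are ignored and
    missing properties become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for feature in fc_dict["features"]:
        writer.writerow(feature["properties"])
    return buffer.getvalue()
```

The returned text can be written to a file in Colab (any path, e.g. a hypothetical /content/rainfall.csv, opened with newline='') and fetched with files.download() from google.colab. This still pulls every geometry through getInfo(), so it suits small results; for the full 2004-2021 series, a server-side table export to Drive is the better fit.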
    

    Note: I had to limit the date range (i.e. move startDate to 2020) because it's a lot of data, and printing the result triggers this error/warning message:

    IOPub data rate exceeded.
    The notebook server will temporarily stop sending output
    to the client in order to avoid crashing it.
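Because the printed dict contains every polygon vertex, printing it is the likely trigger of this message; with mosaic() the result has one feature per district whatever the date range. A hedged alternative is to write the getInfo() result to a file and print only a short preview. save_and_summarize() and its parameters are hypothetical names chosen for this sketch.

```python
import json


def save_and_summarize(result, path, keys=("NAME_2", "mean"), n=3):
    """Write a getInfo() result to a JSON file and return (count, preview).

    The preview holds only `keys` from the first `n` features, so it is small
    enough to print without flooding the notebook output.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    features = result.get("features", [])
    preview = [
        {k: feat["properties"][k] for k in keys if k in feat["properties"]}
        for feat in features[:n]
    ]
    return len(features), preview
```

For example, `count, preview = save_and_summarize(rainfall_reduced, "/content/rainfall_reduced.json")` followed by `print(count, preview)` prints a few short lines instead of the full structure; the path is only an illustration.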