pythonwatershedgrass

How to calculate unnested watersheds in GRASS GIS?


I am running into a few issues using the GRASS GIS module r.accumulate while running it in Python. I use the module to calculate sub watersheds for over 7000 measurement points. Unfortunately, the output of the algorithm is nested. So all sub watersheds are overlapping each other. Running the r.accumulate sub watershed module takes roughly 2 minutes for either one or multiple points, I assume the bottleneck is loading the direction raster.

I was wondering if there is an unnested variant in GRASS GIS available and if not, how to overcome the bottleneck of loading the direction raster every time you call the module accumulate. Below is a code snippet of what I have tried so far (resulting in a nested variant):

locations = VectorTopo('locations',mapset='PERMANENT')
    locations.open('r')
    points=[]
    for i in range(len(locations)):
        points.append(locations.read(i+1).coords())
    for j in range(0,len(points),255):
        output = "watershed_batch_{}@Watersheds".format(j)
        gs.run_command("r.accumulate", direction='direction@PERMANENT', subwatershed=output,overwrite=True, flags = "r", coordinates = points[j:j+255])
        gs.run_command('r.stats', flags="ac", input=output, output="stat_batch_{}.csv".format(j),overwrite=True)

Any thoughts or ideas are very welcome.


Solution

  • The answer provided via email was very usefull. To share the answer I have provided the code below to do an unnested basin subwatershed calculation. A small remark: I had to feed the coordinates in batches as the list of coordinates exceeded the max length of characters windows could handle.

    Thanks to @Huidae Cho, the call to R.accumulate to calculate subwatersheds and longest flow path can now be done in one call instead of two seperate calls.

    The output are unnested basins. Where the largers subwatersheds are seperated from the smaller subbasins instead of being clipped up into the smaller basins. This had to with the fact that the output is the raster format, where each cell can only represent one basin.

    gs.run_command('g.mapset',mapset='Watersheds')
    gs.run_command('g.region', rast='direction@PERMANENT')
    StationIds = list(gs.vector.vector_db_select('locations_snapped_new', columns = 'StationId')["values"].values())
    XY = list(gs.vector.vector_db_select('locations_snapped_new', columns = 'x_real,y_real')["values"].values())
    
    for j in range(0,len(XY),255):
        output_ws = "watershed_batch_{}@Watersheds".format(j)
        output_lfp = "lfp_batch_{}@Watersheds".format(j)
        output_lfp_unique = "lfp_unique_batch_{}@Watersheds".format(j)
        gs.run_command("r.accumulate", direction='direction@PERMANENT', subwatershed=output_ws, flags = "ar", coordinates = XY[j:j+255],lfp=output_lfp, id=StationIds[j:j+255], id_column="id",overwrite=True)
        gs.run_command("r.to.vect", input=output_ws, output=output_ws, type="area", overwrite=True)
        gs.run_command("v.extract", input=output_lfp, where="1 order by id", output=output_lfp_unique,overwrite=True)
        
    

    To export the unique watersheds I used the following code. I had to transform the longest_flow_path to point as some of the longest_flow_paths intersected with the corner boundary of the watershed next to it. Some longest flow paths were thus not fully within the subwatershed. See image below where the red line (longest flow path) touches the subwatershed boundary: enter image description here

    gs.run_command('g.mapset',mapset='Watersheds')
    lfps= gs.list_grouped('vect', pattern='lfp_unique_*')['Watersheds']
    ws= gs.list_grouped('vect', pattern='watershed_batch*')['Watersheds']
    files=np.stack((lfps,ws)).T
    #print(files)
    for file in files:
        print(file)
        ids = list(gs.vector.vector_db_select(file[0],columns="id")["values"].values())
        for idx in ids:
            idx=int(idx[0])
            expr = f'id="{idx}"'
            gs.run_command('v.extract',input=file[0], where=expr, output="tmp_lfp",overwrite=True)
            gs.run_command("v.to.points", input="tmp_lfp", output="tmp_lfp_points", use="vertex", overwrite=True)
            gs.run_command('v.select', ainput= file[1], binput = "tmp_lfp_points", output="tmp_subwatersheds", overwrite=True)
            gs.run_command('v.db.update', map = "tmp_subwatersheds",col= "value", value=idx)
            gs.run_command('g.mapset',mapset='vector_out')
            gs.run_command('v.dissolve',input= "tmp_subwatersheds@Watersheds", output="subwatersheds_{}".format(idx),col="value",overwrite=True)
            gs.run_command('g.mapset',mapset='Watersheds')
            gs.run_command("g.remove", flags="f", type="vector",name="tmp_lfp,tmp_subwatersheds")
    

    I ended up with a vector for each subwatershed