numpy numpy-ndarray geopandas shapely rasterio

*Quickly* extract line of pixel values from rasterio rasterize() ndarray


Problem: given a set of buildings from a geopandas file and a list of lines, calculate the fraction of lines that don't intersect any building (number of non-intersecting lines / total number of lines). I'm currently investigating using rasterio rasterize() for this.

In the images there are 2 industry areas in red and green and buildings potentially blocking line of sight shown in blue. We are making lines from every metre around the red area perimeter to every metre around the green area, and checking whether each line has line of sight (i.e. the line does not intersect a building). There are more than 100,000 "industry pairs" for each city being investigated.

Question: Is there a method (like rasterio.transform.rowcol()) that if given a list of xs and list of ys (or rows, cols) will return the values of all the pixels making up the line from the raster?

I have selected the features (buildings) from a geopandas file and created the raster using rasterio.features.rasterize().

I will need to do this for millions of lines, so I'm looking for a fast way to do this.

Note: I have seen these similar questions:

"Extract pixel values form a multispectral (B1-B6) raster image using a shape file mask": It looks like each call returns an ndarray the size of the original raster. I tried something like this, but using np.any() on these large arrays (3 GB or larger) was slow. (I only need the values of the pixels along the lines.)

"How to extract a profile of value from a raster along a given line?": From this answer, it seems there is no single (fast) rasterio method that returns the values of the pixels along a line in one call, rather than making one function call per pixel. Is this correct?

Update:

Maybe a (contrived) concrete example will help. rasterio.features.rasterize() returns an ndarray. Let's say it looks like this:

>>> a
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Then if I use rasterio rowcol() to convert xy points to row/col locations, I will get something like this:

>>> rows
array([7, 8])
>>> cols
array([1, 2])

Note: rows and cols could contain thousands of values.

I am hoping there is a fancy numpy slicing/indexing method that would avoid me writing a loop like this

>>> v = numpy.empty(len(rows), dtype='uint8')
>>> for i in range(len(rows)):
...     v[i] = a[rows[i], cols[i]]
...
>>> v
array([71, 82], dtype=uint8)

Solution

  • I am hoping there is a fancy numpy slicing/indexing method that would avoid me writing a loop like this

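    You can index into an array using arrays of row and column coordinates (NumPy calls this integer array indexing); element i of the result is a[rows[i], cols[i]]: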

    >>> a[rows, cols]
    array([71, 82])
    

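    This is typically much faster than a Python-level loop, often on the order of 100x for a large number of iterations, because the element lookups run in compiled code rather than in the interpreter.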
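As a rough illustration of how this indexing could feed the line-of-sight test from the question, here is a minimal sketch. The helpers `line_pixels` and `has_line_of_sight` are hypothetical names, not rasterio or NumPy functions, and the sketch assumes the raster holds 0 for empty ground and non-zero for buildings. Sampling with `np.linspace` and rounding is only an approximation of the pixels a line crosses, so a line that clips the corner of a building pixel may be missed.

```python
import numpy as np

# Toy raster matching the 10x10 example in the question.
a = np.arange(100).reshape(10, 10)
rows = np.array([7, 8])
cols = np.array([1, 2])

# Integer array indexing pairs rows[i] with cols[i], like the loop does.
values = a[rows, cols]


def line_pixels(r0, c0, r1, c1):
    """Hypothetical helper: approximate pixel indices along a segment.

    Takes one sample per pixel step along the longer axis and rounds to
    the nearest pixel. This is an assumption made for illustration, not
    an exact line rasterization.
    """
    n = int(max(abs(r1 - r0), abs(c1 - c0))) + 1
    rr = np.rint(np.linspace(r0, r1, n)).astype(np.intp)
    cc = np.rint(np.linspace(c0, c1, n)).astype(np.intp)
    return rr, cc


def has_line_of_sight(raster, r0, c0, r1, c1):
    """Return True if no sampled pixel along the segment is non-zero."""
    rr, cc = line_pixels(r0, c0, r1, c1)
    # Only the pixels along the line are read, not the whole raster.
    return not raster[rr, cc].any()


# Tiny burned-in raster: a single "building" pixel at row 2, column 2.
buildings = np.zeros((5, 5), dtype=np.uint8)
buildings[2, 2] = 1

diag_clear = has_line_of_sight(buildings, 0, 0, 4, 4)  # passes through (2, 2)
top_clear = has_line_of_sight(buildings, 0, 0, 0, 4)   # stays on the top row
```

Because only the sampled pixels are gathered, `.any()` runs on a short 1-D array per line instead of on a raster-sized mask. The start and end indices are assumed to lie inside the raster; negative indices would silently wrap around, so out-of-bounds endpoints should be clipped or rejected first.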