I am using mahotas library to do texture analysis (GLCM) on a satellite image (250 x 200 pixels). GLCM calculations are carried out within a window size. So, for two adjacent positions of the sliding window we need to compute two co-occurrence matrices from scratch. I have read that I can set step size as well in order to avoid computing GLCM on overlapping areas. I have provided the code below.
#Compute haralick features
def haralick_feature(image):
haralick = mahotas.features.haralick(image, True)
return haralick
img = 'SAR_image.tif'
win_s=32 #window size
step=32 #step size
rows = img.shape[0]
cols = img.shape[1]
array = np.zeros((rows,cols), dtype= object)
harList = []
for i in range(0, rows-win_s-1, step):
print 'Row number: ', r
for j in range(0, cols-win_s-1, step):
harList.append(haralick_feature(image))
harImages = np.array(harList)
harImages_mean = harImages.mean(axis=1)
For the code above, I have set the window and step size to 32. When the code finishes, I get an image with dimensions 6 x 8 (instead of 250 x 200), it makes sense as the step size has been set to 32.
So, my question is: By setting a step size (to avoid computation in overlapping regions as well as the code gets faster) can I somehow derive the GLCM results for the whole image with dimensions 250 x 200 instead of having a subset of it (6 x 8 dimensions)? Or I have no option but looping over the image with the normal way (without setting a step size)?
You cannot do that using mahotas
as the function which computes the co-occurrence map is not available in this library. An alternative approach to the extraction of texture features from the GLCM would consist in utilizing skimage.feature.graycoprops
(check out this thread for details).
But if you wish to stick to mahotas, you should try using skimage.util.view_as_windows
rather than a sliding window as it could speed up the scanning across the image. Please, be sure to read the caveat about memory usage at the end of the documentation. If using view_as_windows
is an affordable approach for you, the following code gets the job done:
import numpy as np
from skimage import io, util
import mahotas.features.texture as mht
def haralick_features(img, win, d):
win_sz = 2*win + 1
window_shape = (win_sz, win_sz)
arr = np.pad(img, win, mode='reflect')
windows = util.view_as_windows(arr, window_shape)
Nd = len(d)
feats = np.zeros(shape=windows.shape[:2] + (Nd, 4, 13), dtype=np.float64)
for m in xrange(windows.shape[0]):
for n in xrange(windows.shape[1]):
for i, di in enumerate(d):
w = windows[m, n, :, :]
feats[m, n, i, :, :] = mht.haralick(w, distance=di)
return feats.reshape(feats.shape[:2] + (-1,))
DEMO
For the sample run below I have set win
to 19
which corresponds to a window of shape (39, 39)
. I have considered two different distances. Notice that mht.haralick
yields 13 GLCM features for four directions. Altogether this results in a 104-dimensional feature vector for each pixel. When applied to a (250, 200)
pixels crop from a Landsat image, feature extraction finishes in around 7 minutes.
In [171]: img = io.imread('landsat_crop.tif')
In [172]: img.shape
Out[172]: (250L, 200L)
In [173]: win = 19
In [174]: d = (1, 2)
In [175]: %time feature_map = haralick_features(img, win, d)
Wall time: 7min 4s
In [176]: feature_map.shape
Out[176]: (250L, 200L, 104L)
In [177]: feature_map[0, 0, :]
Out[177]:
array([ 8.19278030e-03, 1.30863698e+01, 7.64234582e-01, ...,
3.59561817e+00, -1.35383606e-01, 8.32570045e-01])
In [178]: io.imshow(img)
Out[178]: <matplotlib.image.AxesImage at 0xc5a9b38>