I am working with an image of size 512x512. The image is divided into patches using einops with a patch size of 32. The number of patches overall is 256; in other words, we get a new "image" of size 256x1024.
Since this image is actually a mask for a segmentation problem, it is comprised of only 4 values (4 classes): 0 for the background, 1 for the first class, 2 for the second class, and 3 for the third class.
My goal is to take every patch and compute, for every class C, the following: the number of pixels labeled C in this patch divided by the total number of pixels in the patch.
This should give me an array of size 4 per patch, where the first entry is the number of background pixels (labeled 0) over the total number of pixels in the patch (1024), and the second, third and fourth entries are the same but for the corresponding class.
In theory, I know that I need to iterate over every single patch, count how many pixels of each class exist in the current patch, and then divide by 1024. Doing this for all 256 patches yields exactly what I want. The problem is that I have a (very) large number of images to process, and the size of 512 is just an example to keep the question simple, so a for loop is out of the question.
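To make that concrete, the loop version I want to avoid would look roughly like this (just a sketch; per_patch_fractions_loop and its arguments are illustrative names, not code I actually use):

import numpy as np

def per_patch_fractions_loop(patches, num_classes=4):
    # patches: array of shape (num_patches, pixels_per_patch), e.g. (256, 1024)
    out = np.zeros((patches.shape[0], num_classes))
    for i, patch in enumerate(patches):       # one iteration per patch
        for c in range(num_classes):          # count the pixels of each class
            out[i, c] = np.count_nonzero(patch == c)
    return out / patches.shape[1]             # divide by the pixels per patch (1024)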
I know that I can get the result I want using NumPy. I tried both numpy.apply_over_axes and numpy.apply_along_axis, but I don't know which one is better suited for this task; there is also numpy.where, which I am not sure how to apply here.
Here is what I did:
from einops import rearrange
import numpy as np

labn = np.random.randint(4, size=(512, 512))  # every pixel in this image has value 0, 1, 2 or 3
to_patch = rearrange(labn, "(h p1) (w p2) -> (h w) (p1 p2)", p1=32, p2=32)
print(to_patch.shape)  # (256, 1024)

c0 = np.full(1024, 0)
c1 = np.full(1024, 1)
c2 = np.full(1024, 2)
c3 = np.full(1024, 3)

def f(a):
    # boolean masks marking which pixels of the patch belong to each class
    _c0 = a == c0
    _c1 = a == c1
    _c2 = a == c2
    _c3 = a == c3
    # count the pixels per class and normalize by the patch size
    pr = np.array([np.sum(_c0), np.sum(_c1), np.sum(_c2), np.sum(_c3)]) / 1024
    return pr

resf = np.apply_along_axis(f, 1, to_patch)
print(resf.shape)  # (256, 4)
Two things: I expect a result of size 256x4 where every array along the second axis sums to one, and I would like to know which NumPy function is best suited to get it. EDIT: I had forgotten the np.sum calls inside f; with the sums added (as shown above), I now do get 256x4.
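For reference, a fully broadcast NumPy variant (just a sketch, assuming exactly 4 classes) would avoid apply_along_axis entirely:

import numpy as np
from einops import rearrange

labn = np.random.randint(4, size=(512, 512))
to_patch = rearrange(labn, "(h p1) (w p2) -> (h w) (p1 p2)", p1=32, p2=32)

# compare every pixel against every class id in one broadcasted step,
# giving a (256, 1024, 4) boolean array, then average over the pixel axis
resf = (to_patch[:, :, None] == np.arange(4)).mean(axis=1)
print(resf.shape)        # (256, 4)
print(resf.sum(axis=1))  # every row sums to 1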
There is a built-in function to count occurrences called torch.histc; it is similar to Python's collections.Counter.
torch.histc(input, bins=100, min=0, max=0, *, out=None) → Tensor
Computes the histogram of a tensor. The elements are sorted into equal-width bins between min and max. If min and max are both zero, the minimum and maximum values of the data are used. Elements lower than min and higher than max are ignored.
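A minimal illustration of the bins/min/max arguments (the tensor values here are arbitrary): with bins equal to the number of classes and the range set to the class ids, each bin collects exactly one class.

import torch

t = torch.tensor([0., 1., 1., 2., 3., 3., 3.])   # a small float tensor of class ids
# 4 equal-width bins spanning the range [0, 3] -> one bin per class
print(torch.histc(t, bins=4, min=0, max=3))      # tensor([1., 2., 1., 3.])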
You need to specify the number of bins, here the number of classes C, as well as the min and max values of the range. Also, it does not work per dimension on multi-dimensional tensors: the resulting tensor contains global statistics of the whole input regardless of its shape. As a possible workaround, you can iterate through your patches, calling torch.histc each time, then stack the results and normalize:
resf = torch.stack([torch.histc(patch, C, min=0, max=C-1) for patch in x]) / x.size(1)
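Putting it together with the patching from the question (a sketch: the random labn mask, the cast to float and C = 4 are placeholders for your data), this yields the 256x4 array of per-class fractions:

import torch
from einops import rearrange

C = 4                                            # number of classes (assumed)
labn = torch.randint(0, C, (512, 512)).float()   # histc is applied to a float tensor here
x = rearrange(labn, "(h p1) (w p2) -> (h w) (p1 p2)", p1=32, p2=32)  # (256, 1024)

# one histogram per patch, then normalize by the 1024 pixels per patch
resf = torch.stack([torch.histc(patch, C, min=0, max=C - 1) for patch in x]) / x.size(1)

print(resf.shape)       # torch.Size([256, 4])
print(resf.sum(dim=1))  # every row sums to 1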