Question: Given data in the format I describe below, what is the most efficient representation to make efficient queries to which block a point belongs to.
I want to parse the coding blocks of a video frame into some data structure to quickly query to which block some pixel position belongs to. The only information I have is an 2D array where each entry (x,y) tells me the size of the block to which the block (4*x
, 4*y
) with size 4 x 4 belongs to (i.e. the array is the original frame downsampled by factor of 4 x 4). So for each position, I now what is the block to which this block belongs to, but I don't know, where the block starts (the information is extracted from the libaom
library). The final blocks were generated by a video encoder, using recursive splitting of superblocks (blocks with fixed size 64x64 or 128x128). The splitting can be any of the 10 types AV1 specifies:
What I tried, was to decompose the problem into two steps:
For the first one, I thought about doing a DFS and always add the 3 corner points of the current block as the next potential starting positions for some blocks. But this turned out to be wrong, since I could add a small block instead of the correct large one in some cases. A similar thing occurs if I use BFS.
What I then did was to use a priority queue based BFS, where I always select blocks first by their lowest y-coordinate and if they're equal, by the lowest x-coordinate. However, when I do this, I sometimes get partitions which are impossible (the problem in the following image is that the partition inside the marked region produces overlapping regions, since I draw their outline):
When I use the Manhattan distance, everything seems to work fine:
However, I don't understand why this happens and don't want to have a solution that only works on the partitions I manually inspect.
Maybe someone could provide more insight into this?
For the point query, I came up with 2 solutions, which aren't optimal.
In the following let n
be the number of blocks in the frame.
I tried to come up with some tree-based data structure like quadtrees or segment trees but had problems with the non-square splits and non-rectangular regions.
I'd be glad if someone might have any better idea on how to solve this problem!
Here you can find data in JSON format for a sample frame. Every entry in the blockSize array tells what is the block size at position (x,y) is for the 4x4 block starting at (x,y). So (0,0) tells that the block (0-4) x (0-4) belongs to a block with size identifier 6, which according to the blockSizeMap is a block of size 16x16. Therefore, we know that at (0,0) starts a block with this size.
For Problem 2, preprocess blocks of the same type (height, width) into an array of x-coordinates. For each block with coordinates (x, y), insert x into the array at index y.
For the query part, go through each block type (there are only 22 types) and, for each type, consider every y-coordinate that might contain the query point (at most 128 values/the block height). For each valid y-coordinate, perform a binary search to find the largest x-coordinate that is less than or equal to the query's x-coordinate . If it exist, check if the block at that location contains the query point.
This is like O(22 * 128 * log(frameWidth)) and might only be worth it if frameHeight can be very big