NCS AsyncInferQueue returns previous results instead of true results for the specific inference

We have been working with the NCS2 for many months now, and recently ran into some very bizarre behavior. I've included the full script as a minimal reproducible example. Before that, though, here is our setup:

Raspberry Pi 4B+, running Raspbian GNU/Linux 11 (bullseye)
python3 --version is Python 3.9.2
openvino build from 2022.1.1

Behavior:

We are running code that takes a batch of n images, processes them asynchronously (we found best performance by running this way), and then returns the batch. See syn below.

We expected 16 different results, but for some reason, we seem to get the result for the image index mod the number of jobs of the AsyncInferQueue. For the case of jobs=1 below, the results for all images are the same as the first result (but note: userdata is unique, so the AsyncInferQueue is giving the callback a unique value for userdata).

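import operator as op
from collections import namedtuple
from typing import Any, List

import numpy as np
import zarr
from openvino.runtime import AsyncInferQueue, InferRequest

# compiled_model is a CompiledModel already loaded on the NCS2 (setup not shown)
_temp_infer_queue = AsyncInferQueue(compiled_model, jobs=1)
AsyncInferenceResult = namedtuple("AsyncInferenceResult", ["id", "result"])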

def syn(input_imgs, sort = False):
    res: List[AsyncInferenceResult] = []

    def _cb(
        infer_request: InferRequest, userdata: Any
    ) -> None:
        res.append(
            AsyncInferenceResult(
                id=userdata, result=infer_request.output_tensors[0].data[:]
                # also tried the following:
                # id=userdata, result=infer_request.get_output_tensor(0).data
            )
        )

    _temp_infer_queue.set_callback(_cb)

    for i, image in enumerate(input_imgs):
        tensor = np.expand_dims(image, (0, 3))
        # if all tensors were the same, their sum would be the same
        # easy way to verify that each image is unique
        print("TENSOR SUM", tensor.sum())
        _temp_infer_queue.start_async({0: tensor}, userdata=i)

    _temp_infer_queue.wait_all()

    for r1 in res:
        print(r1)

    print("---------------------------")
    if sort:
        return [r.result for r in sorted(res, key=op.attrgetter("id"))]
    return res

data = zarr.open("../../../allan/2023-03-03-135043__nomaxnoflowcontrol2.zip")

# yield_n will give n samples from an iterator - in this case,
# it will give [0,1,2,3], then [4,5,6,7], etc
for index_batch in yield_n(range(data.initialized), 4):
    images = [data[:, :, i] for i in index_batch]
    syn(images, sort=True)
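
The yield_n helper is not shown in the script above. For anyone trying to reproduce this, here is a minimal sketch of what such a helper might look like, based only on the comment describing it. The implementation is an assumption, not the original code; in particular, how the original treats a trailing partial batch is unknown, and this version simply yields it.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def yield_n(iterable: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to n items (hypothetical stand-in)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # take the next n items, fewer at the end
        if not chunk:
            return
        yield chunk


batches = list(yield_n(range(10), 4))
# batches == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```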

Expected result: unique values for the results, since we are running inference on unique images

TENSOR SUM 181712885                                                   
TENSOR SUM 182752565                                                   
TENSOR SUM 182640761                                                   
TENSOR SUM 182361927                                                   
AsyncInferenceResult(id=0, result=array([[3.1972656]], dtype=float32)) 
AsyncInferenceResult(id=1, result=array([[2.3463234]], dtype=float32)) 
AsyncInferenceResult(id=2, result=array([[-1.345323]], dtype=float32)) 
AsyncInferenceResult(id=3, result=array([[3.0023452]], dtype=float32)) 
---------------------------                                            
TENSOR SUM 182579212                                                   
TENSOR SUM 182199813                                                   
TENSOR SUM 180750311                                                   
TENSOR SUM 180896550                                                   
AsyncInferenceResult(id=0, result=array([[1.2942656]], dtype=float32)) 
AsyncInferenceResult(id=1, result=array([[1.3351234]], dtype=float32)) 
AsyncInferenceResult(id=2, result=array([[2.3451223]], dtype=float32)) 
AsyncInferenceResult(id=3, result=array([[0.0345552]], dtype=float32))
---------------------------      
...etc

Actual result: every result within a batch is the same

TENSOR SUM 181712885                                                    
TENSOR SUM 182752565                                                    
TENSOR SUM 182640761                                                    
TENSOR SUM 182361927                                                    
AsyncInferenceResult(id=0, result=array([[3.1972656]], dtype=float32))  
AsyncInferenceResult(id=1, result=array([[3.1972656]], dtype=float32))  
AsyncInferenceResult(id=2, result=array([[3.1972656]], dtype=float32))  
AsyncInferenceResult(id=3, result=array([[3.1972656]], dtype=float32))  
---------------------------                                             
TENSOR SUM 182579212                                                    
TENSOR SUM 182199813                                                    
TENSOR SUM 180750311                                                    
TENSOR SUM 180896550                                                    
AsyncInferenceResult(id=0, result=array([[2.6289062]], dtype=float32))  
AsyncInferenceResult(id=1, result=array([[2.6289062]], dtype=float32))  
AsyncInferenceResult(id=2, result=array([[2.6289062]], dtype=float32))  
AsyncInferenceResult(id=3, result=array([[2.6289062]], dtype=float32))  
---------------------------     
...etc

And when we set the number of jobs for the AsyncInferQueue to 2, the values repeat with a period of 2 (the result for image i matches the result for image i mod the number of jobs):

TENSOR SUM 181508284                                                    
TENSOR SUM 182244105                                                    
TENSOR SUM 181800558                                                    
TENSOR SUM 182178069                                                    
AsyncInferenceResult(id=0, result=array([[4.4921875]], dtype=float32))  
AsyncInferenceResult(id=1, result=array([[3.3867188]], dtype=float32))  
AsyncInferenceResult(id=2, result=array([[4.4921875]], dtype=float32))  
AsyncInferenceResult(id=3, result=array([[3.3867188]], dtype=float32))  
---------------------------                                             
TENSOR SUM 181820857                                                    
TENSOR SUM 181130636                                                    
TENSOR SUM 181852573                                                    
TENSOR SUM 181331641                                                    
AsyncInferenceResult(id=0, result=array([[2.3867188]], dtype=float32))  
AsyncInferenceResult(id=1, result=array([[2.9765625]], dtype=float32))  
AsyncInferenceResult(id=2, result=array([[2.3867188]], dtype=float32))  
AsyncInferenceResult(id=3, result=array([[2.9765625]], dtype=float32))  
---------------------------                                
...etc

So what is going on? Am I doing something wrong? I tried to follow the docs as well as possible (though this isn't easy: the docs can be a little sparse, and searching for them turns up old versions of OpenVINO, etc.). And if I am doing something wrong here, this seems like an easy trap to fall into. Shouldn't there be a loud failure somewhere?

We have been working with the NCS2 for many months now, so we hope it is an easy fix.

Let me know what needs clarification. I am really hoping for some help here!

Thank you in advance! :)

Solution

The issue originates from this portion of your Python script:

def _cb(
    infer_request: InferRequest, userdata: Any
) -> None:
    res.append(
        AsyncInferenceResult(
            id=userdata, result=infer_request.output_tensors[0].data[:]
            # also tried the following:
            # id=userdata, result=infer_request.get_output_tensor(0).data
        )
    )

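Your result list ends up reading the final value left in the output tensor instead of the value from each individual inference. Note that infer_request.output_tensors[0].data[:] does not copy anything: slicing a NumPy array with [:] returns a view, so every stored result can end up pointing at the same underlying buffer.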
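
To illustrate how this kind of aliasing could produce the "mod the number of jobs" pattern from the question, here is a small NumPy-only sketch. It does not use OpenVINO. The run function, the per-job buffers, and the round-robin assignment are all assumptions made for illustration, and whether the NCS2 plugin behaves exactly this way is not confirmed here.

```python
import numpy as np


def run(jobs, inputs, snapshot):
    """Simulate a queue with one reusable output buffer per job (hypothetical)."""
    buffers = [np.zeros((1, 1), dtype=np.float32) for _ in range(jobs)]
    collected = []
    for i, x in enumerate(inputs):
        buf = buffers[i % jobs]  # round-robin assignment, for simplicity
        buf[...] = x             # the "inference" overwrites the buffer in place
        # buf[:] is a view on the same memory; buf.copy() is an independent array
        collected.append(buf.copy() if snapshot else buf[:])
    return [float(r[0, 0]) for r in collected]


inputs = [1.0, 2.0, 3.0, 4.0]
aliased_1 = run(1, inputs, snapshot=False)  # [4.0, 4.0, 4.0, 4.0]
aliased_2 = run(2, inputs, snapshot=False)  # [3.0, 4.0, 3.0, 4.0]
copied = run(2, inputs, snapshot=True)      # [1.0, 2.0, 3.0, 4.0]
```

With views, each stored result shows whatever its buffer held last, so the values repeat once per job. With copies, each result keeps the value it had when it was stored.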

Edit:

The correct way is to use next(iter(infer_request.results.values())) instead of infer_request.output_tensors[0].data[:] when appending the results to your list. This is a tried and tested method, based on our Image Classification Async Python Sample.
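
As a rough sketch, the callback could be restructured along these lines. The make_callback wrapper and the FakeInferRequest class are hypothetical names introduced only so the snippet runs without a device; the extra np.array(..., copy=True) is a defensive assumption on top of the suggestion above, not something the sample requires.

```python
from collections import namedtuple
from typing import Any, List

import numpy as np

AsyncInferenceResult = namedtuple("AsyncInferenceResult", ["id", "result"])


def make_callback(res: List[AsyncInferenceResult]):
    def _cb(infer_request: Any, userdata: Any) -> None:
        # Take the first output from the request's results mapping
        first_output = next(iter(infer_request.results.values()))
        # Defensive copy, so later inferences cannot change the stored value
        res.append(
            AsyncInferenceResult(
                id=userdata, result=np.array(first_output, copy=True)
            )
        )
    return _cb


class FakeInferRequest:
    """Hypothetical stand-in that reuses one output buffer (worst case)."""

    def __init__(self) -> None:
        self._buf = np.zeros((1, 1), dtype=np.float32)

    def infer(self, value: float) -> None:
        self._buf[...] = value  # overwrite the output in place

    @property
    def results(self):
        return {"output": self._buf}


res: List[AsyncInferenceResult] = []
cb = make_callback(res)
req = FakeInferRequest()
for i, v in enumerate([1.0, 2.0, 3.0]):
    req.infer(v)
    cb(req, i)
values = [float(r.result[0, 0]) for r in res]  # [1.0, 2.0, 3.0]
```

With a real queue, the same callback would be registered via _temp_infer_queue.set_callback(make_callback(res)).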

Here is the result when using next(iter(infer_request.results.values())) :