We have been working with the NCS2 for many months now, and have found very bizarre behavior recently. I've included the full script for a minimum reproducible program. Before that, though, here are the install conditions:
Raspberry Pi 4B+, running Raspbian GNU/Linux 11 (bullseye)
python3 --version reports Python 3.9.2
OpenVINO build 2022.1.1
Behavior:
We are running code that takes a batch of n images, processes them asynchronously (we found the best performance by running this way), and then returns the batch. See syn below.
We expected 16 different results, but for some reason we seem to get the result for the image index mod the number of jobs of the async infer queue. For the case of jobs=1 below, the results for all images are the same as the first result (but note: userdata is unique, so the AsyncInferQueue is giving the callback a unique value for userdata).
_temp_infer_queue = AsyncInferQueue(compiled_model, jobs=1)
AsyncInferenceResult = namedtuple("AsyncInferenceResult", ["id", "result"])

def syn(input_imgs, sort=False):
    res: List[AsyncInferenceResult] = []

    def _cb(
        infer_request: InferRequest, userdata: Any
    ) -> None:
        res.append(
            AsyncInferenceResult(
                id=userdata, result=infer_request.output_tensors[0].data[:]
                # also tried the following:
                # id=userdata, result=infer_request.get_output_tensor(0).data
            )
        )

    _temp_infer_queue.set_callback(_cb)
    for i, image in enumerate(input_imgs):
        tensor = np.expand_dims(image, (0, 3))
        # if all tensors were the same, their sum would be the same
        # easy way to verify that each image is unique
        print("TENSOR SUM", tensor.sum())
        _temp_infer_queue.start_async({0: tensor}, userdata=i)
    _temp_infer_queue.wait_all()
    for r1 in res:
        print(r1)
    print("---------------------------")
    if sort:
        return [r.result for r in sorted(res, key=op.attrgetter("id"))]
    return res

data = zarr.open("../../../allan/2023-03-03-135043__nomaxnoflowcontrol2.zip")
# yield_n will give n samples from an iterator - in this case,
# it will give [0,1,2,3], then [4,5,6,7], etc
for index_batch in yield_n(range(data.initialized), 4):
    images = [data[:, :, i] for i in index_batch]
    syn(images, sort=True)
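The yield_n helper is not shown in the script. For anyone trying to reproduce this, here is a minimal sketch of a helper with the behavior described in the comment. The implementation is an assumption: in particular, it yields a final short chunk when the input length is not a multiple of n, which the original helper may or may not do.

```python
from itertools import islice

def yield_n(iterable, n):
    """Yield successive lists of up to n items from iterable (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return  # input exhausted
        yield chunk

batches = list(yield_n(range(10), 4))
print(batches)
```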
Expected result: unique values for the results, since we are running inference on unique images
TENSOR SUM 181712885
TENSOR SUM 182752565
TENSOR SUM 182640761
TENSOR SUM 182361927
AsyncInferenceResult(id=0, result=array([[3.1972656]], dtype=float32))
AsyncInferenceResult(id=1, result=array([[2.3463234]], dtype=float32))
AsyncInferenceResult(id=2, result=array([[-1.345323]], dtype=float32))
AsyncInferenceResult(id=3, result=array([[3.0023452]], dtype=float32))
---------------------------
TENSOR SUM 182579212
TENSOR SUM 182199813
TENSOR SUM 180750311
TENSOR SUM 180896550
AsyncInferenceResult(id=0, result=array([[1.2942656]], dtype=float32))
AsyncInferenceResult(id=1, result=array([[1.3351234]], dtype=float32))
AsyncInferenceResult(id=2, result=array([[2.3451223]], dtype=float32))
AsyncInferenceResult(id=3, result=array([[0.0345552]], dtype=float32))
---------------------------
...etc
Actual Result: every result from inference is the same
TENSOR SUM 181712885
TENSOR SUM 182752565
TENSOR SUM 182640761
TENSOR SUM 182361927
AsyncInferenceResult(id=0, result=array([[3.1972656]], dtype=float32))
AsyncInferenceResult(id=1, result=array([[3.1972656]], dtype=float32))
AsyncInferenceResult(id=2, result=array([[3.1972656]], dtype=float32))
AsyncInferenceResult(id=3, result=array([[3.1972656]], dtype=float32))
---------------------------
TENSOR SUM 182579212
TENSOR SUM 182199813
TENSOR SUM 180750311
TENSOR SUM 180896550
AsyncInferenceResult(id=0, result=array([[2.6289062]], dtype=float32))
AsyncInferenceResult(id=1, result=array([[2.6289062]], dtype=float32))
AsyncInferenceResult(id=2, result=array([[2.6289062]], dtype=float32))
AsyncInferenceResult(id=3, result=array([[2.6289062]], dtype=float32))
---------------------------
...etc
And when we set the number of jobs for the AsyncInferQueue to 2, the same values repeat with period 2 (result index mod the number of jobs):
TENSOR SUM 181508284
TENSOR SUM 182244105
TENSOR SUM 181800558
TENSOR SUM 182178069
AsyncInferenceResult(id=0, result=array([[4.4921875]], dtype=float32))
AsyncInferenceResult(id=1, result=array([[3.3867188]], dtype=float32))
AsyncInferenceResult(id=2, result=array([[4.4921875]], dtype=float32))
AsyncInferenceResult(id=3, result=array([[3.3867188]], dtype=float32))
---------------------------
TENSOR SUM 181820857
TENSOR SUM 181130636
TENSOR SUM 181852573
TENSOR SUM 181331641
AsyncInferenceResult(id=0, result=array([[2.3867188]], dtype=float32))
AsyncInferenceResult(id=1, result=array([[2.9765625]], dtype=float32))
AsyncInferenceResult(id=2, result=array([[2.3867188]], dtype=float32))
AsyncInferenceResult(id=3, result=array([[2.9765625]], dtype=float32))
---------------------------
...etc
So what is going on? Am I doing something wrong? I tried to follow the docs as closely as possible, though this isn't easy: the docs can be a little sparse, and searching for them turns up old versions of OpenVINO, etc. And if I am doing something wrong here, this seems like an easy trap to fall into. Shouldn't there be a loud failure somewhere?
We have been working with the NCS2 for many months now, so we hope it is an easy fix.
Let me know what needs clarification. I am really hoping for some help here!
Thank you in advance! :)
The issue originates from this portion of your Python demo script:
    def _cb(
        infer_request: InferRequest, userdata: Any
    ) -> None:
        res.append(
            AsyncInferenceResult(
                id=userdata, result=infer_request.output_tensors[0].data[:]
                # also tried the following:
                # id=userdata, result=infer_request.get_output_tensor(0).data
            )
        )
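Your result list ends up holding the last value written to each output tensor rather than the value from each individual inference. infer_request.output_tensors[0].data[:] is a NumPy view of the request's output buffer, not a copy, and the AsyncInferQueue reuses its infer requests, so every entry stored from the same request reflects whatever that request produced last. This matches the repetition you see with jobs=1 and jobs=2.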
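To illustrate the aliasing without any OpenVINO dependency, here is a small NumPy-only sketch. run_batch is a hypothetical stand-in for the queue, not OpenVINO code: it reuses a pool of `jobs` buffers in round-robin order (the real queue hands work to whichever request is idle, so the exact order may differ) and stores either a view or a copy of the buffer after each simulated inference.

```python
import numpy as np

def run_batch(values, jobs, copy):
    """Simulate a queue that reuses `jobs` output buffers (hypothetical stand-in)."""
    buffers = [np.zeros((1, 1), dtype=np.float32) for _ in range(jobs)]
    stored = []
    for i, value in enumerate(values):
        buf = buffers[i % jobs]   # the same request, and its buffer, is reused
        buf[...] = value          # the "inference" overwrites the buffer in place
        view = buf[:]             # slicing a NumPy array gives a view, not a copy
        stored.append(view.copy() if copy else view)
    return [float(s[0, 0]) for s in stored]

values = [1.0, 2.0, 3.0, 4.0]
aliased_one_job = run_batch(values, jobs=1, copy=False)
aliased_two_jobs = run_batch(values, jobs=2, copy=False)
copied = run_batch(values, jobs=1, copy=True)
print(aliased_one_job)   # [4.0, 4.0, 4.0, 4.0]
print(aliased_two_jobs)  # [3.0, 4.0, 3.0, 4.0]
print(copied)            # [1.0, 2.0, 3.0, 4.0]
```

Storing views reproduces the "index mod number of jobs" pattern from the question, while storing copies keeps each result distinct.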
Edit:
The correct way would be to use next(iter(infer_request.results.values())) instead of infer_request.output_tensors[0].data[:] to append the results to your list, as it is a tried and tested method based on our Image Classification Async Python Sample.
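A sketch of the callback with this change might look like the following. make_callback is a hypothetical helper name, and the extra np.array(..., copy=True) is a defensive assumption on my side so that the stored value cannot alias a reused buffer. The callback only relies on the request exposing a results mapping, so it is exercised here with a fake request object instead of a real device.

```python
from collections import namedtuple
from types import SimpleNamespace

import numpy as np

AsyncInferenceResult = namedtuple("AsyncInferenceResult", ["id", "result"])

def make_callback(res):
    """Build a callback that appends an independent copy of the first output."""
    def _cb(infer_request, userdata):
        first_output = next(iter(infer_request.results.values()))
        # Explicit copy (assumption: defensive, guards against buffer reuse).
        res.append(
            AsyncInferenceResult(id=userdata, result=np.array(first_output, copy=True))
        )
    return _cb

# Fake request whose single output buffer is overwritten on every "inference".
shared = np.zeros((1, 1), dtype=np.float32)
fake_request = SimpleNamespace(results={"output": shared})

res = []
cb = make_callback(res)
for i, value in enumerate([1.5, 2.5, 3.5]):
    shared[...] = value
    cb(fake_request, i)

collected = [float(r.result[0, 0]) for r in res]
print(collected)
```

In your script this would be registered with _temp_infer_queue.set_callback(make_callback(res)) in place of the current _cb.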
Here is the result when using next(iter(infer_request.results.values())) :