opencv, computer-vision, azure-cognitive-services, microsoft-custom-vision

Identify Parts of the car based on arrow markings


Please excuse the lack of code; I don't think I have got far enough for any of it to be relevant to my question.

I am working on a solution that needs to identify the parts of a vehicle that a customer's drawing points to, and to extract both the text and the part it refers to, as shown in the example image.

I am really new to ML and AI technologies, so I looked at Azure Custom Vision (customvision.ai), which lets me train a model for object identification on a set of images and has a nice REST API to work with. This is partly working: I can pass in an image and it identifies the parts of the car visible in that image.

However, I cannot work out how to determine that "9. BXCU12" is actually pointing to the Bonnet.

Can someone please point me to an example or a suitable approach for solving this problem?


Solution

  • If I understand correctly, you can already identify the parts with your recognition network and also read the text. The link between them is given by the arrows in the image, which you don't yet know how to locate. So the remaining problem is detecting the arrows and their end-points.
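To make the goal concrete: once each arrow's tail and tip are known, the linking itself can be a nearest-box lookup. The following is only a minimal sketch under assumptions that are not in the question: boxes are given as pixel `(x, y, w, h)` tuples, the text label sits near the arrow's tail, the part sits at or near its tip, and all names (`link_arrows`, `point_to_box_distance`, the second label and the coordinates) are made up for illustration.

```python
from math import hypot

def point_to_box_distance(point, box):
    """Distance from (px, py) to an axis-aligned box (x, y, w, h); 0 if inside."""
    px, py = point
    x, y, w, h = box
    dx = max(x - px, 0, px - (x + w))
    dy = max(y - py, 0, py - (y + h))
    return hypot(dx, dy)

def link_arrows(arrows, labels, parts):
    """Pair the label nearest each arrow's tail with the part nearest its tip."""
    links = []
    for tail, tip in arrows:
        label = min(labels, key=lambda item: point_to_box_distance(tail, item["box"]))
        part = min(parts, key=lambda item: point_to_box_distance(tip, item["box"]))
        links.append((label["text"], part["name"]))
    return links

# Hypothetical inputs: text boxes from OCR, part boxes from the detector,
# and (tail, tip) points from whatever arrow detector is used.
labels = [
    {"text": "9. BXCU12", "box": (10, 165, 90, 20)},
    {"text": "4. PLACEHOLDER", "box": (200, 180, 90, 20)},
]
parts = [
    {"name": "Bonnet", "box": (120, 40, 100, 50)},
    {"name": "Front bumper", "box": (230, 100, 60, 30)},
]
arrows = [((40, 160), (140, 60)), ((240, 175), (260, 120))]

links = link_arrows(arrows, labels, parts)
print(links)
```

If part boxes overlap, the tip can fall inside more than one of them, so a real implementation would need a tie-break rule (for example, preferring the smaller box).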

    I can think of two ways to detect the arrows:

    1) Use template matching to identify the arrows. The difficulty in your case (judging from your example image) is that the arrow heads all have the same scale but the arrows have different lengths. So I'd suggest using just the head of the arrow plus a very short tail as the template. You can then rotate this small template N times to obtain N templates and match each of them with OpenCV's template matching (cv2.matchTemplate).

    2) Train a small convolutional neural network to recognize the arrows. You only want to recognize arrows, so it's rather easy to create a small dataset of rotated arrows at different scales and train the network on it. Note that you should probably be able to add this network as an additional, very shallow head on your recognition network (you would need to fine-tune the two jointly, though), so the overhead would be minimal.

    Hope that helps.