machine-learning computer-vision yolo

Confused about YOLO Process


To get a grasp of the concept, I read this article by Mauricio Menegaz and watched a video by Deeplearning.ai on YouTube, but I got confused at the S x S x (B * 5 + C) part. I know that S x S is the grid size, 5 is the number of components of each bounding box, and C is the number of classes. Is B the same as the number of anchor boxes? If I only want to detect one class (e.g. license plates), does that mean there will be only 1 B?

Are bounding boxes created on the image prior to it being fed to the neural networks?


Solution

  • Is B the same as the anchor boxes?

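    Yes, in effect. B is the number of bounding boxes predicted per grid cell; in anchor-based versions of YOLO this is the number of anchor boxes.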

  • If I only want to detect one class (e.g. license plates), does that mean there will be only 1 B?

    No. In this case C = 1; the number of classes does not determine B. But if you know in advance the width/height ratio of the object you need to detect, and this ratio does not change much between viewpoints, you may only need one anchor box that matches this ratio. So in your license plate case, B can be 1 too.

    But if you need to detect cars, for example, you may need more anchor boxes, because the width/height ratio of a car varies a lot between viewpoints.
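
To make the S x S x (B * 5 + C) formula from the question concrete, here is a minimal sketch that just computes the output shape for a given grid size, number of boxes per cell, and number of classes. The function name `yolo_output_shape` is made up for illustration and is not part of any YOLO library; the license plate values (S = 7, B = 1, C = 1) are assumptions chosen to match the case discussed above.

```python
from typing import Tuple


def yolo_output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Return the prediction tensor shape S x S x (B * 5 + C).

    Hypothetical helper for illustration only.
    """
    # Each of the B boxes in a cell carries 5 numbers:
    # x, y, width, height and a confidence score.
    # The C class scores are predicted once per grid cell.
    return (S, S, B * 5 + C)


# Values from the original YOLO paper on PASCAL VOC: S=7, B=2, C=20.
print(yolo_output_shape(7, 2, 20))  # (7, 7, 30)

# Assumed single-class license plate setup with one box per cell.
print(yolo_output_shape(7, 1, 1))   # (7, 7, 6)
```

Note that changing C from 20 to 1 only shrinks the class part of the depth; B is a separate choice, which is why a single-class detector can still use B > 1.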