I am training a YOLOv5 model to classify images of 4 different parts of a car (chassis, front spoiler, hubcap and wheel), but its guesses are quite wrong: it can't differentiate a chassis from a front spoiler, or a wheel from a hubcap. This happens after 100 epochs of training as well as after 1000. Can anyone tell me what could be going wrong?
I don't know how much data you trained your model on, but judging from the visual results alone, I am quite confident that it is not enough. The YOLOv5 training tips recommend at least 10000 labeled instances of each class for robust results. Increase the amount of data you train your model on, especially for the classes it confuses with each other.
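To see how far you are from that figure, you can count the labeled instances per class in your training labels. This is only a minimal sketch: it assumes your labels are in the standard YOLO text format (one object per line, with the class id as the first field), and the class order in `CLASS_NAMES` is my guess, so adjust it to match the `names` entry in your dataset YAML.

```python
from collections import Counter
from pathlib import Path

# Assumed class order; it must match the `names` list in your dataset YAML.
CLASS_NAMES = ["chassis", "front spoiler", "hubcap", "wheel"]


def count_instances(label_dir):
    """Count labeled objects per class id across all YOLO .txt label files."""
    counts = Counter()
    for label_file in Path(label_dir).rglob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[int(fields[0])] += 1  # first field is the class id
    return counts


def report(counts, names=CLASS_NAMES):
    """Print one line per class with its number of labeled instances."""
    for class_id, name in enumerate(names):
        print(f"{name}: {counts.get(class_id, 0)} instances")
```

Call it as `report(count_instances("path/to/labels/train"))`, replacing the placeholder path with your own label folder. If the counts are in the hundreds rather than the thousands, or one class has far fewer instances than the others, that would be consistent with the confusion you are seeing.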
The low confidence scores reflect how uncertain your model is when identifying each object. None of your detected objects has a strong confidence score.