I am trying to compute a confusion matrix for my object detection model, but I seem to be stumbling over some pitfalls. My current approach is to compare each predicted box with each ground-truth box. If they have an IoU above some threshold, I insert the prediction into the confusion matrix, delete it from the predictions list, and move on to the next element.
Because I also want misclassified proposals to appear in the confusion matrix, I treat proposals whose IoU with every ground-truth box is below the threshold as confusion with the background. My current implementation looks like this:
def insert_into_conf_m(true_labels, predicted_labels, true_boxes, predicted_boxes):
    matched_gts = []
    for i in range(len(true_labels)):
        j = 0
        # greedily match the current ground-truth box against the remaining proposals
        while len(predicted_labels) != 0:
            if j >= len(predicted_boxes):
                break
            if bb_intersection_over_union(true_boxes[i], predicted_boxes[j]) >= 0.7:
                conf_m[true_labels[i]][predicted_labels[j]] += 1
                del predicted_boxes[j]
                del predicted_labels[j]
                matched_gts.append(true_labels[i])
            else:
                j += 1
        if len(predicted_labels) == 0:
            break
    # if there are ground-truth boxes that are not matched by any proposal
    # they are treated as if the model classified them as background (0)
    if len(true_labels) > len(matched_gts):
        unmatched_true_labels = list(true_labels)
        for label in matched_gts:
            if label in unmatched_true_labels:
                unmatched_true_labels.remove(label)
        for label in unmatched_true_labels:
            conf_m[label][0] += 1
    # all detections that have no IoU with any ground-truth box are treated
    # as if the ground-truth label for this region was background (0)
    if len(predicted_labels) != 0:
        for j in range(len(predicted_labels)):
            conf_m[0][predicted_labels[j]] += 1
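For reference, bb_intersection_over_union is just the standard IoU computation; a minimal version, assuming boxes are given as [x_min, y_min, x_max, y_max] lists, would be:

def bb_intersection_over_union(box_a, box_b):
    # coordinates of the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter_area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # areas of both boxes and of their union
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union_area = area_a + area_b - inter_area
    return inter_area / union_area if union_area > 0 else 0.0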
The row-normalized matrix looks like this:
[0.0, 0.36, 0.34, 0.30]
[0.0, 0.29, 0.30, 0.41]
[0.0, 0.20, 0.47, 0.33]
[0.0, 0.23, 0.19, 0.58]
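(Row-normalized here just means that each row is divided by its sum; with conf_m converted to a NumPy array, that is roughly:)

import numpy as np

conf_arr = np.array(conf_m, dtype=float)
row_sums = conf_arr.sum(axis=1, keepdims=True)
normalized = conf_arr / np.maximum(row_sums, 1e-9)  # guard against empty rows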
Is there a better way to generate a confusion matrix for an object detection system, or is there another metric that is more suitable?
Here is a script to compute the confusion matrix from the detections.record file generated by the TensorFlow Object Detection API. Here is the article explaining how this script works.
In summary, here is the outline of the algorithm from the article (a simplified sketch of these steps is included after the list):
For each detection record, the algorithm extracts from the input file the ground-truth boxes and classes, along with the detected boxes, classes, and scores.
Only detections with a score greater than or equal to 0.5 are considered; anything below this value is discarded.
For each ground-truth box, the algorithm computes the IoU (Intersection over Union) with every detected box. A match is found if both boxes have an IoU greater than or equal to 0.5.
The list of matches is pruned to remove duplicates (ground-truth boxes that match more than one detection box, or vice versa). If there are duplicates, the best match (the one with the greatest IoU) is always selected.
The confusion matrix is updated to reflect the resulting matches between ground-truth and detections.
Objects that are part of the ground truth but weren't detected are counted in the last column of the matrix (in the row corresponding to the ground-truth class). Objects that were detected but aren't part of the ground truth are counted in the last row of the matrix (in the column corresponding to the detected class).
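This is not the actual script, but a simplified sketch of the matching and update steps above for a single image. It assumes boxes are [x_min, y_min, x_max, y_max] lists, classes are zero-based integer indices, and conf_m is a (num_classes + 1) x (num_classes + 1) list of lists whose extra row/column collects the unmatched detections and ground truths:

def iou(box_a, box_b):
    # same computation as bb_intersection_over_union in the question
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def update_confusion_matrix(conf_m, gt_boxes, gt_classes,
                            det_boxes, det_classes, det_scores,
                            iou_thresh=0.5, score_thresh=0.5):
    # keep only detections above the score threshold
    keep = [k for k, score in enumerate(det_scores) if score >= score_thresh]
    det_boxes = [det_boxes[k] for k in keep]
    det_classes = [det_classes[k] for k in keep]

    # collect every (ground truth, detection) pair whose IoU clears the threshold
    candidates = []
    for g, g_box in enumerate(gt_boxes):
        for d, d_box in enumerate(det_boxes):
            overlap = iou(g_box, d_box)
            if overlap >= iou_thresh:
                candidates.append((overlap, g, d))

    # prune duplicates: best IoU first, each ground truth and detection used at most once
    candidates.sort(reverse=True)
    matched_gt, matched_det = set(), set()
    for overlap, g, d in candidates:
        if g in matched_gt or d in matched_det:
            continue
        matched_gt.add(g)
        matched_det.add(d)
        conf_m[gt_classes[g]][det_classes[d]] += 1

    # unmatched ground truths go to the last column, unmatched detections to the last row
    last = len(conf_m) - 1
    for g in range(len(gt_boxes)):
        if g not in matched_gt:
            conf_m[gt_classes[g]][last] += 1
    for d in range(len(det_boxes)):
        if d not in matched_det:
            conf_m[last][det_classes[d]] += 1
    return conf_m

Calling something like this once per image over the whole detection record accumulates the full matrix, from which per-class precision and recall can then be read off.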
You can also take a look at the script for more information.