
Multiple Object Tracking (MOT) benchmark data-set format for ground truth tracking


I am trying to evaluate the performance of my object detection and tracking pipeline on the 2DMOT Challenge 2015 dataset, a standard benchmark for this task. I have downloaded the dataset, but I cannot work out what all the data fields in the labelled ground-truth files mean.

I understand the first six columns, but not the remaining four. Here is some sample data from the directory <\2DMOT2015\train\ETH-Bahnhof\gt>:

frame no.   object_id   bb_left   bb_top   bb_width   bb_height   (?)   (?)       (?)      (?)
1           1           212       204      20         57          0     -3.1784   16.34    0.45739
1           2           223       181      36         104         1     -1.407    9.0212   0.68774

What do the last four columns represent?


Solution

  • The last three fields are the 3D real-world coordinates (X, Y, Z) of the object. The same layout is used for the ETH-Bahnhof, ETH-Sunnyday, PETS09-S2L1 and TUD-Stadtmitte sequences in 2DMOT2015. The seventh field is the score. In ground truth it is normally 1, but in some files it takes values between 0 and 1; there it acts as a flag, and a value of 0 means the line should not be considered for evaluation. So the data fields are in the format:

    frame no. , object_id , bb_left , bb_top , bb_width , bb_height , score, X, Y, Z
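To make the layout concrete, here is a minimal Python sketch that reads lines in the format above and drops those whose score is 0. It assumes the file is comma-separated with exactly ten fields per line, as in the format line; the names `GtRow`, `parse_gt` and `rows_for_evaluation` are made up for this illustration and are not part of any MOT toolkit. The sample values are the two rows from the question.

```python
import csv
import io
from typing import NamedTuple


class GtRow(NamedTuple):
    """One ground-truth line (hypothetical container, field names follow the answer)."""
    frame: int
    object_id: int
    bb_left: float
    bb_top: float
    bb_width: float
    bb_height: float
    score: float
    x: float
    y: float
    z: float


def parse_gt(lines):
    """Parse comma-separated ground-truth lines into GtRow tuples.

    Assumes ten numeric fields per line; blank lines are skipped.
    """
    rows = []
    for fields in csv.reader(lines):
        if not fields:
            continue
        values = [float(v) for v in fields[:10]]
        # Frame number and object id are integers; the rest stay as floats.
        rows.append(GtRow(int(values[0]), int(values[1]), *values[2:]))
    return rows


def rows_for_evaluation(rows):
    """Keep only rows whose score (flag) is non-zero."""
    return [r for r in rows if r.score != 0]


# The two sample rows from the question, written comma-separated.
sample = io.StringIO(
    "1,1,212,204,20,57,0,-3.1784,16.34,0.45739\n"
    "1,2,223,181,36,104,1,-1.407,9.0212,0.68774\n"
)
rows = parse_gt(sample)
kept = rows_for_evaluation(rows)
print(len(rows), len(kept))
```

In practice you would pass an open file handle for the sequence's ground-truth file instead of the in-memory sample; any iterable of text lines works with `parse_gt`.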