
COCO .json file contains strange segmentation values in the annotations; how do I convert these?


I have a COCO-format .json file that contains strange values in the annotation section. Most segmentations are fine, but some contain a 'size' and a 'counts' field in a non-human-readable format.

When training my model, I run into errors because of these weird segmentation values. I have read somewhere that they are in RLE format, but I am not sure. I know I could train my model with bitmasks instead of polygons, but I would prefer to fix the root cause and convert these segmentations to the normal format. What type are they, can they be converted to the normal segmentation format, and if so, how?

{'id': 20, 'image_id': 87, 'category_id': 2, 'segmentation': [[301, 303, 305, 288, 321, 261, 335, 236, 346, 214, 350, 209, 351, 205, 349, 202, 344, 203, 334, 221, 322, 244, 307, 272, 297, 290, 295, 302, 297, 310, 301, 309]], 'area': 829.5, 'bbox': [295, 202, 56, 108], 'iscrowd': 0}
{'id': 21, 'image_id': 87, 'category_id': 2, 'segmentation': [[292, 300, 288, 278, 287, 270, 283, 260, 280, 249, 276, 240, 273, 234, 270, 233, 268, 233, 266, 236, 268, 240, 272, 244, 274, 253, 276, 259, 277, 265, 280, 272, 281, 284, 285, 299, 288, 306, 291, 306, 292, 304]], 'area': 517.0, 'bbox': [266, 233, 26, 73], 'iscrowd': 0}
{'id': 22, 'image_id': 87, 'category_id': 2, 'segmentation': [[300, 279, 305, 249, 311, 233, 313, 224, 314, 211, 319, 185, 322, 172, 323, 162, 321, 155, 318, 158, 314, 168, 311, 189, 306, 217, 299, 228, 296, 237, 296, 245, 296, 254, 295, 260, 291, 279, 290, 289, 293, 295, 295, 293, 299, 287]], 'area': 1177.0, 'bbox': [290, 155, 33, 140], 'iscrowd': 0}
{'id': 23, 'image_id': 87, 'category_id': 2, 'segmentation': [[311, 308, 311, 299, 314, 292, 315, 286, 315, 282, 311, 282, 307, 284, 303, 294, 301, 303, 302, 308, 306, 307]], 'area': 235.5, 'bbox': [301, 282, 14, 26], 'iscrowd': 0}

#Weird values
{'id': 24, 'image_id': 27, 'category_id': 2, 'segmentation': {'size': [618, 561], 'counts': 'of[56Tc00O2O000001O00000OXjP5'}, 'area': 71, 'bbox': [284, 326, 10, 8], 'iscrowd': 0}
{'id': 25, 'image_id': 27, 'category_id': 1, 'segmentation': {'size': [618, 561], 'counts': 'fga54Pc0<H4L4M2O2M3M2N2N3N1N2N101N101O00000O10000O1000000000000000000000O100O100O2N1O1O2N2N3L4M3MdRU4'}, 'area': 1809, 'bbox': [294, 294, 46, 47], 'iscrowd': 0}

#Normal values again
{'id': 26, 'image_id': 61, 'category_id': 1, 'segmentation': [[285, 274, 285, 269, 281, 262, 276, 259, 271, 256, 266, 255, 257, 261, 251, 267, 251, 271, 250, 280, 251, 286, 254, 292, 258, 296, 261, 296, 265, 294, 272, 291, 277, 287, 280, 283, 283, 278]], 'area': 1024.0, 'bbox': [250, 255, 35, 41], 'iscrowd': 0}
{'id': 27, 'image_id': 61, 'category_id': 2, 'segmentation': [[167, 231, 175, 227, 180, 226, 188, 226, 198, 228, 215, 235, 228, 239, 235, 243, 259, 259, 255, 261, 252, 264, 226, 249, 216, 244, 203, 238, 194, 235, 184, 234, 171, 235, 167, 233]], 'area': 782.5, 'bbox': [167, 226, 92, 38], 'iscrowd': 0}
{'id': 28, 'image_id': 61, 'category_id': 2, 'segmentation': [[279, 186, 281, 188, 281, 192, 280, 195, 278, 200, 274, 210, 271, 218, 267, 228, 266, 233, 266, 236, 265, 239, 264, 256, 261, 257, 257, 259, 255, 244, 256, 240, 256, 238, 257, 234, 259, 227, 264, 216, 267, 205, 271, 195, 274, 190]], 'area': 593.0, 'bbox': [255, 186, 26, 73], 'iscrowd': 0}
{'id': 29, 'image_id': 61, 'category_id': 2, 'segmentation': [[264, 245, 267, 239, 269, 236, 276, 232, 280, 230, 285, 227, 287, 227, 288, 229, 287, 232, 284, 234, 282, 237, 280, 239, 276, 241, 274, 246, 271, 254, 269, 254, 266, 254, 264, 254]], 'area': 264.0, 'bbox': [264, 227, 24, 27], 'iscrowd': 0}

Solution

  • Everything you need is in the pycocotools mask module, "Interface for manipulating masks stored in RLE format":

    "RLE is a simple yet efficient format for storing binary masks. RLE first divides a vector (or vectorized image) into a series of piecewise constant regions and then for each piece simply stores the length of that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] (note that the odd counts are always the numbers of zeros). Instead of storing the counts directly, additional compression is achieved with a variable bitrate representation based on a common scheme called LEB128."
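A minimal pure-Python sketch of the uncompressed counts described in the quote, reproducing its two examples. The function names rle_counts and rle_decode are illustrative only (they are not part of pycocotools), and the sketch deliberately skips the LEB128-style string compression that pycocotools applies on top of these counts.

```python
from itertools import groupby


def rle_counts(vector):
    """Run lengths of a flat 0/1 sequence, always starting with a run of zeros."""
    counts = []
    expected = 0  # value of the next run to record (runs alternate 0, 1, 0, ...)
    for value, run in groupby(vector):
        if value != expected:
            # Only possible for the first run: the sequence starts with 1,
            # so record an empty run of zeros first.
            counts.append(0)
            expected = 1 - expected
        counts.append(len(list(run)))
        expected = 1 - expected
    return counts


def rle_decode(counts):
    """Inverse of rle_counts: even positions are runs of 0s, odd positions runs of 1s."""
    vector = []
    for i, length in enumerate(counts):
        vector.extend([i % 2] * length)
    return vector


print(rle_counts([0, 0, 1, 1, 1, 0, 1]))  # [2, 3, 1, 1]
print(rle_counts([1, 1, 1, 1, 1, 1, 0]))  # [0, 6, 1]
```

For a 2-D mask, pycocotools first flattens the image column by column (Fortran order) and then computes these counts.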


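    So, basically, a mask can be annotated in one of three ways:

    1. As a polygon in the standard COCO JSON format (a flat list x, y, x, y, x, y, etc.),
    2. As a binary mask (e.g. a PNG image),
    3. In RLE-encoded format.

    All three can describe the same object mask, but sometimes you need to convert between them (when your DL library doesn't support all of them or doesn't convert them for you). Going from a polygon to a mask or RLE is exact rasterization; going from a mask or RLE back to a polygon means tracing the mask outline, so that direction is only approximate.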
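To show that the "weird" values in the question are just compressed RLE, here is a pure-Python sketch that decodes a 'counts' string back into a binary mask. It is modeled on rleFrString in pycocotools' C source (maskApi.c); the function names decode_counts_string and rle_to_mask are hypothetical, and this is meant for illustration rather than as a replacement for the library.

```python
def decode_counts_string(s):
    """Decode a compressed COCO RLE 'counts' string into run lengths.

    Modeled on rleFrString in pycocotools' maskApi.c: each character holds
    5 data bits (offset by ASCII 48), bit 0x20 means "more characters follow",
    bit 0x10 on the last character is the sign bit, and from the fourth count
    onward each value is stored as a delta from the count two positions back.
    """
    counts = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48
            x |= (c & 0x1F) << (5 * k)
            more = bool(c & 0x20)
            p += 1
            k += 1
            if not more and c & 0x10:
                x |= -1 << (5 * k)  # sign-extend a negative delta
        if len(counts) > 2:
            x += counts[-2]
        counts.append(x)
    return counts


def rle_to_mask(segmentation):
    """Expand {'size': [h, w], 'counts': str} into h rows of w 0/1 values."""
    h, w = segmentation["size"]
    flat = []
    for i, length in enumerate(decode_counts_string(segmentation["counts"])):
        flat.extend([i % 2] * length)  # even runs are 0s, odd runs are 1s
    # COCO flattens masks column by column (Fortran order): pixel (y, x) -> x * h + y.
    return [[flat[x * h + y] for x in range(w)] for y in range(h)]


ann = {"size": [618, 561], "counts": "of[56Tc00O2O000001O00000OXjP5"}
mask = rle_to_mask(ann)
print(sum(map(sum, mask)))  # 71, the annotation's 'area'
```

Decoding annotation 24 this way gives 71 foreground pixels inside the box [284, 326, 10, 8], matching its 'area' and 'bbox' fields. In practice, pycocotools.mask.decode(ann['segmentation']) returns the same mask as an h x w NumPy array (and COCO.annToMask(ann) handles both polygons and RLE); tracing that mask back into polygons can then be done with OpenCV's cv2.findContours.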