
Google Cloud Video Intelligence Annotate Video JSON vs example code

The Google Cloud Video Intelligence documentation provides the following sample code for parsing object tracking annotation results:

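from google.cloud import videointelligence

# Client for the Video Intelligence API; gs_video_path and output_uri are defined elsewhere
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.OBJECT_TRACKING]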
context = videointelligence.VideoContext(segments=None)
request = videointelligence.AnnotateVideoRequest(input_uri=gs_video_path, features=features, video_context=context, output_uri=output_uri)

operation = video_client.annotate_video(request)
result = operation.result(timeout=3600)
object_annotations = result.annotation_results[0].object_annotations

for object_annotation in object_annotations:
    print('Entity description: {}'.format(object_annotation.entity.description))
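    print('Segment: {}s to {}s'.format(
        object_annotation.segment.start_time_offset.total_seconds(),
        object_annotation.segment.end_time_offset.total_seconds()))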
    print('Confidence: {}'.format(object_annotation.confidence))

    # Here we print only the bounding box of the first frame_annotation in the segment
    frame_annotation = object_annotation.frames[0]
    box = frame_annotation.normalized_bounding_box
    timestamp = frame_annotation.time_offset.total_seconds()
    timestamp_end = object_annotation.segment.end_time_offset.total_seconds()

    print('Time offset of the first frame_annotation: {}s'.format(timestamp))
    print('Bounding box position:')
    print('\tleft  : {}'.format(box.left))
    print('\ttop   : {}'.format(box.top))
    print('\tright : {}'.format(box.right))
    print('\tbottom: {}'.format(box.bottom))

However, I want to parse the JSON file that is generated via output_uri instead. The JSON file has the following format (excerpt):

{
  "annotation_results": [ {
    "input_uri": "/",
    "segment": {
      "start_time_offset": {
      },
      "end_time_offset": {
        "seconds": 22,
        "nanos": 966666000
      }
    },
    "object_annotations": [ {
      "entity": {
        "entity_id": "/m/01yrx",
        "description": "cat",
        "language_code": "en-US"
      },
      "confidence": 0.91939145,
      "frames": [ {
        "normalized_bounding_box": {
          "left": 0.17845993,
          "top": 0.44048917,
          "right": 0.5315634,
          "bottom": 0.7752136
        },
        "time_offset": {
        }
      }, {

How can I use the example code to parse the JSON written to output_uri? What kind of conversion is needed for this?


  • Using the file written to output_uri, you can parse the JSON with the code below. I saved the file locally as response.json and parse that copy.

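    This is similar to your code above in that it only parses the first frame_annotation of each object. However, it does not convert the time offsets: total_seconds() is a method of the datetime.timedelta objects that the client library returns, whereas in the JSON file each offset is a plain object with separate seconds and nanos keys.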

    I commented out the start_time_offset and end_time_offset lines because each offset has two keys, seconds and nanos. It's up to you which one you use; just uncomment the lines and adjust them accordingly.

    import json

    # Load the results that the API wrote to output_uri (saved locally as response.json)
    with open('response.json', 'r') as f:
        data = json.load(f)

    for results in data["annotation_results"]:
        for obj_ann in results["object_annotations"]:
            # Zero-valued keys appear to be omitted from the file, hence .get(..., 0)
            #start_time_offset = obj_ann["segment"]["start_time_offset"].get("seconds", 0)
            #end_time_offset = obj_ann["segment"]["end_time_offset"].get("seconds", 0)
            frame_annotation = obj_ann["frames"][0]  # first frame only, as in the sample code
            entity = obj_ann["entity"]["description"]
            confidence = obj_ann["confidence"]
            box = frame_annotation["normalized_bounding_box"]
            time_offset = frame_annotation.get("time_offset", {})  # may hold `seconds` and/or `nanos`
            print('Entity description: {}'.format(entity))
            #print('Segment: {}s to {}s'.format(start_time_offset, end_time_offset))
            print('Confidence: {}'.format(confidence))
            # Modify this check if you also need the `seconds` key
            if 'nanos' not in time_offset:
                print('No time offset in frame')
            else:
                print('Time offset of the first frame_annotation (nanos): {}'.format(time_offset["nanos"]))
            print('Bounding box position:')
            print('\tleft  : {}'.format(box.get("left", 0)))
            print('\ttop   : {}'.format(box.get("top", 0)))
            print('\tright : {}'.format(box.get("right", 0)))
            print('\tbottom: {}'.format(box.get("bottom", 0)))

    For testing I used gs://cloud-samples-data/video/cat.mp4 and parsed its response.
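If you also want the time offsets in seconds, as the sample code prints them, one option is to convert each {seconds, nanos} object into a datetime.timedelta yourself, so that .total_seconds() behaves as it does on the client library's result objects. The sketch below assumes the file layout shown in the question (snake_case keys, zero-valued fields such as the empty start_time_offset omitted, and a segment stored on each object annotation as in the sample code). The names duration_to_timedelta, summarize_objects and print_objects are hypothetical helpers, not part of the Video Intelligence library.

```python
import json
from datetime import timedelta


def duration_to_timedelta(duration):
    """Convert a {"seconds": ..., "nanos": ...} object from the output JSON to a timedelta.

    Assumption: zero-valued fields are omitted from the file, so missing keys
    (or a missing/empty object) are treated as 0. int() also accepts seconds
    written as a string.
    """
    duration = duration or {}
    seconds = int(duration.get("seconds", 0))
    nanos = int(duration.get("nanos", 0))
    # timedelta only has microsecond resolution, so nanos are truncated to microseconds
    return timedelta(seconds=seconds, microseconds=nanos // 1000)


def summarize_objects(data):
    """Yield one summary dict per object annotation, using only its first frame."""
    for results in data.get("annotation_results", []):
        for obj_ann in results.get("object_annotations", []):
            segment = obj_ann.get("segment", {})
            first_frame = obj_ann["frames"][0]
            box = first_frame.get("normalized_bounding_box", {})
            yield {
                "description": obj_ann["entity"]["description"],
                "confidence": obj_ann.get("confidence", 0.0),
                "segment_start": duration_to_timedelta(segment.get("start_time_offset")).total_seconds(),
                "segment_end": duration_to_timedelta(segment.get("end_time_offset")).total_seconds(),
                "frame_time": duration_to_timedelta(first_frame.get("time_offset")).total_seconds(),
                # Missing box coordinates are assumed to be 0.0 (omitted default values)
                "box": {side: box.get(side, 0.0) for side in ("left", "top", "right", "bottom")},
            }


def print_objects(path="response.json"):
    """Load the file written to output_uri and print it in the sample code's format."""
    with open(path) as f:
        data = json.load(f)
    for obj in summarize_objects(data):
        print("Entity description: {}".format(obj["description"]))
        print("Segment: {}s to {}s".format(obj["segment_start"], obj["segment_end"]))
        print("Confidence: {}".format(obj["confidence"]))
        print("Time offset of the first frame_annotation: {}s".format(obj["frame_time"]))
        print("Bounding box position:")
        for side, value in obj["box"].items():
            print("\t{:<6}: {}".format(side, value))
```

Calling print_objects() (or print_objects("path/to/output.json")) prints the same fields as the sample code. Keeping the parsing in summarize_objects separate from the printing makes it easy to reuse the converted values, for example to collect them into a list instead of printing them.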