
Google Cloud Video Intelligence can't annotate multiple features


I've been using Google Cloud Video Intelligence for text detection. Now I want to use it for speech transcription as well, so I added the SPEECH_TRANSCRIPTION feature alongside TEXT_DETECTION, but the response only contains results for one feature, the last one.

const gcsUri = 'gs://path-to-the-video-on-gcs'
const request = {
  inputUri: gcsUri,
  features: ['TEXT_DETECTION', 'SPEECH_TRANSCRIPTION'],
};

// Detects text in a video
const [operation] = await video.annotateVideo(request);
const [operationResult] = await operation.promise();

const annotationResult = operationResult.annotationResults[0];
const textAnnotations = annotationResult.textAnnotations;
const speechTranscriptions = annotationResult.speechTranscriptions;

console.log(textAnnotations) // --> []
console.log(speechTranscriptions) // --> [{...}]

Is this a case where annotation is performed on only one feature at a time?


Solution

  • Annotation is performed for both features. Below is some example code.

    const videoIntelligence = require('@google-cloud/video-intelligence');

    const client = new videoIntelligence.VideoIntelligenceServiceClient();
    const gcsUri = 'gs://cloud-samples-data/video/JaneGoodall.mp4';

    async function analyzeVideoTranscript() {
      // Configuration for the SPEECH_TRANSCRIPTION feature
      const videoContext = {
        speechTranscriptionConfig: {
          languageCode: 'en-US',
          enableAutomaticPunctuation: true,
        },
      };

      // Request both features in a single call
      const request = {
        inputUri: gcsUri,
        features: ['TEXT_DETECTION', 'SPEECH_TRANSCRIPTION'],
        videoContext: videoContext,
      };

      const [operation] = await client.annotateVideo(request);
      console.log('Waiting for operation to complete...');
      const results = await operation.promise();

      // Gets annotations for video
      console.log('Result------------------->');
      console.log(results[0].annotationResults);

      // Walk every entry of annotationResults, not just the first one
      results[0].annotationResults.forEach((annotationResult, index) => {
        console.log('annotation result no: ' + (index + 1) + ' =======================>');
        // Concatenating with '+' converts the arrays to strings
        console.log('Speech : ' + annotationResult.speechTranscriptions);
        console.log('Text: ' + annotationResult.textAnnotations);
      });
    }

    analyzeVideoTranscript().catch(console.error);
    

    N.B.: I have found that annotationResults may not return the results in the same order as the declared features. You may want to adjust the code to suit your needs.

    Edit:

    You can check how many results you are getting by printing results[0].annotationResults.length. You should have two annotation results, one for each feature. All you need to do is traverse the response.
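
As a rough sketch of that traversal, assuming each entry of annotationResults may carry only one feature's output and that the order is not guaranteed, you could merge the entries by field instead of reading index 0. The helper name collectAnnotations and the mock data are my own, made up for illustration; they are not part of the client library.

```javascript
// Hypothetical helper (not part of @google-cloud/video-intelligence):
// merges textAnnotations and speechTranscriptions from every entry of
// annotationResults, regardless of the order the entries arrive in.
function collectAnnotations(annotationResults) {
  const merged = { textAnnotations: [], speechTranscriptions: [] };
  for (const result of annotationResults || []) {
    // A field may be missing or empty on entries that belong to another feature
    merged.textAnnotations.push(...(result.textAnnotations || []));
    merged.speechTranscriptions.push(...(result.speechTranscriptions || []));
  }
  return merged;
}

// Mock data shaped like the response described above (values are invented)
const sampleAnnotationResults = [
  { textAnnotations: [], speechTranscriptions: [{ languageCode: 'en-us' }] },
  { textAnnotations: [{ text: 'Hello' }], speechTranscriptions: [] },
];

const merged = collectAnnotations(sampleAnnotationResults);
console.log(merged.textAnnotations.length, merged.speechTranscriptions.length);
```

With a real response you would pass results[0].annotationResults (or operationResult.annotationResults in the question's code) to the helper.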

    Here is the output of analyzeVideoTranscript() above:

    [Image: screenshot of the console output]

    The results were converted to strings because I printed each one on the same line as its label.
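
If you want to keep the label and the data on one line without losing the structure, one option is to serialize the value with JSON.stringify before concatenating. This is only a sketch: formatResult is a hypothetical helper, and the sample object below is mock data rather than real API output.

```javascript
// Hypothetical helper: renders a label and a value on one line while
// keeping the value's structure readable.
function formatResult(label, value) {
  return `${label}: ${JSON.stringify(value)}`;
}

// Mock transcription entry (invented values)
const speech = [{ alternatives: [{ transcript: 'hello', confidence: 0.9 }] }];

// Plain concatenation calls toString() on the array and its objects
const concatenated = 'Speech : ' + speech;
console.log(concatenated); // prints "Speech : [object Object]"

// Serializing first keeps the fields visible
const formatted = formatResult('Speech', speech);
console.log(formatted);
```

Alternatively, passing the label and the object as separate arguments, as in console.log('Speech :', speech), lets Node.js format the object itself.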