
How do I serialize and deserialize a `longRunningRecognize` operation to get its result later?


I'm using Firebase Cloud Functions to transcribe user-uploaded audio files, based on the example code for longRunningRecognize:

// Detects speech in the audio file. This creates a recognition job that you
// can wait for now, or get its result later.
const [operation] = await client.longRunningRecognize(request);

// Get a Promise representation of the final result of the job
const [response] = await operation.promise();

This code works fine for short audio files that can be transcribed within the 9-minute maximum execution time of a Firebase Cloud Function. However, (1) many of my roughly hour-long user-uploaded files don't get transcribed that quickly, and (2) it seems wasteful to have a cloud function, billed for each tenth of a second it runs, just sitting around waiting for an API response.

I think the obvious fix here would be for Google's Speech-to-Text API to support webhooks.

Until that happens, how can I serialize and deserialize the SpeechClient operation so I can get the result of this transcription job later from a scheduled function?

Specifically, I'm looking for something that would work like the made-up SERIALIZE and DESERIALIZE functions in this example:

// start speech recognition job:
const [operation] = await client.longRunningRecognize(request);
const serializedOperation = operation.SERIALIZE();
await db.doc("jobs/job1").set(serializedOperation);

// get the result later in a scheduled function:
const snap = await db.doc("jobs/job1").get();
const serializedOperation = snap.data();
const operation = DESERIALIZE(serializedOperation);
const [response] = await operation.promise();

Solution

  • Thank you, Brendan, for the pointer to GetOperation. That was the linchpin I needed to figure this out.

    Serializing an operation is trivially easy: just read operation.name (a property, not a method) and you'll get the operation's unique ID as a plain string that you can store anywhere.

    Deserializing an operation name with the @google-cloud/speech Node library was frustratingly difficult to figure out, but I finally got it working.

    To check on the status of an operation and get its result from a stored operation.name, call client.checkLongRunningRecognizeProgress(operationName) like this:

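    const operation = await client.checkLongRunningRecognizeProgress(operationName);
    if (operation.done) {
      // Finished: operation.result holds the recognition response.
      console.log(JSON.stringify(operation.result));
    } else {
      // Still running: operation.metadata reports progress.
      const {progressPercent, startTime, lastUpdateTime} = operation.metadata;
      console.log(`Progress: ${progressPercent}%`, startTime, lastUpdateTime);
    }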
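
Putting the two halves together, here is a minimal sketch of how the made-up SERIALIZE/DESERIALIZE flow from the question might look. It assumes `client` is a SpeechClient from @google-cloud/speech and `jobRef` is a Firestore document reference such as db.doc("jobs/job1"). The helper names startTranscription and pollTranscription, and the operationName/status/transcript fields, are hypothetical names chosen for illustration, not part of any library. Handling for failed operations is left out.

```javascript
// Sketch only: startTranscription / pollTranscription are hypothetical helpers.
// `client` is assumed to be a SpeechClient and `jobRef` a Firestore
// DocumentReference; both are passed in so the helpers are easy to test.

// "Serialize": start the job and store only the operation's name.
async function startTranscription(client, jobRef, request) {
  const [operation] = await client.longRunningRecognize(request);
  // operation.name is a plain string, so it can be saved as-is.
  await jobRef.set({ operationName: operation.name, status: "running" });
  return operation.name;
}

// "Deserialize": look the operation up by name, e.g. from a scheduled function.
async function pollTranscription(client, jobRef) {
  const snap = await jobRef.get();
  const { operationName } = snap.data();
  const operation = await client.checkLongRunningRecognizeProgress(operationName);

  if (!operation.done) {
    // Still running: report progress and try again on the next run.
    const progressPercent = operation.metadata?.progressPercent ?? 0;
    return { done: false, progressPercent };
  }

  // Finished: join the top alternative of each result into one transcript.
  const transcript = (operation.result?.results ?? [])
    .map((r) => r.alternatives[0].transcript)
    .join("\n");
  await jobRef.update({ status: "done", transcript });
  return { done: true, transcript };
}
```

The function that handles the upload would call startTranscription and return immediately. A scheduled function would then call pollTranscription every few minutes until it reports done, so nothing is billed while the transcription is running.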