python python-3.x amazon-web-services amazon-s3 amazon-transcribe

Get subtitles from an AWS Transcribe job


I am writing a function that gets the transcription output from an AWS Transcribe job:

import json
import time
import urllib.request

import boto3


def get_text(job_name, file_uri):
    transcribe_client = boto3.client('transcribe')
    text = None  # stays None if the job fails or never finishes
    max_tries = 60
    while max_tries > 0:
        max_tries -= 1
        job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        job_status = job['TranscriptionJob']['TranscriptionJobStatus']
        if job_status in ['COMPLETED', 'FAILED']:
            print(f"Job {job_name} is {job_status}.")
            if job_status == 'COMPLETED':
                # TranscriptFileUri is a single URL pointing to the JSON transcript
                response = urllib.request.urlopen(job['TranscriptionJob']['Transcript']['TranscriptFileUri'])
                data = json.loads(response.read())
                print(data)
                text = data['results']['transcripts'][0]['transcript']
            break
        else:
            print(f"Waiting for {job_name}. Current status is {job_status}.")
        time.sleep(10)
    return text

This returns the transcript correctly, but when I change job['TranscriptionJob']['Transcript']['TranscriptFileUri'] to job['TranscriptionJob']['Subtitles']['SubtitleFileUris'], I get an error.

What should I do in this case?


Solution

  • job['TranscriptionJob']['Subtitles']['SubtitleFileUris'] is a list of URIs (one for each subtitle format requested when the job was started), not a single URI. You will need to change your code to something like this:

    if job_status == 'COMPLETED':
        text = []  # one subtitle document per requested format
        for uri in job['TranscriptionJob']['Subtitles']['SubtitleFileUris']:
            response = urllib.request.urlopen(uri)
            # Subtitle files are SRT/WebVTT text, not JSON, so decode them instead of json.loads
            subtitle = response.read().decode('utf-8')
            print(subtitle)
            text.append(subtitle)
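A slightly fuller sketch, packaged as a helper. The names get_subtitles and fetch_text are hypothetical and not part of boto3. The helper takes the dict returned by get_transcription_job, loops over the SubtitleFileUris list, and downloads each file as plain text, because subtitle files are SRT or WebVTT rather than JSON. It assumes each file name ends in its format's extension (.srt or .vtt) and uses that as the dictionary key. The fetch argument lets you swap in a different downloader, for example one that reads from your own S3 bucket with credentials.

```python
import posixpath
import urllib.request
from typing import Callable, Dict
from urllib.parse import urlparse


def fetch_text(uri: str) -> str:
    """Hypothetical helper: download a file from an HTTPS URI and decode it as UTF-8."""
    with urllib.request.urlopen(uri) as response:
        return response.read().decode("utf-8")


def get_subtitles(job: dict, fetch: Callable[[str], str] = fetch_text) -> Dict[str, str]:
    """Hypothetical helper: map each subtitle file's extension (e.g. 'srt', 'vtt') to its text.

    `job` is the dict returned by transcribe_client.get_transcription_job(...)
    for a COMPLETED job.
    """
    subtitles = job["TranscriptionJob"].get("Subtitles") or {}
    uris = subtitles.get("SubtitleFileUris") or []
    if not uris:
        # No subtitle output, probably because the job was started without subtitles requested
        raise ValueError("transcription job has no subtitle files")
    result = {}
    for uri in uris:  # SubtitleFileUris is a list, so loop over it
        # Assumption: the file name ends in its format's extension; the query string is ignored
        extension = posixpath.splitext(urlparse(uri).path)[1].lstrip(".").lower()
        # Subtitle files are plain SRT/WebVTT text, so keep them as strings (no json.loads)
        result[extension] = fetch(uri)
    return result
```

Subtitle URIs only appear if the job was started with subtitles requested, for example transcribe_client.start_transcription_job(..., Subtitles={'Formats': ['srt', 'vtt']}). Otherwise the helper raises ValueError. With the job dict from the polling loop in the question, get_subtitles(job)['srt'] would then hold the SRT text.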