My setup is the following:
React-native app client -> AWS API Gateway -> AWS Lambda function -> AWS S3 -> AWS Transcribe -> AWS S3
I am successfully able to upload an audio file to an S3 bucket from the lambda, start the transcription and even access it manually in the S3 bucket. However when I try to access the json file with the transcription data using TranscriptFileUri I am getting 403 response.
On the s3 bucket with the transcriptions I have the following CORS configuration:
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET",
"PUT",
],
"AllowedOrigins": [
"*"
],
"ExposeHeaders": [
"ETag"
]
}
]
My lambda function code looks like this:
response = client.start_transcription_job(
TranscriptionJobName=jobName,
LanguageCode='en-US',
MediaFormat='mp4',
Media={
'MediaFileUri': s3Path
},
OutputBucketName = 'my-transcription-bucket',
OutputKey = str(user_id) + '/'
)
while True:
result = client.get_transcription_job(TranscriptionJobName=jobName)
if result['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
break
time.sleep(5)
if result['TranscriptionJob']['TranscriptionJobStatus'] == "COMPLETED":
data = result['TranscriptionJob']['Transcript']['TranscriptFileUri']
data = requests.get(data)
print(data)
In Cloudwatch I get the following: <Response [403]>
when printing the response.
As far as I can tell, your code is invoking requests.get(data)
where data is the TranscriptFileUri. What does that URI look like? Is it signed? If not, as I suspect, then you cannot use requests to get the file from S3 (it would have to be a signed URL or a public object for this to work).
You should use an authenticated mechanism such as get_object.