I'd like to play an audio buffer from AWS Polly into a Discord voice channel with nextcord, but I'm having some issues.
Here's how I'm getting audio from the AWS Polly API using boto3:
response = polly_client.synthesize_speech(VoiceId="Brian",
                                          OutputFormat='pcm',
                                          SampleRate="16000",
                                          Text=message,
                                          Engine='standard')
This is what the response object from AWS looks like:
{'ResponseMetadata': {'RequestId': 'e9f897ce-78f8-4410-bbce-af5daa600850', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'e9f897ce-78f8-4410-bbce-af5daa600850', 'x-amzn-requestcharacters': '5', 'content-type': 'audio/pcm', 'transfer-encoding': 'chunked', 'date': 'Sat, 19 Aug 2023 20:34:11 GMT'}, 'RetryAttempts': 0}, 'ContentType': 'audio/pcm', 'RequestCharacters': 5, 'AudioStream': <botocore.response.StreamingBody object at 0x0000017754E9A7A0>}
The audio data in AudioStream is stored in a botocore StreamingBody, which, from my understanding, you can call .read() on to read it into a buffer. Here's what that code looks like:
buff = io.BytesIO(response['AudioStream'].read())
Here is what the code looks like for playing audio into the discord channel using nextcord:
vc.play(nextcord.FFmpegPCMAudio(source=buff, pipe=True))
I realized that the PCM returned from Polly is 16-bit, single-channel (mono) audio at the 16 kHz sample rate I requested. I can get the audio to play in Discord, but it's super high pitched and you can't understand what it's saying. This is because nextcord expects PCM audio to be 16-bit, 48 kHz, dual-channel (stereo). Because of this, I don't think PCM will work unless there is a way to convert mono PCM to stereo.
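For anyone who does want to stay with PCM, here is a minimal sketch of one way to do the conversion in memory. It assumes Polly's documented output (signed 16-bit, little-endian, mono, 16 kHz) and that the player wants 16-bit 48 kHz stereo. The function name mono16k_to_stereo48k is made up for illustration, and the upsampling is plain sample repetition, so the quality is rough even though the pitch should come out right.

```python
import array
import sys


def mono16k_to_stereo48k(pcm: bytes) -> bytes:
    """Convert 16-bit mono 16 kHz PCM into 16-bit stereo 48 kHz PCM.

    Assumes little-endian input whose length is a multiple of 2 bytes.
    Upsampling is done by repeating each sample (48000 / 16000 = 3 times),
    which is crude but keeps everything in memory with no dependencies.
    """
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian

    out = array.array("h")
    for sample in samples:
        # 3 output frames per input sample, 2 channels (left, right) per frame
        out.extend([sample] * 6)

    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian again
    return out.tobytes()


# Hypothetical usage with the objects from the question (not run here):
# raw = response['AudioStream'].read()
# vc.play(nextcord.PCMAudio(io.BytesIO(mono16k_to_stereo48k(raw))))
```

The usage lines rely on nextcord.PCMAudio, which takes a stream of raw PCM and does not go through ffmpeg at all.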
Instead I have been using mp3 as the OutputFormat. I've tried a bunch of buffer combinations, but I can't seem to get the mp3 audio buffer to play in Discord using FFmpegPCMAudio:
I've tried reading into a BytesIO object
I've tried not reading into any object and just storing the raw bytes as a variable
I've tried reading the bytes into an io.BufferedIOBase object
When using mp3, no errors are thrown but no audio is played in Discord. What am I missing?
The more painful solution would be to write the audio I get from Polly to a file on disk, but then I'd have to write a bunch of file I/O cleanup code. I'd prefer to just keep it in memory since these audio snippets are really small.
I ended up solving this right after I posted. If anyone finds this from Google: I solved it with this GitHub issue, https://github.com/Rapptz/discord.py/issues/5192, by using the custom class that was created in that issue.
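For readers who would rather not copy a custom class, here is a rough sketch of a different in-memory approach. It is not the code from the linked issue. It assumes an ffmpeg binary is on the PATH, hands the encoded bytes to it over stdin, and collects 48 kHz stereo 16-bit PCM from stdout. The helper names build_decode_command and decode_to_pcm are hypothetical.

```python
import subprocess


def build_decode_command(executable: str = "ffmpeg") -> list[str]:
    """Build an ffmpeg command that decodes stdin to raw PCM on stdout."""
    return [
        executable,
        "-loglevel", "error",
        "-i", "pipe:0",   # read the encoded audio (e.g. MP3) from stdin
        "-f", "s16le",    # raw signed 16-bit little-endian samples
        "-ar", "48000",   # 48 kHz sample rate
        "-ac", "2",       # two channels (stereo)
        "pipe:1",         # write the decoded audio to stdout
    ]


def decode_to_pcm(encoded: bytes, executable: str = "ffmpeg") -> bytes:
    """Decode an in-memory audio buffer to PCM without touching the disk.

    Blocks until ffmpeg exits; raises CalledProcessError on a non-zero exit.
    """
    completed = subprocess.run(
        build_decode_command(executable),
        input=encoded,
        stdout=subprocess.PIPE,
        check=True,
    )
    return completed.stdout


# Hypothetical usage with the objects from the question (not run here):
# pcm = decode_to_pcm(response['AudioStream'].read())
# vc.play(nextcord.PCMAudio(io.BytesIO(pcm)))
```

Because subprocess.run blocks, this is probably only reasonable for short clips like the ones described above. For longer audio it would be better to run the decode off the event loop.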