react-native, expo, mp3, text-to-speech, expo-av

Play audio response from OpenAI TTS API in React Native with Expo


I've recently started developing a React Native app with Expo, and a couple of days ago I ran into a problem that is driving me crazy.

I want to send a text string to the OpenAI TTS API and immediately play the audio response in my app, without saving the file locally. Link to the API: https://platform.openai.com/docs/guides/text-to-speech

So far I have successfully made the HTTP request, and it looks like I get the correct response from the API. When I log response.blob() I get:

{"_data": {"__collector": {}, "blobId": "5C362F63-71CB-45EC-913E-9A808AF2194F", "name": "speech.mp3", "offset": 0, "size": 143520, "type": "audio/mpeg"}}

The issue seems to be playing the sound with expo-av. I have been Googling for days now, and no matter which solution I try, I get an error when trying to play the sound. With my current code, the error is:

Error: The AVPlayerItem instance has failed with the error code -1002 and domain "NSURLErrorDomain".

I would really appreciate it if anyone could help me (please include code examples).

I also read this post, where the solution was to switch to Google's TTS API, but that is not the solution I want: https://www.reddit.com/r/reactnative/comments/13pa9wx/playing_blob_audio_using_expo_audio_react_native/

This is my code:

const convertTextToSpeech = async (textToConvert) => {

    const apiKey = 'myKey';
    const url = 'https://api.openai.com/v1/audio/speech';

    const requestOptions = {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${apiKey}`,
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({
            model: 'tts-1',
            input: textToConvert,
            voice: 'alloy',
            language: 'da',
        })
    };

    await fetch(url, requestOptions)
        .then(response => {
            if (!response.ok) {
                throw new Error('Network response was not ok');
            }
            return response.blob();
        })
        .then(blob => {
            playSound(blob);
        })
        .catch(error => {
            console.error('There was a problem with the request:', error);
        });
};

async function playSound(blob) {
    const url = URL.createObjectURL(blob);
    const { sound } = await Audio.Sound.createAsync({ uri: url });
    await sound.playAsync();
}

Solution

  • Okay, after some serious attempts at figuring this out, here is my solution; I hope it serves you and others well. It works for me for now (I am still testing functionality, so it might run into issues down the road, but fingers crossed). The underlying problem is that `URL.createObjectURL` doesn't give expo-av a URL its native player can load (error -1002 is NSURLErrorUnsupportedURL), so the fix is to write the blob to a temporary file and play it from there.

    Also, I haven't done much with the sound instance beyond playing it to make sure it works, so expect some more work there.

    Client

        // Buffer is not a global in React Native; the `buffer` npm package provides it.
        import { Buffer } from "buffer";
        import * as FileSystem from "expo-file-system";

        // Blob -> Buffer, going through a base64 data URI.
        const toBuffer = async (blob) => {
          const uri = await toDataURI(blob);
          const base64 = uri.replace(/^.*,/g, "");
          return Buffer.from(base64, "base64");
        };

        // Blob -> data URI (FileReader is available in React Native).
        const toDataURI = (blob) =>
          new Promise((resolve) => {
            const reader = new FileReader();
            reader.readAsDataURL(blob);
            reader.onloadend = () => {
              const uri = reader.result?.toString();
              resolve(uri);
            };
          });

        // Write the audio bytes to a cache file that expo-av can load by URI.
        const constructTempFilePath = async (buffer) => {
          const tempFilePath = FileSystem.cacheDirectory + "speech.mp3";
          await FileSystem.writeAsStringAsync(
            tempFilePath,
            buffer.toString("base64"),
            {
              encoding: FileSystem.EncodingType.Base64,
            }
          );

          return tempFilePath;
        };
    

    and

    
        // Audio comes from expo-av: import { Audio } from "expo-av";
        const blob = await response.blob();
        const buffer = await toBuffer(blob);
        const tempFilePath = await constructTempFilePath(buffer);
        const { sound } = await Audio.Sound.createAsync({ uri: tempFilePath });
        await sound.playAsync();
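
    The blob → base64 → Buffer round trip above can be sanity-checked outside Expo. This sketch runs in plain Node 18+ (which has a built-in `Blob`), so `blob.arrayBuffer()` stands in for the `FileReader` dance that React Native requires; it only verifies that no bytes are lost in the conversion:

```javascript
// Sanity check of the blob -> Buffer round trip (plain Node 18+, no Expo).
(async () => {
  const original = Buffer.from([0xff, 0xf3, 0x48, 0xc4]); // arbitrary bytes

  // Node's built-in Blob; in React Native this would come from response.blob().
  const blob = new Blob([original], { type: "audio/mpeg" });

  // Node's Blob has arrayBuffer(); React Native's does not, hence toDataURI().
  const roundTripped = Buffer.from(await blob.arrayBuffer());

  console.log(roundTripped.equals(original)); // true
})();
```

    If this holds, any remaining playback failure is in the file write or the expo-av side, not in the encoding step.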
    

    Server

        // Node server (e.g. inside an Express route handler with `res` in scope)
        import OpenAI from "openai";
        import { PassThrough } from "stream";

        const openai = new OpenAI();

        const mp3 = await openai.audio.speech.create({
          model: "tts-1",
          voice: "alloy",
          input: "Hello World",
        });

        // Buffer the full MP3 body, then stream it to the client.
        const mp3Stream = new PassThrough();
        mp3Stream.end(Buffer.from(await mp3.arrayBuffer()));
        res.setHeader("Content-Type", "audio/mpeg");
        res.setHeader("Content-Disposition", 'attachment; filename="audio.mp3"');
        mp3Stream.pipe(res);