Tags: ios, audio, safari, mediarecorder, mediastream

Safari & iOS MediaRecorder give unintelligible stream


I have set up a webserver where I record audio with the MediaRecorder API in the front-end and send it to the backend, which transforms it into PCM using ffmpeg. I need this in near real-time, so every 4 seconds I send a packet containing the current 4 seconds of audio, prepended with the first 4 seconds recorded after starting, because that first chunk contains the audio header.

I concatenate the transformed PCM on the server. Listening to the resulting audio works for recordings originating from Chrome, Edge, and Firefox, but not for recordings from macOS Safari or from any browser on iOS: some parts are intelligible, but large parts are missing from the stream entirely. What could be going wrong? Does setting up Safari take more than this?

Client-side code (simplified from real version):

// Map audio mime types to filenames for saving
const audioMimeToFilename: Record<string, string> = {
    "audio/webm": "audio.webm",
    "audio/wav": "audio.wav",
    "audio/ogg": "audio.ogg",
    "audio/mpeg": "audio.mp3",
    "audio/mp4": "audio.m4a",
    "application/octet-stream": "audio.m4a", // or try wav
};

const record = async (microphoneId: string): Promise<MediaRecorder> => {
    const mimeType = Object.keys(audioMimeToFilename)
                           .find((mime) => MediaRecorder.isTypeSupported(mime));
    if (mimeType == null) {
        throw new Error("No supported mime type found");
    }

    const interval = 4; // Seconds
    const stream = await navigator.mediaDevices.getUserMedia({ audio: { deviceId: microphoneId }});
    const recorder = new MediaRecorder(stream, {
        mimeType,
        audioBitsPerSecond: 16000
    });
    recorder.ondataavailable = (e) => {
        if (e.data.size > 0) {
            void uploadAudio(e.data, interval); // Uploads the sound to the server using base64
        }
    }
    recorder.start(interval * 1000);
    return recorder;
};
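
uploadAudio is not shown above. Roughly, it keeps the first chunk (which carries the container header), prepends it to every later chunk, base64-encodes the result, and POSTs it to the server. The following is only a sketch of that idea; the /audio endpoint and the payload shape are placeholders, not my real API:

// Hypothetical sketch of uploadAudio: the first packet is sent as-is, later
// packets get the header chunk prepended before uploading.
let headerChunk: Blob | null = null;

const uploadAudio = async (chunk: Blob, interval: number): Promise<void> => {
    if (headerChunk == null) {
        headerChunk = chunk; // first packet after start() contains the header
    }
    const packet = chunk === headerChunk
        ? chunk
        : new Blob([headerChunk, chunk], { type: chunk.type });

    const base64 = await blobToBase64(packet);
    await fetch("/audio", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ audio: base64, mimeType: chunk.type, interval }),
    });
};

// Strip the "data:...;base64," prefix from the data URL to get raw base64.
const blobToBase64 = (blob: Blob): Promise<string> =>
    new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => resolve((reader.result as string).split(",")[1]);
        reader.onerror = reject;
        reader.readAsDataURL(blob);
    });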

The server receives the base64-encoded audio, decodes it, saves it to disk under the filename that audioMimeToFilename maps to the mime type reported by the audio blob, and runs ffmpeg with the following invocation: ffmpeg -ss 4 -acodec pcm_s16le -f s16le -ac 1 -ar 16000 -i audio.m4a output.pcm. The result is then concatenated onto the existing buffer. The logs show that for Safari and iOS devices the audio is received and ffmpeg does not report any error.
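
For reference, the server-side step looks roughly like the following minimal Node sketch; handleAudioPacket and the temp filenames are illustrative, not my actual code:

import { writeFile } from "node:fs/promises";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical handler: decode the base64 payload, persist it under the
// filename mapped from the reported mime type, then run the ffmpeg
// invocation quoted above to produce output.pcm.
const handleAudioPacket = async (base64Audio: string, mimeType: string): Promise<void> => {
    const filename = audioMimeToFilename[mimeType] ?? "audio.m4a"; // same map as on the client
    await writeFile(filename, Buffer.from(base64Audio, "base64"));

    await execFileAsync("ffmpeg", [
        "-ss", "4",
        "-acodec", "pcm_s16le",
        "-f", "s16le",
        "-ac", "1",
        "-ar", "16000",
        "-i", filename,
        "output.pcm",
    ]);
    // output.pcm is then appended to the running PCM buffer.
};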

Further information: on the client side, I first open another stream to obtain the permissions needed to find out which microphones are available. After that stream closes and the user selects a microphone, the client-side code above runs. Somewhere on the internet I read that Safari does not like two streams being opened one after the other, so I refactored my code to reuse the same stream and toggle the enabled flag on its MediaStreamTracks, but to no avail. What could be going wrong?
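
That permission probe looks roughly like this; listMicrophones is just an illustrative name, not from my actual code:

// Hypothetical helper: open a throwaway audio stream so that
// enumerateDevices() returns labelled entries, then release it again.
const listMicrophones = async (): Promise<MediaDeviceInfo[]> => {
    const probeStream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const devices = await navigator.mediaDevices.enumerateDevices();
    probeStream.getTracks().forEach((track) => track.stop()); // close the probe stream
    return devices.filter((device) => device.kind === "audioinput");
};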


Solution

  • In the end I went with audio-recorder-polyfill. I tried a multitude of other options (like not closing the MediaStream and only disabling its tracks), but nothing worked (and this was with all the modern Safari/iOS devices as of 03-10-2024).

    The great work in audio-recorder-polyfill worked wonders. I use the polyfill for all iOS browsers and for macOS Safari. I use it as follows:

    Code that handles the audio stream:

    "use client";
    import React from "react";
    import dynamic from "next/dynamic";
    const LegacyAudioRecorder = dynamic(() => import("./LegacyAudioRecorder"), {
        ssr: false,
    });
    
    const supportedAudioMimeTypes = ["audio/wav", "audio/webm", "audio/ogg"];
    
    const useAudioStream = (onAudioBlob: (blob: BlobEvent) => void) => {
        const [mediaRecorder, setMediaRecorder] =
            React.useState<MediaRecorder | null>(null);
        const [stream, setStream] = React.useState<MediaStream | null>(null);
    
        const stop = () => {
            if (mediaRecorder != null) {
                mediaRecorder.stop();
            }
            if (stream != null) {
                stream.getTracks().forEach((track) => track.stop());
            }
        };
    
        const start = async () => {
            const mimeType = supportedAudioMimeTypes.find((m) =>
                MediaRecorder.isTypeSupported(m),
            );
            if (mimeType == null) {
                throw new Error("No supported mime type found");
            }
            console.log("Using mime type", mimeType);
    
            const interval = 4; // seconds
            const newStream = await navigator.mediaDevices.getUserMedia({ audio: true });
            setStream(newStream);
            const mediaRecorder = new window.MediaRecorder(newStream, {
                mimeType,
                audioBitsPerSecond: 16000,
            });
    
            // The polyfill does not support assigning mediaRecorder.ondataavailable directly
            mediaRecorder.addEventListener("dataavailable", onAudioBlob);
            mediaRecorder.start(interval * 1000);
            setMediaRecorder(mediaRecorder);
        };
    
        const layout = isSupportedDevice() ? "" : <LegacyAudioRecorder />;
        return {
            layout,
            start,
            stop,
        };
    };
    export default useAudioStream;
    
    const isSupportedDevice = (): boolean => {
        if (typeof window === "undefined") {
            return false;
        }
    
        const userAgent = window.navigator.userAgent;
        if (userAgent == null) {
            return false;
        }
    
        return (
            !/MSIE|Trident/.test(userAgent) &&
            !/iPhone|iPad/.test(userAgent) &&
            !(/Safari/.test(userAgent) && !/Chrome/.test(userAgent))
        );
    };
    

    ./LegacyAudioRecorder.tsx

    import React from "react";
    import AudioRecorder from "audio-recorder-polyfill";
    // eslint-disable-next-line @typescript-eslint/no-unsafe-member-access
    AudioRecorder.prototype.mimeType = "audio/wav";
    window.MediaRecorder = AudioRecorder;
    
    const LegacyAudioRecorder: React.FC = () => <></>;
    export default LegacyAudioRecorder;
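
    For completeness, this is roughly how the hook is wired into a component. The Recorder component and the ./useAudioStream import path are illustrative (only the hook itself is shown above), and uploadAudio is the helper from the question, assumed to be in scope:

    "use client";
    import React from "react";
    import useAudioStream from "./useAudioStream"; // assumed location of the hook above

    const Recorder: React.FC = () => {
        // Hypothetical consumer: forward every 4-second chunk to the backend.
        const { layout, start, stop } = useAudioStream((event) => {
            void uploadAudio(event.data, 4); // uploadAudio as in the question
        });

        return (
            <>
                {/* layout renders LegacyAudioRecorder only when isSupportedDevice()
                    is false, i.e. on Safari and iOS */}
                {layout}
                <button onClick={() => void start()}>Start</button>
                <button onClick={stop}>Stop</button>
            </>
        );
    };
    export default Recorder;

    Note that merely rendering LegacyAudioRecorder is what installs the polyfill: dynamic(..., { ssr: false }) only loads the module in the browser, and its side effect replaces window.MediaRecorder, so the new window.MediaRecorder(...) call in start picks up the polyfill on Safari and iOS while other browsers keep their native implementation.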