I want to record an audio track from the microphone (and from a Bluetooth headset microphone too, down the road) and convert it to the MPEG-4 AAC format. As required by the specification of communication with the backend in my project, the audio has to be split into short (0.5–2 second long) chunks. As a simplified example, in the provided code I'm just saving these chunks as files in the cache (without sending them to the backend).
To achieve this, I'm recording audio with AudioRecord in PCM-16 format, converting it to AAC with MediaCodec, and finally saving it as an MPEG-4 file with MediaMuxer.
Example code (based on this example):
private const val TAG = "RECORDING"
private const val AUDIO_CHUNK_LENGTH_MS = 500L
private const val RECORDER_SAMPLERATE = 44100
private const val RECORDER_CHANNELS = AudioFormat.CHANNEL_IN_MONO
private const val RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT
private val BUFFER_SIZE = AudioRecord.getMinBufferSize(RECORDER_SAMPLERATE, RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING)
class RecordingUtil2(
    val context: Context,
    private val dispatcher: CoroutineDispatcher = Dispatchers.Default
) {
    lateinit var audioRecord: AudioRecord
    lateinit var encoder: MediaCodec
    lateinit var mediaMuxer: MediaMuxer
    var trackId: Int = 0
    var chunkCuttingJob: Job? = null
    var recordingJob: Job? = null
    var audioStartTimeNs: Long = 0
    var currentFile: File? = null
    var chunkEnd = false

    private fun prepareRecorder() {
        audioRecord = AudioRecord(
            MediaRecorder.AudioSource.MIC, RECORDER_SAMPLERATE,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT, BUFFER_SIZE * 10
        )
    }
    private suspend fun startRecording() {
        Timber.tag(TAG).i("started recording, buffer size $BUFFER_SIZE")
        createTempFile()
        prepareRecorder()
        try {
            encoder = createMediaCodec(BUFFER_SIZE)
            encoder.start()
            createMuxer(encoder.outputFormat, currentFile!!)
            mediaMuxer.start()
        } catch (exception: Exception) {
            Timber.tag(TAG).w(exception)
        }
        audioStartTimeNs = System.nanoTime()
        audioRecord.startRecording()
        var bufferInfo = MediaCodec.BufferInfo()
        chunkCuttingJob = CoroutineScope(dispatcher).launch {
            while (isActive) {
                delay(AUDIO_CHUNK_LENGTH_MS)
                cutChunk()
            }
        }
        recordingJob = CoroutineScope(dispatcher).launch {
            val buffer2 = ByteArray(BUFFER_SIZE)
            do {
                val bytes = audioRecord.read(buffer2, 0, BUFFER_SIZE)
                if (bytes != BUFFER_SIZE) {
                    Timber.tag(TAG).w("read less bytes than full buffer ($bytes/$BUFFER_SIZE)")
                }
                encodeRawAudio(encoder, mediaMuxer, buffer2, bytes, bufferInfo, !isActive || chunkEnd)
                if (chunkEnd) {
                    recreateEncoderAndMuxer()
                    bufferInfo = MediaCodec.BufferInfo()
                    // delay here causes crash after first cut
                    //delay(100)
                }
                // delay here fixes crash in most cases
                //delay(100)
            } while (isActive)
        }
    }
    private fun recreateEncoderAndMuxer() {
        createTempFile()
        chunkEnd = false
        audioStartTimeNs = System.nanoTime()
        encoder.stop()
        encoder.release()
        encoder = createMediaCodec(BUFFER_SIZE)
        encoder.start()
        mediaMuxer.stop()
        mediaMuxer.release()
        createMuxer(encoder.outputFormat, currentFile!!)
        mediaMuxer.start()
    }
    private fun encodeRawAudio(encoder: MediaCodec, muxer: MediaMuxer, bytes: ByteArray, byteCount: Int, bufferInfo: MediaCodec.BufferInfo, last: Boolean = false) {
        with(encoder) {
            val inputBufferIndex = dequeueInputBuffer(10_000)
            val inputBuffer = getInputBuffer(inputBufferIndex)
            inputBuffer?.clear()
            inputBuffer?.put(bytes)
            val presentationTimeUs: Long = (System.nanoTime() - audioStartTimeNs) / 1000
            queueInputBuffer(inputBufferIndex, 0, byteCount, presentationTimeUs, if (last) BUFFER_FLAG_END_OF_STREAM else 0)
            var outputBufferIndex = dequeueOutputBuffer(bufferInfo, 0)
            Timber.tag(TAG).d("encoding $byteCount bytes, last = $last, time: $presentationTimeUs, buffer time: ${bufferInfo.presentationTimeUs}")
            while (outputBufferIndex != MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (outputBufferIndex >= 0) {
                    val outputBuffer = getOutputBuffer(outputBufferIndex)
                    outputBuffer?.position(bufferInfo.offset)
                    outputBuffer?.limit(bufferInfo.offset + bufferInfo.size)
                    // codec config buffers are skipped and not written to the muxer
                    if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG != MediaCodec.BUFFER_FLAG_CODEC_CONFIG) {
                        val data = ByteArray(outputBuffer!!.remaining())
                        outputBuffer.get(data)
                        muxer.writeSampleData(trackId, outputBuffer, bufferInfo)
                    }
                    outputBuffer?.clear()
                    releaseOutputBuffer(outputBufferIndex, false)
                }
                outputBufferIndex = encoder.dequeueOutputBuffer(bufferInfo, 0)
            }
        }
    }
    private fun cutChunk() {
        Timber.tag(TAG).i("cutting chunk")
        chunkEnd = true
    }

    private fun stopRecording() {
        Timber.tag(TAG).i("stopped recording")
        chunkCuttingJob?.cancel()
        chunkCuttingJob = null
        recordingJob?.cancel()
        recordingJob = null
        audioRecord.stop()
        encoder.release()
        mediaMuxer.stop()
        mediaMuxer.release()
    }

    suspend fun record(isRecording: Boolean) {
        if (isRecording) {
            startRecording()
        } else {
            stopRecording()
        }
    }

    private fun createMediaCodec(bufferSize: Int, existing: MediaCodec? = null): MediaCodec {
        val mediaFormat = MediaFormat().apply {
            setString(MediaFormat.KEY_MIME, MediaFormat.MIMETYPE_AUDIO_AAC)
            setInteger(MediaFormat.KEY_BIT_RATE, 32000)
            setInteger(MediaFormat.KEY_CHANNEL_COUNT, 1)
            setInteger(MediaFormat.KEY_SAMPLE_RATE, RECORDER_SAMPLERATE)
            setInteger(MediaFormat.KEY_AAC_PROFILE, CodecProfileLevel.AACObjectLC)
            setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, bufferSize)
        }
        val encoderString = MediaCodecList(MediaCodecList.REGULAR_CODECS).findEncoderForFormat(mediaFormat)
        Timber.tag(TAG).d("chosen codec: $encoderString")
        val mediaCodec = existing ?: MediaCodec.createByCodecName(encoderString)
        try {
            mediaCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        } catch (e: Exception) {
            Timber.tag(TAG).w(e)
            mediaCodec.release()
        }
        return mediaCodec
    }

    private fun createMuxer(format: MediaFormat, file: File) {
        try {
            file.createNewFile()
            mediaMuxer = MediaMuxer(file.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
            trackId = mediaMuxer.addTrack(format)
        } catch (e: java.lang.Exception) {
            Timber.tag(TAG).e(e)
        }
    }

    private var currentIndex: Int = 0

    private fun createTempFile() {
        currentFile = File(context.cacheDir, "$currentIndex.m4a").also { it.createNewFile() }
        currentIndex++
    }
}
I run this code in a coroutine, like this:
class MyViewModel : ViewModel() {
    fun startRecording() {
        val recordingUtil = RecordingUtil2(...)
        viewModelScope.launch(Dispatchers.Default) {
            recordingUtil.record(true)
        }
    }
}
The problem I'm facing is that after several chunks are saved into consecutive files, MediaMuxer crashes as a result of an exception in MPEG4Writer:
E/MPEG4Writer: do not support out of order frames (timestamp: 13220 < last: 23219 for Audio track
Yet, as you can see in the provided code, timestamps are generated incrementally and are passed to MediaCodec.queueInputBuffer(...) in the proper order.
What's interesting (and might hint at what's wrong) is that the exception message from MPEG4Writer reports the last timestamp as 23219 every single time, as if it were a constant, whereas judging from the native platform code it should show the previous frame's timestamp, which is very unlikely to be a constant number much bigger than 0.
More logs from the crash (for context):
I/MPEG4Writer: Normal stop process
D/MPEG4Writer: Audio track stopping. Stop source
I/MPEG4Writer: Received total/0-length (22/1) buffers and encoded 22 frames. - Audio
D/MPEG4Writer: Audio track source stopping
D/MPEG4Writer: Audio track source stopped
I/MPEG4Writer: Audio track drift time: 0 us
D/MPEG4Writer: Audio track stopped. Stop source
D/MPEG4Writer: Stopping writer thread
D/MPEG4Writer: 0 chunks are written in the last batch
D/MPEG4Writer: Writer thread stopped
I/MPEG4Writer: Ajust the moov start time from 44099 us -> 44099 us
I/MPEG4Writer: The mp4 file will not be streamable.
D/MPEG4Writer: Audio track stopping. Stop source
D/RECORDING: encoding 3528 bytes, last = false, time: 79102, buffer time: 0
D/RECORDING: encoding 3528 bytes, last = false, time: 85883, buffer time: 0
D/RECORDING: encoding 3528 bytes, last = false, time: 89383, buffer time: 79102
I/MPEG4Writer: setStartTimestampUs: 79102 from Audio track
I/MPEG4Writer: Earliest track starting time: 79102
E/MPEG4Writer: do not support out of order frames (timestamp: 13220 < last: 23219 for Audio track
E/MPEG4Writer: 0 frames to dump timeStamps in Audio track
I/MPEG4Writer: Received total/0-length (3/0) buffers and encoded 2 frames. - Audio
I/MPEG4Writer: Audio track drift time: 0 us
E/MediaAdapter: pushBuffer called before start
E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-1
E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-1
Process: com.example, PID: 23499
java.lang.IllegalStateException: writeSampleData returned an error
Logs in the case of a successfully recorded and saved audio chunk:
I/MPEG4Writer: Normal stop process
D/MPEG4Writer: Audio track stopping. Stop source
D/MPEG4Writer: Audio track source stopping
I/MPEG4Writer: Received total/0-length (18/0) buffers and encoded 18 frames. - Audio
D/MPEG4Writer: Audio track source stopped
I/MPEG4Writer: Audio track drift time: 0 us
D/MPEG4Writer: Audio track stopped. Stop source
D/MPEG4Writer: Stopping writer thread
D/MPEG4Writer: 0 chunks are written in the last batch
D/MPEG4Writer: Writer thread stopped
I/MPEG4Writer: Ajust the moov start time from 45890 us -> 45890 us
I/MPEG4Writer: The mp4 file will not be streamable.
D/MPEG4Writer: Audio track stopping. Stop source
D/RECORDING: encoding 3528 bytes, last = false, time: 44099, buffer time: 0
D/RECORDING: encoding 3528 bytes, last = false, time: 74366, buffer time: 44099
I/MPEG4Writer: setStartTimestampUs: 44099 from Audio track
I/MPEG4Writer: Earliest track starting time: 44099
D/RECORDING: encoding 3528 bytes, last = false, time: 116122, buffer time: 80805
D/RECORDING: encoding 3528 bytes, last = false, time: 156789, buffer time: 104025
D/RECORDING: encoding 3528 bytes, last = false, time: 196940, buffer time: 152221
D/RECORDING: encoding 3528 bytes, last = false, time: 235010, buffer time: 176108
D/RECORDING: encoding 3528 bytes, last = false, time: 275232, buffer time: 243989
D/RECORDING: encoding 3528 bytes, last = false, time: 316400, buffer time: 267209
D/RECORDING: encoding 3528 bytes, last = false, time: 361290, buffer time: 313871
D/RECORDING: encoding 3528 bytes, last = false, time: 401305, buffer time: 338259
D/RECORDING: encoding 3528 bytes, last = false, time: 441019, buffer time: 412824
D/RECORDING: encoding 3528 bytes, last = false, time: 481193, buffer time: 436044
I/RECORDING: cutting chunk
D/RECORDING: encoding 3528 bytes, last = true, time: 518624, buffer time: 458978
I/MediaCodec: Codec shutdown complete
I've noticed that logs from the crash scenario show that BufferInfo contains timestamp = 0 for the two initial frames, while the non-crash logs always have only one such frame. Yet, I've sometimes observed the same crash with only one timestamp = 0 frame, so it might not be relevant.
Could anybody help me fix this issue?
Check out the question Muxing AAC audio with Android's MediaCodec and MediaMuxer.
Best guess: the encoder is doing something with the output -- maybe splitting an input packet into two output packets -- that requires it to synthesize a timestamp. It takes the timestamp of the start of the packet and adds a value based on the bit rate and number of bytes. If you generate timestamps with reasonably correct presentation times you shouldn't see it go backwards when the "in-between" timestamp is generated. by @fadden
The comment quoted above was made by @fadden. It explains why MediaCodec can produce output timestamps that are, surprisingly, not strictly increasing.
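For reference: an AAC-LC frame carries 1024 PCM samples, so at 44100 Hz each encoded packet spans 1024 / 44100 s ≈ 23219 µs. The constant "last: 23219" in your error message matches exactly one frame duration, which is consistent with the writer comparing against a timestamp the encoder synthesized one frame after a packet that started at 0.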
So, you said:

Yet, as you can see in the provided code, timestamps are generated incrementally and are passed to MediaCodec.queueInputBuffer(...) in the proper order.

It's not about your code feeding monotonic timestamps. Look closely at the error message:
java.lang.IllegalStateException: writeSampleData returned an error
The error was thrown by the method writeSampleData. So simply add some logging to inspect BufferInfo.presentationTimeUs right before the muxer.writeSampleData call, and you'll see the surprise.
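For illustration, a minimal sketch of that logging inside encodeRawAudio, right before the muxer call (all names are taken from the question's code):

// Log the exact timestamp the muxer is about to receive; if the encoder
// has synthesized an extra packet, this value can be smaller than the
// previous one even though the input timestamps were strictly increasing.
Timber.tag(TAG).d("writing sample: time = ${bufferInfo.presentationTimeUs}, size = ${bufferInfo.size}, flags = ${bufferInfo.flags}")
muxer.writeSampleData(trackId, outputBuffer, bufferInfo)

As a side note on the fix @fadden hints at: one common way to get "reasonably correct presentation times" for PCM input is to derive them from the number of samples fed to the encoder so far rather than from System.nanoTime(). A minimal sketch under that assumption (the totalSamplesQueued counter and the nextPresentationTimeUs helper are hypothetical, not part of the question's code; it reuses the question's RECORDER_SAMPLERATE and assumes PCM-16 mono, i.e. 2 bytes per sample):

private var totalSamplesQueued = 0L

// Hypothetical helper: the timestamp advances by exactly the duration of
// the audio queued so far, so it can never move backwards between packets.
private fun nextPresentationTimeUs(byteCount: Int): Long {
    val presentationTimeUs = totalSamplesQueued * 1_000_000L / RECORDER_SAMPLERATE
    totalSamplesQueued += byteCount / 2 // PCM-16 mono: 2 bytes per sample
    return presentationTimeUs
}

If you cut chunks by recreating the encoder, remember to reset the counter at the same time.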