I want to record an audio track from the microphone (and from a Bluetooth headset microphone too, down the road) and convert it to the MPEG-4 AAC format. As required by the specification of communication with the backend in my project, the audio has to be split into short (0.5–2 second long) chunks. As a simplified example, in the provided code I'm just saving these chunks as files in the cache (without sending them to the backend).
To achieve this, I'm recording audio with AudioRecord in PCM-16 format, converting it to AAC with MediaCodec, and finally saving it as an MPEG-4 file with MediaMuxer.
Example code (based on this example):
private const val TAG = "RECORDING"
private const val AUDIO_CHUNK_LENGTH_MS = 500L
private const val RECORDER_SAMPLERATE = 44100
private const val RECORDER_CHANNELS = AudioFormat.CHANNEL_IN_MONO
private const val RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT
private val BUFFER_SIZE = AudioRecord.getMinBufferSize(RECORDER_SAMPLERATE, RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING)
class RecordingUtil2(
    val context: Context,
    private val dispatcher: CoroutineDispatcher = Dispatchers.Default
) {
    lateinit var audioRecord: AudioRecord
    lateinit var encoder: MediaCodec
    lateinit var mediaMuxer: MediaMuxer
    var trackId: Int = 0
    var chunkCuttingJob: Job? = null
    var recordingJob: Job? = null
    var audioStartTimeNs: Long = 0
    var currentFile: File? = null
    var chunkEnd = false

    private fun prepareRecorder() {
        audioRecord = AudioRecord(
            MediaRecorder.AudioSource.MIC, RECORDER_SAMPLERATE,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT, BUFFER_SIZE * 10
        )
    }
    private suspend fun startRecording() {
        Timber.tag(TAG).i("started recording, buffer size $BUFFER_SIZE")
        createTempFile()
        prepareRecorder()
        try {
            encoder = createMediaCodec(BUFFER_SIZE)
            encoder.start()
            createMuxer(encoder.outputFormat, currentFile!!)
            mediaMuxer.start()
        } catch (exception: Exception) {
            Timber.tag(TAG).w(exception)
        }
        audioStartTimeNs = System.nanoTime()
        audioRecord.startRecording()
        var bufferInfo = MediaCodec.BufferInfo()
        chunkCuttingJob = CoroutineScope(dispatcher).launch {
            while (isActive) {
                delay(AUDIO_CHUNK_LENGTH_MS)
                cutChunk()
            }
        }
        recordingJob = CoroutineScope(dispatcher).launch {
            val buffer2 = ByteArray(BUFFER_SIZE)
            do {
                val bytes = audioRecord.read(buffer2, 0, BUFFER_SIZE)
                if (bytes != BUFFER_SIZE) {
                    Timber.tag(TAG).w("read less bytes than full buffer ($bytes/$BUFFER_SIZE)")
                }
                encodeRawAudio(encoder, mediaMuxer, buffer2, bytes, bufferInfo, !isActive || chunkEnd)
                if (chunkEnd) {
                    recreateEncoderAndMuxer()
                    bufferInfo = MediaCodec.BufferInfo()
                    // delay here causes crash after first cut
                    //delay(100)
                }
                // delay here fixes crash in most cases
                //delay(100)
            } while (isActive)
        }
    }
    private fun recreateEncoderAndMuxer() {
        createTempFile()
        chunkEnd = false
        audioStartTimeNs = System.nanoTime()
        encoder.stop()
        encoder.release()
        encoder = createMediaCodec(BUFFER_SIZE)
        encoder.start()
        mediaMuxer.stop()
        mediaMuxer.release()
        createMuxer(encoder.outputFormat, currentFile!!)
        mediaMuxer.start()
    }
    private fun encodeRawAudio(encoder: MediaCodec, muxer: MediaMuxer, bytes: ByteArray, byteCount: Int, bufferInfo: MediaCodec.BufferInfo, last: Boolean = false) {
        with(encoder) {
            val inputBufferIndex = dequeueInputBuffer(10_000)
            val inputBuffer = getInputBuffer(inputBufferIndex)
            inputBuffer?.clear()
            inputBuffer?.put(bytes)
            val presentationTimeUs: Long = (System.nanoTime() - audioStartTimeNs) / 1000
            queueInputBuffer(inputBufferIndex, 0, byteCount, presentationTimeUs, if (last) BUFFER_FLAG_END_OF_STREAM else 0)
            var outputBufferIndex = dequeueOutputBuffer(bufferInfo, 0)
            Timber.tag(TAG).d("encoding $byteCount bytes, last = $last, time: $presentationTimeUs, buffer time: ${bufferInfo.presentationTimeUs}")
            while (outputBufferIndex != MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (outputBufferIndex >= 0) {
                    val outputBuffer = getOutputBuffer(outputBufferIndex)
                    outputBuffer?.position(bufferInfo.offset)
                    outputBuffer?.limit(bufferInfo.offset + bufferInfo.size)
                    // codec config buffers are skipped and not written to the muxer
                    if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG != MediaCodec.BUFFER_FLAG_CODEC_CONFIG) {
                        val data = ByteArray(outputBuffer!!.remaining())
                        outputBuffer.get(data)
                        muxer.writeSampleData(trackId, outputBuffer, bufferInfo)
                    }
                    outputBuffer?.clear()
                    releaseOutputBuffer(outputBufferIndex, false)
                }
                outputBufferIndex = encoder.dequeueOutputBuffer(bufferInfo, 0)
            }
        }
    }
    private fun cutChunk() {
        Timber.tag(TAG).i("cutting chunk")
        chunkEnd = true
    }

    private fun stopRecording() {
        Timber.tag(TAG).i("stopped recording")
        chunkCuttingJob?.cancel()
        chunkCuttingJob = null
        recordingJob?.cancel()
        recordingJob = null
        audioRecord.stop()
        encoder.release()
        mediaMuxer.stop()
        mediaMuxer.release()
    }

    suspend fun record(isRecording: Boolean) {
        if (isRecording) {
            startRecording()
        } else {
            stopRecording()
        }
    }

    private fun createMediaCodec(bufferSize: Int, existing: MediaCodec? = null): MediaCodec {
        val mediaFormat = MediaFormat().apply {
            setString(MediaFormat.KEY_MIME, MediaFormat.MIMETYPE_AUDIO_AAC)
            setInteger(MediaFormat.KEY_BIT_RATE, 32000)
            setInteger(MediaFormat.KEY_CHANNEL_COUNT, 1)
            setInteger(MediaFormat.KEY_SAMPLE_RATE, RECORDER_SAMPLERATE)
            setInteger(MediaFormat.KEY_AAC_PROFILE, CodecProfileLevel.AACObjectLC)
            setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, bufferSize)
        }
        val encoderString = MediaCodecList(MediaCodecList.REGULAR_CODECS).findEncoderForFormat(mediaFormat)
        Timber.tag(TAG).d("chosen codec: $encoderString")
        val mediaCodec = existing ?: MediaCodec.createByCodecName(encoderString)
        try {
            mediaCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        } catch (e: Exception) {
            Timber.tag(TAG).w(e)
            mediaCodec.release()
        }
        return mediaCodec
    }

    private fun createMuxer(format: MediaFormat, file: File) {
        try {
            file.createNewFile()
            mediaMuxer = MediaMuxer(file.absolutePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
            trackId = mediaMuxer.addTrack(format)
        } catch (e: java.lang.Exception) {
            Timber.tag(TAG).e(e)
        }
    }

    private var currentIndex: Int = 0

    private fun createTempFile() {
        currentFile = File(context.cacheDir, "$currentIndex.m4a").also { it.createNewFile() }
        currentIndex++
    }
}
I run this code in a coroutine, like this:
class MyViewModel : ViewModel() {
    fun startRecording() {
        val recordingUtil = RecordingUtil2(...)
        viewModelScope.launch(Dispatchers.Default) {
            recordingUtil.record(true)
        }
    }
}
The problem I'm facing is that after several chunks are saved into consecutive files, MediaMuxer crashes as a result of an exception in MPEG4Writer:
E/MPEG4Writer: do not support out of order frames (timestamp: 13220 < last: 23219 for Audio track
Yet, as you can see in the provided code, timestamps are generated incrementally and are passed to MediaCodec.queueInputBuffer(...) in the proper order.
What's interesting (and might hint at what's wrong) is that the exception message from MPEG4Writer reports the last timestamp as 23219 every single time, as if it were a constant, whereas judging from the native platform code it should show the previous frame's timestamp, which is very unlikely to be a constant number much bigger than 0.
More logs from the crash (for context):
I/MPEG4Writer: Normal stop process
D/MPEG4Writer: Audio track stopping. Stop source
I/MPEG4Writer: Received total/0-length (22/1) buffers and encoded 22 frames. - Audio
D/MPEG4Writer: Audio track source stopping
D/MPEG4Writer: Audio track source stopped
I/MPEG4Writer: Audio track drift time: 0 us
D/MPEG4Writer: Audio track stopped. Stop source
D/MPEG4Writer: Stopping writer thread
D/MPEG4Writer: 0 chunks are written in the last batch
D/MPEG4Writer: Writer thread stopped
I/MPEG4Writer: Ajust the moov start time from 44099 us -> 44099 us
I/MPEG4Writer: The mp4 file will not be streamable.
D/MPEG4Writer: Audio track stopping. Stop source
D/RECORDING: encoding 3528 bytes, last = false, time: 79102, buffer time: 0
D/RECORDING: encoding 3528 bytes, last = false, time: 85883, buffer time: 0
D/RECORDING: encoding 3528 bytes, last = false, time: 89383, buffer time: 79102
I/MPEG4Writer: setStartTimestampUs: 79102 from Audio track
I/MPEG4Writer: Earliest track starting time: 79102
E/MPEG4Writer: do not support out of order frames (timestamp: 13220 < last: 23219 for Audio track
E/MPEG4Writer: 0 frames to dump timeStamps in Audio track
I/MPEG4Writer: Received total/0-length (3/0) buffers and encoded 2 frames. - Audio
I/MPEG4Writer: Audio track drift time: 0 us
E/MediaAdapter: pushBuffer called before start
E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-1
E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-1
Process: com.example, PID: 23499
java.lang.IllegalStateException: writeSampleData returned an error
Logs in the case of a successfully recorded and saved audio chunk:
I/MPEG4Writer: Normal stop process
D/MPEG4Writer: Audio track stopping. Stop source
D/MPEG4Writer: Audio track source stopping
I/MPEG4Writer: Received total/0-length (18/0) buffers and encoded 18 frames. - Audio
D/MPEG4Writer: Audio track source stopped
I/MPEG4Writer: Audio track drift time: 0 us
D/MPEG4Writer: Audio track stopped. Stop source
D/MPEG4Writer: Stopping writer thread
D/MPEG4Writer: 0 chunks are written in the last batch
D/MPEG4Writer: Writer thread stopped
I/MPEG4Writer: Ajust the moov start time from 45890 us -> 45890 us
I/MPEG4Writer: The mp4 file will not be streamable.
D/MPEG4Writer: Audio track stopping. Stop source
D/RECORDING: encoding 3528 bytes, last = false, time: 44099, buffer time: 0
D/RECORDING: encoding 3528 bytes, last = false, time: 74366, buffer time: 44099
I/MPEG4Writer: setStartTimestampUs: 44099 from Audio track
I/MPEG4Writer: Earliest track starting time: 44099
D/RECORDING: encoding 3528 bytes, last = false, time: 116122, buffer time: 80805
D/RECORDING: encoding 3528 bytes, last = false, time: 156789, buffer time: 104025
D/RECORDING: encoding 3528 bytes, last = false, time: 196940, buffer time: 152221
D/RECORDING: encoding 3528 bytes, last = false, time: 235010, buffer time: 176108
D/RECORDING: encoding 3528 bytes, last = false, time: 275232, buffer time: 243989
D/RECORDING: encoding 3528 bytes, last = false, time: 316400, buffer time: 267209
D/RECORDING: encoding 3528 bytes, last = false, time: 361290, buffer time: 313871
D/RECORDING: encoding 3528 bytes, last = false, time: 401305, buffer time: 338259
D/RECORDING: encoding 3528 bytes, last = false, time: 441019, buffer time: 412824
D/RECORDING: encoding 3528 bytes, last = false, time: 481193, buffer time: 436044
I/RECORDING: cutting chunk
D/RECORDING: encoding 3528 bytes, last = true, time: 518624, buffer time: 458978
I/MediaCodec: Codec shutdown complete
I've noticed that logs from the crash scenario show that BufferInfo contains timestamp = 0 for the two initial frames, while the non-crash logs always have only one such frame. Yet, I've sometimes observed the same crash with only one timestamp = 0 frame, so it might not be relevant.
Could anybody help me fix this issue?
Check out the question Muxing AAC audio with Android's MediaCodec and MediaMuxer.
Best guess: the encoder is doing something with the output -- maybe splitting an input packet into two output packets -- that requires it to synthesize a timestamp. It takes the timestamp of the start of the packet and adds a value based on the bit rate and number of bytes. If you generate timestamps with reasonably correct presentation times you shouldn't see it go backwards when the "in-between" timestamp is generated. by @fadden
The comment quoted above was made by @fadden. It explains why MediaCodec can produce output timestamps that are, surprisingly, not strictly increasing.
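For reference: an AAC-LC frame carries 1024 PCM samples, so at 44100 Hz each encoded packet spans 1024 / 44100 s ≈ 23219 µs. The constant "last: 23219" in your error message matches exactly one frame duration, which is consistent with the writer comparing against a timestamp the encoder synthesized one frame after a packet that started at 0.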
So, you said:

Yet, as you can see in the provided code, timestamps are generated incrementally and are passed to MediaCodec.queueInputBuffer(...) in the proper order.

It's not about your code feeding monotonic timestamps. Look closely at the error message:
java.lang.IllegalStateException: writeSampleData returned an error
The error was thrown by the method writeSampleData. So simply add some logging to inspect BufferInfo.presentationTimeUs right before the muxer.writeSampleData call, and you'll see the surprise.
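For illustration, a minimal sketch of that logging inside encodeRawAudio, right before the muxer call (all names are taken from the question's code):

// Log the exact timestamp the muxer is about to receive; if the encoder
// has synthesized an extra packet, this value can be smaller than the
// previous one even though the input timestamps were strictly increasing.
Timber.tag(TAG).d("writing sample: time = ${bufferInfo.presentationTimeUs}, size = ${bufferInfo.size}, flags = ${bufferInfo.flags}")
muxer.writeSampleData(trackId, outputBuffer, bufferInfo)

As a side note on the fix @fadden hints at: one common way to get "reasonably correct presentation times" for PCM input is to derive them from the number of samples fed to the encoder so far rather than from System.nanoTime(). A minimal sketch under that assumption (the totalSamplesQueued counter and the nextPresentationTimeUs helper are hypothetical, not part of the question's code; it reuses the question's RECORDER_SAMPLERATE and assumes PCM-16 mono, i.e. 2 bytes per sample):

private var totalSamplesQueued = 0L

// Hypothetical helper: the timestamp advances by exactly the duration of
// the audio queued so far, so it can never move backwards between packets.
private fun nextPresentationTimeUs(byteCount: Int): Long {
    val presentationTimeUs = totalSamplesQueued * 1_000_000L / RECORDER_SAMPLERATE
    totalSamplesQueued += byteCount / 2 // PCM-16 mono: 2 bytes per sample
    return presentationTimeUs
}

If you cut chunks by recreating the encoder, remember to reset the counter at the same time.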