I'm trying to make a simple "click track"-to-file renderer on Android. I have PCM-encoded data for a sound and a finite gap sequence as input (represented by the ClickTrack class). I want a playable .m4a file as output, with that sound repeating over the gaps, rendered properly.
The problem is that I'm getting a file in a semi-corrupted state - it plays all repetitions of the sound at the beginning, as fast as it can, and then silence for the rest of the track. The total duration of the track happens to be correct, so it seems that the presentation times are correct.
Now the code:
fun render(clickTrack: ClickTrack, onProgress: (Float) -> Unit, onFinished: () -> Unit): File? {
    var muxer: MediaMuxer? = null
    var codec: MediaCodec? = null

    try {
        val audioFormat = MediaFormat.createAudioFormat(MediaFormat.MIMETYPE_AUDIO_AAC, 44100, 2)
            .apply {
                setInteger(MediaFormat.KEY_BIT_RATE, 96 * 1024)
            }

        val outputFile = File.createTempFile("click_track_export", ".m4a", context.cacheDir)
        muxer = MediaMuxer(outputFile.path, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)

        val codecName = MediaCodecList(MediaCodecList.REGULAR_CODECS).findEncoderForFormat(audioFormat)!!
        codec = MediaCodec.createByCodecName(codecName)
        codec.configure(audioFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        codec.start()

        // Converts the click track to a sequence of sound buffers (all the same) with
        // timestamps (computed using the gaps) for convenience. Gaps are not present
        // in the buffers in order to conserve memory
        val samples = clickTrack.toSamples()

        val bytesToWrite = samples.sumOf { it.data.data.size.toLong() }
        val bufferInfo = MediaCodec.BufferInfo()
        var bytesWritten = 0L
        var index = 0
        var endOfInput = samples.isEmpty()
        var endOfOutput = samples.isEmpty()
        var sample = samples.getOrNull(index)
        var sampleBuffer: ByteBuffer? = null

        while (!endOfInput || !endOfOutput) {
            if (!endOfInput) {
                if (sampleBuffer == null || !sampleBuffer.hasRemaining()) {
                    sample = samples[index]
                    sampleBuffer = ByteBuffer.wrap(samples[index].data.data)
                    ++index
                }

                sample!!
                sampleBuffer!!

                val inputBufferIndex = codec.dequeueInputBuffer(0L)
                if (inputBufferIndex >= 0) {
                    val inputBuffer = codec.getInputBuffer(inputBufferIndex)!!

                    while (sampleBuffer.hasRemaining() && inputBuffer.hasRemaining()) {
                        inputBuffer.put(sampleBuffer.get())
                        ++bytesWritten
                    }

                    onProgress(bytesWritten.toFloat() / bytesToWrite)

                    endOfInput = !sampleBuffer.hasRemaining() && index == samples.size

                    codec.queueInputBuffer(
                        inputBufferIndex,
                        0,
                        inputBuffer.position(),
                        sample.timestampUs,
                        if (endOfInput) MediaCodec.BUFFER_FLAG_END_OF_STREAM else 0
                    )
                }
            }

            if (!endOfOutput) {
                val outputBufferIndex = codec.dequeueOutputBuffer(bufferInfo, 0L)
                if (outputBufferIndex >= 0) {
                    val outputBuffer = codec.getOutputBuffer(outputBufferIndex)!!
                    muxer.writeSampleData(0, outputBuffer, bufferInfo)
                    codec.releaseOutputBuffer(outputBufferIndex, false)
                } else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                    // Not using `audioFormat` because of https://developer.android.com/reference/android/media/MediaCodec#CSD
                    muxer.addTrack(codec.outputFormat)
                    muxer.start()
                }

                endOfOutput = bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM != 0
            }
        }

        return outputFile
    } catch (t: Throwable) {
        Timber.e(t, "Failed to render track")
    } finally {
        try {
            codec?.stop()
        } catch (t: Throwable) {
            Timber.e(t, "Failed to stop codec")
        } finally {
            codec?.release()
        }

        try {
            muxer?.stop()
        } catch (t: Throwable) {
            Timber.e(t, "Failed to stop muxer")
        } finally {
            muxer?.release()
        }

        onFinished()
    }

    return null
}
// Class descriptions
class Sample(
    val data: PcmData,
    val timestampUs: Long,
)

class PcmData(
    val pcmEncoding: Int,
    val sampleRate: Int,
    val channelCount: Int,
    val data: ByteArray,
)
Turned out I misunderstood the presentationTimeUs parameter of the queueInputBuffer method. It DOES NOT write silence frames for you as I thought. It's just a hint for the encoder/muxer for A/V synchronization and ordering, if you happen to have B-frames and such.
For an audio-only file I made it all 0L and it worked perfectly fine. (Edit: this is actually wrong and didn't work on Android Marshmallow. You should compute an adequate presentation time either way.)
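The computation itself is just the position of the buffer's first PCM frame converted to microseconds. A minimal sketch of it (the names here are illustrative, not from the code above; assuming interleaved PCM with the stream's own bytesPerFrame and sampleRate):
// bytesQueuedSoFar counts every PCM byte already queued to the codec
val framesQueuedSoFar = bytesQueuedSoFar / bytesPerFrame
val presentationTimeUs = framesQueuedSoFar * 1_000_000L / sampleRate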
Another mistake was writing silence that is not a multiple of the PCM frame size (that is, sample size * channel count). If the silence is not frame-aligned, you will get audio glitches in the end.
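For example, with 16-bit stereo PCM a frame is 2 bytes * 2 channels = 4 bytes, so every silence chunk must be a multiple of 4 bytes. Rounding a raw byte count down to a whole number of frames could look like this (a sketch; silenceBytes is an illustrative name):
val bytesPerFrame = bytesPerSample * channelCount
// Drop the trailing partial frame so the silence stays frame-aligned
val alignedSilenceBytes = silenceBytes - silenceBytes % bytesPerFrame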
So in the end I got this code for generating a complete ByteArray ready for MediaCodec to consume:
private fun ClickTrack.render(): ByteArray {
    val result = mutableListOf<Byte>()

    for (event in toPlayerEvents()) {
        // Object containing the raw byte array and some meta information like sample rate and channel count
        val pcm = event.sound

        // Compute the overall frame count that can fit in event.duration
        // framesPerSecond = sampleRate / channelCount
        val maxFramesCount = (event.duration.toDouble(DurationUnit.SECONDS) * pcm.framesPerSecond).toInt()

        // Compute frames for the sound. If the sound is longer than the event duration, truncate it
        // bytesPerFrame = bytesPerSample (1 for ENCODING_PCM_8BIT, 2 for ENCODING_PCM_16BIT and so on) * channelCount
        val framesOfSound = (pcm.data.size / pcm.bytesPerFrame).coerceAtMost(maxFramesCount)

        // The rest is just silent frames
        val framesOfSilence = maxFramesCount - framesOfSound

        result += pcm.data.slice(0 until framesOfSound * pcm.bytesPerFrame)
        result += ByteArray(framesOfSilence * pcm.bytesPerFrame).asIterable()
    }

    return result.toByteArray()
}
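With the whole track rendered as one contiguous buffer, the feeding loop no longer needs per-sample timestamps; the presentation time falls out of the byte offset. A rough sketch of what that loop could look like (not the exact code I ended up with; output draining and muxing stay as in the question and must still run interleaved with this loop, and 16-bit stereo at 44100 Hz is assumed):
val pcm = clickTrack.render()
val bytesPerFrame = 2 * 2 // ENCODING_PCM_16BIT * 2 channels (assumption)
val sampleRate = 44100
var offset = 0
var endOfInput = pcm.isEmpty()

while (!endOfInput) {
    val inputBufferIndex = codec.dequeueInputBuffer(0L)
    if (inputBufferIndex < 0) continue

    val inputBuffer = codec.getInputBuffer(inputBufferIndex)!!
    val chunk = minOf(inputBuffer.remaining(), pcm.size - offset)
    inputBuffer.put(pcm, offset, chunk)

    // Timestamp of this buffer = index of its first frame, in microseconds
    val presentationTimeUs = offset / bytesPerFrame * 1_000_000L / sampleRate

    offset += chunk
    endOfInput = offset == pcm.size

    codec.queueInputBuffer(
        inputBufferIndex,
        0,
        chunk,
        presentationTimeUs,
        if (endOfInput) MediaCodec.BUFFER_FLAG_END_OF_STREAM else 0
    )
}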