android, android-mediacodec, android-mediarecorder

MediaRecorder Surface Input with OpenGL - issue if audio recording is enabled


I want to use MediaRecorder for recording video instead of MediaCodec, because it's much simpler to use.

I also want to use OpenGL to process the frames while recording.

So I use the example code from Grafika's ContinuousCaptureActivity sample to initialize the EGL rendering context, create a cameraTexture and pass it to the Camera2 API as a Surface: https://github.com/google/grafika/blob/master/app/src/main/java/com/android/grafika/ContinuousCaptureActivity.java#L392

and to create an EGLSurface (encoderSurface in my code) from the MediaRecorder's recorderSurface: https://github.com/google/grafika/blob/master/app/src/main/java/com/android/grafika/ContinuousCaptureActivity.java#L418

and so on (frame processing is done exactly as in the Grafika sample code).
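For context, the renderer's cameraSurface is attached to the Camera2 session roughly like this (simplified; cameraDevice and backgroundHandler come from my camera setup, and the SurfaceTexture's default buffer size is set to a supported size elsewhere):

val requestBuilder = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_RECORD)
requestBuilder.addTarget(renderer.cameraSurface)

cameraDevice.createCaptureSession(
    listOf(renderer.cameraSurface),
    object : CameraCaptureSession.StateCallback() {
        override fun onConfigured(session: CameraCaptureSession) {
            // All camera frames go to the GL renderer; the recorder only sees what GL draws.
            session.setRepeatingRequest(requestBuilder.build(), null, backgroundHandler)
        }

        override fun onConfigureFailed(session: CameraCaptureSession) {}
    },
    backgroundHandler
)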

When I start recording (MediaRecorder.start()), the video records fine as long as no audio source is set.

But if audio recording is also enabled:

mediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC)
...
mediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC)

then the final video has a much longer duration than it should and is not really playable. So the MediaRecorder audio encoder breaks the recording when Surface input and GLES frame processing are used.
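For completeness, the recorder itself is configured roughly like this (simplified; the output path, sizes, frame rate and encoder/format choices shown here are just typical values):

val mediaRecorder = MediaRecorder()
mediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC)
mediaRecorder.setVideoSource(MediaRecorder.VideoSource.SURFACE)
mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4)
mediaRecorder.setOutputFile(outputFilePath)
mediaRecorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264)
mediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AAC)
mediaRecorder.setVideoSize(width, height)
mediaRecorder.setVideoFrameRate(30)
mediaRecorder.prepare()

// With VideoSource.SURFACE the input surface is only valid after prepare().
val recorderSurface = mediaRecorder.surface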

I have no idea how to fix it.

Here's my code that processes the frames (it's almost identical to the Grafika sample):

class GLCameraFramesRender(
    private val width: Int,
    private val height: Int,
    private val callback: Callback,
    recorderSurface: Surface,
    private val eglCore: EglCore
) : OnFrameAvailableListener {
    private val fullFrameBlit: FullFrameRect
    private val textureId: Int
    private val encoderSurface: WindowSurface
    private val tmpMatrix = FloatArray(16)
    private val cameraTexture: SurfaceTexture
    val cameraSurface: Surface

    init {
        // Wrap the MediaRecorder input surface in an EGL window surface and make it current.
        encoderSurface = WindowSurface(eglCore, recorderSurface, true)
        encoderSurface.makeCurrent()

        fullFrameBlit = FullFrameRect(Texture2dProgram(Texture2dProgram.ProgramType.TEXTURE_EXT))
        textureId = fullFrameBlit.createTextureObject()

        // The camera renders into this SurfaceTexture; its Surface is handed to the capture session.
        cameraTexture = SurfaceTexture(textureId)
        cameraSurface = Surface(cameraTexture)
        cameraTexture.setOnFrameAvailableListener(this)
    }

    fun release() {
        cameraTexture.setOnFrameAvailableListener(null)
        cameraTexture.release()
        cameraSurface.release()
        fullFrameBlit.release(false)
        eglCore.release()
    }

    override fun onFrameAvailable(surfaceTexture: SurfaceTexture) {
        if (callback.isRecording()) {
            drawFrame()
        } else {
            cameraTexture.updateTexImage()
        }
    }

    private fun drawFrame() {
        // Latch the latest camera frame into the external texture and grab its transform.
        cameraTexture.updateTexImage()
        cameraTexture.getTransformMatrix(tmpMatrix)

        // Render the camera frame onto the encoder (MediaRecorder) surface.
        GLES20.glViewport(0, 0, width, height)
        fullFrameBlit.drawFrame(textureId, tmpMatrix)

        // Pass the camera timestamp (nanoseconds) through to the recorder and submit the frame.
        encoderSurface.setPresentationTime(cameraTexture.timestamp)
        encoderSurface.swapBuffers()
    }

    interface Callback {
        fun isRecording(): Boolean
    }
}
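The renderer is then created on the GL thread roughly like this (simplified; mediaRecorder has already been prepared):

val eglCore = EglCore(null, EglCore.FLAG_RECORDABLE)
val renderer = GLCameraFramesRender(width, height, callback, mediaRecorder.surface, eglCore)
// renderer.cameraSurface is the surface passed to the Camera2 capture session above.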

Solution

  • It's very likely your timestamps aren't in the same timebase. The media recording system generally wants timestamps in the uptimeMillis timebase, but many camera devices produce data in the elapsedRealtime timebase. One counts time when the device is in deep sleep, and the other doesn't; the longer it's been since you rebooted your device, the bigger the discrepancy becomes.

    This wouldn't matter until you add in the audio, since MediaRecorder's internal audio timestamps will be in uptimeMillis, while the camera frame timestamps arrive in elapsedRealtime. A discrepancy of more than a fraction of a second would probably be noticeable as bad A/V sync; a difference of minutes or more will just mess everything up.

    When the camera talks to the media recording stack directly, it adjusts the timestamps automatically; but because you've placed the GPU in the middle, that doesn't happen (the camera doesn't know that's where your frames are eventually going).

    You can check if the camera is using elapsedRealtime as the timebase via SENSOR_INFO_TIMESTAMP_SOURCE. But in any case, you have a few choices:

    1. If the camera uses TIMESTAMP_SOURCE_REALTIME, measure the difference between the two timestamps at the start of recording, and adjust the timestamps you feed into setPresentationTime accordingly (delta = elapsedRealtime - uptimeMillis; timestamp = timestamp - delta), as in the sketch after this list.
    2. Just use uptimeMillis() * 1000000 as the time for setPresentationTime. This may cause too much A/V skew, but it's easy to try.
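
    A minimal sketch of option 1, assuming the offset is captured once when recording starts (cameraManager, cameraId and timestampOffsetNs are just placeholder names):

        // Check which timebase the camera sensor uses.
        val characteristics = cameraManager.getCameraCharacteristics(cameraId)
        val timestampSource = characteristics.get(CameraCharacteristics.SENSOR_INFO_TIMESTAMP_SOURCE)
        val isRealtime = timestampSource == CameraMetadata.SENSOR_INFO_TIMESTAMP_SOURCE_REALTIME

        // When recording starts, capture the offset between the two timebases (in nanoseconds).
        val timestampOffsetNs = if (isRealtime) {
            SystemClock.elapsedRealtimeNanos() - SystemClock.uptimeMillis() * 1_000_000L
        } else {
            0L
        }

        // In drawFrame(), shift the camera timestamp into the uptimeMillis timebase:
        encoderSurface.setPresentationTime(cameraTexture.timestamp - timestampOffsetNs)

        // Option 2: ignore the camera timestamp entirely (may introduce A/V skew):
        // encoderSurface.setPresentationTime(SystemClock.uptimeMillis() * 1_000_000L)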