Tags: ffmpeg, javacv, hardware-acceleration, libav, dxva

FFmpeg invalid data found when processing input with D3D11VA and DXVA2 hw acceleration


I'm currently porting my Android streaming app to Windows, and to decode the H.264 video stream I use FFmpeg, with hardware acceleration where possible. Over the last two weeks I have read a lot of documentation and studied many examples on the internet. For my project I use JavaCV, which internally uses FFmpeg 5.1.2. On Windows I support D3D11VA, DXVA2 and Cuvid for hardware acceleration (with software decoding as a fallback). During testing I noticed strange artefacts when using D3D11VA or DXVA2 hardware acceleration. Upon further investigation I saw that I receive a lot of

"Invalid data found when processing input"

errors when calling avcodec_send_packet. The error seems to occur only on certain key frames, and it is reproducible every time. The software decoder and the cuvid decoder have no problem decoding such a frame, so I'm not sure why the frame should contain invalid data. I experimented a lot with the decoder configuration, but nothing helped, and at this point I think this is definitely not normal behaviour.

I have provided a reproducible example which can be downloaded from here. All the important parts are in the App.java class, and an excerpt of the code is posted below. The example tries to decode a key frame. The key frame data, including SPS and PPS, is read from a file in the resource folder of the project.

To run the project, perform a .\gradlew build and afterwards a .\gradlew run. When the example runs successfully, the last log message shown in the terminal should be "SUCESS with HW decoding". The hardware decoder can be changed via the HW_DEVICE_TYPE variable in the App.java class. To disable hardware acceleration, set USE_HW_ACCEL to false.

To me everything seems to be correct, and I have no idea what could be wrong with the code. I searched the internet a lot for the root cause and did not find a solution, only other reports which are (maybe) related to the same problem:

https://www.mail-archive.com/libav-user@ffmpeg.org/...

https://stackoverflow.com/questions/67307397/ffmpeg-...

I also found another Windows streaming app that can use D3D11VA and DXVA2 hardware acceleration, called Chiaki (it requires a PS4 or a PS5), which seems to have exactly the same problem. I used the build provided here. With D3D11VA or DXVA2 selected, it also fails to decode certain key frames (e.g. the first key frame received from the stream). Chiaki can output the seemingly faulty frame, but my example can do that too when USE_AV_EF_EXPLODE is set to false.

Are there any FFmpeg gurus around who can check what the problem with D3D11VA or DXVA2 is? Is there anything else that needs to be done to make the D3D11VA and DXVA2 hardware decoders work? I'm now completely out of ideas and I'm not even sure whether this is fixable.

I have Windows 11 installed on my test machine, along with the latest NVIDIA drivers.

Edit: here is a shrunk, complete example of my project. The key frame file, which includes the SPS and PPS, can be downloaded from here; it is a hex-string file and can be decoded with the provided HexUtil class.

import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.layout.Pane;
import javafx.stage.Stage;
import org.bytedeco.ffmpeg.avcodec.AVCodec;
import org.bytedeco.ffmpeg.avcodec.AVCodecContext;
import org.bytedeco.ffmpeg.avcodec.AVCodecHWConfig;
import org.bytedeco.ffmpeg.avcodec.AVPacket;
import org.bytedeco.ffmpeg.avutil.AVBufferRef;
import org.bytedeco.ffmpeg.avutil.AVDictionary;
import org.bytedeco.ffmpeg.avutil.AVFrame;
import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.javacpp.IntPointer;
import org.bytedeco.javacv.FFmpegLogCallback;
import org.tinylog.Logger;

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Consumer;

import static org.bytedeco.ffmpeg.avcodec.AVCodecContext.AV_EF_EXPLODE;
import static org.bytedeco.ffmpeg.avcodec.AVCodecContext.FF_THREAD_SLICE;
import static org.bytedeco.ffmpeg.global.avcodec.*;
import static org.bytedeco.ffmpeg.global.avutil.*;

public class App extends Application {

    /**** decoder variables ****/

    private AVHWContextInfo hardwareContext;

    private AVCodec decoder;
    private AVCodecContext m_VideoDecoderCtx;

    private AVCodecContext.Get_format_AVCodecContext_IntPointer formatCallback;
    private final int streamResolutionX = 1920;
    private final int streamResolutionY = 1080;

    // AV_HWDEVICE_TYPE_CUDA // example works with cuda
    // AV_HWDEVICE_TYPE_DXVA2 // producing Invalid data found on keyframe
    // AV_HWDEVICE_TYPE_D3D11VA // producing Invalid data found on keyframe
    private static final int HW_DEVICE_TYPE = AV_HWDEVICE_TYPE_DXVA2;

    private static final boolean USE_HW_ACCEL = true;

    private static final boolean USE_AV_EF_EXPLODE = true;

    public static void main(final String[] args) {
        //System.setProperty("prism.order", "d3d,sw");
        System.setProperty("prism.vsync", "false");
        Application.launch(App.class);
    }

    @Override
    public void start(final Stage primaryStage) {
        final Pane dummyPane = new Pane();
        dummyPane.setStyle("-fx-background-color: black");
        final Scene scene = new Scene(dummyPane, this.streamResolutionX, this.streamResolutionY);
        primaryStage.setScene(scene);
        primaryStage.show();
        primaryStage.setMinWidth(480);
        primaryStage.setMinHeight(360);

        this.initializeFFmpeg(result -> {
            if (!result) {
                Logger.error("FFmpeg could not be initialized correctly, terminating program");
                System.exit(1);
                return;
            }
            this.performTestFramesFeeding();
        });
    }

    private void initializeFFmpeg(final Consumer<Boolean> finishHandler) {
        FFmpegLogCallback.setLevel(AV_LOG_DEBUG); // Increase log level until the first frame is decoded
        //FFmpegLogCallback.setLevel(AV_LOG_QUIET);
        this.decoder = avcodec_find_decoder(AV_CODEC_ID_H264); // usually decoder name is h264 and without hardware support it's yuv420p otherwise nv12

        if (this.decoder == null) {
            Logger.error("Unable to find decoder for format {}", "h264");
            finishHandler.accept(false);
            return;
        }
        Logger.info("Current decoder name: {}, {}", this.decoder.name().getString(), this.decoder.long_name().getString());

        if (true) {
            for (; ; ) {
                this.m_VideoDecoderCtx = avcodec_alloc_context3(this.decoder);
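                if (this.m_VideoDecoderCtx == null) {
                    Logger.error("Unable to allocate a codec context for AV_CODEC_ID_H264");
                    if (this.hardwareContext != null) {
                        this.hardwareContext.free();
                        this.hardwareContext = null;
                    }
                    // Retrying inside the endless loop would never terminate, so report the failure instead
                    finishHandler.accept(false);
                    return;
                }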

                if (App.USE_HW_ACCEL) {
                    this.hardwareContext = this.createHardwareContext();
                    if (this.hardwareContext != null) {
                        Logger.info("Set hwaccel support");
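                        // Hand the codec context its own reference; avcodec_free_context() releases it later
                        this.m_VideoDecoderCtx.hw_device_ctx(av_buffer_ref(this.hardwareContext.hwContext())); // comment to disable hwaccel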
                    }
                } else {
                    Logger.info("Hwaccel manually disabled");
                }


                // Always request low delay decoding
                this.m_VideoDecoderCtx.flags(this.m_VideoDecoderCtx.flags() | AV_CODEC_FLAG_LOW_DELAY);

                // Allow display of corrupt frames and frames missing references
                this.m_VideoDecoderCtx.flags(this.m_VideoDecoderCtx.flags() | AV_CODEC_FLAG_OUTPUT_CORRUPT);
                this.m_VideoDecoderCtx.flags2(this.m_VideoDecoderCtx.flags2() | AV_CODEC_FLAG2_SHOW_ALL);

                if (App.USE_AV_EF_EXPLODE) {
                    // Report decoding errors to allow us to request a key frame
                    this.m_VideoDecoderCtx.err_recognition(this.m_VideoDecoderCtx.err_recognition() | AV_EF_EXPLODE);
                }

                // Enable slice multi-threading for software decoding
                if (this.m_VideoDecoderCtx.hw_device_ctx() == null) { // if not hw accelerated
                    this.m_VideoDecoderCtx.thread_type(this.m_VideoDecoderCtx.thread_type() | FF_THREAD_SLICE);
                    this.m_VideoDecoderCtx.thread_count(2/*AppUtil.getCpuCount()*/);
                } else {
                    // No threading for HW decode
                    this.m_VideoDecoderCtx.thread_count(1);
                }

                this.m_VideoDecoderCtx.width(this.streamResolutionX);
                this.m_VideoDecoderCtx.height(this.streamResolutionY);
                this.m_VideoDecoderCtx.pix_fmt(this.getDefaultPixelFormat());

                this.formatCallback = new AVCodecContext.Get_format_AVCodecContext_IntPointer() {
                    @Override
                    public int call(final AVCodecContext context, final IntPointer pixelFormats) {
                        final boolean hwDecodingSupported = context.hw_device_ctx() != null && App.this.hardwareContext != null;
                        final int preferredPixelFormat = hwDecodingSupported ?
                                App.this.hardwareContext.hwConfig().pix_fmt() :
                                context.pix_fmt();
                        int i = 0;
                        while (true) {
                            final int currentSupportedFormat = pixelFormats.get(i++);
                            System.out.println("Supported pixel formats " + currentSupportedFormat);
                            if (currentSupportedFormat == preferredPixelFormat) {
                                Logger.info("[FFmpeg]: pixel format in format callback is {}", currentSupportedFormat);
                                return currentSupportedFormat;
                            }
                            if (currentSupportedFormat == AV_PIX_FMT_NONE) {
                                break;
                            }
                        }

                        i = 0;
                        while (true) { // try again and search for yuv
                            final int currentSupportedFormat = pixelFormats.get(i++);
                            if (currentSupportedFormat == AV_PIX_FMT_YUV420P) {
                                Logger.info("[FFmpeg]: Not found in first match so use {}", AV_PIX_FMT_YUV420P);
                                return currentSupportedFormat;
                            }
                            if (currentSupportedFormat == AV_PIX_FMT_NONE) {
                                break;
                            }
                        }

                        i = 0;
                        while (true) { // try again and search for nv12
                            final int currentSupportedFormat = pixelFormats.get(i++);
                            if (currentSupportedFormat == AV_PIX_FMT_NV12) {
                                Logger.info("[FFmpeg]: Not found in second match so use {}", AV_PIX_FMT_NV12);
                                return currentSupportedFormat;
                            }
                            if (currentSupportedFormat == AV_PIX_FMT_NONE) {
                                break;
                            }
                        }

                        Logger.info("[FFmpeg]: pixel format in format callback is using fallback {}", AV_PIX_FMT_NONE);
                        return AV_PIX_FMT_NONE;
                    }
                };
                this.m_VideoDecoderCtx.get_format(this.formatCallback);

                final AVDictionary options = new AVDictionary(null);
                final int result = avcodec_open2(this.m_VideoDecoderCtx, this.decoder, options);
                if (result < 0) {
                    Logger.error("avcodec_open2 was not successful");
                    finishHandler.accept(false);
                    return;
                }
                av_dict_free(options);
                break;
            }
        }

        if (this.decoder == null || this.m_VideoDecoderCtx == null) {
            finishHandler.accept(false);
            return;
        }
        finishHandler.accept(true);
    }

    private AVHWContextInfo createHardwareContext() {
        AVHWContextInfo result = null;
        for (int i = 0; ; i++) {
            final AVCodecHWConfig config = avcodec_get_hw_config(this.decoder, i);
            if (config == null) {
                break;
            }

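            // Skip configurations that do not support the hw_device_ctx method (the bit test yields 0, never a negative value)
            if ((config.methods() & AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX) == 0) {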
                continue;
            }
            final int device_type = config.device_type();
            if (device_type != App.HW_DEVICE_TYPE) {
                continue;
            }
            final AVBufferRef hw_context = av_hwdevice_ctx_alloc(device_type);
            if (hw_context == null || av_hwdevice_ctx_create(hw_context, device_type, (String) null, null, 0) < 0) {
                Logger.error("HW accel not supported for type {}", device_type);
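                // config points to static data owned by the codec and must not be freed;
                // the device context is a reference-counted AVBufferRef, so unref it instead of calling av_free
                av_buffer_unref(hw_context);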
            } else {
                Logger.info("HW accel created for type {}", device_type);
                result = new AVHWContextInfo(config, hw_context);
            }
            break;
        }

        return result;
    }

    @Override
    public void stop() {
        this.releaseNativeResources();
    }

    /************************/
    /*** video processing ***/
    /************************/


    private void performTestFramesFeeding() {
        final AVPacket pkt = av_packet_alloc();
        if (pkt == null) {
            return;
        }
        try (final BytePointer bp = new BytePointer(65_535 * 4)) {
            final byte[] frameData = AVTestFrames.h264KeyTestFrame;


            bp.position(0);

            bp.put(frameData);
            bp.limit(frameData.length);

            pkt.data(bp);
            pkt.capacity(bp.capacity());
            pkt.size(frameData.length);
            pkt.position(0);
            pkt.limit(frameData.length);
            final AVFrame avFrame = av_frame_alloc();

            final int err = avcodec_send_packet(this.m_VideoDecoderCtx, pkt); // this will fail with D3D11VA and DXVA2
            if (err < 0) {
                final BytePointer buffer = new BytePointer(512);
                av_strerror(err, buffer, buffer.capacity());
                final String string = buffer.getString();
                System.out.println("Error on decoding test frame " + err + " message " + string);
                av_frame_free(avFrame);
                return;
            }

            final int result = avcodec_receive_frame(this.m_VideoDecoderCtx, avFrame);
            final AVFrame decodedFrame;
            if (result == 0) {
                if (this.m_VideoDecoderCtx.hw_device_ctx() == null) {
                    decodedFrame = avFrame;
                    av_frame_unref(decodedFrame);
                    System.out.println("SUCESS with SW decoding");
                } else {
                    final AVFrame hwAvFrame = av_frame_alloc();
                    if (av_hwframe_transfer_data(hwAvFrame, avFrame, 0) < 0) {
                        System.out.println("Failed to transfer frame from hardware");
                        av_frame_unref(hwAvFrame);
                        decodedFrame = avFrame;
                    } else {
                        av_frame_unref(avFrame);
                        decodedFrame = hwAvFrame;
                        System.out.println("SUCESS with HW decoding");
                    }
                    av_frame_unref(decodedFrame);
                }
            } else {
                final BytePointer buffer = new BytePointer(512);
                av_strerror(result, buffer, buffer.capacity());
                final String string = buffer.getString();
                System.out.println("error " + result + " message " + string);
                av_frame_free(avFrame);
            }
        } finally {
            if (pkt.stream_index() != -1) {
                av_packet_unref(pkt);
            }
            pkt.releaseReference();
        }
    }

    final Object releaseLock = new Object();
    private volatile boolean released = false;

    private void releaseNativeResources() {
        if (this.released) {
            return;
        }
        this.released = true;
        synchronized (this.releaseLock) {
            // Close the video codec
            if (this.m_VideoDecoderCtx != null) {
                avcodec_free_context(this.m_VideoDecoderCtx);
                this.m_VideoDecoderCtx = null;
            }

            // close the format callback
            if (this.formatCallback != null) {
                this.formatCallback.close();
                this.formatCallback = null;
            }

            // close hw context
            if (this.hardwareContext != null) {
                this.hardwareContext.free();
            }
        }
    }

    private int getDefaultPixelFormat() {
        return AV_PIX_FMT_YUV420P; // Always return yuv420p here
    }

    public static final class HexUtil {
        private static final char[] hexArray = "0123456789ABCDEF".toCharArray();

        private HexUtil() {
        }

        public static String hexlify(final byte[] bytes) {
            final char[] hexChars = new char[bytes.length * 2];

            for (int j = 0; j < bytes.length; ++j) {
                final int v = bytes[j] & 255;
                hexChars[j * 2] = HexUtil.hexArray[v >>> 4];
                hexChars[j * 2 + 1] = HexUtil.hexArray[v & 15];
            }

            return new String(hexChars);
        }

        public static byte[] unhexlify(final String argbuf) {
            final int arglen = argbuf.length();
            if (arglen % 2 != 0) {
                throw new RuntimeException("Odd-length string");
            } else {
                final byte[] retbuf = new byte[arglen / 2];

                for (int i = 0; i < arglen; i += 2) {
                    final int top = Character.digit(argbuf.charAt(i), 16);
                    final int bot = Character.digit(argbuf.charAt(i + 1), 16);
                    if (top == -1 || bot == -1) {
                        throw new RuntimeException("Non-hexadecimal digit found");
                    }

                    retbuf[i / 2] = (byte) ((top << 4) + bot);
                }

                return retbuf;
            }
        }
    }

    public static final class AVHWContextInfo {
        private final AVCodecHWConfig hwConfig;
        private final AVBufferRef hwContext;

        private volatile boolean freed = false;

        public AVHWContextInfo(final AVCodecHWConfig hwConfig, final AVBufferRef hwContext) {
            this.hwConfig = hwConfig;
            this.hwContext = hwContext;
        }

        public AVCodecHWConfig hwConfig() {
            return this.hwConfig;
        }

        public AVBufferRef hwContext() {
            return this.hwContext;
        }

        public void free() {
            if (this.freed) {
                return;
            }
            this.freed = true;
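            // hwConfig points to static data owned by the codec and must not be freed;
            // the device context is reference counted, so drop our reference instead of calling av_free
            av_buffer_unref(this.hwContext);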
        }


        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            AVHWContextInfo that = (AVHWContextInfo) o;
            return freed == that.freed && Objects.equals(hwConfig, that.hwConfig) && Objects.equals(hwContext, that.hwContext);
        }

        @Override
        public int hashCode() {
            return Objects.hash(hwConfig, hwContext, freed);
        }

        @Override
        public String toString() {
            return "AVHWContextInfo[" +
                    "hwConfig=" + this.hwConfig + ", " +
                    "hwContext=" + this.hwContext + ']';
        }

    }

    public static final class AVTestFrames {

        private AVTestFrames() {

        }

        static {
            InputStream inputStream = null;
            try {
                inputStream = AVTestFrames.class.getClassLoader().getResourceAsStream("h264_test_key_frame.txt");
                final byte[] h264TestFrameBuffer = inputStream == null ? new byte[0] : inputStream.readAllBytes();
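                // trim() guards against a trailing newline in the resource file, which unhexlify would reject
                final String h264TestFrame = new String(h264TestFrameBuffer, StandardCharsets.UTF_8).trim();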
                AVTestFrames.h264KeyTestFrame = HexUtil.unhexlify(h264TestFrame);
            } catch (final IOException e) {
                Logger.error(e, "Could not parse test frame");
            } finally {
                if (inputStream != null) {
                    try {
                        inputStream.close();
                        inputStream = null;
                    } catch (final IOException e) {
                        Logger.error(e, "Could not close test frame input stream");
                    }
                }
            }
        }

        public static byte[] h264KeyTestFrame;
    }
}

The build.gradle of the project looks like this:

plugins {
    id 'application'
    id 'org.openjfx.javafxplugin' version '0.0.13'
}

group 'com.test.example'
version '1.0.0'

repositories {
    mavenCentral()
    mavenLocal()
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation group: 'org.bytedeco', name: 'javacv-platform', version: '1.5.8'
    implementation group: 'com.github.oshi', name: 'oshi-core', version: '3.4.3'
    implementation 'org.tinylog:tinylog-api:2.1.0'
    implementation 'org.tinylog:tinylog-impl:2.1.0'
    implementation 'org.jcodec:jcodec:0.2.5'
}

test {
    useJUnitPlatform()
}

javafx {
    version = '17.0.6'
    modules = ['javafx.graphics', 'javafx.controls', 'javafx.fxml', 'javafx.base']
}

mainClassName = 'com.test.example.App'

Solution

  • After countless hours of debugging, searching the internet and reading the FFmpeg source code, I finally found the issue. It is indeed a limitation in the current FFmpeg source code (it still exists as of version 6.0.0).

    I'm using FFmpeg via JavaCV, and I merged an FFmpeg patch there which fixes the issue. In h264dec.h there is a constant called MAX_SLICES which is set to 32, and this value is used in dxva2_h264.c. I also found this interesting bug report on GitHub, which seems to be related to the issue:

    https://github.com/wang-bin/QtAV/issues/923

    My pull request on JavaCV can be found here. So if anyone is facing the same problem with D3D11VA or DXVA2 hardware-accelerated decoding, check how many slices the frame you want to decode has: if it has more than 32 and you are using an unmodified version of FFmpeg, decoding will fail. I don't know why the number of supported slices is set that low in the code, though. Software decoding and the NVIDIA cuvid decoder are not affected by this limitation.
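Checking the slice count does not require FFmpeg itself. The following is a minimal sketch, assuming the packet uses Annex B framing (NAL units separated by 00 00 01 or 00 00 00 01 start codes) and that only NAL unit types 1 (non-IDR slice) and 5 (IDR slice) need to be counted. The class and method names (SliceCounter, countSlices, exceedsLimit) are hypothetical and not part of FFmpeg or JavaCV, and the limit of 32 is simply the MAX_SLICES value described above.

```java
final class SliceCounter {

    // Limit described in the answer above (MAX_SLICES in FFmpeg's h264dec.h); assumed, not read from FFmpeg
    static final int ASSUMED_MAX_SLICES = 32;

    private SliceCounter() {
    }

    /** Counts coded-slice NAL units (types 1 and 5) in an Annex B byte stream. */
    public static int countSlices(final byte[] data) {
        int count = 0;
        int i = 0;
        while (i + 3 < data.length) {
            // Every start code ends in 00 00 01; the 4-byte form only has one more leading zero
            if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
                // The low 5 bits of the NAL header byte hold nal_unit_type
                final int nalType = data[i + 3] & 0x1F;
                if (nalType == 1 || nalType == 5) {
                    count++;
                }
                i += 4; // skip the start code and the NAL header byte
            } else {
                i++;
            }
        }
        return count;
    }

    /** Returns true if the frame has more slices than the assumed decoder limit. */
    public static boolean exceedsLimit(final byte[] data) {
        return countSlices(data) > ASSUMED_MAX_SLICES;
    }

    public static void main(final String[] args) {
        // Synthetic data: SPS (type 7), PPS (type 8) and two IDR slices (type 5)
        final byte[] sample = {
                0, 0, 0, 1, 0x67, 0x42,
                0, 0, 0, 1, 0x68, (byte) 0xCE,
                0, 0, 1, 0x65, (byte) 0x88,
                0, 0, 1, 0x65, (byte) 0x88
        };
        System.out.println("slices: " + countSlices(sample));
        System.out.println("exceeds limit: " + exceedsLimit(sample));
    }
}
```

In the example project above, such a check could be run on AVTestFrames.h264KeyTestFrame before calling avcodec_send_packet, for instance to log the slice count when the D3D11VA or DXVA2 decoder reports "Invalid data found when processing input".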