c++, ffmpeg, directx, render, directx-11

How can I render frames decoded by FFmpeg using hardware decoding with D3D11?


I have completed the process of decoding a video frame using FFmpeg. The format of the decoded frame is AV_PIX_FMT_NV12. Now, I want to render this frame to the screen using D3D11. My questions are:

  1. What is the equivalent concept in D3D11 for a decoded frame? Is it a texture?
  2. I have seen many solutions that convert NV12 data to RGB, but it seems that DX11 does not require this conversion anymore.
  3. I just want to display this frame, and since my frame is on the GPU, is there a more convenient way to render directly on the GPU without copying?

Please forgive my not-so-good English. Can anyone provide a reference example?

I have already referenced this open-source project: https://github.com/balapradeepswork/D3D11NV12Rendering/tree/master/D3D11NV12Rendering, but I don't understand it very well. Since I don't use DX11 to make games, just to show video, I am hoping for a simpler solution. This has been bothering me for weeks; can anyone give me some advice (even a good tutorial; the ones I can find are too old)? Thanks sincerely!


Solution

  • This can't possibly be answered in full here, but the open-source VLC player's Windows source code will demonstrate how to do this.

    To answer your questions at a higher level:

    What is the equivalent concept in D3D11 for a decoded frame? Is it a texture?

    If you're using full hardware decoding, then yes, always. If not, it depends on whether you want to use the GPU just to convert from NV12 to BGRA - which has to happen at some point in the process. Let's walk through the different scenarios:

    Full Hardware Acceleration

    First you'll have one GPU texture for the current output frame, in NV12 pixel format. Each decoded frame gets copied onto this single output texture. You map this output texture onto a simple rectangle serving as your "screen" with the camera pointed directly at it, and render and flip your swapchain after each frame is copied.
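    To make that concrete, here is a rough sketch of the two D3D11 objects involved: the single NV12 output texture and the BGRA swapchain. Names like `device`, `dxgiFactory` and `hwnd` are placeholders, error handling is omitted, and sampling NV12 textures in a shader needs a reasonably recent GPU/driver.

    ```cpp
    #include <windows.h>
    #include <d3d11.h>
    #include <dxgi1_2.h>

    // Sketch (error handling omitted): create the single NV12 output texture that
    // every decoded frame is copied into, plus the BGRA swapchain for the window.
    void CreateVideoOutput(ID3D11Device* device, IDXGIFactory2* dxgiFactory, HWND hwnd,
                           UINT videoWidth, UINT videoHeight,
                           ID3D11Texture2D** outputTexture, IDXGISwapChain1** swapChain)
    {
        D3D11_TEXTURE2D_DESC texDesc = {};
        texDesc.Width            = videoWidth;          // must be even for NV12
        texDesc.Height           = videoHeight;
        texDesc.MipLevels        = 1;
        texDesc.ArraySize        = 1;
        texDesc.Format           = DXGI_FORMAT_NV12;
        texDesc.SampleDesc.Count = 1;
        texDesc.Usage            = D3D11_USAGE_DEFAULT; // GPU-only, no CPU access
        texDesc.BindFlags        = D3D11_BIND_SHADER_RESOURCE;
        device->CreateTexture2D(&texDesc, nullptr, outputTexture);

        // The window swapchain is BGRA; the NV12 -> BGRA conversion happens in the
        // pixel shader when the textured quad is drawn.
        DXGI_SWAP_CHAIN_DESC1 scDesc = {};
        scDesc.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
        scDesc.BufferCount      = 2;
        scDesc.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        scDesc.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_DISCARD;
        scDesc.SampleDesc.Count = 1;
        dxgiFactory->CreateSwapChainForHwnd(device, hwnd, &scDesc,
                                            nullptr, nullptr, swapChain);
    }
    ```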

    For the NV12 to BGRA conversion - which is necessary because AFAIK your final output to the window has to be a BGRA swapchain - you then need to apply a hardware shader on the output texture to convert from NV12 to BGRA, otherwise the colors will be completely wrong. VLC has one, but it's licensed under GPL so you might have an issue there if you used it in your own project. However, the actual algorithm is what it is, so if you can write your own shader in HLSL then you can copy the math without copying the code. (Edit - looks like the github project has an MIT-licensed shader from Microsoft so this would be the one to use.)
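    For reference, the per-pixel math such a shader performs looks roughly like this - written out in C++ rather than HLSL just to show the arithmetic. These are the BT.709 limited-range coefficients, the common case for HD video; SD content usually uses BT.601, so check your stream's colorspace.

    ```cpp
    #include <algorithm>

    // Illustrative BT.709 limited-range YUV -> RGB conversion: the math a pixel
    // shader would do after sampling the Y plane and the interleaved UV plane.
    struct Rgb { float r, g, b; };

    Rgb Nv12PixelToRgb(unsigned char y, unsigned char u, unsigned char v)
    {
        const float c = y - 16.0f;
        const float d = u - 128.0f;
        const float e = v - 128.0f;

        Rgb out;
        out.r = 1.164f * c + 1.793f * e;
        out.g = 1.164f * c - 0.213f * d - 0.533f * e;
        out.b = 1.164f * c + 2.112f * d;

        // Clamp to [0, 255] before writing to the BGRA render target.
        out.r = std::clamp(out.r, 0.0f, 255.0f);
        out.g = std::clamp(out.g, 0.0f, 255.0f);
        out.b = std::clamp(out.b, 0.0f, 255.0f);
        return out;
    }
    ```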

    On top of this, setting up the whole hardware decoding path with libav, if you haven't already done so, is a nightmare, and for resolutions below 4K it isn't really necessary on modern PCs. But if you do use it, you'd allocate a texture array using CreateTexture2D with the DXGI_FORMAT_NV12 pixel format and no CPU access. Allocate a few dozen textures (how many depends on how much VRAM you want to say your player requires). You'd hold onto those textures and feed them to the decoder via AVD3D11VAContext::surface, and the decoder will decode directly into them. Then you have pure VRAM textures in NV12 format, which you continually copy to your single output texture through a simple VRAM-to-VRAM texture copy and then immediately put back in the queue to be reused by the decoder. Note this only works with specific libav codecs.
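    If you go this route, note that libav also offers a somewhat higher-level way to get there: av_hwdevice_ctx_create with AV_HWDEVICE_TYPE_D3D11VA lets libav manage the NV12 texture-array pool for you, and decoded frames come back as AV_PIX_FMT_D3D11 carrying the texture array plus a slice index. A minimal sketch, assuming codecCtx is already opened for your stream and outputTexture is the NV12 texture from before; in a real player you'd also make libav use your renderer's ID3D11Device (e.g. via av_hwdevice_ctx_alloc and AVD3D11VADeviceContext) so the copy below stays on one device, and error handling is omitted:

    ```cpp
    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavutil/hwcontext.h>
    }
    #include <d3d11.h>
    #include <cstdint>

    // Sketch: attach a D3D11VA hardware device to the decoder so frames are
    // decoded straight into GPU textures (AV_PIX_FMT_D3D11).
    void AttachD3D11Decoder(AVCodecContext* codecCtx)
    {
        AVBufferRef* hwDeviceCtx = nullptr;
        av_hwdevice_ctx_create(&hwDeviceCtx, AV_HWDEVICE_TYPE_D3D11VA, nullptr, nullptr, 0);
        codecCtx->hw_device_ctx = av_buffer_ref(hwDeviceCtx);
        // (Typically you also set codecCtx->get_format to choose AV_PIX_FMT_D3D11.)
    }

    // Sketch: VRAM -> VRAM copy of one decoded frame onto the NV12 output texture.
    void CopyDecodedFrame(const AVFrame* frame, ID3D11DeviceContext* ctx,
                          ID3D11Texture2D* outputTexture)
    {
        if (frame->format != AV_PIX_FMT_D3D11)
            return; // not a GPU frame

        // data[0] is the decoder's ID3D11Texture2D array, data[1] the slice index.
        auto* decoderArray = reinterpret_cast<ID3D11Texture2D*>(frame->data[0]);
        UINT  slice = static_cast<UINT>(reinterpret_cast<intptr_t>(frame->data[1]));

        ctx->CopySubresourceRegion(outputTexture, 0, 0, 0, 0, decoderArray, slice, nullptr);
        // ...then draw the textured quad and Present() the swapchain.
    }
    ```

    This avoids both the RAM copy and manual surface management, at the cost of being tied to whichever codecs libav exposes D3D11VA support for.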

    So with this method everything stays in VRAM the whole time and it's certainly the best performing way to decode video with libav and D3D11. But it's extremely complicated and requires a very deep dive into video decoding and libav specifically. Anyone who tries to give you advice here who hasn't gone down that specific rabbit hole likely doesn't know what they're talking about.

    No hardware acceleration

    You say you're using hardware decoding, but I'm not sure you are, so I still want to take a second to discuss the alternative, as it's much simpler. The frames given to you by libav are already in CPU RAM, although still in NV12 format, so you still have to convert them to BGRA at some point. Here again you have two options:

    1. You could continue to hold onto an NV12 D3D output texture mapped to a rectangle, as above, using a hardware shader to convert from NV12. Then for each frame you have to copy the frame texture to the output texture, which means copying from RAM to VRAM, which you probably know is very slow.

    2. You could do everything in CPU/RAM by using sws_getCachedContext and sws_scale to get BGRA pixels. If you do it this way you don't need to use D3D at all; you would just have raw BGRA data at that point that you could copy to the window using D2D or even GDI+ (see the sketch after this list).
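    For option (2), a minimal sketch of the swscale path, assuming the decoded AVFrame really is NV12 in system memory (error handling omitted):

    ```cpp
    extern "C" {
    #include <libswscale/swscale.h>
    #include <libavutil/frame.h>
    }
    #include <cstdint>
    #include <vector>

    // Sketch: convert one NV12 AVFrame (CPU memory) into tightly packed BGRA bytes.
    std::vector<uint8_t> Nv12FrameToBgra(const AVFrame* frame, SwsContext*& cachedCtx)
    {
        cachedCtx = sws_getCachedContext(cachedCtx,
                                         frame->width, frame->height, AV_PIX_FMT_NV12,
                                         frame->width, frame->height, AV_PIX_FMT_BGRA,
                                         SWS_BILINEAR, nullptr, nullptr, nullptr);

        std::vector<uint8_t> bgra(static_cast<size_t>(frame->width) * frame->height * 4);
        uint8_t* dstData[1]   = { bgra.data() };
        int      dstStride[1] = { frame->width * 4 };

        sws_scale(cachedCtx, frame->data, frame->linesize, 0, frame->height,
                  dstData, dstStride);
        return bgra; // hand this to D2D/GDI+, or upload it to a BGRA texture
    }
    ```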

    Because copying from RAM to VRAM is so slow, I couldn't tell you whether (1) or (2) is faster. In other words the benefit to using the GPU to convert from NV12 to BGRA could easily be offset by the slowness of copying the NV12 data into VRAM. So you'll just have to experiment here.

    I have seen many solutions that convert NV12 data to RGB, but it seems that DX11 does not require this conversion anymore.

    Hm. Not sure what you mean here. Yes, D3D11 can handle an NV12 texture, in VRAM or RAM, but the final output surface or swapchain always has to be BGRA on Windows AFAIK, so you will have to convert at some point in the process. The most efficient way to do this is with a shader applied to the NV12 output texture.
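    Concretely, the usual trick (and what the project you linked does) is to create two shader resource views over the same NV12 texture - an R8 view for the luma plane and an R8G8 view for the interleaved chroma plane - and sample both in the pixel shader. A sketch, assuming nv12Texture is the output texture created earlier:

    ```cpp
    #include <d3d11.h>

    // Sketch: expose the NV12 texture's two planes to the pixel shader as separate
    // SRVs. Requires a GPU/driver that supports shader sampling of video formats.
    void CreateNv12Views(ID3D11Device* device, ID3D11Texture2D* nv12Texture,
                         ID3D11ShaderResourceView** lumaView,
                         ID3D11ShaderResourceView** chromaView)
    {
        D3D11_SHADER_RESOURCE_VIEW_DESC desc = {};
        desc.ViewDimension       = D3D11_SRV_DIMENSION_TEXTURE2D;
        desc.Texture2D.MipLevels = 1;

        // Y plane: one 8-bit channel per pixel.
        desc.Format = DXGI_FORMAT_R8_UNORM;
        device->CreateShaderResourceView(nv12Texture, &desc, lumaView);

        // UV plane: interleaved 8-bit U and V at half resolution.
        desc.Format = DXGI_FORMAT_R8G8_UNORM;
        device->CreateShaderResourceView(nv12Texture, &desc, chromaView);
    }
    ```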

    I just want to display this frame, and since my frame is on the GPU, is there a more convenient way to render directly on the GPU without copying?

    Yeah, this would require going the hardware acceleration route. If your frame is on the GPU then you're already using hardware acceleration, and if so you should already have access to the DXGI textures that the decoder outputs to.
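    An easy way to confirm which situation you're actually in is to look at the pixel format of the frames the decoder hands you; something like this (illustrative helper, not part of any API):

    ```cpp
    extern "C" {
    #include <libavutil/frame.h>
    #include <libavutil/pixfmt.h>
    }
    #include <cstdint>
    #include <cstdio>

    // Illustrative check: D3D11 (GPU) frames vs. ordinary CPU frames.
    void ReportFrameLocation(const AVFrame* frame)
    {
        if (frame->format == AV_PIX_FMT_D3D11)
            std::printf("GPU frame: texture array %p, slice %d\n",
                        static_cast<void*>(frame->data[0]),
                        static_cast<int>(reinterpret_cast<intptr_t>(frame->data[1])));
        else if (frame->format == AV_PIX_FMT_NV12)
            std::printf("CPU frame: NV12 in system memory\n");
        else
            std::printf("Other pixel format: %d\n", frame->format);
    }
    ```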

    Edit: I looked at the github project you referenced and it's not using ffmpeg/libav to decode. The textures it's reading off the disk are already NV12 frames. So it's not an example of hardware decoding.

    For true hardware decoding, again, check out VLC and how it uses libav (the underlying library powering ffmpeg) to directly decode to DXGI textures.

    What the Github project DOES demonstrate is how to use the shader on the NV12 textures after they're decoded, including setting up the viewport, shader, etc., which is all stuff you'll have to do as well. So it's a good resource to be sure, but it's not even half the battle when it comes to making a full player.