c++ video ms-media-foundation dxva

Is there a faster way to ReadSample from an IMFSample?


I'm setting up a video renderer for my application, which uses Direct3D9Ex interfaces, but when I use a big texture (desktop resolution) the video starts to slow down.

I was using DirectShow, but I ran into some problems with H.264 and decided to switch to Media Foundation. I've searched a lot, but I could not work out how to render a video with DXVA. Because of that, I'm reading samples with an asynchronous IMFSourceReader, using MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING and MFVideoFormat_RGB32, so I can copy each frame to my own surface and then render it normally.

This is how I create the source reader:

    MFCreateAttributes(&m_Attributes, 4);

    m_Attributes->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, GRAPHICSDEVICE->GetDeviceManager());
    m_Attributes->SetUnknown(MF_SOURCE_READER_ASYNC_CALLBACK, this);
    m_Attributes->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);
    m_Attributes->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE);

    MFCreateSourceReaderFromURL(L"Video.mp4", m_Attributes, &m_SourceReader);
    MFCreateMediaType(&m_MediaType);
    MFSetAttributeSize(m_MediaType, MF_MT_FRAME_SIZE, m_VideoWidth, m_VideoHeight);

    m_MediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    m_MediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);

Then I post one ReadSample, and in my Update method I do this:

    if (WaitForSingleObject(m_SampleEvent, 0) == WAIT_OBJECT_0)
    {
        if (m_SourceReader)
        {
            m_SourceReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, nullptr, nullptr, nullptr, nullptr);
        }
    }

This is part of my OnReadSample callback, which just copies one surface to another:

    IDirect3DSurface9 * pSampleSurface = nullptr;

    if (SUCCEEDED(GetD3DSurfaceFromSample(Sample, &pSampleSurface)))
    {
        D3DLOCKED_RECT SampleRect;
        if (FAILED(pSampleSurface->LockRect(&SampleRect, nullptr, D3DLOCK_READONLY)))
        {
            pSampleSurface->Release();
            goto Quit;
        }

        BYTE * pVideo = (BYTE*)SampleRect.pBits;

        D3DLOCKED_RECT TextureRect;
        if (FAILED(m_Texture->LockRect(0, &TextureRect, nullptr, D3DLOCK_DISCARD)))
        {
            pSampleSurface->UnlockRect();
            pSampleSurface->Release();
            goto Quit;
        }

        BYTE * pDest = (BYTE*)TextureRect.pBits;

        for (unsigned int i = 0; i < m_VideoHeight; i++)
        {
            CopyMemory(pDest, pVideo, m_VideoWidth * 4);
            pDest += TextureRect.Pitch;
            pVideo += SampleRect.Pitch;
        }

        m_Texture->UnlockRect(0);
        pSampleSurface->UnlockRect();
        pSampleSurface->Release();
    }

My results are acceptable in a debug environment, but when I change the application resolution to my desktop's (from 800x600 to 1366x768), things start to get a lot slower.

Do I have to use something like DXVA? Can I tweak the current code to run faster? Where can I find some good samples?


Solution

  • The main speed-related factor here is being able to decode on the GPU into a texture and then use that texture without downloading the data into system memory, if possible.

    You are setting MF_SOURCE_READER_D3D_MANAGER, and you eventually read the data back from the decoded surface. So DXVA is already working for you, and decoding should be reasonably fast (that is, you don't need to accelerate ReadSample per se). IDirect3DSurface9::LockRect and the CPU access to the bits are presumably what makes it slow; you might want to disable the readback step and compare the performance to verify.
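
To check whether the lock-and-copy step really dominates, one option is to time it separately from the rest of the callback. The following is a minimal sketch, not part of the original code: `StepTimer` is a hypothetical helper built only on `std::chrono`, and it assumes it is used from a single thread (for example, only inside OnReadSample).

```cpp
#include <chrono>

// Hypothetical helper (not a Media Foundation or Direct3D API):
// accumulates the wall-clock time spent between Begin() and End()
// and reports the average cost per measured step.
// Not thread-safe; use it from one thread only.
class StepTimer
{
    using Clock = std::chrono::steady_clock;

public:
    // Mark the start of the step being measured.
    void Begin() { m_start = Clock::now(); }

    // Mark the end of the step and add its duration to the running total.
    void End()
    {
        m_total += Clock::now() - m_start;
        ++m_count;
    }

    // Number of completed Begin()/End() pairs.
    int Count() const { return m_count; }

    // Average duration of one step in milliseconds (0 if nothing was measured).
    double AverageMs() const
    {
        if (m_count == 0)
        {
            return 0.0;
        }
        const std::chrono::duration<double, std::milli> totalMs(m_total);
        return totalMs.count() / m_count;
    }

private:
    Clock::time_point m_start{};
    Clock::duration   m_total{};
    int               m_count = 0;
};

// Intended use inside OnReadSample (names taken from the question):
//
//   m_CopyTimer.Begin();
//   ... LockRect on both surfaces, CopyMemory loop, UnlockRect ...
//   m_CopyTimer.End();
//
// then log m_CopyTimer.AverageMs() every few hundred frames and compare it
// with the frame interval (about 33 ms for a 30 fps video).
```

If the average is a large share of the frame interval, the readback is the likely bottleneck. In that case a GPU-side copy is worth trying, for example IDirect3DDevice9::StretchRect from the sample surface into level 0 of a texture created with D3DUSAGE_RENDERTARGET in D3DPOOL_DEFAULT, assuming the device and formats allow it, so the frame never has to be locked by the CPU.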