Tags: c++, textures, directx-11, unity-game-engine, dxt

Error in runtime update of DXT compressed textures with DirectX 11


Context: I'm developing a native C++ Unity 5 plugin that reads DXT compressed texture data and uploads it to the GPU for further use in Unity. The aim is to create a fast image-sequence player that updates image data on the fly. The textures are compressed with an offline console application. Unity can work with different graphics APIs; I'm targeting DirectX 11 and OpenGL 3.3+.

Problem: The DirectX runtime texture update code, which goes through a mapped subresource, gives different output on different graphics drivers. Updating a texture through such a mapped resource means mapping a pointer to the texture data and memcpy'ing the data from the RAM buffer to the mapped GPU buffer. In doing so, different drivers seem to expect different values for the row pitch when copying bytes. I never had problems on the several Nvidia GPUs I tested on, but AMD and Intel GPUs seem to behave differently and I get distorted output, as shown below. Furthermore, I'm working with DXT1 pixel data (0.5 bytes per pixel) and DXT5 data (1 byte per pixel), and I can't seem to get the correct pitch parameter for these DXT textures.
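For reference, here is a minimal sketch (the helper name is hypothetical, not part of the plugin) of how the tightly packed row pitch and frame size of a BC1 (DXT1) / BC3 (DXT5) texture are usually computed - BC formats store 4x4 pixel blocks, 8 bytes per block for BC1 and 16 bytes per block for BC3:

    #include <cstdint>

    struct BcPitch
    {
        uint32_t rowPitch;   // bytes per tightly packed row of 4x4 blocks
        uint32_t numRows;    // number of block rows to copy
        uint32_t totalBytes; // rowPitch * numRows
    };

    // Hypothetical helper: tight pitch/size for a BC1 (DXT1) or BC3 (DXT5) texture.
    BcPitch ComputeBcPitch(uint32_t width, uint32_t height, bool isBc1)
    {
        const uint32_t bytesPerBlock = isBc1 ? 8u : 16u;
        const uint32_t blocksWide = (width + 3u) / 4u;
        const uint32_t blocksHigh = (height + 3u) / 4u;
        return { blocksWide * bytesPerBlock, blocksHigh, blocksWide * bytesPerBlock * blocksHigh };
    }

The RowPitch a driver reports for a mapped texture can be larger than this tight pitch because of padding.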

Code: The following initialisation code for generating the D3D11 texture and filling it with initial texture data - e.g. the first frame of an image sequence - works perfectly on all drivers. The player pointer points to a custom class that handles all file reads and contains getters for the currently loaded DXT compressed frame, its dimensions, etc.

if (s_DeviceType == kUnityGfxRendererD3D11)
    {
        HRESULT hr;
        DXGI_FORMAT format = (compression_type == DxtCompressionType::DXT_TYPE_DXT1_NO_ALPHA) ? DXGI_FORMAT_BC1_UNORM : DXGI_FORMAT_BC3_UNORM;

        // Create texture
        D3D11_TEXTURE2D_DESC desc;
        desc.Width = w;
        desc.Height = h;
        desc.MipLevels = 1;
        desc.ArraySize = 1;
        desc.Format = format;
        // no anti-aliasing
        desc.SampleDesc.Count = 1;
        desc.SampleDesc.Quality = 0;
        desc.Usage = D3D11_USAGE_DYNAMIC;
        desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
        desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
        desc.MiscFlags = 0;

        // Initial data: first frame
        D3D11_SUBRESOURCE_DATA data;
        data.pSysMem = player->getBufferPtr();
        data.SysMemPitch = 16 * (player->getWidth() / 4);
        data.SysMemSlicePitch = 0; // just a 2d texture, no depth

        // Init with initial data
        hr = g_D3D11Device->CreateTexture2D(&desc, &data, &dxt_d3d_tex);

        if (SUCCEEDED(hr) && dxt_d3d_tex != 0)
        {
            DXT_VERBOSE("Succesfully created D3D Texture.");

            DXT_VERBOSE("Creating D3D SRV.");
            D3D11_SHADER_RESOURCE_VIEW_DESC SRVDesc;
            memset(&SRVDesc, 0, sizeof(SRVDesc));
            SRVDesc.Format = format;
            SRVDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
            SRVDesc.Texture2D.MipLevels = 1;

            hr = g_D3D11Device->CreateShaderResourceView(dxt_d3d_tex, &SRVDesc, &textureView);
            if (FAILED(hr))
            {
                dxt_d3d_tex->Release();
                return hr;
            }
            DXT_VERBOSE("Succesfully created D3D SRV.");
        }
        else
        {
            DXT_ERROR("Error creating D3D texture.")
        }
    }

The following update code, which runs for each new frame, has the error somewhere. Please note the commented line containing method 1: a simple memcpy without any row pitch specified, which works well on NVIDIA drivers.
Further on, in method 2, you can see that I log the different row pitch values. For instance, for a 1920x960 frame I get 1920 for the buffer stride and 2048 for the runtime stride. This difference of 128 probably has to be padded for (as can be seen in the example pic below), but I can't figure out how. When I just use mappedResource.RowPitch without dividing it by 4 (done by the bit shift), Unity crashes.

    ID3D11DeviceContext* ctx = NULL;
    g_D3D11Device->GetImmediateContext(&ctx);

    if (dxt_d3d_tex && bShouldUpload)
    {
        if (player->gather_stats) before_upload = ns();

        D3D11_MAPPED_SUBRESOURCE mappedResource;
        ctx->Map(dxt_d3d_tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);

        /* 1: THIS CODE WORKS ON ALL NVIDIA DRIVERS BUT GENERATES DISTORTED OR NO OUTPUT ON AMD/INTEL: */
        //memcpy(mappedResource.pData, player->getBufferPtr(), player->getBytesPerFrame());

        /* 2: THIS CODE GENERATES OUTPUT BUT SEEMS TO NEED PADDING? */
        BYTE* mappedData = reinterpret_cast<BYTE*>(mappedResource.pData);
        BYTE* buffer = player->getBufferPtr();
        UINT height = player->getHeight();
        UINT buffer_stride = player->getBytesPerFrame() / player->getHeight();
        UINT runtime_stride = mappedResource.RowPitch >> 2;

        DXT_VERBOSE("Buffer stride: %d", buffer_stride);
        DXT_VERBOSE("Runtime stride: %d", runtime_stride);

        for (UINT i = 0; i < height; ++i)
        {
            memcpy(mappedData, buffer, buffer_stride);
            mappedData += runtime_stride;
            buffer += buffer_stride;
        }

        ctx->Unmap(dxt_d3d_tex, 0);
    }

Example pic 1 - distorted output when using memcpy to copy the whole buffer without a separate row pitch on AMD/Intel (method 1):

[image: distorted output]

Example pic 2 - better but still erroneous output when using the above code with mappedResource.RowPitch on AMD/Intel (method 2). The blue bars indicate zones of error and need to disappear so that all pixels align and form one image:

[image]

Thanks for any pointers! Best, Vincent


Solution

  • The mapped data row pitch is in bytes; when you divide it by four, it is definitely an issue.

    UINT runtime_stride = mappedResource.RowPitch >> 2;
    ...
    mappedData += runtime_stride; // here you are only jumping one quarter of a row
    

    It is the height count that has to be divided by 4 with a BC format: the copy loop should iterate over rows of 4x4 blocks, not pixel rows (a corrected copy loop is sketched after this answer).

    Also, a BC1 format is 8 bytes per 4x4 block, so the line below should be 8 * and not 16 *. But as long as you handle the row stride consistently on your side, D3D will understand it; you just waste half the memory here.

    data.SysMemPitch = 16 * (player->getWidth() / 4);
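
Putting the two points above together, here is a minimal sketch of a corrected upload loop (a sketch only, assuming the player getters from the question and a tightly packed source buffer):

    ID3D11DeviceContext* ctx = NULL;
    g_D3D11Device->GetImmediateContext(&ctx);

    D3D11_MAPPED_SUBRESOURCE mappedResource;
    ctx->Map(dxt_d3d_tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);

    BYTE* mappedData = reinterpret_cast<BYTE*>(mappedResource.pData);
    BYTE* buffer = player->getBufferPtr();

    // BC textures are stored as rows of 4x4 blocks, so copy height / 4 block rows.
    const UINT blockRows = (player->getHeight() + 3) / 4;
    // Tightly packed bytes per block row in the source buffer.
    const UINT buffer_stride = player->getBytesPerFrame() / blockRows;
    // RowPitch is already in bytes; use it as-is, without dividing by 4.
    const UINT runtime_stride = mappedResource.RowPitch;

    for (UINT i = 0; i < blockRows; ++i)
    {
        memcpy(mappedData, buffer, buffer_stride);
        mappedData += runtime_stride;
        buffer += buffer_stride;
    }

    ctx->Unmap(dxt_d3d_tex, 0);

With the numbers from the question, a 1920x960 DXT5 frame has a tight block-row size of (1920 / 4) * 16 = 7680 bytes, copied 960 / 4 = 240 times; the logged runtime stride of 2048 corresponds to a driver RowPitch of 8192 bytes, and the loop simply skips the padding at the end of each mapped row.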