performance opengl framebuffer fbo amd-gpu

OpenGL drops performance when writing to nonzero FBO attachment on AMD


I noticed that my 3D engine runs very slowly on AMD hardware. After some investigation, the slow code boiled down to creating an FBO with several attachments and writing to any nonzero attachment. In all tests I compared the AMD results against two baselines: the same AMD GPU writing to the unaffected GL_COLOR_ATTACHMENT0, and Nvidia hardware whose performance relative to my AMD device is well known.

Writing fragments to nonzero attachments is 2-3 times slower than expected.

This code is equivalent to how I create a framebuffer and measure performance in my test apps:

    // Create a framebuffer
    static const auto attachmentCount = 6;
    GLuint fb, att[attachmentCount];
    glGenTextures(attachmentCount, att);
    glGenFramebuffers(1, &fb);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);

    for (auto i = 0; i < attachmentCount; ++i) {
        glBindTexture(GL_TEXTURE_2D, att[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
    }
    GLenum dbs[] = {
        GL_NONE,
        GL_COLOR_ATTACHMENT1,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(attachmentCount, dbs);


    // Main loop
    while (shouldWork) {
        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();
        showFps();
    }

Is anything wrong with it?
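
The `showFps()` helper is not defined in the snippet above. A minimal sketch of such a counter, modeled on the 50-frame averaging used in the full listings further down; the `FpsMeter` name and its interface are assumptions made for illustration, not the code from the test pack:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical helper: averages the frame rate over a fixed window of frames.
class FpsMeter {
public:
    using Clock = std::chrono::steady_clock;

    FpsMeter(std::uint32_t framesPerSample, Clock::time_point start)
        : framesPerSample_(framesPerSample), windowStart_(start)
    {
        assert(framesPerSample > 0);
    }

    // Call once per frame. Returns the average FPS when a window of
    // framesPerSample frames completes, and std::nullopt otherwise.
    std::optional<double> tick(Clock::time_point now)
    {
        if (++frames_ < framesPerSample_) return std::nullopt;
        const std::chrono::duration<double> elapsed = now - windowStart_;
        const double fps = frames_ / elapsed.count();
        frames_ = 0;
        windowStart_ = now;
        return fps;
    }

private:
    std::uint32_t framesPerSample_;
    std::uint32_t frames_ = 0;
    Clock::time_point windowStart_;
};
```

In the main loop this would be called once per frame, for example `if (const auto fps = meter.tick(FpsMeter::Clock::now())) std::cout << "FPS: " << *fps << std::endl;`. Passing the time point in, rather than reading the clock inside the class, keeps the helper easy to check without a window or a GL context.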

Fully reproducible minimal tests can be found here. I tried many other write patterns and OpenGL states and described some of them on the AMD Community forum.

I suppose the problem is in AMD's OpenGL driver, but if it's not, or you faced the same problem and found a workaround (a vendor extension?), please share.

UPD: moving the problem details here.

I prepared a minimal test pack, where the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 fullscreen rects per frame to it. There are four executables with four patterns of writing:

  1. Writing shader output 0 to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers. All other outputs are set to GL_NONE.

  2. Same as 1, but with output and attachment 1.

  3. Writing output 0 to attachment 0, but all six shader outputs are routed to attachments 0..5 respectively, and all draw buffers except 0 are masked with glColorMaski.

  4. Same as 3, but for attachment 1.
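
For patterns 1 and 2, the only thing that changes on the API side is the array passed to glDrawBuffers. A minimal sketch of building that array; the helper name `makeDrawBuffers` is hypothetical, and the two constants are stand-ins that mirror the values of GL_NONE and GL_COLOR_ATTACHMENT0 from the OpenGL headers so that the snippet compiles without them:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for GLenum, GL_NONE and GL_COLOR_ATTACHMENT0 (assumed here so
// that the sketch is self-contained; real code would use the GL headers).
using DrawBufferEnum = unsigned int;
constexpr DrawBufferEnum kNone = 0;
constexpr DrawBufferEnum kColorAttachment0 = 0x8CE0;

// Route fragment output `active` to color attachment `active` and send every
// other output to "none", which is the mapping used in patterns 1 and 2.
std::vector<DrawBufferEnum> makeDrawBuffers(std::size_t count, std::size_t active)
{
    assert(active < count);
    std::vector<DrawBufferEnum> bufs(count, kNone);
    bufs[active] = kColorAttachment0 + static_cast<DrawBufferEnum>(active);
    return bufs;
}
```

With the real headers, the result of `makeDrawBuffers(6, 1)` would be passed on as `glDrawBuffers(6, bufs.data())`, which corresponds to the slow attachment1 case.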

I ran all tests on two machines with nearly identical CPUs and the following GPUs:

AMD Radeon RX550, driver version 19.30.01.16

Nvidia GeForce GTX 650 Ti, which is roughly half as powerful as the RX550

and got these results:

Geforce GTX 650 Ti:
attachment0: 195 FPS
attachment1: 195 FPS
attachment0 masked: 195 FPS
attachment1 masked: 235 FPS
Radeon RX550:
attachment0: 350 FPS
attachment1: 185 FPS
attachment0 masked: 330 FPS
attachment1 masked: 175 FPS
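
To make the gap easier to read, the FPS figures can be converted to frame times and slowdown ratios. A small sketch that uses only the RX550 numbers listed above; the function names are made up for illustration:

```cpp
#include <cassert>
#include <cstdio>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many times slower the `slowFps` case is compared with `fastFps`.
double slowdown(double fastFps, double slowFps)
{
    assert(fastFps > 0.0 && slowFps > 0.0);
    return fastFps / slowFps;
}

// Prints frame times and ratios for the RX550 results reported in the post.
void printRx550Summary()
{
    std::printf("unmasked: %.2f ms vs %.2f ms, ratio %.2f\n",
                frameTimeMs(350.0), frameTimeMs(185.0), slowdown(350.0, 185.0));
    std::printf("masked:   %.2f ms vs %.2f ms, ratio %.2f\n",
                frameTimeMs(330.0), frameTimeMs(175.0), slowdown(330.0, 175.0));
}
```

Both ratios come out at roughly 1.9, so on this card the penalty for attachment 1 is about the same with and without masking.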

Pre-built test executables are attached to the post or can be downloaded from Google Drive.

Test sources (with an MSVS-friendly CMake build system) are available here on GitHub.

All four programs show a black window and a console with an FPS counter.

We see that when writing to a nonzero attachment, the AMD GPU is much slower than both the less powerful Nvidia GPU and itself writing to attachment 0. Masking draw buffer output globally also costs a few FPS.

I also tried using renderbuffers instead of textures, using other image formats (the formats in the tests are the most compatible ones), and rendering to a power-of-two sized framebuffer. The results were the same.

Explicitly turning off scissor, stencil and depth tests does not help.

If I decrease the number of attachments or reduce framebuffer coverage by multiplying the vertex coordinates by a value less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.
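
That proportionality is what one would expect if the test is limited by pixel writes. A rough sketch of the arithmetic, assuming each of the 100 rectangles per frame touches every covered pixel exactly once and that scaling the vertex coordinates by `s` shrinks the covered area by `s * s`; the function names are hypothetical, and the clear and any driver overhead are ignored:

```cpp
#include <cassert>

// Pixels written per frame for `draws` rectangles on a width x height target,
// with the vertex coordinates scaled by `s` (0 < s <= 1).
double pixelsPerFrame(int width, int height, int draws, double s)
{
    assert(width > 0 && height > 0 && draws > 0);
    assert(s > 0.0 && s <= 1.0);
    return static_cast<double>(width) * height * draws * s * s;
}

// Approximate write throughput in pixels per second at a measured frame rate.
double pixelsPerSecond(double pixelsPerFrameValue, double fps)
{
    assert(fps > 0.0);
    return pixelsPerFrameValue * fps;
}
```

For the 800x600 window used in the listings below, `pixelsPerFrame(800, 600, 100, 1.0)` is 48 million pixels per frame, so under these assumptions the reported RX550 rates of 350 and 185 FPS correspond to about 16.8 and 8.9 billion pixels per second. Halving the vertex coordinates (`s = 0.5`) cuts the per-frame work to a quarter.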

glClear calls are also affected, and their performance under various conditions fits the above observations.

My teammate ran the tests on a Radeon HD 3000 under Linux, both natively and through Wine. Both runs showed the same huge difference between the attachment0 and attachment1 tests. I can't tell exactly which driver version he has, but it's the one provided by the Ubuntu 19.04 repos.

Another teammate tried the tests on a Radeon RX590 and got the same twofold difference.

Finally, let me copy-paste two almost equivalent test examples here. This one works fast:

#include <chrono>
#include <cstdint>   // uint32_t
#include <iostream>
#include <iterator>  // std::size
#include <sstream>
#include <string>

#include "GL/glew.h"
#include "GLFW/glfw3.h"

static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode) {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}

static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";

    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}

[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}

static bool compileShader(const GLuint shader, const std::string& source)
{
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];

    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i) {
        if (source[i] == '\n') {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }

    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }

    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}

static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0) {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 2;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0) {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 2;
    }

    // Compile shaders
    if (!compileShader(vs, vertSource)) {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 5;
    }
    if (!compileShader(fs, fragSource)) {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 5;
    }

    // Link program
    const auto program = glCreateProgram();
    if (program == 0) {
        std::cerr << "Error: program is 0." << std::endl;
        return 2;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);

    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);

    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus) {
        std::cerr << "Error: could not link." << std::endl;
        return 2;
    }
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}

static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";

static const std::string fragSource = R"(
#version 330
layout(location = 0) out vec4 outColor0;
void main()
{
    outColor0 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";

int main()
{
    // Init
    if (!glfwInit()) {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }

    static const int width = 800;
    static const int height = 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = nullptr;
    window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr) {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);

    if (glewInit() != GLEW_OK) {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    glUseProgram(shaderProgram);

    // Vertex buffer
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] = {
        -1.0f, -1.0f,
        1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, 1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));

    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);

    glBindTexture(GL_TEXTURE_2D, att[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[1]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[2]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[3]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[4]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[5]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);

    GLenum dbs[] = {
        GL_COLOR_ATTACHMENT0,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(6, dbs);

    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error()) {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // Fpsmeter
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop
    while (!glfwWindowShouldClose(window)) {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);

        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();

        if (++framesCount == framesMax) {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}

And this one runs just as fast on Nvidia and Intel GPUs, but 2-3 times slower than the first example on AMD GPUs:

#include <chrono>
#include <cstdint>   // uint32_t
#include <iostream>
#include <iterator>  // std::size
#include <sstream>
#include <string>

#include "GL/glew.h"
#include "GLFW/glfw3.h"

static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode) {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}

static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";

    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}

[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}

static bool compileShader(const GLuint shader, const std::string& source)
{
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];

    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i) {
        if (source[i] == '\n') {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }

    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }

    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}

static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0) {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 2;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0) {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 2;
    }

    // Compile shaders
    if (!compileShader(vs, vertSource)) {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 5;
    }
    if (!compileShader(fs, fragSource)) {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 5;
    }

    // Link program
    const auto program = glCreateProgram();
    if (program == 0) {
        std::cerr << "Error: program is 0." << std::endl;
        return 2;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);

    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);

    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus) {
        std::cerr << "Error: could not link." << std::endl;
        return 2;
    }
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}

static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";

static const std::string fragSource = R"(
#version 330
layout(location = 1) out vec4 outColor1;
void main()
{
    outColor1 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";

int main()
{
    // Init
    if (!glfwInit()) {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }

    static const int width = 800;
    static const int height = 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = nullptr;
    window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr) {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);

    if (glewInit() != GLEW_OK) {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    glUseProgram(shaderProgram);

    // Vertex buffer
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] = {
        -1.0f, -1.0f,
        1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, 1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));

    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);

    glBindTexture(GL_TEXTURE_2D, att[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[1]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[2]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[3]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[4]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[5]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);

    GLenum dbs[] = {
        GL_NONE,
        GL_COLOR_ATTACHMENT1,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(6, dbs);

    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error()) {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // Fpsmeter
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop
    while (!glfwWindowShouldClose(window)) {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);

        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();

        if (++framesCount == framesMax) {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}

The only difference between these examples is the color attachment used.

I deliberately wrote two nearly identical, copy-pasted programs to avoid possible side effects of framebuffer deletion and recreation.

UPD2: I also ran my test examples in an OpenGL 4.6 debug context on both Nvidia and AMD and got no performance warnings.

UPD3: RX470 results:

attachment0: 775 FPS
attachment1: 396 FPS

UPD4: I built the attachment0 and attachment1 tests for WebGL via emscripten and ran them on the Radeon RX550. The full source is in the problem's GitHub repo; the build command lines are

emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html

Both test programs issue a single draw call: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);

First test: Firefox with default config, i.e. DirectX-backed ANGLE.

Unmasked Vendor:    Google Inc.
Unmasked Renderer:  ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)

attachment0: 38 FPS
attachment1: 38 FPS

Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), using native OpenGL:

Unmasked Vendor:    ATI Technologies Inc.
Unmasked Renderer:  Radeon RX550/550 Series

attachment0: 38 FPS
attachment1: 19 FPS

We see that DirectX is not affected by the problem, and that the OpenGL issue is reproducible in WebGL. This is the expected result, as gamers and developers have complained only about OpenGL performance.

P.S. My issue is probably the root cause of this and this performance drop.


Solution

  • AMD has fixed the problem as of (at least) the December 2019 driver. The fix is confirmed by the test programs mentioned above and by our game engine's frame rate. See also this thread.

    Dear AMD OpenGL driver team, thank you very much!