
OpenGL ES 2.0 + Cairo: HUD


I am trying to render a HUD over an OpenGL ES 2.0 application written in C on an ARM Linux platform.

I am currently using 2 triangles positioned close to the near clipping plane and mapping the texture onto them. The texture is the size of the screen and is mostly transparent, except for the parts where I have text rendered. The texture is generated using Pango/Cairo.

If I turn on the HUD (by uncommenting the call to render_ui), I take a 50% performance hit (it goes from 60fps to 30fps).

Here is the code to render the HUD:

void render_ui(OGL_STATE_T *state) {

    glUseProgram(state->uiHandle);

    matIdentity(modelViewMatrix);
    matTranslate(modelViewMatrix, 0, 0, -0.51);

    const GLfloat *mvMat2 = modelViewMatrix;

    glViewport(0,0,state->screen_width, state->screen_height);

    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
    glEnable(GL_BLEND);

    glBindBuffer(GL_ARRAY_BUFFER, state->uiVB);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, state->uiIB);

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, state->uiTex);
    glUniform1i(_uiTexUniform, 0);

    glUniformMatrix4fv(_uiProjectionUniform, 1, GL_FALSE, pMat);
    glUniformMatrix4fv(_uiModelViewUniform, 1, GL_FALSE, mvMat2);

    glVertexAttribPointer(_uiPositionSlot, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), 0); 
    glVertexAttribPointer(_uiColorSlot, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex),
            (GLvoid *) (sizeof(GLfloat) * 3));
    glVertexAttribPointer(_uiTexCoordSlot, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
            (GLvoid *) (sizeof(GLfloat) * 7));

    glEnableVertexAttribArray(_uiPositionSlot);
    glEnableVertexAttribArray(_uiColorSlot);
    glEnableVertexAttribArray(_uiTexCoordSlot);

    glDrawElements(GL_TRIANGLES, uiIndicesArraySize / uiIndicesElementSize,
            GL_UNSIGNED_BYTE, 0);   

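    /* Disable every attribute array enabled above, not only the texcoord one. */
    glDisableVertexAttribArray(_uiPositionSlot);
    glDisableVertexAttribArray(_uiColorSlot);
    glDisableVertexAttribArray(_uiTexCoordSlot);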
    glDisable(GL_BLEND);

    GLenum err;

    /* Print the error code so the failing call can be tracked down. */
    if ((err = glGetError()) != GL_NO_ERROR)
        printf("GL error in render_ui: 0x%04x\n", err);
}

There has to be a more sensible way of doing this.


Solution

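  • On mobile devices, GPUs are very sensitive to blending, for several reasons: a blended fragment has to read the existing framebuffer value before it writes the new one, which costs extra bandwidth, and the GPU cannot skip the pixels hidden behind a blended polygon, because they still contribute to the final color.

    So, in short, mobile GPUs love opaque polygons and hate transparent ones.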

    Note that the total screen area covered by transparent polygons also matters a lot, because most mobile GPUs are tile-based: when a tile/bin is covered by transparent polygons, you can lose some GPU optimizations for that tile.
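
    To get a feel for how much blended surface is at stake, here is a minimal sketch in C that compares a full-screen HUD quad with a few tight rectangles. The HudRect type, the function names and the rectangle sizes are made up for illustration, and the rectangles are assumed not to overlap.

```c
#include <stddef.h>

/* Axis-aligned rectangle in pixels (hypothetical type for this sketch). */
typedef struct { int x, y, w, h; } HudRect;

/* Sum of the pixel areas of a set of HUD rectangles.
   Assumes the rectangles do not overlap. */
static long blended_pixels(const HudRect *rects, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += (long)rects[i].w * rects[i].h;
    return total;
}

/* Percentage of the screen covered by blended fragments, rounded down. */
static int blended_percent(const HudRect *rects, size_t count,
                           int screen_w, int screen_h)
{
    long screen = (long)screen_w * screen_h;
    if (screen <= 0)
        return 0;
    return (int)(blended_pixels(rects, count) * 100 / screen);
}
```

    On a 1920x1080 screen, one full-screen quad blends 100% of the pixels, while two 400x48 text blocks blend less than 2% of them.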

    Also, since you report a sharp drop from 60fps to 30fps, I would conclude that your GPU is blocking while it waits for the screen's 60Hz vertical sync before swapping. This means your frame time (DT) can only be a whole multiple of about 16.7ms, so you will probably only see fps values like 60, 30, 20, 15, ...

    So if you were at 60fps and then add something to your main loop that would drop the theoretical frame rate to only 57fps, the vertical sync wait makes you fall abruptly to 30fps. VSync can be disabled, or techniques like triple buffering can be used to mitigate this, but with OpenGL ES the way of doing so is specific to the OS and hardware you are working with: there is no "official" way of doing it that works on all devices.
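
    The jump from 60fps to 30fps can be reproduced with a little arithmetic. The sketch below assumes plain double buffering where every swap waits for the next vertical sync; vsync_fps is a hypothetical helper, not part of any API.

```c
/* Effective frame rate when every buffer swap waits for the vertical sync.
   render_ms:  time needed to render one frame, in milliseconds
   refresh_hz: display refresh rate, e.g. 60.0 */
static double vsync_fps(double render_ms, double refresh_hz)
{
    double interval_ms = 1000.0 / refresh_hz;        /* about 16.67 ms at 60 Hz */
    int intervals = (int)(render_ms / interval_ms);  /* whole refresh intervals used */

    if (intervals * interval_ms < render_ms)
        intervals++;                                 /* wait for the next sync */
    if (intervals < 1)
        intervals = 1;                               /* cannot beat the display rate */

    return refresh_hz / intervals;
}
```

    For example, vsync_fps(1000.0 / 57.0, 60.0) gives 30: a frame that takes about 17.5ms misses the first sync and has to wait for the second one.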

    So, knowing all this, here are some suggestions for getting back to 60fps:

    1. Use a reduced resolution, e.g. 1280x720 instead of 1920x1080. This reduces bandwidth usage and fragment processing. It is not ideal, of course, but it can serve as a test to confirm that you have a bandwidth or fragment issue (if you get 60fps back after reducing the resolution, then you have this kind of issue).
    2. Use a 16-bit (R5G6B5) backbuffer instead of a 32-bit one (R8G8B8A8). This can reduce bandwidth usage, at the cost of some visual quality.
    3. Reduce the area of the blended surfaces: in your case this means organizing your text into "blocks", with each block fitting its text as tightly as possible, as in this picture from the iOS docs: [image]
    4. Find a way to disable VSync on your device, or use triple buffering. If you have access to the Vivante GPU docs (I do not), this may be described there.

    Point 3 is the best thing to do (this is what was done in most mobile games I worked on), but it requires a non-negligible amount of extra work. Points 1, 2 and 4 are more straightforward but are only "half solutions".
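
    As a rough sketch of point 3, each text block can get its own small quad instead of sharing one screen-sized quad. The helper below (a hypothetical name) converts a pixel rectangle, with the origin at the top-left as in Cairo, into clip-space positions. It assumes the HUD shader is given identity projection and model-view matrices, so that the positions are used directly as normalized device coordinates.

```c
/* Fills out[8] with x,y clip-space positions for a quad covering a pixel
   rectangle. Vertex order: bottom-left, bottom-right, top-left, top-right
   (triangle strip order). Pixel origin is top-left; clip-space y points up. */
static void hud_rect_to_ndc(int x, int y, int w, int h,
                            int screen_w, int screen_h, float out[8])
{
    float left   = 2.0f * x / screen_w - 1.0f;
    float right  = 2.0f * (x + w) / screen_w - 1.0f;
    float top    = 1.0f - 2.0f * y / screen_h;
    float bottom = 1.0f - 2.0f * (y + h) / screen_h;

    out[0] = left;  out[1] = bottom;
    out[2] = right; out[3] = bottom;
    out[4] = left;  out[5] = top;
    out[6] = right; out[7] = top;
}
```

    The eight floats can be uploaded as the position attribute and drawn with glDrawArrays(GL_TRIANGLE_STRIP, 0, 4), using one small texture (or one region of a shared texture) per text block, so that blending only touches the pixels around the text.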