c++ opengl pbo

Why is using multiple Pixel Buffer Objects advised? Surely it is redundant?


This article is commonly referenced when anyone asks about video streaming textures in OpenGL.

It says:

To maximize the streaming transfer performance, you may use multiple pixel buffer objects. The diagram shows that 2 PBOs are used simultaneously; glTexSubImage2D() copies the pixel data from a PBO while the texture source is being written to the other PBO.

Double PBO

For nth frame, PBO 1 is used for glTexSubImage2D() and PBO 2 is used to get new texture source. For n+1th frame, 2 pixel buffers are switching the roles and continue to update the texture. Because of asynchronous DMA transfer, the update and copy processes can be performed simultaneously. CPU updates the texture source to a PBO while GPU copies texture from the other PBO.
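As I read it, the role swap described above boils down to alternating two indices each frame. Here is a minimal sketch of just that bookkeeping, with no OpenGL calls; `PboRoles` and `rolesForFrame` are names I made up for illustration, not anything from the article:

```cpp
#include <cassert>

// Hypothetical helper (not from the article): which of the two PBOs
// plays which role on a given frame.
struct PboRoles {
    int uploadIndex; // PBO the GPU copies from (source for glTexSubImage2D)
    int writeIndex;  // PBO the CPU maps and fills with the next frame's pixels
};

// Even frames upload from PBO 0 and write into PBO 1; odd frames swap.
inline PboRoles rolesForFrame(unsigned frame) {
    const int upload = static_cast<int>(frame % 2u);
    const PboRoles roles{upload, 1 - upload};
    assert(roles.uploadIndex != roles.writeIndex); // the two roles never share a PBO
    return roles;
}
```

In a real render loop, `uploadIndex` would select the buffer bound before glTexSubImage2D() and `writeIndex` the buffer that gets mapped and written.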

They provide a simple benchmark program which allows you to cycle between texture updates without PBOs, with a single PBO, and with two PBOs used as described above.

I see a slight performance improvement when enabling one PBO, but the second PBO makes no real difference.

Right before the code maps the PBO with glMapBuffer, it calls glBufferData with the data pointer set to NULL. It does this to avoid a sync stall.

// map the buffer object into client's memory
// Note that glMapBufferARB() causes sync issue.
// If GPU is working with this buffer, glMapBufferARB() will wait(stall)
// for GPU to finish its job. To avoid waiting (stall), you can call
// first glBufferDataARB() with NULL pointer before glMapBufferARB().
// If you do that, the previous data in PBO will be discarded and
// glMapBufferARB() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.

So, here is my question: doesn't this make the second PBO completely useless? Just a waste of memory?

With two PBOs, the texture data is stored three times: once in the texture, and once in each PBO.

With a single PBO, there are two copies of the data, and a third exists only temporarily, in the event that glMapBuffer creates a new buffer because the existing one is presently being DMA'ed to the texture?
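To put rough numbers on that, here is a small sketch of the steady-state memory arithmetic. The function names are mine, and the 1024 x 1024 RGBA size in the usage note is only an example, not the size used by the benchmark:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one full copy of the pixel data.
inline std::size_t bytesPerCopy(std::size_t width, std::size_t height,
                                std::size_t bytesPerPixel) {
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    return width * height * bytesPerPixel;
}

// Steady-state total: one copy in the texture plus one copy per PBO.
// This ignores any transient storage a driver may allocate when a
// buffer is orphaned while a transfer is still in flight.
inline std::size_t totalBytes(std::size_t width, std::size_t height,
                              std::size_t bytesPerPixel, std::size_t pboCount) {
    return bytesPerCopy(width, height, bytesPerPixel) * (1 + pboCount);
}
```

For a 1024 x 1024 RGBA texture (4 bytes per pixel), one copy is 4 MiB, so `totalBytes(1024, 1024, 4, 1)` is 8 MiB and `totalBytes(1024, 1024, 4, 2)` is 12 MiB.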

The comments seem to suggest that OpenGL drivers are internally capable of creating the second buffer if, and only when, it is required to avoid stalling the pipeline. The in-use buffer is being DMA'ed, and my call to map yields a new buffer for me to write to.

The author of that article appears to be more knowledgeable in this area than I am. Have I completely misunderstood the point?
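To make my mental model concrete, here is a toy simulation of what I think the comments describe. It is plain C++, not OpenGL, and it is only my assumption about driver behaviour: `ToyBuffer`, `orphan` and `map` are invented names, and a `shared_ptr` held by someone else stands in for "the GPU is still reading this storage".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Storage = std::shared_ptr<std::vector<unsigned char>>;

// Toy stand-in for a buffer object. Real drivers are free to behave differently.
struct ToyBuffer {
    Storage storage;
    std::size_t size;

    explicit ToyBuffer(std::size_t n)
        : storage(std::make_shared<std::vector<unsigned char>>(n)), size(n) {}

    // Models glBufferData(..., NULL, ...): if an in-flight transfer still
    // references the current storage, detach from it and allocate a fresh
    // block; otherwise keep reusing the same block.
    void orphan() {
        if (storage.use_count() > 1) {
            storage = std::make_shared<std::vector<unsigned char>>(size);
        }
    }

    // Models glMapBuffer: hands back a pointer to whatever storage is current.
    unsigned char* map() {
        assert(storage != nullptr);
        return storage->data();
    }
};
```

In this model, a second block of memory only appears when `orphan()` is called while a copy of `storage` is held elsewhere (the pretend DMA transfer). When nothing else holds the storage, `orphan()` followed by `map()` returns the same pointer as before, which is why a permanent second PBO looks redundant to me.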


Solution

  • Answering my own question, but I won't accept it as the answer (yet).

    There are many problems with the benchmark program linked to in the question. It uses immediate mode. It uses GLUT!

    The program was spending most of its time on things we are not interested in profiling, mainly rendering text via GLUT and writing pretty stripes to the texture, so I have removed those functions.

    I cranked the texture resolution up to 8K and added more PBO modes.

    If anyone else would like to examine my code, it is available here

    I have experimented with different texture sizes and different updatePixels functions. Despite my best efforts, I cannot get the double-PBO implementation to perform any better than the single-PBO implementation.

    Furthermore, NOT orphaning the previous buffer actually yields better performance, which is exactly the opposite of what the article claims.

    Perhaps modern drivers and hardware do not suffer from the problem that this design is attempting to fix...

    Perhaps my graphics hardware / driver is buggy, and not taking advantage of the double-PBO...

    Perhaps the commonly referenced article is completely wrong?

    Who knows... My test hardware is Intel(R) HD Graphics 5500 (Broadwell GT2).

    EDIT: Many, many years later, I have repeated this test on new hardware with a discrete GPU, after @Heiner's comment that an integrated GPU is not a good test because it shares RAM with the CPU.

    New hardware is AMD Radeon RX 6800M (Navi22).

    So, yes: on a discrete GPU, I'm able to observe a 0.4% frame rate boost from using a second PBO. The biggest boost to PBO performance, however, comes from orphaning the previous buffer (2%).