c++ optimization opengl glsl vertex-shader

The correct way to manipulate an object's positional data in OpenGL: shaders or buffers?

I've been trying to learn OpenGL using the newer shader pipeline rather than the deprecated immediate-mode fixed-function pipeline. There are a couple of things I'm confused about, in terms of both performance and design: whether what I'm doing is the "correct" or commonly accepted way of doing things.

In older versions of GL, I could use glTranslate to manipulate my object across the screen, and use matrix stacks to push copies of my object+translate each one individually. With newer GL this isn't possible, so I've been experimenting with ways to achieve similar functionality.

Minimal example of my environment:

glGenVertexArrays(1, &_vao);
glBindVertexArray(_vao);

// centered square
float verts[] = {
    -0.5,    0.5,
     0.5,    0.5,
    -0.5,   -0.5,
     0.5,   -0.5,
};

glGenBuffers(1, &_vbo);
glBindBuffer(GL_ARRAY_BUFFER, _vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

unsigned int idx[] = { // unsigned, to match GL_UNSIGNED_INT in glDrawElements
    0, 1, 2,
    1, 2, 3
};

glGenBuffers(1, &_ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, _ebo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(idx), idx, GL_STATIC_DRAW);

glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), 0);
glEnableVertexAttribArray(0);

while (renderLoop) {
    glUseProgram(_program);
    glBindVertexArray(_vao);
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

    // swap buffers
}

I want to translate my square around the screen. I'm aware that its position relies on the vertices I passed into my array buffer, and that those are manipulated by the vertex shader. From that, I deduced that there are two logical entry points for manipulating my data: the buffer directly, or the shader. My dilemma is that I'm not sure which is the correct way to do it, both for performance and for maintainability. I want to develop the right habit from the beginning so that I don't carry a bad one throughout my development, but I'm not sure what I'm meant to do.

If I were to rewrite the buffer directly, I'd have to change it every render tick, which could be costly. I'd also likely have to switch it to GL_DYNAMIC_DRAW, since I'm changing it so often. The main benefit, however, is that I can manipulate each point separately, which may be what I want. Let's say, as an example, that I wanted to create my object at my mouse pointer. I'd need to know the mouse pointer's x and y coordinates, then scale them to normalized coordinates using the window width and height, all of which requires me to rewrite the buffer.

while (renderLoop) {
    glUseProgram(_program);
    glBindVertexArray(_vao);

    manipulateVerts(verts);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_DYNAMIC_DRAW); // also changed above, before render loop

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

    // swap buffers
}

The other possible way I considered was via shaders, where I declare a uniform position variable and pass coordinates into my shader. Let's say my manipulateVerts function moved x by -0.6 and y by 0.4. I could pass these values as movement offsets via a uniform vec2. This seems like the more logical thing to do, given that a vertex shader is designed to manipulate vertex data. However, I can only manipulate each vertex independently; if vertices depend on each other to know their new position, I can't do that. This poses a problem with the shader approach.

#version 330 core
layout (location = 0) in vec2 pos;
uniform vec2 offset;

void main() {
    gl_Position = vec4(pos.x + offset.x, pos.y + offset.y, 0.0, 1.0);
}

Then, within my render loop, I could look up the uniform location and change its value.

while (renderLoop) {
    glUseProgram(_program);
    glBindVertexArray(_vao);

    float offset[] = { -0.6, 0.4 };
    GLint _offset = glGetUniformLocation(_program, "offset"); // returns -1 if the uniform is not found
    glUniform2fv(_offset, 1, offset);

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);

    // swap buffers
}

This works well until I consider that a shader is usually meant to run over many VBOs, hundreds if not thousands. One VBO can use this method in my situation, but what about many different VBOs? Would I set the offset to 0, bind and draw each VBO as intended, and change the offset in between? Or would it make more sense to manipulate the vertices? And what if I wanted multiple copies of one VBO? I couldn't keep translating its vertices, since that would affect the other copies, so I'd have to make multiple copies in memory, eating up a lot of RAM for a completely unnecessary reason.

I think I'm coming to the conclusion that it simply depends, but I'd like an outside opinion. I'm fairly new to OpenGL and GLSL shaders, so my inexperience may be clouding my ability to see the rational choice.

Solution

First, don't forget there is no substitute for measuring. Performance rules of thumb will get you far, but they're just telling you the collected wisdom of other developers, not the ground truth and they could be wrong for your program. If you want ultimate performance in any program, you have to measure, change something, measure again. Over and over and over.

You are right that it simply depends. That said, the same rules of thumb work for most OpenGL programs.

If you update vertex coordinates in the buffer then... you have to update all the vertex coordinates in the buffer. Each coordinate, individually, the CPU calculates all of them and tells the GPU what they are. This is fine if you just have a handful of vertices (can be a big handful, several thousand vertices), or if you have a really complex effect that just can't be done in a shader (but you'd be surprised what can be done in shaders).

When you have a million vertices, it's not so fine and you want the GPU to do that work, since that's what it's there for.
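To make the cost of the buffer approach concrete, here is a minimal sketch of what the CPU side might look like. The name translateVerts is made up for illustration, and the upload call appears only as a comment because it needs a live GL context and a bound VBO.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shift every (x, y) pair in a tightly packed 2D position array.
// The CPU touches each vertex, so the cost grows with the vertex count.
void translateVerts(std::vector<float>& verts, float dx, float dy) {
    assert(verts.size() % 2 == 0);  // expects x, y pairs
    for (std::size_t i = 0; i < verts.size(); i += 2) {
        verts[i]     += dx;
        verts[i + 1] += dy;
    }
    // With a GL context and the VBO bound, the whole array is then re-sent, e.g.:
    // glBufferSubData(GL_ARRAY_BUFFER, 0, verts.size() * sizeof(float), verts.data());
}
```

Both the loop and the upload scale with the number of vertices, which is why this is fine for a square and painful for a million-vertex mesh.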

If you have a handful (that can be several hundred) of different objects, with plenty of vertices each, then it makes sense to set the offset uniform, draw one object, set the uniform, draw another object, etc. This is generally accepted. I'd bet most game engines work this way most of the time.
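As a rough sketch of that loop: the Object struct and drawAll function below are hypothetical names, and the two GL operations are passed in as callables so the example compiles without a GL context. In a real renderer they would wrap glUniform2f and glDrawElements, with the uniform location looked up once after linking rather than every frame.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-object record; the field names are illustrative only.
struct Object {
    unsigned int vao;  // vertex array holding this object's mesh
    int indexCount;    // number of indices to draw
    float x, y;        // where this object should appear
};

// setOffset and draw stand in for the GL calls, for example:
//   setOffset: glUniform2f(offsetLocation, x, y);
//   draw:      glBindVertexArray(vao);
//              glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, 0);
template <typename SetOffset, typename Draw>
void drawAll(const std::vector<Object>& objects, SetOffset setOffset, Draw draw) {
    for (const Object& o : objects) {
        assert(o.indexCount > 0);   // nothing to draw otherwise
        setOffset(o.x, o.y);        // one small uniform update per object
        draw(o.vao, o.indexCount);  // the vertex data itself is never rewritten
    }
}
```

Note that two objects can share the same vao: drawing several copies of a mesh costs one extra uniform update and draw call each, not an extra copy of the vertex data.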

By the way, in 3D it's a lot more common to use matrices instead of just offsets. Matrices allow for translation (offsetting), rotation, resizing and camera perspective.
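To show how a matrix subsumes the offset, here is a small sketch with hand-rolled types (Mat4, Vec4, translation and transform are names invented for this example; in practice most people use a math library such as GLM). It assumes column-major storage, which is the layout OpenGL expects by default.

```cpp
#include <array>

using Mat4 = std::array<float, 16>;  // column-major storage
using Vec4 = std::array<float, 4>;

// Build a matrix that translates by (tx, ty, tz).
Mat4 translation(float tx, float ty, float tz) {
    Mat4 m = {1, 0, 0, 0,
              0, 1, 0, 0,
              0, 0, 1, 0,
              tx, ty, tz, 1};  // the fourth column holds the offset
    return m;
}

// Multiply a column-major matrix by a column vector,
// which is what `m * v` computes in GLSL.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r = {0, 0, 0, 0};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}
```

For a point with w = 1, multiplying by this matrix gives the same result as adding the offset. The payoff is that rotation, scaling and projection matrices can be multiplied together into one uniform mat4, and the shader still does a single multiplication per vertex.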

You can stop here because it's how most 3D games work. But I already wrote the more advanced ways, so you may as well read on out of curiosity...

The communication path between the CPU and GPU (not just the PCIe slot but also the OpenGL driver) isn't terribly fast. It's fast, all right, but it's peanuts compared to the raw processing power the GPU has available, which is as much as the world's fastest supercomputer from 1996 (I actually checked; it's called ASCI Red). When you insist on calculating all the vertex data on the CPU (rewriting the buffer every frame), the GPU spends most of its time twiddling its thumbs, waiting to hear the next vertex. When you send a single uniform and draw command for each object (the uniform approach), that's a lot better, but maybe it's possible to do even better.

If you have a lot of the same shape to draw, especially if the object doesn't have many vertices, sending the uniform and draw command over and over can waste too much time. For this situation there is a feature called instancing, which draws the same shape many times with one command. You make a buffer full of offsets alongside your buffer full of vertices and use the instanced draw command. Your shader then runs many times on the same vertices, but gl_InstanceID is different for each copy, and you can use it to fetch a different offset from the offset buffer. You might find this a useful way to render trees or blades of grass.
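Here is a sketch of the CPU side of instancing: filling the offset buffer. The function gridOffsets and the buffer name offsetVbo are hypothetical. The GL calls are shown as comments since they need a context, and they use a per-instance vertex attribute (glVertexAttribDivisor) rather than indexing by gl_InstanceID; that is a slightly different route to the same result, chosen here because it works on plain GL 3.3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build one (x, y) offset per instance, laid out row by row on a grid.
std::vector<float> gridOffsets(int columns, int rows, float spacing) {
    assert(columns > 0 && rows > 0);
    std::vector<float> offsets;
    offsets.reserve(static_cast<std::size_t>(columns) * rows * 2);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < columns; ++c) {
            offsets.push_back(c * spacing);  // x offset of this instance
            offsets.push_back(r * spacing);  // y offset of this instance
        }
    }
    return offsets;
}

// With a GL 3.3 context and the VAO bound, the offsets could be wired up as:
//   glBindBuffer(GL_ARRAY_BUFFER, offsetVbo);
//   glBufferData(GL_ARRAY_BUFFER, offsets.size() * sizeof(float),
//                offsets.data(), GL_STATIC_DRAW);
//   glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), 0);
//   glEnableVertexAttribArray(1);
//   glVertexAttribDivisor(1, 1);  // advance once per instance, not per vertex
//   glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0, columns * rows);
```

The vertex shader would then declare a second input at location 1 and add it to pos, just as it adds the uniform offset now.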

If you want to draw loads and loads of different shapes, you can use indirect drawing, where you feed the GPU a buffer full of other draw commands. You're feeding it draw commands, not glVertexAttribPointer commands, so all the shapes have to be in different parts of the same vertex buffer. It also supports instancing so you can draw lots of one shape. For example (and this is an example without instancing), you could have a buffer full of level vertices, and then you could tell the GPU which parts to render depending on where the player is, and which parts they can see. Then you only need to update that draw command buffer when the player moves to a different part of the level.
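A sketch of what building that command buffer might look like. The command struct follows the five-field, 20-byte layout that glMultiDrawElementsIndirect (GL 4.3) reads; Section and buildCommands are hypothetical names for this example, and the visibility flag stands in for whatever culling you actually do.

```cpp
#include <cstdint>
#include <vector>

// One entry in the indirect buffer: five tightly packed 32-bit fields.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // 1 for a plain, non-instanced draw
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // added to every index before fetching
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be tightly packed");

// Hypothetical description of one chunk of level geometry in the shared buffers.
struct Section {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    bool visible;
};

// Emit one draw command per visible section.
std::vector<DrawElementsIndirectCommand> buildCommands(const std::vector<Section>& sections) {
    std::vector<DrawElementsIndirectCommand> cmds;
    for (const Section& s : sections)
        if (s.visible)
            cmds.push_back({s.indexCount, 1u, s.firstIndex, s.baseVertex, 0u});
    return cmds;
}

// With a GL 4.3 context, upload cmds to a buffer bound to
// GL_DRAW_INDIRECT_BUFFER, then issue every draw with one call:
//   glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr,
//                               (GLsizei)cmds.size(), 0);
```

You only rebuild and re-upload the command list when the set of visible sections changes; the vertex and index data stay put.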

By the way, as you can see with indirect drawing, there's no need for every shape to have its own VBO. If you're going for high performance, you may as well stuff different shapes into the same VBO as much as possible so you can use indirect drawing. If you're using the usual one-uniform-and-one-draw-call-per-object approach, you don't need to, but you still can if you want. It might speed up loading times if you have to load 1 VBO instead of 10000. (Then again, it might not. Measure!)
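If you want to try the shared-buffer idea, a sketch of the bookkeeping could look like this. Mesh, Range, Packed and pack are all names invented for the example; it assumes 2D positions stored as x, y pairs, as in your square.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Mesh {
    std::vector<float> positions;       // x, y pairs
    std::vector<unsigned int> indices;  // relative to this mesh's first vertex
};

// Where one mesh ended up inside the shared buffers.
struct Range {
    std::size_t firstIndex;  // position of its first index in the shared index array
    std::size_t indexCount;
    int baseVertex;          // number of vertices stored before it
};

struct Packed {
    std::vector<float> positions;
    std::vector<unsigned int> indices;
    std::vector<Range> ranges;
};

// Concatenate all meshes into one vertex array and one index array.
Packed pack(const std::vector<Mesh>& meshes) {
    Packed out;
    for (const Mesh& m : meshes) {
        assert(m.positions.size() % 2 == 0);  // expects x, y pairs
        Range r;
        r.firstIndex = out.indices.size();
        r.indexCount = m.indices.size();
        r.baseVertex = static_cast<int>(out.positions.size() / 2);
        out.ranges.push_back(r);
        out.positions.insert(out.positions.end(), m.positions.begin(), m.positions.end());
        out.indices.insert(out.indices.end(), m.indices.begin(), m.indices.end());
    }
    return out;
}

// With a GL 3.2 context, one mesh out of the shared buffers is drawn with:
//   glDrawElementsBaseVertex(GL_TRIANGLES, (GLsizei)r.indexCount, GL_UNSIGNED_INT,
//                            (void*)(r.firstIndex * sizeof(unsigned int)), r.baseVertex);
```

The indices are left untouched and baseVertex does the shifting at draw time, so the same ranges can also be copied straight into indirect draw commands.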

P.S. shaders have nothing to do with VBOs. It's not "one shader = thousands of VBOs", they can go in whatever combination you want. Buffers hold data and shaders process it and churn out vertices to go on the screen. It's not even like they have to be VBOs - a VBO just means a buffer that holds vertex data but it's not like the GPU knows it holds that.