What's an effective way to cull instanced meshes
(e.g. 2000 trees, each with ~17 triangles) without using a geometry shader?
Unfortunately my software supports only OpenGL ES 3.0, so I have to cull in the vertex shader or somewhere else.
Another solution would be to rearrange the instance buffer in each frame.
GPU culling is pointless if it cannot be done efficiently; that is, after all, the whole point of putting culling on the GPU to begin with.
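Efficient GPU culling requires the following:
1. A way to conditionally write data to GPU memory, where the amount of data written is decided by the GPU itself.
2. A way to issue rendering commands that consume that GPU-written data directly, without a round trip through the CPU.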
OpenGL ES 3.0 lacks a mechanism for doing either of these. Geometry shaders and transform feedback are the older means for doing #1, but it could also be done with compute shaders and SSBOs/image load/store. Of course, ES 3.0 has neither set of functionality: it has transform feedback but no geometry shaders to vary how much gets written, and compute shaders, SSBOs and image load/store only arrive with ES 3.1.
ES 3.0 also has no indirect rendering features, which could be used to actually render with the GPU-generated data without any read-back of data to the CPU. So even if you had a way to do #1, you'd have to read the data back on the CPU to be able to use it in a rendering command.
So unless CPU culling is somehow more expensive than doing a full GPU/CPU sync (it almost certainly isn't), it's best to just do the culling on the CPU.
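As a rough sketch of what CPU culling could look like here (this is also the "rearrange the instance buffer each frame" idea from the question): test each tree's bounding sphere against the six frustum planes and copy the survivors into a compacted array. The `Instance` and `Plane` layouts and the `cull_instances` name are made up for illustration, and the code assumes the plane normals are normalized and point into the frustum.

```c
#include <stddef.h>

/* Hypothetical per-instance data: bounding-sphere center and radius. */
typedef struct { float x, y, z, radius; } Instance;

/* Plane a*x + b*y + c*z + d = 0; the normal (a, b, c) is assumed to be
   normalized and to point toward the inside of the frustum. */
typedef struct { float a, b, c, d; } Plane;

/* Returns 1 unless the sphere lies completely behind one of the planes. */
static int sphere_visible(const Plane planes[6], const Instance *s)
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * s->x + planes[i].b * s->y
                   + planes[i].c * s->z + planes[i].d;
        if (dist < -s->radius)
            return 0;
    }
    return 1;
}

/* Copies the visible instances to the front of `visible` (which must have
   room for `count` entries) and returns how many were kept. */
size_t cull_instances(const Plane planes[6], const Instance *all,
                      size_t count, Instance *visible)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (sphere_visible(planes, &all[i]))
            visible[kept++] = all[i];
    }
    return kept;
}
```

Each frame you would then upload the first `kept` entries of `visible` to the instance buffer (for example with `glBufferSubData`) and pass `kept` as the instance count to `glDrawElementsInstanced` or `glDrawArraysInstanced`. For 2000 instances that is at most 12,000 plane tests per frame, which should be cheap on the CPU.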