In the glTF format, a mesh's vertex positions go through three levels of indirection: mesh -> accessor -> bufferView -> buffer.
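To make the three hops concrete, here is a minimal Python sketch that follows a primitive's POSITION attribute down to raw bytes. `read_positions` is a hypothetical helper, not part of any library, and it assumes tightly packed FLOAT VEC3 data (no `byteStride`/interleaving, no sparse accessors, buffers already loaded as `bytes`).

```python
import struct

# Components per element for each accessor "type" (glTF 2.0).
TYPE_COUNT = {"SCALAR": 1, "VEC2": 2, "VEC3": 3, "VEC4": 4}
FLOAT = 5126  # glTF componentType for 32-bit float

def read_positions(gltf, buffers, mesh, prim=0):
    """Resolve mesh -> accessor -> bufferView -> buffer and decode POSITION.

    Sketch only: assumes FLOAT components and tightly packed data.
    """
    primitive = gltf["meshes"][mesh]["primitives"][prim]
    accessor = gltf["accessors"][primitive["attributes"]["POSITION"]]  # hop 1
    view = gltf["bufferViews"][accessor["bufferView"]]                 # hop 2
    data = buffers[view["buffer"]]                                     # hop 3
    assert accessor["componentType"] == FLOAT
    n = TYPE_COUNT[accessor["type"]]
    # The final byte address is the bufferView offset plus the accessor offset.
    start = view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    flat = struct.unpack_from(f"<{accessor['count'] * n}f", data, start)
    return [flat[i:i + n] for i in range(0, len(flat), n)]

# One mesh, two vertices, everything in a single 24-byte buffer.
blob = struct.pack("<6f", 0, 0, 0, 1, 2, 3)
gltf = {
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "accessors": [{"bufferView": 0, "componentType": FLOAT, "count": 2, "type": "VEC3"}],
    "bufferViews": [{"buffer": 0, "byteOffset": 0, "byteLength": 24}],
    "buffers": [{"byteLength": 24}],
}
print(read_positions(gltf, [blob], 0))  # [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
```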
How should mesh data be distributed across the accessors, bufferViews, and buffers in a glTF file? Does it matter?
E.g. suppose I have two meshes, each with a POSITION attribute. It seems that I have, essentially, four choices for laying out the position data in the glTF file:
===============================================================
(A) one big shared accessor, bufferView, buffer for all positions
mesh 0 -+
        +-> accessor 0 ---> bufferView 0 ---> buffer 0
mesh 1 -+
===============================================================
(B) accessor per mesh, one big shared bufferView, buffer
mesh 0 ---> accessor 0 -+
                        +-> bufferView 0 ---> buffer 0
mesh 1 ---> accessor 1 -+
===============================================================
(C) accessor, bufferView per mesh, one big shared buffer
mesh 0 ---> accessor 0 ---> bufferView 0 -+
                                          +-> buffer 0
mesh 1 ---> accessor 1 ---> bufferView 1 -+
===============================================================
(D) accessor, bufferView, buffer per mesh
mesh 0 ---> accessor 0 ---> bufferView 0 ---> buffer 0
mesh 1 ---> accessor 1 ---> bufferView 1 ---> buffer 1
===============================================================
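The four layouts differ only in the JSON that points into the binary data; the position bytes themselves are the same. A minimal Python sketch, using a hypothetical `build_layout` helper, that emits just the `buffers`/`bufferViews`/`accessors`/`meshes` skeleton for each layout. It omits binary data, `uri`, and the `min`/`max` that the spec requires on POSITION accessors.

```python
from itertools import accumulate

FLOAT = 5126  # glTF componentType for 32-bit float

def build_layout(layout, vertex_counts):
    """Return a POSITION-only glTF JSON skeleton for layout "A", "B", "C" or "D"."""
    if layout not in ("A", "B", "C", "D"):
        raise ValueError(f"unknown layout {layout!r}")
    sizes = [12 * n for n in vertex_counts]   # VEC3 float = 12 bytes per vertex
    offsets = [0, *accumulate(sizes)][:-1]    # where each mesh's block starts
    total = sum(sizes)
    g = {"buffers": [], "bufferViews": [], "accessors": [], "meshes": []}

    def add(kind, obj):
        g[kind].append(obj)
        return len(g[kind]) - 1

    def accessor(view, count, byte_offset=0):
        return add("accessors", {"bufferView": view, "byteOffset": byte_offset,
                                 "componentType": FLOAT, "count": count, "type": "VEC3"})

    if layout in ("A", "B", "C"):             # one shared buffer
        buf = add("buffers", {"byteLength": total})
    if layout in ("A", "B"):                  # one shared bufferView
        view = add("bufferViews", {"buffer": buf, "byteOffset": 0, "byteLength": total})
    if layout == "A":                         # one shared accessor over everything
        shared = accessor(view, sum(vertex_counts))

    for n, size, off in zip(vertex_counts, sizes, offsets):
        if layout == "A":
            # Every mesh sees *all* positions; each primitive would also need
            # its own "indices" accessor to pick its vertices (omitted here).
            acc = shared
        elif layout == "B":
            acc = accessor(view, n, byte_offset=off)
        elif layout == "C":
            v = add("bufferViews", {"buffer": buf, "byteOffset": off, "byteLength": size})
            acc = accessor(v, n)
        else:  # "D"
            b = add("buffers", {"byteLength": size})
            v = add("bufferViews", {"buffer": b, "byteOffset": 0, "byteLength": size})
            acc = accessor(v, n)
        add("meshes", {"primitives": [{"attributes": {"POSITION": acc}}]})
    return g

for layout in "ABCD":
    g = build_layout(layout, [100, 250])
    print(layout, len(g["buffers"]), len(g["bufferViews"]), len(g["accessors"]))
# A 1 1 1
# B 1 1 2
# C 1 2 2
# D 2 2 2
```

Note that in (B) the per-mesh split lives in the accessor `byteOffset`, while in (C) it moves up to the bufferView `byteOffset`.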
Of the above four layout choices, (A) produces the simplest, most compact glTF files (especially if the two meshes share some vertex positions). So I'm not sure why I would ever use (B), (C), or (D).
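The extra compactness of (A) when meshes share positions comes from storing each distinct position once and giving every mesh its own index list into the shared array. A minimal sketch of that merge (the helper is hypothetical; it deduplicates on exact position equality only, so vertices that share a position but differ in normals or UVs would need more care):

```python
def merge_shared_positions(meshes):
    """Layout (A) style merge: one shared position array, per-mesh indices.

    meshes: list of (positions, indices); positions are (x, y, z) tuples and
    indices point into that mesh's own positions list.
    Returns (shared_positions, per_mesh_indices).
    """
    shared, slot_of = [], {}  # slot_of maps a position to its index in shared
    per_mesh_indices = []
    for positions, indices in meshes:
        local_to_shared = []
        for p in positions:
            if p not in slot_of:          # first time we see this exact position
                slot_of[p] = len(shared)
                shared.append(p)
            local_to_shared.append(slot_of[p])
        per_mesh_indices.append([local_to_shared[i] for i in indices])
    return shared, per_mesh_indices

# Two triangles sharing an edge: 6 input vertices, 4 distinct positions.
tri0 = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [0, 1, 2])
tri1 = ([(1, 0, 0), (1, 1, 0), (0, 1, 0)], [0, 1, 2])
shared, indices = merge_shared_positions([tri0, tri1])
print(len(shared), indices)  # 4 [[0, 1, 2], [1, 3, 2]]
```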
The reason I'm asking is that I'm currently using (A) for creating my glTF files (each file containing thousands of meshes), but I'm finding that it seems to have some problems when I load the file into three.js and use it. For example:
So I think there must be an assumption (by three.js and/or other consumers of glTF files) that I am using (B), (C), or (D) instead? But I don't know what is being assumed, or which layout I should use in general.
I tend to think of buffers and buffer views in these terms:
In most cases you should only need one buffer. Any subdivision is usually for application-specific purposes, like partitioning data from different meshes or animations so that it can be downloaded over the network at a different time.
Theoretically, fewer buffer views are better. In practice, most 3D engines have limitations that will affect the optimal choice. As you've described, three.js does not do a very good job of sharing vertex data across multiple THREE.BufferGeometry instances. In addition to the bounding box problem (which probably can be fixed!), it currently uploads each BufferGeometry to the GPU separately, which duplicates the shared data and is harder to solve. So for three.js in particular, the ideal layout is probably one buffer view per mesh primitive, with that primitive's vertex attributes interleaved within the single buffer view.
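To illustrate how layout (A) can produce a bounding box problem: if a mesh's bounds are taken from its entire POSITION accessor (an assumption here, e.g. bounds derived from the accessor's `min`/`max` or from every vertex in the attribute), every mesh inherits the bounds of the whole shared array. A small sketch with hypothetical helpers:

```python
def bbox(points):
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def bounds_from_whole_accessor(positions):
    # Roughly what an engine gets if it bounds the entire POSITION array.
    return bbox(positions)

def bounds_from_indices(positions, indices):
    # Bounds of only the vertices this mesh actually references.
    return bbox([positions[i] for i in set(indices)])

# Layout (A): two small meshes, far apart, sharing one position accessor.
positions = [(0, 0, 0), (1, 0, 0), (0, 1, 0),        # mesh 0
             (100, 0, 0), (101, 0, 0), (100, 1, 0)]  # mesh 1
print(bounds_from_whole_accessor(positions))          # ((0, 0, 0), (101, 1, 0))
print(bounds_from_indices(positions, [0, 1, 2]))      # ((0, 0, 0), (1, 1, 0))
```

With thousands of meshes in one shared accessor, each mesh's bounds would span the whole scene, which makes per-mesh frustum culling largely ineffective.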
So while engines aim to support every valid layout, each is probably optimized for specific ones. In practice it can help to pre-process glTF files for the target engine to improve loading times; gltfpack and glTF Transform are popular tools for this.
Example:
npm install --global @gltf-transform/cli
# aims for one interleaved buffer view per mesh primitive
gltf-transform cp in.glb out.glb --vertex-layout interleaved
# aims for one buffer view per vertex attribute
gltf-transform cp in.glb out.glb --vertex-layout separate
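The difference between the two vertex layouts is easiest to see at the JSON level. Below is a hypothetical Python sketch (helper names are made up, and this is not literally what glTF Transform writes) that packs the same POSITION and NORMAL data both ways: interleaved into one bufferView with a `byteStride`, and separated into one tightly packed bufferView per attribute.

```python
import struct

FLOAT, ARRAY_BUFFER = 5126, 34962  # glTF componentType / bufferView target codes

def vec3_accessor(view, count, byte_offset=0):
    return {"bufferView": view, "byteOffset": byte_offset,
            "componentType": FLOAT, "count": count, "type": "VEC3"}

def interleaved_layout(positions, normals):
    """One bufferView per primitive: [px py pz nx ny nz] repeated per vertex."""
    blob = b"".join(struct.pack("<6f", *p, *n) for p, n in zip(positions, normals))
    views = [{"buffer": 0, "byteOffset": 0, "byteLength": len(blob),
              "byteStride": 24, "target": ARRAY_BUFFER}]  # 24 = 2 x VEC3 float
    accessors = {"POSITION": vec3_accessor(0, len(positions), 0),
                 "NORMAL": vec3_accessor(0, len(normals), 12)}  # offset within a vertex
    return blob, views, accessors

def separate_layout(positions, normals):
    """One tightly packed bufferView per attribute (no byteStride needed)."""
    pos = b"".join(struct.pack("<3f", *p) for p in positions)
    nrm = b"".join(struct.pack("<3f", *n) for n in normals)
    views = [{"buffer": 0, "byteOffset": 0, "byteLength": len(pos), "target": ARRAY_BUFFER},
             {"buffer": 0, "byteOffset": len(pos), "byteLength": len(nrm), "target": ARRAY_BUFFER}]
    accessors = {"POSITION": vec3_accessor(0, len(positions)),
                 "NORMAL": vec3_accessor(1, len(normals))}
    return pos + nrm, views, accessors

P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
N = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
blob_i, views_i, acc_i = interleaved_layout(P, N)
blob_s, views_s, acc_s = separate_layout(P, N)
# Same 48 bytes of data either way; only the addressing differs.
print(len(blob_i), len(blob_s), len(views_i), len(views_s))  # 48 48 1 2
```

Interleaving keeps all of a primitive's attributes in one contiguous range, which matches the one-buffer-view-per-primitive layout suggested above for three.js.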
Specific situations — like compression or sharing individual vertex attributes in multiple geometries — do complicate the situation and require more decisions by the encoder.
three.js r164