3d gltf scenegraph skeletal-animation skeletal-mesh

Meaning of glTF's Inverse(GlobalTransform) in Skin Matrix

I want to render a mesh with skeletal animation. Before animating, I want to render the mesh with just the first keyframe of the animation, i.e. render the mesh with the bone hierarchy transforms in place. I'm ignoring the scene structure in the glTF; I'm just using meshes[0] to get the mesh and skins[0] to get its skeleton.

I understand that the final skin matrix, fed as a uniform to the vertex shader, is calculated as follows:

for (bone in bones) {
  bone.skin_xform = inverse(global_xform) * bone.global_xform * bone.inv_bind_xform;
}

When I do exactly this I see that my model is 11.4 (5.7 + 5.7) units below ground (plane at Z = 0; world has +Z as up). When I render just the mesh without any skinning i.e. with only position, normal and texture coordinates it rests on the ground. I was also able to deduce why this happens when skinning.

Here's the relevant part of the gltf

    "skins" : [
        {
            "inverseBindMatrices" : 6,
            "joints" : [
                0,
            ...
        }
    ],
    "nodes" : [
        {
            "name" : "Root",
            "rotation" : [
                0,
                0,
                1,
                0
            ],
            "translation" : [
                0,
                0,
                -5.709875583648682
            ]
        },
        {
            "mesh" : 0,
            "name" : "Body",
            "skin" : 0
        },
        {
            "children" : [
                0,
                1
            ],
            "name" : "Armature",
            "translation" : [
                0,
                0,
                5.709875583648682
            ]
        }
    ]

I've read glTF's documentation, tutorial and the reference guide (PDF). While the documentation doesn't speak about it at all, here's what the tutorial and reference guide had to say about inverse(global_xform):

The vertices have to be transformed with inverse of the global transform of the node that the mesh is attached to, because this transform is already done using the model-view-matrix, and thus has to be cancelled out from the skinning computation.

As per this, Body's global transform has to be inverted and used. This results in translateZ(-5.7). Root already has a local transform of translateZ(-5.7), so I understand the -11.4 offset of the mesh into the ground. However, if I use Body's global transform as-is, without inversion, in the above formula there're no issues.

Why does the reference guide ask us to invert the global transform of root bone's parent? What am I missing? When I imported this model from Blender, I noticed that the transform on the armature object was indeed translateZ(5.7).

Solution

You say

I'm ignoring the scene structure in the glTF; I'm just using meshes[0] to get the mesh and skins[0] to get its skeleton.

However, the relevant passage you quoted says

The vertices have to be transformed with inverse of the global transform of the node that the mesh is attached to, because this transform is already done using the model-view-matrix

Since you are lifting the mesh and skeleton out of the glTF without the scene structure, inverse(global_xform) is not needed. Your mesh's model-view matrix does not contain the mesh node's global transform, so there is nothing for inverse(global_xform) to cancel out. Things work fine with three.js because it renders the entire scene with its full node hierarchy, unlike your renderer.

However, if I use Body's global transform as-is, without inversion, in the above formula there're no issues.

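This works because the global transform of Root is the concatenation of all its ancestors' transforms with Root's local transform. Body is a sibling of Root under Armature, not an ancestor, so only Armature contributes. Body has no local transform of its own, so its global transform equals Armature's, which is why using it as-is gives the right result:

root.global_xform = armature.local_xform * root.local_xform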
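To make the concatenation concrete, here is a minimal Python sketch that walks the children lists of the three nodes from the question and multiplies local transforms from the top down. The helper names (mat_mul, local_matrix, global_matrices) are made up for illustration, and the sketch assumes nodes only carry translation/rotation/scale (no "matrix" property) and column-vector matrices.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested row-major lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def local_matrix(node):
    """Build T * R * S from translation, rotation (x, y, z, w) and scale."""
    tx, ty, tz = node.get("translation", [0.0, 0.0, 0.0])
    x, y, z, w = node.get("rotation", [0.0, 0.0, 0.0, 1.0])
    sx, sy, sz = node.get("scale", [1.0, 1.0, 1.0])
    return [
        [(1 - 2 * (y * y + z * z)) * sx, 2 * (x * y - z * w) * sy, 2 * (x * z + y * w) * sz, tx],
        [2 * (x * y + z * w) * sx, (1 - 2 * (x * x + z * z)) * sy, 2 * (y * z - x * w) * sz, ty],
        [2 * (x * z - y * w) * sx, 2 * (y * z + x * w) * sy, (1 - 2 * (x * x + y * y)) * sz, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def global_matrices(nodes):
    """Return one global matrix per node: parent's global times own local."""
    # Map each child index to its parent index using the "children" lists.
    parent = {child: i for i, node in enumerate(nodes)
              for child in node.get("children", [])}
    cache = {}

    def resolve(i):
        if i not in cache:
            local = local_matrix(nodes[i])
            cache[i] = mat_mul(resolve(parent[i]), local) if i in parent else local
        return cache[i]

    return [resolve(i) for i in range(len(nodes))]


# The three nodes from the question's glTF snippet.
nodes = [
    {"name": "Root", "rotation": [0, 0, 1, 0], "translation": [0, 0, -5.709875583648682]},
    {"name": "Body", "mesh": 0, "skin": 0},
    {"name": "Armature", "children": [0, 1], "translation": [0, 0, 5.709875583648682]},
]

globals_ = global_matrices(nodes)
for node, m in zip(nodes, globals_):
    print(node["name"], "global translation:", [row[3] for row in m[:3]])
```

With these inputs, Root's global translation cancels to zero while Body keeps Armature's offset along Z.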

As per your comment, I see that the armature object has a non-identity transform as its location. It is usually better off as identity; reference: a tutorial on skeletal animation by TheThinMatrix.

Here's a more detailed explanation with calculations. The Root node has a local translation of TranslateZ(-5.7); its parent, the Armature node, has a local transform of TranslateZ(5.7); Armature has no further parents. Thus, as far as translation goes, the global transform of Root is actually identity (Root's rotation, a half turn about Z, is left out of this simplified calculation). Here's the equation in the skinning vertex shader

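point’ = P * V * Mesh *              Skin                  * point
point’ = P * V * Mesh * (InvMeshGlobal * Global * InvBind) * point
       = P * V * TranslateZ(5.7) * (TranslateZ(-5.7) * [TranslateZ(5.7) * TranslateZ(-5.7)] * InvBind) * point
       = P * V * TranslateZ(5.7) * (TranslateZ(-5.7) * I * InvBind) * point
       = P * V * InvBind * point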
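The same bookkeeping can be checked numerically. The sketch below is a deliberate simplification: every transform is reduced to its Z offset, so composing transforms becomes addition and inverting becomes negation. InvBind is left out because its values are not given in the question. The "mesh only" cases assume, as the -11.4 result suggests, that the bone's global transform was built from the skeleton alone, without the Armature ancestor.

```python
Z = 5.709875583648682  # translation magnitude from the glTF snippet

# Local Z offsets of the three nodes (translation only, rotation ignored).
armature_local = Z
root_local = -Z
body_local = 0.0

# Global offsets when the whole node hierarchy is honoured.
body_global = armature_local + body_local
root_global = armature_local + root_local

# Root's "global" offset when ancestors outside the skeleton are ignored.
root_without_ancestors = root_local


def offset(mesh, inv_mesh_global, joint_global):
    """Z offset of Mesh * (InvMeshGlobal * Global), with InvBind left out."""
    return mesh + inv_mesh_global + joint_global


# Full scene rendering: Mesh and InvMeshGlobal cancel each other.
full_scene = offset(body_global, -body_global, root_global)

# Mesh and skeleton only, inverse of Body's global applied as in the tutorial.
mesh_only_inverted = offset(0.0, -body_global, root_without_ancestors)

# Mesh and skeleton only, Body's global used as-is.
mesh_only_as_is = offset(0.0, body_global, root_without_ancestors)

print("full scene:", full_scene)
print("mesh only, inverted:", mesh_only_inverted)
print("mesh only, as-is:", mesh_only_as_is)
```

The second case reproduces the roughly 11.4 units below ground reported in the question, while the other two leave the mesh where it was.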

So InvGlobalXform (written above as InvMeshGlobal) is needed only when the entire scene hierarchy is rendered, whereas you took just the mesh and its skeleton, ignoring the ancestor nodes above the nodes that hold the mesh and the skin. I can come up with two solutions.

Solution 1

Make sure the mesh and the armature have Location, Rotation and Scale applied in Blender before exporting
Both Root and its parent Armature local transforms would become identity i.e. root’s global transform would be identity
Ignore both Mesh and InvMeshGlobal transforms
Just the InvBind would be needed in entire equation

Solution 2

Store Root’s ancestral transform
Use ancestral transform to arrive at Root’s proper global transform; usually becomes identity
Ignore both Mesh and InvMeshGlobal transforms
Global and InvBind would be needed in equation

Solution (1) works only when the input mesh and skeleton have no global rotation or translation; solution (2) works in that case too. Unlike solution (1), though, solution (2) requires storing the ancestral transform.
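As a rough illustration of solution (2), here is a Python sketch that seeds the joint hierarchy with a stored ancestral transform and returns Global * InvBind per joint, leaving Mesh and InvMeshGlobal out. The function skin_matrices, the second joint and the inverse bind values are all hypothetical, chosen only so the result is easy to verify; Root's rotation is omitted to match the translation-only calculation above.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested row-major lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    """4x4 translation matrix (column-vector convention)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


IDENTITY = translation(0.0, 0.0, 0.0)


def skin_matrices(joint_locals, joint_parents, ancestor_xform, inv_binds):
    """Return Global * InvBind per joint.

    joint_parents[i] is the index of joint i's parent joint, or None for the
    skeleton root, whose parent is the stored ancestral transform.
    Parents must be listed before their children.
    """
    joint_globals = []
    for local, parent in zip(joint_locals, joint_parents):
        base = ancestor_xform if parent is None else joint_globals[parent]
        joint_globals.append(mat_mul(base, local))
    return [mat_mul(g, ib) for g, ib in zip(joint_globals, inv_binds)]


Z = 5.709875583648682

# Stored ancestral transform: Armature's TranslateZ(5.7) from the question.
ancestor = translation(0.0, 0.0, Z)

# Root (from the question) plus one hypothetical child joint one unit above it.
joint_locals = [translation(0.0, 0.0, -Z), translation(0.0, 0.0, 1.0)]
joint_parents = [None, 0]

# Hypothetical inverse bind matrices that match the pose above.
inv_binds = [IDENTITY, translation(0.0, 0.0, -1.0)]

skins = skin_matrices(joint_locals, joint_parents, ancestor, inv_binds)
for i, m in enumerate(skins):
    print("joint", i, "skin translation:", [row[3] for row in m[:3]])
```

Because the pose equals the assumed bind pose, both skin matrices come out as identity and the mesh stays where it was.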