The code snippet in the docs calls DrawIndexedInstanced in a for loop:
for (UINT i = 0; i < m_cityRowCount; i++) {
    for (UINT j = 0; j < m_cityColumnCount; j++) {
        pCommandList->DrawIndexedInstanced(numIndices, 1, 0, 0, 0);
    }
}
But the API
void DrawIndexedInstanced(
[in] UINT IndexCountPerInstance,
[in] UINT InstanceCount,
[in] UINT StartIndexLocation,
[in] INT BaseVertexLocation,
[in] UINT StartInstanceLocation
);
void DrawInstanced(
[in] UINT VertexCountPerInstance,
[in] UINT InstanceCount,
[in] UINT StartVertexLocation,
[in] UINT StartInstanceLocation
);
has StartInstanceLocation and InstanceCount parameters, which I assume have the effect of offsetting by InstanceIndex*StartInstanceLocation.
So are the following equivalent?
DrawIndexedInstanced(100, 2, 0, 0, 100);
//vs
DrawIndexedInstanced(100, 1, 0, 0, 0);
DrawIndexedInstanced(100, 1, 100, 0, 0);
DrawInstanced(100, 2, 0, 100);
//vs
DrawInstanced(100, 1, 0, 0);
DrawInstanced(100, 1, 100, 0);
How does instancing improve performance in the D3D12Bundles sample referred to by the docs? They call SetPipelineState between each instance, and the constant buffer used for g_mWorldViewProj in the vertex shader also changes with each instance. How does anything get reused?
for (UINT i = 0; i < m_cityRowCount; i++) {
    for (UINT j = 0; j < m_cityColumnCount; j++) {
        // Alternate which PSO to use; the pixel shader is different on
        // each just as a PSO setting demonstration.
        pCommandList->SetPipelineState(usePso1 ? pPso1 : pPso2);
        usePso1 = !usePso1;
        // Set this city's CBV table and move to the next descriptor.
        pCommandList->SetGraphicsRootDescriptorTable(2, cbvSrvHandle);
        cbvSrvHandle.Offset(cbvSrvDescriptorSize);
        pCommandList->DrawIndexedInstanced(numIndices, 1, 0, 0, 0);
    }
}
The canonical sample for instancing is InstancingFX11, rather than D3D12Bundles, which is linked from the docs of DrawIndexedInstanced() but only benefits from the indexing, not the instancing.
The writer of the InstancingFX11 sample wrote some comments on how to use instancing properly:
Pay attention to the code defining the buffer in Instancing.cpp, as this basically implements 2 vertex buffers: 1 for the geometry and the other for the instance data (matrices in this case). Adding the 2nd buffer is like adding another for loop around the draw call (but a lot more efficient).
Your instancing example only discusses adding an instance ID system variable. Instancing requires a 2nd vertex buffer bound to the draw context which contains unique data such as, say, World translation matrices. You then update your signature with the definition of the 2nd buffer, and also declare in your HLSL code that it will receive instancing data. Your example is a single-buffer version where you may use a constant buffer and the instance ID to look up within that. This is less efficient.
Looking up the data in the vertex shader means that the data cannot be inlined by the driver. Any precaching/setup on the GPU wavefront is wasted. For each vertex you visit, the hardware now goes and looks up the related array entry, rather than the entry being loaded once by the hardware and passed in as an argument of your vertex shader.
The key to performance, then, is that the first vertex buffer (containing the actual vertices) stays the same for each instance, while only the second vertex buffer (containing the different World translation matrices) strides between instances.
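To make the "another for loop around the draw call" comparison concrete, here is a minimal CPU-side sketch. It is not D3D code: `DrawResult`, `ToyVertexShader`, `DrawOnePerCall`, and `DrawInstancedOnce` are hypothetical names invented for illustration, and the "vertex shader" just adds a per-instance offset to a 1D position. It only shows that one instanced draw over a per-instance buffer visits the same (vertex, instance data) pairs as a loop of single-instance draws, with one draw call instead of one per object.

```cpp
#include <cstddef>
#include <vector>

// Result of a toy "draw": the shaded values plus how many draw calls were issued.
struct DrawResult {
    std::vector<float> shaded;
    std::size_t drawCalls;
};

// Toy vertex shader (assumption for illustration): translate a 1D position
// by a per-instance offset.
inline float ToyVertexShader(float position, float instanceOffset) {
    return position + instanceOffset;
}

// Loop style, as in the D3D12Bundles snippet: one single-instance draw per
// object, with the per-object data rebound before each draw.
DrawResult DrawOnePerCall(const std::vector<float>& vertices,
                          const std::vector<float>& perObjectOffsets) {
    DrawResult result{{}, 0};
    for (float offset : perObjectOffsets) {      // rebind per-object data
        for (float v : vertices) {               // same geometry every time
            result.shaded.push_back(ToyVertexShader(v, offset));
        }
        ++result.drawCalls;                      // one draw call per object
    }
    return result;
}

// Instanced style: a single draw; the second (per-instance) buffer advances
// once per instance while the geometry buffer is reused unchanged.
DrawResult DrawInstancedOnce(const std::vector<float>& vertices,
                             const std::vector<float>& instanceBuffer) {
    DrawResult result{{}, 1};
    for (std::size_t instance = 0; instance < instanceBuffer.size(); ++instance) {
        for (float v : vertices) {
            result.shaded.push_back(ToyVertexShader(v, instanceBuffer[instance]));
        }
    }
    return result;
}
```

With three vertices and two offsets, both functions produce the same six shaded values, but the first reports two draw calls and the second reports one. The model says nothing about driver or GPU cost; it only illustrates where the outer loop moves.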
It seems I was completely wrong about the first question: the two are not equivalent. In the INPUT_ELEMENT_DESC structure, the description of the InstanceDataStepRate member says:
The number of instances to draw using the same per-instance data before advancing in the buffer by one element. This value must be 0 for an element that contains per-vertex data (the slot class is set to D3D11_INPUT_PER_VERTEX_DATA).
This means that the D3D11_INPUT_PER_VERTEX_DATA vertex buffers do not differ between instances; only the per-instance data advances. So sequential draw calls over different vertex ranges do not collapse into a single instanced draw call. Whoops.
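The difference between the two forms in the question can be sketched with a small CPU model of which buffer slots each vertex shader invocation reads. This is only an illustration under stated assumptions: `Fetch` and `ModelDrawIndexedInstanced` are hypothetical names, BaseVertexLocation is left out because it is 0 in every call in the question, and the per-instance element is taken to be StartInstanceLocation plus the instance number divided by the step rate, which is my reading of the InstanceDataStepRate description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// What one vertex shader invocation reads in this toy model.
struct Fetch {
    std::uint32_t indexSlot;        // index-buffer slot that was read
    std::uint32_t instanceElement;  // per-instance buffer element that was read
    std::uint32_t instanceId;       // instance number within this draw call
};

// Hypothetical model of the fetch pattern of DrawIndexedInstanced.
// Assumes BaseVertexLocation == 0 and a step rate of at least 1.
std::vector<Fetch> ModelDrawIndexedInstanced(std::uint32_t indexCountPerInstance,
                                             std::uint32_t instanceCount,
                                             std::uint32_t startIndexLocation,
                                             std::uint32_t startInstanceLocation,
                                             std::uint32_t instanceDataStepRate = 1) {
    assert(instanceDataStepRate >= 1);
    std::vector<Fetch> fetches;
    for (std::uint32_t instance = 0; instance < instanceCount; ++instance) {
        for (std::uint32_t i = 0; i < indexCountPerInstance; ++i) {
            fetches.push_back(Fetch{
                startIndexLocation + i,  // per-vertex data: same range for every instance
                startInstanceLocation + instance / instanceDataStepRate,  // per-instance data
                instance});
        }
    }
    return fetches;
}
```

In this model, `ModelDrawIndexedInstanced(100, 2, 0, 100)` reads index slots 0..99 twice, paired with per-instance elements 100 and 101. The two separate calls, `ModelDrawIndexedInstanced(100, 1, 0, 0)` and `ModelDrawIndexedInstanced(100, 1, 100, 0)`, read index slots 0..99 and then 100..199, both paired with per-instance element 0. StartInstanceLocation offsets into the per-instance data, not into the geometry.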