I want to ensure that my D3D12_HEAP_TYPE_UPLOAD resource has been uploaded before I use it.
Apparently, to do this you call ID3D12Resource::Unmap, ID3D12GraphicsCommandList::Close, ID3D12CommandQueue::ExecuteCommandLists, and then ID3D12CommandQueue::Signal.
However, this confuses me. ID3D12Resource::Unmap is completely unconnected to the command list and queue, except through the device the resource was created on. But I have multiple command queues per device, so how does it choose which command queue to upload the resource on?
Is this documented anywhere? The only help I can find is comments in the samples.
Once you have copied your data to a mapped pointer, it is immediately available to be consumed by commands. For Upload resources there is no need to call Unmap at all; you can unmap just before you Release the resource, or at application shutdown.
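As a minimal sketch of that (the buffer1 / mappedPtr1 names match the pseudocode further below, the 64-byte size is just an example, the device is assumed to exist already and error handling is omitted), creating an upload buffer and keeping it persistently mapped looks roughly like this:

    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    ComPtr<ID3D12Resource> buffer1;

    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_UPLOAD;

    D3D12_RESOURCE_DESC bufferDesc = {};
    bufferDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    bufferDesc.Width = 64;
    bufferDesc.Height = 1;
    bufferDesc.DepthOrArraySize = 1;
    bufferDesc.MipLevels = 1;
    bufferDesc.Format = DXGI_FORMAT_UNKNOWN;
    bufferDesc.SampleDesc.Count = 1;
    bufferDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    device->CreateCommittedResource(
        &heapProps, D3D12_HEAP_FLAG_NONE, &bufferDesc,
        D3D12_RESOURCE_STATE_GENERIC_READ,   // required start state on an upload heap
        nullptr, IID_PPV_ARGS(&buffer1));

    void* mappedPtr1 = nullptr;
    D3D12_RANGE readRange = { 0, 0 };        // we do not read this resource on the CPU
    buffer1->Map(0, &readRange, &mappedPtr1);
    // mappedPtr1 stays valid from now on; Unmap is only needed before Release/shutdown.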
However, it is important to note (especially judging by your comments) that the commands will only be executed later on the GPU, so if you plan to reuse that memory you need some synchronization mechanism.
Let's go through a simple pseudocode example: you have a buffer called buffer1 (which you already created and mapped), and you have access to its memory via mappedPtr1.
copy data1 to mappedPtr1
call compute shader in commandList
execute CommandList
Now everything will execute properly (for one frame, assuming you have synchronization).
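In C++ this flow might look roughly like the sketch below. The command allocator, command list, root signature, pipeline state and command queue are assumed to exist already, and the root SRV at parameter 0 is only an illustrative binding:

    memcpy(mappedPtr1, data1, 64);                 // visible to the GPU immediately

    commandList->Reset(commandAllocator.Get(), pipelineState.Get());
    commandList->SetComputeRootSignature(rootSignature.Get());
    commandList->SetComputeRootShaderResourceView(0, buffer1->GetGPUVirtualAddress());
    commandList->Dispatch(1, 1, 1);
    commandList->Close();

    ID3D12CommandList* lists[] = { commandList.Get() };
    commandQueue->ExecuteCommandLists(1, lists);   // non-blocking: the GPU runs this later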
Now, if you do the following:
copy data1 to mappedPtr1
call compute shader in commandList (1)
copy data2 to mappedPtr1
call compute shader in commandList (1)
execute CommandList
In that case, since you copied data2 to the same place as data1, the first compute shader call will also use data2 (as it is the latest available data at the time you call execute CommandList).
Now let's look at a slightly different example:
copy data1 to mappedPtr1
call compute shader in commandList1
execute CommandList1
copy data2 to mappedPtr1
call compute shader in commandList2
execute CommandList2
What happens now is undefined, since you do not know when CommandList1 and CommandList2 will actually be processed.
If CommandList1 happens to be processed (fast enough) before
copy data2 to mappedPtr1
then data1 will still be the current content of the memory and will be used.
However, if your command list is a bit heavier and CommandList1 has not yet been processed by the time you finish your call to
copy data2 to mappedPtr1
(which is likely to happen), then both compute dispatches will again use data2 when the GPU executes them.
This is because ExecuteCommandLists is a non-blocking function: when it returns, it only means that your commands have been submitted for execution, not that they have been processed.
In order to guarantee that you use the correct data at the correct time, you have several options:
1/ Use a fence and wait for completion
copy data1 to mappedPtr1
call compute shader in commandList1
execute CommandList1 on commandQueue
attachSignal (1) to commandQueue
add a waitevent for value (1)
copy data2 to mappedPtr1
call compute shader in commandList2
execute CommandList2 on commandQueue
attachSignal (2) to commandQueue
add a waitevent for value (2)
This is simple but vastly inefficient, since you now wait for your GPU to finish all execution of the command list before continuing any CPU work.
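Inefficient as it is, for completeness here is what that wait could look like in C++ (a sketch: device and commandQueue are assumed to exist, error handling is omitted, and the comments map back to the pseudocode above):

    Microsoft::WRL::ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    HANDLE fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    UINT64 fenceValue = 0;

    // ... copy data1, record and execute CommandList1 as above ...
    commandQueue->Signal(fence.Get(), ++fenceValue);      // "attachSignal (1)"

    if (fence->GetCompletedValue() < fenceValue)          // "add a waitevent for value (1)"
    {
        fence->SetEventOnCompletion(fenceValue, fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
    // Only now is it safe to overwrite mappedPtr1 with data2.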
2/ Use different resources:
Since you now copy to two different locations, you of course guarantee that the data is different across both calls.
3/ Use a single resource with offsets.
You can also create a larger resource that can hold the data for all your calls, then copy each piece of data to its own offset.
I'll assume your data is 64 bytes here (so you would create a 128-byte buffer):
copy data1 to mappedPtr1 (offset 0)
bind address from mappedPtr1 (offset 0) to compute
call compute shader in commandList1
execute CommandList1 on commandQueue
copy data2 to mappedPtr1 (offset 64)
bind address from mappedPtr1 (offset 64) to compute
call compute shader in commandList2
execute CommandList2 on commandQueue
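As a C++ sketch of that offset approach, assuming each sub-range is bound as a root constant buffer view: root CBV addresses must be 256-byte aligned (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT), so the per-call stride below is 256 bytes rather than 64 and the buffer would hold two such slices; commandList1, commandList2 and commandQueue are assumed to exist, and root signature / pipeline state setup is omitted as before:

    const UINT64 sliceStride = D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT; // 256
    D3D12_GPU_VIRTUAL_ADDRESS baseAddress = buffer1->GetGPUVirtualAddress();
    BYTE* basePtr = static_cast<BYTE*>(mappedPtr1);

    memcpy(basePtr + 0 * sliceStride, data1, 64);
    commandList1->SetComputeRootConstantBufferView(0, baseAddress + 0 * sliceStride);
    commandList1->Dispatch(1, 1, 1);
    commandList1->Close();
    ID3D12CommandList* lists1[] = { commandList1.Get() };
    commandQueue->ExecuteCommandLists(1, lists1);

    memcpy(basePtr + 1 * sliceStride, data2, 64);
    commandList2->SetComputeRootConstantBufferView(0, baseAddress + 1 * sliceStride);
    commandList2->Dispatch(1, 1, 1);
    commandList2->Close();
    ID3D12CommandList* lists2[] = { commandList2.Get() };
    commandQueue->ExecuteCommandLists(1, lists2);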
Please note that you should still use fences to indicate when a frame has finished processing; this is the only way to guarantee that the upload memory can eventually be reused.
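A common way to do that is to signal once per frame and only wait when you are about to reuse a frame's upload region. A sketch, assuming two frames in flight and the fence / fenceEvent / fenceValue from the earlier sketch:

    const UINT frameCount = 2;
    UINT64 frameFenceValues[frameCount] = {};
    UINT frameIndex = 0;

    // at the end of each frame: mark all work submitted for this frame
    commandQueue->Signal(fence.Get(), ++fenceValue);
    frameFenceValues[frameIndex] = fenceValue;

    // before reusing that frame slot's upload memory: wait only if the GPU is behind
    frameIndex = (frameIndex + 1) % frameCount;
    if (fence->GetCompletedValue() < frameFenceValues[frameIndex])
    {
        fence->SetEventOnCompletion(frameFenceValues[frameIndex], fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
    // The upload memory that belonged to that older frame can now be rewritten.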
If you want to copy the data to a default heap (especially if you do it on a separate copy queue), you will also need a fence on the copy queue and a wait on the main queue, to ensure the copy queue has finished processing and that the data is available (you also need, as per the other answer, to set up resource barriers on the default-heap resource in that case).
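For that case, here is a sketch of the cross-queue synchronization only (copyQueue, directQueue, copyCommandList, copyFence, uploadBuffer and defaultBuffer are assumed to exist; the exact barrier and state handling depends on how the default-heap resource is used afterwards):

    copyCommandList->CopyBufferRegion(defaultBuffer.Get(), 0, uploadBuffer.Get(), 0, 64);
    copyCommandList->Close();
    ID3D12CommandList* copyLists[] = { copyCommandList.Get() };
    copyQueue->ExecuteCommandLists(1, copyLists);
    copyQueue->Signal(copyFence.Get(), ++copyFenceValue);

    // GPU-side wait: work submitted to the direct queue after this call will not
    // start until the copy queue has reached copyFenceValue. No CPU stall involved.
    directQueue->Wait(copyFence.Get(), copyFenceValue);
    // Then, on the direct queue's command list, transition the default-heap resource
    // to the state it will be read in (e.g. COPY_DEST -> NON_PIXEL_SHADER_RESOURCE),
    // as mentioned in the other answer.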
Hope it makes sense.