I have a Unity project in which I'm writing to an AppendStructuredBuffer<Triangle>
via Append(triangle)
in a compute shader.
In this instance, I know the theoretical limit to the number of triangles that could exist, so the obvious correct approach is to size the buffer accordingly. As a hack, though, I'm experimenting with allocating drastically smaller buffers so that they can be more efficiently processed by other parts of the system (in particular, reading back to CPU). One could imagine other situations in which a specific limit may not be known, or could be wrongly assumed.
Clearly, this is potentially hazardous. I'm sure there are more robust approaches that could be used for my current system (or more generally) without sacrificing performance, but I'm not (particularly) asking for advice on that.
What I want to know is what the expected behaviour is when a program calls Append()
beyond the capacity of such a buffer. I imagine that it is undefined, and potentially liable to corrupt other areas of VRAM, to an extent dependent on GPU drivers / DirectX version etc. It may be that it is more formally specified, but I haven't been able to find that out.
Of course, even if the behaviour is specified, it seems somewhat reckless to deliberately risk. Still, I'd like to know:
Perhaps it is 'safe' to the extent that excess data will simply be lost to the void without cost. In any case the system can - for example - periodically check fullness of buffers and do any extra housework that may be necessary... leaving the question of how severe any mistakes in the tuning of such a system might be.
Under many circumstances, at least in DirectX, out of bounds access is defined as returning 0. I'm still not totally sure about writes, but think there is reason to believe they should be generally safe in current implementations.
I would still be very wary of relying on this, especially when using other APIs.
According to this specification,
5.3.10.2 Using Unordered Count and Append Buffers
... The counter behind imm_atomic_alloc and imm_atomic_consume has no overflow or underflow clamping, and there is no feedback given to the shader as to whether overflow/underflow happened (wrapping of the counter). The only thing the counter really accomplishes is a way of generating unique addresses that is conveniently bundled with the UAV.
There is no clamping of the count, so it wraps on overflow.
I don't think I'm wrong in interpretting 'wrapping' as being to the length of the buffer in these instances.
So, the answer as I understand it is that on Append()
the internal counter will wrap, and subsequent invocations will end up overwriting earlier data. As it happens, I am currently rendering my buffer without reference to such a counter (because I do another pass on the 'triangles' to turn them into vertices for rendering, which I currently do on a non-AppendBuffer). I should experiment with passing a buffer with a count to that draw call, which should allow me to verify whether most of my model suddenly disappears when I overflow.
In any case, it seems that the operation should be safe in terms of not corrupting other parts of the system, but that referring to the counter may be the wrong way to detect problems.