UE5 seems to set IndexType to something other than UNKNOWN only when
querying RTAS sizes. This contradicts the D3D12 docs but matches Vulkan
behavior, so do the same thing. Adds a warning when the IBO VA is NULL with
a non-UNKNOWN format to catch app bugs.
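
A minimal sketch of the rule described above, not the actual vkd3d-proton
code. The enum, struct and function names are hypothetical, and suppressing
the warning for size queries is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the index formats; not the real DXGI values. */
enum sketch_index_format
{
    SKETCH_INDEX_FORMAT_UNKNOWN = 0,
    SKETCH_INDEX_FORMAT_R16_UINT,
    SKETCH_INDEX_FORMAT_R32_UINT,
};

/* Hypothetical subset of a triangle geometry description. */
struct sketch_triangles_desc
{
    enum sketch_index_format index_format;
    uint64_t index_buffer_va; /* 0 means NULL. */
};

/* Decide whether the geometry is indexed from the format alone, as Vulkan
 * does with its index type. The index buffer VA does not take part in the
 * decision, since it may legitimately be NULL when only sizes are queried. */
static bool sketch_geometry_is_indexed(const struct sketch_triangles_desc *desc,
        bool size_query_only)
{
    bool indexed = desc->index_format != SKETCH_INDEX_FORMAT_UNKNOWN;

    /* Assumption: a NULL VA is only suspicious when actually building. */
    if (indexed && !size_query_only && desc->index_buffer_va == 0)
        fprintf(stderr, "warn: index format is set, but index buffer VA is NULL.\n");

    return indexed;
}
```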
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Temporarily abandons the idea to fuse waiters with execution.
For whatever reason, this seemed to cause random flicker in Halo Infinite
with async compute on, and I have failed to figure out exactly why.
By playing around with how commands are fused, the results changed
dramatically, which means I doubt vkd3d-proton was actually at fault
here.
There is some questionable code around UpdateTileMappings in the game
where a COPY queue is used, and it does not seem to synchronize this with other
queues as far as I can tell. It is uncertain at this time if D3D12
requires a tile update to synchronize with *every* queue or just the
queue being submitted to. We assume the latter, as it's the only
behavior that makes sense.
It is possible that submitting waits as they are queued up
affects synchronization between queues in unexpected ways.
When separating out the wait operations, everything appears to work.
It is also simpler code.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We're supposed to fail here, but we ended up failing
due to parsing an uninitialized version instead, meaning
it could spuriously succeed or read garbage.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fixes a bug in the logic trying to combine the waits by simplifying the code.
Problem discovered by HK.
Signed-off-by: Derek Lesho <dlesho@codeweavers.com>
Attempt to release fences before their signal/waits have been satisfied.
Also tests this behavior for shared fences.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Even when misusing the API, S_OK is still returned on native runtimes.
Keep the error log, and add an error report to command allocator release
if there are still pending submissions.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 has some unfortunate rules around CommandQueue::Wait().
It's legal to release the fence early, before the fence actually
completes its wait operation.
The behavior on D3D12 is just to release all waiters.
For out of order signal/wait, we hold off submissions,
so we can implement this implicitly through a CPU signal to UINT64_MAX
on fence release. If we have submitted a wait which depends on the
fence, it will complete in finite time, so it still works fine.
We cannot release the semaphores early in Vulkan, so we must hold on
to a private reference of the ID3D12Fence object until we have observed
that the wait is complete.
To make this work, we refactor waits to use the vkd3d_queue wait list.
On other submits, we resolve the wait. This is a small optimization
since we don't have to perform dummy submits that only perform the wait.
At that time, we signal a timeline semaphore and queue up a d3d12_fence_dec_ref().
Since we're also adding this system where normal submissions signal
timelines, handle the submission counters more correctly by deferring
the decrements until we have waited for the submission itself.
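
A minimal single-threaded sketch of the lifetime rule described above, not
the actual implementation. All names are hypothetical, and a real fence
would need atomics and locking. The public refcount models the application's
references, while the internal refcount models the private references held
by waits that are still in flight.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fence model. */
struct sketch_fence
{
    uint64_t value;                 /* Last value signalled. */
    unsigned int refcount;          /* References held by the application. */
    unsigned int refcount_internal; /* One for the public refs, one per pending wait. */
    bool destroyed;
};

static void sketch_fence_init(struct sketch_fence *fence)
{
    fence->value = 0;
    fence->refcount = 1;
    fence->refcount_internal = 1;
    fence->destroyed = false;
}

/* Drop a private reference. The object only goes away at zero. */
static void sketch_fence_dec_ref(struct sketch_fence *fence)
{
    if (--fence->refcount_internal == 0)
        fence->destroyed = true;
}

/* A queued wait takes a private reference until it is observed complete. */
static void sketch_queue_wait(struct sketch_fence *fence)
{
    fence->refcount_internal++;
}

static bool sketch_wait_is_satisfied(const struct sketch_fence *fence, uint64_t wait_value)
{
    return fence->value >= wait_value;
}

/* Application release. On the last public reference, signal UINT64_MAX
 * from the CPU so that every held-off waiter is unblocked. */
static void sketch_fence_release(struct sketch_fence *fence)
{
    if (--fence->refcount == 0)
    {
        fence->value = UINT64_MAX;
        sketch_fence_dec_ref(fence);
    }
}

/* Called once the wait has been observed to complete. */
static void sketch_wait_retired(struct sketch_fence *fence)
{
    sketch_fence_dec_ref(fence);
}
```

In this model, releasing the fence while a wait is pending unblocks the
wait immediately, but the object stays alive until sketch_wait_retired()
drops the last private reference.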
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Apparently RT shaders in RE Engine require min16float to
be implemented as native FP16. Fun ... ._.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With DXR, it seems like some applications require other FL 12.2 features
to be enabled even if they are not actually used. Various RE Engine
titles seem to be affected by this.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It's technically undefined to use NULL UAV counters,
but drivers all implement some form of robust behavior here
when presented with NULL counters, so we'll have to follow suit.
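
A minimal sketch of one possible form of robust behavior, modelled on the
CPU for illustration. The commit does not say which form drivers use, so
treating a NULL counter as "no write, reads as zero" is an assumption, and
the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an append/consume counter increment.
 * Returns the pre-increment value, like an atomic add would. */
static uint32_t sketch_uav_counter_increment(uint32_t *counter)
{
    uint32_t old;

    /* Assumed robust behavior: a NULL counter is never written
     * and always reads back as zero. */
    if (!counter)
        return 0;

    old = *counter;
    *counter = old + 1;
    return old;
}
```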
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We have no way of expressing size / alignment requirements to
applications since the API query does not provide us with heap
information. Reuse the fallback path for promoting placed to committed.
Guardians of the Galaxy hits a case where it tries to place 3x
host-visible 3D images in one heap, and they end up overlapping in
memory due to a 16x16x80 3D texture taking up far less space in optimal
tiling compared to linear tiling on AMD.
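
A minimal sketch of the failure and the fallback decision, with
hypothetical names. The sizes used with it are illustrative only: the
application spaces its placements by the size it was told, while the image
actually needs more memory once it has to use linear tiling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of where an image was placed in a heap and how much
 * memory it really needs with the tiling that ends up being used. */
struct sketch_placement
{
    uint64_t offset;
    uint64_t actual_size;
};

/* Two placements collide if their half-open byte ranges intersect. */
static bool sketch_placements_overlap(const struct sketch_placement *a,
        const struct sketch_placement *b)
{
    return a->offset < b->offset + b->actual_size &&
            b->offset < a->offset + a->actual_size;
}

/* The size query cannot know which heap the image will land in, so placed
 * images in host-visible heaps are promoted to committed allocations. */
static bool sketch_use_committed_fallback(bool is_image, bool heap_is_host_visible)
{
    return is_image && heap_is_host_visible;
}
```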
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For internal debug shaders, it is helpful to ensure that logs come out
in order when sorted for later inspection.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>