Commit Graph

2839 Commits

Author SHA1 Message Date
Hans-Kristian Arntzen d3a76eee90 idl: Fix const correctness of UpdateTileMappings.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-25 18:10:08 +02:00
Hans-Kristian Arntzen 481680ecd8 vkd3d: Use IndexFormat as a sentinel for indexed RTAS build.
UE5 seems to only set IndexType to != UNKNOWN when querying RTAS sizes.
This contradicts D3D12 docs, but this matches Vulkan behavior, so do the
same thing. Adds a warn when IBO VA is NULL with non-null format to catch app
bugs.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-25 17:51:02 +02:00
Hans-Kristian Arntzen 11c82c84d1 vkd3d: Add some trace debug logs of RTAS build infos.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-25 17:51:02 +02:00
Hans-Kristian Arntzen c0b9682c69 vkd3d: Small warning fixes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-25 17:09:07 +02:00
Derek Lesho df1829e407 vkd3d: Implement ID3D12Fence sharing on top of D3D12-Fence exportable Vulkan timeline semaphores.
Signed-off-by: Derek Lesho <dlesho@codeweavers.com>
2022-07-25 11:16:53 +02:00
Hans-Kristian Arntzen be2aafff1a vkd3d: Resolve fence waiters early.
Temporarily abandons the idea to fuse waiters with execution.
For whatever reason, this seemed to cause random flicker in Halo Infinite
with async compute on, and I have failed to figure out exactly why.
By playing around with how commands are fused, the results changed
dramatically, which means I doubt vkd3d-proton was actually at fault
here.

There is some questionable code around UpdateTileMappings in the game
where a COPY queue is used, and it does not seem to synchronize this with other
queues as far as I can tell. It is uncertain at this time if D3D12
requires a tile update to synchronize with *every* queue or just the
queue being submitted to. We assume the latter, as it's the only
behavior that makes sense.

It is possible that submitting waits as they are queued up
affects synchronization between queues in unexpected ways.

When separating out the wait operations, everything appears to work.
It is also simpler code.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-21 21:10:34 +02:00
Derek Lesho 849537614a vkd3d: HACK: Don't create host pointer heap for Halo Infinite.
Some usage pattern here is causing a failure inside amdgpu.

Signed-off-by: Derek Lesho <dlesho@codeweavers.com>
2022-07-21 20:48:56 +02:00
Derek Lesho f487db4756 vkd3d: Implement ID3D12Resource sharing.
Signed-off-by: Derek Lesho <dlesho@codeweavers.com>
2022-07-21 20:48:56 +02:00
Hans-Kristian Arntzen 4f4c96bb11 vkd3d: Fail creating root signatures from blobs without RTS0.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-20 12:00:07 +02:00
Derek Lesho a2439e766f vkd3d: Flush queued waiters before waiting for the sparse binding semaphore.
Fixes a bug in the logic trying to combine the waits by simplifying the code.
Problem discovered by HK.

Signed-off-by: Derek Lesho <dlesho@codeweavers.com>
2022-07-20 01:27:20 +02:00
Hans-Kristian Arntzen 4ff504b52d vkd3d: Match native runtime better in command allocator reset.
Even when misusing the API, S_OK is still returned on native runtimes.
Keep the error log, and add an error report to command allocator release
if there are still pending submissions.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-18 19:00:25 +02:00
Hans-Kristian Arntzen 6335e411bb vkd3d: Rewrite submission logic for wait fences.
D3D12 has some unfortunate rules around CommandQueue::Wait().
It's legal to release the fence early, before the fence actually
completes its wait operation.

The behavior on D3D12 is just to release all waiters.
For out of order signal/wait, we hold off submissions,
so we can implement this implicitly through CPU signal to UINT64_MAX
on fence release. If we have submitted a wait which depends on the
fence, it will complete in finite time, so it still works fine.

We cannot release the semaphores early in Vulkan, so we must hold on
to a private reference of the ID3D12Fence object until we have observed
that the wait is complete.

To make this work, we refactor waits to use the vkd3d_queue wait list.
On other submits, we resolve the wait. This is a small optimization
since we don't have to perform dummy submits that only performs the wait.
At that time, we signal a timeline semaphore and queue up a d3d12_fence_dec_ref().

Since we're also adding this system where normal submissions signal
timelines, handle the submission counters more correctly by deferring
the decrements until we have waited for the submission itself.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-18 19:00:25 +02:00
Hans-Kristian Arntzen 11c943dd7e vkd3d: Unblock all fence waiters when public ref-count hits 0.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-18 19:00:25 +02:00
Hans-Kristian Arntzen 5b73139f18 vkd3d: Fail creation of command signature if DGC is not supported.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-12 14:31:53 +02:00
Hans-Kristian Arntzen 766da69afb vkd3d: Also add profiles for RE3/RE7.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 15:58:21 +02:00
Hans-Kristian Arntzen b7a960f94f vkd3d: Also add RE workaround for RE2 DXR.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 15:58:21 +02:00
Hans-Kristian Arntzen ee39209798 vkd3d: Add flag to force native FP16 paths.
Apparently RT shaders in RE Engine require min16float to
be implemented as native FP16. Fun ... ._.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 15:58:21 +02:00
Hans-Kristian Arntzen afb87e013f vkd3d: Add per-application feature overrides.
With DXR, it seems like some applications require other FL 12.2 features
to be enabled even if they are not actually used. Various RE engine
titles seem to be affected by this.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 15:58:21 +02:00
Hans-Kristian Arntzen f704cb9776 vkd3d: Use index type LUT for DGC.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 15:14:13 +02:00
Hans-Kristian Arntzen e17a7cb40c vkd3d: Attempt to reuse application indirect command buffer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 15:14:13 +02:00
Hans-Kristian Arntzen 3b8a13e63d vkd3d-shader: Implement robust UAV counters.
It's technically undefined to use NULL UAV counters,
but drivers all implement some form of robust behavior here
when presented with NULL counters, so we'll have to follow suit.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 15:07:47 +02:00
Hans-Kristian Arntzen 65804bbde5 vkd3d: Ignore cpu_access_domain when reporting heap tier.
For host visible, we only place buffers anyways.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:24 +02:00
Hans-Kristian Arntzen 233ff38175 vkd3d: Force LINEAR images to be allocated as committed resources.
We have no way of expressing size / alignment requirements to
applications since the API query does not provide us with heap
information. Reuse the fallback path for promoting placed to committed.

Guardians of the Galaxy hits a case where it tries to place 3x
host-visible 3D images in one heap, and they end up overlapping in
memory due to a 16x16x80 3D texture taking up far less space in optimal
tiling compared to linear tiling on AMD.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:24 +02:00
Hans-Kristian Arntzen 4a07d9c038 debug: Add concept of implicit instance index to debug ring.
For internal debug shaders, it is helpful to ensure in-order logs when
sorted for later inspection.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen bcdac3180a debug: Make Instance sorting easier.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen df11b5ba5a debug: Pretty-print execute template debug messages.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen e138a5117a vkd3d: Encode in detail which commands we're emitting in template.
Feed this back to debug ring for less cryptic logs.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen 96fdb71ae4 vkd3d: Refactor out patch command token enum.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen fe707989fe vkd3d: Clamp command count in execute indirect path.
Shouldn't be required, but take no chances.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen 6d3c5d53b0 vkd3d: Add debug ring path for execute indirect template patches.
Somehow inspect draw parameters this way.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen f93a581dae vkd3d: Trace breadcrumbs for execute indirect templates.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:59:00 +02:00
Hans-Kristian Arntzen eda0b2fab2 vkd3d: Do a best effort in handling COLLECTION local static samplers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:58:19 +02:00
Hans-Kristian Arntzen 7f5dbcfc40 vkd3d: Add workaround to allow identifiers to be queried from library.
CP77 relies on this to work somehow ...
The DXR spec seems to suggest this is allowed, but there is no direct
concept for this in Vulkan.

This seems to work on NVIDIA at least, but we're on very shaky ground
here ...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:34:34 +02:00
Hans-Kristian Arntzen d333159c86 vkd3d: Disallow querying identifiers from COLLECTION objects.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:34:34 +02:00
Hans-Kristian Arntzen 74eb676cfb vkd3d-shader: Normalize root signature compatibility hashing.
The hash should only depend on the raw byte stream, not the entire DXBC
blob. Useful now since we can declare root signatures either through
DXBC blob or as RDAT object (which is raw).

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:34:34 +02:00
Hans-Kristian Arntzen b34931eb17 vkd3d: Log how shader identifiers are queried.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:23:38 +02:00
Hans-Kristian Arntzen 7410f53912 vkd3d: Add debug ring support to raytracing shaders.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:23:38 +02:00
Hans-Kristian Arntzen 03fdbac59e vkd3d: Dump TraceRays parameters to breadcrumbs.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:04:38 +02:00
Hans-Kristian Arntzen 7832eeb60d vkd3d: Add detailed tracing for RTPSO creation.
So much state floating around ...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:04:38 +02:00
Hans-Kristian Arntzen 8a94c3ce0e vkd3d: Add more detailed breadcrumb logging for TraceRays.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:04:38 +02:00
Hans-Kristian Arntzen ddb425c5cb vkd3d: Add support for tag logging in breadcrumbs.
To keep things simple, outer code is responsible for keeping string
alive. Intended to be used for RTPSO entry point name debugging.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 14:04:38 +02:00
Hans-Kristian Arntzen b88b04e4f1 vkd3d: Rewrite how submodules are associated with exports.
Handle embedded DXIL subobjects and fix various issues exposed by the
upcoming new tests.

Associating with global root signatures, shader config and pipeline
config needs to be rewritten so that we validate uniqueness late.

The strategy here is to look at all exports we care about and find an
association.

There are many priority levels which are implied by how I understand the
DXR docs. State objects in the API win over embedded DXIL state objects.
Any DXIL state object wins over a collection.

Hit group associations can trump an entry point. It's not entirely clear
how this works, but we let it win if it has higher priority, i.e.
an explicit association directed at the hit group.

There's also cases where explicit assignment trumps explicit default
assignment, which then trumps just declaring a state object.

Collection state is inherited in some cases like AddToStateObject() even
if this seems to be undocumented behavior.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 13:41:06 +02:00
Hans-Kristian Arntzen 4a121b9aaa vkd3d-shader: Forward RDAT subobjects.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 12:37:34 +02:00
Hans-Kristian Arntzen 0ef6a8b798 vkd3d: Expose utility for creating root signature from raw blob.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 12:37:34 +02:00
Hans-Kristian Arntzen 49b6e67e7d vkd3d-shader: Expose entry point for raw root signature parsing.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 12:37:34 +02:00
Hans-Kristian Arntzen 2ef3fd469c vkd3d-common: Add strequal_mixed between WCHAR and ASCII.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 12:37:34 +02:00
Hans-Kristian Arntzen 22778b99be vkd3d: Handle default global root signature in RTPSO.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 12:37:34 +02:00
Hans-Kristian Arntzen 3c92b3a1bc vkd3d: Implement AddToStateObject().
This is barely implementable, and relies on implementations to do kinda
what we want.

To make this work in practice, we need to allow two pipelines per state
object. One that is created with LIBRARY and one that can be bound. When
incrementing the PSO, we use the LIBRARY one.

It seems to be allowed to create a new library from an old library.
It is more convenient for us if we're allowed to do this, so do this
until we're forced to do otherwise.

DXR 1.1 requires that shader identifiers remain invariant for child
pipelines if the parent pipeline also have them.
Vulkan has no such guarantee, but we can speculate that it works and
validate that identifiers remain invariant. This seems to work fine on
NVIDIA at least ... It probably makes sense that it works for
implementations where pipeline libraries are compiled at that time.

The basic implementation of AddToStateObject() is to consider
the parent pipeline as a COLLECTION pipeline. This composes well and
avoids a lot of extra implementation cruft.

Also adds validation to ensure that COLLECTION global state matches with
other COLLECTION objects and the parent. We will also inherit global
state like root signatures, pipeline config, shader configs etc when
using AddToStateObject().

The tests pass on NVIDIA at least.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 12:11:27 +02:00
Hans-Kristian Arntzen 8473355a98 vkd3d: Hold private ownership over global root signature.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 11:49:44 +02:00
Hans-Kristian Arntzen 1438ff5637 vkd3d: Allow different but compatible global root signature objects.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-07-11 11:49:44 +02:00