Ignore any indexed draw call which uses a NULL index buffer.
This is not fully correct, but there is no easy way to emulate D3D12
behavior exactly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We cannot compare resource pointers or view pointers,
since the pointers might have been recycled.
This leads to a scenario where we're not updating descriptors we're
supposed to, and the GPU reads a stale descriptor.
Fixes a GPU hang in Death Stranding (and possibly lots of other weird
crashes as well).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For correctness, we will need to defer any initial resource state
handling to the queue timeline. Here, we will build an UNDEFINED ->
common layout barrier if (and only if):
- The resource is marked to care about initial layout transition.
- We are the first queue thread to observe that the initial_transition
member is 1 (atomic exchange).
- The first use of the resource was not marked to be a discard.
E.g., if the first use of the resource is an alias barrier, we must
not emit an early barrier. The only thing we should do here is to clear
the initial_transition member, and leave it like that.
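The three conditions above can be sketched as follows. This is only an
illustration: struct resource_sketch, its fields and
should_emit_initial_barrier() are hypothetical stand-ins, not the actual
d3d12_resource layout or queue code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of a d3d12_resource. */
struct resource_sketch
{
    atomic_uint initial_transition;   /* 1 while the initial barrier is still pending. */
    bool cares_about_initial_layout;  /* Resource is marked to need the transition. */
};

/* Returns true if the calling queue thread should build the
 * UNDEFINED -> common layout barrier. The flag is cleared by the
 * exchange in every case, so at most one caller ever observes 1. */
static bool should_emit_initial_barrier(struct resource_sketch *resource,
        bool first_use_is_discard)
{
    assert(resource != NULL);

    if (!resource->cares_about_initial_layout)
        return false;

    /* Only the first thread to swap out the 1 wins. */
    if (atomic_exchange(&resource->initial_transition, 0u) != 1u)
        return false;

    /* First use was a discard (e.g. an alias barrier): the flag stays
     * cleared, but no early barrier is emitted. */
    return !first_use_is_discard;
}
```

Clearing the flag before looking at the discard case keeps a later
submission from emitting the barrier after the fact.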
A command list maintains a list of d3d12_resources which *might* need a
transition. For the first frame a resource is used (or so), it will not
have the flag cleared yet, so multiple command lists might add the
d3d12_resource to its own transition list. This is fine, as the queue
will resolve it.
If multiple queues see the same initial transition, there might be
shenanigans, but the application must ensure there is either a
submission boundary or fence boundary between the uses. Any initial
layout transition will only be submitted after a Wait() is observed, as
submission of the transition command buffer will be in-order with other
submissions.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This is used extensively by Horizon Zero Dawn, and allows us
to skip the compile screen after the initial first run.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Unused now, instead we should implement D3D12 caching primitives
correctly and rely on the Vulkan driver otherwise.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
There is no resource state associated with this, so emit the barrier at
the end of a command buffer based on trivial tracking.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to handle large (> 4G) jumps in timeline value, which is not
supported by all implementations.
There is no good way to handle that, so rewrite and clean up timeline
semaphore handling by separating the timeline into a virtual timeline
(which can rewind and jump around arbitrarily) and a physical timeline
which increments by one each time.
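A minimal sketch of the virtual/physical split, assuming a small fixed
table of recorded signals. The names timeline_sketch, timeline_signal()
and timeline_wait_value() are hypothetical and the real bookkeeping may
look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64 /* Arbitrary capacity for this sketch. */

struct timeline_sketch
{
    uint64_t physical_value; /* Only ever increments by one. */
    size_t count;
    struct
    {
        uint64_t virtual_value;  /* Value the application asked for. */
        uint64_t physical_value; /* Value actually signalled on the semaphore. */
    } pending[MAX_PENDING];
};

/* Records a signal to an arbitrary virtual value and returns the
 * physical value to signal on the underlying semaphore. */
static uint64_t timeline_signal(struct timeline_sketch *t, uint64_t virtual_value)
{
    assert(t->count < MAX_PENDING);
    t->physical_value++;
    t->pending[t->count].virtual_value = virtual_value;
    t->pending[t->count].physical_value = t->physical_value;
    t->count++;
    return t->physical_value;
}

/* Returns the physical value of the earliest recorded signal which
 * satisfies a wait on virtual_value, or 0 if none was recorded yet. */
static uint64_t timeline_wait_value(const struct timeline_sketch *t, uint64_t virtual_value)
{
    size_t i;

    for (i = 0; i < t->count; i++)
    {
        if (t->pending[i].virtual_value >= virtual_value)
            return t->pending[i].physical_value;
    }
    return 0;
}
```

However far the virtual value jumps or rewinds, the value handed to the
implementation only ever moves forward by one.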
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
These memory types might end up being used as fallback memory types,
which is problematic due to their tiny sizes, and unexpected performance
behavior. Generally, when we want to fall back, we should cleanly fall
back to system memory rather than a different device local type.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Manages unique static samplers for now, in order to reduce duplicates.
Can be extended to also manage descriptor pools for static samplers in
the future.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
D3D12 allows much larger pools to be created for heaps that are not
shader-visible, which some games make use of. Fixes crashes on Nvidia.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Stores info about where exactly the descriptor is stored in the
Vulkan descriptor pool, and whether we have to worry about an
additional UAV counter descriptor.
This is meant to replace all the other non-static data stored
inside d3d12_desc.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We're not using these anywhere because we need formats to be correct
for image views. Buffer views are used for root descriptors and null
UAV counters.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When we're using extended dynamic state, we will often end up with dummy
pipeline binds, which we should try to avoid if we can.
Also avoids having to rebind dynamic state redundantly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Cleans up dynamic state such that we do not have to keep dynamic state
create infos around.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fall back when there is a mismatch, which can happen if application does
not declare inputs to hull shader (unlikely).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When using EXT_extended_dynamic_state, we will be able to compile a
master pipeline. Only in special cases will we have to fall back.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This can happen in the worst case where we have all bindless sets, and:
- Static samplers
- Packed descriptors (UAV counters on drivers without support for this)
- Root descriptors
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We need these for the upcoming swapchain factory implementation
for standalone D3D12.
They're also probably good to have around in future for the
d3d12 device.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
This commit moves the module handling code, which was previously dumped
in device.c, and the code to retrieve the current executable path to
its own file.
This also eliminates HAVE_DECL_PROGRAM_INVOCATION_NAME from config.h.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
debug_marker/debug_report are both deprecated in favor of debug_utils,
and vkd3d was using debug_marker in a buggy way anyway, as debug_marker
requires debug_report to work, but it was only conditionally enabled.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Gets rid of the full barrier on command buffer end.
Instead, do what D3D12 wants, which is to serialize all
ExecuteCommandLists. Simplify the existing timeline semaphore setup for
sparse queues and use it for all submissions.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We'll need this to more accurately select the memory type for D3D12
heaps based on which resources are allowed to be placed in it.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Need float16_int8 and subgroup with extended types to implement new SM
6.2 features. For now, skip over SM 6.1 features until someone makes use
of them.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Some sparse resources may have a metadata aspect on some drivers,
which needs to be bound before the image can be used in any way.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This will serve as a fallback if at least one queue family
does not support sparse binding.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This allows us to perform clears inside the render pass even if
the render pass hasn't been started at the time of the clear yet.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We need access to the resource in order to perform render pass
layout transitions, just the view handle isn't enough.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Passing the main struct to the public functions allows us
to share common data between multiple types of operations.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
It makes sense to separate this from d3d12_create_sampler since static
samplers and regular samplers differ in border color support.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Works around an app-bug in SotTR, where the command pool is reset before
the command buffer completes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 supports out-of-order signal and wait. So do Vulkan timeline
semaphores. However, in Vulkan we don't have an infinite amount of
virtual queues. We must potentially map multiple D3D12 queues on top of
Vulkan, which might lead to a deadlock when app attempts to
wait-before-signal if the two queues are mapped to the same physical
Vulkan queue.
In order to solve this, we need to hold back submissions until we know
it is safe to do so. To make this work in practice as simply as possible, each
ID3D12CommandQueue has its own submission thread, which will block on an
ID3D12Fence's pending timeline value for a Wait command. The main reason to use a
submission thread is that resolving this directly in
ID3D12CommandQueue::Signal is extremely tricky and potentially
needs recursively locking queues and fences.
Note that we only block on the pending wait value, not the actual wait
value, so there is no real CPU <-> GPU synchronization here. In the
common case, no submission thread will block.
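Blocking on the pending value could look roughly like this. This is a
sketch only: fence_sketch and its functions are hypothetical, and the
pending value is assumed to only move forward.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-side view of a fence: the highest value that an
 * already queued Signal will eventually reach on the GPU timeline. */
struct fence_sketch
{
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint64_t pending_value;
};

static void fence_sketch_init(struct fence_sketch *fence)
{
    assert(fence != NULL);
    pthread_mutex_init(&fence->mutex, NULL);
    pthread_cond_init(&fence->cond, NULL);
    fence->pending_value = 0;
}

/* Called when a Signal has been submitted to a queue. */
static void fence_update_pending(struct fence_sketch *fence, uint64_t value)
{
    pthread_mutex_lock(&fence->mutex);
    if (value > fence->pending_value)
        fence->pending_value = value;
    pthread_cond_broadcast(&fence->cond);
    pthread_mutex_unlock(&fence->mutex);
}

/* Called by a submission thread before submitting a Wait. Returns as
 * soon as a matching Signal is known to be queued; it does not wait
 * for the GPU to actually reach the value. */
static void fence_block_until_pending(struct fence_sketch *fence, uint64_t value)
{
    pthread_mutex_lock(&fence->mutex);
    while (fence->pending_value < value)
        pthread_cond_wait(&fence->cond, &fence->mutex);
    pthread_mutex_unlock(&fence->mutex);
}
```

When the Signal is queued before the Wait, which is the usual order,
fence_block_until_pending() returns immediately.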
The added benefit is that submits are async now, so main thread CPU
overhead might slightly decrease.
To play nice with DXGI swapchain, the external entry point for acquiring
the Vulkan queue needs to drain the submission thread and lock it to ensure
submissions happen in order.
Fixes hangs in The Division 1, which makes use of this D3D12 feature.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The current code uses D3D12 abstractions to create pipelines but
issues raw Vulkan API calls to actually implement the functionality,
which means the code makes assumptions about the exact descriptor
set layout and push constant layout, which is generally a bad idea
now that we have multiple code paths for root constants etc.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Prepares for a rewrite of queue submission, the legacy path is never
run in practice and will likely break in subtle ways.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We're going to need this to implement other parts of the
API, so it should be in common code.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
And expose the following feature cap on capable GPUs:
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
And add a function to (re-)apply dynamic state as necessary. This
will allow us to ignore dynamic state not needed by the pipeline,
and may become necessary if we implement shader-based copies etc.
Currently unused; the following commits will subsequently change
state setting methods over.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This way we don't have to change all function parameter types
every time we upgrade the interface version.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We need a more extensible struct to contain the pipeline
descriptions in order to be able to support new rendering
features.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
We'll add this to the root descriptor set since moving the binding
to one of the bindless sets would be hard to do; we'd need to track
the binding index of each "bindless" binding for set updates etc.
In order to stay within the limit of 8 sets, we also cannot introduce
a separate set for UAV counters (currently there are 6 bindless sets,
the static sampler set, and the root descriptor set).
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Greatly improves performance in various games that update or
copy a large number of descriptors per frame due to the high
overhead of pthread_mutex_{un}lock.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Meant to bundle all d3d12 feature caps and options, of which
we're going to have to add more over time.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Logically split up descriptor pool allocation in three types:
- STATIC: Root descriptors and internal allocation.
- VOLATILE: For packed descriptor set which comes from heaps.
- IMMUTABLE_SAMPLER: For immutable samplers. This should be removed once
we start allocating sampler sets at sampler creation time.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For now this is enabled based on device capabilities, but future changes
may require this to be disabled for certain root signatures.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When changing tables that only have bindless descriptors,
only update the push constants instead.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Don't enable any bindless features for now so that we don't
introduce regressions as features get added.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Uses the new data structures to iterate over descriptor
tables and populate the packed descriptor set.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Further changes will require a rework of how resource binding
works inside a command list, so for now, this is just a cleanup
that also removes some old code that is no longer needed.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Static samplers are embedded in the root signature, so we can create
a separate descriptor set layout and descriptor set which we only
need to rebind when the root signature itself changes.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Updates the root descriptor set or push descriptor at draw time.
This fixes a potential issue with shader-based clear/copy commands
invalidating previously bound root descriptors.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Removes some unused counters and repurposes the existing ones to
differentiate between bindings (i.e. the array passed to the shader
compiler) and packed descriptors.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Fixes an issue where push constants can be invalidated by
shader-based clear/copy commands.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Uses one push constant range with VK_SHADER_STAGE_ALL. This
will allow us to easily add descriptor table offsets as push
constants.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
An upcoming change to the binding model will use these to
initialize descriptors that have the wrong resource type
bound, or were left uninitialized by the application.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This needs a major rework as the current implementation has bugs,
is hard to reason about, and very hard to maintain as we're about
to make major changes to the binding model as a whole.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Resource index is found in idx[0] in SM 5.0, but idx[1] when using SM
5.1, and register space is encoded separately. An rb_tree keeps track of
the internal resource index idx[0] and can map that to space/binding as
required when emitting SPIR-V.
For this to work, we must also make UAV counters register space aware.
In the earlier implementation, the UAV counter mask was assumed to correlate 1:1
with register_index, which breaks on SM 5.1.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Greatly reduces the number of VA allocations we have to make, makes the
returned VAs more sensible, and better matches the VAs we see on native
drivers.
D3D12 usage flags for buffers seem generic enough that there is no
obvious benefit to place smaller VkBuffers on top of VkDeviceMemory.
Ideally, physical_buffer_address is used here, but this works as a good
fallback if that path is added later.
With this patch and previous VA optimization, I'm observing a 2.0-2.5%
FPS uplift on SOTTR when CPU bound.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
ID3D12GraphicsCommandList2 and WriteBufferImmediate() are used by
Hitman 2, but implementing the function on top of an AMD extension has
no effect on game behaviour. It's commonly used to write debug info.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Addresses the following limitations of the previous implementation:
- Only R32_{UINT,TYPELESS} were supported for buffers.
- Clearing an image UAV did not behave correctly for images with non-UINT formats.
- Due to the use of transfer operations, extra memory barriers were needed.
If necessary, this will create a temporary view with a bit-compatible
UINT format for the resource in order to perform a bit-exact clear.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This also fixes a format specifier warning in an ERR for the 32-bit Linux
build.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Currently, vkd3d_view_destroy_descriptor assumes image views
by default, but we need to be able to attach buffer views to
command allocators for UAV clears.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The additional data is needed to implement UAV clears.
Moving this out of d3d12_desc also helps make copying and
traversing descriptor arrays more CPU cache-friendly.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Shadow of the Tomb Raider does not re-bind all descriptor tables after
setting a new root signature if tessellation is enabled, which causes
some descriptors to be left undefined.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The GPU VA allocator was allocating memory in a way where dereferencing
GPU VA required a lock + bsearch() to find the right VA range.
Rather than going this route, we turn the common case into O(1) and
lock-free by creating a slab allocator which allows us to lookup a
pointer directly from a GPU VA with (VA - Base) / PageSize.
The number of allocations in the fast path must be limited since we
cannot trivially grow the allocator while remaining lock-free for
dereferences.
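The fast-path lookup can be sketched like this. The base address, page
size and slot count are made-up values, and va_slab_sketch is a
hypothetical type, not the allocator's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up parameters for this sketch. */
#define SLAB_BASE       UINT64_C(0x100000000)
#define SLAB_PAGE_SIZE  UINT64_C(0x10000)
#define SLAB_PAGE_COUNT 1024

struct va_slab_sketch
{
    /* One slot per page; fixed size, so lookups need no lock. */
    void *entries[SLAB_PAGE_COUNT];
};

/* O(1) dereference: (VA - Base) / PageSize indexes the slot directly.
 * Returns NULL for addresses outside the slab, where a real
 * implementation would take a slower fallback path. */
static void *va_slab_dereference(const struct va_slab_sketch *slab, uint64_t va)
{
    uint64_t index;

    assert(slab != NULL);

    if (va < SLAB_BASE)
        return NULL;

    index = (va - SLAB_BASE) / SLAB_PAGE_SIZE;
    if (index >= SLAB_PAGE_COUNT)
        return NULL;

    return slab->entries[index];
}
```

The fixed slot count is what limits the number of fast-path
allocations: growing the table would require synchronizing with
readers.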
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Shadow of the Tomb Raider overwrites descriptors while they are being
copied in another thread. This patch makes reads and writes atomic for
CBV, SRV, UAV, and sampler descriptors, but not RTV and DSV, for which
copying is not implemented.
Benchmark total frames vs mutex count (the single mutex was locked
only once for copying):
1 mutex: 6480 6489 6503
8 mutexes: 6691 6693 6661
16 mutexes: 6665 6682 6703
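One way to spread descriptors over several mutexes is to hash the
descriptor address, sketched below. desc_sketch, desc_mutex_index() and
desc_copy_locked() are hypothetical, and the actual patch may pick and
hold its mutexes differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_MUTEX_COUNT 8 /* Must be a power of two for the mask below. */

/* Hypothetical descriptor with an opaque payload. */
struct desc_sketch
{
    uint64_t payload[4];
};

static pthread_mutex_t desc_mutexes[DESC_MUTEX_COUNT] =
{
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Neighbouring descriptors map to different mutexes. */
static size_t desc_mutex_index(const struct desc_sketch *desc)
{
    return ((uintptr_t)desc / sizeof(*desc)) & (DESC_MUTEX_COUNT - 1);
}

/* Copies src to dst so that each side is read or written atomically.
 * The two locks are never held at the same time, which avoids lock
 * ordering problems when both descriptors hash to the same mutex. */
static void desc_copy_locked(struct desc_sketch *dst, const struct desc_sketch *src)
{
    struct desc_sketch tmp;
    pthread_mutex_t *mutex;

    assert(dst != NULL && src != NULL);

    mutex = &desc_mutexes[desc_mutex_index(src)];
    pthread_mutex_lock(mutex);
    tmp = *src;
    pthread_mutex_unlock(mutex);

    mutex = &desc_mutexes[desc_mutex_index(dst)];
    pthread_mutex_lock(mutex);
    *dst = tmp;
    pthread_mutex_unlock(mutex);
}
```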
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Enables ReadFromSubresource() to succeed in cases where it would have
failed otherwise.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Order of structures doesn't matter so we can simply prepend instead of
appending.
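For illustration, prepending to a singly linked structure chain needs
no walk to the tail. chain_node_sketch is a hypothetical mirror of the
type/next header that extensible structures share, not a real Vulkan
type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header shared by all structures in the chain. */
struct chain_node_sketch
{
    int type;
    struct chain_node_sketch *next;
};

/* Inserts node directly after head in O(1). Whatever was chained to
 * head before stays reachable through node->next. */
static void chain_prepend(struct chain_node_sketch *head, struct chain_node_sketch *node)
{
    assert(head != NULL && node != NULL);
    node->next = head->next;
    head->next = node;
}
```

Appending would instead have to follow next pointers until the last
node on every insertion.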
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
VkDeviceMemory must be externally synchronized.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The condition in d3d12_resource_is_cpu_accessible() is going to be
changed in the following commits.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Predicate arguments which are only non-zero in bit 32 or higher are not
supported. Predicates will not be applied to clear and copy commands because
Vulkan does not support predication of these command classes.
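The first limitation can be illustrated as follows, assuming it comes
from the predicate being evaluated on only the low 32 bits of the
64-bit D3D12 value. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* D3D12 view: the full 64-bit value is compared against zero. */
static bool predicate_nonzero_64(uint64_t value)
{
    return value != 0;
}

/* Assumed view of the underlying implementation: only the low
 * 32 bits are examined. */
static bool predicate_nonzero_32(uint64_t value)
{
    return (uint32_t)value != 0;
}

/* A predicate value is handled correctly only if both views agree. */
static bool predicate_is_supported(uint64_t value)
{
    return predicate_nonzero_64(value) == predicate_nonzero_32(value);
}

/* Small self-check showing which values are affected. */
static void predicate_sketch_selfcheck(void)
{
    assert(predicate_is_supported(0));
    assert(predicate_is_supported(1));
    assert(!predicate_is_supported(UINT64_C(1) << 32));
}
```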
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We maintain separate arrays for enqueued fences and fences owned by the
fence worker thread.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
It isn't immediately obvious what "1u << graphics->rt_count" means.
Use dsv_attachment_mask() helper instead.
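A sketch of what such a helper might look like, assuming it simply
wraps the quoted expression. graphics_sketch is a hypothetical stand-in
for the real graphics pipeline state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; only the field used here is shown. */
struct graphics_sketch
{
    unsigned int rt_count; /* Number of render target attachments. */
};

/* Bit mask of the depth-stencil attachment, assuming it is stored
 * right after the rt_count render target attachments. */
static uint32_t dsv_attachment_mask(const struct graphics_sketch *graphics)
{
    assert(graphics->rt_count < 32);
    return 1u << graphics->rt_count;
}
```

With three render targets the mask is 1u << 3, i.e. bit 3, which is
the slot directly after the colour attachments.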
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Use the last attachment for depth-stencil instead of the first.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>