Passing the main struct to the public functions allows us
to share common data between multiple types of operations.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Works around an app bug in SotTR, where the command pool is reset before
the command buffer completes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 supports out-of-order signal and wait. So do Vulkan timeline
semaphores. However, in Vulkan we don't have an infinite amount of
virtual queues. We must potentially map multiple D3D12 queues on top of
Vulkan, which might lead to a deadlock when an app attempts to
wait-before-signal if the two queues are mapped to the same physical
Vulkan queue.
In order to solve this, we need to hold back submissions until we know
it is safe to do so. To make this work in practice as simply as possible, each
ID3D12CommandQueue has its own submission thread, which will block on an
ID3D12Fence's pending timeline value for a Wait command. The main reason to use a
submission thread is that resolving this directly in
ID3D12CommandQueue::Signal is extremely tricky and potentially
needs recursively locking queues and fences.
Note that we only block on the pending wait value, not the actual wait
value, so there is no real CPU <-> GPU synchronization here. In the
common case, no submission thread will block.
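The hold-back decision a submission thread makes can be sketched without threads. This is a minimal sketch, not the actual implementation: sketch_fence, sketch_op and sketch_drain_queue() are hypothetical names, and where the real thread would block until a Signal is submitted, this version simply returns how far it got so the caller can retry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified fence: tracks only the highest value that some
 * queue has *submitted* a Signal for, not the value the GPU has reached. */
struct sketch_fence
{
    uint64_t pending_value;
};

enum sketch_op_type
{
    SKETCH_OP_EXECUTE,
    SKETCH_OP_SIGNAL,
    SKETCH_OP_WAIT,
};

struct sketch_op
{
    enum sketch_op_type type;
    struct sketch_fence *fence; /* NULL for SKETCH_OP_EXECUTE */
    uint64_t value;
};

/* Forwards ops to the (possibly shared) Vulkan queue in order and stops at
 * the first Wait whose value no submission has signaled yet. Returns the
 * number of ops forwarded; the caller retries the rest later. */
static size_t sketch_drain_queue(const struct sketch_op *ops, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i)
    {
        const struct sketch_op *op = &ops[i];

        /* Wait-before-signal: hold the submission back instead of letting
         * the shared Vulkan queue deadlock. */
        if (op->type == SKETCH_OP_WAIT && op->fence->pending_value < op->value)
            break;

        if (op->type == SKETCH_OP_SIGNAL && op->fence->pending_value < op->value)
            op->fence->pending_value = op->value;

        /* A real implementation would call vkQueueSubmit() here. */
    }

    return i;
}
```

Because only the pending value is checked, a held-back Wait is released as soon as the matching Signal has been submitted; the GPU-side wait still happens in Vulkan through the timeline semaphore.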
The added benefit is that submits are async now, so main thread CPU
overhead might slightly decrease.
To play nice with DXGI swapchain, the external entry point for acquiring
the Vulkan queue needs to drain the submission thread and lock it to ensure
submissions happen in order.
Fixes hangs in The Division 1, which makes use of this D3D12 feature.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The current code uses D3D12 abstractions to create pipelines but
issues raw Vulkan API calls to actually implement the functionality,
which means the code makes assumptions about the exact descriptor
set layout and push constant layout, which is generally a bad idea
now that we have multiple code paths for root constants etc.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Prepares for a rewrite of queue submission. The legacy path is never
run in practice and will likely break in subtle ways.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
And add a function to (re-)apply dynamic state as necessary. This
will allow us to ignore dynamic state not needed by the pipeline,
and may become necessary if we implement shader-based copies etc.
Currently unused; subsequent commits will move the state-setting
methods over.
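One way such a function can skip dynamic state the pipeline does not need is a dirty mask intersected with the pipeline's dynamic-state mask. The sketch below assumes that design; the bit names and functions are hypothetical, and the actual vkCmdSet*() calls are only indicated in a comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dirty bits, one per piece of dynamic state. */
enum
{
    SKETCH_DYN_VIEWPORT       = 1u << 0,
    SKETCH_DYN_SCISSOR        = 1u << 1,
    SKETCH_DYN_STENCIL_REF    = 1u << 2,
    SKETCH_DYN_BLEND_CONSTANT = 1u << 3,
    SKETCH_DYN_ALL            = (1u << 4) - 1,
};

struct sketch_dynamic_state
{
    uint32_t dirty_mask;
};

/* Called when a state-setting method changes a value, or when something
 * (e.g. a shader-based copy) clobbers previously applied state. */
static void sketch_invalidate(struct sketch_dynamic_state *state, uint32_t mask)
{
    state->dirty_mask |= mask;
}

/* Applies only the dirty state that the bound pipeline declares as dynamic;
 * anything else stays dirty until a pipeline that needs it is bound.
 * Returns the mask of state that was (re-)applied. */
static uint32_t sketch_apply_dynamic_state(struct sketch_dynamic_state *state,
        uint32_t pipeline_dynamic_mask)
{
    uint32_t to_apply = state->dirty_mask & pipeline_dynamic_mask;

    /* Real code would emit vkCmdSetViewport(), vkCmdSetScissor(), ...
     * for each bit set in to_apply. */
    state->dirty_mask &= ~to_apply;
    return to_apply;
}
```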
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
All attachments must be at least as large as the framebuffer, so using a
max operator to derive the framebuffer size is not compliant with Vulkan.
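Since every attachment must cover the framebuffer, the largest compliant framebuffer is the per-dimension minimum over the attachments. A minimal sketch, assuming the fix takes that minimum; sketch_extent is a hypothetical stand-in for the width/height/layer fields Vulkan uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an attachment's width, height and layer count. */
struct sketch_extent
{
    uint32_t width, height, layers;
};

/* Per-dimension minimum over all attachments, so that every attachment is
 * at least as large as the resulting framebuffer. */
static struct sketch_extent sketch_framebuffer_extent(const struct sketch_extent *attachments,
        size_t count)
{
    struct sketch_extent extent = {UINT32_MAX, UINT32_MAX, UINT32_MAX};
    size_t i;

    assert(count > 0);
    for (i = 0; i < count; ++i)
    {
        if (attachments[i].width < extent.width)
            extent.width = attachments[i].width;
        if (attachments[i].height < extent.height)
            extent.height = attachments[i].height;
        if (attachments[i].layers < extent.layers)
            extent.layers = attachments[i].layers;
    }
    return extent;
}
```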
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Logically split up descriptor pool allocation into three types:
- STATIC: Root descriptors and internal allocation.
- VOLATILE: For packed descriptor sets that come from heaps.
- IMMUTABLE_SAMPLER: For immutable samplers. This should be removed once
we start allocating sampler sets at sampler creation time.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For now this is enabled based on device capabilities, but future changes
may require this to be disabled for certain root signatures.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
When changing tables that only have bindless descriptors,
only update the push constants instead.
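The fast path can be pictured as a branch on a per-table flag: a bindless-only table is reached through an offset stored in push constants, so changing it never dirties the descriptor set. This is a sketch under that assumption; the structs, fields and sketch_set_descriptor_table() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PUSH_CONSTANTS 32

/* Hypothetical per-table info derived from the root signature. */
struct sketch_root_table
{
    bool bindless_only;           /* table contains only bindless descriptors */
    uint32_t push_constant_index; /* slot holding the table's offset */
};

struct sketch_command_state
{
    uint32_t push_constants[SKETCH_MAX_PUSH_CONSTANTS];
    bool push_constants_dirty;
    bool descriptor_set_dirty;
};

/* Records a descriptor table change. Bindless-only tables only need their
 * offset updated in push constants; other tables still require the
 * descriptor set to be rewritten. */
static void sketch_set_descriptor_table(struct sketch_command_state *state,
        const struct sketch_root_table *table, uint32_t table_offset)
{
    if (table->bindless_only)
    {
        assert(table->push_constant_index < SKETCH_MAX_PUSH_CONSTANTS);
        state->push_constants[table->push_constant_index] = table_offset;
        state->push_constants_dirty = true;
    }
    else
    {
        state->descriptor_set_dirty = true;
    }
}
```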
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Uses the new data structures to iterate over descriptor
tables and populate the packed descriptor set.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Static samplers are embedded in the root signature, so we can create
a separate descriptor set layout and descriptor set which we only
need to rebind when the root signature itself changes.
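The rebind rule amounts to comparing the newly set root signature against the bound one. In this sketch a counter stands in for vkCmdBindDescriptorSets() calls, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical root signature owning a pre-built static sampler set. */
struct sketch_root_signature
{
    const void *static_sampler_set; /* stands in for a VkDescriptorSet */
};

struct sketch_bind_state
{
    const struct sketch_root_signature *bound_root_signature;
    unsigned int sampler_set_binds; /* counts descriptor set bind calls */
};

static void sketch_set_root_signature(struct sketch_bind_state *state,
        const struct sketch_root_signature *root_signature)
{
    /* Static samplers are embedded in the root signature, so their set
     * only has to be rebound when the root signature itself changes. */
    if (state->bound_root_signature == root_signature)
        return;

    state->bound_root_signature = root_signature;
    if (root_signature->static_sampler_set)
        ++state->sampler_set_binds;
}
```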
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Updates the root descriptor set or push descriptor at draw time.
This fixes a potential issue with shader-based clear/copy commands
invalidating previously bound root descriptors.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Allows us to more easily refactor root signature-related code
without having to worry about root descriptors for now.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Fixes an issue where push constants can be invalidated by
shader-based clear/copy commands.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Uses one push constant range with VK_SHADER_STAGE_ALL. This
will allow us to easily add descriptor table offsets as push
constants.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The primary purpose of this function was to invalidate UAV
counters upon binding a pipeline. This is no longer an issue
and we don't have any other per-pipeline bindings, so this
function can be dropped.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This needs a major rework as the current implementation has bugs,
is hard to reason about, and very hard to maintain as we're about
to make major changes to the binding model as a whole.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The resource index is found in idx[0] in SM 5.0, but in idx[1] when using SM
5.1, and the register space is encoded separately. An rb_tree keeps track of
the internal resource index idx[0] and can map that to space/binding as
required when emitting SPIR-V.
For this to work, we must also make UAV counters register space aware.
In the earlier implementation, the UAV counter mask was assumed to correlate 1:1
with register_index, which breaks on SM 5.1.
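The index handling can be sketched as follows. To keep it short, a linear array lookup replaces the rb_tree the commit describes; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one resource declaration. */
struct sketch_resource_binding
{
    uint32_t id;             /* internal resource index, idx[0] */
    uint32_t register_space; /* encoded separately from the index array */
    uint32_t register_index; /* idx[0] in SM 5.0, idx[1] in SM 5.1 */
};

/* Extracts the register index from a declaration's index array. */
static uint32_t sketch_register_index(const uint32_t idx[2], bool is_sm_5_1)
{
    return is_sm_5_1 ? idx[1] : idx[0];
}

/* Maps an internal resource index to its space/binding. Linear search
 * stands in for the rb_tree lookup. */
static const struct sketch_resource_binding *sketch_find_binding(
        const struct sketch_resource_binding *bindings, size_t count, uint32_t id)
{
    size_t i;

    for (i = 0; i < count; ++i)
    {
        if (bindings[i].id == id)
            return &bindings[i];
    }
    return NULL;
}
```

In SM 5.1 two resources can share a register index in different spaces (as in the test data below), which is why a UAV counter mask indexed only by register_index cannot tell them apart.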
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Greatly reduces the number of VA allocations we have to make, makes the
returned VAs more sensible, and better matches the VAs we see returned by
native drivers.
D3D12 usage flags for buffers seem generic enough that there is no
obvious benefit to placing smaller VkBuffers on top of VkDeviceMemory.
Ideally, physical_buffer_address would be used here, but this works as a
good fallback if that path is added later.
With this patch and the previous VA optimization, I'm observing a 2.0-2.5%
FPS uplift on SOTTR when CPU bound.
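Assuming placed buffers reuse a single VkBuffer that spans the whole heap, a placed buffer's VA is just the heap's base VA plus the placement offset, with no new VA allocation. The sketch below illustrates that arithmetic; sketch_heap and sketch_placed_buffer_va() are hypothetical, and 0 is used as a failure value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical heap: one VkBuffer created over the entire VkDeviceMemory,
 * owning a single GPU VA range. */
struct sketch_heap
{
    uint64_t base_va;
    uint64_t size;
};

/* Returns the VA of a buffer placed at heap_offset, or 0 if the buffer
 * would not fit inside the heap. */
static uint64_t sketch_placed_buffer_va(const struct sketch_heap *heap,
        uint64_t heap_offset, uint64_t buffer_size)
{
    /* Written to avoid overflow in heap_offset + buffer_size. */
    if (heap_offset > heap->size || buffer_size > heap->size - heap_offset)
        return 0;
    return heap->base_va + heap_offset;
}
```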
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
ID3D12GraphicsCommandList2 and WriteBufferImmediate() are used by
Hitman 2, but implementing the function on top of an AMD extension has
no effect on game behaviour. It's commonly used to write debug info.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This method was missing in version 10.0.15063.0 of the SDK, but is
present in version 10.0.18362.0, without a UUID change. Presumably that
means this was simply an omission in the older header, rather than an
API change in the newer header.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
There is no bit-compatible UINT format, so we'll use DXGI_FORMAT_R32_UINT.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Addresses the following limitations of the previous implementation:
- Only R32_{UINT,TYPELESS} were supported for buffers.
- Clearing an image UAV did not behave correctly for images with non-UINT formats.
- Due to the use of transfer operations, extra memory barriers were needed.
If necessary, this will create a temporary view with a bit-compatible
UINT format for the resource in order to perform a bit-exact clear.
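The format choice for the temporary view can be sketched as a lookup from a resource format to a UINT format with the same bit layout, so the clear writes raw bits unchanged. The enum values are hypothetical and do not match DXGI_FORMAT. The R11G11B10_FLOAT entry is an illustrative assumption: it is a packed 32-bit format with no UINT counterpart, handled here by the R32_UINT fallback.

```c
#include <assert.h>

/* Hypothetical format enum; values do not match DXGI_FORMAT. */
enum sketch_format
{
    SKETCH_FMT_UNKNOWN,
    SKETCH_FMT_R32_FLOAT,
    SKETCH_FMT_R32_UINT,
    SKETCH_FMT_R8G8B8A8_UNORM,
    SKETCH_FMT_R8G8B8A8_UINT,
    SKETCH_FMT_R16G16B16A16_FLOAT,
    SKETCH_FMT_R16G16B16A16_UINT,
    SKETCH_FMT_R11G11B10_FLOAT,
};

/* Returns a UINT format with an identical bit layout, so clearing through
 * a view in that format is bit-exact. Returns SKETCH_FMT_UNKNOWN if no
 * mapping is known. */
static enum sketch_format sketch_uint_view_format(enum sketch_format format)
{
    switch (format)
    {
        case SKETCH_FMT_R32_FLOAT:
        case SKETCH_FMT_R32_UINT:
            return SKETCH_FMT_R32_UINT;
        case SKETCH_FMT_R8G8B8A8_UNORM:
        case SKETCH_FMT_R8G8B8A8_UINT:
            return SKETCH_FMT_R8G8B8A8_UINT;
        case SKETCH_FMT_R16G16B16A16_FLOAT:
        case SKETCH_FMT_R16G16B16A16_UINT:
            return SKETCH_FMT_R16G16B16A16_UINT;
        case SKETCH_FMT_R11G11B10_FLOAT:
            /* No UINT equivalent: reinterpret as one 32-bit channel. */
            return SKETCH_FMT_R32_UINT;
        default:
            return SKETCH_FMT_UNKNOWN;
    }
}
```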
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Needed to support compute-based clear and copy operations.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The additional data is needed to implement UAV clears.
Moving this out of d3d12_desc also helps make copying and
traversing descriptor arrays more CPU cache-friendly.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Shadow of the Tomb Raider does not re-bind all descriptor tables after
setting a new root signature if tessellation is enabled, which causes
some descriptors to be left undefined.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
If separate transitions of the depth and stencil plane occur in the
same array of barriers, they will be consolidated into one Vulkan
layout transition. This can only be supported for combinations of
depth read and depth write states, or identical states.
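The consolidation rule reduces to a predicate on the two plane states. Vulkan offers mixed depth/stencil layouts such as VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL, which is what makes read/write combinations representable in a single layout. The enum below is a hypothetical subset and does not use the D3D12 values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of resource states; values do not match D3D12. */
enum sketch_state
{
    SKETCH_STATE_COMMON,
    SKETCH_STATE_DEPTH_WRITE,
    SKETCH_STATE_DEPTH_READ,
    SKETCH_STATE_PIXEL_SHADER_RESOURCE,
};

static bool sketch_is_depth_state(enum sketch_state state)
{
    return state == SKETCH_STATE_DEPTH_WRITE || state == SKETCH_STATE_DEPTH_READ;
}

/* Separate depth-plane and stencil-plane transitions from the same barrier
 * array can be folded into one Vulkan layout transition only if both
 * planes end up in the same state, or each is depth read or depth write. */
static bool sketch_can_merge_plane_states(enum sketch_state depth, enum sketch_state stencil)
{
    return depth == stencil
            || (sketch_is_depth_state(depth) && sketch_is_depth_state(stencil));
}
```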
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Setting this flag prevents command pools from pooling allocations
efficiently. The flag should not be set, so that only the VkCommandPool
as a whole may be reset. This matches the D3D12 API.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
D3D12 command allocators are intended to recycle memory across resets,
so we should do the same thing in vkd3d.
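Recycling across resets can be sketched with a free list: Reset() moves in-use command buffers to the free list instead of freeing them, and later requests reuse them before allocating new ones. All names are hypothetical, and integers stand in for VkCommandBuffer handles.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_BUFFERS 16

/* Hypothetical allocator; ints stand in for VkCommandBuffer handles. */
struct sketch_allocator
{
    int in_use[SKETCH_MAX_BUFFERS];
    size_t in_use_count;
    int free_list[SKETCH_MAX_BUFFERS];
    size_t free_count;
    int next_handle; /* number of fresh allocations made so far */
};

static int sketch_get_command_buffer(struct sketch_allocator *a)
{
    int buffer;

    assert(a->in_use_count < SKETCH_MAX_BUFFERS);
    /* Prefer a buffer recycled by a previous reset. */
    if (a->free_count)
        buffer = a->free_list[--a->free_count];
    else
        buffer = a->next_handle++; /* vkAllocateCommandBuffers() in real code */
    a->in_use[a->in_use_count++] = buffer;
    return buffer;
}

/* Keeps all in-use buffers for reuse instead of freeing them. Real code
 * would also reset the VkCommandPool here. */
static void sketch_reset(struct sketch_allocator *a)
{
    while (a->in_use_count)
        a->free_list[a->free_count++] = a->in_use[--a->in_use_count];
}
```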
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The condition in d3d12_resource_is_cpu_accessible() is going to be
changed in the following commits.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The Vulkan spec says:
"Conditional rendering must also either begin and end inside the same
subpass of a render pass instance, or must both begin and end outside
of a render pass instance (i.e. contain entire render pass instances)."
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Predicate arguments which are only non-zero in bit 32 or higher are not
supported. Predicates will not be applied to clear and copy commands because
Vulkan does not support predication of these command classes.
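The limitation exists because D3D12 predication tests a 64-bit value, while VK_EXT_conditional_rendering reads only 32 bits. The hypothetical helper below classifies which values are affected. It only illustrates the rule: the predicate lives in GPU memory, so an implementation cannot check it on the CPU like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A value that is non-zero only in bits 32-63 reads as zero through a
 * 32-bit conditional rendering check, which flips the outcome. All other
 * values agree between the 64-bit and 32-bit interpretations. */
static bool sketch_predicate_value_supported(uint64_t value)
{
    uint32_t low = (uint32_t)value;

    return low != 0 || value == 0;
}
```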
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This might delay updating a d3d12_fence when a fence enqueued later than
other fences is signaled before them. On the other hand, it
significantly reduces CPU usage. I haven't found a program negatively
impacted by this change so far.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We maintain separate arrays for enqueued fences and fences owned by the
fence worker thread.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The debug log level is demoted to WARN after the FIXME is printed once.
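The print-once pattern needs only a per-call-site flag. In this sketch the log levels and both functions are hypothetical stand-ins for the real debug channels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical log levels standing in for the FIXME/WARN channels. */
enum sketch_log_level
{
    SKETCH_LOG_WARN,
    SKETCH_LOG_FIXME,
};

/* Returns FIXME the first time it is called with a given flag, WARN after. */
static enum sketch_log_level sketch_fixme_once_level(bool *already_printed)
{
    if (*already_printed)
        return SKETCH_LOG_WARN;
    *already_printed = true;
    return SKETCH_LOG_FIXME;
}

static void sketch_report_unsupported(const char *what)
{
    static bool printed; /* one flag per call site */

    if (sketch_fixme_once_level(&printed) == SKETCH_LOG_FIXME)
        fprintf(stderr, "fixme: %s is not supported.\n", what);
    /* Otherwise the message would only go to WARN-level tracing. */
}
```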
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
It isn't immediately obvious what "1u << graphics->rt_count" means.
Use the dsv_attachment_mask() helper instead.
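A helper of this kind just names the expression. The body below reproduces the "1u << graphics->rt_count" expression quoted above; the struct is a hypothetical stand-in for the real pipeline state. It assumes render targets occupy attachments 0 to rt_count - 1 and the depth-stencil view takes the next attachment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the graphics pipeline state. */
struct sketch_graphics_state
{
    unsigned int rt_count; /* number of render target attachments */
};

/* Bit for the depth-stencil attachment, which follows the render targets. */
static inline uint32_t dsv_attachment_mask(const struct sketch_graphics_state *graphics)
{
    return 1u << graphics->rt_count;
}
```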
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We use transfer operations instead of unordered access.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Used by Resident Evil 2.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Use the last attachment for depth-stencil instead of the first.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Fixes a regression introduced by 9eba55403d.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>