We're going to need more capabilities outside the 0-63 range
going forward, so a bitmask doesn't cut it and adding extra
struct members for each capability seems excessive.
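As a rough illustration of why a single 64-bit mask stops scaling, the
sketch below stores capabilities in an array of 32-bit words indexed by
capability ID. This is only one possible representation; the names
cap_set, CAP_MAX and the helper functions are hypothetical and are not
taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical upper bound on capability IDs; not taken from the patch. */
#define CAP_MAX 8192u
#define CAP_WORD_BITS 32u

/* One bit per capability ID, spread over as many words as needed. */
struct cap_set
{
    uint32_t words[(CAP_MAX + CAP_WORD_BITS - 1) / CAP_WORD_BITS];
};

static void cap_set_init(struct cap_set *set)
{
    memset(set, 0, sizeof(*set));
}

static void cap_set_enable(struct cap_set *set, unsigned int cap)
{
    assert(cap < CAP_MAX);
    set->words[cap / CAP_WORD_BITS] |= UINT32_C(1) << (cap % CAP_WORD_BITS);
}

static bool cap_set_is_enabled(const struct cap_set *set, unsigned int cap)
{
    assert(cap < CAP_MAX);
    return (set->words[cap / CAP_WORD_BITS] >> (cap % CAP_WORD_BITS)) & 1u;
}
```

Any ID below CAP_MAX can be tracked without adding a struct member per
capability; the trade-off is a fixed-size array instead of one word.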
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Resource index is found in idx[0] in SM 5.0, but idx[1] when using SM
5.1, and register space is encoded separately. An rb_tree keeps track of
the internal resource index idx[0] and can map that to space/binding as
required when emitting SPIR-V.
For this to work, we must also make UAV counters register space aware.
In the earlier implementation, the UAV counter mask was assumed to
correlate 1:1 with register_index, which breaks on SM 5.1.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
On Windows, it is not ideal to rely on Vulkan being available as a
linkable library, as that requires a full install of the Vulkan SDK to
be present and set up. Be friendly and load Vulkan dynamically instead.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Cannot disable VK_EXT_descriptor_indexing as we relied on internal
behavior in RADV related to global_bo_list. Implementing bindless
properly in vkd3d will solve this correctly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Otherwise, we declare certain input control points twice in shaders that
access them in a fork phase, which is not allowed per the Vulkan spec:
"Any two inputs listed as operands on the same OpEntryPoint must not
be assigned the same location, either explicitly or implicitly"
Fixes invalid SPIR-V and resulting RADV driver crashes in Metro Exodus.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Greatly reduces the number of VA allocations we have to make, makes the
returned VAs more sensible, and better matches the VAs we see on native
drivers.
D3D12 usage flags for buffers seem generic enough that there is no
obvious benefit to place smaller VkBuffers on top of VkDeviceMemory.
Ideally, physical_buffer_address is used here, but this works as a good
fallback if that path is added later.
With this patch and previous VA optimization, I'm observing a 2.0-2.5%
FPS uplift on SOTTR when CPU bound.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The alignments are now checked in d3d12_resource_validate_desc().
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This would cause CoreValidation-Shader-InterfaceTypeMismatch validation
errors from Wine's test_shader_interstage_interface() d3d11 test. This
reverts parts of commits 1eb7eca411 and 04ec461fb4.
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
ID3D12GraphicsCommandList2 and WriteBufferImmediate() are used by
Hitman 2, but implementing the function on top of an AMD extension has
no effect on game behaviour. It's commonly used to write debug info.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This method was missing in version 10.0.15063.0 of the SDK, but is
present in version 10.0.18362.0, without a UUID change. Presumably that
means this was simply an omission in the older header, rather than an
API change in the newer header.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The right place for alignment validation is d3d12_resource_validate_desc().
The mod alignment test, which returns a size of ~0 on failure, is incorrect
on systems where Vulkan requires alignments of 0x20000 or more, and breaks
Hitman 2, which uses the returned value unchecked and allocates heaps of
0xffffffff bytes.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Hitman 2 calls GetHeapProperties() for each swapchain buffer and checks
whether the creation node mask is 1. If it is not, the game fails to
store the resource pointers for later rendering.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
There is no bit-compatible UINT format, so we'll use DXGI_FORMAT_R32_UINT.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Addresses the following limitations of the previous implementation:
- Only R32_{UINT,TYPELESS} were supported for buffers.
- Clearing an image UAV did not behave correctly for images with non-UINT formats.
- Due to the use of transfer operations, extra memory barriers were needed.
If necessary, this will create a temporary view with a bit-compatible
UINT format for the resource in order to perform a bit-exact clear.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Needed to support ClearUnorderedAccessViewUint() for all formats.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This also fixes a format specifier warning in an ERR for the 32-bit Linux
build.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Needed to support compute-based clear and copy operations.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Currently, vkd3d_view_destroy_descriptor assumes image views
by default, but we need to be able to attach buffer views to
command allocators for UAV clears.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The additional data is needed to implement UAV clears.
Moving this out of d3d12_desc also helps make copying and
traversing descriptor arrays more CPU cache-friendly.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This fixes Shadow of the Tomb Raider crashing because of NULL root
signatures being passed since c002aee119.
Signed-off-by: Rémi Bernon <rbernon@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This case needs special care since both VKD3DSPR_INPUT in the
control point phase and VKD3DSPR_INCONTROLPOINT in fork/join
phases refer to the same set of input variables, and we should
not declare input variables with the same location twice.
Encountered in Shadow of the Tomb Raider.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Shadow of the Tomb Raider does not re-bind all descriptor tables after
setting a new root signature if tessellation is enabled, which causes
some descriptors to be left undefined.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Uses the private patch constant array for tessellation factor built-ins.
Fixes two separate issues encountered in Shadow of the Tomb Raider:
- The output registers that have one component mapped to any of
the TESS_FACTOR sysvals can have their other components mapped
to a regular patch constant output, in which case we need to
use a private io variable.
- The tessellation factor outputs are not necessarily dynamically
indexed within shader code. Previously, this did not work correctly
and led to invalid store operations in the generated SPIR-V.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Line tessellation factors use two different DXBC semantics that
both map to the same SPIR-V built-in. In this case, we cannot
rely on the semantic index.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Private variables are always vec4, so using a sparse write mask here
will lead to invalid code being generated when accessing the variable.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Fork and join phases in hull shaders allow dynamic indexing for
all output registers, not just the tessellation factor built-ins.
Moreover, the patch constant output register space is shared with
join phases, which can read back the outputs computed in the fork
phases, also allowing dynamic indexing.
In order to support this in a not overly complex way, use a private
array representing the entire patch constant space, and use epilogue
functions to assign them to the actual output variables.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Needed to support dynamically indexed output arrays.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Otherwise, if a private variable is used for the given output,
vkd3d_dxbc_compiler_emit_store_shader_output will write to the
private variable again instead of the actual output, and some
outputs may never be emitted. This is common in hull shaders.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The GPU VA allocator was allocating memory in a way where dereferencing
GPU VA required a lock + bsearch() to find the right VA range.
Rather than going this route, we turn the common case into O(1) and
lock-free by creating a slab allocator which allows us to look up a
pointer directly from a GPU VA with (VA - Base) / PageSize.
The number of allocations in the fast path must be limited since we
cannot trivially grow the allocator while remaining lock-free for
dereferences.
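The lookup described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the code from this
patch: SLAB_BASE, SLAB_PAGE_SIZE, SLAB_COUNT and the va_slab_* names
are hypothetical, and a real lock-free implementation would also need
atomic loads and stores with suitable memory ordering on the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All constants and names below are illustrative assumptions. */
#define SLAB_BASE      UINT64_C(0x100000000)
#define SLAB_PAGE_SIZE UINT64_C(0x10000)
#define SLAB_COUNT     64u

/* Fixed-size table: one pointer slot per fixed-size VA page. The table
 * never grows, which is why the number of fast-path allocations is
 * limited. */
struct va_slab_table
{
    void *ptrs[SLAB_COUNT];
};

/* Returns the GPU VA that corresponds to a given slot. */
static uint64_t va_slab_address(unsigned int slot)
{
    assert(slot < SLAB_COUNT);
    return SLAB_BASE + slot * SLAB_PAGE_SIZE;
}

/* O(1) dereference: no lock and no search, only index arithmetic.
 * Returns NULL for addresses outside the slab range, where a caller
 * would fall back to a slower path. */
static void *va_slab_dereference(const struct va_slab_table *table, uint64_t va)
{
    uint64_t index;

    if (va < SLAB_BASE)
        return NULL;
    index = (va - SLAB_BASE) / SLAB_PAGE_SIZE;
    if (index >= SLAB_COUNT)
        return NULL;
    return table->ptrs[index];
}
```

Any address inside a page resolves to the same slot, so offsets into a
buffer do not need a range search.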
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
If separate transitions of the depth and stencil plane occur in the
same array of barriers, they will be consolidated into one Vulkan
layout transition. This can only be supported for combinations of
depth read and depth write states, or identical states.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Fixes an assertion when compiling shaders with more than four
clip or cull distances. Output arrays are arrays of scalars,
so shifting the write mask is not very meaningful.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
There appears to be a complete implementation of RS 1.1 already,
so enable this feature.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Even if the shader doesn't explicitly declare it.
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
When this flag is set, command pools cannot efficiently pool allocations.
The flags should be set to 0 so that only the VkCommandPool may be reset.
This matches the D3D12 API.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
D3D12 command allocators are intended to recycle memory across resets,
so we should do the same thing in vkd3d.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
It is possible to map a resource without disclosing the VA to the
caller. This is used for WriteToSubresource.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Shadow of the Tomb Raider overwrites descriptors while they are being
copied in another thread. This patch makes reads and writes atomic for
CBV, SRV, UAV, and sampler descriptors, but not RTV and DSV, for which
copying is not implemented.
Benchmark total frames vs mutex count (the single mutex was locked
only once for copying):
1 mutex: 6480 6489 6503
8 mutexes: 6691 6693 6661
16 mutexes: 6665 6682 6703
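For readers unfamiliar with using several locks for one descriptor
heap, the sketch below shows the general idea of lock striping: each
descriptor address is hashed to one of a small, fixed number of locks.
Everything here is an assumption for illustration. The struct layout,
the shift amount and the desc_* names are hypothetical, and a C11
atomic_flag spinlock stands in for the mutexes used by the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESC_LOCK_COUNT 8u
static_assert((DESC_LOCK_COUNT & (DESC_LOCK_COUNT - 1)) == 0,
        "lock count must be a power of two");

/* Hypothetical descriptor payload. */
struct desc
{
    uint32_t magic;
    uint64_t payload;
};

struct desc_locks
{
    atomic_flag flags[DESC_LOCK_COUNT];
};

static void desc_locks_init(struct desc_locks *locks)
{
    unsigned int i;

    for (i = 0; i < DESC_LOCK_COUNT; ++i)
        atomic_flag_clear(&locks->flags[i]);
}

/* Map a descriptor address to a lock. The shift of 5 is an arbitrary
 * choice that discards low address bits. */
static unsigned int desc_lock_index(const void *desc)
{
    return (unsigned int)(((uintptr_t)desc >> 5) & (DESC_LOCK_COUNT - 1));
}

static void desc_lock(struct desc_locks *locks, const void *desc)
{
    atomic_flag *flag = &locks->flags[desc_lock_index(desc)];

    while (atomic_flag_test_and_set_explicit(flag, memory_order_acquire))
        ; /* spin until the lock is released */
}

static void desc_unlock(struct desc_locks *locks, const void *desc)
{
    atomic_flag_clear_explicit(&locks->flags[desc_lock_index(desc)],
            memory_order_release);
}

static void desc_write(struct desc_locks *locks, struct desc *dst,
        const struct desc *src)
{
    desc_lock(locks, dst);
    *dst = *src;
    desc_unlock(locks, dst);
}

/* Copy through a temporary so that only one lock is held at a time,
 * which avoids lock-order deadlocks between source and destination. */
static void desc_copy(struct desc_locks *locks, struct desc *dst,
        const struct desc *src)
{
    struct desc tmp;

    desc_lock(locks, src);
    tmp = *src;
    desc_unlock(locks, src);
    desc_write(locks, dst, &tmp);
}
```

With more stripes, two threads touching unrelated descriptors are less
likely to contend on the same lock, which is the kind of effect the
benchmark numbers above compare.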
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Enables ReadFromSubresource() to succeed in cases where it would have
failed otherwise.
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The "allocations" array is filled with unused entries when D3D12 buffers
are destroyed. The majority of entries might be unused after running for
a while. Remove the entry when VA is freed in order to prevent
accumulation of unused entries. This makes destroying D3D12 buffers more
expensive.
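The cost mentioned above comes from closing the gap in the array. A
minimal sketch, assuming the entries are kept sorted by base address:
the struct layout, the fixed capacity and the function names are
hypothetical and not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VA_MAX_ALLOCATIONS 16u /* illustrative capacity */

struct va_allocation
{
    uint64_t base;
    uint64_t size;
};

struct va_allocator
{
    struct va_allocation allocations[VA_MAX_ALLOCATIONS];
    size_t count; /* entries are sorted by base */
};

/* Binary search: index of the first entry with base >= the given base. */
static size_t va_find_index(const struct va_allocator *a, uint64_t base)
{
    size_t lo = 0, hi = a->count;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (a->allocations[mid].base < base)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Remove the entry instead of leaving it marked as unused. The memmove
 * is O(n) in the number of entries after it. */
static bool va_free(struct va_allocator *a, uint64_t base)
{
    size_t i = va_find_index(a, base);

    assert(a->count <= VA_MAX_ALLOCATIONS);
    if (i == a->count || a->allocations[i].base != base)
        return false;
    memmove(&a->allocations[i], &a->allocations[i + 1],
            (a->count - i - 1) * sizeof(*a->allocations));
    --a->count;
    return true;
}
```

Lookups stay fast because the array only ever contains live entries.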
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The order of structures doesn't matter, so we can simply prepend instead
of appending.
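To illustrate why prepending is simpler, here is a minimal sketch using
a stand-in for a Vulkan-style structure chain. struct base_structure
mirrors the type/next layout of such chains and chain_prepend is a
hypothetical helper; neither is the code from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a chained structure header (type tag plus next pointer). */
struct base_structure
{
    int type;
    struct base_structure *next;
};

/* Insert s directly after head. This is O(1): unlike appending, there
 * is no need to walk the chain to find its tail. */
static void chain_prepend(struct base_structure *head, struct base_structure *s)
{
    assert(head && s);
    s->next = head->next;
    head->next = s;
}
```

The last structure prepended ends up first in the chain, which is fine
as long as consumers do not depend on the order.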
Signed-off-by: Józef Kucia <jkucia@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>