Avoid using separate layouts if we're only using formats with a single
aspect. This makes it more likely to match layouts with a common layout,
and we can avoid awkward transition barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The spec is pretty clear that it's invalid usage. Return E_INVALIDARG
like native drivers.
This is a workaround for the inventory GPU hang with Cyberpunk 2077
which is actually a game bug. Luckily the game handles this error
properly.
The problem is that the game always assumes that an image with 2 mips
is smaller than the same image with 6 mips. This is not always true if
the swizzle mode is different, and a recent Mesa update changed that.
The game then creates a D3D12 heap that is too small, which triggers
a memory violation and then a GPU hang with RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Our internal copy shaders are fine, but we get benign errors about
sample count being wrong since we alias descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We cannot use the memory requirement output, since we will zero-clear
memory with a size that might be larger than the VkBuffer size.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Some games end up writing the wrong descriptor type when using null
descriptors, and to be robust against that, we have to clear out
all descriptors when creating null descriptors.
If we copy a null descriptor, we will also have to copy from all sets.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In cases where acquire image is blocking, we should call that after
presentation to avoid latency when the app calls present.
This avoids weird inverse frame cadences with Mesa WSI right now,
as acquiring an image is always a blocking call until it is complete.
In cases when we aren't blocking, this kicks off the acquisition so
it can be waited upon by the next present blit pass.
Use another set of semaphores to wait for the image acquisition on the
GPU.
In the non-blocking vkAcquireNextImageKHR case, this means that a
potential bubble of time between waiting on the fence and submitting
the blit + presentation is eliminated.
Runaway presentation in this setup is avoided by frame latency objects
and normal frame latency which is always 3 according to documentation.
Be careful about handling SUBOPTIMAL. Semaphores will be signaled, but
we might want to tear down the swapchain. In these cases, we need to
wait for the semaphore to be signaled first, which can only be done by
submitting a wait, since QueueWaitIdle or DeviceWaitIdle don't cover
WSI.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Documentation says that this should always be 3 without WAITABLE_OBJECT
unlike in D3D11 where it will use the DXGI device's frame latency.
This stops runaway presentations in the non-blocking acquire image case
with the new semaphore setup.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Fixes test TODOs. Apparently Vulkan drivers can saturate here, which
caused the TODO to appear, at least on AMD Windows.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Roll this into vkd3d_device_create_info, no need for this to be a pNext thing.
Additionally, fix some memory leaks on device creation failure.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Roll this into vkd3d_instance_create_info, no need for this to be a pNext thing.
Additionally, fix some memory leaks on instance creation failure.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Needed so we can switch between having a VRS and non-VRS attachment on the fly.
Extensible enough for this to work for other things down the line also.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
C is fun, yo. Returned data from dead stack variable, also triggered
overflow in some cases.
Uncalled in release mode, but can crash debug builds.
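For reference, a minimal sketch of this bug class and one way to avoid
it. The helper name debug_name and its format string are hypothetical
and are not the function touched by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug helper, not the function fixed by this commit.
 *
 * Broken pattern (do not use):
 *     const char *debug_name(unsigned int id)
 *     {
 *         char buffer[8];
 *         sprintf(buffer, "object-%u", id);  // can overflow buffer
 *         return buffer;                     // dangling after return
 *     }
 *
 * Safer pattern: the caller owns the storage and passes its size, and
 * snprintf() truncates instead of writing past the end. */
const char *debug_name(char *buffer, size_t size, unsigned int id)
{
    assert(buffer && size > 0);
    snprintf(buffer, size, "object-%u", id);
    return buffer;
}
```

The returned pointer is only valid for as long as the caller's buffer
is alive.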
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Otherwise we can do an aligned_malloc with a non-aligned size, as the descriptor size is 48 for a d3d12_rtv_desc.
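A minimal sketch of the size constraint, assuming C11 aligned_alloc() as the allocator. The helper names align_up and alloc_aligned_padded, and the alignment of 64 used in the usage note, are assumptions for illustration; only the size of 48 comes from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round size up to the next multiple of alignment (a power of two). */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical wrapper: C11 aligned_alloc() expects size to be a
 * multiple of alignment, so pad the size before allocating.
 * Release the result with free(). */
void *alloc_aligned_padded(size_t size, size_t alignment)
{
    return aligned_alloc(alignment, align_up(size, alignment));
}
```

With an assumed alignment of 64, a 48-byte descriptor would be padded to 64 bytes before allocation.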
Signed-off-by: Joshua Ashton <joshua@froggi.es>
This can happen if the fence thread starts with a delay and
the queue gets destroyed shortly after being created.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
There isn't much of a reason why we should have to do this. The original
implementation was more of a hack if anything.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Increment physical value one by one, find the exact timeline value we're
supposed to signal and perform the update.
Select lowest physical timeline value correctly.
Array can be reordered now, so lowest value isn't necessarily first.
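A minimal sketch of the selection step, assuming a simplified model.
The struct pending_update and its field names are hypothetical and do
not mirror the real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending fence update. */
struct pending_update
{
    uint64_t virtual_value;   /* value the application asked to signal */
    uint64_t physical_value;  /* value used on the underlying timeline */
};

/* Returns the index of the entry with the lowest physical value.
 * The array may be in any order, so every entry is inspected instead
 * of assuming the first one is the lowest. */
size_t find_lowest_physical(const struct pending_update *updates, size_t count)
{
    size_t i, lowest = 0;

    assert(updates && count > 0);
    for (i = 1; i < count; i++)
    {
        if (updates[i].physical_value < updates[lowest].physical_value)
            lowest = i;
    }
    return lowest;
}
```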
Fixes some super weird hangs in Control DXR.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Rather than one per device. This solves issues with D3D12 fences
being signalled too late because the fence worker is waiting on
a different set of semaphores while the fence is being enqueued.
Greatly increases performance in Horizon Zero Dawn and Death
Stranding with multi-queue mode enabled.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
This will be necessary once we introduce fence workers per
command queue, since we cannot reliably store pointers to
queues.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Replaces d3d12_device_get_vkd3d_queue when mapping D3D12
command queues to Vulkan device queues.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
The implementation is not complete enough to safely enable it, since
some games will try to create RTPSOs by default, leading to crashes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
It's really supposed to load 4 components and ignore. RGB16 is not
mandatory, so just use the "expected" formats after all.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Compacts ranges and only issues one bind for buffer ranges and
full subresource updates, rather than one bind per tile.
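A minimal sketch of the compaction idea, assuming binds arrive as one
entry per tile. The struct tile_bind and compact_tile_binds are
hypothetical names, not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bind record; not the actual vkd3d-proton structure. */
struct tile_bind
{
    uint32_t resource_tile;  /* first tile index in the resource */
    uint32_t memory_tile;    /* first tile index in the backing memory */
    uint32_t tile_count;     /* number of contiguous tiles covered */
};

/* Merges neighbouring binds that are contiguous in both the resource
 * and the memory. Compacts in place and returns the new bind count. */
size_t compact_tile_binds(struct tile_bind *binds, size_t count)
{
    size_t i, out = 0;

    assert(binds || count == 0);
    for (i = 0; i < count; i++)
    {
        struct tile_bind *prev = out ? &binds[out - 1] : NULL;

        if (prev
                && prev->resource_tile + prev->tile_count == binds[i].resource_tile
                && prev->memory_tile + prev->tile_count == binds[i].memory_tile)
            prev->tile_count += binds[i].tile_count;
        else
            binds[out++] = binds[i];
    }
    return out;
}
```

Only adjacent entries are merged, so the input is assumed to be
ordered by resource tile.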
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Bindless CBV is *pretty* bad on NVIDIA, so add a code path which can
promote descriptor table CBVs into push descriptors.
We can safely do this with Root Signature 1.1 STATIC or
the somewhat obscure STATIC_KEEPING_BUFFER_BOUNDS_CHECKS.
With VOLATILE, which basically all titles are using,
we can still force this behavior through a config flag,
but this is an incorrect speed hack. It works in most
titles however, since bindless CBV is exceptionally rare.
We only hoist descriptors when the root signature range has 1 descriptor
anyway, so we should avoid any reasonable bindless scenario.
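A minimal sketch of the hoisting decision described above, using a
simplified model. The types, the descriptors_volatile field and the
force_static_cbv parameter are hypothetical names standing in for the
root signature flags and the config flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a root signature descriptor range. */
enum range_type { RANGE_TYPE_CBV, RANGE_TYPE_SRV, RANGE_TYPE_UAV };

struct descriptor_range
{
    enum range_type type;
    uint32_t descriptor_count;
    /* false for STATIC and STATIC_KEEPING_BUFFER_BOUNDS_CHECKS */
    bool descriptors_volatile;
};

/* Returns true if a table CBV may be promoted to a push descriptor.
 * force_static_cbv models the config flag (an incorrect speed hack). */
bool can_hoist_cbv(const struct descriptor_range *range, bool force_static_cbv)
{
    assert(range);
    /* Only single-descriptor CBV ranges are candidates. */
    if (range->type != RANGE_TYPE_CBV || range->descriptor_count != 1)
        return false;
    return !range->descriptors_volatile || force_static_cbv;
}
```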
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There is no need to scan through the Vulkan format list,
especially since texel buffer creation happens in the hot path
in cases where we know we need to create R32UI texel buffer views.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We might have to emit to a different bind point than our binding entry
suggests due to DXR, so pass down information explicitly to leaf
functions.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Refactor push constant invalidation to SetPipelineState.
It is technically more correct to only invalidate when actually pushing
constants, but we need to do full state invalidation when transitioning
between RT pipelines and non-RT pipelines due to bind point aliasing
shenanigans in D3D12, so it makes more sense to invalidate state based
on the active bind point there.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Enabling VK_EXT_debug_utils comes at some overhead in Wine due to the object tracking required. There is likely also a non-zero overhead in some native implementations.
By enabling this conditionally, we can also avoid additional overhead from apps that set debug labels on both the Vulkan and front-end side.
The default condition is to enable it when building with Renderdoc integration or in debug builds.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
This thing has no right to exist.
We don't get this information in D3D12 and it's getting in the way of me refactoring config flags.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Simplifies this to make it easier to add new properties/features
so we don't have a bunch of pointers to things that are just a child
of the device info structure.
Fixes warnings when compiling without traces too.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Gives a massive boost on NVIDIA for some reason.
RADV defers push constant update, so ALL_STAGES doesn't have
that much of a perf hit.
~20% uplift in RE2, ~5% uplift in CP77 from some quick and dirty testing.
Seems to be heavily content dependent either way.
Also a bug fix, since we would clobber graphics push constants from
compute and vice versa if both graphics and compute used the same root
signature.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
As per MSDN, SetName is just a wrapper around SetPrivateData and a specific GUID.
Some apps and tools will use this to retrieve their name back.
So instead, just forward the name to Vulkan in the SetPrivateData call.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Instead, infer the required stages from the D3D12 shader visibility
field from all root parameters that we map to push constants.
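A minimal sketch of the inference, with local stand-in enums so it
builds without D3D12 or Vulkan headers. All names here are
hypothetical, and mapping "all" visibility to every stage is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the D3D12 visibility and Vulkan stage values. */
enum shader_visibility
{
    VISIBILITY_ALL, VISIBILITY_VERTEX, VISIBILITY_HULL,
    VISIBILITY_DOMAIN, VISIBILITY_GEOMETRY, VISIBILITY_PIXEL,
};

enum stage_bits
{
    STAGE_VERTEX = 0x01, STAGE_TESS_CONTROL = 0x02, STAGE_TESS_EVAL = 0x04,
    STAGE_GEOMETRY = 0x08, STAGE_FRAGMENT = 0x10, STAGE_COMPUTE = 0x20,
    STAGE_ALL = 0x3f,
};

uint32_t stage_from_visibility(enum shader_visibility visibility)
{
    switch (visibility)
    {
        case VISIBILITY_VERTEX:   return STAGE_VERTEX;
        case VISIBILITY_HULL:     return STAGE_TESS_CONTROL;
        case VISIBILITY_DOMAIN:   return STAGE_TESS_EVAL;
        case VISIBILITY_GEOMETRY: return STAGE_GEOMETRY;
        case VISIBILITY_PIXEL:    return STAGE_FRAGMENT;
        default:                  return STAGE_ALL;
    }
}

/* ORs together the stages of every root parameter that is mapped to
 * push constants. params holds the visibility of those parameters. */
uint32_t push_constant_stages(const enum shader_visibility *params, size_t count)
{
    uint32_t stages = 0;
    size_t i;

    assert(params || count == 0);
    for (i = 0; i < count; i++)
        stages |= stage_from_visibility(params[i]);
    return stages;
}
```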
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
There are pragmatic reasons for not following spec 100% here.
The only known case where UpdateAfterBind robustness is not exposed
seems to be somewhat bogus, and we cannot run D3D12 correctly without
robustness either way.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Can only support a subset in Vulkan without extra heroics. The DXR API
lets you query things that you technically should know a priori in the
application. We might need to allocate some side-channel buffers on
demand, but let's defer that until actually needed ... :\
DXR is also very awkward in that we have a query which is resolved in
UNORDERED_ACCESS state instead of COPY_DEST state, so we'll have to
ping-pong through some barriers redundantly.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When building acceleration structures, we need to have a
VkAccelerationStructureKHR object, but the D3D12 API just uses a plain
VA = ID3D12Resource::GetGPUVA() + offset.
For this to work, we need to resolve the VA back to VkBuffer + offset.
The only VkBuffer we can look up is the original backing memory
allocation in the VA map, and that allocation itself must own a view
map, since we cannot tie the VA to any specific ID3D12Resource.
Since creating an RTAS is not the common path, we allocate the view map
on-demand with CAS.
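A minimal sketch of on-demand allocation with CAS, using C11 atomics.
The struct names and allocation_get_view_map are hypothetical, and the
view map is reduced to a placeholder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Placeholder for the real view map contents. */
struct view_map { int placeholder; };

/* Hypothetical backing memory allocation that lazily owns a view map. */
struct allocation
{
    _Atomic(struct view_map *) view_map;  /* NULL until first needed */
};

/* Returns the allocation's view map, creating it on first use.
 * If two threads race, exactly one map is published and the loser
 * frees its own copy and uses the winner's. */
struct view_map *allocation_get_view_map(struct allocation *alloc)
{
    struct view_map *map, *expected = NULL;

    assert(alloc);
    map = atomic_load(&alloc->view_map);
    if (map)
        return map;

    map = calloc(1, sizeof(*map));
    if (!map)
        return NULL;

    if (atomic_compare_exchange_strong(&alloc->view_map, &expected, map))
        return map;

    /* Lost the race: expected now holds the published map. */
    free(map);
    return expected;
}
```

The common path that never builds an RTAS pays only for one pointer
per allocation.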
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>