Halo Infinite uses &desc->Width for total_bytes.
We can't set total_bytes early because code after this relies on desc->Width.
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
Guardians of the Galaxy hits this case. Fallback is to disable depth
attachment entirely in a fallback pipeline.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The 16-byte requirement is kind of a lie. The real requirement is tied
to how vectorized load-store instructions are emitted in the shader
itself, since I guess it allows the compiler to assume something about
the alignment of the base pointer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In d3d12, input element alignment needs to be the _minimum_ of 4 and the size of
the type. See the D3D11 spec, section 4.4.6, which behaves similarly:
https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#4.4.6%20Element%20Alignment
This is correctly taken into account when generating, e.g., the
vertex_buffer_stride_align_mask used for validation, but is not taken
into account when D3D12_APPEND_ALIGNED_ELEMENT is used to automatically
place input elements. Currently, vkd3d always assumes the alignment is
4.
This means that, for example, bytes or shorts should be packed tightly
together when D3D12_APPEND_ALIGNED_ELEMENT is used, but are instead
padded to 4 bytes.
Fixing this makes units appear in Age of Empires IV (see vkd3d-proton
issue #880 for examples.)
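A minimal sketch of the placement rule described above, assuming element
sizes are non-zero and are powers of two when smaller than 4 bytes. The
helper names (element_alignment, align_up, place_appended_elements) are
hypothetical and are not vkd3d-proton functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Alignment is the minimum of 4 and the element size (hypothetical helper). */
static uint32_t element_alignment(uint32_t element_size)
{
    return element_size < 4 ? element_size : 4;
}

/* Rounds offset up to a power-of-two alignment. */
static uint32_t align_up(uint32_t offset, uint32_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Places elements back to back, as if every element used
 * D3D12_APPEND_ALIGNED_ELEMENT for its byte offset. */
static void place_appended_elements(const uint32_t *sizes, size_t count, uint32_t *offsets)
{
    uint32_t cursor = 0;
    size_t i;

    for (i = 0; i < count; i++)
    {
        offsets[i] = align_up(cursor, element_alignment(sizes[i]));
        cursor = offsets[i] + sizes[i];
    }
}
```

With sizes 1, 1, 2 and 4, the two bytes pack tightly, whereas a fixed
4-byte alignment would pad each of them to a separate 4-byte slot.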
Signed-off-by: David Gow <david@ingeniumdigital.com>
Wine VKD3D version of my original commit.
Co-authored-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
The Vulkan spec update 1.2.195 restricted these features to a very limited
format subset, and somehow this is supposed to not be an API break?
Anyway, let's follow the new rules.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes a number of issues observed in tessellation shaders,
and potentially geometry shaders, when inputs and/or outputs
are array variables.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
It's common enough that new games break on RDNA2 because of this that we
should enable this by default. This matches DXVK behavior.
SOTTR gets a special weird exception, just like DXVK. The shaders are
broken enough that the proper fix is actually precise, not invariant.
This will be addressed at some later point.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This function fails if the counter overflows.
CP77 hits this case a lot, and we should just warn about the specific
failure instead of reporting a random error.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Potentially reduces the size of the query map, and makes each entry
versioned so that we no longer have to clear the entire map for multiple
dispatches even if it is sparsely populated.
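To illustrate the versioning idea, here is a rough sketch assuming a
fixed-size map and callers that pass in-range indices. All names
(query_map, query_map_entry, QUERY_MAP_ENTRIES) are hypothetical; the
actual data structure in the tree may look quite different.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define QUERY_MAP_ENTRIES 64u

struct query_map_entry
{
    uint32_t version; /* Entry is live only if this matches the map version. */
    uint32_t value;
};

struct query_map
{
    uint32_t version;
    struct query_map_entry entries[QUERY_MAP_ENTRIES];
};

static void query_map_init(struct query_map *map)
{
    memset(map, 0, sizeof(*map));
    map->version = 1; /* Zeroed entries carry version 0, so they start out stale. */
}

/* Invalidates every entry by bumping the version instead of clearing. */
static void query_map_reset(struct query_map *map)
{
    if (++map->version == 0)
        query_map_init(map); /* Version wrapped around, do one real clear. */
}

static void query_map_set(struct query_map *map, uint32_t index, uint32_t value)
{
    map->entries[index].version = map->version;
    map->entries[index].value = value;
}

static bool query_map_get(const struct query_map *map, uint32_t index, uint32_t *value)
{
    if (map->entries[index].version != map->version)
        return false;
    *value = map->entries[index].value;
    return true;
}
```

A reset between dispatches then costs one increment regardless of how
sparsely the map is populated.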
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
If we need to fall back in both VRS and non-VRS scenarios, we need to key
on it. Fixes segfault in DIRT5 when toggling VRS.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The ordinals except for D3D12CreateDevice and GetDebugInterface are not
part of the ABI apparently.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we don't find a clear association to an entry point,
we can also find it in the hit group.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
parameter_count == NumParameters for local RS since
hoisting is explicitly ignored for those.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With RTPSOs we might have to create static sampler sets for local root
signatures. In this case we will have to create a compatible pipeline
layout which is equal to global pipeline layout, except for an extra
set.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for test suite since a test can be comprised of several smaller
submissions, and it's easier to debug if we have one trace.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we deduce that fallback heap allocation is impossible, we will accept
this, and defer allocation to CreatePlacedResource() instead where we make a committed resource.
This breaks aliasing, but in practice, this situation will only arise for render
targets, and it's not like we have a choice in the matter here on NV :\
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When allocating dedicated memory, ignore heap_flag requirements we
deduce from memory info. Any memory type is allowed. This is important
on NV when allocating fallback render targets.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There are situations where we cannot fall back to system memory, so don't
log that we're going to do so.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Don't attempt to enter memory allocation when we can invalidate a heap
allocation up front. Avoids some dumb edge cases later.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Many UE4 games have this broken bloom shader that samples a texture with implicit lod in divergent control flow.
Fixes Bus Simulator 21.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Width + offset must not overflow in SPIR-V. SM 5+ is well-defined here.
It's enough to just clamp the width against 32 - offset in all cases.
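A plain C model of the clamp, assuming offset is already in the range
[0, 31]. The function names are hypothetical, and the real change emits
SPIR-V rather than C, so this only sketches the arithmetic.

```c
#include <stdint.h>

/* Clamps width so that offset + width never exceeds 32. Assumes offset < 32. */
static uint32_t clamp_bitfield_width(uint32_t offset, uint32_t width)
{
    uint32_t max_width = 32u - offset;
    return width < max_width ? width : max_width;
}

/* Unsigned bitfield extract using the clamped width. */
static uint32_t extract_bits(uint32_t value, uint32_t offset, uint32_t width)
{
    uint32_t mask;

    width = clamp_bitfield_width(offset, width);
    /* Shifting a 32-bit value by 32 is undefined in C, so special-case it. */
    mask = width == 32u ? 0xffffffffu : (1u << width) - 1u;
    return (value >> offset) & mask;
}
```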
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to use fallback pipeline system here.
Keep track of active masks for PSO and current render target.
The intersection of those sets is the set of attachments which should be
active in the render pass.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fix failure in test_create_heap where a TIER_2 host visible heap was
attempted, but failed due to recent DEATHLOOP fixes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Windows returns E_INVALIDARG at least on AMD and Intel.
Psychonauts 2 seems to use this as a de facto "do not create"
value, and reasonable vram usage depends on the call failing.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Game attempts to create a host visible resource with
ALLOW_RENDER_TARGET flag. We cannot make this work on NVIDIA, but the
game never seems to actually create an RTV, so as a workaround, nop out
the flag, which does make it work after all :3
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For resizable BAR, we don't want to endlessly promote UPLOAD heaps to
BAR since VRAM is precious. The aim is to set a fixed budget where we
can keep allocating until full, at which point we fall back to plain HOST.
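The budget policy could look roughly like this. It is a sketch that
assumes a single-threaded caller and that used_bytes never exceeds
max_bytes; the struct, enum and function names are made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum memory_choice
{
    MEMORY_CHOICE_BAR,  /* Host-visible VRAM. */
    MEMORY_CHOICE_HOST  /* Plain system memory. */
};

struct bar_budget
{
    uint64_t max_bytes;
    uint64_t used_bytes; /* Invariant: used_bytes <= max_bytes. */
};

/* Charges the allocation to the BAR budget if it fits, else picks HOST. */
static enum memory_choice choose_upload_memory(struct bar_budget *budget, uint64_t size)
{
    if (size <= budget->max_bytes - budget->used_bytes)
    {
        budget->used_bytes += size;
        return MEMORY_CHOICE_BAR;
    }
    return MEMORY_CHOICE_HOST;
}

/* Returns a BAR allocation's bytes to the budget when it is freed. */
static void release_bar_memory(struct bar_budget *budget, uint64_t size)
{
    budget->used_bytes -= size;
}
```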
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With BAR budgets, what will happen is that
- Small allocation is requested
- A new chunk is requested
- try_suballocate_memory will end up calling allocate_memory, which
allocates a fallback memory type
- Subsequent small allocations will always end up allocating a new
fallback memory block, never reusing existing blocks.
- System memory is rapidly exhausted once apps start hitting against
budget.
The fix is to add flags which explicitly do not attempt to fallback
allocate. This makes it possible to handle fallbacks at the appropriate
level in try_suballocate_memory instead.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will need to consider some form of budgeting, so make sure that all
allocation and freeing is done in a central place.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 validation layers complain if you try to map mipmapped 3D volumes
for ... some reason. The error is very explicit, so I assume it's
intentional :)
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to consider that based on host visibility requirements, we need to
select either LINEAR or OPTIMAL image types, and those tiling modes can
have different memory requirements.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to initialize the set mask so that copies happen properly
on default-initialized descriptors. Also, move the current_null_type to
metadata so that it's properly copied on descriptor copy.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There are titles clearing the same descriptors constantly.
This leads to unnecessary updates that can become costly.
This commit introduces a new flag to track when D3D12 descriptors are
not null, and skips clearing them if they are already null.
Descriptors are assumed to be null by default.
This fixes a performance regression introduced by
9983a1720f
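The flag logic amounts to something like the following sketch. The
descriptor_sketch struct and both functions are hypothetical stand-ins,
not the real descriptor layout.

```c
#include <stdbool.h>
#include <stdint.h>

struct descriptor_sketch
{
    bool non_null;    /* False by default: descriptors start out null. */
    uint64_t payload; /* Stand-in for the real descriptor contents. */
};

static void descriptor_write(struct descriptor_sketch *d, uint64_t payload)
{
    d->payload = payload;
    d->non_null = true;
}

/* Returns true if a real clear was performed, false if it was skipped. */
static bool descriptor_write_null(struct descriptor_sketch *d)
{
    if (!d->non_null)
        return false; /* Already null, skip the costly update. */
    d->payload = 0;
    d->non_null = false;
    return true;
}
```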
Signed-off-by: Rodrigo Locatti <rlocatti@nvidia.com>
Emitting render pass clears while we're in the process of starting
a render pass overrides dsv layout tracking info.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
D3D12 validation layer errors out, so unless we can prove that specific
behavior is relied upon, we should be okay to just ignore.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Get information directly from vkd3d_format and allow for subsampled
formats in the future.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Psychonauts 2 uses a SAMPLE_DESC.Count of 0 for some things, which
previously was forcing it down the MSAA alignment placement path.
Found from playing a native D3D12 apitrace back and seeing
the log spam.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Since we added validation here for FH4, this crashes now as vkd3d-compiler passes a NULL shader_interface_info.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Consider we have declarations of CB0 of size 36 and CB1 of size 153.
Previously we'd just return the struct of CB0 when accessing CB1 because it came first as we didn't consider the size.
Psychonauts 2 indexes into CB1 by constant values above 36.
There is no reason a compiler could not eliminate these reads as it is technically out of bounds for the underlying array type.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Adds the "upload_hvv" config flag, which will make D3D12_HEAP_TYPE_UPLOAD attempt to use host-visible VRAM for allocations.
This takes advantage of large or resizable BAR if available.
I see a perf delta of 83-84 -> 92-94 (~12%) when using this in Horizon Zero Dawn.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
FloatControlProperties struct appears to be broken, and it does seem to
work just fine.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
WaveMatch and WaveMultiPrefix are implemented and pass test.
Other features are gated behind feature bits.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
From native testing, we can expose higher shader models even if the
features behind cap bits are not supported. E.g. Polaris exposes SM 6.5,
even when 16-bit and barycentrics are not supported.
With latest dxil-spirv updates we can support the required SM 6.4
features.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This was passing through the root signature's flags rather than its shader interface flags.
Need to get the shader interface flags of the root signature instead.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Goal here is to avoid unnecessary image layout transitions when render
passes toggle depth-stencil PSO states. Since we cannot know which
states a resource is in, we have to be conservative, and assume that
shader reads *could* happen.
The best effort we can do is to detect when writes happen to a DSV
resource. In this scenario, we can deduce that the aspect cannot be
read, since DEPTH_WRITE | RESOURCE state is not allowed.
To make the tracking somewhat sane, we only promote to OPTIMAL if an
entire image's worth of subresources for a given aspect is transitioned.
The common case for depth-stencil images is 1 mip / 1 layer anyways.
Some other changes are required here:
- Instead of common_layout for the depth image, we need to consult the
command list, which might promote the layout to optimal.
- We make use of render pass compatibility rules which state that we can
change attachment reference layouts as well as initial/finalLayout.
To make this change, a pipeline will fill in a
vkd3d_render_pass_compat struct.
- A command list has a dsv_plane_optimal_mask which keeps track
of the plane aspects we have promoted to OPTIMAL, and we know cannot
be read by shaders.
The desired optimal mask is (existing optimal | PSO write).
The initial existing optimal is inherited from the command list's
tracker.
- RTV/DSV/views no longer keep track of VkImageLayout. This is
unnecessary since we always deduce image layout based on context.
Overall, this shows a massive gain in HZD benchmark (RADV, 1440p ultimate, ~16% FPS on RX 6800).
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Idea is to keep track of scenarios where a resource's aspect is
known to be in an OPTIMAL state. Based on this, we can override the image
layout from the common_layout in order to avoid unnecessary full
barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For copies, we can always use the intended aspects, since we have
separate DS layouts now.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When clearing a DSV, we must get aliasing guarantees, so we must
transition away from UNDEFINED. This is only possible when using
separate_ds_layouts and for render pass clears we need to use
renderpass2 mechanisms to do this.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The clamp is absolute, not relative to baseMip. Also avoids validation
error and potential crash when LODClamp > numLevels.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Not correct, will need spec additions to handle it properly.
Fixes ground rendering in DIRT 5.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Lets calling code know if it should use ALLOW_VARYING_SUBGROUP_SIZE.
To avoid too much churn on pipeline caches, only add the flag when
needed.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
GravityBench ends up using ClearView with too large dimensions.
This is a validation error in Vulkan, so just clamp the extents.
To make full rect detection a bit more robust, do a range check instead
of memcmp().
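A sketch of both pieces, assuming a view of width x height texels with
its origin at (0, 0). The rect_sketch type and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct rect_sketch
{
    int32_t left, top, right, bottom;
};

static int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamps a clear rect so it never extends past the view. */
static struct rect_sketch clamp_rect_to_view(struct rect_sketch r, int32_t width, int32_t height)
{
    r.left = clamp_i32(r.left, 0, width);
    r.right = clamp_i32(r.right, 0, width);
    r.top = clamp_i32(r.top, 0, height);
    r.bottom = clamp_i32(r.bottom, 0, height);
    return r;
}

/* Range check: an oversized rect still counts as covering the whole view. */
static bool rect_covers_view(const struct rect_sketch *r, int32_t width, int32_t height)
{
    return r->left <= 0 && r->top <= 0 && r->right >= width && r->bottom >= height;
}
```

A byte-wise comparison against the exact view rect would miss the
oversized case, which is why the range check is the more robust test.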
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we're doing a layout transition of depth-stencil aspects, we need to ensure all potential
accesses are made visible.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
- Honor resource barriers for resource states which cannot automatically
decay or promote. This includes COLOR_ATTACHMENT, UNORDERED_ACCESS and
VRS image. If SIMULTANEOUS_ACCESS is used, we can still promote, and
we handle that by setting common layout to GENERAL for these resources.
- Avoid redundant barriers in render passes since normal resource
barriers will always make sure we are already in
COLOR_ATTACHMENT_OPTIMAL.
- Do not force GENERAL layout if resource has UNORDERED_ACCESS flag set.
As this is not a promotable state, we have to explicitly transition
into it. I tested this on validation layers, where even COMMON state
refuses to promote to UAV state. The exception here of course is
SIMULTANEOUS_ACCESS, but we handle that properly now.
- Verify that UAV or SIMULTANEOUS access is not used together with DSV
state. This is explicitly banned in the API docs.
- Actually emit image barriers. Batch the image transitions as that's
what D3D12 docs encourage app developers to do, and it also expects
that drivers can optimize this. Ensure that we respect the in-order
resource barrier rules by splitting batches if there are overlaps in
the transitions.
- Ensure that correct image layout is used when clearing a suspended
render pass attachment.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Avoid using the separate layouts if we're only using formats with one
aspect. This makes it more likely to match layouts with common layout,
and we can avoid awkward transition barriers.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The spec is pretty clear that it's invalid usage. Return E_INVALIDARG
like native drivers.
This is a workaround for the inventory GPU hang with Cyberpunk 2077
which is actually a game bug. Luckily the game handles this error
properly.
The problem is that the game always assumes that an image with 2 mips
is smaller than the same image with 6 mips. This is not always true
if the swizzle mode is different, and a recent Mesa update changed
that. The game then creates a D3D12 heap that is too small, which
triggers a memory violation and then a GPU hang with RADV.
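As an illustration only: assuming the invalid usage in question is a
placed resource that does not fit inside its heap, the validation could
be sketched as below. The function name and parameters are hypothetical,
and a caller would return E_INVALIDARG when the check fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overflow-safe check that [heap_offset, heap_offset + resource_size)
 * lies entirely within a heap of heap_size bytes. */
static bool placed_resource_fits(uint64_t heap_size, uint64_t heap_offset, uint64_t resource_size)
{
    if (heap_offset > heap_size)
        return false;
    return resource_size <= heap_size - heap_offset;
}
```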
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In control flow, we can force LOD 0.0 to avoid undefined result when
games sample with implicit LOD in non-quad uniform control flow.
Behavior on different implementations is:
- Helper lanes come to life and interpolate shader input.
- LOD is clamped to 0.0 in divergent control flow.
This hack is not safe in general, since we force 0.0 even when the
control flow is quad uniform.
This is the most practical solution for the problem for now.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Our internal copy shaders are fine, but we get benign errors about
sample count being wrong since we alias descriptors.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We cannot use the memory requirement output, since we will zero-clear
memory with a size that might be larger than the VkBuffer size.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Some games end up writing the wrong descriptor type when using null
descriptors, and to be robust against that, we have to clear out
all descriptors when creating null descriptors.
If we copy a null descriptor, we will also have to copy from all sets.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In cases where acquire image is blocking, we should call that after
presentation to avoid latency when the app calls present.
This avoids weird inverse frame cadences with Mesa WSI right now,
as acquiring an image is always a blocking call until it is complete.
In cases when we aren't blocking, this kicks off the acquisition so
it can be waited upon by the next present blit pass.
Use another set of semaphores to wait for the image acquisition on the
GPU.
In the non-blocking vkAcquireNextImageKHR case, this means that a
potential bubble of time between waiting on the fence and submitting
the blit + presentation is eliminated.
Runaway presentation in this setup is avoided by frame latency objects
and normal frame latency which is always 3 according to documentation.
Be careful about handling SUBOPTIMAL. Semaphores will be signaled, but
we might want to tear down the swapchain. In these cases, we need to
wait for the semaphore to be signaled first, which can only be done by
submitting a wait, since QueueWaitIdle or DeviceWaitIdle don't cover
WSI.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Documentation says that this should always be 3 without WAITABLE_OBJECT
unlike in D3D11 where it will use the DXGI device's frame latency.
This stops runaway presentations in the non-blocking acquire image case
with the new semaphore setup.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Fixes test TODOs. Apparently Vulkan drivers can saturate here, which
caused the TODO to appear, at least on AMD Windows.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Since we introduce side effects, avoid full late-Z for everything, which
is slow, and not necessarily correct either.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>