Halo Infinite uses &desc->Width for total_bytes.
We can't set total_bytes early because code after this relies on desc->Width.
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
Guardians of the Galaxy hits this case. The fallback is to disable the
depth attachment entirely in a fallback pipeline.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
The 16-byte requirement is kind of a lie. The real requirement is tied
to how vectorized load-store instructions are emitted in the shader
itself, since I guess it allows the compiler to assume something about
alignment of the base pointer.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
In D3D12, input element alignment needs to be the _minimum_ of 4 and the size of
the type. See the D3D11 spec, section 4.4.6, which behaves similarly:
https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#4.4.6%20Element%20Alignment
This is correctly taken into account when generating, e.g., the
vertex_buffer_stride_align_mask used for validation, but is not taken
into account when D3D12_APPEND_ALIGNED_ELEMENT is used to automatically
place input elements. Currently, vkd3d always assumes the alignment is
4.
This means that, for example, bytes or shorts should be packed tightly
together when D3D12_APPEND_ALIGNED_ELEMENT is used, but are instead
padded to 4 bytes.
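
A minimal sketch of the placement rule described above, not the actual
vkd3d code. All function names are hypothetical, and "element size" is
assumed to mean the byte size of the element's format:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: alignment is the minimum of 4 and the element size. */
static size_t element_alignment(size_t element_size)
{
    return element_size < 4 ? element_size : 4;
}

/* Round offset up to a power-of-two alignment. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment && !(alignment & (alignment - 1)));
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Offset at which an appended element of element_size bytes is placed,
 * given the end offset of the previous element. */
static size_t append_aligned_offset(size_t prev_end, size_t element_size)
{
    return align_up(prev_end, element_alignment(element_size));
}
```

With this rule a 1-byte element following another 1-byte element stays
tightly packed, while a 4-byte (or larger) element is still placed on a
4-byte boundary.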
Fixing this makes units appear in Age of Empires IV (see vkd3d-proton
issue #880 for examples).
Signed-off-by: David Gow <david@ingeniumdigital.com>
Wine VKD3D version of my original commit.
Co-authored-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
The Vulkan spec update 1.2.195 restricted these features to a very limited
format subset, and somehow this is supposed to not be an API break?
Anyway, let's follow the new rules.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
It's common enough that new games break on RDNA2 because of this that we
should enable this by default. This matches DXVK behavior.
SOTTR gets a special weird exception, just like DXVK. The shaders are
broken enough that the proper fix is actually precise, not invariant.
This will be addressed at some later point.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
This function fails if the counter overflows.
CP77 hits this case a lot, so we should warn about the specific failure
instead of reporting a random error.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Potentially reduces the size of the query map, and makes each entry
versioned so that we no longer have to clear the entire map for multiple
dispatches even if it is sparsely populated.
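
A rough sketch of the versioning idea, under the assumption that each
entry stores the version of the dispatch that last wrote it. The struct
layout, names and capacity are hypothetical, not the real query map:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUERY_MAP_SIZE 64u /* hypothetical capacity */

struct query_map_entry
{
    uint32_t version; /* dispatch version that last wrote this entry */
    uint32_t value;
};

/* Zero-initialize, then call query_map_begin_dispatch() before first use
 * so that untouched (version 0) entries always read as stale. */
struct query_map
{
    uint32_t current_version;
    struct query_map_entry entries[QUERY_MAP_SIZE];
};

static void query_map_begin_dispatch(struct query_map *map)
{
    /* Invalidates every entry in O(1). Wrap-around handling is omitted. */
    map->current_version++;
}

static void query_map_set(struct query_map *map, uint32_t index, uint32_t value)
{
    assert(index < QUERY_MAP_SIZE);
    map->entries[index].version = map->current_version;
    map->entries[index].value = value;
}

static bool query_map_get(const struct query_map *map, uint32_t index, uint32_t *value)
{
    assert(index < QUERY_MAP_SIZE);
    if (map->entries[index].version != map->current_version)
        return false; /* stale entry left over from an earlier dispatch */
    *value = map->entries[index].value;
    return true;
}
```

Starting a new dispatch only bumps a counter, so a sparsely populated
map never has to be cleared entry by entry.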
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
If we need to fall back in both VRS and non-VRS scenarios, we need to key
on it. Fixes segfault in DIRT5 when toggling VRS.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we don't find a clear association to an entry point,
we can also find it in the hit group.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
parameter_count == NumParameters for local RS since
hoisting is explicitly ignored for those.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With RTPSOs we might have to create static sampler sets for local root
signatures. In this case we will have to create a compatible pipeline
layout which is equal to global pipeline layout, except for an extra
set.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Useful for test suite since a test can be comprised of several smaller
submissions, and it's easier to debug if we have one trace.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
If we deduce that fallback heap allocation is impossible, we will accept
this and defer allocation to CreatePlacedResource() instead, where we make
a committed resource.
This breaks aliasing, but in practice, this situation will only arise for render
targets, and it's not like we have a choice in the matter here on NV :\
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
When allocating dedicated memory, ignore heap_flag requirements we
deduce from memory info. Any memory type is allowed. This is important
on NV when allocating fallback render targets.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There are situations where we cannot fall back to system memory, so don't
log that we're going to do so.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Don't attempt to enter memory allocation when we can invalidate a heap
allocation up front. Avoids some dumb edge cases later.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Many UE4 games have this broken bloom shader that samples a texture with
implicit LOD in divergent control flow.
Fixes Bus Simulator 21.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Need to use fallback pipeline system here.
Keep track of active masks for PSO and current render target.
The intersection of those sets is the set of attachments which should be
active in the render pass.
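
As an illustration only, assuming both sets are tracked as 32-bit
bitmasks with one bit per attachment (the helper names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit for a given attachment index. */
static uint32_t attachment_bit(unsigned int index)
{
    assert(index < 32);
    return UINT32_C(1) << index;
}

/* Attachments active in the render pass: written by the PSO *and*
 * present in the currently bound render targets. */
static uint32_t active_attachment_mask(uint32_t pso_mask, uint32_t rt_mask)
{
    return pso_mask & rt_mask;
}
```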
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Fix failure in test_create_heap where a TIER_2 host visible heap was
attempted, but failed due to recent DEATHLOOP fixes.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Windows returns E_INVALIDARG at least on AMD and Intel.
Psychonauts 2 seems to use this as a de facto "do not create"
value, and reasonable vram usage depends on the call failing.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Game attempts to create a host visible resource with
ALLOW_RENDER_TARGET flag. We cannot make this work on NVIDIA, but the
game never seems to actually create an RTV, so as a workaround, nop out
the flag, which does make it work after all :3
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
For resizable BAR, we don't want to endlessly promote UPLOAD heaps to
BAR since VRAM is precious. The aim is to set a fixed budget where we
can keep allocating until full, at which point we fall back to plain HOST.
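
A sketch of what such a fixed budget could look like. The struct and
function names are hypothetical, and locking is left out for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct bar_budget
{
    uint64_t limit; /* fixed budget in bytes */
    uint64_t used;  /* bytes currently promoted to BAR, never above limit */
};

/* Returns true if the allocation fits in the remaining budget.
 * On false, the caller is expected to fall back to plain HOST memory. */
static bool bar_budget_try_reserve(struct bar_budget *budget, uint64_t size)
{
    if (size > budget->limit - budget->used)
        return false;
    budget->used += size;
    return true;
}

/* Gives budget back when a BAR allocation is freed. */
static void bar_budget_release(struct bar_budget *budget, uint64_t size)
{
    assert(size <= budget->used);
    budget->used -= size;
}
```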
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
With BAR budgets, what will happen is that
- Small allocation is requested
- A new chunk is requested
- try_suballocate_memory will end up calling allocate_memory, which
allocates a fallback memory type
- Subsequent small allocations will always end up allocating a new
fallback memory block, never reusing existing blocks.
- System memory is rapidly exhausted once apps start hitting against
budget.
The fix is to add flags which explicitly do not attempt to fallback
allocate. This makes it possible to handle fallbacks at the appropriate
level in try_suballocate_memory instead.
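
The intended control flow can be sketched roughly as below. This is a
toy model, not the real allocator: allocate_chunk() and suballocate()
stand in for allocate_memory and try_suballocate_memory, and the flag
name, chunk size and structs are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHUNKS 16
#define CHUNK_SIZE 1024u

enum memory_type { MEMORY_TYPE_BAR, MEMORY_TYPE_HOST };
enum { ALLOCATE_NO_FALLBACK = 1u << 0 }; /* hypothetical flag */

struct chunk { enum memory_type type; size_t free_bytes; };

struct allocator
{
    size_t bar_budget;
    struct chunk chunks[MAX_CHUNKS];
    size_t chunk_count;
};

/* Allocates a new chunk. Without ALLOCATE_NO_FALLBACK, a BAR request
 * silently turns into a HOST chunk once the budget is exhausted. */
static struct chunk *allocate_chunk(struct allocator *a, enum memory_type type, unsigned int flags)
{
    if (a->chunk_count == MAX_CHUNKS)
        return NULL;
    if (type == MEMORY_TYPE_BAR)
    {
        if (a->bar_budget < CHUNK_SIZE)
        {
            if (flags & ALLOCATE_NO_FALLBACK)
                return NULL;
            type = MEMORY_TYPE_HOST;
        }
        else
            a->bar_budget -= CHUNK_SIZE;
    }
    a->chunks[a->chunk_count] = (struct chunk){ type, CHUNK_SIZE };
    return &a->chunks[a->chunk_count++];
}

static struct chunk *find_chunk(struct allocator *a, enum memory_type type, size_t size)
{
    for (size_t i = 0; i < a->chunk_count; i++)
        if (a->chunks[i].type == type && a->chunks[i].free_bytes >= size)
            return &a->chunks[i];
    return NULL;
}

/* Fallback is handled here, so existing HOST chunks are reused before
 * a new fallback chunk is created. */
static struct chunk *suballocate(struct allocator *a, size_t size)
{
    struct chunk *c;
    assert(size <= CHUNK_SIZE);
    if (!(c = find_chunk(a, MEMORY_TYPE_BAR, size)))
        c = allocate_chunk(a, MEMORY_TYPE_BAR, ALLOCATE_NO_FALLBACK);
    if (!c && !(c = find_chunk(a, MEMORY_TYPE_HOST, size)))
        c = allocate_chunk(a, MEMORY_TYPE_HOST, 0);
    if (c)
        c->free_bytes -= size;
    return c;
}
```

In this model, four 512-byte requests against a budget of one BAR chunk
end up in two chunks (one BAR, one HOST) rather than one new fallback
chunk per request.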
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
We will need to consider some form of budgeting, so make sure that all
allocation and freeing is done in a central place.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
D3D12 validation layers complain if you try to map mipmapped 3D volumes
for ... some reason. The error is very explicit, so I assume it's
intentional :)
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to consider that based on host visibility requirements, we need to
select either LINEAR or OPTIMAL image types, and those tiling modes can
have different memory requirements.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Need to initialize the set mask so that copies happen properly
on default-initialized descriptors. Also, move the current_null_type to
metadata so that it's properly copied on descriptor copy.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
There are titles clearing the same descriptors constantly.
This leads to unnecessary updates that can become costly.
This commit introduces a new flag to track when D3D12 descriptors are
not null, and skips clearing them if they are already null.
Descriptors are assumed to be null by default.
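
In sketch form, with a hypothetical flag name and a trivial payload
standing in for the real (costly) descriptor update:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESCRIPTOR_FLAG_NON_NULL UINT32_C(1) /* hypothetical name */

/* Zero-initialized descriptors have the flag unset, i.e. they are null. */
struct descriptor
{
    uint32_t flags;
    uint32_t payload;
};

static void descriptor_write(struct descriptor *d, uint32_t payload)
{
    assert(d);
    d->payload = payload;
    d->flags |= DESCRIPTOR_FLAG_NON_NULL;
}

/* Returns true only if an update was actually performed. */
static bool descriptor_clear(struct descriptor *d)
{
    assert(d);
    if (!(d->flags & DESCRIPTOR_FLAG_NON_NULL))
        return false; /* already null: skip the update */
    d->payload = 0;   /* stand-in for the costly descriptor update */
    d->flags &= ~DESCRIPTOR_FLAG_NON_NULL;
    return true;
}
```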
This fixes a performance regression introduced by
9983a1720f
Signed-off-by: Rodrigo Locatti <rlocatti@nvidia.com>
Emitting render pass clears while we're in the process of starting
a render pass overrides dsv layout tracking info.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
D3D12 validation layer errors out, so unless we can prove that specific
behavior is relied upon, we should be okay to just ignore.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Get information directly from vkd3d_format and allow for subsampled
formats in the future.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Psychonauts 2 uses a SAMPLE_DESC.Count of 0 for some things, which
previously was forcing it down the MSAA alignment placement path.
Found from playing a native D3D12 apitrace back and seeing
the log spam.
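
For illustration, the classification could look like this. It assumes
the old check compared the count against 1, which is a guess; the two
alignment values are the standard D3D12 default placement alignments,
and the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_PLACEMENT_ALIGNMENT      65536u   /* D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT */
#define DEFAULT_MSAA_PLACEMENT_ALIGNMENT 4194304u /* D3D12_DEFAULT_MSAA_RESOURCE_PLACEMENT_ALIGNMENT */

static bool sample_count_is_msaa(uint32_t sample_count)
{
    /* A Count of 0 is treated like 1. A "!= 1" test would misclassify it. */
    return sample_count > 1;
}

static uint32_t default_placement_alignment(uint32_t sample_count)
{
    return sample_count_is_msaa(sample_count)
            ? DEFAULT_MSAA_PLACEMENT_ALIGNMENT
            : DEFAULT_PLACEMENT_ALIGNMENT;
}
```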
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Adds the "upload_hvv" config flag, which will make D3D12_HEAP_TYPE_UPLOAD
attempt to use host-visible VRAM for allocations.
This takes advantage of large or resizable BAR if available.
I see a perf delta of 83-84 -> 92-94 (~12%) when using this in Horizon Zero Dawn.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
FloatControlProperties struct appears to be broken, and it does seem to
work just fine.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
WaveMatch and WaveMultiPrefix are implemented and pass test.
Other features are gated behind feature bits.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
From native testing, we can expose higher shader models even if
cap-bit features are not supported. E.g. Polaris exposes SM 6.5, even
when 16-bit and barycentrics are not supported.
With latest dxil-spirv updates we can support the required SM 6.4
features.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>