Commit Graph

2787 Commits

Author SHA1 Message Date
Philip Rebohle bb2e35c539 vkd3d: Use vkGetDevice{Buffer,Image}MemoryRequirementsKHR in vkd3d_memory_info_init.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-22 11:36:02 +02:00
Philip Rebohle d5ad5bb1de vkd3d: Use vkGetDeviceImageMemoryRequirementsKHR in vkd3d_get_image_allocation_info.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-22 11:36:02 +02:00
Philip Rebohle beb58f8472 vkd3d: Enable and require VK_KHR_maintenance4.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-22 11:36:02 +02:00
Hans-Kristian Arntzen 358f95aff2 vkd3d: Ignore cached SPIR-V if we're dumping SPIR-V.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-22 11:29:27 +02:00
Philip Rebohle 119e00ed45 vkd3d: Do not add uint format to image format list.
Fixes #1069.

Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-21 13:51:58 +02:00
Philip Rebohle beaedbd857 vkd3d: Use UAV clear fallback based on format compatibility.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-21 13:51:58 +02:00
Philip Rebohle 81927c5895 vkd3d: Fix handling of non-zero base layer in ClearUAV fallback path.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-21 13:51:58 +02:00
Philip Rebohle e7a6af4971 vkd3d: Use texel buffer views for UAV clears with buffer to image copy.
Allows this to more easily work with more formats.

Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-21 13:51:58 +02:00
Philip Rebohle a1d5e6f39a vkd3d: Re-add R11G11B10 format compatibility info.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-21 13:51:58 +02:00
Hans-Kristian Arntzen 5044975152 vkd3d: Drop redundant validate of PSO state blob from disk cache.
If we get an entry, it's implicitly validated.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-06 16:36:26 +02:00
Hans-Kristian Arntzen 8dc8b72807 cache: Add some performance information for shader cache operations.
They can take a long time and it's useful to have some reports here.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-06 16:36:26 +02:00
Hans-Kristian Arntzen ae0dafa3a1 cache: Attempt to use disk cache instead when appropriate.
When the disk cache is used, the cache we give back to applications is a
dummy. Therefore, try to use the disk cache blob if we detect a useless
application blob.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-06 16:36:26 +02:00
Hans-Kristian Arntzen 6c8542f7d6 vkd3d: Make use of internal pipeline library if we're asked to.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-06 16:36:26 +02:00
Hans-Kristian Arntzen 2dcb1e2efc cache: Implement an on-disk pipeline library.
With VKD3D_SHADER_CACHE_PATH, we can add automatic serialization of pipeline
blobs to disk, even for games which do not make any use of GetCachedBlob
or ID3D12PipelineLibrary interfaces. Most applications expect drivers to
have some kind of internal caching.

This is implemented as a system where a disk
thread will manage a private ID3D12PipelineLibrary, and new PSOs are
automatically committed to this library. PSO creation will also consult
this internal pipeline library if applications do not provide their own
blob.

The strategy for updating the cache is based on a read-only cache which
is mmaped from disk, with an exclusive write-only portion for new blobs,
which ensures some degree of safety if there are multiple
concurrent processes using the same cache.

The memory layout of the disk cache is optimized to be very efficient
for appending new blobs, just simple fwrites + fflush.
The format is also robust against sliced files, which solves the problem
where applications tear down without destroying the D3D12 device
properly.

This structure is very similar to Fossilize, and in fact the idea is to
move towards actually using the Fossilize format directly later.
This implementation prepares us for this scenario where e.g. Steam could
potentially manage the vkd3d-proton cache.

The main complication in this implementation is that we have to merge
the read-only and write caches.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-06 16:36:26 +02:00
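
The append-only layout described in the commit above can be illustrated with a minimal sketch. The record header, the function names and the recovery rule below are assumptions made for illustration; they are not the actual vkd3d-proton or Fossilize format. The point is only that appending is a plain fwrite + fflush, and that a reader can stop at the first incomplete record, so a file cut off mid-write still yields every complete blob before the cut.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record header; NOT the real vkd3d-proton on-disk layout. */
struct record_header
{
    uint64_t hash; /* key of the blob */
    uint32_t size; /* payload size in bytes */
};

/* Append one record: header, payload, then flush to hand the bytes to the OS. */
static int cache_append(FILE *file, uint64_t hash, const void *data, uint32_t size)
{
    struct record_header header;

    memset(&header, 0, sizeof(header)); /* also zeroes struct padding */
    header.hash = hash;
    header.size = size;

    if (fwrite(&header, sizeof(header), 1, file) != 1)
        return -1;
    if (size && fwrite(data, 1, size, file) != size)
        return -1;
    return fflush(file) == 0 ? 0 : -1;
}

/* Count complete records from the start of the file. A truncated ("sliced")
 * tail ends the scan instead of being treated as an error. */
static size_t cache_count_complete_records(FILE *file)
{
    struct record_header header;
    unsigned char scratch[256];
    size_t count = 0;

    rewind(file);
    while (fread(&header, sizeof(header), 1, file) == 1)
    {
        uint32_t remaining = header.size;
        while (remaining)
        {
            size_t chunk = remaining < sizeof(scratch) ? remaining : sizeof(scratch);
            if (fread(scratch, 1, chunk, file) != chunk)
                return count; /* payload cut short: ignore the partial record */
            remaining -= (uint32_t)chunk;
        }
        count++;
    }
    return count;
}
```

A real cache would likely also store a checksum per record so that a corrupted (rather than merely truncated) tail can be detected; that is left out here to keep the sketch short.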
Hans-Kristian Arntzen 3095ed84d3 cache: Add concept of internal pipeline libraries.
For internal pipeline libraries, we want a somewhat different strategy.

- PSOs are keyed by hash instead of user key.
- We want the option to conditionally store SPIR-V and PSO blobs.
  For internal caches, there isn't much of a reason to store PSO blobs
  since the disk cache is going to be primed anyways.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-05 14:12:20 +02:00
Hans-Kristian Arntzen db9b9a13de cache: Fix misleading comment about chunk alignment.
It's 8. Used to be 4 before some other fixes ...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-05 14:12:20 +02:00
Hans-Kristian Arntzen 637834dc75 vkd3d: Make private_root_signatures actually private.
Makes sure that we drop private root signature device references when
public pipeline state refcount hits 0.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-05 14:12:20 +02:00
Hans-Kristian Arntzen ca0a186a4b common: Add some file utils.
Supports more advanced file operations than we'd normally need.
Intended to be used by magic disk cache.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-04-05 14:12:20 +02:00
Philip Rebohle 829c02bf90 vkd3d: Remove format compatibility info for R11G11B10.
Not allowing R32 views may give us compression back in some scenarios.

Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-05 11:52:23 +02:00
Philip Rebohle e4184830c5 vkd3d: Add ClearUAV path that uses buffer-to-image copies.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-05 11:52:23 +02:00
Philip Rebohle d1425ee4d1 vkd3d: Use VK_ACCESS_MEMORY_{READ,WRITE}_BIT where appropriate
Buggy RADV versions no longer work due to missing extension support.

Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-04-05 11:52:23 +02:00
Denis Barkar 8dda6df729 vkd3d: Force non-invariant position for Serious Sam 4.
Signed-off-by: Denis Barkar <dbarkar@nvidia.com>
2022-04-01 15:34:52 +02:00
Joshua Ashton 2ed513b99a vkd3d: Remove VKD3D_MAX_DYNAMIC_STATE_COUNT
This was off by one, at some point, which could cause a stack buffer overrun which is naughty.

Replace this with just an ARRAY_SIZE on the dynamic_state_list for the array size.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
2022-04-01 15:19:18 +02:00
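
A minimal sketch of the pattern this commit describes: sizing the output array from the table itself with ARRAY_SIZE, so the bound cannot drift out of sync the way a separately maintained maximum can. The table contents and the function name are hypothetical; only ARRAY_SIZE and dynamic_state_list are named in the commit message.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table mapping a flag bit to a dynamic state id. */
static const struct
{
    unsigned int flag;
    int state;
} dynamic_state_list[] = {
    { 1u << 0, 100 },
    { 1u << 1, 101 },
    { 1u << 2, 102 },
};

/* Writes the enabled states to out and returns how many were written.
 * out must hold ARRAY_SIZE(dynamic_state_list) entries, which is the most
 * this loop can ever produce. */
static size_t collect_dynamic_states(unsigned int flags, int *out)
{
    size_t i, count = 0;

    for (i = 0; i < ARRAY_SIZE(dynamic_state_list); i++)
    {
        if (flags & dynamic_state_list[i].flag)
            out[count++] = dynamic_state_list[i].state;
    }
    return count;
}
```

A caller would declare its stack buffer as `int states[ARRAY_SIZE(dynamic_state_list)];`, so adding a row to the table grows the buffer automatically.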
Hans-Kristian Arntzen 241078d7e8 vkd3d: Add scalar UBO layout requirement for SM 6.0.
Needed to support SM 6.0 CBufferLoad.
This path is mostly unused since it's opt-in in DXC and horribly broken
...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-30 20:13:32 +02:00
Hans-Kristian Arntzen 6f43f450c8 vkd3d: Disable primitive restart when using non-compatible topologies.
Primitive restart is only used for strip primitive types, and must be
ignored for lists. Use and require extended_dynamic_state2 for this
purpose.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-30 16:12:16 +02:00
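
The rule in this commit can be sketched as a small predicate. The enum and function names are hypothetical stand-ins rather than the Vulkan or vkd3d-proton types; the sketch only shows that the restart flag requested by the pipeline state is masked by whether the bound topology is a strip type.

```c
#include <stdbool.h>

/* Hypothetical topology enum, standing in for the real API enums. */
enum topology
{
    TOPOLOGY_POINT_LIST,
    TOPOLOGY_LINE_LIST,
    TOPOLOGY_LINE_STRIP,
    TOPOLOGY_TRIANGLE_LIST,
    TOPOLOGY_TRIANGLE_STRIP
};

static bool topology_is_strip(enum topology topology)
{
    return topology == TOPOLOGY_LINE_STRIP || topology == TOPOLOGY_TRIANGLE_STRIP;
}

/* Primitive restart only takes effect for strips; for lists it is ignored
 * even if the pipeline state asked for it. */
static bool effective_primitive_restart(bool requested, enum topology topology)
{
    return requested && topology_is_strip(topology);
}
```

Because the topology can change at draw time, the result has to be re-evaluated whenever it changes, which is presumably why the commit relies on a dynamic state for it.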
Hans-Kristian Arntzen cfeaa18b09 vkd3d: Enable MUTABLE_SINGLE_SET for Intel GPUs.
There are strict limits on number of descriptors which can be used,
and we have to use MUTABLE + single set to make this work.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-30 12:25:20 +02:00
Hans-Kristian Arntzen da63f0beac vkd3d: Compute range_end after sparse checks in copy tracking.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-30 12:13:25 +02:00
Philip Rebohle 6378f1b880 vkd3d: Optimize WriteBufferImmediate for consecutive writes.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-30 11:51:10 +02:00
Hans-Kristian Arntzen 2e8fb27182 vkd3d: Correctly handle dynamic depth/stencil attachment infos.
{depth,stencil}AttachmentFormat and p{Depth,Stencil}Attachment are only
allowed if the format contains that aspect. Check this explicitly.

Fixes some validation errors.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-24 17:55:32 +01:00
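
The explicit aspect check mentioned above might look roughly like the following sketch. The format enum, aspect flags and struct are hypothetical simplifications, not the Vulkan types; the idea is just that the depth and stencil attachments are each used only when the format actually has that aspect.

```c
#include <stdbool.h>

/* Hypothetical formats and aspect flags for illustration. */
enum ds_format { FORMAT_NONE, FORMAT_D32, FORMAT_D24S8, FORMAT_S8 };

#define ASPECT_DEPTH   1u
#define ASPECT_STENCIL 2u

static unsigned int format_aspects(enum ds_format format)
{
    switch (format)
    {
        case FORMAT_D32:   return ASPECT_DEPTH;
        case FORMAT_D24S8: return ASPECT_DEPTH | ASPECT_STENCIL;
        case FORMAT_S8:    return ASPECT_STENCIL;
        default:           return 0;
    }
}

struct ds_attachment_selection
{
    bool use_depth;   /* provide a depth attachment and depth format */
    bool use_stencil; /* provide a stencil attachment and stencil format */
};

/* Only reference an attachment for aspects the format really contains. */
static struct ds_attachment_selection select_ds_attachments(enum ds_format format)
{
    struct ds_attachment_selection selection;
    unsigned int aspects = format_aspects(format);

    selection.use_depth = (aspects & ASPECT_DEPTH) != 0;
    selection.use_stencil = (aspects & ASPECT_STENCIL) != 0;
    return selection;
}
```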
Hans-Kristian Arntzen 1b5f7e8fc3 vkd3d: Use VkImageViewCreateInfo correctly.
For EXTENDED_USAGE, we still need to restrict image usage when creating
concrete views.
Use VkImageViewUsageCreateInfo to restrict usage flags to the kind of
view we're creating.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-24 17:55:32 +01:00
Hans-Kristian Arntzen cf65a78570 vkd3d: Rename DSV UNKNOWN workaround query.
Make it more obvious what it's really trying to check.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-23 22:36:00 +01:00
Philip Rebohle 1d3957fe6d vkd3d: Do not create pipeline variants for NULL DSV.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-23 22:22:09 +01:00
Philip Rebohle c9abcfa656 vkd3d: Use d3d12_graphics_pipeline_state_has_unknown_dsv_format more consistently.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-23 22:22:09 +01:00
Hans-Kristian Arntzen 03427c6ee6 vkd3d: Explicitly use NULL RTV mask for dual source blending.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-23 14:29:51 +01:00
Hans-Kristian Arntzen 6273780e50 vkd3d: Accurately validate dual source blend state.
We need to check RTVFormats and IO signature.
If an RTVFormat uses a non-null format and the IO signature also has an
active entry, we must fail compilation.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-23 14:29:51 +01:00
Hans-Kristian Arntzen 6e915dd2c0 vkd3d: Use rt_count as basis for binding RTVs.
Found some validation errors where rt_count != rtv_active_mask,
and blending used rt_count instead of rtv_active_mask. If shader renders
to a NULL attachment, we must make sure that it's part of the PSO
interface.

Also, use rt_count rather than active mask when beginning render pass.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-23 14:29:51 +01:00
Philip Rebohle 34f5fc6a31 vkd3d: Do not create pipeline variants for NULL RTVs.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-22 13:06:00 +01:00
Hans-Kristian Arntzen 63530501a5 vkd3d: Require VK_EXT_extended_dynamic_state.
This is basically required to avoid horrible stutter and poor performance,
and it is widely supported.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-16 17:48:21 +01:00
Hans-Kristian Arntzen dd6534f3f8 vkd3d: Report enabled debug ring size as INFO instead of WARN.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:26:27 +01:00
Hans-Kristian Arntzen 09997b4dd8 vkd3d: Fish for message clues on device lost.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:26:27 +01:00
Hans-Kristian Arntzen 6d35f98e59 vkd3d: Emit deadca7 cookie for num_words in debug ring.
Makes it somewhat feasible to fish for message begin codes in the
stream.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:26:27 +01:00
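
To illustrate why a cookie makes it feasible to find message starts in a raw word stream, here is a minimal sketch. The bit layout is an assumption made for this example (the 28-bit cookie 0xdeadca7 in the upper bits and a 4-bit num_words in the lower bits); the commit message does not spell out the real encoding, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: upper 28 bits hold the cookie, lower 4 bits hold num_words. */
#define DEBUG_COOKIE   0xdeadca7u
#define COOKIE_SHIFT   4u
#define NUM_WORDS_MASK 0xfu

static uint32_t header_encode(uint32_t num_words)
{
    return (DEBUG_COOKIE << COOKIE_SHIFT) | (num_words & NUM_WORDS_MASK);
}

static int header_is_valid(uint32_t word)
{
    return (word >> COOKIE_SHIFT) == DEBUG_COOKIE;
}

static uint32_t header_num_words(uint32_t word)
{
    return word & NUM_WORDS_MASK;
}

/* Scan forward from start for the next word that looks like a message header.
 * Returns its index, or -1 if none is found. */
static ptrdiff_t ring_find_message(const uint32_t *ring, size_t count, size_t start)
{
    size_t i;

    for (i = start; i < count; i++)
    {
        if (header_is_valid(ring[i]))
            return (ptrdiff_t)i;
    }
    return -1;
}
```

A payload word can still collide with the cookie by chance, so a scan like this is a heuristic for post-mortem inspection rather than a reliable parser.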
Hans-Kristian Arntzen e61cc0234a vkd3d: Allow debug ring to know about device lost scenarios.
For this case, we want to block and teardown the debug ring thread.
It's okay to fish for dead messages in the ring, since we know there
won't be more GPU work submitted.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:26:27 +01:00
Hans-Kristian Arntzen c54895b4b7 vkd3d: Fix overflow of ring_size.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:26:27 +01:00
Hans-Kristian Arntzen a6700d3d85 vkd3d: Make debug ring aware of potential crash scenarios.
If we expect device loss (breadcrumb debug), we need to use DEVICE uncached/coherent,
since we might not be able to flush GPU caches properly.

We also need to remove the idea of being able to copy out the control
block back to host. This is too brittle and we should instead just place
the control block in PCI-e BAR instead. Rethink how we pass messages
from GPU to CPU to make it more robust.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:26:27 +01:00
Hans-Kristian Arntzen 33b9166fec vkd3d: Make device coherency extension optional for breadcrumbs.
Some implementations can support markers, but not explicit coherency.
Buffer markers are often uncached either way, so should be fine ...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:07:56 +01:00
Hans-Kristian Arntzen 972ce74ac6 vkd3d: When using breadcrumbs, consider that WaitSemaphore can be buggy.
The spec says that on device loss, the driver must return DEVICE_LOST in
finite time, but this does not happen on NV drivers. Use a long timeout
instead in this scenario.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:07:56 +01:00
Robin Kertels 5f97d1eb70 vkd3d: Implement NV_checkpoint path for breadcrumbs.
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:07:56 +01:00
Robin Kertels a6ea442819 vkd3d: Enable VK_NV_device_diagnostic_checkpoints.
Signed-off-by: Robin Kertels <robin.kertels@gmail.com>
2022-03-11 13:07:56 +01:00
Hans-Kristian Arntzen 365dd05557 vkd3d: Add breadcrumbs support.
AMD path for this commit.
Idea is that we can automatically instrument markers with command list
information we can make some sense of in vkd3d-proton.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:07:56 +01:00
Hans-Kristian Arntzen 5017b3723c vkd3d: Enable VK_AMD_device_coherent_memory.
For breadcrumbs support, along with buffer marker.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 13:07:56 +01:00
Hans-Kristian Arntzen 6a4f2842cb cache: Move d3d12_pipeline_library to internal references.
Allow us to hold internal magic pipeline libraries without creating
cycles.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 12:29:32 +01:00
Hans-Kristian Arntzen 18a5315db4 cache: Refactor lock strategy of internal hashmaps.
Rather than having to take writer lock on serialize calls from the
outside, we should just take locks when accessing the internal hashmaps
instead.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 12:29:32 +01:00
Hans-Kristian Arntzen 7c228139c3 cache: Refactor out pipeline library serialization.
If outer code has taken a reader lock, we don't need to lock again.
Also allows a reader lock to go GetSerializedSize + Serialize with one
reader lock.

This will be relevant for magic cache implementation.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-11 12:29:32 +01:00
Hans-Kristian Arntzen 30b4abcea1 vkd3d: Do not discard images in Clear*View() unless we have to.
It's redundant to add an UNDEFINED transition here for committed
resources. We need it for sparse and placed resources to handle aliasing
rules, but that's it.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-10 15:14:55 +01:00
Hans-Kristian Arntzen 17b1ffb41a vkd3d: Add path to use GENERAL depth-stencil images.
On some implementations, it doesn't matter for performance what we use,
and we can avoid a lot of ugly barriers this way.

Opt in to using this on GPUs we know handle it well,
otherwise, keep using the tracking paths.

With VK_KHR_dynamic_rendering, this is now feasible to do since we no longer
have to deal with shenanigans related to VkRenderPass layouts and
complicated compatibility rules.

To make this work with the existing framework, just need to consider
that GENERAL can be a common layout alongside DEPTH_STENCIL_OPTIMAL,
which are both common layouts that do not need to be tracked at all.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-10 15:14:55 +01:00
Hans-Kristian Arntzen f9da3bf564 vkd3d: Add VK_KHR_driver_properties.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-10 15:14:55 +01:00
Hans-Kristian Arntzen c6149b47cd cache: Handle ref-count rules for multiple LoadPipeline/StorePipeline.
In pipeline libraries, the library holds on to private references of the
libraries so that they can be rapidly loaded on-demand.

This behavior is verified by API tests.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-09 18:35:09 +01:00
Hans-Kristian Arntzen cc08339624 vkd3d: Use internal_refcounts for pipeline state.
When we store pipeline state in libraries we have to manage lifetime a
bit differently, which requires internal refcounts of some sort.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-09 18:35:09 +01:00
Hans-Kristian Arntzen 422f6804fb vkd3d: Enable VK_KHR_create_renderpass2.
Required extension by VK_KHR_fragment_shading_rate and
VK_KHR_separate_depth_stencil_layouts, but we don't care about enabling
any features or use it directly.

Needed to silence validation errors.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-09 16:35:05 +01:00
Georg Lehmann 14a06680d9 vkd3d: Remove unused renderpass remains.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
2022-03-08 18:34:18 +01:00
Hans-Kristian Arntzen 409dc57645 vkd3d: Properly decay depth-stencil images.
When performing a decay of a DSV resource, make sure to transition all
subresources, not just the particular aspect being transitioned.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-08 18:11:50 +01:00
Hans-Kristian Arntzen b330900659 vkd3d: Do not transition all aspects for single subresource.
We require separate DS layouts.
Fixes validation errors where we transition from read-only, but our
neighbor aspect might have been optimal.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-08 18:11:50 +01:00
Philip Rebohle 9a408367dc vkd3d: Remove render pass cache.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 51e6b2bbbe vkd3d: Remove render pass from command list state.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 94f82d1085 vkd3d: Get rid of pipeline variant flags.
These only existed for VRS attachment, which is no longer
necessary with VK_KHR_dynamic_rendering.

Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 1a68267962 vkd3d: Remove framebuffer list from d3d12_command_allocator.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle c4f88951fc vkd3d: Use dynamic rendering for regular draw calls.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 9673ac173d vkd3d: Use dynamic rendering for pipeline creation.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 3783eaf4f7 vkd3d: Implement swap chain blits using dynamic rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 024ef02f9b vkd3d: Implement meta image copies using dynamic rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 549d4ee63f vkd3d: Remove render pass list from d3d12_command_allocator.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 6186cc1f0e vkd3d: Implement clears using dynamic rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Philip Rebohle 2c92ab7d1e vkd3d: Enable and require VK_KHR_dynamic_rendering.
Signed-off-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
2022-03-08 17:44:47 +01:00
Hans-Kristian Arntzen 9fbae668fe vkd3d: Ensure that all SPIR-V modules are properly cached.
When we require inter-stage fixups, we need a solution for partial
validity of the cache. Accept the modules all or nothing.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-08 16:43:30 +01:00
Hans-Kristian Arntzen ce45297695 vkd3d: Enable debug_utils if vk_debug is enabled.
Allows debug callbacks to go through in Wine.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-08 16:40:51 +01:00
LemiSt24 c411d0d0c2 vkd3d: Add case for D3D12_STATE_SUBOBJECT_TYPE_GLOBAL_ROOT_SIGNATURE
Signed-off-by: LemiSt24 <lennard.strohmeyer@gmail.com>
2022-03-07 16:15:22 +01:00
Hans-Kristian Arntzen 9a63df07b8 vkd3d: Add punchthrough path for descriptor copies.
Proves out the viability of this style of implementation. Ideally we'd
have a more officially sanctioned way of doing similar things later :)

Unfortunately, the overhead removal is too great to ignore on target
platform. Makes use of a private (reserved) extension for now ...

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-03-04 13:34:18 +01:00
Mike Blumenkrantz 1d76803aff vkd3d: optimize memory access pattern for sampler descriptors
this removes them from the bitscan path

Signed-off-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
2022-03-01 22:50:45 +01:00
Hans-Kristian Arntzen dc622fc715 vkd3d: Recycle command pools in Elden Ring.
Very churny.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 18:40:52 +01:00
Hans-Kristian Arntzen 9817c52d24 vkd3d: Add workaround to ignore mismatch driver/device in PSO library.
Elden Ring does not detect the proper error code and create a new
pipeline library. Instead, create a fresh new library, which works
around the issue.

The game has a pattern of LoadPipeline -> if fail -> CreatePSO ->
StorePipeline. Sometimes, in the same process it will LoadLibrary from
its own cache (could explain some stutters),
so it's very useful to have this either way.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 14:50:57 +01:00
Hans-Kristian Arntzen a8229390f9 vkd3d: Add more pipeline_library_log snippets.
Hook GetCachedBlob and various attempts to use LoadPipeline.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 14:50:57 +01:00
Hans-Kristian Arntzen 12c73ee18a swapchain: More gracefully handle SURFACE_LOST.
Just like handling min/maxImageExtent of 0, we can just fall back to
user buffers.

Elden Ring hits this case on application teardown.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 14:04:06 +01:00
Hans-Kristian Arntzen f39ece9a7c vkd3d: Enable performance workarounds for Elden Ring.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:59:08 +01:00
Hans-Kristian Arntzen c19eaac376 vkd3d: Add VKD3D_CONFIG option for command pool recycling.
Well-behaved apps should not benefit from any of this.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:59:08 +01:00
Hans-Kristian Arntzen 54fbadcc94 vkd3d: Recycle command pools.
Elden Ring in particular spams frees and allocations of command pools
despite this being a very bad idea.

Add a simple 8-entry cache which seems to take care of it.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:59:08 +01:00
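
A minimal sketch of what a small fixed-size recycling cache like the one described above could look like. The struct and function names are hypothetical, pools are represented as opaque pointers, and locking is omitted; only the 8-entry size comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_CACHE_SIZE 8

/* Fixed-size stack of idle pools. Pools are opaque pointers in this sketch. */
struct pool_cache
{
    void *pools[POOL_CACHE_SIZE];
    size_t count;
};

/* Try to keep a released pool for reuse. Returns false when the cache is
 * full, in which case the caller destroys the pool as before. */
static bool pool_cache_put(struct pool_cache *cache, void *pool)
{
    if (cache->count == POOL_CACHE_SIZE)
        return false;
    cache->pools[cache->count++] = pool;
    return true;
}

/* Take a recycled pool if one is available. Returns NULL when the cache is
 * empty, in which case the caller creates a fresh pool. */
static void *pool_cache_get(struct pool_cache *cache)
{
    return cache->count ? cache->pools[--cache->count] : NULL;
}
```

In a real Vulkan backend the pool would presumably be reset (for example with vkResetCommandPool) before being handed out again, and access would be guarded by a lock if allocators can be created and destroyed from multiple threads.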
Hans-Kristian Arntzen 4b07535909 vkd3d: Optimize memory access pattern for single descriptor copies.
We can mark a descriptor as being SINGLE_DESCRIPTOR, which means we
only need one descriptor copy. This way, we can avoid doing somewhat
expensive work (every nanosecond counts here):

- Bitscan loop
- Read deep into d3d12_device guts (often a cache miss). The memory
  index depends on the bitscan, which causes bubble.

When we have a single descriptor, we can just store the binding
information inline and avoid this jank.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:04:43 +01:00
Hans-Kristian Arntzen 84d632f194 vkd3d: Rewrite memory layout for resource descriptors.
Tune memory layout so that we can deduce various information without
making a single pointer dereference:

- d3d12_descriptor_heap*
- heap offset
- Pointer to various side data structures we need to keep around.

Instead of having one big 64 byte data structure with tons of padding,
tune it down to 32 + 8 bytes per descriptor of extra dummy data.

To make all of this work, use a somewhat clever encoding scheme for CPU
VA where lower bits store number of active bits used to encode
descriptor offset. From there, we can mask away bits to recover
d3d12_descriptor_heap. Metadata is stored inline in one big allocation,
and we can just offset from there based on extracted log2i_ceil(descriptor count).

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:04:43 +01:00
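
The encoding idea in this commit can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the actual vkd3d-proton layout: it assumes the low 5 bits of a handle are free to hold the number of index bits, that the heap base address is aligned to 2^(index_bits + 5), and that a heap holds at most 2^31 descriptors. All names are hypothetical.

```c
#include <stdint.h>

/* Assumption: the low 5 bits of a handle are free and store index_bits. */
#define HANDLE_LOW_BITS 5u
#define HANDLE_LOW_MASK ((UINT64_C(1) << HANDLE_LOW_BITS) - 1)

/* Smallest number of bits b such that 2^b >= count. */
static unsigned int log2_ceil(uint32_t count)
{
    unsigned int bits = 0;

    while (bits < 32 && (UINT64_C(1) << bits) < count)
        bits++;
    return bits;
}

/* heap_base must be aligned to 2^(index_bits + HANDLE_LOW_BITS) and
 * index must be below 2^index_bits. */
static uint64_t handle_encode(uint64_t heap_base, unsigned int index_bits, uint32_t index)
{
    return heap_base | ((uint64_t)index << HANDLE_LOW_BITS) | index_bits;
}

/* Mask covering the index field plus the low bits of this handle. */
static uint64_t handle_offset_mask(uint64_t handle)
{
    unsigned int index_bits = (unsigned int)(handle & HANDLE_LOW_MASK);
    return (UINT64_C(1) << (index_bits + HANDLE_LOW_BITS)) - 1;
}

/* Recover the heap base with pure bit arithmetic, no pointer dereference. */
static uint64_t handle_heap_base(uint64_t handle)
{
    return handle & ~handle_offset_mask(handle);
}

/* Recover the descriptor index within the heap. */
static uint32_t handle_index(uint64_t handle)
{
    return (uint32_t)((handle & handle_offset_mask(handle)) >> HANDLE_LOW_BITS);
}
```

For a heap of 1000 descriptors, log2_ceil gives 10 index bits, so the base must be aligned to 2^15 bytes in this sketch; any handle into that heap then decodes back to the same base and to its own index.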
Hans-Kristian Arntzen b309913b6d vkd3d: Use unsafe_impl in CopyDescriptorsSimple.
This is an ultra-hot path and seems to show up somehow on profile.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-25 13:04:43 +01:00
Hans-Kristian Arntzen c29d005ef4 vkd3d: Don't enable fast descriptor copy path for descriptor QA.
The hooks are in the generic function.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 16:42:00 +01:00
Hans-Kristian Arntzen 8a46c21254 vkd3d: Add VKD3D_CONFIG to skip memory allocator clears.
For cases where games spam committed allocations and don't use
NOT_ZEROED. We still rely on zerovram behavior for initial backing which
should be enough in most cases.

Strictly speaking however, we are forced to clear the allocations every
time if application does not use the flag correctly.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:52:05 +01:00
Hans-Kristian Arntzen 76ca492a39 vkd3d: Add some debug logging for when clear passes happen.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:52:05 +01:00
Hans-Kristian Arntzen 83c4e62660 vkd3d: Bump suballocation limit to 2 MiB.
This is a more principled limit since that's the huge page size.

Avoids some allocation spam.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:14:22 +01:00
Hans-Kristian Arntzen 4bea653504 vkd3d: Fix CopyTiles for suballocated linear resources.
Forgot to offset buffer offset. Fun!
Found when bumping VA allocation limit to 2 MiB instead of 1 MiB.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-24 12:14:22 +01:00
Hans-Kristian Arntzen edbf49aad4 vkd3d: Support opt-in to single MUTABLE set.
Useful for Intel since Intel hardware cannot support more than 1M
descriptors in general, and opting in to correct behavior should improve
CPU overhead as well when copying descriptors.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 17:08:25 +01:00
Hans-Kristian Arntzen e0af8f2810 vkd3d: Make error message for buffer alignment more direct.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:37:12 +01:00
Hans-Kristian Arntzen b066e72243 swapchain: Add env-var to override swapchain images.
For perf debug mostly.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:36:36 +01:00
Hans-Kristian Arntzen 15704b2419 vkd3d: Optimize descriptor copies for common code paths.
The common path that we really need to optimize for is CBV_SRV_UAV +
Simple + 1 descriptor.

Descriptor benchmark shows an almost 50% reduction in overhead now.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:35:36 +01:00
Hans-Kristian Arntzen c725c29bb6 vkd3d: Inline query for set/binding from set_index.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:35:36 +01:00
Hans-Kristian Arntzen 2f6a91e772 vkd3d: De-virtualize query for descriptor size.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-21 16:35:36 +01:00
Hans-Kristian Arntzen 1cc8afcc8e vkd3d: Fix potential crashes when VK_KHR_dynamic_rendering is added.
Checking for pNext here is too brittle and causes crashes when dynamic
rendering path is added.
Also need to chain in existing pNexts.

Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
2022-02-17 11:27:25 +01:00