Straightforward, just converting to a renderpass as well. Note that
we now own the renderpass so I also added a bool to check if we own
it so we can destroy it after recording.
Doing the destruction at destroy & reset time, as reset can be called
during recording, and destroy all the time.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13721>
This is just the naive implementation that create a new renderpass
and then destroys it at the end.
I do it this way because in meta operations we are still creating
temporary subpasses for a renderpass for e.g. the resolve.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13721>
RGP expects shaders to be contiguous in memory, otherwise it explodes
because we have to generate huge captures with lot of holes.
This reduces capture sizes of Cyberpunk 2077 from ~3.5GiB to ~180MiB.
This should also help for future pipeline libraries.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13690>
If there is holes, eg. the application firsts set vertex attributes
0 and 1, then vertex attributes 0 and 7, the format of vertex attribute
1 is still the previous one, while it should be FORMAT_INVALID to avoid
a GPU hang.
This fixes a GPU hang with Yuzu.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5627
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13856>
src/amd/vulkan/radv_cmd_buffer.c:3232:75: runtime error: index 252 out
of bounds for type 'uint8_t [128]'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13951>
This was plain broken. The solution is to not require any framebuffer
changes. Silently skipping results in broken behavior. e.g:
(secondary cmdbuffer with no framebuffer)
ClearAttachment 2
translated into
bind attachment 2 as attachment 0 (skipped)
clear attachment 0
restore original bindings (skipped)
which results in clearing attachment 0, not what we wanted. It is
a small wonder CTS doesn't find it until VK_KHR_dynamic_rendering.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13699>
In the following scenario:
CmdBindPipeline()
CmdBindVertexBuffers()
CmdSetVertexInput()
CmdDraw()
CmdBindVertexBuffers()
CmdSetVertexInput()
CmdDraw()
The VBO won't be updated for the second draw because the state is
cleared when the dynamic state is emitted and the pipeline isn't dirty.
Found by inspection.
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13855>
If the same cmdbuf is submitted more than once, they were waiting on
the same fence value. Fix this by clearing the value when beginning
a new command buffer.
This might fix spurious GPU hangs, especially on GFX9.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5401
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13777>
It got introduced by "radv: optimize subpass barrier flushes for
imageless framebuffers" but never used in the final version.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13702>
Assumes cmd_buffer != NULL for this path to eliminate the checks in the generic code.
Benchmarks:
I made a microbenchmark based on Bas' vulkan microbench suite.
https://gitlab.freedesktop.org/bnieuwenhuizen/vulkan_microbench/-/merge_requests/1
In this benchmark, this improves vkDescriptorTemplateUpdate performance consistently by 36%.
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
DescriptorTemplateUpdate 81.2 ns 81.2 ns 8573169
->
-------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------
DescriptorTemplateUpdate 52.9 ns 52.9 ns 13306065
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13342>
The driver should always know the attachments at this point. This
should reduce the number of L2 cache flushes for imageless framebuffers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13291>
If the application first uses nontrivial divisors, the driver emits
the vertex shader VA to the upload BO rather than directly via the
user SGPRs locations. But, if the vertex input dynamic state changes,
the driver might select a different VS prolog that no longer needs
nontrivial divisors.
In this case, the driver needs to re-emit the prolog inputs because
otherwise the VS prolog will jump to the PC that is emitted via the
user SGPR locations, and the previous one was somewhere in the
upload BO...
This fixes a GPU hang with Bioshock and Zink.
Fixes: d9c7a17542 ("radv: enable VK_EXT_vertex_input_dynamic_state")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13377>
Fix defect reported by Coverity Scan.
Resource leak (RESOURCE_LEAK)
leaked_storage: Variable prolog going out of scope leaks the storage it points to
Fixes: 80841196b2 ("radv: implement dynamic vertex input state using vertex shader prologs")
Suggested-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13402>
unnecessarily dereferencing the vertex buffer info array here causes a
ton of cpu overhead due to bad cache locality, so just use a mask to
avoid loading X more cachelines into memory unnecessarily
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13320>
when the shader pipeline is known to not require any of the more complex
calculations, those calculations can be excluded from the dynamic update
code
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13320>
Only try to invalidate L2 if we actually hit one of the incoherent images.
Note we may actually insert some extra flushes at the end of a command
buffer so that we may asume the caches are clean the start of the next
command buffer. However, on average I think that case is uncommon
enough that being able to make assumptions at the start of a cmdbuffer
is beneficial. Especially since MSAA is somewhat rare in more recent
games.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13239>
This lets us pre-compile a prolog and avoid a hash table lookup during
command buffer recording, most of the time.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11717>
This doesn't actually use the functionality or implement prolog
compilation yet.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11717>
From the VK_KHR_maintenance4 spec:
"Allow the application to destroy their VkPipelineLayout object
immediately after it was used to create another object. It is no
longer necessary to keep its handle valid while the created object
is in use."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13193>
This commit enables NGG culling on all GFX10.3 GPUs by default.
A new debug flag environment variable RADV_DEBUG=nonggc is added to
disable this feature on GPUs where it is enabled by default.
The previous perf test flag RADV_PERFTEST=nggc will not be needed on
GFX10.3 anymore but it can still be used to enable the feature on
GPUs where it isn't on by default.
Totals from 58239 (45.27% of 128647) affected shaders:
VGPRs: 1989752 -> 2049408 (+3.00%); split: -3.21%, +6.21%
SpillSGPRs: 675 -> 883 (+30.81%); split: -78.07%, +108.89%
CodeSize: 72205968 -> 153572764 (+112.69%)
LDS: 0 -> 227125248 (+inf%)
MaxWaves: 1614598 -> 1646934 (+2.00%); split: +3.08%, -1.08%
Instrs: 14202239 -> 29654042 (+108.80%)
Latency: 87986508 -> 136960419 (+55.66%); split: -0.23%, +55.89%
InvThroughput: 14444832 -> 21141875 (+46.36%); split: -0.01%, +46.37%
VClause: 340794 -> 493067 (+44.68%); split: -1.33%, +46.01%
SClause: 520983 -> 738636 (+41.78%); split: -0.25%, +42.03%
Copies: 775639 -> 2787382 (+259.37%)
Branches: 296911 -> 1225431 (+312.73%)
PreSGPRs: 1316896 -> 2057270 (+56.22%); split: -0.14%, +56.36%
PreVGPRs: 1473558 -> 1658432 (+12.55%); split: -1.44%, +13.99%
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13086>
If the SGPR loc is declared, the shader loads push constants.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13149>