Commit Graph

381 Commits

Author SHA1 Message Date
Samuel Pitoiset ba5e25ed29 radv: ignore pInheritanceInfo for primary command buffers
From the Vulkan spec:
"If this is a primary command buffer, then this value is ignored."

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-26 10:39:43 +02:00
Samuel Pitoiset 07cb1373a2 radv: fix HTILE metadata initialization in presence of subpass clears
If the driver ends up by performing a slow depthstencil clear,
the HTILE metadata won't be initialized correctly.

This fixes random VM faults on Polaris while running CTS
with Bas's runner. This doesn't seem to regress performance.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-25 17:38:59 +02:00
Samuel Pitoiset 41f6096c26 radv: use EOP_DATA_SEL_* instead of magic numbers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-21 10:31:02 +02:00
Samuel Pitoiset 65b3fed037 radv: always initialize the clear depth/stencil values to 0
Similar to the clear color values.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Samuel Pitoiset 204cf5714a radv: always initialize the clear color values to 0
Having random data in there is probably not the best.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Samuel Pitoiset 4b564bd612 radv: always initialize the DCC predicate to FALSE
This might eventually skip some useless DCC decompression
passes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-20 13:21:42 +02:00
Bas Nieuwenhuizen ed06b1cdca radv: Don't check for pipeline being set in draw.
Draws without pipeline are definitely not allowed.

Fixes Coverity issue 1434216.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-19 22:35:13 +02:00
Marek Olšák 6703fec58c amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-19 13:08:50 -04:00
Bas Nieuwenhuizen a3d93eec7c radv: Use less conservative approximation for context rolls.
Drops the number of time we set the scissor by 4x for F1 2017,
which results in a consistent performance improvement of about 4%.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-18 16:21:10 +02:00
Samuel Pitoiset f8e2c4c57c Revert "radv: always set/load both depth and stencil clear values"
This fixes a rendering regression with RoTR.

This reverts commit 4bdad9fadd.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 16:52:06 +02:00
Samuel Pitoiset 706d51de7f radv: update ZRANGE_PRECISION in radv_update_bound_fast_clear_ds()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:06 +02:00
Samuel Pitoiset fa8bc821a8 radv: clean up radv_{set,load}_depth_clear_regs() helpers
And replace _regs by _metadata because it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:04 +02:00
Samuel Pitoiset 4bdad9fadd radv: always set/load both depth and stencil clear values
I don't think that matter much to emit both values and that
makes the code a bit simpler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:02 +02:00
Samuel Pitoiset 2193a6a828 radv: update the fast ds clear values only if the image is bound
It's unnecessary to update the fast depth/stencil clear values
if the fast cleared depth/stencil image isn't currently bound.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:54:00 +02:00
Samuel Pitoiset be794fa26b radv: clean up radv_{set,load}_color_clear_regs() helpers
And replace _regs by _metadata because it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:53:58 +02:00
Samuel Pitoiset d7b772abb4 radv: update the fast color clear values only if the image is bound
It's unnecessary to update the fast color clear values if the
fast cleared color image isn't currently bound.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:53:55 +02:00
Samuel Pitoiset 68dead112e radv: update the ZRANGE_PRECISION value for the TC-compat bug
On GFX8+, there is a bug that affects TC-compatible depth surfaces
when the ZRange is not reset after LateZ kills pixels.

The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match
the last fast clear value. Because the value is set to 1 by default,
we only need to update it when clearing Z to 0.0.

We also need to set the depth clear regs and to update
ZRANGE_PRECISION when initializing a TC-compat depth image to 0.

Original patch from James Legg.

This fixes random CTS fails with
dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-14 11:38:29 +02:00
Alex Smith dfff9fb6f8 radv: Handle GFX9 merged shaders in radv_flush_constants()
This was not previously handled correctly. For example,
push_constant_stages might only contain MESA_SHADER_VERTEX because
only that stage was changed by CmdPushConstants or
CmdBindDescriptorSets.

In that case, if vertex has been merged with tess control, then the
push constant address wouldn't be updated since
pipeline->shaders[MESA_SHADER_VERTEX] would be NULL.

Use radv_get_shader() instead of getting the shader directly so that
we get the right shader if merged. Also, skip emitting the address
redundantly - if two merged stages are set in push_constant_stages
this change would have made the address get emitted twice.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:34 +01:00
Alex Smith 7ca0167ae9 radv: Consolidate GFX9 merged shader lookup logic
This was being handled in a few different places, consolidate it into a
single radv_get_shader() function.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-01 08:53:31 +01:00
Bas Nieuwenhuizen 38933c1151 radv: Add option to print errors even in optimized builds.
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.

In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Samuel Pitoiset 02c7916298 radv: fix emitting descriptor pointers with LLVM < 7
This was terribly wrong, I forced use of 32-bit pointers when
emitting shader descriptor pointers. This fixes GPU hangs with
LLVM 5&6 because 32-bit pointers are only supported with LLVM 7.

Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-30 11:38:54 +02:00
Samuel Pitoiset 88d1ed0f81 radv: emit shader descriptor pointers consecutively
This reduces the number of SET_SH_REG packets which are emitted
for applications that use more than one descriptor set per stage.

We should be able to emit more SET_SH_REG packets consecutively
(like push constants and vertex buffers for the vertex stage),
but this will be improved later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-29 10:07:18 +02:00
Bas Nieuwenhuizen a29bc043ae radv: Implement VK_KHR_draw_indirect_count.
Literally the same as the AMD ext.

Passes *indirect_draw_count* CTS tests.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-28 12:08:26 +02:00
Bas Nieuwenhuizen 6914d5a2c0 radv: Implement alternate GFX9 scissor workaround.
This improves dota2 performance for me by 11% when I force the
GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
threadripper), which should be a large part of the radv-amdvlk gap.
(For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
fps)

It looks like dota2 rendered the GUI with a bunch of draws with
a SetScissors before almost each draw, causing a lot of pipeline
stalls.

I'm not really happy with the duplication of code, but overriding
radeon_set_context_reg would also be messy since we have the
pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
as some memory->context reg loads for which things would be more
complicated.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-28 12:04:25 +02:00
Samuel Pitoiset 36a4d6d081 radv: add support for 32-bit pointers in user data SGPRs
We still use 64-bit GPU pointers for all ring buffers because
llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit
GPU pointers for now. This can be improved later anyways.

Vega10:
Totals from affected shaders:
SGPRS: 1008722 -> 1026710 (1.78 %)
VGPRS: 706580 -> 707136 (0.08 %)
Spilled SGPRs: 22555 -> 22209 (-1.53 %)
Spilled VGPRs: 75 -> 75 (0.00 %)
Code Size: 34819208 -> 35202140 (1.10 %) bytes
Max Waves: 175423 -> 175086 (-0.19 %)

Polaris10:
Totals from affected shaders:
SGPRS: 1029849 -> 1036517 (0.65 %)
VGPRS: 709984 -> 708872 (-0.16 %)
Spilled SGPRs: 22672 -> 22309 (-1.60 %)
Spilled VGPRs: 82 -> 66 (-19.51 %)
Scratch size: 76 -> 60 (-21.05 %) dwords per thread
Code Size: 34915336 -> 35309752 (1.13 %) bytes
Max Waves: 151221 -> 151677 (0.30 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:22 +02:00
Samuel Pitoiset 0d1406ad12 radv: allocate the upload BO in the 32-bit addr space
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-22 15:53:17 +02:00
Samuel Pitoiset fcba3934fc radv: add radv_emit_shader_pointer() helper
For future work (support for 32-bit GPU pointers).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-17 21:28:59 +02:00
Grazvydas Ignotas 4fdce205dd radv: assorted typo fixes
Trivial.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-10 11:50:46 +03:00
Samuel Pitoiset 1d766b0196 radv: only disable out-of-order rast for perfect occlusion queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-02 10:33:22 +02:00
Samuel Pitoiset 5c1233ed62 radv: use a global BO list only for VK_EXT_descriptor_indexing
Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.

We might find a better solution in the future, but for now
just keeps two paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 16:18:18 +02:00
Samuel Pitoiset 7bd5367546 Revert "radv: Don't store buffer references in the descriptor set."
In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.

One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.

With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.

This reverts commit ab6cadd3ec.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-20 16:18:13 +02:00
Bas Nieuwenhuizen d1ce31d36c radv: Add bound checking workaround for dynamic buffers.
I have seen a few applications and games do the dynamic buffer bounds incorrectly, this
make it easier to work around, e.g. for debugging.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-19 16:13:25 +02:00
Bas Nieuwenhuizen ab6cadd3ec radv: Don't store buffer references in the descriptor set.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-18 22:56:54 +02:00
Samuel Pitoiset 7e84d69861 radv: handle CMASK/FMASK transitions only if DCC is disabled
DCC implies a fast-clear eliminate, so I think this sounds
reasonable.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:59 +02:00
Samuel Pitoiset 584d1f2711 radv: merge radv_handle_{dcc,cmask}_image_transition() functions
Into radv_handle_color_image_transition().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:56 +02:00
Samuel Pitoiset d5812b900b radv: add radv_init_color_image_metadata() helper
In order to separate initialization from decompression. In the
future, that will allow us to init DCC/FMASK/CMASK in one shot.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:54 +02:00
Samuel Pitoiset fde7b90ecf radv: make radv_initialise_cmask() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:51 +02:00
Samuel Pitoiset 790f6e4718 radv: clean up radv_handle_image_transition() a bit
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:49 +02:00
Samuel Pitoiset 6967d32beb radv: add radv_handle_color_image_transition() helper
To handle CMASK, FMASK and DCC transitions in the same place.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:45 +02:00
Samuel Pitoiset c6b1f1c97a radv: handle DCC image transitions before CMASK/FMASK transitions
Mostly because DCC implies a fast-clear eliminate and we
should be able to skip some DCC decompressions by setting
a predicate like for CMASK and FMASK.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-16 14:20:42 +02:00
Bas Nieuwenhuizen ed94638156 radv: Enable RB+ where possible.
According to Marek, not enabling it on Stoney has a significant
negative performance impact. (And I guess this might impact
performance on Raven as well)

The register settings are pretty much copied from radeonsi. I did
not put this in the pipeline as that would make the pipeline more
dependent on the format which mean we would have to have more
pipelines for the meta shaders.

v2: Don't clear RB+ regs if not enabled as the CLEAR_STATE packet
    does already.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-11 01:19:10 +02:00
Samuel Pitoiset 9f6a28eb27 radv: add shader BOs to the list at pipeline bind time
Otherwise, the shader BOs are not added to the list on SI because
prefetching isn't supported. Calling radv_cs_add_buffer() in the
prefetch codepath was a bad idea.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105952
Fixes: 4ad7595f35 ("radv: rename radv_emit_prefetch() to radv_emit_prefetch_L2")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Turo Lamminen <turo@alternativegames.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-10 21:55:28 +02:00
Bas Nieuwenhuizen 41fbcc7901 radv: Always reset draw user SGPRs after secondary command buffer.
As we sometimes reset them to -1, -1 does not mean that they are
not written by the secondary command buffer.

Fixes: ad11fc3571 "radv: don't emit unneeded vertex state."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-09 23:04:42 +02:00
Bas Nieuwenhuizen 74b0b869dd radv: Don't set instance count using predication.
The packet can sometimes be skipped, but we still think the change takes effect.

This just makes the packet always take effect.

Fixes: ad11fc3571 "radv: don't emit unneeded vertex state."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105942
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-09 23:04:35 +02:00
Samuel Pitoiset 04e609f1f8 radv: fix prefetching of vertex shader and VBOs on SI
Forgot one check... Too many mistakes for a simple change.

Fixes: f1d7c16e85 ("radv: fix prefetching compute shaders on CIK and older chips")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 16:14:12 +02:00
Samuel Pitoiset 0fc9113ac5 radv: add radv_image_has_{cmask,fmask,dcc,htile}() helpers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 11:21:10 +02:00
Samuel Pitoiset f882c62218 radv: add radv_clear_{cmask,dcc} helpers
They will help for DCC MSAA textures and if we support mipmaps
in the future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-09 11:21:05 +02:00
Samuel Pitoiset 8f9f62c2db radv: don't pass the pipeline to radv_flush_constants()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 19:46:27 +02:00
Samuel Pitoiset 2bd50cceff radv: rename radv_cmd_buffer_update_vertex_descriptors()
... to radv_flush_vertex_descriptors().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-04-06 19:46:23 +02:00
Samuel Pitoiset e829a0cc1e radv: do not try to skip draw calls when VBOs upload failed
This is unnecessary because we record an error which should
be returned by vkEndCommandBuffer(), and the app shouldn't
submit a command buffer when this happens.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 19:46:21 +02:00