Decreasing the time spent in radeon_cs_memory_below_limit is the motivation.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8794>
This actually introduced some VRS related regressions and some others.
Fixes: cc5b6a0e89 ("radv: enable TC-compat HTILE with D32S8 and MSAA on GFX9+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8777>
Mipmaps+layers should be investigated and mipmaps support added
for previous gens (GFX8-9).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8124>
This is the main function that enables/disables HTILE for mipmaps.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8124>
This can probably be optimized further by checking if the levels
are contiguous in memory.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8124>
It's completely useless to decompress or resummarize levels that
are not compressed using HTILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8124>
With HTILE mipmaps support, we should check if the base level
currently in use supports compression.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8124>
This doesn't change behevior since the driver currently doesn't
support HTILE for mipmaps and also because we can only clear the
whole array layers at once. This improves consistency regarding
the fast clear color path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8124>
GFX10 can only compress the first level in the mip tail.
GFX9+ is not yet supported because mips are interleaved.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8124>
On Fiji, the CTS image can cause a hang when these are enabled.
Let's enable them for Polaris and newer only, for now.
Gitlab: #4136
Fixes: 9f43b44bf0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8646>
On GFX6, EOS events are always emitted with EVENT_WRITE_EOS.
On GFX7+, EOS events are emitted with EVENT_WRITE_EOS on the
graphics queue, and with RELEASE_MEM on the compute queue.
Fixes: 9c65f1f111 ("radv: synchronize Cmd{Set,Write}Event() using PS_DONE/CS_DONE events")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8710>
MSVC has an undocumented feature that can act as GCC weak functions.
Also fix warnings about returning a value from void functions.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7793>
D and linear are both DISPLAY micro tiling according to ac_surface
but don't work together. This fixes an issue with GFX9+.
This fixes the SkQP WritePixelsNonTexture_Gpu test.
Fixes: 69ea473eeb ("amd/addrlib: update to the latest version")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8665>
This is probably rarely used but it can be easily implemented now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8650>
We were incorrectly shifting the input VGPRs for the instance ID
for chips affected by the LS VGPR init bug (ie. Vega10 and Raven).
When there is no HS threads, the hardware loads the LS VGPR
starting from VGPR 0, so they should be shifted by two.
This fixes some sort of vertex explosion with Squad, Visage, Barn
Finders and probably more titles that use tessellation. Note that
only Vega10 and Raven were affected by this bug.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4129
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3311
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Diego Viola <diego.viola@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8694>
Previously, we might not have required all coordinates to be in WQM if
there were other args before them. We should probably also require that
the offset is in WQM.
fossil-db (GFX10.3):
Totals from 10053 (7.21% of 139391) affected shaders:
SGPRs: 911032 -> 911048 (+0.00%); split: -0.00%, +0.00%
VGPRs: 689856 -> 688412 (-0.21%); split: -0.26%, +0.05%
CodeSize: 84151460 -> 84140396 (-0.01%); split: -0.02%, +0.01%
MaxWaves: 77526 -> 77527 (+0.00%)
Instrs: 15972106 -> 15971521 (-0.00%); split: -0.01%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4153
Fixes: 4015b3651a ("aco: only require texture coordinates to be in WQM if NSA is used")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8693>
This should be supported. Note that CTS doesn't have tests for
sparseImageFloat32Atomics.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8643>
For depth/stencil images, the driver was decompressing both aspects
while it should be enough to only decompress the one that's going
to be resolved.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8561>
We used to select the stencil layout even if we should have selected
the depth/stencil one.
Fixes: e4c8491bdf ("radv: implement VK_KHR_separate_depth_stencil_layouts")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8552>
With RADV_THREAD_TRACE_BUFFER_SIZE=1073741824, the computed size
will overflow and be 4096 instead of 4294967296.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8616>
For example:
s2: %688:s[32-33] = p_linear_phi %3:s[10-11], %688:s[32-33]
would have been considered trivial.
This might happen due to parallelcopies when assigning phi registers.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 69b6069dd2 ("aco: refactor try_remove_trivial_phi() in RA")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8645>
Currently, these cannot be vectorized as in NIR
shift operands are 32bit while for 16bit-vectorization
they need to be 16bit.
No fossildb changes.
Fixes: fcd2ef23e5 ('radv: vectorize 16bit instructions')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8612>
This will allow to propagate and emit sub-register constants
on all hardware generations.
Also fixes GFX8 constant emission to not use SDWA.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
It could happen that due to inconsistent copy-propagation
v1 = p_parallelcopy v2b
instructions were left after optimization on GFX8.
Cc: 20.3
Cc: 21.0
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8260>
When NGG is used, the hw can't know the number of geometry shader
primitives. To fix that, the NGG geometry shader accumulates itself
the number of primitives by using an atomic operation directly to GDS.
Then, begin/query copy the start/stop values from GDS to the
query pool buffer using a PS_DONE event. This was actually wrong
because PS_DONE is completely asynchronous to everything and executed
when the preceding draws finish pixel shaders.
Fix this by using a COPY_DATA packet which is synced with CP. This
fixes random failures on Sienna Cichlid with
dEQP-VK.query_pool.statistics_query.*.geometry_shader_primitives.*.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8590>
Fixes several dEQP-VK.robustness.robustness2.* tests on GFX8. Generations
other than GFX8 don't fail the tests because bounds-checking is done using
the index (making it per-vertex).
fossil-db (Polaris):
Totals from 1387 (0.99% of 140385) affected shaders:
(no statistics affected)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 03a0d39366 ("aco: use MUBUF in some situations instead of splitting vertex fetches")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7834>
We check below if HTILE is in compressed state, so checking if
the image has HTILE is useless because radv_layout_is_htile_compressed()
will return FALSE if no HTILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8579>
This is already checked in radv_handle_depth_image_transition().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8579>
From comment in emit_mimg():
We don't need the bias, sample index, compare value or offset to be
computed in WQM but if the p_create_vector copies the coordinates, then it
needs to be in WQM.
fossil-db (GFX10.3):
Totals from 1778 (1.28% of 139391) affected shaders:
SGPRs: 105080 -> 105072 (-0.01%); split: -0.02%, +0.01%
VGPRs: 96800 -> 96776 (-0.02%); split: -0.07%, +0.05%
CodeSize: 10001120 -> 10001384 (+0.00%); split: -0.04%, +0.04%
MaxWaves: 18164 -> 18163 (-0.01%)
Instrs: 1883750 -> 1883598 (-0.01%); split: -0.06%, +0.05%
Cycles: 34800176 -> 34767840 (-0.09%); split: -0.10%, +0.01%
We don't have a p_create_vector if we use NSA.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8523>
In some rare cases, L2 needs to be flushed if an image is affected
by the pipe misaligned issue. This is roughly based on AMDVLK.
I confirmed that disabling TC-compat HTILE, and respectively DCC,
for the relevant images also fixes the regressions below.
This fixes some regressions introduced with L2 coherency for
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_* and for
dEQP-VK.renderpass2.suballocation.multisample_resolve.*.
Fixes: 4a783a3c78 ("radv: Use L2 coherency on GFX9+.")
Co-Authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8557>
The driver used to invalidate the vector cache for meta operations
but this has been removed and I think it should be restored to fix
a bunch of regressions on GFX8.
This probably needs to be cleaned up but this is a hotfix.
This fixes a bunch of regressions and flakes on GFX8 like
dEQP-VK.pipeline.multisample.sample_locations_ext.draw.color.samples_4.*.
Fixes: 8f8d72af55 ("radv: Use access helpers for flushing with meta operations.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8573>
I don't know why this wasn't enabled but I think it should be.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8562>
v_or_b32 with a v2b definition should use SDWA if is_partial=true.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 56345b8c61 ("aco: allow reading/writing upper halves/bytes when possible")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8577>
This restores the previous logic because L2 coherency was fully
implemented. It appears that flushing L2 metadata with a CS_DONE
event hangs.
This fixes GPU hangs with Monster Hunter World.
Fixes: 4a783a3c ("radv: Use L2 coherency on GFX9+.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8566>
The flush VA space was only allocated for command buffers on the
graphics queue. Also, the ZPASS_DONE event should never be emitted
on compute queues because it hangs.
Invalidating the L2 metadata cache is only required for coherency
between the RBs and L2, so only on the graphics queue.
The L2 cache is invalidated at beginning of any IBs and that should
also invalidate the L2 metadata cache for compute anyways.
Fixes: 4a783a3c ("radv: Use L2 coherency on GFX9+.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8494>
All expressions have been replaced by their closest equivalent. No major
simplification efforts have been made to minimize risk of regressions.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7799>
This makes for a more self-describing iteration behavior, and it gets rid
of the need for the duplicated "final check" at the bottom.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7799>
All code paths that set "found" to true either break or return before the
loop header is reached again, so the checks are unnecessary.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7799>
All expressions have been replaced by their closest equivalent. No major
simplification efforts have been made to minimize risk of regressions.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7799>
This mainly clarifies the semantics of register bounds (inclusive vs
exclusive), and further groups related varaibles together to clarify
sliding-window-style loops.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7799>
Delaying the call to adjust_max_used_regs until after get_regs_for_copies
returns puts the RA context into a state where registers past max_used_gpr
may be blocked. This isn't an issue on its own, but it adds a surprising
corner case to get_reg_simple that is easily avoided now.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7799>
When both operands of a v_sub (same apply for v_add) are mul and one
already uses clamp/omod, pick the other operand to get a chance to
combine to a MAD.
No fossils-db changes.
Co-authored-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>
It should work fine now.
This gives +1-2% improvements with Control MSAA (2x and 4x)
on Sienna.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8413>
Especially on GFX10 we can avoid pretty much all L2 flushes.
However, instead of that we have to do L2_METADATA invalidations. We
do that every time we could possibly be reading new DCC/HTILE info
from the L2 cache in shaders.
Benchmark results, basemark on high preset with a navi10 on profile_standard
(which is slower than a navi10 on default settings, please don't compare
to random navi10 results you find)
before:
5932
5928
5937
after:
6011
6013
6009
So this looks like a >1% increase.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
This way we're properly using the vulkan barrier paradigm instead
of adhoc guessing what caches need to be flushed. This is more robust
for cache policy changes as we now don't have to revisit all the meta
operations all the time.
Note that a barrier has both a src and dst part though. So
barrier:
flush src
meta op
flush dst
becomes
barrier:
flush barrier src
flush meta op dst
meta op
flush meta op src
flush barrier dst
And there are some places where we've been able to replace a CB flush
with a shader flush because that is what we'd need according to vulkan rules
(and it turns out that in the cases the CB flush mattered the app will set the
bit in one of the relevant flushes or it was needed as a result of an optimization
that we counter-acted in the previous patch.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
To cancel the optimization in radv_dst_access_flush if these helpers
get used by meta operations.
We could also remove that optimization but I think this triggers less
often as all SHADER_WRITE flushes on images not supporting STORAGE should
be meta
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
I attempted to enable it for 21.0, only 2x and 4x were supported
but there is new failures if DCC+MSAA is enabled.
Disable it again because DCC is more important than this feature and
no Mesa releases have it on GFX10+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8472>
This is broken for some reasons, and probably rare enough to
care for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8468>
If we have CMASK, we can also skip FCE like we do for DCC.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8332>
In case we don't have DCC, we can still predicate FCE with CMASK.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8332>
The FCE predicate value is only allocated if DCC is enabled.
We only want to use predication for DCC decompressions and for FCE
but not having FMASK doesn't mean the predicate is allocated.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4075
Fixes: 6e7008e94b ("radv: do not predicate FMASK decompression when DCC+MSAA is used")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8441>
dcc_slice_size is in DWORD on GFX9... Also, layers aren't supported
because they might be interleaved. Fix this by clearing the entire
DCC buffer.
Fixes: 5e8f6967b1 ("radv: add support for fast-clearing DCC layers on GFX9+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8443>
The Android Vulkan loader needs this symbol, so the addition of the
linker script broke Vulkan for Android.
(For non-Android builds: I checked that having a non-existent symbol in
the linker script works ok and doesn't put the symbol in the library)
Fixes: 41bb6459d3 ("radv: restrict exported symbols with static llvm")
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8437>
When there are no param exports in an NGG (or legacy VS) shader,
the NO_PC_EXPORT=1 is set, which means PS waves can launch before
the current stage finishes.
If the current stage has any stores, we need to make sure to wait for
those before we allow PS waves to start, so that PS can read what
these instructions stored.
Fossil DB results on Navi 10:
Totals from 45 (0.03% of 136420) affected shaders:
CodeSize: 87224 -> 87404 (+0.21%)
Instrs: 16750 -> 16795 (+0.27%)
Cycles: 69580 -> 69760 (+0.26%)
VMEM: 8022 -> 8167 (+1.81%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7868>
When there are no param exports in an NGG (or legacy VS) shader,
the NO_PC_EXPORT=1 is set by RADV, which means PS waves can launch
before the current stage finishes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7868>
If the dest image has DCC it should be re-initialized to the
uncompressed state.
Note that the driver always selects the graphics path if the dest
image has DCC, so this has no effect for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8326>
Because DCC is re-initialized to the uncompressed state after the
resolve, so if the app does a partial resolve it should be
decompressed first.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8326>
Could still be improved a little. For example, 8-bit pack without
constants could be:
(s_pack_ll(x, z) & 0x00ff00ff) | ((s_pack_ll(y, w) & 0x00ff00ff) << 8)
fossil-db (Sienna):
Totals from 136 (0.10% of 139391) affected shaders:
CodeSize: 279776 -> 278144 (-0.58%)
Instrs: 50742 -> 50470 (-0.54%)
Cycles: 211560 -> 210472 (-0.51%)
SMEM: 3607 -> 3557 (-1.39%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8421>
This can be used to work around a common class of bugs appearing as
flickering.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8104>
This is only supported on GFX8+, this fixes a ton of CTS failures
on my Pitcairn (GFX6).
Fixes: af7fb4df50 ("radv: Add sparse image queries.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8415>
This doesn't have much of an effect, but it helps avoid a
pathological case for Assassin's Creed Valhalla and a RDR2 shader with a
future change.
fossil-db (Sienna):
Totals from 55074 (39.51% of 139391) affected shaders:
SGPRs: 3515076 -> 3567744 (+1.50%); split: -0.01%, +1.51%
CodeSize: 206942120 -> 206941868 (-0.00%); split: -0.00%, +0.00%
Instrs: 39625900 -> 39625837 (-0.00%); split: -0.00%, +0.00%
Cycles: 1640088780 -> 1640088828 (+0.00%); split: -0.00%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4070
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8416>
The predication is based on the mip level, so if the image has layers
and DCC is enabled, it should only be used if the range of layers
covers the whole image.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8368>
Remove one old comment because it supports decompressing layers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8368>
If we change the virtual address we also have to change the offset in the buffer
to be mapped.
Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7953>
Found a case where we mapped a range too many.
Per the comment the constraint is:
/* [first, last] is exactly the range of ranges that either overlap the
* new parent, or are adjacent to it. This corresponds to the bind ranges
* that may change.
*/
So that means that after the ++last we the ranges[last] should still
be adjacent. So we need to test the post-increment value to see whether
it is adjacent.
Failure case:
ranges:
0: 0 - ffff
1: 10000 - 1ffff
2: 20000 - 2ffff
3: 30000 - 3ffff
new range: 10000 - 1ffff
wrong first, last: 0,3
However range 3 clearly isn't adjacent at all.
Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7953>
For GFX9 I didn't reuse the existing mipmap offset/pitch because
last time we did that there was a revert request from Marek.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7953>
If the current layout isn't compressed we don't have to re-initialize
the HTILE metadata.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8389>
This was added to workaround some CTS failures which no longer happen.
Note that radv_clear_htile() will only clear the depth or stencil
bytes of the HTILE buffer based on the aspect.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8389>
iview can be NULL inside a secondary command buffer.
Fixes: 00064713a3 ("radv: determine at creation if an image view can be fast cleared")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8408>
Even if the FCE predicate is FALSE, we might still need to decompress
FMASK if compressed rendering was used. FMASK decompressions should
never been predicated.
This fixes a ton of CTS failures and a rendering issue with Control
when DCC+MSAA is force-enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8331>
This can be determined earlier than every time a clear is performed
by the driver, it probably saves a bunch of CPU cycles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8370>
To properly support multi-planar images, we don't want to set metadata
on anything other than the first plane. To achieve this radv currently
checks for the image TILING and assumes LINEAR means it's not the first
plane.
However this doesn't account for images with a single LINEAR plane. We
still want to set metadata on those, e.g. to properly set the scanout
bit in the tiling flags.
Instead of checking for LINEAR, check if the offset is zero. Only the
first plane has a zero offset on AMD.
This mirrors the radeonsi logic [1].
While at it, move the metadata declaration into the if block.
[1]: 6fecdc6dda/src/gallium/drivers/radeonsi/si_texture.c (L710)
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8086>
Since sparse fetch/load uses vec5 destinations, it may be possible that we
encounter nir_op_vec5.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>
Since sparse fetch/load uses vec5 destinations, it may be possible that we
encounter nir_op_vec5.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>
We will want both a VDATA operand and a sampler for some TFE/LWE MIMG
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>
The ISA docs were inconsistent about what this flag does, but that seems
fixed in the RDNA doc.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>
This fixes a performance regression for games (eg. Youngblood) that
declare all images as concurrent. This is likely buggy for compute
queues but this just restores the previous behaviour for now.
Fixes: f4f096805b ("radv: fix TC-compat HTILE images with DST_OPTIMAL on the compute queue")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8351>
It wasn't expected to also enable inside render loops.
Fixes: 4bb92d9145 ("radv: enable TC-compat HTILE in GENERAL on GFX10+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8351>
This was missing, it can be enabled with RADV_PERFTEST=tccompatcmask.
Note that this feature is still experimental.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8350>
Compute the number of components of the destination vector from the
bitsize when eg. a 16-bit vec2 vertex fetches is splitted. This is
because the dst will be a v1, so the p_create_vector should be created
from two v2b fro both sizes to match.
This prevents a regression from the next change which will split
typed vertex buffer loads on GFX6 and GFX10+.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8363>
This assumption becomes incorrect with RADV_PERFTEST=cswave32.
Games include Detroit: Become Human and Doom Eternal.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7918>
GFX10+ supports compressed writes to HTILE, so it should just work
to skip decompressions when transitioning from/to GENERAL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
Otherwise it's useless because we are unlikely to perform a
fast depth stencil clear.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
This is probably rare but can happen if someone performs a depth-stencil
copy on the compute queue. This might work (untested by CTS) but it
looks more conservative to decompress before perfoming the operation.
Found by inspection.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
We can only use the entire HTILE buffer if TILE_STENCIL_DISABLE is
TRUE. On GFX8+, this is only true if the depth image has no stencil
and if it's not TC-compatible because of the ZRANGE_PRECISION issue.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
To make sure the stencil compare state is properly initialized and
cleared when the driver performs a fast depth clear.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8039>
Not doing this for APUs because spilling is quite likely, due to
overall VRAM pressure.
Also adding a flag to disable for performance debugging.
Finally adds some memset for places where we depended on the memory
being initialized to zero, which we won't get with VRAM anymore.
(I think these places should stop depending on it since it hides
issues with executing the cmdbuffer multiple times, but this
preserves behavior)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7979>
Now that this function does not block RegisterFile entries anymore,
the temporary copy is only needed upon reaching the collect_vars call.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8261>