It breaks too many things and shouldn't have been merged. The fix isn't
trivial and it will probably not be backported because it's intrusive.
It will be re-applied later when everything will work.
This reverts commit 94706601fa.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15882>
When emitting the AR forces splitting an ALU group, and at the same time
a new CF instruction is started, then the last instrcution in the finished
CF block might not have the "last" bit set, which results in an invalid
shader that might hang, or crash SB.
So when a new CF is started, force the last bit in the last ALU instruction.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
With the NIR code, we have instructions groups that use
INTERP_LOAD_P0 that don't fill all slots. Just make sure
the backend scheduler doesn't fill in INTERP_LOAD_P0
instructions with a different LDS location.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
Since the value is not written, there is no need to allocate
a register for it, so don't take it into account.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
ALU_SRC_PARAM_BASE is an inline constant that defines the
address for pulling data from LDS memory for interpolation
and not a value from the kcache, so there is no need to
take these values into account when allocating kcache
load slots.
v2: Fix the constant range check to not exclude the translated
ranges for kcache banks 2 and 3.
v3: limit range check to only include kcache values and and
rename relevant function (Emma).
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15714>
We don't need to disable this path if there are indirect or 8/16/64-bit
push constant loads. We can just use the default path for them.
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 2028 -> 1884 (-7.10%)
Instrs: 366 -> 363 (-0.82%); split: -2.46%, +1.64%
Latency: 6630 -> 6579 (-0.77%)
InvThroughput: 26520 -> 26316 (-0.77%)
Copies: 84 -> 102 (+21.43%)
PreSGPRs: 141 -> 222 (+57.45%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
virglrenderer doesn't properly emit the conversion code when the source
is a integer value and the output is also integer.
Fixes on NTT:
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.inverse_per_*
v2: fix typo (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
The host driver will optimize unused variables away, and checking thoroughly whether we
may need an extra temp is just uselessly costly.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
NIR doesn't propagate precise through moves, and with NTT the
last output is usually preceded by a move, so that we no longer
see that the evaluation of some value is supposed to be exact,
and, hence we can't decorate the outputs accordingly.
Fixes with NTT:
dEQP-GLES31.functional.tessellation.common_edge.
triangles_equal_spacing_precise
triangles_fractional_odd_spacing_precise
triangles_fractional_even_spacing_precise
quads_equal_spacing_precise
quads_fractional_odd_spacing_precise
quads_fractional_even_spacing_precise
v2: Don't clear the precise flag when we hit a mov, because we may
hit a if/else construct like below and we don't track branches
IF X
TEMP[0] = OP_PRECICE ...
ELSE
TEMP[0] = MOV CONST[]
ENDIF
Thanks Emma for pointing out the problem.
v2: allocate precise handling flags to transform_prolog (Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15836>
Include in the CODEOWNERS file who to ping in case of issues with the
Broadcom (V3D/V3DV/VC4) CI.
v2:
- Add Chema (Chema)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15858>
In addition really fill all slots, because otherwise the alu-group merger
might move a read from the indirectly written register into the 't' slot.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15848>
drm_shim.c undefines the _FILE_OFFSET_BITS macro, so plain off_t might
be 32 bits, while it's 64 bits in device.c. To avoid this mismatch,
use off64_t which will always be 64 bits in both source files.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
loader_open_render_node returns the first device in /dev/dri that it
can use. To make sure the drm-shim device always gets chosen, return
the fake entries in readdir before returning the real ones.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
When IndirectParameterEnable==true it's not actually used by the hardware,
but if it's not initialized and INTEL_DEBUG=bat is set, then Valgrind complains.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15850>
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
With VK_EXT_graphics_pipeline_library, we will have to support
independent sets and also to merge sets from different libraries.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15849>
With VK_EXT_graphics_pipeline_library, shader modules can be NULL and
be passed via the pNext of VkPipelineShaderStageCreateInfo. To prepare
for this, just store everything we need to radv_pipeline_stage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
To remove use of shader modules completely in the next commit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
Mesa shader stages are correctly sorted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
We've wanted to remove destructors from ralloc's API for a long time (it's
an extra storage cost per ralloc for a rarely-used feature), and for the
suballoc change we'd need to spend more storage on storing the tu_device
pointer per result since destructors don't get anything else but the
pointer passed into them.
Fixes use-after-frees:
=================================================================
==2383==ERROR: AddressSanitizer: heap-use-after-free on address 0xffff88fe1940 at pc 0xffff934f427c bp 0xfffff5481e90 sp 0xfffff5481ea8
WRITE of size 8 at 0xffff88fe1940 thread T0
#0 0xffff934f4278 in list_del ../src/util/list.h:108
#1 0xffff934f4278 in result_destructor ../src/freedreno/vulkan/tu_autotune.c:237
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#5 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#6 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#7 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
[...]
0xffff88fe1940 is located 80 bytes inside of 112-byte region [0xffff88fe18f0,0xffff88fe1960)
freed by thread T0 here:
#0 0xffff9c1c90d8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
#5 0xffff935cf2ac in tu_queue_submit_locked ../src/freedreno/vulkan/tu_drm.c:997
[...]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
We don't return unused space to the suballocator, so it's a little useful
to limit how much we overallocate to reduce memory footprint. I took a
look through the tu_cs_emit_array() calls and accounted for a couple of
them in the variant-specific space calculation, then dropped the base
allocation by factors of 2 until we started throwing asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
Allocating a BO for each pipeline meant that for apps with many pipelines
(such as Asphalt9 under ANGLE), we would end up spending too much time in
the kernel tracking the BO references.
Looking at CS:Source on zink, before we had 85 BOs for the pipelines for a
total of 1036 kb, and now we have 7 BOs for a total of 896 kb.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>