In addition really fill all slots, because otherwise the alu-group merger
might move a read from the indirectly written register into the 't' slot.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15848>
drm_shim.c undefines the _FILE_OFFSET_BITS macro, so plain off_t might
be 32 bits, while it's 64 bits in device.c. To avoid this mismatch,
use off64_t which will always be 64 bits in both source files.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
loader_open_render_node returns the first device in /dev/dri that it
can use. To make sure the drm-shim device always gets chosen, return
the fake entries in readdir before returning the real ones.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
When IndirectParameterEnable==true it's not actually used by the hardware,
but if it's not initialized and INTEL_DEBUG=bat is set, then Valgrind complains.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15850>
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
With VK_EXT_graphics_pipeline_library, we will have to support
independent sets and also to merge sets from different libraries.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15849>
With VK_EXT_graphics_pipeline_library, shader modules can be NULL and
be passed via the pNext of VkPipelineShaderStageCreateInfo. To prepare
for this, just store everything we need to radv_pipeline_stage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
To remove use of shader modules completely in the next commit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
Mesa shader stages are correctly sorted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15847>
We've wanted to remove destructors from ralloc's API for a long time (it's
an extra storage cost per ralloc for a rarely-used feature), and for the
suballoc change we'd need to spend more storage on storing the tu_device
pointer per result since destructors don't get anything else but the
pointer passed into them.
Fixes use-after-frees:
=================================================================
==2383==ERROR: AddressSanitizer: heap-use-after-free on address 0xffff88fe1940 at pc 0xffff934f427c bp 0xfffff5481e90 sp 0xfffff5481ea8
WRITE of size 8 at 0xffff88fe1940 thread T0
#0 0xffff934f4278 in list_del ../src/util/list.h:108
#1 0xffff934f4278 in result_destructor ../src/freedreno/vulkan/tu_autotune.c:237
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#5 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#6 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#7 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
[...]
0xffff88fe1940 is located 80 bytes inside of 112-byte region [0xffff88fe18f0,0xffff88fe1960)
freed by thread T0 here:
#0 0xffff9c1c90d8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
#5 0xffff935cf2ac in tu_queue_submit_locked ../src/freedreno/vulkan/tu_drm.c:997
[...]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
We don't return unused space to the suballocator, so it's a little useful
to limit how much we overallocate to reduce memory footprint. I took a
look through the tu_cs_emit_array() calls and accounted for a couple of
them in the variant-specific space calculation, then dropped the base
allocation by factors of 2 until we started throwing asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
Allocating a BO for each pipeline meant that for apps with many pipelines
(such as Asphalt9 under ANGLE), we would end up spending too much time in
the kernel tracking the BO references.
Looking at CS:Source on zink, before we had 85 BOs for the pipelines for a
total of 1036 kb, and now we have 7 BOs for a total of 896 kb.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
gallium queries have to be mapped onto multiple vulkan queries,
this can happen for two reasons.
1. primitives generated and overflow any don't map directly, and
multiple vulkan queries are needs per iteration. These are stored
inside the "starts" as zink_vk_query ptrs.
2. suspending/resuming queries uses multiple queries, these are
the "starts". Every suspend/resume cycle adds a new start.
Vulkan also requires that multiple queries of the same time don't
execute at once, which affects the overflow any vs xfb normal
queries, so vk_query structs are refcounted and can be shared
between starts. Due to this when the draw state changes, it's
simple to just suspend/resume all queries so the shared vulkan
queries get handled properly.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15785>
Vs state can be NULL by the time r300_set_constant_buffer is called.
We don't hit this with OpenGL though, so this is why I didn't spot
this in my testing, but nine hits this codepath. Restore the original
behavior here.
Fixes: 882811b1ff
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15842>
Some colors were missing/inverted on big endian machine(s390x).
because blend_type.length > src_fmt->nr_channels.
In my case, blend_type has 4 channels (rgba) but src_fmt has only 3.
So the from_lsb was wrong by 1, and channels got messed up.
(blue was always 255, green -> red, and blue -> green).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6204
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15556>
This enables a lower power mode in the sampler hardware in certain
common scenarios. On Tigerlake, SAMPLER_MODE is not programmable by
userspace but the kernel already sets this bit for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
While this register still exists, it's no longer a per-context register.
Instead, on Gfx12+, SAMPLER_MODE exists per dual-subslice and is
accessed as a "multicast" register, where you write control which
version is accessed by the "steering control register".
At any rate, userspace cannot write it any longer, and so there's not
much point to it existing in our genxml (which was missing most of the
fields anyway).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
This allows the sampler to perform faster filtering of 8-bit UNORM
textures by filtering them at a different precision. The filtering
is intended to still be OpenGL and DirectX spec compliant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
We need to make sure we still have descriptors to copy in the
while() condition. While at it, drop the assert() checking that
the number of descriptors already copied is less than the
requested number.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15828>
By doing this to remove the need of C++ runtime when not using llvmpipe
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15659>
according to spec, a barrier is required any time the pixels of the
framebuffer are changed. since zink defers clears and runs them at
a later time, it must also be responsible for handling the required
synchronization for such operations
fixes (radv):
KHR-GL46.blend_equation_advanced.blend_all*
fixes#5572
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
according to spec, for fbfetch this should match the subpass self-dependency of
* stage FRAGMENT_STAGE -> FRAGMENT_STAGE
* access 0 -> INPUT_ATTACHMENT_READ
zs fbfetch doesn't seem to be a thing, so that code is left for historical
and/or future purposes
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>