nir_opt_load_store_vectorize has an is_strided_vector() function that
looks for types with weird explicit strides. It does so by comparing
the explicit stride against the type-size-derived typical stride.
This had a subtle bug. Simple vector types (vec2/3/4) have no explicit
stride, so glsl_get_explicit_stride() returns 0. This never matches the
typical stride for a vector, so is_strided_vector() would return true
for basically any vector type, causing the vectorizer to bail.
I found this by looking at a compute shader with scalar SSBO loads at
offsets 0x220, 0x224, 0x228, 0x22c. nir_opt_load_store_vectorize would
properly vectorize the first two into a vec2 load, but would refuse to
extend it to a vec3 and ultimately vec4 load because is_strided_vector()
saw a vec2 and freaked out.
Neither ACO nor ANV do load/store vectorization before lowering derefs,
so this shouldn't affect them. However, I'd like to fix this bug to
avoid the trap for anyone who decides to in the future. In a branch
where anv used this lowering, this cut an additional 38% of the send
messages in the shader by properly vectorizing more things.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4255>
In vulkan, offsets are already provided through the api
vkCmdBindTransformFeedbackBuffersEXT, so this is duplicated
calculation.
Fixes : 9ff1959ca5
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Brian Ho <brian@brkho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4604>
In case of transform feedback query, it writes two integer values,
which one is for primitives written and another is for primitives
generated.
To handle this, the second member of the struct slot_value is worth
to be presented not as a padding.
In addition, we also need to modify get/copy_result to access both
values.
This patch is the prep work for the transform feedback query support.
Tested with
dEQP-VK.pipeline.timestamp.*
dEQP-VK.query_pool.occlusion_query.*
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Brian Ho <brian@brkho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4604>
64-bit immediates are only allowed as src0. Long ago, we decided to
avoid constructing such illegal situations in the IR, rather than
allowing them in the IR but then promoting bogus immediates to GRFs
later. So, we need to fix opt_peephole_sel to not put 64-bit immediates
as src1 of the new SEL instruction.
Fixes: a4b36cd3dd ("intel/fs: Coalesce when the src live range is contained in the dst")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2816
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4692>
This should have gone away when removing source modifiers. They won't
be set any longer, so this is simply dead code.
Fixes: b7c47c4f7c ("intel/compiler: Drop nir_lower_to_source_mods() and related handling.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4691>
VEC4_OPCODE_PICK_HIGH_32BIT performs 32-bit UD access on a 64-bit DF
value. abs and negate make sense on DF, but break entirely when
trying to access pieces of the value as unsigned integer dwords.
Fixes an fsign Piglit test on Ivybridge:
tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-neg-abs
It had regressed when I removed nir_lower_to_source_modifiers, as that
caused us to start generating different code which provoked this bug.
Fixes: b7c47c4f7c ("intel/compiler: Drop nir_lower_to_source_mods() and related handling.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2817
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4691>
Fixes all the ARB_texture_query_lod piglit tests, and needed to get
the Vulkan CTS textureQueryLOD passing with the ongoing Vulkan driver.
Note that LOD Query bit flag became only available on V42 of the hw,
but the v3d40_tex is using V41 as reference. In order to avoid setting
up the infrastructure to support both v41 and v42, we manually set the
bit if the device version is the correct one.
We also fix how the ARB_texture_query_lod (so EXT_texture_query_lod)
is exposed. Before this commit it was always exposed (wrongly as it
was not really supported). Now it is exposed for devinfo.ver >= 42.
v2: move _need_sampler helper to nir.h (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
Configuration Parameter packets 1 and 2 are pointed as optional, but
it is not clearly stated if you can skip only P1 when P2 is needed.
In the practice, it seems that the situation P0 - non-P1 - P2 can
causes problems, and at least on the simulator, it seems that sampler
info are attempted to be accessed. So let's just be conservative, and
only skip P1 configuration if we can skip P2 configuration too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
TMU configuration parameter 1 configures the sampler for the texture
operation. But there are some texture operations that doesn't need a
sampler. Skipping the configuration could provide a small perf
improvement on OpenGL. On the incoming Vulkan driver, would allow us
to avoid to set up an unneeded sampler.
Note that we still need to add the sampler configuration parameter if
the output is a 32bit, as it is on the sampler where we configure that
info.
Also, note that for images this is done comparing against a unpacked
p1 default. But in order to do that it is needed to go through the
code that fills up the unpacked p1. We can skip that too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
Passes tests in:
dEQP-VK.pipeline.multisample.sample_locations_ext.*
Note that these tests fail because of gl_PrimitiveID not working correctly:
dEQP-VK.pipeline.multisample.sample_locations_ext.verify_location.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4665>
This simplifies code and helps some shaders
Totals from affected shaders:
Code Size: 51227172 -> 51202216 (-0.05 %) bytes
Max Waves: 19955 -> 19948 (-0.04 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
This simplifies definition handling and
helps a few shaders
Totals from affected shaders:
Code Size: 659540 -> 659376 (-0.02 %) bytes
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4573>
The optimizer isn't yet updated to handle this, since lower_to_hw_instr
will be the only user for now.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4469>
discard is supposed to be a terminator, killing the thread, so that it's
possible to exit main solely by a discard e.g. inside of an infinite
loop. However, it currently isn't treated as a terminator in NIR due to
workarounds turning it into demote (d3d-style kill) and even if that
were fixed, we probably wouldn't want to treat discard_if as a jump
since otherwise the scheduler wouldn't be able to schedule things around
it. So, add this workaround which inserts jump instructions as
necessary to guarantee that the program always terminates.
This fixes a hang in dEQP-VK.graphicsfuzz.while-inside-switch, which
conditionally does a discard inside an infinite loop.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4658>
The first block was being added to the list twice, once here and once in
emit_block(), leading to list corruption and infinite loops when trying
to traverse the list of blocks backwards.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4658>
In SPIRV of compute shader in Aztec Ruins benchmark there is:
OpControlBarrier %uint_1 %uint_1 %uint_0
// ControlBarrier(Device, Device, rdcspv::MemorySemantics(0));
which is an incorrect translation of glsl barrier().
GLSLang, prior to c3f1cdfa, emitted the OpControlBarrier with
Device instead of Workgroup for execution scope.
2365520c covers similar case but isn't applied when execution_scope
is SpvScopeDevice.
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2742
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4660>
i965 is the only driver that ever linked to this code and it's been
doing it in BLORP for a long time now. The only possible case where it
would have fallen back to meta was for depth/stencil but that should
have ended starting with 6cec618e82. Rip out the dead code.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4622>