We also add a new list iterator which takes a modes bitfield and
automatically figures out which list to use. In the future, this
iterator will work for multiple modes but today it assumes a single mode
thanks to the behavior of nir_variable_list_for_mode. This also doesn't
work for function_temp variables.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
There are a bunch of cases where we really do want to walk the list that
is nir->uniforms because we want all things declared "uniform" in the
GLSL. Add a helper for this but restrict it to the GL linking code.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
This one's a bit more complex because it filters off only those
variables with mode == nir_var_uniform. As such, it's not exactly a
drop-in replacement for nir_foreach_variable(var, &nir->uniforms).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
For the most part, this doesn't actually matter today. We already only
call remove_dead_vars on the lists that are specified in the modes. The
only functional change here is for the uniform, mem_ubo, and mem_ssbo
modes because they share a list. If nir_remove_dead_variables is called
with a mode of nir_var_uniform, it will no longer remove UBOs or SSBOs,
for instance.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
Lower them to ACCESS_COHERENT to simplify the backend and
probably give better performance than invalidating or writing back the
entire L0/L1 cache.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4905>
For convenience in e68871f6a4 ("spirv: Handle constants and types
before execution modes") we moved all execution mode parsing after the
constants and types, so that those using OpExecutionModeId could be
handled together.
Later in 84781e1f1d ("spirv/nir: keep track of SPV_KHR_float_controls
execution modes") we had to parse certain non-ID execution modes
before handling constants.
Instead of handling just the float controls related execution modes
early, handle all modes that don't need an ID. This is a more
"natural" split and will allow other type handling to rely on
execution mode in the future.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6062>
Vulkan allows fragment shaders to read gl_ClipDistance[], in which
case the SPIR-V compiler inserts a single compact array variable for
VARYING_SLOW_CLIP_DIST0 and the lowering should not try to inject
its own variables, but instead work in terms of the existing one.
Vulkan drivers are expected to call this with use_clipdist_array set
to true to be consistent with this setup.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6022>
I missed this if statement so atomic counters weren't getting bindings
and, when you have more than one of them, that meant they were all
getting combined into one.
Fixes: 3584cb09bc15 "spirv: Give atomic counters their own variable mode"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6060>
The current scheduling algorithm favors parallelism a bit too
aggressively and sometimes generates shaders that fail register
allocation. This happens even if the threshold is set to zero to force
it to always use the CSR instruction choosing algorithm.
This patch adds an option to use an even more aggressive fallback that
just always picks the instruction with the shortest maximum delay in the
hope that that will generate the least register pressure. The intention
is to use this as a last resort after register allocation fails in order
to at least have a working shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Adds a callback function to nir_schedule_options to give the backend a
chance to add custom dependencies between certain intrinsics. The
callback can assign a class number to the intrinsic and then set a read
or write dependency on that class.
v2: Use a linked-list of schedule nodes for the dependency classes
instead of a fixed-sized array.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Instead of copying the individual members of nir_schedule_options into
the scoreboard, it now just keeps a pointer to the options. This avoids
the duplicated comments and makes it easier to add more options later.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
nir_schedule already has a struct for options which makes it more than
just a function declaration. Later patches intend to add more structs to
complement these options. In order to make the code easier to manage,
this moves the nir_scheduler-related parts out of nir.h to their own
header.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Previously, objects of type OpTypeImage or OpTypeSampler were treated as
vtn_pointers and objects of type OpTypeSampledImage were a special-use
vtn_sampled_image struct. This commit changes that so that all of those
objects are stored in vtn_ssa_values. Each of images, samplers, and
sampled images, are stored as a scalar or vector nir_ssa_def whose
components are NIR deref values. We now use vtn_type_get_nir_type to
re-resolve those as-needed into GLSL sampler types for NIR.
This simplification has a number of benefits:
1. We can git rid of the rest of our special-cases for handling images
and samplers in function arguments. Now that they're treated as
structs at the glsl_type level, the generic paths can handle images
and samplers.
2. We can now construct composite values containing images and samplers
internally. It's unclear from the SPIR-V spec whether or not this
is allowed and it's not a pattern that GLSLang currently generates
thanks to GLSL rules. However, if we do start seeing SPIR-V that
contains such composites, we should now be able to handle it.
3. SPIR-V OpNull and OpUndef instructions can now create samplers,
images, and sampled images. The NIR generated won't likely be fully
valid but, given a NIR pass to do something sensible, it should be a
thing we can compile.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
There are a few cases, atomic counters being one example, where the type
used by vtn_ssa_value is not the same as the type we want NIR to use in
derefs and variables. To solve this, we add a helper which converts
between the types for us. In the next commit, we'll be adding another
major user of this: images and samplers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
Primarily, we check for two things:
1. That we only ever add SSA values via vtn_push_ssa_value and
vtn_copy_value.
2. That the type of the SSA value matches the SPIR-V destination type.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
Previously, we created our vtn_ssa_value in _vtn_variable_load_store
manually as we did the recursive load/store. Instead, we now create the
SSA value before calling into the recursive function. This is a tiny
bit less efficient but it removes a case of hand-rolling vtn_ssa_value
creation. For symmetry, we make _vtn_block_load_store assume the value
is already created. Finally, we remove a trivial hand-rolled case in
vtn_composite_extract.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
For three different functions which create vtn_ssa_values, we had three
completely different implementations. This unifies them all to roughly
the same algorithm. While we're at it, we take advantage of the
nir_build_imm helper to avoid some extra code in vtn_const_ssa_value.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
The w++ is to handle a differences between the KHR extension and Vulkan
1.1 feature where the Vulkan 1.1 instructions take an scope parameter.
While incrementing w technically works, it's really subtle and very easy
to miss when reading the code.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
For some reason, we were doing a signed shift vectors and an unsigned
shift for scalars. We then plug it into i2b so it should make no
difference whatsoever. The fact that we're doing different things for
vectors vs. scalars is bonkers. Let's simplify the code a bit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
The original implementation of SPV_EXT_descriptor_indexing was extremely
paranoid about the NonUniform qualifier, trying to fetch it from every
possible location and propagate it through access chains etc. However,
the Vulkan spec is quite nice to us on this and has very strict rules
for where the NonUniform decoration has to be placed. For image and
texture operations, we can search for the decoration on the spot when we
process the image or texture op. For pointers, we continue putting it
on the pointer but we don't bother trying to do anything silly like
propagate it through casts.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
8e1b75b3 introduced umax/umin in order to lower iand/ior for (n)eq zero.
That breaks the lower_int_to_float pass, because umax and umin weren't
handled there.
Tested with lima. The other users of nir_lower_int_to_float
(etnaviv, freedreno) should also have that issue.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6043>
This provides an alternate lowering for scratch in which it uses global
reads/writes and bases scratch addresses on a base pointer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5927>
This way we can avoid some unnecessary conversions because there's no
need to sanitize to 0/1 for scratch.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5927>
Test cases:
opt_if_simplification - the most trivial test case.
opt_if_simplification_single_source_phi_after_if - tests that
opt_if_simplification correctly handles single-source phis after
the if, found in https://gitlab.freedesktop.org/mesa/mesa/-/issues/3282
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5945>
Consider the following case:
if ssa_1 {
block block_2:
/* succs: block_4 */
} else {
block block_3:
...
break
/* succs: block_5 */
}
block block_4:
vec1 32 ssa_100 = phi block_2: ssa_2
After block_3 extraction and reinsertion, phi->pred becomes invalid
and isn't updated by reinsertion since it is unreachable from block_3.
Call nir_opt_remove_phis_block before moving block to eliminate single
source phis after the if.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3282
Fixes: e3e929f8c3
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5945>
This adds a nir_unsigned_upper_bound() helper which does something similar
to nir_analyze_range() except it tries to obtain the largest possible
value instead of it's relation to zero.
It also adds nir_addition_might_overflow(), which uses this helper to try
to prove that an unsigned addition does not wrap around.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
Fixes an issue with Renderdoc's shader debugging with ACO.
If nir_opt_algebraic isn't called in-between nir_lower_explicit_io and
nir_lower_int64, we can end up with 64-bit multiplications.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 6320e37d4b ('nir: add amul instruction')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5709>
This pulls in commit 63cb1fc131573fa from KhronosGroup/SPIRV-Headers
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5992>
If the SPIR-V had a shared+image memory barrier, we would emit two NIR
barriers: a shared barrier and an image barrier.
Unlike a single barrier, two barriers allows transformations such as:
intrinsic image_deref_store (ssa_27, ssa_33, ssa_34, ssa_32, ssa_25) (1)
intrinsic memory_barrier_shared () ()
intrinsic memory_barrier_image () ()
intrinsic store_shared (ssa_35, ssa_24) (0, 1, 4, 0)
->
intrinsic memory_barrier_shared () ()
intrinsic store_shared (ssa_35, ssa_24) (0, 1, 4, 0)
intrinsic image_deref_store (ssa_27, ssa_33, ssa_34, ssa_32, ssa_25) (1)
intrinsic memory_barrier_image () ()
This commit fixes two dEQP-VK.memory_model.* CTS tests with ACO.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5951>
The alignment can just be copied from the source intrinsic.
Fixes the assertion
nir_intrinsic_align_offset(instr) < nir_intrinsic_align_mul(instr)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5949>
Fixes the two-sided-lighting and vertex-program-two-side piglit tests
on Panfrost.
v2: Use an existing variable for gl_FrontFacing if present.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Tested-by: Urja Rannikko <urjaman@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5915>
This is needed for handling drivers that use an input for loading the
face, for example Panfrost with Midgard GPUs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Tested-by: Urja Rannikko <urjaman@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5915>
EXT_shader_framebuffer_fetch requires the fetched value to be per-sample, so we
need to load the sample id when in a fragment shader.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5930>
Whenever a struct type is decorated Block or BufferBlock we turn that
into a GLSL_TYPE_INTERFACE. Since these decorations can end up random
places, we should allow them for constants.
Closes: #3252
Fixes: 9d0ae777dd "spirv: Use interface type for block and buffer..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5855>
the final output of gl_Position needs this transform, and geometry shaders
must write this value for stream 0 if rasterization is enabled
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5851>
Otherwise a VS doing the following:
out gl_PerVertex {
vec4 gl_Position;
int gl_ViewportIndex;
};
cannot be compiled because of the following error:
"redeclaration of gl_PerVertex must be a subset of the built-in
members of gl_PerVertex"
v2: add GLSL_PRECISION_HIGH param to add_varying() for "gl_Layer" in
generate_fs_special_vars.
v3: add GLSL_PRECISION_HIGH param to add_varying() for "gl_Layer" in
generate_varyings.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2946
Tested-by: John Galt <johngalt@fake.mail>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v3)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5167>
EXT_shader_image_load_store says:
The format of the image unit must be in the "1x32" equivalence class
otherwise the atomic operation is invalid.
ARB_shader_image_load_store says:
We will only support 32-bit atomic operations on images
Fixes: fc0a2e5d01 ("glsl: add EXT_shader_image_load_store new image functions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5688>
In the case where SSA use/def chains are broken, NIR prints out a very
cryptic error and then aborts. This abort happens during validation
rather than after the print is complete, hiding any other errors that
may have been found. One might think, "So what? Fix your use/def issue
first." However, what makes this especially bad is that, when use/def
chains are broken, there's usually a much nicer error inline in the
shader that would have been printed had we not aborted early so the
current behavior simply ensures you get the most cryptic error possible
in an already difficult-to-debug case.
While we're at it, we remove the one other case of abort() which is in
the validation of phi instruction sources.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5809>
this is needed for zink and other drivers which can support fragcolor but
not fragdata and want to correctly handle EXT_multiview_draw_buffers
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5687>
This is not needed for lowering expressions, because they always work with
basic types, but it will be needed for lowering variables.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5746>
The first intrinsic is intended to expose the value set by glLineWidth
to shaders internally. The second intrinsic exposes the value actually
sent to the hardware. This may be wider than the first one in order to
implement anti-aliasing. These will be used in later patches to
implement a line smoothing lowering pass.
v2: Add a second intrinsic for the expanded line width for
anti-aliasing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
The line coord is a coordinate along the axis perpendicular to the line.
It is in the range [0,1] between the two edges of the line. It is
available at least on Broadcom hardware.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
No drivers are using this anymore so we can delete it and not keep
maintaining this legacy code-path. If any drivers want this in the
future, they should use nir_lower_varst_to_explicit_types followed by
nir_lower_explicit_io.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>
For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
Add the possibility to specify the source components. This is necessary
to let the UBO/SSBO index have more than one component, and it also lets
us remove a few hand-rolled load intrinsic definitions.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
a pass which rewrites gl_ClipDistance[n] to an undef if the corresponding
clip plane is disabled in the rasterizer state
this pass is needed for zink to handle api disables of clip planes
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5529>
Dot product is multiplication followed by addition, and absolute value
does not distribute into addition.
Only vec4 platforms are affected by this change as scalar-only platforms
never have any of the fdot_replicated instructions. In the shader-db
results, below, shaders in MANY different applications are affected.
Trine, Doom3, Enemy Territory: Quake Wars, Counter Strike: Global
Offensive, Mad Max, Metro Last Light, and on and on... I'm really
shocked that there were no test regressions!
All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 16219743 -> 16219820 (<.01%)
instructions in affected programs: 12171 -> 12248 (0.63%)
helped: 1
HURT: 78
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.78% max: 0.78% x̄: 0.78% x̃: 0.78%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.35% max: 2.38% x̄: 0.91% x̃: 1.06%
95% mean confidence interval for instructions value: 0.92 1.03
95% mean confidence interval for instructions %-change: 0.78% 1.00%
Instructions are HURT.
total cycles in shared programs: 538481383 -> 538491045 (<.01%)
cycles in affected programs: 470796 -> 480458 (2.05%)
helped: 149
HURT: 142
helped stats (abs) min: 1 max: 1338 x̄: 71.13 x̃: 4
helped stats (rel) min: 0.06% max: 40.99% x̄: 2.76% x̃: 0.67%
HURT stats (abs) min: 1 max: 2092 x̄: 142.68 x̃: 12
HURT stats (rel) min: 0.07% max: 55.38% x̄: 5.07% x̃: 1.07%
95% mean confidence interval for cycles value: -5.28 71.69
95% mean confidence interval for cycles %-change: -0.07% 2.19%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Closes: #3129
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5581>
In a piglit run on s390 a lot of double tests fail, explicitly
packing/shifting things rather than using memcpy seems to help
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5679>
If we have code like:
('f2f16', ('vec2', ('f2f32', 'a@16'), '#b@32'))
We would like to eliminate the conversions, but the existing rules can't
see into the the (heterogenous) vector. So instead of trying to
eliminate in one pass, we add opts to propagate the f2f16 into the
vector. Even if nothing further happens, this is often a win since then
the created vector is smaller (half2 instead of float2). Hence the above
gets transformed to
('vec2', ('f2f16', ('f2f32', 'a@16')), ('f2f16', '#b@32'))
Then the existing f2f16(f2f32) rule will kick in for the first component
and constant folding will for the second and we'll be left with
('vec2', 'a@16', '#b@16')
...eliminating all conversions.
v2: Predicate on !options->vectorize_vec2_16bit. As discussed, this
optimization helps greatly on true vector architectures (like Midgard)
but wreaks havoc on more modern SIMD-within-a-register architectures
(like Bifrost and modern AMD). So let's predicate on that.
v3: Extend for integers as well and add a comment explaining the
transforms.
Results on Midgard (unfortunately a true SIMD architecture):
total instructions in shared programs: 51359 -> 50963 (-0.77%)
instructions in affected programs: 4523 -> 4127 (-8.76%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 86 x̄: 7.47 x̃: 6
helped stats (rel) min: 1.71% max: 28.00% x̄: 9.66% x̃: 7.34%
95% mean confidence interval for instructions value: -10.58 -4.36
95% mean confidence interval for instructions %-change: -11.45% -7.88%
Instructions are helped.
total bundles in shared programs: 25825 -> 25670 (-0.60%)
bundles in affected programs: 2057 -> 1902 (-7.54%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 26 x̄: 2.92 x̃: 2
helped stats (rel) min: 2.86% max: 30.00% x̄: 8.64% x̃: 8.33%
95% mean confidence interval for bundles value: -3.93 -1.92
95% mean confidence interval for bundles %-change: -10.69% -6.59%
Bundles are helped.
total quadwords in shared programs: 41359 -> 41055 (-0.74%)
quadwords in affected programs: 3801 -> 3497 (-8.00%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 57 x̄: 5.33 x̃: 4
helped stats (rel) min: 1.92% max: 21.05% x̄: 8.22% x̃: 6.67%
95% mean confidence interval for quadwords value: -7.35 -3.32
95% mean confidence interval for quadwords %-change: -9.54% -6.90%
Quadwords are helped.
total registers in shared programs: 3849 -> 3807 (-1.09%)
registers in affected programs: 167 -> 125 (-25.15%)
helped: 32
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.34 x̃: 1
helped stats (rel) min: 20.00% max: 50.00% x̄: 26.35% x̃: 20.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for registers value: -1.54 -1.00
95% mean confidence interval for registers %-change: -29.41% -20.69%
Registers are helped.
total threads in shared programs: 2471 -> 2520 (1.98%)
threads in affected programs: 49 -> 98 (100.00%)
helped: 25
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.96 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.88 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are [helped].
total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4999>
GLSL shares functionality with ARB_vertex_program but the GLSL
spec defines the gl_LightSource builtin with a member order that
is different from the packing expected in ARB_vertex_program.
This difference introduces a need for specialist lowering code
when handling builtin structs that is not required for normal
uniform structs due to member location mismatches.
Since gl_LightSource can't be redefined it shouldn't matter if
we add the members in the order listed in the spec, just so long
as we add them all. So here we rearrange the definition of the
glsl builtin to reflex our internal layout and that of
ARB_vertex_program. This required for the following patch.
CC: <stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5656>
nir_load_store_vectorize_test.ssbo_load_adjacent_32_32_64_64 expectations
need to be fixed accordingly.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5589>
The load_per_vertex_{input,output} intrinsics simply mean that they're
reading an arrayed input/output, which have one element per invocation.
Most accesses to those use gl_InvocationID as the subscript. However,
it's totally possible to read any element of the array. For example,
an evaluation shader might read gl_in[2].gl_Position, or a control
shader might read output[0].
For threads processing a single patch, an input/output load is
convergent if and only if both sources (the per-vertex-array subscript
and the offset) are convergent. For threads processing multiple
patches, we continued to mark them divergent.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5613>
Previously all nir_intrinsic_load_uniform that were used as sources were
considered to be dynamically_uniform but when offsets of load_uniform
are indirect it can not be determined.
This fixes artefacts in Google Maps 3D view in V3D.
Fixes: 886d46b089 ("nir: Add a function to determine if a source is dynamically uniform")
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5587>
The GLSL to NIR compiler supports the LowerTessLevel flag to convert
gl_TessLevelInner/Outer from their GLSL declarations as arrays of
floats to vec4/vec2s to better match how they are represented in
hardware.
This commit adds the similar support to the SPIR-V to NIR compiler so
turnip can use the same IR3/NIR tess lowering passes as freedreno.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
This commit adds a tess_levels_are_sysvals flag to
spirv_to_nir_options similar to GLSLTessLevelsAsInputs in the GLSL to
NIR compiler options. This will be used by turnip as the tess IR3
lowering pass (ir3_nir_lower_tess) operates on TessLevelInner and
TessLevelOuter in the DS as sysvals.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
The scheduler has code to handle hardware that shares the same memory
for inputs and outputs. Seeing as the specific stages that need this is
probably hardware-dependent, this patch makes it a configurable option
instead of hard-coding it to everything but fragment shaders.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
nir_deps_state is a struct used as a closure for calculating the
dependencies. Previously it had two fields copied out of the scoreboard.
The closure is initialised in two seperate places. In order to make it
easier to access other members of the scoreboard in the callbacks in
later patches, the closure now just contains a pointer to the scoreboard
and the two bits of copied information are removed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
load_per_vertex_input should probably be handled in the same way as a
regular load_input. I think the nir_schedule pass was written before V3D
had geometry shader support, so that is probably why it hasn’t taken
this into account until now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
This case is never hit, we don't have a nir intrinsic for this spirv
opcode. And when we do, I'm not sure if it would be vectorized or not.
So best just to drop this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5505>
The fix in the previous patch removed an erronous attempt to skip
resizing variable types in each stage. Now that has been removed
iterating over each shader stage is no longer required here.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5487>
The initial support tried to match uniform variables from different
shaders based on the variables pointer. This will obviously never
work, instead here we use the variables name whcih also means we
must disable this optimisation for spirv.
Using the base variable name works because when collecting uniform
references we never iterate past the first array dimension, and
only support resizing 1D arrays (we also don't support resizing
arrays inside structs).
We also drop the resized bool as we can't skip processing the var
just because is was resized in another shader, we must resize
the var in all shaders.
Fixes: a34cc97ca3 ("glsl: when NIR linker enable use it to resize uniform arrays")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3130
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5487>
All the other driver-specific intrinsics are at the end of the file so
Intel's should go there too.
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5503>
Validate that num_components is only set for vectorized instructions, to
prevent other nir passes or driver backends from mistakenly relying on
num_components for non-vectorized instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
Of the possible intrinsics generated, only load_ssbo is vectorized (and
store_ssbo is never generated)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
It's tricky to merge XFB-outputs correctly, because we need there to not
be any overlaps when we get to `nir_gather_xfb_info_with_varyings` later
on. We currently trigger an assert there if we end up merging here.
So let's not even try. This is an optimization, and we can optimize this
in safe cases later if needed. For now, let's play it safe.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5329>
This seems valid per the SPIR-V spec to use OpSampledImage with
OpUndef instead of OpTypeImage or OpTypeSampler. When the image
operand is undefined, SPIRV->NIR emits an undef instruction that
can be removed later by the compiler.
This fixes shader compilation crashes with Red Dead Redemption II.
Cc: mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5230>
When handling function inputs the optimisation pass incorrectly
assumes the inputs are undefined. Here we simply change things to
assume inputs have always been assigned a value. Any further
optimisations will be taken care of once function inlining takes
place.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2984
Fixes: 65122e9e80 ("ir_constant_variable: New pass to mark constant-assigned variables constant.")
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5413>
There are some passes which really work on the shader level and it's
easier if we have a helper which preserves metadata on the whole shader.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5171>
Depth and stencil writes are combined with color writes, so we need
this intrinsic which has sources for color, RT, depth and stencil.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
In case shader contains two equal macro defines, first one with trailing spaces
and the second one without.
`#define A 1 `
`#define A 1`
The parser crashes
Fixes: 0346ad3774 ("glsl: ignore trailing whitespace when define redefined")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5312>
NVIDIA hardware doesn't have an equivilant to bfi, but we do already have
a lowering for bitfield_insert->bfi.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5373>
The combination must stop when we see a scoped barrier that have
execution scope, i.e. it has control barrier behavior. The code was
mistakenly looking at the wrong scope.
Fixes: 345b5847b4 ("nir: Replace the scoped_memory barrier by a scoped_barrier")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
Fixes: 3ed2123d77 ("spirv: Use scoped barriers for SpvOpControlBarrier")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
Fixes: 345b5847b4 ("nir: Replace the scoped_memory barrier by a scoped_barrier")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
Make sure to propagate the NON_UNIFORM access for UBO loads, so
that non-uniform loads are correctly lowered.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5311>
asin(x) is now implemented using a piecewise approximation, which
improves the precision for |x| < 0.5
Previously, we were using a polynomial approximation for both the
asin() and acos() functions. Unfortunately, for asin(), this polynomial
does not have enough precision to satisfy the Vulkan CTS requiremenents,
which define the asin() precision based on the precision of
atan2(x, sqrt(1.0 - x*x)). The piecewise approximation gives the needed
precision in the problematic range.
v2: Skip the piecewise approximation for acos
Closes: #1843
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3809>
For shaders where there's already a psiz-variable, we should rather
reuse it than create a second one. This can happen if a shader writes
gl_PointSize, but disables GL_PROGRAM_POINT_SIZE.
Fixes: 878c94288a ("nir: add lowering-pass for point-size mov")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5328>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
Needed by the next patch, for c++ code which is more strict about
conversions between integers and enums.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
Here we turn on uniform array resizing in the NIR linker and disable
the GLSL IR resizing pass when the NIR linker is enabled.
This will potentially make uniform arrays smaller due to NIR
optimising away more uniform uses.
Shader-db results (SKL):
total instructions in shared programs: 14947192 -> 14944093 (-0.02%)
instructions in affected programs: 138088 -> 134989 (-2.24%)
helped: 822
HURT: 4
total cycles in shared programs: 324868402 -> 324794597 (-0.02%)
cycles in affected programs: 3904170 -> 3830365 (-1.89%)
helped: 2333
HURT: 1485
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
We want to gather information for all stages here before the main
linking loop. In the following patch we will use to information
to reduce the size of uniform arrays where possible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
This will be used to reduce the size of uniform arrays and replace
the current glsl ir pass. Doing this in NIR allows us to better
optimise the size of uniform arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
If use_scoped_barrier is set to true, we don't have to split the control
and memory barriers.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
SPIRV OpControlBarrier can have both a memory and a control barrier
which some hardware can handle with a single instruction. Let's
turn the scoped_memory_barrier into a scoped barrier which can embed
both barrier types. Note that control-only or memory-only barriers can
be supported through this new intrinsic by passing NIR_SCOPE_NONE to the
unused barrier type.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
We are about to add support for scoped control+memory barriers. Let's
move the convert from SPIRV to NIR enums logic in helpers so we can
easily re-use them.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
This is now possible as we do uniform linking via a nir based linker.
Shader-db results for IRIS (SKL):
total instructions in shared programs: 14947192 -> 14946397 (<.01%)
instructions in affected programs: 39498 -> 38703 (-2.01%)
helped: 230
HURT: 18
total cycles in shared programs: 324868402 -> 324847058 (<.01%)
cycles in affected programs: 706701 -> 685357 (-3.02%)
helped: 599
HURT: 449
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
This helper reflects the rules we follow in the GLSL IR linker when
deciding if we can remove a dead uniform. This check is required to
avoid regressions when turning on NIR dead uniform clean up in the
following patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
This allows us to do API specific checks before removing variable
without filling nir_remove_dead_variables() with API specific code.
In the following patches we will use this to support the removal
of dead uniforms in GLSL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>