Different variables need to respect different bounds. In general,
16-bytes is okay, but for 4-channel 16-bit vectors, we can't cross 8
byte boundaries (else the swizzles will not be packable after), so we
update LCRA to allow this more general form.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5151>
We can just set the register mode appropriately and then we don't have
to care anywhere else, and there's no extra NIR to chew through. Make
sure we include sqrt too.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5151>
We can pass it all off to emit time, and let the types in the IR do the
heavylifting in the meantime, which is a lot easier to get right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5151>
Symmetric with vector mods, except for normal which is packed as
sign-extend. (flag 2 never seen in the wild)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5151>
Rather than recompute liveness every block, compute it just once for the
whole shader, which ends up more efficient.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5123>
Rather than O(N) each call, we can precompute the whole set - also O(N)
- and then subsequent checks are O(1).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5123>
The test and rewrite were both accidentally O(N) to the shader size when
they should be O(1), so overall this takes the pass from O(N^2) to O(N).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5123>
With this, we may remove all invert passes and simply look at the src
modifier on NIR->MIR and fixup at pack time. No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>
We have to restructure to ensure NIR->MIR does not mutate the NIR and to
allow passing around dest/outmods for the new helpers. If NIR->MIR were
better designed this would be easier. Sigh.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5102>
Corresponds roughly to what we analyze. Note that "terminate AND
execute" is a contradiction (rather: it's equivalent to just
terminating), hence why there are only three possibilities for the
states of the flags:
.cont = continue, don't execute
.last = don't continue, don't execute
.cont.last = continue and execute
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5014>
This fixes the sky being red in OpenMW, as well as some of the Mesa
demos using shadows (shadowtex, shadow_sampler).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4997>
The new option replaces the two other _split lowering options, since
there's no need for separate options.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>
This moves the fi_types to a new mesa_private.h and removes the
imports.c file. The vast majority of this patch is just removing
pound includes of imports.h and fixing up the recursive includes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
This isn't ISA/compiler specific, it's just looking at the NIR. So let's
move it from midgard to pan_assemble.c so it runs for Bifrost too.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4505>
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>
We can move e v e n more code to be shared and let bi_block inherit from
pan_block, which will allow us to use the shared data flow analysis.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
Ideally we would sync the compilers to use the same indexing scheme but
that's a lot more Midgard refactoring than I have time for right now.
This is good enough honestly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
We'd like to share this big chunk of code with Bifrost but that requires
removing the compiler_context parameter... which is totally unused in
fact!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4150>
Midgard has the ability to calculate addresses as part of the load/store
pipeline. We'd like to make use of this to avoid doing this work on the
ALU pipes. To do so, when emitting globals/SSBOs/shareds, we walk the
tree looking for address arithmetic to try to parse out something the
hardware can work with, letting the original instructions be DCE'd
ideally. This analysis is done at the NIR level to properly account for
some messy details of vectorization which we'd rather not poke at the
backend level. (Originally I wrote this as a MIR pass but I'm fairly
sure it was wrong.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
The swizzles are as-if they were 32-bit regardless of the bitness of the
operation, but the source sizes can and do change depending on the
flags. Account for this in the analysis.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
Many load/store ops (used for globals, SSBOs, shared memory, etc) have
the ability to compute addresses directly. Mark off which ones behave
like this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
When mixing 32/64-bit, we need to align the 32-bit registers to get the
required alignment. This isn't quite enough yet, though, since user
swizzles could bypass and will need to be lowered to 32-bit moves
(outstanding todo).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
Shared memory is implemented almost identically to global memory from an
ISA perspective, so let's handle the new intrinsics. We include a code
path for constant offsets, which doesn't come up for globals.
Fixes dEQP-GLES31.functional.compute.basic.shared_var_single_invocation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3775>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3775>
We route it as a sysval. Fixes dEQP-GLES31.functional.compute.basic.ssbo_unsized_arr_single_invocation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3775>
We need to lower SSBOs to globals regardless. Rather than do this in our
backend like we do now, use the common NIR pass, which will support
bounds checking.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3775>
Just use a dummy name instead.. we can't know a priori what type an
unknown op will consume, but we don't want to dereference a null
pointer.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 24360966ab ("panfrost/midgard: Prettify embedded constant
prints")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
Probably harmless but the w component doesn't appear valid so let's
match the blob... one less bit to be nervous about.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
We're so close, again marking off a few edge cases is enough to allow us
to omit this data entirely. Woot!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
As long as we can disambiguate a few edge cases, we can imply next tags
entirely which cleans up the disassembly a fair bit (though not as much
as implying tags entirely would -- we'll get there!)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
We unify disparate metadata about tags into a single structure to ensure
information is not left out.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
This comes up as a `return;` instruction in a compute shader. We need to
use the special tag 1 to signify "break". Fixes numerous
INSTR_INVALID_ENC faults in dEQP-GLES31.functional.compute.basic.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
Barriers execute on the texture pipeline on Midgard, so let's
tentatively handle barrier() as conservatively as possible (forcing
memory barriers of both buffers and shared memory). Implementation isn't
quite there yet -- it doesn't look at interactions of adjacent barriers
like it's supposed to -- but the core is there.
Fixes dEQP-GLES31.functional.compute.basic.ssbo_local_barrier_single_invocation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
This patch fix these build errors with GCC 10.
/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:52: multiple definition of `pan_sysval'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:52: first defined here
/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:68: multiple definition of `pan_special_attributes'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:68: first defined here
Fixes: 7e8de5a707 ("panfrost: Implement system values")
Fixes: 306800d747 ("pan/midgard: Lower gl_VertexID/gl_InstanceID to attributes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
ZS fragment stores are done like color fragment stores, except it's
using a different RT id (0xFF), the depth and stencil values are stored
in r1.x and r1.y.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
[Fix the scheduling part]
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
Midgard can't write depth and stencil separately. It has to happen in
a single store operation containing both. Let's add a panfrost specific
intrinsic and turn all depth/stencil stores into a packed depth+stencil
one.
Note that this intrinsic is not yet handled in emit_intrinsic(), but
we'll address that later.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
Allocate those instructions with ralloc() instead of using mem_dup.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
There's a writeout bool storing the result of this test. Use it instead
of duplicating the test.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
Let's lower bitfield extract to shifts until we figure out if it can be
natively supported.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
nir_intrinsic_base() is assigned nir_variable.data.driver_location,
which is assigned a unique ID based on the variable position in the
shader variable list. There's no guarantee that this position will
match the RT id we want to pass to emit_fragment_store().
Add a search_var() helper to retrieve a nir_variable based on its driver
location, so we can pass the right RT value to emit_fragment_store().
We also make sure the shader output is color, since emit_fragment_store()
is not ready for depth/stencil stores yet.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
We are about to add a special ZS render target, let's add a enum so we
can easily know which render target we're dealing with.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
For constant packing, this is interesting to break down further.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3653>
To avoid hitting the assert in the default case, add a nop for this
intrinsic.
dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.3
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3625>
Until now, embedded constants were printed as all 32 bits integer or
floats, but the compiler can pack constant from different types if
severa instructions with different reg_mode and native type refer to
the constant register. Let's implement something smarter so users don't
have to do a manual conversion when looking at a trace.
Note that 8-bit constants are not decoded yet, as we're not sure how
the writemask is encoded in that case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3536>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3536>
This way we can convert an 8-bit writemask (Midgard specific
representation) into the more common 1-bit/component representation.
8-bit mode is not supported yet, as we're not sure how the writemask is
encoded for this mode.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3536>
...and that non-barriers don't use a barrier tag. It's not clear what
the difference means quite yet, though.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3580>
We don't need to print all the usual texture noise; just the relevant
fields and the rest can be guarded to zero.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3580>
The 64 bit converter cases were missing, add them now.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
Branch instructions should not be treated as regular ALUs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
Those size conversion operations work the same way apart from f2f
using an fmov op code and u2u using an imov. Let's handle them in the
same case block to avoid code duplication.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
mir_constant_float() assumes we're dealing with 32-bit integers/floats,
which is only the case if reg_mode is equal to midgard_reg_mode_32.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
Right now, constant combining is not supported in 16 bit mode, and 64
bit mode is simply ignored. Let's rework the function to make it
type/bit-size agnostic.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
Each instruction bundle can contain up to 16 constant bytes. The meaning
of those byte is instruction dependent: it depends on the instruction
native type (int, uint or float) and the instruction reg_mode (8, 16, 32
or 64 bit). Those different layouts can be exposed as a union to
facilitate constants manipulation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
Corner case causing invalid scheduling on shaders with nested csels,
i.e. GLSL code resembling:
(foo ? bool1 : bool2) ? x : y
By explicitly disallowing csels this is fixed.
Fixes INSTR_INVALID_ENC on a glamor shader (noticeable with slowdown and
visual corruption when scrolling "too far" on GTK apps).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3463>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3463>
The current logic considers that the nir_intrinsic_component(store_intr)
encodes the source components start, but it actually encodes the
destination one. Source component offset adjustment is taken care of in
install_registers_instr(), when offset_swizzle() is called.
This fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.45
when PAN_MESA_DEBUG=deqp (looks like exposing GLES3 features has an
impact on the varyings layout).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
Currently the schedule_program implementation being used is picked
at compile time, which on the Android platform means that the
bifrost compiler & scheduler is used for all targets, including
midgard based hardware.
This commit disambiguates between the two schedule_program functions.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We need to mindful that we don't clobber the shadow comparator.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darrayshadow_*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
They need a very particular form; the naive way we did before is not
sufficient in practice, it doesn't look like. So let's follow the rough
structure of the blob's writeout since this is fixed code anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This still may not be perfect (in the sense that legal shaders might
still get cut off) but this fits how writeout is done with both Panfrost
and the blob, so it's good enough for what we need and allows MRT
shaders to be sanely disassembled.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's a long story... but we'd try to insert constants that weren't there
and end up clobbering fields in the bundle following the constant
array...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Remove the invert on arguments to branches, and invert the branch
condition instead. This saves one instruction per inverted argument.
Closes#2088
Signed-off-by: Afonso Bordado <afonsobordado@az8.co>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Since we have a separate blend shader for each render target, let's
simplify this structure and reduce the options memory footprint by 88%
or something goofy like that.
Should also enable separate blending per render target.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We need to actually work out the varying format on demand, rather than
assuming rgba32f.
Fixes dEQP-GLES3.functional.fragment_out.basic.int.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We have special records for these, put in a fixed location by convention
per the blob.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These show up in some blend shaders. Let's use the shared lowering and
remove our own.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We need to reshuffle to sync up the shadow coordinate temporary with the
cubemap coordinate temporary. Once that's in place, it's simple enough
(we load the shadow coordinate into .z like 2D).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
My latest divination spell has uncovered a pattern in the aether.
Although the swizzle is unaligned, its format is otherwise standard.
Document this, removing the old incorrect understanding of the swizzle
(which coincided on common special swizzles only).
Fixes dEQP-GLES3.functional.shaders.texture_functions.texelfetchoffset.sampler2d_fixed_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We zero the extra components anyway. Fixes
dEQP-GLES3.functional.shaders.texture_functions.texelfetch.sampler2d_fixed_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We may call it with sentinel values (~0 in particular) corresponding to
unused arguments; ignore these.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Due to the succeeding break we would fall into some off-by-one errors.
These should be resolved now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a hack since we do have native gradient stuff, but for the
moment I'm more interested in conformance and the lowered code is good
enough. Fixes
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler2d_fixed_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3169>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3169>
This regressed since we implemented RECT textures natively, oops.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3169>
We'll reuse some of this code for shadow samplers, which are represented
by a distinct source in NIR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3125>
This cuts down the number of random environmental variables we need
flying around; now PAN_MESA_DEBUG=precompile is sufficient and
MIDGARD_MESA_DEBUG=shaderdb will be implied.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3125>
I'm honestly unsure what this is for, but it's needed on MFBD systems
for unknown reasons, at least when MRT is actually in use and then
sometimes without MRT (it fixes a blend shader issue on T760?)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
Epilogues are special fixed-function blocks, so they need special
handling for liveness analysis to work completely. This in turns fixes
RA issues for many shaders using MRT.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
The flow is considerably more complicated. Instead of one writeout loop
like usual, we have a separate write loop for each render target. This
requires some scheduling shenanigans to get right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
This is a branch, like discard, so we need a barrier to make it safe.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
We have to key the blend shader for the render target number due to
writeout silliness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
Due to this issue we were using 4x the memory we should have for TLS,
which was messing up the size calculations. Oops!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The swizzle on the conditional gets lost.
Fixes "horizontal mirroring" in godot. See
https://gitlab.freedesktop.org/mesa/mesa/issues/2108 which has attached
apitrace.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: d3b3daa9d3 ("pan/midgard: Use new scheduler")
Reported-by: Icecream95
I'm not totally sure why this would *break* things, but it's certainly
not necessary and it does break things. Somehow this gives the RA more
freedom, fixing some spill issues.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like no_spill decisions to be class-specific -- spilling from
special register to a work register doesn't preclude also spilling that
work register to stack.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to spill two 128-bit values in the same bundle, since we
have two registers we can spill with. This improves the
register allocation flexibility in programs with heavy spilling, though
unfortunately it isn't sufficient (theoretically, 3.5 128-bit values can
be spilled from 3 vector units and 2 scalar units).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This has been unused since the beginning since it's broken. Let's toss
it so it doesn't get in the way of further fixes. Bigger to fish to fry.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We do need some sort of a cost heuristic, but this one is just causing
spilling to behave worse on shaders I'm looking at, and I don't need
more noise in the spill implementation right now.
Get it working first. We can optimize this later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Let's not worry about spilling twice in a bundle; that's too
restrictive. We'll need to change the schedule itself -- unfortunately,
this can have second-order effects due to pipeline registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Instead of having a giant function for both, split into the two
subtasks so we can handle errors better.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We move it to the register allocator itself. It doesn't belong in
midgard_schedule.c!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
As found by UBSAN, it should be harmless but it's good to remove any UB
so the tool's output is useful.
../src/panfrost/midgard/midgard_schedule.c:1094:9: runtime error: index -1 out of bounds for type 'midgard_instruction *[6]'"}
#0 0xad047872 in schedule_block ../src/panfrost/midgard/midgard_schedule.c:1094"}
#1 0xad04d41a in schedule_program ../src/panfrost/midgard/midgard_schedule.c:1116"}
#2 0xad031f98 in midgard_compile_shader_nir ../src/panfrost/midgard/midgard_compile.c:2588"}
#3 0xacf9874e in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:68"}
#4 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960"}
#5 0xaae2596e in st_update_fp ../src/mesa/state_tracker/st_atom_shader.c:168"}
#6 0xaae12316 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261"}
#7 0xaadc58c2 in prepare_draw ../src/mesa/state_tracker/st_draw.c:132"}
#8 0xaadc58c2 in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184"}
#9 0xabc4f924 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:816"}
#10 0xabc50240 in _mesa_DrawElements ../src/mesa/main/draw.c:970"}
#11 0x73ebd2 in glu::CallLogWrapper::glDrawElements(unsigned int, int, unsigned int, void const*) (/deqp/modules/gles2/deqp-gles2+0x2d4bd2)"}
#12 0x6d86b2 in deqp::gls::FragOpInteractionCase::iterate() (/deqp/modules/gles2/deqp-gles2+0x26e6b2)"}
#13 0x494d16 in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad16)"}
#14 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)"}
#15 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)"}
#16 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#17 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#18 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Make sure that the fragment is complete when writing it out.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)
Closes#2133
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
T720 and earlier need this workaround, so check the quirk before
lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Corresponds to errata #10471, applies to T6xx and T720. Fixed in T760.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We can stuff this information in as parametrized system values, like we
currently do texture size and SSBO addresses.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Rather than open-coding checks on gpu_id in the compiler, let's track
quirks applying to whatever we're compiling for, to allow us to manage
the complexity of many heterogenous GPUs in the compiler.
It was discovered that a workaround used on T720 is also required on
T820 (and presumably T830), so let's fix this. This will also decrease
friction as we continue improving T720 support.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
A 'normal' texture op may be emitted in a vertex shader on T720 but it
still doesn't take any derivatives.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This simplifies manipulation of the offsets dramatically, fixing some
UBO access related bugs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Eventually, we will want to combine constants across types, but for now
let's not break the world.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
64-bit ops have their own funky swizzles. Let's pack them, both for
native 64-bit sources as well as extended 32-bit sources.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
On newer GPUs, this is a no-op. On older GPUs, this prevents needless
spilling since texture registers are shared with a subset of work
registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
We have to disable the fixup.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
We use a different set of texture registers, probably to save hardware.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Early Midgard uses a different set of texture registers; let's not
hardcode.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Not much of a difference but slightly better and slightly less
arbitrary.
total instructions in shared programs: 3560 -> 3559 (-0.03%)
instructions in affected programs: 44 -> 43 (-2.27%)
helped: 1
HURT: 0
total bundles in shared programs: 1844 -> 1843 (-0.05%)
bundles in affected programs: 23 -> 22 (-4.35%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Pretty routine, we do have a hack to force swizzle alignment for !32-bit
for until we implement !32-bit the right way.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
While most load/store operations on 32-bit/vec4 intriniscally, some are
not and have special type-size-dependent semantics for the mask. We need
to convert into this native format.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We can use the native Midgard ops for this, depending what chip we're
on.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
There are two versions of this opcode, depending what version of the ISA
you're using. I'm not sure if there's a semantic difference; I think
there might be some slight subtleties but it's too early to know at this
stage.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
There aren't texture pipeline registers anymore; instead, space is
shared with work and ldst registers for output and input respectively.
We need to shift the base registers to represent this correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The meaning of some bits shifts; we need to account for this to print
swizzles sanely.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We were using old style half-registers; let's update that to be
consistent, preparing us for more disassmbler changes in this area.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like to pack not just xyzw swizzles but also efgh swizzles.
This should work for vec4/16-bit. More work will be needed to pack
swizzles for vec8/16-bit and even more work for 8-bit, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Midgard prefetches instructions based on tag (ALU, LD/ST, texture *
size). To do so, the shader descriptor specifies the tag of the first
instruction, all instructions specify the tag of the next linear
instruction is, and all branches explicitly specify the tag of the
branch target.
If you mess this up, you get an INSTR_TYPE_MISMATCH, which unambiguously
refers to this problem, but it's still annoying to try to work out all
the branch targets in your head to debug.
Instead, let's track the tags of various blocks over time, so we can
automatically validate tags of branch targets, to make
INSTR_TYPE_MISMATCH issues immediately obvious in a disassembly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than having hw-specific swizzles encoded directly in the
instructions, have a unified swizzle arary so we can manipulate swizzles
generically.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We want symmetry between loads and stores, so we add a dummy source. So
we get, e.g.
st_int4 _, val, arg_1, arg_2
ld_int4 dest, _, arg_1, arg_2
Semantically, this dummy source represents the data itself, as if the
load is simply a move. That means it has a swizzle that acts as a
source.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than supplying a mask/swizzle to compose with the original, just
supply the offset of the allocated register so we can directly offset
the mask/swizzle, without resorting to composition.
This is simpler, cleaner, and will generalize to non-32-bit.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
We would like to eliminate not just entire dead instructions, but also
dead components, which increases scheduler flexibility (since some
vector instructions can become scalar after eliminating dead
components). This also will allow better RA in the future.
Results are meh.
total instructions in shared programs: 3453 -> 3451 (-0.06%)
instructions in affected programs: 60 -> 58 (-3.33%)
helped: 2
HURT: 0
total bundles in shared programs: 1826 -> 1824 (-0.11%)
bundles in affected programs: 33 -> 31 (-6.06%)
helped: 2
HURT: 0
total quadwords in shared programs: 3144 -> 3144 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 321 -> 321 (0.00%)
registers in affected programs: 45 -> 45 (0.00%)
helped: 11
HURT: 11
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 50.00% x̄: 39.70% x̃: 50.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for registers value: -0.45 0.45
95% mean confidence interval for registers %-change: -1.87% 62.18%
Inconclusive result (value mean confidence interval includes 0).
total threads in shared programs: 445 -> 447 (0.45%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows for vec16 dependencies in the scheduler, not that we have
any yet (thankfully).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have notion of byte masks, liveness tracking can be updated
to reflect this extra granularity without loss of correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Read component masks don't have a particular type associated, since the
type of the ALU operation may not match the type of the operands in
question. So let's generate byte masks instead, and update the rest of
the compiler to use byte masks when analyzing reads.
Preparation for mixed types.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There are essentially two formats of masks in play beginning with this
commit: masks per-channel and masks per-byte. The former make sense
within a given fixed-size instruction; the latter are
typesize-independent. It turns out you need the latter to meaningfully
manipulate instructions containing multiple sizes (which is quite
possible with ALU operations).
Similarly, we have mir_srcsize. We calculate the size of the source by
analyzing the size of the instruction itself and stepping down if there
is a half-modifier.
Finally, we have mir_round_bytemask_down, for when we want to take a
byte mask and "round it down" to a given component size, so that we can
use it as a component mask.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to encode properties about the load/store ops like we
do for ALU ops. We include now properties about whether we have a store,
and if there are special cases on the load/store op. We also tag each
instruction by its natural size... this is probably not totally right,
but it's a start.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The trick is realizing even with a destination override, the masks are encoded in the same mode as the
instruction itself, rather than stepping down. The override means that
the smaller type is used, but the mask is parsed as if it were the
higher type. Overriding down is down by printed by blinding doing this. Overriding up can be thought of as printing in the upper size, but shifting the alphabet to use the upper half, i.e. shifting xyzw to become abcd.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 2d914ebe81 ("pan/midgard: Fix memory corruption in register spilling")
It doesn't make sense. You already spilled it once, and it didn't help.
Don't try again, or you'll end up in a loop.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Essentially an off-by-one error ... bit of an edge case, but seems to
occur in some glamor shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't really need to impose this condition, but we do need to cope
with the slightly more general case.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have live_out calculated per block as metadata, calculating
liveness of an instruction at a given point in the program becomes O(n)
to the size of the block worst-case, rather than O(n) the program.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Callers should have liveness info ready. Ideally we'd have a nice
metadata tracking framework like NIR to handle this automatically, but
for now this will allow us to make forward progress... when we're about
to do something with liveness, invalidate everything ahead to force a
clean calculation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will allow us to explicitly invalidate liveness analysis results so
we can cache liveness results.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
By definition, once liveness analysis has occurred:
live_out = OR {succ} succ->live_in
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There are unfortunately two distinct liveness analysis passes in the
compiler right now -- one good (but complex) pass used by RA based on
solving data flow equations, and one awful (but simple) pass used for
dead code elimination and bundling based on an abstract walk of the AST.
Let's move RA's pass into shared code so we can work on unifying.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to fill in ctx->temp_count explicitly, even if we haven't
squished down the MIR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We already enforce this with the SSA/register distinction in the
backend. There is no need to duplicate this logic merely for an assert.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Now that we have constant adjustment logic abstracted, we can do this
safely. Along with the csel inversion patch, this allows many more
common csel ops to inline their condition in the bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we can reuse constant slots from other instructions, we would like to
do so to include more instructions per bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If an instruction could be scheduled to vmul to satisfy the writeout
conditions, let's do that and save an instruction+cycle per fragment
shader.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>