ir3_shader_from_nir() calls ir3_optimize_nir(), which currently sets up
the const state. However, we need to know the number of user consts
reserved by the driver before setting up the const state, which means
that this information needs to be passed into ir3_shader_from_nir()
somehow rather than being set in the shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5500>
A bit cleaner than open coding the list manipulation. Plus I want to
use it in the next patch, rather than adding more open coded list
futzing.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
When a sequence of same instruction is encoded with repeat flag,
destination registers are written on successive cycles. Teach the
delay calculation about this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
These two encodings are mutually exclusive. If the instruction is a
vector(ish) `(rptN)` instruction, then we can't fold a `(nopN)` post-
delay into it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
In the `try_swap_mad_two_srcs()` case, valid_flags() gets called both
for the src that we want to try to fold, and for the other src that we
are trying to swap to make that possible. It can happen in the 2nd case
that a RELATIV src has already been folded. Since `ssa()` returns non-
null in both the `IR3_REG_SSA` and `IR3_REG_ARRAY` cases (in the later
case, it is the dependent array access that the current instruction
cannot be moved ahead of), we need to explicitly check that the src
reg we are looking at is still an SSA src.
Reported-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
It is better to use `nir_intrinsic_dest_components()` which also handles
the case of intrinsics with a fixed number of dest components.
Somehow this starts showing up with a nir_serialize round-trip with
shader-cache. But we really shouldn't have been relying on
`intr->num_components` directly.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
The only reason for this dependency was the fd_bo used for the uploaded
shader. But this isn't used by turnip. Now that we've unified the
cleanup path from gallium, it isn't hard to pull the fd_bo upload/free
parts into ir3_gallium.
This cleanup has the added benefit that the shader disk-cache will not
have to deal with it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5476>
ir3_nir_move_varying_inputs is broken when there a load input outside of
the first block which depends on the result of a previous load input.
This simplification/rework avoids the problem, and should also be faster.
Fixes this dEQP-VK test:
dEQP-VK.pipeline.multisample_interpolation.offset_interpolate_at_pixel_center.128_128_1.samples_2
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5465>
Teach RA to setup additional interference to prevent textures fetched
before the FS starts from ending up in a register that is too high to
encode.
Fixes mis-rendering in multiple playcanv.as webgl apps.
Note that the regression was not actually 733bee57eb8's fault, but
that was the commit that exposed the problem.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3108
Fixes: 733bee57eb ("glsl: lower samplers with highp coordinates correctly")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
This is mainly the "piglit optimization" (ie, since piglit launches an
separate process for for each test). It was never wired up for a6xx,
and makes register class setup unnecessarily complicated. Remove it to
simplify the next patch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
Refactor a bit the limit checking in the bindless case, and add tex/samp
limit checking for the non-bindless case, to ensure we do not try to
prefetch textures which cannot be encoded in the # of bits available.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
The GLES blob on the p3a limits constlen to 512 between VS and FS across
a6xx gpu ids (615, 630, 640, and 650). Experimentally, exceeding that
limit in any one stage results in rendering corruption or GPU hangs
(though my most detailed testing had a loop limit in a uniform, so that
may the cause of the hang). Clamp the limit we use inside of a shader so
we don't exceed it within a stage.
This commit doesn't resovle limiting inter-stage. Experimentally, I've
found that I can push up to a total of ~768 vec4s between VS and FS on
a630, with or without uniform updates between each draw. We'll need to do
some shader key-based limiting of constlen at draw time to respect that
limit, but that's left for future work, and this commit is enough for the
google earth case that initiated this work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
It turns out the GL uniforms file is larger than the hardware constant
file, so we need to limit how many UBOs we lower to constbuf loads. To do
actual UBO loads, we'll need to be able to upload UBO 0's pointer or
descriptor.
No difference on nohw 1 UBO update drawoverhead case (n=35).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
Unlike other conditions which prevent early-discard of fragments, kill
does not prevent early LRZ test. Split `has_kill` from `no_earlyz` so
we can take advantage of this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
This allows us to do API specific checks before removing variable
without filling nir_remove_dead_variables() with API specific code.
In the following patches we will use this to support the removal
of dead uniforms in GLSL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used, if C++ is
suddenly added meson just does the right thing.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
First element is not a scalar. Just initialize the struct like we do
elsewhere.
src/freedreno/ir3/disasm-a3xx.c:958:33: warning: suggest braces around
initialization of subobject [-Wmissing-braces]
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5174>
The closed GL driver uses resinfo on images with the writeonly flag (using
the texture-path's getsize only for readonly images). The closed vulkan
driver seems to use resinfo regardless. Using resinfo doesn't need any
fixups after the instruction. It also avoids one of the needs for the
TEX_CONST state for the image, which is awkward to set up in the GL
driver.
The new handler goes into ir3_a6xx to be next to the other current image
code, but the a4xx version is left in place because it wants a bunch of
sampler helpers.
Fixes assertion failure in dEQP-VK.image.image_size.buffer.readonly_32.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
Since I'm going to start using the resinfo opcode, make sure we can disasm
the blob's instances of it that I've found. And, since resinfo disasm
will impact ldgb on pre-a6xx, include some of those too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
For cases where instructions have a src and/or dst type, validate that
it matches the src/dst register types. And for cases where there are
different opcodes for half vs full, validate that the opcode matches.
Now that we maintain this properly throughout the stages of the ir, we
can drop the fixups from the RA pass.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
Add some helpers to properly maintain src/dst types, and in the cases
where opcode depends on src or dst type, maintain that as well.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
When we start doing cp iteratively, we hit the case that we've already
`cmps.s.*` into a `cmps.s.ne p0.x, ...`.. when we try to do that again
we can invert the logic condition. So check specifically the condition
to prevent this.
TODO we could maybe be more clever about this to combine conditions.
But why isn't that happening in nir? For example, see
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.bool
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
There can be multiple (for ex.) f32f16's from a single source, in
particular appearing in different blocks. We need to update all uses
of the src which had conversion folded in, not all the uses of the
individual cov. Also, to avoid invalidating the ssa use info that was
gathered at the beginning of the pass, don't actually eliminate the
cov, but instead change it to a simple mov that the cp pass can gobble
up.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
We have to fixup the meta:split half flag, because `ir3_split_dest()` is
called before we fixup the dest type. But we should fixup both the
split src and dest, as well as the thing it is splitting.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
If we're inserting a mov to resolve a conflict between meta:collect's
(ie. for .zyx type swizzles, etc), we should use the correct precision.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
It does pick up a few more cf/cp opportunities, according to sharder-db.
But don't think it will be measurable.
But this will allow some future simplification to cp by pulling out it's
internal iteration.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
For a6xx, since we use same VBO state for binning and VS, we need to
preserve potentially unused inputs. This needs to be done before DCE.
So move it before we add earlier DCE passes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
Or do the easy thing and claim we always changed something. It is kinda
hard and not worth the effort to determine for real.
Also rip out unused error handling. This pass should never fail. And
we weren't even actually checking the return.
And while we're at it, switch over to taking the 'struct ir3 ir*`
instead of ctx, to standardize with the other passes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
Later when we do this pass iteratively, we can drop some of the internal
iteration and just rely on this pass getting run until there is no more
progress.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
In a later patch, this will get folded into an IR3_PASS() macro, at
least for most passes. But to do that, it is better to standardize
on printing the ir3 after the pass.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
A3xx apparently has higher alignment requirements than later gens for
indirect const uploads. It also has fewer of them. Add compiler
parameters for both settings, and set accordingly for a3xx and a4xx+.
This fixes all the ubo test failures caused by this optimization.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5077>
This causes failures on a3xx resulting in the non-sensical dEQP failures
on packUnorm2x16. The same test uses ldlv on a4xx+, so just disallow
(sat) on bary.f on all generations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5074>
The hardware expects that where MRT0 and MRT1 would normally go are the
dual sources for MRT0, whereas GLSL has an extra "index" parameter that
indicates which source it is. Remap it when handling FS outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5039>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
It seems less brittle to not assume they are in the same order for src
and dst instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cleans up a couple spots that were still open-coding this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These intrinsics are supposed to map to the underlying hardware
instructions, which don't have wrmask. We use them when we lower
store_output in the geometry pipeline and since store_output gets
lowered to temps, we always see full wrmasks there.
It saves addressing math, but may cause multiple loads to be done and
bcseled due to NIR not giving us good address alignment information
currently. I don't have any workloads I know of using non-const-uploaded
UBOs, so I don't have perf numbers for it
This makes us match the GLES blob's behavior, and turnip (other than being
bindful).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
With the upcoming LDC usage in the GL driver, we don't want to be
uploading descriptors for every UBO when they aren't actually in use.
Trimming NIR's num_ubos will avoid that, and cleans up num_ubo handling
elsewhere right now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
Similar to what we do in postsched. It is useful for pre-RA sched to be
a bit aware of things that would cause syncs. In particular for the tex
fetches, since the vecN src/dst tends to limit postsched's ability to
re-order them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
If an instruction's only use is as an output, and it increases register
pressure, then try to avoid scheduling it until there are no other
options.
A semi-common pattern is `fragcolN.a = 1.0`, this pushes all these
immed loads to the end of the shader.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
Similar to avoidance of `(ss)` syncs, it turns out to be helpful to
avoid `(sy)` syncs as well. This helps us turn an tex, (sy)alu, tex,
(sy)alu sequence into tex, tex, (sy)alu, alu, which is a big win in
gfxbench gl_fill2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
Once we schedule an instruction that will require an `(ss)` sync flag,
there is no need to delay any further instructions that consume an
SFU result (until the next SFU instruction is scheduled).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
It seems for short frag shaders, too much prefetch can be detrimental.
I think what we *really* want to do is decide after pre-RA sched, when
we also know about nop's and what the actual ir3 instruction count is.
But that will require re-working how prefetch lowering works. For now
this is a super crude heuristic to attempt to approximate a good
solution.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
We can no longer assume that `state->ranges[0]` is block 0. It *often*
is, but when we encounter a "real" ubo that we lower to `load_uniform`
before a block 0 `load_ubo`, it could end up another entry in the table.
Resulting in the second pass after gathering ubo ranges, not finding a
valid range. Which results in a `load_ubo` for a thing that is not
actually a ubo making it's way into ir3 frontend. Resulting in grabbing
what we think is a ubo address out of some unrelated const register, and
trying to dereference that. Which as you can imagine, fails in amusing
ways.
Fixes: fc850080ee ("ir3: Rewrite UBO push analysis to support bindless")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4954>
The nir_analyze_ubo_ranges pass removes all UBO block 0 loads to reverse
what nir_lower_uniforms_to_ubo() had done, and we only upload UBO pointers
to the HW for UBO block 1-N, so let's just fix up the shader state.
Fixes an off by one in const state layout setup, and some really dodgy
register addressing trying to deal with dynamic UBO indices when the UBO
pointers happen to be at the start of the constbuf.
There's no fixes tag, though this fixes a bug from September, because it
would require the num_ubos fix in nir_lower_uniforms_to_ubo.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4992>
v1. Replace the existed bool type with new bitfield and edit register
files to take a mask instead of duplicating codes to do masking.
v2. Use fragcoord_compmask != 0 instead of fragcoord_compmask > 0 since
it represents a bitfield.
Tested with
dEQP-VK.glsl.builtin_var.simple.fragcoord_xyz/w
dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_xyz/w
Closes: #2680
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4723>
robclark noted that the blob wasn't doing range reduction in the mediump
case, and I confirmed it on
dEQP-GLES3.functional.shaders.operator.angle_and_trigonometry.sin.mediump_float_fragment
vs
dEQP-GLES3.functional.shaders.operator.angle_and_trigonometry.sin.highp_float_fragment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4893>
These come from the disasm tests, and fix our disasm of blob's
uniform/nonuniform cat6 operands. We also now include human-readable names
for all the modes we know about (though bindless gets distinguished by its
.baseN, like Connor's original disasm).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4857>
With this I also brought in a few new control flow instruction disasm
tests that I'd made back when I wrote the disasm test, but which were too
far from correct to include until now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4857>
We can remove a bunch of conditional code at key comparison time by
computing a bitmask of used key bits at ir3_shader creation time. This
also gives us a nice place to put additional key simplification to reduce
how many variants we create (like skipping rastflat if we don't read
colors in the FS, or skipping vclamp_color if we don't write colors).
It does mean walking the whole key to AND it, but the key is just 28 bytes
so far so that seems pretty fine.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We weren't filling in the tess mode of the key, or setting has_gs on GS
shaders, resulting in assertion failures when NIR intrinsics didn't get
lowered.
We have to make a guess at prim mode for TCS, but it should be better to
have some shader-db coverage than none, and it will avoid these failures
happening when we start precompiling shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
Some of the negative API tests make shaders for tess stages that don't do
all the stores they need to. Once we start precompiling (or doing
shader-db of tess), we need to at least not segfault when generating them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We were failing to tell the allocator about the restriction that scalar
texture instructions (allocated as scalar regs) couldn't be allocated such
that the start of the full unwritemasked vector started before r0. There
was a patch in select_reg_callback on a6xx that tried to work around that,
but you could still end up backed into a corner you shouldn't be because
we didn't tell the RA what it needed.
Fixes compiler assertion failures on a300-a400's blit_z shader, used for
Z32F gmem blits.
Looks like as a result we get tighter register allocation but more nops:
instructions in affected programs: 757945 -> 760356 (0.32%)
nops in affected programs: 317983 -> 320468 (0.78%)
non-nops in affected programs: 27525 -> 27451 (-0.27%)
mov in affected programs: 3098 -> 3023 (-2.42%)
dwords in affected programs: 109664 -> 110656 (0.90%)
last-baryf in affected programs: 112701 -> 112847 (0.13%)
full in affected programs: 4326 -> 4011 (-7.28%)
sstall in affected programs: 120550 -> 120836 (0.24%)
(ss) in affected programs: 13939 -> 13918 (-0.15%)
(sy) in affected programs: 3006 -> 2786 (-7.32%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
When the GS lowering was working on store_output intrinsics, we had to
clean up the split vars to avoid getting confused. Now that we shadow
the output vars instead, there's no confusion and we can drop this
hack.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We mostly got away with replacing a store_output with a store_var, but
for complex types like structs, that doesn't work. Once the IO has
been lowered from vars to intrinsic, we've lost the deref chains and
can't properly shadow the outputs.
This commits moves the GS lowering up so we do it before the output
variables get lowered to store_output. This way the pass works much
like nir_lower_io_to_temporaries() and cleanly shadows the outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
This pass lowers per-vertex input intrinsics to load_shared_ir3. This
was open coded in the TCS and GS lowering passes before - this way we
can share it. Furthermore, we'll need to run the rest of the GS
lowering earlier (before lowering IO) so we need to split off this
part that operates on the IO intrinsics first.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>