mul.s24/u24 always returns a 32b result regardless of the size of its sources,
hence we cannot guarantee that the high 16b of dst are zero or sign-extended.
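
As a rough illustration of why the upper half cannot be trusted, here is a
minimal Python model. The helper names are hypothetical, and `mul_u24` only
models the behavior described above (24-bit sources, 32-bit result); it is
not taken from the hardware documentation or from the driver.

```python
MASK16 = 0xFFFF
MASK24 = 0xFFFFFF
MASK32 = 0xFFFFFFFF


def mul_u24(a: int, b: int) -> int:
    """Hypothetical model: multiply the low 24 bits of each source and
    keep the low 32 bits of the product."""
    return ((a & MASK24) * (b & MASK24)) & MASK32


def mul_u16_via_u24(a: int, b: int) -> int:
    """16-bit multiply built on mul_u24. The final mask is the point:
    the caller cannot assume the high 16 bits are already clear."""
    return mul_u24(a & MASK16, b & MASK16) & MASK16


raw = mul_u24(0xFFFF, 0xFFFF)
print(hex(raw))                              # full 32-bit product
print(hex(raw >> 16))                        # high half is not zero
print(hex(mul_u16_via_u24(0xFFFF, 0xFFFF)))  # truncated 16-bit result
```
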
Fixes CTS tests on a650:
dEQP-VK.spirv_assembly.type.scalar.i16.mul_test_high_part_zero_*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12471>
Support for spilling shared registers to normal registers is still TODO.
There are also several improvements to be made, like rematerialization.
Note, there is one behavior change to register pressure accounting: we
now include half registers in the current full pressure directly in
mergedregs mode, rather than adding the max half pressure to the max
full pressure afterwards, which might result in lower calculated max
pressure in some cases with half registers. This is needed for spilling,
since we need to make sure the total pressure including half registers
is below the maximum at each instruction. Because the entire pass is
rewritten, including the register pressure calculating parts, it didn't
seem worth it to separate out this change.
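
The difference between the two accounting schemes can be sketched as follows.
This is only a toy model: the function names and the per-instruction
(full, half) pressure list are made up for illustration, and both values are
assumed to be measured in the same units of register-file space.

```python
def max_pressure_separate(live):
    """Old scheme: track the full and half maxima independently and add
    them afterwards, even if they occur at different instructions."""
    max_full = max(full for full, _ in live)
    max_half = max(half for _, half in live)
    return max_full + max_half


def max_pressure_merged(live):
    """New scheme: add half pressure to full pressure at each
    instruction, then take the maximum of the per-instruction totals."""
    return max(full + half for full, half in live)


# (full, half) pressure at three consecutive instructions (made-up data).
live = [(8, 0), (2, 6), (0, 4)]

print(max_pressure_separate(live))
print(max_pressure_merged(live))
```

The merged result can never exceed the separate one, and it is the value
that has to stay below the limit at every instruction for spilling.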
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12033>
This was just wrong - we need to check against the entire register file,
and we need to include removed full regs even if the register we're
trying to insert is a half-reg, or else we could run out of space when
reinserting full regs after it. There does need to be an additional
check so that we don't try to insert a half-reg beyond the half-reg
limit, but that has to happen in addition to the normal check.
This fixes KHR-GLES31.core.arrays_of_arrays.InteractionArgumentAliasing6
once spilling is added.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12033>
RA uses this to pop and then reinsert intervals when shuffling around
registers. For spilling, we want to remove the interval and also mark
all its descendants as removed. Since "remove_all" sounds more like the
latter, rename the old "remove_all" to "remove_temp". "remove_all" was
already exposed in ir3_ra.h, so there's no need to add it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12033>
It can happen that we create an enormous merge set, even larger than the
entire register file, in which case find_best_gap() would loop
infinitely. This seems to be triggered more often with
IR3_SHADER_DEBUG=spillall, since it actually happened with a CTS test.
Just bail out in that case.
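
A minimal sketch of the guard, assuming a simplified wrap-around gap search.
`find_gap` and its parameters are hypothetical and much simpler than the real
find_best_gap(); the only part meant to mirror the fix is the early return
when the requested size exceeds the register file.

```python
def find_gap(free, size, start=0):
    """Return the first index at which `size` consecutive slots are free,
    scanning from `start` and wrapping around, or None if there is none."""
    file_size = len(free)

    # Bail out: a request larger than the whole file can never fit, and
    # the wrap-around scan below has no valid candidate position for it.
    if size > file_size:
        return None

    candidate = start if start + size <= file_size else 0
    first = candidate
    while True:
        if all(free[candidate:candidate + size]):
            return candidate
        candidate += 1
        if candidate + size > file_size:
            candidate = 0          # wrap around to the start of the file
        if candidate == first:
            return None            # scanned every position without success


free = [False, True, True, False, True, True, True, False]
print(find_gap(free, 3))
print(find_gap(free, 9))   # larger than the file
```
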
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12033>
When we mark live-through sources that are merged with the destination
as killed, we kept the bitsets in sync, but we forgot to keep them in
sync when unmarking them after allocating the destination. The result
was that "available" wasn't correct for any instruction afterwards. This
resulted in a bad register allocation with IR3_SHADER_DEBUG=spillall for
a dEQP-VK test.
While we're changing this, use ra_foreach_src().
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12033>
Otherwise when an interval is removed and then re-inserted it could
have an invalid/corrupted parent link and child tree. I think RA
happened to never do this, but spilling will.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12033>
Floats have much better precision close to zero than close to one, so
let's make sure we compute an interpolation factor that goes in the
direction that discards the fewest bits.
This makes a big difference when interpolating from very small to very
large values for screen-space positions.
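
The effect can be reproduced with a small Python sketch. The function names
and the distance-based formulation are assumptions for illustration, not the
code from this change, and Python floats are doubles rather than the 32-bit
floats a driver would use; the loss of precision works the same way.

```python
def lerp_naive(v0, v1, d0, d1):
    """Always measure the factor from v0. When |d0| >> |d1| the factor
    rounds to 1.0 and the contribution of v0 is lost entirely."""
    t = d0 / (d0 - d1)
    return v0 + t * (v1 - v0)


def lerp_small_factor(v0, v1, d0, d1):
    """Measure the factor from whichever endpoint gives the smaller
    factor, so that it stays close to zero where floats are densest."""
    if abs(d0) <= abs(d1):
        t = d0 / (d0 - d1)         # factor near zero, measured from v0
        return v0 + t * (v1 - v0)
    s = d1 / (d1 - d0)             # factor near zero, measured from v1
    return v1 + s * (v0 - v1)


# d0 and d1 are signed distances of the two endpoints (made-up values);
# v0 is very large and v1 is very small.
v0, v1 = 1e30, 1e-10
d0, d1 = 1.0, -1e-20

print(lerp_naive(v0, v1, d0, d1))
print(lerp_small_factor(v0, v1, d0, d1))
```

With these inputs the exact result is about 1e10; the naive version collapses
to zero because its factor rounds to exactly one.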
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12355>
By checking the output images against the reference ones on the failed
trace jobs, I looked for artifacts with the naked eye and with image diffs.
No significant change was found, so the traces produced by the failed jobs
can be considered valid.
Updated devices' traces:
* Intel Comet Lake: iris-cml-traces
* Intel Gemini Lake: iris-glk-traces
* Intel Kaby Lake: iris-kbl-traces
* Intel Whiskey Lake: iris-whl-traces
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12394>
Fixes
dEQP-GLES2.functional.fbo.completeness.renderable.texture.color0.rgb10_a2 on
GLES2 drivers which support RGB10_A2 textures.
GL_OES_required_internalformat does not make it a color-renderable
format.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4972
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12464>
CTS draw_indirect usage of TFB output was flaking due to the TFB writes
possibly not having completed. Since GL TFB doesn't require any other
barrier between TFB and use of the BO (as seen by the CTS not emitting any
memory barrier), we have to do it ourselves.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12457>
There are a bunch of optimizations that are broken when DPP is involved.
fossil-db (Sienna Cichlid):
Totals from 100 (0.07% of 150170) affected shaders:
CodeSize: 325204 -> 325192 (-0.00%); split: -0.06%, +0.05%
Instrs: 62773 -> 62664 (-0.17%); split: -0.18%, +0.00%
Latency: 295348 -> 295266 (-0.03%); split: -0.03%, +0.00%
InvThroughput: 73990 -> 73946 (-0.06%); split: -0.06%, +0.01%
Copies: 1650 -> 1609 (-2.48%); split: -2.55%, +0.06%
PreSGPRs: 3554 -> 3520 (-0.96%)
Fossil-db changes are probably because v_sub_f32_dpp(v_mul_f32) is no
longer being combined into MAD and then split back into separate
instructions.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11924>
I haven't gone through every test (particularly the ones I think are
loop-unrolling or instruction-count related), but this gives a
better picture of what's going on in this driver.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12436>
Fixes: 51935d59
The temporal_id check is valid only if num_temporal_layers is set (>0).
When num_temporal_layers is 0, we shouldn't check temporal_id and return
an error.
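
The intended behavior can be sketched as below. The function name and the
exact bound (temporal_id < num_temporal_layers) are assumptions made for
illustration; only the "skip the check when the layer count is 0" part comes
from the description above.

```python
def temporal_id_is_acceptable(temporal_id: int, num_temporal_layers: int) -> bool:
    """Return True if the request should not be rejected."""
    if num_temporal_layers == 0:
        # Layer count not set: the check is meaningless, so skip it
        # instead of returning an error.
        return True
    # Assumed bound: the id must refer to one of the configured layers.
    return temporal_id < num_temporal_layers


print(temporal_id_is_acceptable(0, 0))   # unset layer count: accepted
print(temporal_id_is_acceptable(3, 2))   # id outside configured layers
```
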
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12463>
Fixes: 51935d59
In the case where num_temporal_layers is not set (0), set it to the
minimum value of 1, otherwise the rate control settings will be missing.
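
A minimal sketch of why the default matters, assuming rate control is
configured once per temporal layer. `build_rate_control` and the dictionary
layout are hypothetical and do not reflect the actual encoder structures.

```python
def build_rate_control(num_temporal_layers: int, target_bitrate: int):
    """Produce one rate control entry per temporal layer."""
    # Treat an unset (0) layer count as a single layer; without this the
    # loop below would produce no rate control settings at all.
    layers = num_temporal_layers if num_temporal_layers > 0 else 1
    return [
        {"layer": i, "target_bitrate": target_bitrate}
        for i in range(layers)
    ]


print(len(build_rate_control(0, 1_000_000)))
print(len(build_rate_control(3, 1_000_000)))
```
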
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12463>