No driver doesn't use this option.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17757>
primitive_param takes up 2 vec4's. Remove an align that I don't
understand.
The align upset
Test case 'dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_vec4'..
deqp-vk: ../src/freedreno/ir3/ir3_nir.c:1039:
void ir3_setup_const_state(nir_shader *, struct ir3_shader_variant *, struct ir3_const_state *):
Assertion `constoff <= ir3_max_const(v)' failed.
with an older version (android11-tests-dev branch) of deqp-vk. This is
because ir3_nir_opt_preamble uses the function for the worst case but
the function fails to replace the align by the worst case.
No regression with dEQP-VK.*tess*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17570>
Adds a shared consts base offset and a size of it(dwords) to ir3_compiler
since they might be depending on gpu generations. (Danylo Piliaiev <dpiliaiev@igalia.com> )
Adds a flag to present whether shared consts are enabled to
ir3_shader_options and then it sets to ir3_const_state when creating
an ir3 variant. Although this state is not per-shader state, this is
necessary when figureing out real constlens.
v1. Define a hw quirk for geometry shared const files and use it when
calculating const length.
v2. Don't hardcode when calculating a safe const length.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
According to the observation on a630/a650/a660, max_const_pipeline has
to be 512 when all geometry stages are present. Otherwise a gpu hang
happens. Acoordingly maximum safe size for each stage should be under
(max_const_pipeline / 5 (stages)).
Only when VS and FS stages are present, the limit is 640.
v1. Align max_const_safe to 4 vec4's.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Allow folding constants/undef sources by sharing more code with the image_store
16bit folding pass.
Allow more than one set of sources because RADV wants two, one for
G16 (ddx/ddy) and one for A16 (all other sources).
Allow folding cube sampling destination conversions on radeonsi/radv because
I think the limitation only applies to sources.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16978>
In addition to hopefully generating shorter code, this optimizes out a
comparison of a mediump-cast value in
dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_fragment passed
through ANGLE, and allows the test to pass. We believe it to be a
test bug, but emitting better code like apparently everyone else does
is also a fine result.
No change on GLES gfxbench shaders.
Fixes: #6585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17546>
The if statement we insert would insert a new block before the end block
(and remove the old pre-end-block). If the new block ended up later in
the HT due to its pointer's hash value, you'd emit another copy of the if
statement after the last one. I saw this happen up to 4 times in testing.
The worst case would be if all those additions and removals ended up
reallocating the HT, at which point we might use-after-free.
Fixes inconsistent shader-db results with geometry shaders.
Cc: mesa-stable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17501>
And also handle tess. In all cases, we want to use the VS lowering pass
on the last geometry stage. We don't make a special exception for GS
like other drivers, because GS gets lowered into a quasi-VS.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17341>
If we didn't EmitPrimitive() then the shadow (old) outputs would not
get copied to the emit temps (to eventually be copied back to the real
outputs. This isn't so bad except that means the realy vertex_flags
output has an undefined value.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17341>
Before, we needed CP post-sched to copy-propagate references to NIR
registers produced by out-of-ssa. Now that we're in SSA, this pass ends
up not doing anything useful, and actually gets in the way by occasionally
creating a cycle in the DAG.
The entire shader-db impact is:
instructions HURT: shaders/closed/steam/tropico-5/78.shader_test FRAG: 238 -> 242 (1.68%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17320>
SSBO access works very differently from UBO access. Straddling
loads/stores isn't an issue, loads/stores instead must be aligned to the
element size and can have up to 4 components.
We support 16-bit access with SSBOs on a650+, and sometimes the
vectorizer tries to create a misaligned 32-bit access when combining
32-bit and 16-bit accesses. The UBO-focused logic didn't reject this,
which is now fixed. This fixes a number of VK-CTS regressions on a650+.
Fixes: bf49d4a084 ("freedreno/ir3: Enable load/store vectorization for SSBO access, too.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17040>
This needs to be accurate so that when we split and then schedule a new
a0.x/a1.x/p0.x write we will eventually make progress. It wasn't taking
the kill_path into account which could create an infinite loop as we
keep scheduling writes whose uses are blocked because they are memory
instructions not on the kill_path.
Closes: #6413
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16635>
Force unroll setting based on GLSL IR settings:
case PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR:
case PIPE_SHADER_CAP_INDIRECT_OUTPUT_ADDR:
case PIPE_SHADER_CAP_INDIRECT_TEMP_ADDR:
case PIPE_SHADER_CAP_INDIRECT_CONST_ADDR:
/* a2xx compiler doesn't handle indirect: */
return is_ir3(screen) ? 1 : 0;
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
ce1a381e57 ("turnip: enable VK_KHR_16bit_storage on A650") determined
that the type of the instr decided the type of the value being stored in
the ".b" case. But it would be surprising if image stores had the type
determine the coordinates' precision instead of the value's, and once we
turned on image instruction precision lowering we ran into asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16616>
Each shader stage has its own "early preamble" flag.
Early preamble is likely an optimization to hide some of latency
when loading UBOs into consts in the preamble.
Early preamble has the following limitations:
- Only shared, a1, and consts regs could be used
(accessing other regs would result in GPU fault);
- No cat5/cat6, only stc/ldc variants are working;
- Values writen to shared regs are not accessible by the rest
of the shader;
- Instructions before shps are also considered to be a part of
early preamble.
Note, for all shaders from d3d11 games blob produced preambles
compatible with early preamble mode.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15901>