according to the spec, atomic counters can be bound at any offset divisible by 4,
which means that any driver that uses the ssbo lowering pass and doesn't have
a min offset align of 4 is potentially broken
to handle this, use a statevar to inject the misaligned remainder of the offset
into the shader as a uniform. for well-aligned counter binds, the uniform offset
will be 0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
this function should be called late to allow for other passes potentially
making changes which affect the states in use by shaders
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
some hardware can't do a ssbo offset=4, as required by the atomic->ssbo
lowering pass, so for these cases an offset can be passed for the counter
as a uniform, and the shaders can be adjusted accordingly
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
This pass will merge instructions like these
MOV output[0].x, temp[5].x___;
MOV output[0].yzw, none._001;
into
MOV output[0].xyzw, temp[5].x001;
It is currently very careful with control flow and dependency
tracking, so there is still room for improvements.
Shader-db stats with RV530:
total instructions in shared programs: 132486 -> 132256 (-0.17%)
instructions in affected programs: 6186 -> 5956 (-3.72%)
helped: 65
HURT: 0
total temps in shared programs: 18035 -> 18014 (-0.12%)
temps in affected programs: 295 -> 274 (-7.12%)
helped: 22
HURT: 1
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
This fixes the "Rewrite of inst X failed Can't allocate source
for Inst X src_type=X new_index=X new_mask=X" errors.
The compiler is quite strict when rewriting registers during
the pair allocation and checks that all of the reads of it are
initialized. However the spec doesn't enfore that, and
specifically with control flow depending on user input we can't
really know...
In the following example temp[4].x is written only in one branch,
that might or might not be taken, but this is enough to keep the
compiler happy:
IF aluresult.x___;
MAD temp[4].x, src0.1__, src0.111, src0.000
ENDIF;
src0.xyz = temp[4], src0.w = temp[4]
MAD color[0].xyz, src0.xyz, src0.111, src0.000
MAD color[0].w, src0.w, src0.1, src0.0
After switch to ntt, more IFs are converted to CMP, and the color
write looks like this. Please note that the CMP here is not TGSI
opcode but rather our US_OP_RGB_CMP: src2 >= 0 ? src0 : src1
src0.xyz = temp[4], src0.w = temp[4], src1.xyz = temp[3], src1.w = temp[12], src2.xyz = temp[2]
CMP color[0].xyz, src0.xyz, src1.xyz, -src2.xxx
CMP color[0].w, src0.w, src1.w, -src2.x
At this point temp[4].x is undefined. Now when compiler tries to
allocate register for temp[4] at some previous instruction, it will
find out that it is used as a source in the final CMP and bail out.
Instead of increasing the complexitty even more trying to account for
this, just get rid of the check completelly.
Fixes:
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_component_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_direct_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_dynamic_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_static_loop_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec2_dynamic_subscript_write_static_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_component_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_direct_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_dynamic_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_static_loop_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec3_dynamic_subscript_write_static_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_component_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_direct_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_dynamic_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_static_loop_subscript_read_fragment,Fail
dEQP-GLES2.functional.shaders.indexing.vector_subscript.vec4_dynamic_subscript_write_static_subscript_read_fragment,Fail
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
The fixes are mostly from 23dfae4c81
dEQP-GLES2.functional.fragment_ops.depth_stencil tests show random
flakes. The ones in failures are showing unexpected pass, however other
random test failures from the same group keep showing so just mark it
all as flakes.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
It doesn't make sense and was not working anyway. This was spotted
by Filip Gawin in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13978
however the fix there was IMO just papering over the problem.
I don't believe that this could manifest as a real issues, because
when all of the swizzles were constant the file would be set to
RC_FILE_NONE already. So in theory this could lead to an issue only
in the close to impossible circumstance that the out of bounds memory
read by constant->u.Immediate[swz] would end with the same exact value
as another inlineable constant in different channel. However in some
circumstances it would lead to following valgrind warnings:
Conditional jump or move depends on uninitialised value(s)
at 0x5D4E690: ieee_754_to_r300_float (radeon_inline_literals.c:61)
by 0x5D4E690: rc_inline_literals (radeon_inline_literals.c:133)
by 0x5D3877A: rc_run_compiler_passes (radeon_compiler.c:436)
by 0x5D38821: rc_run_compiler (radeon_compiler.c:458)
by 0x5D4AF63: r3xx_compile_fragment_program (r3xx_fragprog.c:139)
by 0x5D48377: r300_translate_fragment_shader (r300_fs.c:499)
by 0x5D491B0: r300_pick_fragment_shader (r300_fs.c:601)
by 0x5D2BFEE: r300_create_fs_state (r300_state.c:1072)
by 0x57DDC36: st_create_nir_shader (st_program.c:538)
by 0x57DF10E: st_create_fp_variant (st_program.c:1056)
by 0x57E057C: st_get_fp_variant (st_program.c:1102)
by 0x57E0AB1: st_precompile_shader_variant (st_program.c:1287)
by 0x57E0AB1: st_finalize_program (st_program.c:1333)
by 0x57CB6F3: st_link_nir (st_glsl_to_nir.cpp:958)
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
When there are multiple MOVs with the same destination in loop
in different branches and some readers after the loop, we would
now errorneously copy propagate the last MOV, like in the following
snippet:
BGNLOOP;
...
IF temp[3].x___;
MOV temp[2], const[1].yxxy;
BRK;
ENDIF;
IF temp[4].x___;
MOV temp[2], const[1].xyxy;
BRK;
ENDIF;
...
MOV temp[2], const[1].xyxy;
ENDLOOP;
ADD_SAT temp[0], temp[2], temp[1];
into:
BGNLOOP;
...
IF temp[3].x___;
MOV temp[2], const[1].yxxy;
BRK;
ENDIF;
IF temp[3].y___;
MOV temp[2], const[1].xyxy;
BRK;
ENDIF;
...
ENDLOOP;
ADD_SAT temp[0], const[1].xyxy, temp[1];
We need the copy propagate just for simple cleanups after ttn,
anything more complex should have been handled already in NIR.
So just bail out if any of the readers is after the loop.
No changes in shader-db.
Fixes few piglit tests when loop unrolling is disabled:
spec@glsl-1.10@execution@vs-loop-complex-unroll
spec@glsl-1.10@execution@vs-loop-complex-unroll-nested-break
spec@glsl-1.10@execution@vs-loop-complex-unroll-with-else-break
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6467
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16657>
opencl_c_h is defined only for llvm < 15.
Fixes: bcc2df4890 ("clc: speed up compilation by not relying on opencl-c.h")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16808>
This is used for the old, buggy and slow GLSL IR loop unrolling
code. All drivers have now switched to the NIR unrolling code so
here we remove the CAP.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
NIR loop unrolling is already enabled so just let it do its job.
Shader-db results (nv120):
total gpr in shared programs: 893490 -> 893898 (0.05%)
gpr in affected programs: 15338 -> 15746 (2.66%)
total instructions in shared programs: 6243205 -> 6237068 (-0.10%)
instructions in affected programs: 71160 -> 65023 (-8.62%)
total bytes in shared programs: 66729616 -> 66664760 (-0.10%)
bytes in affected programs: 759328 -> 694472 (-8.54%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
NIR loop unrolling is already enabled so just let it do its job.
Shader-db results (nv92):
total gpr in shared programs: 734638 -> 735037 (0.05%)
gpr in affected programs: 11058 -> 11457 (3.61%)
total instructions in shared programs: 6073415 -> 6073398 (<.01%)
instructions in affected programs: 10079 -> 10062 (-0.17%)
total bytes in shared programs: 41837432 -> 41838872 (<.01%)
bytes in affected programs: 252504 -> 253944 (0.57%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
NIR loop unrolling is already enabled so just let it do its job.
Shader-db results (nv40):
total instructions in shared programs: 17446532 -> 17446068 (<.01%)
instructions in affected programs: 15532 -> 15068 (-2.99%)
total gpr in shared programs: 82658 -> 82801 (0.17%)
gpr in affected programs: 1680 -> 1823 (8.51%)
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
Otherwise we will later hit:
gpir_error("nir_ssa_undef_instr is not supported\n");
Unfortunatly this causes a piglit failure due to increased register
pressure in an unrealistic shader but since not doing this can
result in hitting the not supported error in more relistic shaders
this seems the right thing to do for now.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
Force unroll setting based on GLSL IR settings:
case PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR:
case PIPE_SHADER_CAP_INDIRECT_OUTPUT_ADDR:
case PIPE_SHADER_CAP_INDIRECT_TEMP_ADDR:
case PIPE_SHADER_CAP_INDIRECT_CONST_ADDR:
/* a2xx compiler doesn't handle indirect: */
return is_ir3(screen) ? 1 : 0;
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
The NIR unroller is already enabled so just allow it to do its job.
We add a new failure here because llvmpipe fails to handle a
shader that is no longer unrolled.
Previously GLSL IR could unroll the loop because it only had a
single break. However once lower_returns passes over the shader
it ends up with more than 2 breaks making it no longer possible
to unroll. This is a disadvantage of doing the unrolling in NIR
however in practice we don't see shaders in the wild with multiple
returns inside loops.
Being unable to handle this loop is an existing bug with llvmpipe
exposed by the loop no longer being unrolled.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
We now have infrastructure in place to generate variants of vertex shaders
specialized for transform feedback. All that's left is launching these
compute-like kernels before the IDVS job, implementing both the
transform feedback and the regular rasterization pipeline. This implements
transform feedback on Valhall, passing the relevant GLES3.1 tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Valhall has no architectural support for transform feedback. So if a vertex
shader uses transform feedback, we need to split the shader into two: a pure
vertex stage and a compute-like transform feedback stage. This splitting
resembles the splitting we do for IDVS.
When compiling a vertex shader that uses transform feedback on Bifrost, also
compile the transform feedback variant. That variant (marked by internal=true)
will get its stores lowered by the NIR pass introduced earlier in this series.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Wire the Gallium interface for transform feedback up to the system values that
will be fed into our lowering code. This is based on our existing transform
feedback implementation for Midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
In both GL and VK, the driver may choose not to support vertex shaders with side
effects (SSBOs, atomics, images). Supporting this opens a can of worms for IDVS.
Neither freedreno nor the (Vulkan?) DDK advertise support, for this reason.
Apps should not be using this anti-feature anyway.
Stop advertising support.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Translate the intrinsics we introduced to lower away transform feedback into
Panfrost system values which the GL driver can handle.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Add a simple NIR-based implementation of transform feedback, appropriate for
OpenGL ES 3.1 class hardware (compute but no geometry or tessellation shaders).
Stores to varyings that will be captured are replaced by stores to transform
feedback buffers and some addressing math. This allows implementing the semantic
of transform feedback in a compute-like stage.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
This is useful for drivers which wish to consume XFB information. These
hopefully-uncontroversial hunks are extracted from the much more controversial
"st,nir,radeons: Move nir_lower_io_passes to si_nir_lower_io" by Jason.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
These will be used to facilitate transform feedback lowering for Panfrost,
although other backends could use the sysvals in the future.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15720>
Rather than asserting everything in an unused function, just do it in global
context with C11 static_asserts. This is a bit neater now that we depend on C11
projectwide.
Obvious follow-on from !16670.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16856>
Calling this directly in the linker code allows us to place it between
the varying linker and uniform linker calls which allows for better
optimisation/removal of uniforms.
Also in a later patch it allows us to insert a new nir based
lower_const_arrays_to_uniforms() call after the gl_nir_link_opts()
call. This is important because it allows the linking opts to
move constant arrays to later stages if possible before
lower_const_arrays_to_uniforms() turns them into uniforms.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6541
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>