With turnip we can have sparse input variables like:
decl_var shader_in INTERP_MODE_NONE float @1 (VERT_ATTRIB_GENERIC1.x, 1, 0)
decl_var shader_in INTERP_MODE_NONE float @2 (VERT_ATTRIB_GENERIC1.y, 1, 0)
decl_var shader_in INTERP_MODE_NONE float @3 (VERT_ATTRIB_GENERIC1.w, 1, 0)
Example of a test fixed:
dEQP-VK.glsl.440.linkage.varying.component.vert_in.vec2.as_float_float_unused
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5818>
For example only some UCPs may be used by the shader, triggering asserts
that too many consts are being uploaded.
While we're at it, also fix the const size when loading UCPs, since
otherwise it doesn't correspond to what the shader is actually using.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5752>
Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO. No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>
The previous version assumes tess level outputs will only be written once
in the shader, however its not possible to guarantee that.
It also assumes all invocations will write all the levels, which is also
not guaranteed.
This is required to fix the "tesselation" and "terraintessellation" demos
with turnip.
The comment about nir_lower_io_to_temporaries in lower_tess_ctrl_block is
removed because nir_lower_io_to_temporaries specifically skips TESS_CTRL
shaders so the comment doesn't make sense.
The split load for tess levels workaround is removed, the new version only
has scalar access unless if ever gets vectorized.
This sets NIR_COMPACT_ARRAYS cap to avoid the glsl tess vec lowering with
gallium. It seems this will also disable "LowerCombinedClipCullDistance",
which I'm not sure was needed or not.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5744>
resinfo always writes 3 components, which was not being taken into account
Fixes these tests:
dEQP-VK.renderpass.suballocation.attachment_sparse_filling.input_attachment_3
dEQP-VK.renderpass.suballocation.attachment_sparse_filling.input_attachment_7
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5674>
In cases where every variant is a shader-cache-hit, we never need the
post-finalize round of nir opt/lowering passes. So defer this until
the first shader-cache-miss to avoid doing pointless work.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
Adds a shader disk-cache for ir3 shader variants. Note that builds with
`-Dshader-cache=false` have no-op stubs with `disk_cache_create()` that
returns NULL.
Binning pass variants are serialized together with their draw-pass
counterparts, due to shared const-state.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
For shader-cache, we are going to want to serialize them together.
Which is awkward if the two related variants are not compiled together.
This also decouples allocation and compile, which will simplify adding
shader-cache (which still needs to allocate, but can skip compile).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
Currently we always do sysmem if there is tess. And for GS, the binning
pass VS ends up identical to the draw pass VS, so no point in compiling
it twice. (For GS what we should do someday is generate a binning pass
GS, and possibly if we can do cross-stage linking opts, an optimized
binning pass VS, but the required outputs would somehow have to end up
in the shader variant key.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
Use ir3_compiler_destroy() rather than open-coding ralloc_free(). This
will give us a place to add more compiler related cleanup code in the
following patches.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
The next step is to hook this into pscreen->finalize_nir() so it can
come before the state tracker's shader-caching.
Unfortunately we still need to do lower_io after mesa/st, so that is
split out into a post-finalize pass.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
This is part of adding VK_KHR_shader_draw_parameters for turnip.
IR3_DP_VTXID_BASE/IR3_DP_VTXCNT_MAX offsets are changed to match what
CP_DRAW_INDIRECT_MULTI requires.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5635>
Fixes a case where you have something like:
aVecOutput.z = aScalarInput;
In particular, skipping over things that are not the first component is
wrong.. in the above case the input we need to precolor is the 3rd
component. But we need to adjust the target register according to the
offset.
Fixes android.hardware.nativehardware.cts.AHardwareBufferNativeTests
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5601>
We don't really need the varying remapping, and it seems to somehow
happen twice when shader-cache comes into the picture. But we can
just choose not to have this problem.
Now that everything is using the ir3_point_sprite() helper, we can
flip this pipe cap without it being a massive flag-day.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5595>
This will simplify a bit the logic for setting up vinterp/vprepl in the
driver backend, and also avoid it being a flag-day when we switch the
texcoord pipe cap.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5595>
As per discussion on !5059, we don't see any particular reason as to
why MERGEDREGS should be disabled on HS/DS/GS, and none of the dEQP
tests (both VK and GL) fail when MERGEDREGS is enabled. In fact, some
of the VK dEQP tests fail when MERGEDREGS is disabled (e.g. tests
with shaders that employ a0.x). As a result, let's just enable
MERGEDREGS unconditionally on a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
lower_tess_ctrl_block assumes that the gl_TessLevel*
intrinsic_store_outputs have already been collapsed into a single
instruction before the tess lowering step:
store_output ... /* base=0 */ /* wrmask=xyzw */ /* component=0 */
store_output ... /* base=1 */ /* wrmask=xy */ /* component=0 */
While this is true in fd because of st_nir_vectorize_io, we don't do
the same lowering in turnip so each tess level component still has
its own store instruction:
store_output ... /* base=0 */ /* wrmask=x */ /* component=0 */
store_output ... /* base=0 */ /* wrmask=x */ /* component=1 */
store_output ... /* base=0 */ /* wrmask=x */ /* component=2 */
store_output ... /* base=0 */ /* wrmask=x */ /* component=3 */
store_output ... /* base=1 */ /* wrmask=x */ /* component=0 */
store_output ... /* base=1 */ /* wrmask=x */ /* component=1 */
This commit adds a component offset to the tess control lowering. An
alternative is to also perform nir_lower_io_to_vector in turnip, but
ir3 seems to generate the same assembly either way and it's nice to
not have a lowering prereq before tess lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
Since binning pass variants share the same const_state with their
draw-pass counterpart, we should re-use the draw-pass variant's ubo
range analysis. So split the two functions of the existing pass
into two parts.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>
This serves two purposes, one during ubo range analysis, where we want
to create new ranges, and another during the actual ubo lowering. Split
these in two, with read-only ubo analysis state in the second case, to
prepare to split this pass in two.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>
For shader-cache, we want to not have anything important in `ir3_shader`.
And to have shader variants with lower const size limits (to properly
handle cross-stage limits), we also want variants to be able to have
their own const_state.
But we still need binning pass shaders to align with their draw pass
counterpart so that the same const emit can be used for both passes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
Make it an rzalloc'd ptr instead of embedded struct, so it can serve as
the mem ctx for immediates. This gets rid of needing to explicitly free
the immediates, so one less thing to deal with when moving const_state.
(Also, after we move const_state to the shader variant, we won't need
one for binning pass variants)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
When we move const_state to the variant, this will need to stay in the
shader, as it applies to all variants (and we need to store it somewhere
before we have any variants)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
The `ir3_shader` is the root mem ctx, with `ir3_shader_variant` hanging
off that, and various variant specific allocations hanging off the
variant.
This lets us delete a bunch of cleanup code.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
It seems a lot of the lowerings being run the second time were
unnecessary. In addition, when const_state is moved to the variant,
then it will become impossible to know ahead of time whether a variant
needs additional optimizing, which means that ir3_key_lowers_nir() needs
to go away. The new approach should have the same effect, since it skips
running lowerings that are unnecessary and then skips the opt loop if no
optimizations made progress, but it will work better when we move
ir3_nir_analyze_ubo_ranges() to be after variant creation.
The one maybe controversial thing I did is to make
nir_opt_algebraic_late() always happen during variant lowering. I wanted
to avoid code duplication, and it seems to me that we should push the
_late variants as far back as possible so that later opt_algebraic runs
don't miss out on optimization opportunities.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
The only difference between this and `const_state->num_ubos` was that
the latter is counting # of ubos loaded via `ldg` (based on UBO addrs
in push-consts). But turns out there isn't really any reason to care.
Instead just add an early return in the one code-path that cares about
the number of `ldg` UBOs.
This gets rid of one more thing we need to move from `ir3_shader` to
`ir3_shader_variant`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
As with const_state, this will also need to move into the variant. To
simplify that, just move it into the const_state itself, since after all
it is related.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
We are going to want to move this back to the variant, and come up with
a different strategy for binning/nonbinning to share the same constant
layout, in order to implement shader-cache support. (Since then we
can have a mix of dynamically compiled variants and cache hits, so there
is no good place to serialize the const-state.)
To reduce the churn as we re-arrange things, move direct access to the
const-state to a helper fxn. This patch is the boring churny part.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
Deduplicate a bit of hand-building of ir3_shader/_variant from
computerator and delay test. This also removes the need for
external things to depend on generated ir3_parser header.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
Rather than assuming a6xx+ means mergedregs. We can actually (mostly?)
do splitregs on a6xx as well. And GS/DS/HS currently require it, which
might be papering over a bug, or might be something to do with how
chaining shaders works. At any rate, we should at least be consistent,
and not have the compiler thinking we are doing mergedregs when we are
actually doing splitregs.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>