If the coherent flag is present, then we need to not have an incoherent
cache between us and previous stores to the image that were also decorated
as coherent. isam apparently (unsurprisingly) goes through a texture
cache. Use ldib instead, so that we don't get the wrong result.
We would need a similar fix for a4xx, but that uses ldgb and I don't
have hardware to test on.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12704>
On a6xx resinfo returns size in bytes divided by IBO_0_FMT format size
(not just size in dwords), we have to shift it back to NIR meaning which
is size in bytes.
Make freedreno use 16b buffers when they are supported in order to be
able to depend on hardware capabilities when lowering ssbo size.
Fixes: ce1a381e57 "turnip: enable VK_KHR_16bit_storage on A650"
Fixes cts tests:
dEQP-VK.ssbo.unsized_array_length.float_offset_explicit_size
dEQP-VK.ssbo.unsized_array_length.float_no_offset_whole_size
dEQP-VK.compute.basic.write_multiple_unsized_arr_single_invocation
and many more
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12485>
For the same reason as previous patch. Mostly we only care about the
generation, so convert things to use compiler->gen instead.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
The way that the blob obtains the subgroup id on compute shaders is by
just and'ing gl_LocalInvocationIndex with 63, since it advertizes a
subgroupSize of 64. In order to support VK_EXT_subgroup_size_control and
expose a subgroupSize of 128, we'll have to do something a little more
flexible. Sometimes we have to fall back to a subgroup size of 64 due to
various constraints, and in that case we have to fake a subgroup size of
128 while actually using 64 under the hood, by just pretending that the
upper 64 invocations are all disabled. However when computing the
subgroup id we need to use the "real" subgroup size. For this purpose we
plumb through a driver param which exposes the real subgroup size. If
the user forces a particular subgroup size then we lower
load_subgroup_size in nir_lower_subgroups, otherwise we let it through,
and we assume when translating to ir3 that load_subgroup_size means
"give me the *actual* subgroup size that you decided in RA" and give you
the driver param.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
I forgot that after rebasing on large_consts support that this is now
called after the first time nir_lower_wrmask is called and can generate
partial writemasks that need to be lowered. While we're here, also call
the main optimization loop if things are lowered to scratch because it
generates address arithmetic that may need to be cleaned up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10922>
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
No handling of Acquire/Release because at the moment scheduler
works as if any barrier is Acq+Rel.
Instead of removing scoped_barrier with scope/mode that for TCS
corresponds to a control_barrier or a memory_barrier_tcs_patch in
ir3_nir_lower_tess_ctrl - remove them in emit_intrinsic_barrier.
And do the same for memory_barrier_tcs_patch and control_barrier.
While in any case hw fence/barrier shouldn't be emitted for them,
they still affect ordering of stores, and in feature ir3 backend
may want to have that information.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9054>
Do all the necessary lowering in one place, during finalization, and
stop uselessly calling nir_lower_indirect_derefs in turnip. Splitting
i/o to elements should no longer be necessary since we use the i/o
semantics instead of variables now.
This has the side effect that we no longer generate enormous if-ladders
for tess/GS shaders with turnip.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7274>
We have to keep sampler uniforms around for later YUV lowering, and we
only need to remove uniforms that take up storage space. Code comes from
radeonsi.
Closes: #4644.
Fixes: de17b4aab5 ("freedreno: Remove uniform variables after finalizing NIR.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10246>
This needs to be part of the compiler because it's the only piece that
we always have access to in all the places ir3_optimize_loop() is
called, and it's only enabled for the whole Vulkan device. Right now
it's just used for constraining vectorization, but the next commit adds
another use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
Added:
* a pass that renumbers bases of IO intrinsics
* a pass that converts mediump IO to 16 bits, optionally using the new
packed varying slots
* a pass that sets (forces) mediump in IO intrinsics (for testing)
* a pass that remaps VARYING_SLOT_VAR[0..15]_16BIT to VARYING_SLOT_VAR[0..31]
(if some shader stages don't want packed varyings)
* a pass that folds type conversions around texture opcodes into those
opcodes (e.g. tex(f2f32(coord), ..) is changed into tex accepting f16)
* a pass that changes (legalizes) sampler src and dst types based on specified
hw constraints (e.g. derivatives must be the same type as coordinates)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
Vulkan 1.1 has VK_KHR_device_group and VK_KHR_device_group_creation
promoted to core, thus we should handle DeviceIndex built-in.
While we are here, also add these extensions to the extensions list,
even though they are not doing anything useful.
Fixes test:
dEQP-VK.compute.device_group.device_index
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9516>
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
The GL frontend can do it for us now, so just use their code instead of
our own shader variants. In the past we had to do hide the GL shader
variants in the driver to get precompiles from st, but no longer as of
!8601.
I tested with drawoverhead -test 6 (shader program change, n=30) and -test
1 (no statechanges, n=43) and saw no change in driver overhead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8997>
mesa/st optimizes the uniform storage if you have the finalize hook in
place, causing the uniforms declared to potentially not have storage in
the ParameterValues list any more. If you leave your uniforms around in
the NIR, then a later finalization after variant creation will re-add the
uniforms to parameters, defeating the optimization and likely reallocating
the uniform storage (causing use-after-free). So, we have to do this
before we can start using variants in mesa/st.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8997>
Right now if the shader indirects on some large constant array, we see NIR
load_consts (usually from the const file) of its contents into general
registers, then indirection on the GPRs. This often results in register
allocation failures, as it's easy to go beyond the ~256 dwords of
registers per invocation.
By moving the large constants to a UBO, we can load an arbitrary number of
them. They also can be theoretically moved to the constant reg file (~2k
dwords), though you're unlikely to hit this path without an indirect load
on your large constant, and we don't yet let UBO indirect loads get moved
to constant regs.
This possibly won't work out right if we have 16-bit load_constants, but
without other MRs in flight we won't see 16-bit temps to be lowered to
this.
This allows 2 kerbal-space-program shaders to compile that previously
would fail, and fixes the new dEQP-VK and -GLES2 tests I wrote that
dynamically index a 40-element temporary array of float/vec2/vec3/vec4
with constant element initializers.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2789
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
With indirect load_uniform, we can only encode 10b of constant base
offset. This pass detects problematic cases and peels out the high
bits of the base offset.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7612>