Important for SM70 INSBF/EXTBF lowering, as these can can often be
eliminated completely.
v2:
- skip CF when subOp is set
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
This replaces the existing implementation without adding lowering for
earlier GPUs. The reason for this is because the existing code isn't
at all correct, and it also can't be hit anyway.
Will be required to support SM70 lowering passes.
v2:
- fixup source selection
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
We already use a hack from NVC0LegalizeSSA::handleShift() on GK110 and
newer which encodes SHF into the existing SHL/SHR opcodes, but there's
a couple of problems with it:
- LO/HI are swapped in one of the directions, which is very confusing.
- The initial SM70 code will emit this from NIR->NVIR, and using the
existing encodings will confuse the optimisation passes.
As I want to limit the impact on other GPUs from the initial bring-up
of Volta/Turing, let's add an explicit representation of SHF in the IR.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
Will be required to support SM70, but is also available on earlier GPUs.
v2:
- add convenience macro suggested by Karol
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
Setting the depth-scale to 1 while leaving the depth-translation at 0
means our near-plane is at -1 in OpenGL semantics, which is
out-of-range on some drivers. In particular, Zink has this limitation.
But since we'll only pass a zero z in here anyway, we might as well
multiply it by zero, and get the same result. This avoids the problem.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5408>
Serialize and check if the object is in the cache, it there is
a cached object skip compilation code once we've constructed
the function interface.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
This needs to be reworked, but it's a bit messy as we have to store
all the fetch pointers to be added as globals later once gallivm
has been initialised further. For now just refuse to cache shaders
that hit these paths (mainly ETC1 and BPTC).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
This hooks up the gallium API and adds the APIs needed
for shader stages to search and add things to the cache.
It also adds cache stats debug printing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
MCJIT uses an ObjectCache object to implement the cache,
this creates and instances of it and adds it to the MCJIT
instances, it stores the cached object for later use by
the outer layers.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
Cached shaders require relinking, so hardcoding the pointer
can't work. This switches out the printf code to use new
proper API.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
When using cached shaders we have to relink the shader with
external symbols when it's loaded. However the way gallivm does
function calls now hardcodes the function pointer into the shader.
LLVM had a mechanism for doing this properly using global mappings,
this switches the coroutine alloc/free code to use a global mapping.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
When we use an object cache for the MCJIT we can have identical
cache entries from the same shader variant in different shaders,
but the JIT objcache uses the function name to relink things,
so it has to be consistent. Just drop the variants from the
function names.
Note the modules still have the variant info.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
Need to list arm_test-base here as well, or jobs using this
template may spuriously run if the arm_test-base job fails or
is cancelled.
Suggested-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5405>
This reduces overhead when resizing windows or when allocating
similar image sizes over and over again.
v2: optimize the memory footprint of the cache
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5398>
The retile map is not used in this case, and the retile map computation
takes 39% of CPU time when resizing a window.
This brings it down to 23%.
The dcc_retile_use_uint16 setting has to be derived from DCC sizes.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5398>
16-bit abs/neg creates v_xor_b32/v_and_b32 with v2b definitions. These
instructions never do partial writes without SDWA.
No shader-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5040>
This uses the new nir_intrinsic_store_combined_output_pan intrinsic,
which can write depth, stencil and color in a single instruction. If
there are no color writes, the "depth RT" is written to.
Fixes the dEQP GLES3 depth write tests, as well as the piglit tests
fragdepth_gles2, glsl-1.10-fragdepth and when modified to not rely
on depth/stencil reload, glsl-fs-shader-stencil-export.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
We schedule depth writeout to smul and stencil to vlut, so scheduling
to smul has to be disabled in these cases.
When only writing stencil, scheduling to smul is still disabled to
prevent stencil writeout from being scheduled there.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
Depth and stencil writes are combined with color writes, so we need
this intrinsic which has sources for color, RT, depth and stencil.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
By setting the swizzle for the fragment color, and setting qmask to ~0
for branches, the special case for writeout branches can be removed
from mir_bytemask_of_read_components_index.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
We need to be able to do color writeout at the same time as depth
writeout. The old code can't do that, so needs to be removed.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
It is broken for when there are also color writes, and will be
replaced with a new lowering which takes that into account.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
There will need to be sources for depth and stencil writeout, so
something has to be moved to the dest of the writeout branch.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>