v2:
- add header debug_printf(), and indent the output
v3:
- rename one of the helper macros
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
SM75 has a bunch more stuff, but is otherwise backwards-compatible
with SM70 SASS.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
v2:
- add TargetGV100::isBarrierRequired() for OP_BREV
- use NV50_IR_SUBOP_LOP3_LUT() convenience macro where it makes sense
- separated out nir_lower_idiv into its own commit
- make use of the shared function to generate compiler options
- disable lower_fpow, nir's lowering is broken
v3:
- use replaceCvt() instead of custom NEG/ABS/SAT lowering
v4:
- remove WAR from peephole, not needed now we're using replaceCvt()
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
About to disable lowering for extract_byte/word in favour of a better
local implementation, but still need lowering for 64-bit versions.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
We can enable some more things here vs earlier GPUs.
v2:
- make use of the shared function to generate compiler options
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
PFETCH lowering will be changed to use this as it's more SM70-friendly,
and this will also allow us to implement extract_byte/word opcodes.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
NIR provides a common implementation of this so we don't need to use a
hand-written built-in library.
v2:
- use idiv_precise instead
Especially important on SM70 where we don't have an assembler.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
These seem to make more sense living with the compiler.
v2:
- use a shared function to generate the per-chipset structs
- remove nir.h include from header, not needed
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
We have nir_intrinsic_dest_components and nir_intrinsic_src_components
which handle all the corner cases.
Fixes a bunch of regressions like front_face stuff.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
Important for SM70 INSBF/EXTBF lowering, as these can can often be
eliminated completely.
v2:
- skip CF when subOp is set
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
This replaces the existing implementation without adding lowering for
earlier GPUs. The reason for this is because the existing code isn't
at all correct, and it also can't be hit anyway.
Will be required to support SM70 lowering passes.
v2:
- fixup source selection
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
We already use a hack from NVC0LegalizeSSA::handleShift() on GK110 and
newer which encodes SHF into the existing SHL/SHR opcodes, but there's
a couple of problems with it:
- LO/HI are swapped in one of the directions, which is very confusing.
- The initial SM70 code will emit this from NIR->NVIR, and using the
existing encodings will confuse the optimisation passes.
As I want to limit the impact on other GPUs from the initial bring-up
of Volta/Turing, let's add an explicit representation of SHF in the IR.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
Will be required to support SM70, but is also available on earlier GPUs.
v2:
- add convenience macro suggested by Karol
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
Setting the depth-scale to 1 while leaving the depth-translation at 0
means our near-plane is at -1 in OpenGL semantics, which is
out-of-range on some drivers. In particular, Zink has this limitation.
But since we'll only pass a zero z in here anyway, we might as well
multiply it by zero, and get the same result. This avoids the problem.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5408>
Serialize and check if the object is in the cache, it there is
a cached object skip compilation code once we've constructed
the function interface.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
This needs to be reworked, but it's a bit messy as we have to store
all the fetch pointers to be added as globals later once gallivm
has been initialised further. For now just refuse to cache shaders
that hit these paths (mainly ETC1 and BPTC).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
This hooks up the gallium API and adds the APIs needed
for shader stages to search and add things to the cache.
It also adds cache stats debug printing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
MCJIT uses an ObjectCache object to implement the cache,
this creates and instances of it and adds it to the MCJIT
instances, it stores the cached object for later use by
the outer layers.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>