This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
This removes the need for drivers to handle both versions. The base will
get added once in nir_lower_system_values when converting from deref to
intrinsic and will be replaced by a zero for users not supporting it.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
OpenGL doesn't have them and rusticl handles them for CL already.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
The compiler supports those intrinsics only for task/mesh shaders and it
never caused any issues, because the way `nir_lower_compute_system_values`
is doing its lowering.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26800>
NIR can't handle those component counts, so we have to split it into 2
SGPR vectors where each has max 4 components.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28725>
If the clear color isn't 0 or 1, we used a slow clear. This adds a new
DCC clear where the DCC buffer is cleared to a special value and the clear
color is stored at the beginning of each 256B block in the image.
It can be very fast, but it's not always faster than a slow clear.
There is a heuristic that determines whether this new fast clear is
better.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28725>
Since the TCS epilog is no more, this is required to apply those bits
to monolithic shaders.
tessfactors_are_def_in_all_invocs was unused.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28725>
otherwise the options would be ignored if the shader cache had already
cached the same shader with the option inverted.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28725>
Starting from MTL there is registers in HW to read the IP version of
graphics, media and display IPs, those registers are called GMD.
IPs can be used in any combination to form a SOC/platform and each IP
has it own stepping/revision, making complex to track each IP stepping
using just PCI revision.
Since MTL will be supported by default by i915 KMD that don't have
a uAPI fetch IP versions, this feature will only be supported in LNL
and newer that are backed by Xe KMD.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26908>
Sync xe_drm.h with 31ced035ecde ("drm/xe/uapi: Restore flags VM_BIND_FLAG_READONLY and VM_BIND_FLAG_IMMEDIATE").
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26908>
In some (probably malformed) cases, even weights BOs for strided or depthwise
convolutions can become bigger when using ZRL compression.
To avoid running out of space in the BO, play safe and calculate the
actual optimum ZRL bit count. This does slow compilation for quite a
bit, though (2x slower for MobileNetV1).
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28879>
By using the on-chip SRAM to cache the input image we can save some more
bandwidth and increase the utilization of the NN cores, with the
following improvements:
MobileNetV1: 9.991ms -> 6.2ms
SSDLite MobileDet: 27ms -> 24.3ms
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28879>