This was implemented in 1d3bffaf9c,
but missing the WRITE_COMPRESS_ENABLE bit, then disabled by
4dc6ed2a59040f04648eadbffeb1522587d00f3.
This commits reimplements it to:
- avoid disabling dcc when uploading FP16 textures
(see si_use_compute_copy_for_float_formats)
- being able to use compute to upload textures in more cases, rather
than using the blit path
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8958>
The new enum has 3 values:
- SI_CP_DMA_CLEAR_METHOD: equivalent to force_cp_dma = true
- SI_COMPUTE_CLEAR_METHOD: to force the clear to use compute
- SI_AUTO_SELECT_CLEAR_METHOD: equivalent to force_cp_dma = false
No functional change yet, but this will be used later.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8958>
We always return GL_TRUE from the Unmap functions.
gl_marshal.py is modified so as not to use "return" in the unmarshal
function, which always returns void.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9029>
The pointer is a GPU offset if a PBO is bound.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9029>
so that GL functions with a non-const pointer don't print a warning when
we call them, such as glGetTexImage with a PBO where the pointer is really
just an offset.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9029>
We want glthread to ignore variable-sized parameters if the only thing
we want is to pass the pointer parameter as-is, e.g. when a PBO is bound.
Making it conditional on marshal_sync is kinda hacky, but it's the easiest
path towards handling PBOs, which will use marshal_sync to check whether
a PBO is bound.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9029>
It must be 1 only if both sampler and non-sampler VMEM instructions
that return something are used. BVH counts as a sampler instruction.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9028>
We don't have any nir_variables for uniforms, so this code wasn't
doing anything. Also, uniform handles are almost always uniforms.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9028>
so that create_stream_output_target doesn't use the context and can be
called from any thread. This is for u_threaded_context.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9028>
Doing these MINs at the end could have undone optimizations for the LDS
size and threadgroup size, so move the MINs up.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9028>
We can't invoke gfx6 addrlib (overridden by SI_FORCE_FAMILY) with a gfx9
family ID.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9028>
si_check_render_feedback only relied on si_images::enabled_mask and
si_samplers::enabled_mask to determine if a texture was being used
both as input and output.
Given that some samplers/images can be considered active (so accounted
for by enabled_mask) but not used by the current shader this could
lead to false-positive.
This commit fixes this by and-ing the above mask with the information
from shader_info for each active shader.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4227
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8869>
Shaders that fail register allocation were dumped with an instruction
count of 0, so getting them to compile would show up as an instruction
count regression. Also, the LOST/GAINED stats depend on us not dumping
data for failed shaders, which is why we were always seeing 0/0 there.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
As we add more compiler optimizations that can increase register pressure
we may decide to disallow TMU spilling in more cases so it is probably
better to move this to its own helper function.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
If we emit a new uniform and that uniform has already been emitted
in the same block we can just reuse that.
There is a balancing game here between reducing ldunif instructions
and not increasing register pressure too much though, so we put
a limit to how far back we are willing to look for a previous
definition of the uniform. Based on shader-db results, 20 instructions
produces best results.
total instructions in shared programs: 14928266 -> 14907432 (-0.14%)
instructions in affected programs: 6431841 -> 6411007 (-0.32%)
helped: 15270
HURT: 10772
Instructions are helped.
total uniforms in shared programs: 3944672 -> 3840276 (-2.65%)
uniforms in affected programs: 1827184 -> 1722788 (-5.71%)
helped: 30423
HURT: 845
Uniforms are helped.
total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%)
inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%)
helped: 15287
HURT: 10852
Inst-and-stalls are helped.
v2 (Eric):
- consider ldunifrf too
- check that no other instruction writes to the register
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
It was interpreting the value as hexadecimal when it is unsigned.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8524>
Fix defect reported by Coverity Scan.
Resource leak (RESOURCE_LEAK)
leaked_storage: Variable cs going out of scope leaks the storage it points to.
Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9034>
On Gen7, we have to split shuffles into two MOVs for 64-bit types so we
can't handle source modifiers. On Gen12.5, we have to use integer types
all the time so we can't use them there either. Fixing that will be a
different commit but it interacts with this one.
Fixes: 90c9f29518 "i965/fs: Add support for nir_intrinsic_shuffle"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>
We already throw out any variables which may have a complex use so we
just need to make sure that our mode checks don't assert if we have a
deref which may_be but not must_be nir_var_function_temp.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>
We can't move the shuffle to a new block so this only works if the
shuffle and the bcsel are in the same block. Fortunately, in the
motivating case, this is true.
Also, we have to be careful around discard. We could try really hard to
just avoid moving them past discard but we choose to simply bail if we
see a discard instead.
Fixes: 4ff4d4e569 "nir/opt_intrinsic: Optimize bcsel(b, shuffle..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9068>
The nvc0 nouveau backend already claimed to support format modifiers, but
in practice it ignored them when allocating buffers outside of a
perfunctory check for the linear modifier in the first element of the
format modifier list.
This change deduces the supported modifiers, if any, for a given miptree
creation request, prioritizes them based on performance and memory waste
properties, compares the requested modifiers against the prioritized list
of supported modifiers, and overrides the internal layout calculations
based on the layout defined by the resulting modifier.
Additionally, if modifiers are provided and none are compatible with the
miptree creation request, the function now fails. This brings the nouveau
behavior in line with other drivers such as i965 and etnaviv.
Signed-off-by: James Jones <jajones@nvidia.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724>
Other drivers fail resource allocation when a list of modifiers for the
resource is provided but none are supported. This includes cases when the
never-supported DRM_FORMAT_MOD_INVALID modifier is explicitly passed. To
enable matching that functionality in nouveau, use an empty modifier list
rather than creating a one-entry list containing only
DRM_FORMAT_MOD_INVALID when the non-modifier resource creation function is
used.
This change stops short of failing allocations when no modifier is
specified, because the current code ignores all modifiers except the linear
modifier when creating resources, so there is not yet a framework in place
to determine which modifiers are valid for a given resource creation
request, and hence no way to reject only those which are invalid.
Signed-off-by: James Jones <jajones@nvidia.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724>
Replace existing usage of the NVIDIA_16BX2_BLOCK format modifiers with
parameterized use of the more general macro. Nouveau will now report
support for slightly different modifiers depending on whether the
underlying chip is a tegra GPU or not, and will potentially report valid
format modifiers for more resource types, but overall this should be a
functional no-op for existing applications.
Signed-off-by: James Jones <jajones@nvidia.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724>
Older Tegra GPUs use a different sector bit swizzling layout than desktop
and Xavier GPUs. Hence their format modifiers must be differentiated from
those of other GPUs. As a precursor to supporting more expressive block
linear format modifiers, deduce the sector layout used for a given GPU from
its chipset and stash the layout in the nouveau screen structure.
Signed-off-by: James Jones <jajones@nvidia.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724>
this writes the pipeline cache to disk on shutdown
ideally we'd rather write this incrementally any time we make a new pipeline,
but that ends up breaking the disk cache infrastructure since we're always writing
to the same file, so this is the best we can do for now
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9094>
Draw-time variants are still synchronous, but I'm not sure there is much
(easy) benefit from generating them asynchronously. Without patching
the cmdstream later before batch submit, we'd end up waiting for them
immediately. But we should mostly only hit draw-time variants for
desktop GL (and mostly legacy features).
Note: new xfb xfail on a5xx, but most of the xfb tests are already xfail
so I think we just managed to change the timing a bit, rather than this
being related to async compile.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3857
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8795>
There was just a single remaining caller of ir3_shader_create_compute(),
so fold that into ir3_shader_compute_state_create().
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8795>
This had only a single caller, so no need to be exported. With
that done, fold the ir3_shader creation (ie. the cheap part) into
ir3_shader_state_create(), and rename what is left.
This is prep to moving initial variant creation to a work queue.
It does slightly change the error handling, in that we don't
cleanup the shader hwcso. We wouldn't be able to do this anyways
with async compile. But it ends up using the same error handling
paths that we'd hit if we got a compile failure for a draw-time
variant.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8795>
All the ir3 using backends already call ir3_screen_init(), so lets just
move compiler creation there.
In a subsequent patch, we'll add initialization of the queue for async
compile.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8795>
These were identical between a5xx and a6xx, so move into shared helper
that can be directly plugged into pctx, similar to the various 3d shader
state creation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8795>