Easier to understand (and match to actual hardware behaviour) this way.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10271>
This was renamed when I was in high school. I remember updating the
Midgard compiler while sitting in AP Physics.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10296>
We already support everything necessary and just need to ask the frontend
to DTRT. This makes UBO0 get more tightly packed, saving upload space,
and allows for _mesa_optimize_state_parameters() as well.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10150>
Since we can handle arbitrary offsets in the load_ubo paths, we can let
the GLSL compiler pack UBO 0 tighter, saving uniform uploads. This may
cause some straddling loads that could reduce performance for vec3s, but
those are rare in shader-db and we expect this to be outweighed by the
wins for normal float/vec2 packing.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10151>
Without this extension, we misrender when custom border colors are
used. Let's document this, and emit a warning when the extension is
missing.
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10316>
Add vn_queue_submission_count_batch_semaphores that works on a single
batch at a time. Also avoid code duplication.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10290>
In looking at the profile of dEQP, GLES3 was spending 5-10% of its time in
ReadPixels, and almost all of that is b8g8r8a8_unorm8. It's really slow
because we're getting about 47MB/s by doing uncached reads 32 bits at a
time in the code-generated unpack. If we use NEON to generate larger bus
transactions, we can speed things up to 136MB/s. In comparison, raw
ldr/str read/writes with no byte swapping can hit a max of 216MB/sec.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10014>
We have only a few callers of unpack that do rects, so add a helper that
iterates over y adding the strides. This saves us 36kb of generated code
and means that adding cpu-specific variants for RGBA format unpack will be
much simpler.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10014>
OpTerminateInvocation provides the behavior required by the GLSL
discard statement, which we already implement.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
The "demote" intrinsic has the semantics of D3D discard, which means
it doesn't change the control flow, allowing derivatives to work.
On A6xx there is no known way to check whether invocation was demoted,
thus we use nir_lower_is_helper_invocation.
Add "logical" OPC_DEMOTE which is later translated to "kill".
Such separation is necessary to run "kill" specific optimizations
which are invalid for "demote".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
Some hardware doesn't have a way to check if invocation was demoted,
in such case we have to track it ourselves.
OpIsHelperInvocationEXT is specified as:
"An invocation is currently a helper invocation if it was originally
invoked as a helper invocation or if it has been demoted to a helper
invocation by OpDemoteToHelperInvocationEXT."
Therefore we:
- Set gl_IsHelperInvocationEXT = gl_HelperInvocation
- Add "gl_IsHelperInvocationEXT = true" right before each demote
- Add "gl_IsHelperInvocationEXT = gl_IsHelperInvocationEXT || condition"
right before each demote_if
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
Make sure to preserve signed zeroes.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
on GFX6 (Pitcairn). Untested on GFX7.
Fixes: 54a09545ec ("aco: optimize a*0.0")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10319>
immediates_count and immediates_size are supposed to have the same
units, but it was only incrementing immediates_count by 1. While we're
here, also fix the case where constants are specified out-of-order.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10291>
Otherwise we can end up in situations like having divide-by-zero. If the optimization is smart enough
that we end up with a *constant* divide-by-zero, then the DXIL validator will fail to sign, which
can trigger fatal errors with CLOn12.
We want to run an initial translation of all kernels during program build, but at that point we don't
know the local size to be able to specify it through kernel specialization data.
v2: Metadata output of 0 is used to indicate that the size wasn't explicitly specific. Copy the
size to the metadata before overriding it to (1,1,1). If conf was explicitly specified,
update the metadata again (though nobody should be paying attention to it).
Closes: https://github.com/microsoft/OpenCLOn12/issues/20
Closes: https://github.com/darktable-org/darktable/issues/8700
Reviewed-By: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10303>
Certain games create and destroy lots of resources without binding them.
This can take quite a bit of time and even create unneeded
synchronization. However, we know that if a resource was never bound to
anything, it can be cached. This change does that.
Counting the number of uncached allocation with a tabletop simulator trace:
Before: 2967 uncached allocations over the replay
After: 24 uncached allocations over the replay
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10225>
If a has been lowered to float16 here, then we end up trying to
construct a vector of mixed precision, which the validator asserts
about.
So let's make sure we use the same type for all arguments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10201>
This is going to make it easier to implement the custom border-color
extension.
While we're moving the code, tweak the memset code a bit, so we don't do
any float-ism in the int-case. It doesn't change anything functionally,
just makes it slightly clearer what's going on here.
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10320>
this lets us use the more consistent codepath and ordering (buffers -> elems)
as well as set take_ownership=true to skip the unref
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10257>
tcs user vars are var_size[32], which isn't actually how many slots they need,
just how big the variable is (oops), so this needs to be divided
by MAX_PATCH_VERTICES to get the real slot count
slot mapping has always been broken for all tcs inputs, but this probably fixes
all of the related issues there, including unlimited crashes when playing Tomb Raider
Fixes: 2d98efd323 ("zink: pre-populate locations in variables")
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10269>
tcs user inputs need to have their size adjusted in order to determine whether
they'll overflow the existing slot map
Fixes: 5c5e1abea2 ("zink: evaluate existing slot map during program init and force new map as needed")
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10269>
nir_convert_from_ssa and assign_io_locations both modify this unconditionally,
the latter of which possibly re-modifies variables in ways that can break the
slot map and cause stack overflows during vk driver pipeline compilation
Fixes: 2b4609b66c ("zink: run nir_convert_from_ssa last during compile")
Fixes: 2d98efd323 ("zink: pre-populate locations in variables")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10269>
This cap should only ever be emitted for fragment-shaders, but we
accidentally emit it for all shaders. Let's tighten the check to avoid a
validator warning when emitting non-fragment shaders without support for
VK_EXT_shader_stencil_export.
Fixes: 8724d4fb36 ("zink: check shader stencil output")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10249>
This reverts commit acdf1a1234.
While this commit fixed the gles CTS regressions, it introduced
regressions that made the driver unusable, hence the revert.
Closes#4657
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10311>
With the new Vulkan Video extensions a bunch of enum values have been
added. The problem is that those are behind #ifdef so our generated
code complains that we using enum values not defined.
Since we're not using enum names but integer values, we can silence
all those by casting the enum type to int64_t.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10304>
This ends up polluting the namespace if you build iris/crocus
at once, just move it to where it's used for now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10308>
Only internal compute shaders use DCC stores, so the TODOs are not
critical yet.
Fixes: 1d64a1045e - radeonsi: enable dcc image stores on gfx10+
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
The lowering code removes the "VS inputs" item from the list because the hw
doesn't support indirect indexing of VS inputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
because gfx103.json is automatically generated and can't be changed
manually. This fixes the file generator without changing the generated
header.
Missing registers must be in registers-manually-defined.json, and
missing fields must be in parse_kernel_headers.py.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10261>
A helper called from vn_wsi_create_scanout_image to hijack wsi scanout
image creation.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10294>
The prog_to_nir->NIR-to-TGSI change ended up causing regressions on r300,
and svga against r300-class hardware, because nir_lower_uniforms_to_ubo()
introduced shifts that nir_lower_ubo_vec4() tried to reverse, but that NIR
couldn't prove are no-ops (since shifting up and back down may drop bits),
and the hardware can't do the integer ops.
Instead, make it so that nir_lower_uniforms_to_ubo can generate
nir_intrinsic_load_ubo_vec4 directly for !INTEGER hardware.
Fixes: cf3fc79cd0 ("st/mesa: Replace mesa_to_tgsi() with prog_to_nir() and nir_to_tgsi().")
Closes: #4602
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10194>
Our "integer" index is stored as a float in this case, and we just need to
use teh right opcode for loading it, which will be the only one supported
by !native_integers hardware.
Fixes: cf3fc79cd0 ("st/mesa: Replace mesa_to_tgsi() with prog_to_nir() and nir_to_tgsi().")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10194>
This has been confirmed to fix sporadic graphics corruption on Gen12
platforms for a number of workloads (including Heaven, Valley and
CS:GO among others). Corruption seems to occur during context switch
fairly consistently, but unfortunately this problem doesn't seem to be
documented. Until the hardware team comes up with a better
workaround, fix the problem by reemitting constants at the beginning
of each batch.
No corruption has been observed so far in GL due to preemption,
however this is a possibility to keep in mind, it may be necessary to
disable preemption in addition to this patch in order to fully address
this problem (see also 81201e4617).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4412
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4454
Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously UBOs only supported static indices, and SSBOs only
supported dynamic indices. UBO support for descriptors was added
as an alternative to static indices, but the logic for detecting
descriptors to SSBOs couldn't just differentiate on constants vs not.
Add a helper which can differentiate cleanly across the board and
handle pre-created handles from descriptors, or static/dynamic raw
indices.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10149>
Instead of doing all of the handle logic in the descriptor load, split
it so that the resource index is actually computed during resource_index
processing, and it's converted to a handle during the load_descriptor.
At the same time, add SSBO handling and dynamic indexing handling.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10149>
The resources need to be emitted in a particular order, so CBVs
have to be emitted first and can't be emitted as we iterate through
instructions.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10149>
Fixes the following building error:
external/mesa/src/egl/drivers/dri2/platform_android.c:1263:10: error: use of undeclared identifier 'FALLTHROUGH'
FALLTHROUGH; /* for pbuffers */
^
1 error generated.
Fixes: 2928c21eb7 ("Convert most remaining free-form fall-through comments to FALLTHROUGH")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10264>
Instead of checking whether the attribute is settable for each
attrib type, check that once at the beginning of the loop.
Instead of having an if for each attrib type, use a switch.
Return an error if we encounter an unknown attribute. This allows
the caller to make sure settable attributes aren't ignored. The
intel media driver seems to just assert [1] that it doesn't encounter
unknown attributes.
[1]: 95d413e519/media_driver/linux/common/ddi/media_libva.cpp (L2530)
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10104>
We were using the pipeline layout to discard uniform updates for
stages that don't use descriptors, but we can do better by keeping
track of the stages used by the specific dirty descriptor sets and
only update uniforms for stages that are included in those.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10283>
In 27ee40f4c9 ("anv: Add support for sample locations") we
introduced the ability to emit sample locations baked in as part of
the pipeline or dynamically.
This is different from the previous dynamic states that were always
removed from the pipeline batch and instead emitted dynamically all
the time.
The mistake in 27ee40f4c9 is that sample locations are now emitted
all the time, leading to bigger command buffers for unnecessary
reasons.
This change introduces a bit fields of what is baked in the pipeline
and doesn't need to be dynamically emitted.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ad4cd8906 ("anv: Enabled the VK_EXT_sample_locations extension")
Cc: <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10282>
When pipeline->dynamic_state.sample_locations.samples is not set
because the state is dynamic, we're currently calling
genX(emit_multisample) with a 0 samples value which is incorrect.
Found when using renderdoc with the drawing overlay.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ad4cd8906 ("anv: Enabled the VK_EXT_sample_locations extension")
Cc: <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10282>
Fixes
dEQP-VK.api.image_clearing.core.clear_color_image.2d.optimal.single_layer.e5b9g9r9_ufloat_pack32_33x128
with RADV_DEBUG=forcecompress on GFX10.3.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 21.1 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10176>
The initialization we're doing for it in __glXExtensionsCtr is trivial,
and this is only to make glGetString(GL_EXTENSIONS) work in indirect
contexts anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10223>
These are rare enough to not be worth tracking separately. Especially
since after this change all the known_gl_extensions have N for both
direct_support and direct_only (unsurprising, since that's only used to
compute usable indirect extensions).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10223>
... at least as far as libGL is concerned, because they'd work just fine
if you tried. The only thing the ATI extension seems to add (I can't
find an official spec, this is inferred from the registry XML) is
selecting the GLX_RENDER_TYPE, which we don't validate before putting on
the wire. The only thing the NV extension adds is an additional fbconfig
attribute, and that only known by glXGetFBConfigAttrib; our
implementation of that just reads the value the server sends, if any,
and doesn't try to filter out unknown attributes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10223>
The "left here but disabled" comment dates to 2004! The idea here is to
add extensions implied by a particular GL version to the GL extension
string, but nothing useful is accomplished by doing so, and this is all
only used in the case of indirect rendering anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10223>
Without image stores, DCC is always decompressed on compute.
Cc: 21.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10168>
This would be easy to support except that it doesn't support RDNA 2.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
It doesn't support Navi1x and the removal enables this nice code cleanup.
v2: rebase - mareko
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
This patch includes a number of reworks and fixes squashed in by
Nanley Chery, Sagar Ghuge, Jordan Justen and Francisco Jerez.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>