Without TC, we'd get draw->count==0.. which is obviously not correct.
With TC we get random garbage for draw->count. Which turns into
exciting things like trying to allocate multi-gigabyte buffers for
tess param/factor buffers.
But we can just tell the CP to split up large tess draws, and put an
upper bound on the tess param/factor buffer sizes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9864>
when building with Meson. It requires libexpat that is not available
on Android and we want to avoid it.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9848>
nir_to_tgsi lowers nir_op_isub to UADD and the negate source mod
on the second operator. Since r600 doesn't support source mods
for integers but support SUB_INT we switch the opcode and clear the
negate flag.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9579>
Since the last round of enhanced layout packing, it seems like this code
is no longer needed. This allows us to use the HANDLE_EMIT_BUILTIN()
macro, which simplifies things further.
It would have been incorrect anyway, because it's the only code that
uses the location directly as an index. All other locations multiplies
it by four first.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9860>
Fixes the following building error:
In file included from external/mesa/src/amd/addrlib/src/gfx9/gfx9addrlib.cpp:36:
external/mesa/src/amd/addrlib/src/chip/gfx9/gfx9_gb_reg.h:43:2: error: "BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined"
^
NOTE: Android ABI specifies Little Endian for arm and x86 targets
Fixes: 3616e02ef3 ("amd/addrlib: define endianess differently")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9869>
If we've added the array, then we should update the info. This is the
value that gallium drivers setting !PIPE_CAP_CLIP_PLANES have to use in
place of rasterizer->clip_planes_enabled.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9815>
Since image stores can now compress and we can't track image stores
this also stops using predication for DCC decompression.
In GFX10 this was benchmarked to be faster. For GFX10.3 the microbenchmarks
are not as possible though I haven't tested any games, so this is not enabled
there yet.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6796>
For 16x16 we get 4 16x4 waves, which is bad for DCC image stores.
The workgroup size doesn't really matter for speed, the important
part is the number of waves, which should stay constant here.
(Though some optimization would be nice, but out of scope for this
patch)
The compute DCC compress shader still uses 16x16 due to functional
requirements (and we're sure it won't write with DCC compression ...)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6796>
We are providing a BO with the default attribute values for the
GL_SHADER_STATE_RECORD, that contains 16 vec4. Such default value for
each vec4 is (0, 0, 0, 1). As the attribute format could be int or
float, the "1" value needs to take into account the attribute format.
But in the practice, the most common case is all floats. So we create
one default attribute values BO assuming that all attributes will be
floats, and we store it at v3dv_device and only create a new one if a
int format type is defined. That allows to reduce the amount of BOs
needed.
Note that we could still try to reduce the amount of BOs used by the
pipelines if we create a bigger BO, and we just play with the
offsets. But as mentioned, that's not the usual, and would add an
extra complexity,so it is not a priority right now.
This makes the following test passing when disabling the pipeline
cache support:
dEQP-VK.api.object_management.max_concurrent.graphics_pipeline
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9845>
The upper 32 bits are truncated before converting, which still produces
correct results since they never meaningfully contribute to the result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
The previous code produced incorrect results for inputs outside the
range [INT16_MIN, INT16_MAX].
A problematic case is e.g. i2f16 32768, which previously would be
converted to -32768.0 instead of returning the exactly representable
floating point result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
This function has evolved to be a generic helper function used throughout
the file, so having those assumptions written down explicitly and document
unsupported edge cases should help prevent incorrect use.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
This doesn't change semantics but allows us to reject this potentially
ambiguous configuration in convert_int in a later change.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
Previously, inputs such as 0x100000000 would have their upper 32-bits
ignored despite being representable by 32-bit floats.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
We might only need to emit a read or write cap for a given image. This
could provide the Vulkan driver with the chance to optimize things
slightly.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9855>
I don't think the hw culls these primitives and NGG culling isn't
yet a thing. This also matches PAL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9691>
This reverts commit 5743a36b2b.
Now that _eglAddDevice is always called with the correct software
hint, no need to bail out if the device doesn't have a render node.
On split render/display SoCs, the DRM device won't have a render
node, yet rendering is hardware-accelerated (via kmsro).
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 5743a36b2b ("egl: Don't add hardware device if there is no render node v2.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4178
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697>
We don't want to expose an EGL device for a display-only DRM devices
(like VKMS). For these DRM devices we have a separate software-rendering
device (the first in the list, always present).
There is a similar check in _eglAddDRMDevice, however it will be
removed in a future commit to allow split render/display devices
to be properly added. We can't figure out whether we're on a split
render/display system before loading the driver.
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 5743a36b2b ("egl: Don't add hardware device if there is no render node v2.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697>
On the EGL DRM platform, call _eglAddDevice with the software flag
set if GBM has loaded a software driver. This allows _eglAddDevice
to make the difference between llvmpipe and kmsro.
This is important on split render/display SoCs: we don't want to
advertise EGL_MESA_device_software on these systems.
Completely drop disp->Options.ForceSoftware, because GBM is
responsible for choosing software rendering and doesn't take this
hint into account.
Signed-off-by: Simon Ser <contact@emersion.fr>
Fixes: 5743a36b2b ("egl: Don't add hardware device if there is no render node v2.")
References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4178
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9697>
"plane[0].i32" is the plane being lowered, it's not the sampler we're looking
for.
It worked when there's a single sampler because, eg for NV12, plane[0].i32 for
the UV plane would be 1 and the added ":uv" sampler would also land at binding
point 1.
Fixes: 079e5f73d7 ("mesa/st: rewrite src var when lowering tex_src_plane")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9812>
Now that we have surface descriptors defined in midgard.xml we can use
pan_pack() to emit them.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9827>
Right now the code manipulates mali_ptr, but having surface descriptors
properly defined will allow us to use the descriptors allocator when
allocating a midgard texture.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9827>
Got the alignment wrong a few times, so I figured it would be good to
have helpers that take care of the alignment requirements based on
the midgard.xml definitions.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9827>
Once we have that in place, we can provide macros to simplify descriptor
allocation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9827>
The way we handle thrsw instructions is that we try to merge them
back into previously scheduled instructions to fill up its delay
slots. This is generally safe, because the thrsw won't happen until
after the delay slots, so we are not really changing the execution
order of the instructions and we just need to make sure we don't
violate a few specific restrictions.
If we have not managed to fill up all delay slots after doing this,
then we emit as many NOPs as needed to fill them. This is to ensure
that we don't schedule an instruction that needs to execute after the
thread switch before the thread switch happens. However, doing this
can lead to inefficient code, since some times the instructions we
schedule after a thrsw are indepdent of the thrsw and could be safely
executed in its delay slots.
This change removes the fixed NOP emission after a thrsw to fill
delay slots and instead adds code to ensure that our instruction
scheduling is aware of when it is scheduling instructions in the
delay slots of a previous thrsw to avoid selecting conflicting
instructions.
The only case were we still emit fixed NOPs is for the thread end that
we emit to terminate the program after scheduling all instructions
because we can't end the instruction stream before the thread end
is properly executed.
total instructions in shared programs: 13691004 -> 13648140 (-0.31%)
instructions in affected programs: 4345951 -> 4303087 (-0.99%)
helped: 19645
HURT: 652
Instructions are helped.
total max-temps in shared programs: 2319317 -> 2318687 (-0.03%)
max-temps in affected programs: 10510 -> 9880 (-5.99%)
helped: 532
HURT: 9
Max-temps are helped.
total sfu-stalls in shared programs: 31752 -> 32354 (1.90%)
sfu-stalls in affected programs: 840 -> 1442 (71.67%)
helped: 7
HURT: 467
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13722756 -> 13680494 (-0.31%)
inst-and-stalls in affected programs: 4335590 -> 4293328 (-0.97%)
helped: 19453
HURT: 758
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9825>
We have helpers to check if an instruction writes to specific
accumulators. This one will check if it writes any of the general
purpose accumulators, which will come in handy in a follow-up
patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9825>
This reverts the resource_get_param changes from Tomeu's commit
ab7744b280. It breaks every single
program run on iris (such as wflinfo -a gl -p glx).
iris's I915_FORMAT_MOD_Y_TILED_CCS modifier reports two planes - one for
the main surface, and a second one for the CCS surface. However, there
is only one underlying pipe_resource. We only use pipe_resource::next
for modifier information when importing buffers from elsewhere; there is
no res->next for internally allocated resources created with modifiers.
resource_get_param() is not supposed to chase res->next pointers. The
hook is intended to take the base image, and the plane, as parameters.
The driver can then walk its own data structures however it sees fit
in order to find the appropriate plane, rather than enforcing a linked
list of resources.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9847>
This ensures we get a properly aligned size for the buffer so we don't
trip over HW limits for push constants.
Closes#3703
Fixes dEQP-VK.robustness.image_robustness.push.* on HSW
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9699>
And use ANV_UBO_ALIGNMENT for it instead of a magic number.
This increases the alignment to 64B, but that ought to be good for
everyone.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9699>
We're using it for in-memory cache as well, so it needs to be computed
unconditionally.
Fixes: bf09ba5385 ("lima: implement shader disk cache")
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9838>
If we're going to have a #define for UBO alignments, it's probably a
good idea to make sure everything is aligned to that. This increases
the alignment from 32B to 64B but that shouldn't hurt anyone.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9837>