We now track per-stage bind history for constant and shader buffers,
shader images, and sampler views by adding an extra res->bind_stages
field to go with res->bind_history.
This lets us flag IRIS_DIRTY_CONSTANTS for only the specific stages
involved, and also skip some CPU overhead in iris_rebind_buffer.
Cuts 4% of 3DSTATE_CONSTANT_XS packets in a Shadow of Mordor trace
on Icelake.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The underlying buffer isn't changing - so we don't need to update any
SURFACE_STATE descriptors - we just might have new constants, meaning
we need to re-emit 3DSTATE_CONSTANT_XS. On Gen9, this means we need
to update 3DSTATE_BINDING_TABLE_POINTERS_XS too, but that's now handled
by the explicit check in the previous patch.
On Gen9, this should cause us to re-emit the binding table /pointer/ on
writing to a buffer with PIPE_BIND_CONSTANT_BUFFER, rather than emitting
a whole new /table/.
On Gen8 and Gen11, this avoids binding table churn altogether.
Cuts 61% of 3DSTATE_BINDING_TABLE_POINTERS_XS packets in a Shadow of
Mordor trace on Icelake.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Right now, we usually flag both IRIS_DIRTY_{CONSTANTS,BINDINGS}_XS,
because we have SURFACE_STATE for constant buffers in case the shaders
access them via pull mode.
But this flagging is overkill in many cases. Gen8 and Gen11 don't need
it at all. Gen9 doesn't need that large of a hammer in all cases.
Just handle it explicitly so the right thing happens.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We upload a new SURFACE_STATE for the UBO/SSBO in question, which
means that we need new binding tables as well.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Apparently we already enabled it without having support ...
Not sure if we also need to set disable_start_of_prim when the PS
has memory writes, but this mirrors radeonsi.
Doubles fillrate in my dual_quad_bench from ~16 pixels/cycles to
~32 pixels/cycle on a Raven.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The pass assumed that "Most ALU ops produce an undefined result if any
source is undef" which is completely untrue. Due to how we lower if
statements to selects and then optimize on those selects later, we
simply cannot make that assumption. In particular this pass tried to
replace an ior of undef and true, which had been generated by
optimizing a select which itself came from flattening an if statement,
to undef causing a miscompilation for a CTS test with radeonsi NIR.
We fix this by always doing what the non-undef path did, i.e. duplicate
the instruction twice. If there are cases where the instruction before
the loop can be folded away due to having an undef source, we should add
these to opt_undef instead.
The comment above the pass says that if the phi source from before the
loop is undef, and we can fold the instruction before the loop to undef,
then we can ignore sources of the original instruction that don't
dominate the block before the loop because we don't need them to create
the instruction before the loop. This is incorrect, because the
instruction at the bottom of the loop would get those sources from the
wrong loop iteration. The code never actually did what the comment said,
so we only have to update the comment to match what the pass actually
does. We also update the example to more closely match what most actual
loops look like after vtn and peephole_select.
There are no shader-db changes with i965, radeonsi NIR, or radv. With
anv and my vkpipeline-db there's only one change:
total instructions in shared programs: 14125290 -> 14125300 (<.01%)
instructions in affected programs: 2598 -> 2608 (0.38%)
helped: 0
HURT: 1
total cycles in shared programs: 2051473437 -> 2051473397 (<.01%)
cycles in affected programs: 36697 -> 36657 (-0.11%)
helped: 1
HURT: 0
Fixes
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
with radeonsi NIR.
Akin to 1a25980c46 ("egl: drop incorrect pkg-config file for
glvnd") and b01524fff0 ("meson: don't build libGLES*.so with
GLVND") , removes a pkg-config file that shouldn't have been there in
the first place, but was needed because of that GLVND bug.
Now that the glvnd bug has been fixed, it was apparent that this gl.pc
pkg-config file was forgotten to be removed, so let's do just that :)
Suggested-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Having Python and C variables sharing name in the same block of code
makes its understanding a bit confusing. Make it explicit that the
Python bit_size variable refers to the destination bit size.
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Seems to fix a hang with excessive vertex emissions when NGG is used for
GS.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
No GS copy shader if a pipeline enables NGG GS.
This fixes
dEQP-VK.pipeline.executable_properties.graphics.*geometry_stage*.
Fixes: 86864eedd2 ("radv: Implement radv_GetPipelineExecutablePropertiesKHR.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* Set missing STENCIL_CONFIG_EXT2 bits
* Swap stencil sides when rendering CCW
Fixes following deqp tests (which were 99% failing):
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
Note: deqp tests require --deqp-gl-config-name=rgba8888d24s8ms0
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
We have to load 2 32-bit integer and to cast correctly.
This fixes crashes with gs-double-interpolator.vk_shader_test.
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111734
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use the fastest way only if both aspects are used. Oops.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111728
Fixes: 218ce34962 ("radv: add mipmap support for the clear depth/stencil values")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is needed in particular to get a recent enough version of meson in
the stretch image, but should be generally beneficial.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Pros:
* Less fragile due to not mixing packages from stretch and buster
* No longer need to use third-party LLVM packages
* The buster image now uses GCC 8 for C++ as well (previously 6 for C++,
8 for C), allowing to drop some hacks
Con:
* The stretch image now only uses GCC 6 for C as well as C++
* Need separate jobs for testing old LLVM versions
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
If installing new packages would require removing previously installed
ones, this flag causes apt-get to abort with an error instead,
preventing later obscure failures due to the missing packages.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 89785e2d56 ("i965: add support for sampling from AYUV")
Fixes: 7cab8d3661 ("i965: Add support for sampling from XYUV images")
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If we want to execute several batches in parallel they need to have
their own tiler and scratchpad BOs. Let move those objects to
panfrost_batch and allocate them on a per-batch basis.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If we want the batch dependency tracking to work correctly we must
make sure all BOs are added to the batch->bos set early enough. Adding
FBO BOs when generating the fragment job is clearly to late. Add a
panfrost_batch_add_fbo_bos helper and call it in the clear/draw path.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This helper automates the panfrost_bo_create()+panfrost_batch_add_bo()+
panfrost_bo_unreference() sequence that's done for all per-batch BOs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't know who else is using the BO in that case, and thus shouldn't
re-use it for something else.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Thanks to that we avoid the recursive call into panfrost_bo_create()
and we can get rid of panfrost_bo_release() by inlining the code in
panfrost_bo_unreference().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
panfrost_bo_unreference() should be used instead.
The only difference caused by this change is that the scratchpad,
tiler_heap and tiler_dummy BOs are now returned to the cache instead
of being freed when a context is destroyed. This is only a problem if
we care about context isolation, which apparently is not the case since
transient BOs are already returned to the per-FD cache (and all contexts
share the same address space anyway, so enforcing context isolation
is almost impossible).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Store a screen pointer in panfrost_bo so we don't have to pass a screen
object to all functions manipulating the BO.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
panfrost_bo_mmap() already takes care of that.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
They are not expected to be called directly, users should use
panfrost_bo_{create,release}() instead.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Right now, the BO API is spread over pan_{allocate,resource,screen}.h.
Let's move all BO related definitions to a separate header file.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Change the prefix for BO allocation flags to make it consistent with
the rest of the BO API.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>