The write flag is redundant since it can be inferred easily from the
iris_address::access domain. This allows the iris_address struct to
be laid out more efficiently in memory, leading to a measurable
improvement in several Piglit Drawoverhead test-cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Probably the most annoying patch to review from the whole series --
Mark every buffer object use as accessed through some caching domain
with the sequence number of the current synchronization section of the
batch. The additional argument of iris_use_pinned_bo() makes sure I'd
have gotten a compile error if I had missed any buffer added to the
batch validation list.
There are only a few exceptions where a buffer is left untracked while
adding it to the validation list, justified below:
- Batch buffers: These are strictly read-only for the moment.
- BLORP buffer objects: Their seqnos are bumped manually at the end
of iris_blorp_exec() instead, in order to avoid plumbing domain
information through BLORP address combining.
- Scratch buffers: The contents of these are strictly thread-local.
- Shader images and SSBOs: Accesses of these buffers are explicitly
synchronized at the API level.
v2: Opt out of tracking more aggressively (Ken): In addition to the
above, surface states, binding tables, instructions and most
dynamic states are now left untracked, which means a *lot* more BO
uses marked IRIS_DOMAIN_NONE which need to be reviewed extremely
carefully, since the cache tracker won't be able to provide any
coherency guarantees for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This delimits all batch operations which access memory between
iris_batch_sync_region_start() and iris_batch_sync_region_end() calls.
This makes sure that any buffer objects accessed within the region are
considered in use through the same caching domain until the end of the
region.
Adding any buffer to the batch validation list outside of a sync
region will lead to an assertion failure in a future commit, unless
the caller explicitly opted out of the cache tracking mechanism.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces some minimalistic infrastructure which will be used in
order to partition the batch into a series of sections, each one with
a unique, monotonically-increasing sequence number. Section
boundaries will typically lie at points in the batch where the
execution and memory coherency status of some previous commands are
known, e.g. at batch buffer boundaries or PIPE_CONTROL commands.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The purpose of this is to represent the cache coherency state of a
buffer as a vector of integers (AKA seqnos), one for each incoherent
caching domain of the GPU. A seqno will identify a single section of
a batch buffer uniquely across the whole pipe_screen (which means that
there will be no ambiguity about what context a given seqno belongs to
even if there are multiple threads accessing the same buffer in
parallel), and is guaranteed to be allocated in monotonically
increasing order within any given context. The iris_bo_bump_seqno()
helper is provided for marking the last update of a buffer from a
given caching domain in a lockless manner.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Emulating them will be a rather annoying dance. Let's not worry about
this until further down the line when we have a better sence of how to
do handle them efficiently.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5290>
In this case, vs->writes_point_size is true as the VS writes
gl_PointSize, but panfrost_writes_points_size() is false as we are not
drawing points so the hardware doesn't process it. Thus the varying
descriptor is emitted but elements is never written. When the VS runs,
it will attempt to write to elements, a NULL pointer.
The behaviour is architecture-independent. On Midgard, the write
silently fails, hence why this bug was never noticed before. On Bifrost,
this raises an MMU fault.
The fix is to set the format to VARYING_DISCARD to ignore the write.
Noticed on Neverball.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5290>
It's like gl_FragCoord. Still not implemented. This unfortunately makes
point sprites a lot more complicated.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5290>
We're nearly out of dirty bits, and some patches pending review on
GitLab no longer apply due to that. Make room for them by splitting
off shader stage-specific bits into a separate stage_dirty mask.
An alternative would be to split compute-related bits into a separate
mask, but that would prevent the '<< stage' indexing done in various
parts of the driver from working.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
This makes iris_batch_prepare_noop() return a boolean instead of
passing through the relevant set of dirty flags. It will make it
easier to change the representation of dirty flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
Needed by the next patch, for c++ code which is more strict about
conversions between integers and enums.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
We need to pass it thru to EARLY_Z and WRITES_GLOBAL instead of ignoring
and assuming respectively. Nontrivial performance fix.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5300>
Via the ES3.1 early-z testing force, I've confirmed this bit is e-z.
I've also confirmed e-z must be disabled for global writes, as expected.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5300>
16-bit subgroup ops are implemented with 32-bit instructions
on GFX6-GFX7.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5227>
SDWA is only GFX8+. Use v_mov_b32 since the upper 16 bits don't matter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5227>
At the NIR level this is a second vector source of the first (only)
argument; at the BIR level this is a pair of scalars.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5307>
Instead of the vendored version. Only for blend shaders at the moment,
frag shaders fb_fetch has a lot more going on.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5285>
Now that we can handle destination sizes directly, this keeps us from
needing to chew through so many conversions.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5285>
Since these shaders are purely internal, the optimization criteria are a
bit different, so it's worth calling attention to this when dumping.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5285>
Here we turn on uniform array resizing in the NIR linker and disable
the GLSL IR resizing pass when the NIR linker is enabled.
This will potentially make uniform arrays smaller due to NIR
optimising away more uniform uses.
Shader-db results (SKL):
total instructions in shared programs: 14947192 -> 14944093 (-0.02%)
instructions in affected programs: 138088 -> 134989 (-2.24%)
helped: 822
HURT: 4
total cycles in shared programs: 324868402 -> 324794597 (-0.02%)
cycles in affected programs: 3904170 -> 3830365 (-1.89%)
helped: 2333
HURT: 1485
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
We want to gather information for all stages here before the main
linking loop. In the following patch we will use to information
to reduce the size of uniform arrays where possible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
This will be used to reduce the size of uniform arrays and replace
the current glsl ir pass. Doing this in NIR allows us to better
optimise the size of uniform arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
i965 currently uses the NIR uniform linker for spirv support. Until
now the only reason there has been no issue with calling the
lowering pass before the linker is because no garbage collection
is done between the calls.
An upcoming change to the linker will add an optimisation to resize
unform arrays where possible. Because lowering causes the array
defs to no longer be used the new optimisation ends up resizing the
arrays to 0. To fix this we move the lowering call after the
linking calls.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
Drivers (Gallium, i965) expose a linear view of the buffer via
gbm_bo_map.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Daniel Stone <daniel@fooishbar.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5238>