To make it clear that only GFX8-9 have missing DCC features.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9526>
addrlib uses the S swizzle mode which disables DCC completely.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9526>
Reported by Coverity.
Fixes: 0a7224f3ff ("anv: group as many command buffers into a single execbuf")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9596>
V3D 4.x allows more flexibility, so take advantage of that. Generally,
we can reorder any writes in the same sequence, except for the sequence
terminator (which must always be last, since it is the one triggering
the operation) and TMUD writes, which must stay ordered with respect to
each other.
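
As a rough illustration of the rule above, here is a minimal sketch of the
reordering check. The enum and the function name are hypothetical and are not
taken from the compiler; they only encode the two constraints stated in this
message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a TMU register write within one sequence. */
enum tmu_write_kind {
   TMU_WRITE_OTHER,      /* any other TMU write in the sequence */
   TMU_WRITE_TMUD,       /* data write: ordered against other TMUD writes */
   TMU_WRITE_TERMINATOR  /* triggers the operation: must stay last */
};

/* Returns true if two writes of the same sequence may swap places. */
static bool
tmu_writes_can_swap(enum tmu_write_kind a, enum tmu_write_kind b)
{
   assert(a <= TMU_WRITE_TERMINATOR && b <= TMU_WRITE_TERMINATOR);

   /* The terminator triggers the operation, so nothing moves past it. */
   if (a == TMU_WRITE_TERMINATOR || b == TMU_WRITE_TERMINATOR)
      return false;

   /* TMUD writes keep their relative order. */
   if (a == TMU_WRITE_TMUD && b == TMU_WRITE_TMUD)
      return false;

   return true;
}
```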
total instructions in shared programs: 13735183 -> 13731927 (-0.02%)
instructions in affected programs: 903057 -> 899801 (-0.36%)
helped: 2358
HURT: 746
Instructions are helped.
total max-temps in shared programs: 2322020 -> 2322009 (<.01%)
max-temps in affected programs: 619 -> 608 (-1.78%)
helped: 19
HURT: 11
Inconclusive result (value mean confidence interval includes 0).
total sfu-stalls in shared programs: 31494 -> 31489 (-0.02%)
sfu-stalls in affected programs: 182 -> 177 (-2.75%)
helped: 40
HURT: 40
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13766677 -> 13763416 (-0.02%)
inst-and-stalls in affected programs: 901343 -> 898082 (-0.36%)
helped: 2349
HURT: 746
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9555>
Instead of using a write dependency. We use last_tmu_config to ensure ordering
of instructions participating in different TMU sequences. To this end,
all sequence terminators flag a write dependency on last_tmu_config, but
wrtmuc is not a sequence terminator, so we can be more flexible by flagging
it as a read dependency. This prevents it from being moved into a previous
sequence (since it cannot be moved past the previous sequence terminator due
to the read dependency), but it allows it to be reordered with instructions in
the same sequence, which allows us to pair it up more effectively. In
particular, it allows pairing a wrtmuc with the sequence terminator of the
same sequence, turning code like this:
nop ; mov tmut, r0 ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop ; nop ; wrtmuc (tex[0].p1 | 0x0)
nop ; mov tmus, r1
Into this:
nop ; mov tmut, r0 ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop ; mov tmus, r1 ; wrtmuc (tex[0].p1 | 0x0)
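
The effect of read versus write dependencies on one piece of tracked state can
be sketched as follows. This is a simplified, hypothetical tracker (the
struct, the function names and the edge matrix are all made up for
illustration); it is not the scheduler's actual data structure. A reader only
gets an edge from the last writer, while a writer gets edges from the last
writer and from every reader seen since then.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

enum dep_kind { DEP_NONE, DEP_RAW, DEP_WAR, DEP_WAW };

/* Hypothetical tracker for a single piece of state, e.g. last_tmu_config. */
struct dep_tracker {
   int last_writer;                           /* -1 if there is none yet */
   int readers[MAX_NODES];                    /* readers since last write */
   size_t num_readers;
   enum dep_kind edge[MAX_NODES][MAX_NODES];  /* edge[before][after] */
};

static void
tracker_init(struct dep_tracker *t)
{
   t->last_writer = -1;
   t->num_readers = 0;
   for (size_t i = 0; i < MAX_NODES; i++)
      for (size_t j = 0; j < MAX_NODES; j++)
         t->edge[i][j] = DEP_NONE;
}

/* A read must stay after the previous write, nothing else. */
static void
tracker_read(struct dep_tracker *t, int node)
{
   assert(node >= 0 && node < MAX_NODES && t->num_readers < MAX_NODES);
   if (t->last_writer >= 0)
      t->edge[t->last_writer][node] = DEP_RAW;
   t->readers[t->num_readers++] = node;
}

/* A write must stay after the previous write and all reads since then. */
static void
tracker_write(struct dep_tracker *t, int node)
{
   assert(node >= 0 && node < MAX_NODES);
   if (t->last_writer >= 0)
      t->edge[t->last_writer][node] = DEP_WAW;
   for (size_t i = 0; i < t->num_readers; i++)
      t->edge[t->readers[i]][node] = DEP_WAR;
   t->last_writer = node;
   t->num_readers = 0;
}
```

With the previous sequence terminator as a writer, wrtmuc as a reader and the
next terminator as a writer, wrtmuc is ordered after the previous terminator
and gets only a write-after-read edge to its own terminator. The sketch
assumes that a scheduler can satisfy a write-after-read edge by placing both
nodes in the same instruction, which is what the pairing above relies on.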
total instructions in shared programs: 13755738 -> 13735183 (-0.15%)
instructions in affected programs: 2510921 -> 2490366 (-0.82%)
helped: 10963
HURT: 485
Instructions are helped.
total max-temps in shared programs: 2322828 -> 2322020 (-0.03%)
max-temps in affected programs: 11303 -> 10495 (-7.15%)
helped: 608
HURT: 19
Max-temps are helped.
total sfu-stalls in shared programs: 31545 -> 31494 (-0.16%)
sfu-stalls in affected programs: 235 -> 184 (-21.70%)
helped: 62
HURT: 11
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13787283 -> 13766677 (-0.15%)
inst-and-stalls in affected programs: 2525187 -> 2504581 (-0.82%)
helped: 10989
HURT: 477
Inst-and-stalls are helped.
v2: add a comment explaining the read dependency (Piñeiro).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9555>
We now handle compression in the shared cache item creation code.
Compressing the cache item header along with the already compressed
blob doesn't help much, so let's just remove it.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9593>
This makes compression use more consistent between the zstd and
zlib libraries. It also reduces the amount of code required for
zlib use.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9593>
This will be used by the following patch. It allows us to detangle
compression from the disk cache code, and abstract the underlying
compression libraries we use.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9593>
we already have the batch usage info here for the resource, so if we know
the resource is already used on the batch then we don't need to also perform
a hash lookup to double check that it's really there
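
A minimal sketch of the idea, assuming the usage info is a per-resource
bitmask with one bit per batch. The struct layouts, field names and the
lookup counter standing in for the hash set are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource: bit i is set when batch i already references it. */
struct resource {
   uint32_t batch_uses;
};

/* Hypothetical batch: set_lookups counts trips through the slow path. */
struct batch {
   unsigned id;
   unsigned set_lookups;
};

/* Returns true if the resource was newly added to the batch. */
static bool
batch_reference_resource(struct batch *b, struct resource *r)
{
   assert(b->id < 32);
   const uint32_t bit = UINT32_C(1) << b->id;

   /* Fast path: the usage info already says the batch has this resource. */
   if (r->batch_uses & bit)
      return false;

   /* Slow path: stand-in for the hash set lookup and insertion. */
   b->set_lookups++;
   r->batch_uses |= bit;
   return true;
}
```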
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9565>
Fix defect reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
member_not_init_in_gen_ctor: The compiler-generated constructor
for this class does not initialize r63.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9326>
this was only needed to cover up some other bugs:
* missing barriers for buffer sampler/image descriptors
* weirdness with first frame handling
there are better ways of handling both cases, and now they're handled better
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9566>
list_length() complexity is O(n), so it's better to store the number of
regs separately and use it instead of calling list_length().
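
A minimal sketch of the approach, using a hypothetical intrusive list with a
cached element count. The types and function names are illustrative only and
do not match the driver's actual structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked list node. */
struct node {
   struct node *prev, *next;
};

/* List head (sentinel) plus a count kept in sync on add and delete. */
struct reg_list {
   struct node head;
   size_t num_regs;
};

static void
reg_list_init(struct reg_list *l)
{
   l->head.prev = l->head.next = &l->head;
   l->num_regs = 0;
}

static void
reg_list_add_tail(struct reg_list *l, struct node *n)
{
   n->prev = l->head.prev;
   n->next = &l->head;
   l->head.prev->next = n;
   l->head.prev = n;
   l->num_regs++;           /* O(1) bookkeeping */
}

static void
reg_list_del(struct reg_list *l, struct node *n)
{
   assert(l->num_regs > 0);
   n->prev->next = n->next;
   n->next->prev = n->prev;
   n->prev = n->next = NULL;
   l->num_regs--;
}

/* O(n) walk, kept only to compare against the cached count. */
static size_t
reg_list_walk_length(const struct reg_list *l)
{
   size_t n = 0;
   for (const struct node *it = l->head.next; it != &l->head; it = it->next)
      n++;
   return n;
}
```

Callers read num_regs directly instead of walking the list, at the cost of
having to update the count at every insertion and removal site.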
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9570>
This flag will be used by run from mesa-shader-db to trigger shader
compilation with default settings.
Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9583>
The in-memory shader cache can grow very large
in some rare cases.
Limit its size to 64MB on 32-bit builds, and 1GB otherwise.
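
The limit selection can be sketched as below. The helper names are
hypothetical, and the eviction check is only an assumption about how such a
limit would be enforced; the message itself only states the two sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIB (UINT64_C(1024) * 1024)

/* Hypothetical helper: pick the cache limit from the pointer width. */
static uint64_t
shader_cache_limit(unsigned pointer_bytes)
{
   assert(pointer_bytes == 4 || pointer_bytes == 8);
   return pointer_bytes == 4 ? 64 * MIB : 1024 * MIB;
}

/* Hypothetical check: would adding an entry push the cache over its limit? */
static bool
shader_cache_over_limit(uint64_t current, uint64_t incoming, uint64_t limit)
{
   return current + incoming > limit;
}
```

A caller would typically pass sizeof(void *) as pointer_bytes.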
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9578>
The disadvantages of the DYNAMIC path over the
non-dynamic path are minor.
The advantages are many.
As we don't know whether badly behaved apps use
non-dynamic SYSTEMMEM in a dynamic fashion,
let's be safe and always be dynamic.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
SW vertex processing buffers are supposed to be stored in RAM
and to be immediately idle after use (thus you can write at the
same location again immediately).
DYNAMIC SYSTEMMEM is by far the best fit for now for this kind
of buffer, though it can be improved further. Indeed, the use
pattern will cause a lot of syncs with csmt activated.
Thus disable csmt when full sw vertex processing is requested.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
Some apps use DYNAMIC SYSTEMMEM buffers and fill them in a
dynamic fashion with discard and nooverwrite locking flags.
To prevent uploading the whole buffer every draw call,
track the region needed for the draw call, and
upload only that region (or a bit more in order
to ease valid region tracking).
Try to aggressively upload with discard/unsynchronized.
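
One way to picture the region tracking is the sketch below. It assumes the
valid (already uploaded) region is kept as a single half-open interval, which
is why a draw may upload a bit more than it strictly needs. All names are
hypothetical, and discard handling (which would reset the valid region) is
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open byte range [start, end); empty when start >= end. */
struct range {
   unsigned start, end;
};

static bool
range_empty(struct range r)
{
   return r.start >= r.end;
}

/*
 * Given the region a draw needs, return the region to upload and grow
 * *valid so that it stays a single interval covering everything uploaded.
 */
static struct range
plan_upload(struct range *valid, struct range need)
{
   struct range up = { 0, 0 };

   if (range_empty(need))
      return up;

   if (range_empty(*valid)) {
      up = need;                       /* nothing uploaded yet */
   } else if (need.start >= valid->start && need.end <= valid->end) {
      return up;                       /* already uploaded */
   } else if (need.start < valid->start && need.end > valid->end) {
      up = need;                       /* one upload, re-sends the middle */
   } else if (need.start < valid->start) {
      up.start = need.start;           /* extend left, including any gap */
      up.end = valid->start;
   } else {
      up.start = valid->end;           /* extend right, including any gap */
      up.end = need.end;
   }

   /* Grow the valid region to the bounding interval. */
   if (range_empty(*valid)) {
      *valid = up;
   } else {
      if (up.start < valid->start)
         valid->start = up.start;
      if (up.end > valid->end)
         valid->end = up.end;
   }
   return up;
}
```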
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>