Instead of using a write depdency. We use last_tmu_config to ensure ordering
of instructions participating in different TMU sequences. To this end,
all sequence terminators flag a write dependency on last_tmu_config, but
wrtmuc is not a sequence terminator, so we can be more flexible by flagging
it as a read depedency. This would prevent it to be moved into a previous
sequence (since it cannot be moved past the previous sequence terminator due
to the read depedency), but it allows it to be reordered with instructions in
the same sequence, which allows us to pair it up more effectively. Particularly,
it allows to pair up a wrtmuc with the sequence terminator of the same sequence,
turning code like this:
nop ; mov tmut, r0 ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop ; nop ; wrtmuc (tex[0].p1 | 0x0)
nop ; mov tmus, r1
Into this:
nop ; mov tmut, r0 ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop ; mov tmus, r1 ; wrtmuc (tex[0].p1 | 0x0)
total instructions in shared programs: 13755738 -> 13735183 (-0.15%)
instructions in affected programs: 2510921 -> 2490366 (-0.82%)
helped: 10963
HURT: 485
Instructions are helped.
total max-temps in shared programs: 2322828 -> 2322020 (-0.03%)
max-temps in affected programs: 11303 -> 10495 (-7.15%)
helped: 608
HURT: 19
Max-temps are helped.
total sfu-stalls in shared programs: 31545 -> 31494 (-0.16%)
sfu-stalls in affected programs: 235 -> 184 (-21.70%)
helped: 62
HURT: 11
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13787283 -> 13766677 (-0.15%)
inst-and-stalls in affected programs: 2525187 -> 2504581 (-0.82%)
helped: 10989
HURT: 477
Inst-and-stalls are helped.
v2: add a comment explaining the read depdency (Piñeiro).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9555>
We now handle compression in the shared cache item creation code.
Compressing the cache item header with the already compressed blob
doesn't help much so lets just remove it.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9593>
This makes compression use more consistent between the zstd and
zlib libraries. It also reduces the amount of code required for
zlib use.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9593>
This will be used by the following patch. It allows us to detangle
compression from the disk cache code, and abstract the underlying
compression libraries we use.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9593>
we already have the batch usage info here for the resource, so if we know
the resource is already used on the batch then we don't need to also perform
a hash lookup to double check that it's really there
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9565>
Fix defect reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
member_not_init_in_gen_ctor: The compiler-generated constructor
for this class does not initialize r63.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9326>
this was only needed to cover up some other bugs:
* missing barriers for buffer sampler/image descriptors
* weirdness with first frame handling
there's better ways of handling both cases, and now they're handled better
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9566>
list_length() complexity is O(n), so it's better to store number of regs
separately and use it instead of list_length().
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9570>
This flag will be used by run from mesa-shader-db to trigger shader
compilation with default settings.
Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9583>
The in-memory shader cache can get significantly
huge in some rare cases.
Limit its size to 64MB on 32 bits, and 1GB else.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9578>
The disadvantages of the DYNAMIC path over the
non-dynamic path are minor.
The advantages are many.
As we don't know if bad behaving apps use
non-dynamic SYSTEMMEM in a dynamic fashion,
let's be safe and always be dynamic.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
SW vertex processing buffers are supposed to be sorted in RAM
and to be immediately idle after use (thus you can write at the
same location again immediately).
DYNAMIC SYSTEMMEM is by far the best fit for now for these kind
of buffers, though it can be improved further. Indeed the use
pattern will cause a lot of syncs with csmt actived.
Thus disable csmt when full sw vertex processing is requested.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
Some apps use DYNAMIC SYSTEMMEM buffers and fill them in a
dynamic fashion with discard and nooverwrite locking flags.
To prevent uploading the whole buffer every draw call,
track the region needed for the draw call, and
upload only that region (or a bit more in order
to ease valid region tracking).
Try to aggressively upload with discard/unsynchronized.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
The tracking enables to avoid flushing the csmt thread
when locking repeatedly the same buffer, as long
as the locks are non-overlapping.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
Use the MANAGED path for SYSTEMMEM buffers.
Tests point out the locking behaviour of SYSTEMEMM
buffers is very close to MANAGED.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
So far we did nothing on EndScene, but the API
doc says it flushes the GPU command queue.
The doc implies one can optimize CPU usage
by calling EndScene long before Present() is called.
Implementing the flush behaviour gives me +15-20%
on the CPU limited Halo. On the other hand, do limit
the flush to only once per frame.
3DMark03/3Mark05 get a 2% perf hit with the patch,
but 5% if I allow more flushes.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451>
The enum is defined in EXT_framebuffer_object (available on desktop),
and was copied to OES_framebuffer_object (available in GLES1). The ES2
spec has no mention of such an enum. If the underlying implementation
does not support this, it will set a generic incomplete error (as is
done in st_cb_fbo.c if mixed_formats == false).
This should fix the following dEQP tests on ES2 drivers:
dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_none_none
dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_none_none
Fixes: fd017458bc (mesa: fix fbo attachment size check for RBs, make it trigger in ES2)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4444
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9572>
In compat GL, the gl_FrontColor/BackColor gets clamped unless you
explicitly turn it off with ARB_color_buffer_float, and most apps doing
floating point color rendering are going to be using non-compat varyings
anyway instead of hoping that ARB_color_buffer_float exists. Thus, guess
that the app will get clamping on the color outputs to reduce draw-time
recompiles.
Saves 60 VS recompiles on half-life-2.trace on freedreno (and similarly for
many other compat GL apps).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9509>
This ARB_debug_output is particularly useful in that default apitrace will
log it, so we can find when we're doing draw-time recompiles we shouldn't be.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9509>
Fixes the following issues I can see:
- Non-VS NIR shaders not gathering info after ucp lowering
- Non-VS NIR shaders not doing GL_CLAMP lowering
- Non-VS TGSI shaders not setting up stream output state.
- Non-VS TGSI shaders leaking lower_depth_clamp lowering across variant
compiles.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9509>
the push set index isn't always 0, so the offsets need to be updated
in order to avoid clobbering other sets
Fixes: 6be19765cf ("lavapipe: add support for VK_KHR_push_descriptor")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9558>