When moving the batch cache to the context, I added hash table lookups
from batch to rsc for "is this resource in use" because we could no longer
store data in the rsc bo under the batch cache's lock.
We can save that cost by tracking a bitfield of resources referenced by
the batch, which gives us very cheap checks in the draw path at a minor
cost in memory. We can just use the GEM BO handle, since it's a nice
small integer already (we can't use the TC buffer ID, because the frontend
changes that, and we're in the driver thread).
This required moving the !pending() assert up in resource shadowing, since
the BO swap meant we were checking pending on the wrong resource.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11511>
Newer GPU's are moving away from using gpu_id, including the code
landing upstream for "7c Gen 3". But most of the places in the gallium
driver where we were looking at gpu_id, we only cared about the major
generation. So convert those to use screen->gen instead.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
The destination of the upcoming blit (the old rsc struct that houses the
fresh BO) wouldn't have its ubwc cleared first, which if it got
unfortunate data in a recycled BO could lead to blit failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11544>
The valid flag indicates whether the bo has had any data written to it.
Failure to swap it meant that if for some reason we fell back to SW
mappings during the blit from shadow, the PIPE_MAP_READ staging blit would
get dropped.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11544>
This reverts commit 5cb043cf82.
While the numbers were impressive for drawoverhead, it comes at the cost
of additional flushes, which for gmem access (what we actually care about
most!) would greatly increase the actual cost to render. Also, gl_driver2
overhead is increased, probably due to spending time in the kernel for the
flushes.
drawoverhead's win came from the increased flushing causing the GPU to
start processing the buffers sooner on everything but test 1, which
already had some incremental flushing happening. That was certainly not
intended by the change.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11599>
The resource_busy() hook was having to check the batch cache for usage of
the resource, since TC didn't know how long our driver would. By
committing to calling the tc_driver_internal_flush_notify() hook on
non-deferred flushes, TC keeps track of which buffers have been used but
not flushed and considers them busy, saving us needing to look in the BC
(which we won't be able to do once we move it to being per-context).
drawoverhead test results (all numbers are throughput, n=5):
1, DrawElements ( 1 VBO| 0 UBO| 0 ) w/ no state change: -4.94214% +/- 2.45047%
7, DrawElements ( 1 VBO| 8 UBO| 8 Tex) w/ vertex attrib change: 48.3992% +/- 5.02827%
8, DrawElements ( 1 VBO| 8 UBO| 8 Tex) w/ 1 texture change: 26.0974% +/- 1.14932%
9, DrawElements ( 1 VBO| 8 UBO| 8 Tex) w/ 8 textures change: 12.6963% +/- 3.01077%
17, DrawElements ( 1 VBO| 8 UBO| 8 Tex) w/ 8 UBOs change: 54.3846% +/- 35.0049%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11513>
When storing depth- or stencil-only texture data that has been packed into
a depth/stencil texture, the tex store gets PIPE_MAP_READ added onto it
since the other channel will get ORed into the incoming data, but
sometimes we know that the other component is undefined because the whole
texture is either fresh or just invalidated.
Cleans up a confusing extra blit in a dEQP case I've been debugging, and
should be less work for dEQP CI.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11452>
It wasn't checking that the transfer map would definitely overwrite all of
the data being initialized by the back blit, and if we knew that it
would then the caller would have provided PIPE_MAP_DISCARD_WHOLE_RESOURCE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11439>
We take references under the lock, but then accessed the lock-requiring
batch_cache structure without holding the lock. The batches wouldn't get
freed and removed from their slots until the last ref goes away so it was
safe (other than the assert at the end), but writing the simple code is
shorter and requires fewer assumptions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11439>
Normally when we shadow a resource (whether it is changing the modifier
or not) we do not need to flush existing batches, since they reference
the original version of the resource. There is a special case for
resources that are referenced by a batches framebuffer state, because
this state is emitted when the batch is flushed. Because of this, we
need those batches to be flushed before we shadow the resource.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11371>
Because R8G8 has a different layout from R16, we not only need to demote
to uncompressed to (for example) sample R8G8 as R16 (or visa versa) but
we also need to demote further to linear.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11371>
If a format is not supported as a render target, there is no point in
trying a staging blit, as it will end up in a CPU copy fallback.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11371>
tc already calculates all the rebinding that needs to be done on a given
context, so (some of) this info can be passed on to drivers to enable
optimizations
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11245>
In multi-context scenarios, one context writing to a resource can race
with a pctx->flush_resource() on another context/thread. Which means
that by the end of flush_resource() we can have a new write_batch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11200>
The u_resource_vtbl indirection is going to be removed.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10659>
pctx->flush_resource() has the same expectations that the resource can
be shared with an external client as pctx->flush(), but without the
convenience of a fence to know *when* the resource must be visible to
that external client. So we need to ensure the batch is flushed all the
way to the kernel so that implicit-sync can do it's job.
Fixes: e9a9ac6f77 ("freedreno/drm: Async submit support")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10530>
Most of them were actually unused. The memory type (KMEM vs SMI) only
applied to very old a2xx era devices that had a small/fast stacked
memory (SMI) vs normal memory (KMEM). And the cache flags are ignored
(ie. everything is writecombine), but we can add new cache flags later
when they actually do something.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
aa1ddb6fe3 skipped the tracking for the
!dirty case, but we can do a bit better and track at bind time whether
the state change is one that requires resource tracking.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9581>
Doesn't change anything yet, but this will let us more easily add
mapping from dirty gallium state to dirty gen-specific state-groups.
Note that the mapping from shader-state to global state in
fd_context_dirty_shader() optimizes out for release builds. This
is kind of important, in the next patch we'll want ffs(SOME_CONST)
to optimize away even more.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9581>
Currently only initialized for a6xx, mostly because that is the easiest
setup for me to test and debug at the moment. But the couple a6xx changes
should not require counterparts in older gens.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
Previously we were expecting cb0 to be user_buffer. (We did in some
cases upload it to a gpu buffer, but this was an internally allocated
buffer and not something subject to rebind.) But with TC it becomes
a gpu buffer.
(Technically, with pctx->const_uploader, we shouldn't hit the rebind
path for cb0, but better to not try to be overly clever.. sooner or
later that would bite us.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
With threaded_context, in the TC_TRANSFER_MAP_UNSYNC case, we are
getting called from the frontend thread, rather than driver thread.
So we need a different slab_child_pool for that.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
This will be used by threaded_context to avoid stalls in the
DISCARD_WHOLE_RESOURCE case (and DISCARD_RANGE cases that can
be promoted to DISCARD_WHOLE_RESOURCE).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
Split out the usage simplification from main part of transfer_map and
handle the threaded-context specific TC_TRANSFER_x flags.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
For threaded_context, to properly handle replace_buffer_storage, we'll
need to handle multiple "iterations" of a resource using the same
tracking in order to implement transfer_map() correctly.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
With threaded-context we won't have a chance to apply the workaround in
the backend driver. But the previous commit moves it to a driconf
configured workaround in mesa/st, so we can drop this now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9316>