now that submission is serialized better, it's not actually the resource that should be
tagged for scanout sync, it's the batch state, as multiple contexts might reuse the same
resource, thus requiring synchronization on every submit
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11437>
this uses a pointer to a batch state substruct for timeline tracking,
which provides a few nice benefits:
* explicit ability to detect unflushed batches (even on other contexts)
* the context doesn't need to have a "current" timeline id
* timeline (batch) ids can be distributed during submit, not when recording begins
* an abstracted api which can be more easily changed under the hood
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11437>
this works by flagging the next barrier to use the current sample locations
so that everything works as expected during decompression
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11189>
now we have tracking for vbo binds and can automatically reapply the correct
barrier just before draw if the resource is modified after bind
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10849>
there's not actually any reason to do these during draw other than wasting
cpu cycles, and it has the side benefit of providing a bunch more info for rebinds
image resources for gfx shaders need to have their barriers deferred until draw time,
however, as it's unknown what layout they'll require until that point due to potentially
being bound to multiple descriptor types
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10849>
when emitting barriers, we don't need to end the renderpass in some cases
this handles the case of emitting barriers for new resources which have
never before been accessed and thus do not require synchronization at a
specific point in the batch
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10700>
* for samplers this is a per-stage slot mask that will be used for fast layout updates
* images get a count for gfx and compute usage
* any writable descriptor bind gets its own gfx/compute counter
* all descriptor binds are now counted
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10699>
this minimizes overhead of maintaining scanout objects
Fixes: 104603fa76 ("zink: create separate linear tiling image for scanout")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10358>
rendering onto a linear-tiled image is unbelievably slow if any sort of
blending is enabled, so instead always render to optimal tiling and then
copy to linear for scanout
this doubles performance for now and can be deleted in its entirety along
with the rest of the related hacks once real wsi support is implemented
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10180>
it turns out there's not actually a requirement that resources be unmapped,
which means that a ton of overhead can be saved both in the unmap codepath
(the cpu overhead here is pretty insane) and then also when mapping cached
resource memory, as the map can now be added to the cache for immediate reuse
as seen in radeonsi
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9980>
this includes:
* async buffer mapping/replacement
* async queue submission
* async (threaded) gallium flush handling
the main churn here is from handling async gallium flushes, which involves
creating multiple gallium fences (zink_tc_fence) for each zink fence (zink_fence).
a tc fence may begin waiting for completion at any time, even before the zink_fence
has had its cmdbuf(s) submitted, so handling this type of desync ends up needing
almost a complete rewrite of the existing queue architecture
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9935>
this massively speeds up memory access across the driver, specifically
for pbo-related operations
it does require that mapped memory is manually invalidated/flushed, however, and
we also need to manually align host-visible memory to be able to do that
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9884>
the STORAGE_TEXEL and STORAGE_IMAGE bits can't be accurately applied due
to opengl allowing all resources to be used everywhere, so instead we can
create a separate object on demand which is used only by shaders and gets
extra barriers inferred along with the base object to avoid desync whenever
it is used
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9788>
now that batches aren't limited and flushing is less costly, there's no
reason to keep these separate
the primary changes here are removing the zink_queue enum and collapsing
related arrays which used it as an index, e.g., zink_batch_usage into single
members
remaining future work here will include removing synchronization flushes which
are no longer necessary and (eventually) removing batch params from a number of
functions since there is now only ever a single batch
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9765>
vk spec disallows mapping memory regions more than once, but we always
map the full memory range, so we can just refcount and store the pointer
onto the resource object for reuse to avoid issues here
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9725>
historically zink has been bound to 4 gfx batches and then a separate compute batch
was added. this is not ideal for a number of reasons, the primary one being that if
an application performs 5 glFlush commands, the fifth one will force a gpu stall
this patch aims to do the following, all of which are necessarily done in the same patch
because they can't be added incrementally and still have the same function:
* rewrite batch tracking for resources/views/queries/descriptors/...
|originally this was done with a single uint32_t as a bitmask, but that becomes cumbersome
to track as batch counts increase, not to mention it becomes doubly-annoying
when factoring in separate compute batches with their own ids. zink_batch_usage gives
us separate tracking for gfx and compute batches along with a standardized api for
managing usage
* flatten batch objects to a gfx batch and a compute batch
|these are separate queues, so we can use an enum to choose between an array[2] of
all batch-related objects
* switch to monotonic batch ids with batch "states"
|with the flattened queues, we can just use monotonic uints to represent batch ids,
thus freeing us from constantly using bitfield operations here and also enabling
batch counts to scale dynamically by allocating/caching "states" that represent a batch
for a given queue
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9547>
I'll be rewriting how resource tracking works, so abstracting it and removing
direct uses is going to reduce the chances of breaking things as well as code churn
plus it's a bit easier to use
downside is that until that rewrite happens, this will be a (very small) perf hit and
there's some kinda gross macros involved to consolidate code
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9626>
this really just means to nuke the counter buffer, and this can be done
by using a special bind_history bit that can be unset when the buffer has
been rebound
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9546>
the resource object is now a state tracker for the inner resource_object,
so it can safely be destroyed even if the backing resource is still in use
this also requires updating various things to keep descriptor states/caches working
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9544>
this stores a number (currently 5) of backing allocations for resources
for later reuse when creating matching resources
because this is on the screen object it requires locking, but this is still
far faster than allocating new memory each time
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9540>
this is getting to be enough code that it's getting to be a hassle to
keep with the program stuff
also rename a couple of the moved functions
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9348>
we can pass the offset of the 'invalid' flag directly to the resources
to let them run through and invalidate their sets in time for us to detect
it when we recycle the set during batch reset and throw it onto our allocation array
additionally, by adding refs for the actual objects used in a descriptor set, we can
ensure that our cache is as accurate as possible
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9348>
this is a lot of churn that more or less amounts to hashing the descriptor
state during draw and then performing lookups with this to determine whether
we can reuse an existing descriptor set instead of allocating one
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9348>
this simplifies the barrier helper callsites as we no longer need
to check whether a barrier is needed beforehand in order to avoid potentially
ending a renderpass too early
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9205>
we need more detail on some of these to ensure proper synchronization
and availability/visibility of image data between commands/stages
the signature for needs_barrier() is still funky here to avoid breaking
usage in update_descriptors()
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8945>
if we have proper barrier usage to start with, then we don't need to do
any kind of weird flushing upon changing vertex inputs and can also remove
a flag from zink_resource
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8945>
The reason we check for staging-resources here is really because they
are the only images guaranteed to be host-visible.
But on UMA architectures, it's quite likely to have memory that is
*both* host-visible *and* device-local, so let's see what we found
instead.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8858>