To enable per-context priorities, we need to have per-context pipe's.
Unfortunately we still need to keep the global screen pipe, mostly just
for screen->get_timestamp().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Similar to the way width/pitch alignment works, it seems like we need to
do similar for height. Otherwise the BLIT from system memory to GMEM
can over-fetch beyond the end of the buffer, triggering a fault.
I'm not sure if there is a better solution yet. Possibly we could fall
back to pre-a5xx style DRAW packets for cases where BLIT might over-
fetch. (We in theory have that problem already with rendering to higher
mipmap levels, although fortunately those tend to use GMEM bypass.)
This fixes issues reported with glamor.
Reported-by: don.harbin@linaro.org
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
v2: fixes for nine, svga
It is always the draw ring. Except for a5xx queries like time-elapsed,
where we will eventually want to emit cmds into both binning and draw
rings.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For a5xx (and actually some queries on a4xx) we can accumulate results
in the cmdstream, so we don't need this elaborate mechanism of tracking
per-tile query results. So make it into vfuncs so generation specific
backend can use it when it makes sense.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In particular, move per-shader-stage info out to a seperate array of
enum's indexed by shader stage. This will make it easier to add more
shader stages as well as new per-stage state (like SSBOs).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Make this an array indexed by shader stage, as is done elsewhere for
other per-shader-stage state. This will simplify things as more shader
stages are eventually added.
Signed-off-by: Rob Clark <robdclark@gmail.com>
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We will only hit this with multi-planar YUV external images, so we would
probably never hit this code path in the first place. But if we did, it
wouldn't do the right thing so just bail.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Since clears are more or less just normal draws, there isn't that much
benefit in having hand-rolled clear path. Add support to use u_blitter
instead if gen specific backend doesn't implement ctx->clear().
Signed-off-by: Rob Clark <robdclark@gmail.com>
With the state accessed from GMEM+submit factored out of fd_context and
into fd_batch, now it is possible to punt this off to a helper thread.
And more importantly, since there are cases where one context might
force the batch-cache to flush another context's batches (ie. when there
are too many in-flight batches), using a per-context helper thread keeps
various different flushes for a given context serialized.
TODO as with batch-cache, there are a few places where we'll need a
mutex to protect critical sections, which is completely missing at the
moment.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add a bit of extra book-keeping about blits and back-blits (from
resource shadowing). If the app uploads all mipmap levels, as opposed
to uploading the first level and then glGenerateMipmap(), we can discard
the back-blit (as opposed to being naive and shadowing the resource for
each mipmap level). Also, after a normal blit, we might as well flush
the batch immediately, since there is not likely to be further rendering
to the surface.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Push query state down to batch, and use the resource tracking to figure
out which batch(es) need to be flushed to get the query result.
This means we actually need to allocate the prsc up front, before we
know the size. So we have to add a special way to allocate an un-
backed resource, and then later allocate the backing storage.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Basically, to "DCE" blits triggered by resource shadowing, in cases
where the levels are immediately completely overwritten. For example,
mid-frame texture upload to level zero triggers shadowing and back-blits
to the remaining levels, which are immediately overwritten by
glGenerateMipmap().
Signed-off-by: Rob Clark <robdclark@gmail.com>
To make batch re-ordering useful, we need to be able to create shadow
resources to avoid a flush/stall in transfer_map(). For example,
uploading new texture contents or updating a UBO mid-batch. In these
cases, we want to clone the buffer, and update the new buffer, leaving
the old buffer (whose reference is held by cmdstream) as a shadow.
This is done by blitting the remaining other levels (and whatever part
of current level that is not discarded) from the old/shadow buffer to
the new one.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Note that I originally also had a entry-point that would construct a key
and do lookup from a pipe_surface. I ended up not needing that (yet?)
but it is easy-enough to re-introduce later if we need it for the blit
path.
For now, not enabled by default, but can be enabled (on a3xx/a4xx) with
FD_MESA_DEBUG=reorder.
Signed-off-by: Rob Clark <robdclark@gmail.com>
To flush batches out of order, the gmem code needs to not depend on
state from fd_context (since that may apply to a more recent batch).
So this all moves into batch.
The one exception is the gmem/pipe/tile state itself. But this is
only used from gmem code (and batches are flushed serially). The
alternative would be having to re-calculate GMEM layout on every
batch, even if the dimensions of the render targets are the same.
Note: This opens up the possibility of pushing gmem/submit into a
helper thread.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Introduce the batch object, to track a batch/submit's worth of
ringbuffers and other bookkeeping. In this first step, just move
the ringbuffers into batch, since that is mostly uninteresting
churn.
For now there is just a single batch at a time. Note that one
outcome of this change is that rb's are allocated/freed on each
use. But the expectation is that the bo pool in libdrm_freedreno
will save us the GEM bo alloc/free which was the initial reason
to implement a rb pool in gallium.
The purpose of the batch is to eventually facilitate out-of-order
rendering, with batches associated to framebuffer state, and
tracking the dependencies on other batches.
Signed-off-by: Rob Clark <robdclark@gmail.com>
to reduce the call indirections with u_resource_vtbl.
The worst call tree you could get was:
- u_transfer_inline_write_vtbl
- u_default_transfer_inline_write
- u_transfer_map_vtbl
- driver_transfer_map
- u_transfer_unmap_vtbl
- driver_transfer_unmap
That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.
The new interface is:
pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
stride, layer_stride)
v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
The use of transfer_inline_write() in TexSubImage path (see fb9fe352ea)
exposed a bug for "layer_first" resources (ie. a4xx) not setting correct
layer_stride.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will allow drivers to make better decisions about texture sharing
for DRI2, DRI3, Wayland, and OpenCL.
v2: add read/write flags, take advantage of __DRI_IMAGE_USE_BACKBUFFER
Reviewed-by: Axel Davy <axel.davy@ens.fr>
When layer is the container, slices are tightly packed inside of each
layer. We don't need any additional alignment. On a3xx, each slice
contains all the layers, so having alignment makes sense.
This fixes a whole slew of array-related piglits, including texelFetch
and tex-miplevel-selection varieties.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
a4xx hardware has real support for RGTC so there's no need to fake it
like we do on a3xx. Undo the hacks, and keep track of an "internal
format" of a resource, which on a3xx will be different, triggering the
transfer-time conversions to take place.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A smarter implementation would make it possible to attach this to emit
state for the BY_REGION versions to avoid breaking the tiling. But this
is a start.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Also throw in LATC while we're at it (same exact format). This could be
made more efficient by keeping a shadow compressed texture to use for
returning at map time. However... it's not worth it for now...
presumably compressed textures are not updated often.
Lastly fix up Z32S8 transfers to non-0 layers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This only appears in cubemaps which have have packed layers, so are very
sensitive to any layout disagreement between sw and hw.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Collapse dirty/reading bools into status bitmask (and drop writing which
should really be the same as dirty). And use 'used_resources' list for
all tracking, including zsbuf/cbufs, rather than special casing the
color and depth/stencil buffers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With stream-out (transform-feedback) we have the case where resources
are *written* by the gpu, which needs basically the same tracking to
figure out when rendering must be flushed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>