The use of transfer_inline_write() in TexSubImage path (see fb9fe352ea)
exposed a bug for "layer_first" resources (ie. a4xx) not setting correct
layer_stride.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will allow drivers to make better decisions about texture sharing
for DRI2, DRI3, Wayland, and OpenCL.
v2: add read/write flags, take advantage of __DRI_IMAGE_USE_BACKBUFFER
Reviewed-by: Axel Davy <axel.davy@ens.fr>
When layer is the container, slices are tightly packed inside of each
layer. We don't need any additional alignment. On a3xx, each slice
contains all the layers, so having alignment makes sense.
This fixes a whole slew of array-related piglits, including texelFetch
and tex-miplevel-selection varieties.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
a4xx hardware has real support for RGTC so there's no need to fake it
like we do on a3xx. Undo the hacks, and keep track of an "internal
format" of a resource, which on a3xx will be different, triggering the
transfer-time conversions to take place.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A smarter implementation would make it possible to attach this to emit
state for the BY_REGION versions to avoid breaking the tiling. But this
is a start.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Also throw in LATC while we're at it (same exact format). This could be
made more efficient by keeping a shadow compressed texture to use for
returning at map time. However... it's not worth it for now...
presumably compressed textures are not updated often.
Lastly fix up Z32S8 transfers to non-0 layers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This only appears in cubemaps which have have packed layers, so are very
sensitive to any layout disagreement between sw and hw.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Collapse dirty/reading bools into status bitmask (and drop writing which
should really be the same as dirty). And use 'used_resources' list for
all tracking, including zsbuf/cbufs, rather than special casing the
color and depth/stencil buffers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With stream-out (transform-feedback) we have the case where resources
are *written* by the gpu, which needs basically the same tracking to
figure out when rendering must be flushed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Enables ARB_depth_buffer_float. There is no sampling support for
interleaved Z32F_S8, so we store the two textures separately, one as
Z32F, the other as S8. As a result, we need a lot of additional logic
for restores and transfers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Copies nouveau_buffer and radeon_buffer. This allows a write to proceed
to an uninitialized part of a buffer even when the GPU is using the
previously-initialized portions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A resource flush is an upload of a hypothetically-staging texture to the
GPU. For a UMA system, this will largely be a no-op or
cache-maintenance. Move the render flush logic into transfer_map where
it belongs, and clear out the transfer_flush function.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The SZ2 field contains the layer size of a lower miplevel. It only
contains 4 bits, which limits the maximum layer size it can describe. In
situations where the next miplevel would be too big, the hardware
appears to keep minifying the size until it hits one of that size.
Unfortunately the hardware's ideas about sizes can differ from
freedreno's which can still lead to issues. Minimize those by stopping
to minify as soon as possible.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
For example if width were 65, the first slice would get 96 while the
second would get 32. However the hardware appears to expect the second
pitch to be 64, based on halving the 96 (and aligning up to 32).
This fixes texelFetch piglit tests on a3xx below a certain size. Going
higher they break again, but most likely due to unrelated reasons.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
We only program in one layer size per texture, so that means that all
levels must share one size. This makes the piglit test
bin/texelFetch fs sampler2DArray
have the same breakage as its non-array version instead of being
completely off, and makes
bin/ext_texture_array-gen-mipmap
start passing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Collapse things back into a setup_slices() which takes the desired
alignment as a param. This gets things ready for a4xx which has some
slightly different requirements.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
fd_bo_cpu_prep() doesn't realize the bo is already referenced in
unflushed cmdstream. It could be made to do so (but would have to be
implemented twice, ie. both for msm and kgsl). But we still can't do
the expected thing if the caller isn't using _NOSYNC. Because of the
way the tiling works, we need to build quite a bit of cmdstream at flush
time, which is not possible to do at the libdrm level.
So rather than trying to make fd_bo_cpu_prep() smarter than it can
possibly be, just *always* discard and reallocate if the
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag is set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Get rid of fd3_vertex_buf and use fd_vertex_state directly for all
draws. Removes a tiny bit of CPU overhead for munging around the vertex
state every time it is emitted, but more importantly it cleans things up
for later optimizations, so the emit paths don't have to special case
internal draws (gmem<->mem, clears, etc) with regular draws.
Instead of constructing fd3_vertex_buf array each time for internal
draws, and context init time pre-create solid_vbuf_state and
blit_vbuf_state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. But fortunately this can be shared across GPU
generation.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <robclark@freedesktop.org>
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
When the old contents do not need to be preserved, it is faster to
create a new backing bo rather than stall.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split the parts that are specific to adreno a2xx series GPUs from the
parts that will be in common with a3xx, so that a3xx support can be
added more cleanly.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Optimize out parts of the render target that are scissored out by taking
into account maximal scissor bounds in fd_gmem_render_tiles().
This is a big win on things like gnome-shell which frequently do partial
screen updates.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Currently works on a220. Others in the a2xx family look pretty similar
and should be pretty straightforward to support with the same driver.
The a3xx has a new shader ISA, and while many registers appear similar,
the register addresses have been completely shuffled around. I am not
sure yet whether it is best to support with the same driver, but
different compiler, or whether it should be split into a different
driver.
v1: original
v2: build file updates from review comments, and remove GPL licensed
header files from msm kernel
v3: smarter temp/pred register assignment, fix clear and depth/stencil
format issues, resource_transfer fixes, scissor fixes
Signed-off-by: Rob Clark <robdclark@gmail.com>