fd_bo_cpu_prep() doesn't realize the bo is already referenced in
unflushed cmdstream. It could be made to do so (but would have to be
implemented twice, ie. both for msm and kgsl). But we still can't do
the expected thing if the caller isn't using _NOSYNC. Because of the
way the tiling works, we need to build quite a bit of cmdstream at flush
time, which is not possible to do at the libdrm level.
So rather than trying to make fd_bo_cpu_prep() smarter than it can
possibly be, just *always* discard and reallocate if the
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag is set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Get rid of fd3_vertex_buf and use fd_vertex_state directly for all
draws. Removes a tiny bit of CPU overhead for munging around the vertex
state every time it is emitted, but more importantly it cleans things up
for later optimizations, so the emit paths don't have to special case
internal draws (gmem<->mem, clears, etc) with regular draws.
Instead of constructing fd3_vertex_buf array each time for internal
draws, and context init time pre-create solid_vbuf_state and
blit_vbuf_state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. But fortunately this can be shared across GPU
generation.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <robclark@freedesktop.org>
r600g needs explicit flushing before DRI2 buffers are presented on the screen.
v2: add (stub) implementations for all drivers, fix frontbuffer flushing
v3: fix galahad
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
When the old contents do not need to be preserved, it is faster to
create a new backing bo rather than stall.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split the parts that are specific to adreno a2xx series GPUs from the
parts that will be in common with a3xx, so that a3xx support can be
added more cleanly.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Optimize out parts of the render target that are scissored out by taking
into account maximal scissor bounds in fd_gmem_render_tiles().
This is a big win on things like gnome-shell which frequently do partial
screen updates.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Currently works on a220. Others in the a2xx family look pretty similar
and should be pretty straightforward to support with the same driver.
The a3xx has a new shader ISA, and while many registers appear similar,
the register addresses have been completely shuffled around. I am not
sure yet whether it is best to support with the same driver, but
different compiler, or whether it should be split into a different
driver.
v1: original
v2: build file updates from review comments, and remove GPL licensed
header files from msm kernel
v3: smarter temp/pred register assignment, fix clear and depth/stencil
format issues, resource_transfer fixes, scissor fixes
Signed-off-by: Rob Clark <robdclark@gmail.com>