This marks the end of code sharing between r600 and radeonsi.
It's getting difficult to work on radeonsi without breaking r600.
A lot of functions had to be renamed to prevent linker conflicts.
There are also minor cleanups.
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
To be able to properly distinguish between GL_ANY_SAMPLES_PASSED
and GL_ANY_SAMPLES_PASSED_CONSERVATIVE.
This patch goes through all drivers, having them treat the two
query types identically, except:
1. radeon incorrectly enabled conservative mode on
PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only
on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE.
2. st/mesa uses the new query type.
Fixes dEQP-GLES31.functional.fbo.no_attachments.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The data is read when the render_cond_atom is emitted, so we must
delay emitting the atom until after the flush.
Fixes: 0fe0320dc0 ("radeonsi: use optimal packet order when doing a pipeline sync")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The result written by the shader workaround needs to be written back, or
the CP may read stale data.
Fixes: 78476cfe07 ("radeonsi: enable ARB_transform_feedback_overflow_query")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We have come to the conclusion that it doesn't improve performance.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The QBO workaround compute grid launch emits the render condition atom
when dirty, so install the render condition in the context only after
launching the compute grid. This avoids a redundant SET_PREDICATION.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is a firmware regression that causes failures. Work around it by
using the compute shader for query_buffer_objects to summarize the query
results.
v2: rename to PREDICATION_OP_BOOL64 (consistent with sid.h)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The predication bits are "visible or no overflow" and "not visible or
overflow", so we need to invert the check relative to the GL and Gallium
interface semantics.
Also, predication by the other streamout-related queries is not allowed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The issue here is that the immediate is treated as a 64-bit value,
and fetching it does not work reliably with swizzles that are different
from xy and zw.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Useful for debugging performance issues when ARB_bindless_texture
is enabled. This query doesn't make a distinction between texture
and image handles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
"tc" will be initialized by the next commit.
v2: rename stuff according to v2 changes in u_threaded_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Only the Radeon kernel driver exposed the GPU temperature and
the shader/memory clocks, this implements the same functionality
for the AMDGPU kernel driver.
These queries will return 0 if the DRM version is less than 3.10,
I don't explicitely check the version here because the query
codepath is already a bit messy.
v2: - rebase on top of master
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are even more counters in the CP_STAT register but I think
these ones are enough for now.
v2: only read (and expose) CP_STAT on VI+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For simplicity, GPU-sdma-busy will return 0 on previous gens.
v2: only read SRBM_STATUS2 on Evergreen+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This new query returns the current visible usage of VRAM accessed
by the CPU. It will return 0 on radeon because it's unimplemented.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Useful when debugging applications which map a ton of buffers
and also because we used to run into Linux's limit on the number
of simultaneous mmap() calls.
v2: - update the commit message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It's also possible to monitor them via performance counters but
the hardware can only use two counters simultaneously. It seems
easier to re-use the existing code which reads from MMIO instead
of writing a multi-pass approach.
v2: - add new lines after ':'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will allow to expose more queries in order to know which
blocks are busy/idle.
v2: - add new lines after ':'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It should be close to the GPU load, but it can be much lower if something
is stalling shader execution (e.g. CP DMA).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>