We need more GS/NGG bits, so add current_gs_state to hold them.
This simplifies the logic in the draw code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
When NGG is active, the GS invocation counter is always incremented, even
if there's no explicit GS.
Implementing the counter manually fixes it:
* in emit_gs_epilogue for the legacy path
* in gfx10_ngg_gs_emit_prologue for the NGG path
This fixes piglit's arb_query_buffer_object-qbo test.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
To support PIPE_STAT_QUERY_GS_INVOCATIONS and PIPE_STAT_QUERY_GS_PRIMITIVES
being used at the same time, we have to reuse the same buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
Statistics only work in non-NGG mode. If screen->use_ngg is true, we can't
know whether the draw will actually use NGG, so this commit switches
to a shader-based implementation of this counter.
To avoid modifying si_query, the shader implementation behaves like the hw
one: it uses the same buffer size and offset.
The emulation path activation in the shader is controlled by vs_state_bit[31].
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15861>
Currently this only takes a wait flag, but in order to get the right answer
for Vulkan partial results, lavapipe/llvmpipe will need to pass a partial flag
through here in the future.
This just changes the API so that's possible.
v2: use an enum (zmike)
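As a rough sketch of the API shape this enables (all toy_* names are
hypothetical, not the gallium definitions), a flags enum leaves room for a
partial bit next to wait. Actual waiting is not modelled here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real enum may use different names/values. */
enum toy_query_flags {
   TOY_QUERY_WAIT    = 1 << 0,
   TOY_QUERY_PARTIAL = 1 << 1, /* the future flag anticipated above */
};

/* Toy query: a running value plus a "finished" marker. */
struct toy_query {
   bool finished;
   uint64_t value_so_far;
};

/* Returns true if *result was written.
 * An unfinished query only yields a value when PARTIAL is requested.
 * A real driver would block on WAIT; this sketch never blocks. */
static bool toy_get_query_result(const struct toy_query *q,
                                 enum toy_query_flags flags,
                                 uint64_t *result)
{
   if (!q->finished && !(flags & TOY_QUERY_PARTIAL))
      return false;

   *result = q->value_so_far;
   return true;
}
```

With a plain bool parameter, a second behaviour like this would have needed
another signature change; with flags it is one more bit.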
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15009>
Performance counters will be used by RADV for VK_KHR_performance_query
and also for adding SPM support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11140>
This will allow removing the winsys pointer from buffers.
The amdgpu winsys adds dummy_ws to get radeon_winsys because there can be
no radeon_winsys around (e.g. while amdgpu_winsys is being destroyed), but
we still need some way to call buffer functions.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9809>
DCC/CMASK/HTILE clears will not set this. We could do a better job
at not setting this in other cases too.
Image copies also don't set this.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9795>
Slabs always allocate the next power-of-two size from their pools. This
wastes memory if the size is not a power of two.
bo->base.size is overwritten because the default is the allocated
power-of-two size, but we need the real size to compute the wasted size in
amdgpu_bo_slab_destroy. entry_size is added to the hole in pb_slab_entry
to hold the real entry size.
Like other memory stats, no atomics are used.
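For illustration only (next_pow2 and slab_wasted_bytes are hypothetical
helpers, not the winsys code), the wasted size of one entry is the gap
between the power-of-two allocation and the real size:

```c
#include <stdint.h>

/* Hypothetical helper: round size up to the next power of two (size > 0). */
static uint64_t next_pow2(uint64_t size)
{
   uint64_t p = 1;
   while (p < size)
      p <<= 1;
   return p;
}

/* Hypothetical helper: bytes wasted when an entry of real_size bytes is
 * served from a pool that only hands out power-of-two sizes. */
static uint64_t slab_wasted_bytes(uint64_t real_size)
{
   return next_pow2(real_size) - real_size;
}
```

A 3000-byte request would occupy a 4096-byte entry and count 1096 bytes as
wasted; a 4096-byte request wastes nothing.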
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8683>
We often do this:
   pipe->set_constant_buffer(pipe, shader, slot, &cb);
   pipe_resource_reference(&cb.buffer, NULL);
That results in an atomic increment in set_constant_buffer followed by
an atomic decrement after set_constant_buffer. This new interface
eliminates those atomics.
For the case above, this should be used instead:
   pipe->set_constant_buffer(pipe, shader, slot, true, &cb);
   cb.buffer = NULL; // if cb is not a local variable, else do nothing
AMD Zen benefits from this. The perf improvement is ~3% for Viewperf13/Catia.
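A toy model of the ownership transfer (the toy_* types and functions are
stand-ins, not the gallium interface; atomic_ops merely counts where a real
driver would issue an atomic):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy resource: a refcount plus a tally of would-be atomic operations. */
struct toy_resource {
   int refcount;
   int atomic_ops;
};

struct toy_cb   { struct toy_resource *buffer; };
struct toy_slot { struct toy_resource *buffer; };

/* Reference-counted pointer assignment, like a resource_reference helper. */
static void toy_reference(struct toy_resource **dst, struct toy_resource *src)
{
   if (src) {
      src->refcount++;
      src->atomic_ops++;
   }
   if (*dst) {
      (*dst)->refcount--;
      (*dst)->atomic_ops++;
   }
   *dst = src;
}

static void toy_set_constant_buffer(struct toy_slot *slot, bool take_ownership,
                                    const struct toy_cb *cb)
{
   if (take_ownership) {
      /* Drop the old binding, then adopt the caller's reference as-is. */
      toy_reference(&slot->buffer, NULL);
      slot->buffer = cb->buffer;
   } else {
      toy_reference(&slot->buffer, cb->buffer);
   }
}
```

In this model the old pattern costs two counted operations per bind (one
increment in the setter, one decrement by the caller), while the
take_ownership path costs none when the slot was empty.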
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8298>
This decreases the release libgallium_dri.so size without debug symbols
by 16384 bytes. The CPU time spent in si_emit_draw_packets decreased
from 4.5% to 4.1% in viewperf13/catia/plane01.
The previous code did:
   cs->current.buf[cs->current.cdw++] = ...;
   cs->current.buf[cs->current.cdw++] = ...;
   cs->current.buf[cs->current.cdw++] = ...;
   cs->current.buf[cs->current.cdw++] = ...;
The new code does:
   unsigned num = cs->current.cdw;
   uint32_t *buf = cs->current.buf;
   buf[num++] = ...;
   buf[num++] = ...;
   buf[num++] = ...;
   buf[num++] = ...;
   cs->current.cdw = num;
The code is the same (radeon_emit is redefined as a macro) except that
all set and emit functions must be surrounded by radeon_begin(cs) and
radeon_end().
radeon_packets_added() returns whether any new packets have been added
since radeon_begin.
radeon_end_update_context_roll(sctx) sets sctx->context_roll = true
if any new packets have been added since radeon_begin.
For now, the "cs" parameter is intentionally unused in radeon_emit and
radeon_emit_array.
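A minimal sketch of how such a begin/emit/end macro set can be shaped (the
toy_* macros and struct are illustrative assumptions, not the radeonsi
definitions):

```c
#include <stdint.h>

/* Toy command stream with the two fields named above. */
struct toy_cmdbuf {
   struct {
      uint32_t *buf;
      unsigned cdw;
   } current;
};

/* Cache the write position and buffer pointer in locals. */
#define toy_begin(cs)                               \
   struct toy_cmdbuf *toy_cs_ = (cs);               \
   unsigned toy_num_ = toy_cs_->current.cdw;        \
   unsigned toy_num_initial_ = toy_num_;            \
   uint32_t *toy_buf_ = toy_cs_->current.buf

/* Emit one dword through the cached locals. */
#define toy_emit(value) (toy_buf_[toy_num_++] = (value))

/* True if anything was emitted since toy_begin. */
#define toy_packets_added() (toy_num_ != toy_num_initial_)

/* Write the cached position back exactly once. */
#define toy_end() do { toy_cs_->current.cdw = toy_num_; } while (0)

/* Example emitter: four dwords, one store to cdw. */
static int toy_emit_four(struct toy_cmdbuf *cs, uint32_t base)
{
   toy_begin(cs);
   toy_emit(base + 0);
   toy_emit(base + 1);
   toy_emit(base + 2);
   toy_emit(base + 3);
   int added = toy_packets_added();
   toy_end();
   return added;
}
```

Because the locals live in the enclosing scope, every emit must sit between
the begin and end macros, which matches the constraint described above.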
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8653>
Replace mesa's slightly different container_of() with one more aligned
to the Linux kernel's version, which takes a type as the 2nd param. This
avoids warnings like:
   freedreno_context.c:396:44: warning: variable 'batch' is uninitialized when used within its own initialization [-Wuninitialized]
At the same time, we can add additional build-time type-checking asserts.
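A sketch of a type-taking container_of with a compile-time pointer check
(toy_container_of and the toy_* structs are hypothetical; the check used in
the actual patch may differ). The quoted warning suggests the old macro took
the destination variable itself as its second argument, which a type
parameter avoids:

```c
#include <stddef.h>

/* Second parameter is the containing type, as in the Linux kernel.
 * The unevaluated sizeof compares the member's address type with ptr's
 * type; compilers typically diagnose a mismatch at build time. */
#define toy_container_of(ptr, type, member)                 \
   ((void)sizeof(&((type *)0)->member == (ptr)),            \
    (type *)((char *)(ptr) - offsetof(type, member)))

struct toy_node {
   int link;
};

struct toy_batch {
   int id;
   struct toy_node node;
};

/* Recover the enclosing batch from a pointer to its embedded node. */
static struct toy_batch *toy_batch_from_node(struct toy_node *n)
{
   return toy_container_of(n, struct toy_batch, node);
}
```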
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7941>
There are many issues with SDMA across many generations of hardware.
A recent example is that gfx10.3 suffers from random GPU hangs if
userspace uses SDMA.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7908>
It's straightforward except that the amdgpu winsys had to be cleaned up
to allow this.
radeon_cmdbuf is inlined and optionally the winsys can save the pointer
to it. radeon_cmdbuf::priv points to the winsys cs structure.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7907>
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.
This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.
The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.
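A CPU-side analogy of the accumulation scheme (toy_workgroup_accumulate is
a hypothetical name; the NULL branch only models the hardware discarding
the atomic on a null buffer, whereas the shader itself has no such branch):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* One atomic add per workgroup into the query result memory.
 * query_buf == NULL stands in for the null buffer descriptor that is
 * bound when no queries are active. */
static void toy_workgroup_accumulate(_Atomic uint64_t *query_buf,
                                     uint64_t prims_in_workgroup)
{
   if (!query_buf)
      return; /* models the discarded atomic */

   /* The add is not latency-sensitive, so relaxed ordering suffices here. */
   atomic_fetch_add_explicit(query_buf, prims_in_workgroup,
                             memory_order_relaxed);
}
```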
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>