The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.
This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.
The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.
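
As a rough CPU-side illustration of the accumulation scheme described
above (not the actual shader code): every workgroup issues one atomic
add of its primitive count into a counter in memory. The struct layout
and the names query_buf and workgroup_accumulate are hypothetical, and
the NULL check only models the hardware dropping the atomic when the
null buffer is bound; the real shader has no such branch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the query result buffer; the real layout is
 * not described in this message. */
struct query_buf {
   _Atomic uint64_t prims_generated;
};

/* One atomic add per workgroup. A NULL pointer models the null buffer
 * descriptor that is bound when no query is active: the add is dropped. */
static void workgroup_accumulate(struct query_buf *buf, uint32_t num_prims)
{
   if (!buf)
      return;

   /* Relaxed ordering is enough here: only the final sum matters. */
   atomic_fetch_add_explicit(&buf->prims_generated, num_prims,
                             memory_order_relaxed);
}
```

Because only the final sum is read back, the order in which workgroups
perform their adds does not matter.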
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Yes, really. Note that non-format buffer loads are unaffected and work
just fine with unaligned pointers (as long as SH_MEM_CONFIG is set up
correctly, which amdgpu ensures).
Fixes e.g. KHR-GL45.vertex_attrib_64bit.vao
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale
the number of offchip buffers accordingly.
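
The scaling amounts to the arithmetic below. This is only a sketch: the
function and parameter names are made up for illustration, and the
per-SE buffer count is whatever the driver already uses on gfx9.

```c
#include <stdbool.h>

/* Hypothetical helper: total offchip buffers for the whole chip.
 * buffers_per_gfx9_se is the per-shader-engine count used on gfx9. */
static unsigned total_offchip_buffers(unsigned buffers_per_gfx9_se,
                                      unsigned num_se, bool is_gfx10)
{
   /* One gfx10 shader engine corresponds to two gfx9 shader engines. */
   unsigned per_se = is_gfx10 ? buffers_per_gfx9_se * 2
                              : buffers_per_gfx9_se;

   return per_se * num_se;
}
```

With this, a gfx10 part with half as many shader engines ends up with
the same total as the gfx9 configuration it corresponds to.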
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
DCC alignment can be less than the alignment of the main surface. In that
case, the DCC tile swizzle needs to be masked accordingly. Should have no
impact on pre-gfx10.
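
A minimal sketch of the masking, under stated assumptions: the helper
name is hypothetical, the alignment is a power of two, and the tile
swizzle is shifted by 8 because it is assumed to be expressed in
256-byte units. Only the bits below the DCC alignment are kept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fold the tile swizzle into the DCC address,
 * keeping only the bits that lie below the DCC alignment. */
static uint64_t dcc_address_with_swizzle(uint64_t dcc_va,
                                         uint32_t tile_swizzle,
                                         uint32_t dcc_alignment)
{
   /* The mask below only works for power-of-two alignments. */
   assert(dcc_alignment && (dcc_alignment & (dcc_alignment - 1)) == 0);

   /* Assumption: the swizzle is stored in units of 256 bytes. */
   uint32_t swizzle_bits = tile_swizzle << 8;

   swizzle_bits &= dcc_alignment - 1;
   return dcc_va | swizzle_bits;
}
```

When the DCC alignment is at least as large as the main surface
alignment, the mask leaves the swizzle bits untouched.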
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
MSAA is only supported for 64KB_{R,Z}_X modes, so the micro tile
optimization that we use on gfx9 and earlier does not work.
Be very explicit about how the swizzle mode of the temporary surface is
selected.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With NGG, the VGT_GS_OUT_PRIM_TYPE can change without a shader change.
The VS_STATE is required for both streamout and culling from a vertex
shader without pre-compiling outprim-specific variants.
We could consider compiling specialized variants in the future. We
could also consider compiling the NGG logic as an epilog.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For pipelines without API GS. We will later expand this to cover NGG
geometry shaders as well.
Note that the vtx offset passed into the GS part is just the
vertex index multiplied by VGT_ESGS_RING_ITEMSIZE.
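
Spelled out as code, with a hypothetical helper name. The result is in
whatever unit VGT_ESGS_RING_ITEMSIZE is programmed in; this sketch does
not convert units.

```c
#include <stdint.h>

/* Hypothetical helper: offset of a vertex in the ES->GS ring, as passed
 * to the GS part. esgs_ring_itemsize is the value programmed into
 * VGT_ESGS_RING_ITEMSIZE. */
static uint32_t gs_vtx_offset(uint32_t vertex_index,
                              uint32_t esgs_ring_itemsize)
{
   return vertex_index * esgs_ring_itemsize;
}
```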
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This does not support geometry shading yet. Also missing are streamout
and NGG-specific optimizations.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Also add the shader main part NGG variant, so that in principle
we can switch between legacy and NGG modes.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>