The compiler can't handle load/store_output directly for nontrivial tilebuffer
layouts. Add a NIR pass to lower these intrinsics, applying a given layout.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
This hw instruction writes out an entire block from the tilebuffer to an
attached render target (PBE descriptor). It is used (only?) in end-of-tile
shaders to implement write out. We need to handle it in the compiler as a
prerequisite to compiling end-of-tile shaders ourselves, instead of hardcoding.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
Mutlisampled texture fetch (txf_ms) is encoded like regular txf. However, we now
need to pack the multisample index in the right place, which we do by extending
our existing NIR texture source lowering pass. 2D MS arrays use a new value of
dim which requires tweaking the encoding slightly. Otherwise, everything is
bog standard.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
It appears that multisampled textures on AGX have all samples of the same pixel
contiguous in memory, effectively using the layout of a single-sampled texture
with a larger block size. Handle in ail.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
The track_alloc and track_free symbols are used, we need to link them in.
Depending on build flags / environment / etc, fixes the potential build error
hit by a CI job:
mold: error: undefined symbol: agxdecode_track_alloc
>>> referenced by agx_device.c
>>> src/asahi/lib/libasahi_lib.a(src/asahi/lib/libasahi_lib.a.p/agx_device.c.o):(agx_shmem_alloc)>>> referenced by agx_device.c
>>> src/asahi/lib/libasahi_lib.a(src/asahi/lib/libasahi_lib.a.p/agx_device.c.o):(agx_bo_create)
mold: error: undefined symbol: agxdecode_track_free
>>> referenced by agx_device.c
>>> src/asahi/lib/libasahi_lib.a(src/asahi/lib/libasahi_lib.a.p/agx_device.c.o):(agx_bo_unreference)
...when trying to link with libasahi_lib without libasahi_decode for unit tests.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
This avoids exposing the ISA-internal agx_format to the driver, instead hiding
it behind a real PIPE_FORMAT. This lets us use real pipe formats in formatted
intrinsics in NIR, which is convenient; it will allow us to simplify the
compiler/driver ABI; and it lets us use common format helpers (e.g.
util_format_get_blocksize) for the internal formats in driver lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
On blob v512.490, using WRAP_GPU_ID to fake GPU versions, I see 0x41 used
everywhere, except for BLIT_OP_SCALE on a630. Define the magic number in
dev info so it can be reused in the two places that set the
non-BLIT_OP_SCALE value.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19794>
For some reason the setup that comes after the scratch buffer
setup calls clobber the PS output configuration. Emitting the
scratch buffer setup as last action before the actual draw commands
seems to fix this.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19804>
Freeing BOs tends to be bursty (ie. when a submit is retired, or
expiring entries from BO cache). Sending lots of small SET_IOVA
messages to the host can quickly eat up the available virtqueue
slots, resulting in (eventually) starving the guest waiting for
free virtqueue space. By batching, we can avoid this and handle
things more efficiently on the host (ie. in a single wakeup rather
than many).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19832>
Batch up handles before closing them to give the drm backend a chance to
batch up any extra handling needed (ie. virtio batching up messages to
host to release IOVA).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19832>
Submits tend to hold references to a lot of BOs, which get unref'd when
the submit is destroyed/retired. For now, all this does is reduce lock
aquire/release, but the next commit will build on it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19832>
We already have the notion of an agx_batch, which encapsulates a render
pass. Extend the logic to allow multiple in-flight batches per context, avoiding
a flush in set_framebuffer_state and improving performance for certain
applications designed for IMRs that ping-pong unnecessarily between FBOs. I
don't have such an application immediately in mind, but I wanted to get this
flag-day out of the way while the driver is still small and flexible.
The driver was written from day 1 with batch tracking in mind, so this is a
relatively small change to actually wire it up, but there are lots of little
details to get right.
The code itself is mostly a copy/paste of panfrost, which in turn draws
inspiration from freedreno and v3d.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19865>
In 4ceaed7839 we made scratch surface state allocations part of the
internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress)
so that it doesn't uses slots in the application's expected 1M
descriptors (especially with vkd3d-proton).
But all our compiler code relies on BSS
(STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress).
The additional issue is that there is only 26bits of surface offset
available in CS instruction (CFE_STATE, 3DSTATE_VS, etc...) for
scratch surfaces. So we need the drivers to put the scratch surfaces
in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress
(hence all the driver changes).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ceaed7839 ("anv: split internal surface states from descriptors")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>
We're about to make scratch surface states part of the surface state
heap. Because those are required to be in the low 26bits parts surface
state heap (we're limited in bits handed in the CFE_STATE, 3DSTATE_VS,
etc... instructions), this change splits the 32bit surface state heap
as follow:
- 8Mb of surface states for scratch
- 1Gb - 8Mb of binding tables
- 3Gb of surface states
That way all of the surfaces are located within a 4Gb region visible
from STATE_BASE_ADDRESS::SurfaceStateBaseAddress
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>
So we don't have to do it in the traversal loop. Should 2 and
instructions and a 64-bit shift, so 4/8 cycles per instance node
visit.
Totals from 7 (0.01% of 134913) affected shaders:
CodeSize: 208460 -> 208292 (-0.08%)
Instrs: 38276 -> 38248 (-0.07%)
Latency: 803181 -> 803142 (-0.00%)
InvThroughput: 165384 -> 165376 (-0.00%)
Copies: 4912 -> 4905 (-0.14%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19706>
Moving entire chunks of code into a dummy if block is causing issues
in some situations. To work around the issue that we tried to fix in
35d82ecf1e ("nir/lower_shader_calls: put inserted instructions into a
dummy block") which is that we cannot cut and past a block of
instruction that ends with a jump if there are more instruction behind
where we're going to past. We can instead just wraps the jumps into
dummy if blocks.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19820>
Adds a cost field to radv_ir_node and uses it to model the cost of tree
depth. This improves framerates by 2% if my benchmarking is correct.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19756>