We need to insert parallel copies at the logical end of blocks, before branches.
Add a pseudo instruction signaling that. Cribbed from ACO.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
Lifted from ir3. Algorithm is the same; the data structures and interface are
lightly modified to decouple from ir3's IR.
Sequentializing parallel copies after RA is tricky. ir3's implementation works
well enough, so I use that one.
Original implementation by Connor Abbott.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
Rather than using builder magic (implicitly lowered on emit), add actual pseudo
operations (explicitly lowered before encoding). In theory this is slower, I
doubt it matters. This makes the instruction aliases first-class for IR prining
and machine inspection, which will make optimization passes easier to write.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
Lifted from Bifrost. Add some basic optimizer tests (they pass!) to show the
compiler is ready to be unit tested. Given we can't have hardware CI for Asahi
yet -- and dEQP is still pretty janky -- unit testing should prove quite useful.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
Instructions, bytes, and registers -- this should hold us over until we
can reverse the underlying uarch and get proper cycle estimations.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
This is to match other NIR terminology.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15103>
Automatically packs and unpacks float <==> clamped 4:6 fixed point, used
for min/max LOD fields on the Sampler descriptor.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
C++ requires explicit casts from integers to enums. Fixes errors like
the following when trying to use Asahi GenXML from a GTest unit test.
src/asahi/lib/agx_pack.h:554:23: error: assigning to 'enum agx_channels' from incompatible type 'uint64_t' (aka 'unsigned long long')
values->channels = __gen_unpack_uint(cl, 0, 6);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
Required to clamp array indices against the array sizes per the GLSL
spec. Metal also does this, implying it's required by the hardware for
correct operation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
As far as I can tell, these *must* be tiled. Other than that, the
implementation is completely routine. Passes
dEQP-GLES3.functional.texture.format.unsized.*2d_array*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
Use the usual macro trick via Panfrost. Fixes textures with formats with
non-32-bit bpp, including:
dEQP-GLES2.functional.texture.specification.basic_teximage2d.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14903>
Like explicit LODs, biases must be 16-bit, so add a lowering rule for
this. With the LOD mode selection updated for txb, we can then ingest
biases like explicit LODs and allowlist txb. Passes:
dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2d_bias
dEQP-GLES2.functional.texture.mipmap.2d.bias.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14899>
We still need to implement discard itself, but this means we don't need
to worry about discard_if. This compiles down to the same idiom as
the vendor compiler (Metal) generates
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14217>
This lets us support indirect access to UBOs easily. The existing
constant special case disappears too, since the peephole optimizer can
inline the constant later. (note: this is too conservative since we can
go up to 16-bit immediates...)
Unfortunately, nir_opt_algebraic can't seem to optimize expressions like
"((a << 3) + 4) >> 2" to "(a << 1) + 1" which would be necessary for
reasonable perf out of this...
Fixes:
dEQP-GLES2.functional.shaders.indexing.uniform_array.float_dynamic_loop_read_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14581>
Lower to `sample_mask = 0`. Actually that implements a demote... doing
discard correctly probably requires rewriting the shader control flow to
insert a return where necessary...
Also, possibly we should be lowering this in NIR to play nice with
gl_SampleMask writes but that's a problem for when we understand the
hardware better.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14219>
Sets the output sample mask to a given 8-bit immediate or 16-bit
register. Also used to implement discards, which is my ES2 interest.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14219>
Enough bits of this packet are known that open-coding hex bytes for it
is annoying. Add some XML correpsonding to what we know.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14219>
Fix a bug in BIND_PIPELINE XML reported by Dougall, which cleans up
a bit of both decoder and driver.
Instead of...
* 17 bytes BIND_PIPELINE (17)
* An unused 8 byte record (25)
* A set of N 8 byte records (25 + 8 * N)
* Oops, 1 byte too many! One just disappeared (24 + 8 * N)
It seems to instead be
* 24 bytes BIND_PIPELINE (24)
* A set of N 8 byte records (24 + 8 * N)
without the sentinel record. These means the 8 byte records themselves
are shuffled, with the high byte of the pointers split from the low
word, but that's less gross than an off-by-one.
It's still not clear what the last 8 bytes of the BIND_VERTEX_PIPELINE
structure mean, or the last 4 byte of the BIND_FRAGMENT_PIPELINE
structure which seems to be a bit shorter.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13784>
Now that we're not hardcoded any magic BO IDs, there is no minimum
number of allocations needed. Remove the unneeded -- and obnoxious --
workaround of allocating unused BOs on startup.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13784>
Dougall Johnson observed these structures make more sense with indices[]
first in the entries and indices[] absent from the header. Then the
sentinel entry disappears, nr_entries makes more sense, and a few magic
numbers pop out. Many thanks to Dougall's astute eyes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13784>
This is where it should be rather than having to pass it into the
optimisation pass every time.
It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>