It does the same thing as align() from u_math.h, no need to
have a etnaviv specific version.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17695>
When the shader state is destroyed before the async shader compile is
done, we get a use after free in the async thread, as the shader state
it is operating on is gone. Fix this by dropping any pending job from
the async queue or wait for it to finish before destroying the state
by calling util_queue_drop_job.
Also call util_queue_fence_destroy, which would have caught this issue
by asserting that the queue_fence is in signalled state when the
shader state is destroyed.
Fixes: 1141ed5859 ("etnaviv: async shader compile")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17534>
There is nothing more TODO here. With softpin, which is available on all
GPUs using texture descriptors, there is no need for the kernel to patch
the descriptor, as the proper GPU virtual address is filled in by userspace.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
There already is a error handling label to free the sampler view
struct and return failure. Consistently use this label to make
error handling more uniform.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
Texture descriptors currently waste a massive ammount of memory, as every
one is allocated via a separate BO. As the allocation granularity of the
kernel is 4KB and the descriptor is only 256B, 93.75% of the allocated
memory is wasted.
Add a simple suballocator for the texture descriptors, to allocate multiple
ones out of a single kernel BO. This isn't perfect, as freed slots in the
suballocated resource are not reused, but worst-case we end up with the
same waste as we had before. This also potentially improves efficiency at
the kernel side, as this reduces the number of BOs needed for the sampler
views in each submit.
As the BO is now used by multiple descriptors, avoid syncing with the GPU
via the cpu_prep/fini calls, as to not introduce stalls between pending
rendering and new descriptors being filled. This is safe, as each
suballocation slot is only used once, so newly filled slots are certainly
not in use by the GPU.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
The dummy texture descriptor and the dummy render target relocs are not ever
changed by a context operation, so we can save some space by moving them to
the screen and potentially share them and the BOs backing them between
multiple contexts.
Also don't hold two pointers to the same BO, one in the reloc and one raw,
but always just use the reloc one.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
Fill in support for TXD instruction which emits shader TEXLDD opcode.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17500>
Rename the args from low_bias/compare to src1/src2, since they
are used for different purposes depending on the texture load
type. No functional change.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17500>
Now that usage flags can be specified even when using the modifier path for
allocation and frontends like GBM and EGL wayland do this properly, we can
drop the assumption that all resources allocated through the modifier
enabled path need to be SCANOUT capable.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17364>
While a resource might be shared across different contexts all synchronization
of commands is the resposibility of the user (OpenGL spec Chapter 5 "Shared
Objects and Multiple Contexts").
Currently etnaviv tries to be extremely helpful by flushing foreign contexts
when using a resource that is still pending there. This introduces a lot of
issues, as context flushes can now happen at basically any time and also
introduces a lot of overhead due to the needed tracking and locking. Get rid
of all this cross-context tracking and flushing.
The only real requirement here is that we need to track pending resources
without mutating the state of the etna_resource, as this might be shared
across multiple contexts and thus be used by multiple threads concurrently.
Introduce a hash table to track the current pending resources and their
states in the local context.
A side-effect of this change is that we don't need to keep all used
resources referenced until context flush time, as we don't need to mutate
them anymore to track the status. This allows to free some of them a bit
earlier. Note that this introduces a small possibility of a new resource
matching the key of a already destroyed resource still in the
pending_resources hashtable, but even if we get such a collision the worst
outcome is a not strictly necessary flush of the local context being
performed, which is acceptable if it doesn't happen very often.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
Don't access the status member of etna_resource directly, as this
go away to get rid of shared mutable state. Add a wrapper function
that allows to plug in the new status lookup.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
Etnaviv has no restrictions on buffers being mapped during execution. In
fact most buffers are already always mapped during their lifetime as the
unmap is a no-op. Let the frontend know that it doesn't need to bother
with unmapping buffers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17343>
As we do not suballocate any buffers, the real map buffer alignment
is determined by the GEM BO map alignment, which is at least 4KB.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17343>
Passes following piglit:
- spec@khr_parallel_shader_compile@basic
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16172>
Early depth test is broken when the linear render target mode is used
and depth is written from the PE stage. It seems RA and PE disagree
about the cache layout in that case, so the RA sees unwritten/invalid
depth cache lines leading to random depth test fails. Early test works
fine if depth is written from the RA stage.
To work around this issue, detect the combination of linear RT, early
test and late write and switch to late test in that case.
Fixes: 53445284a4 ("etnaviv: add linear PE support")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17215>
The drivers not setting it were:
- nv30, which gets lowering using NIR's lower_fsat flag.
- r300, which gets lowering using NIR's lower_fsat flag.
- a2xx, which has was getting it optimized back to fsat anyway.
This drops the check for the cap from gallium nine. While nine does have
a non-nir path, I think it's safe to assume that if you have SM3
texturing, you can do fsat.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
This is used for the old, buggy and slow GLSL IR loop unrolling
code. All drivers have now switched to the NIR unrolling code so
here we remove the CAP.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16366>
GPUs with the LINEAR_PE feature bit have the ability to render into linear
buffers. While this decreases PE cache effectiveness and is thus slower than
rendering into a (super-)tiled buffer, it's still preferable for cases where
we would need a blit to get into linear otherwise, i.e. when importing a
linear buffer or when linear is forced on allocation by usage flags or
modifiers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16615>
The blob only switches to the 3 single buffer state when required, which seems
to be the case when any color or ZS target is <= 16bpp. Using 2 as the single
buffer state gives a very small 1-2% performance improvement on fillrate
constrained rendering, so it likely affects some PE cache setting.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16615>
This allows glamor to successfully compile its shaders on the GC400.
When running glamor using the GC400, Xorg reports that the compiled
shaders exceed the maximum allowed instructions because the value
reported from the kernel is halved.
Xserver[314]: etna_draw_vbo:318: compiled shaders are not okay
$ cat /sys/kernel/debug/dri/128/gpu | grep instruction_count
instruction_count: 256
However, the spec for the Unified vertex-fragment shader explicitly
lists 256 as the maximum number of instructions for each shader
("256 for vertex shaders; 256 for fragment shaders").
Signed-off-by: Kyle Russell <bkylerussell@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16383>
Now that all consumers of GLSL use NIR, make the remaining drivers take
the path that relies on NIR to really do optimization.
nouveau steam shader-db runtime -6.69631% +/- 1.29235% (n=12).
No change on shader-db there.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16364>
The only interesting ones here were LOWER_IF_THRESHOLD (which previously
had connected to some lowering in GLSL that was broken in the face of side
effects), and FMA (which turned GLSL IR's fma() into TGSI_OPCODE_FMA
instead of MAD).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8044>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
I added this hack to my tree when testing another MR and ended up
squashing it into c2a3236d1a (etnaviv: clean up tiling setup in
etna_compile_rs_state) by accident when doing some changes to this
commit. Reinstate the assert.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16212>
On older GPUs a color tile was always 64 Byte. On new GPUs with
CACHE128B256BPERLINE support the tile size is either 128 Byte or
256 Byte depending on the TS mode. Add a helper to return the
color tile size and use in in places that use hard-coded tile
size values or do their own calculation.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
128B/256B tile support is not a HALTI5 property, but has its own
separate feature bit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
With access to HALTI5 GPUs with and without DEC400 compression it's
obvious that the previous compression state setup only worked when
DEC400 was present. Properly set up the compression state bits.
This is only the second part of the fix, first part is moving the
compression state to the correct bit location, which has already
happened via the import of new rnndb headers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
On GPUs with the CACHE128B256BPERLINE feature the RS gained some
new state bits to deal with the new additional information required
for this big tile support.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
Using the raw layout bits in the tiling setup makes this function harder
to read than necessary. Use the tiling bit defines and assign them to
some local bools with a proper name to make this easier to read.
No functional change.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
Support for multiple constant sources per instruction is not a HALTI5
capability, there is a separate feature bit to signal the availability
of this shader core enhancement.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
We used the number of pipes to determine which state registers to use
for the RS pipe address configuration, as the dual pipe GPUs were the
first one where the new states were used. This isn't correct though,
as now there are single pipe GPUs which also use the new state
addresses.
There actually is a feature flag telling us to use the new RS pipe
address states, use it. As this feature flag is not available on early
GPUs using the new base address (mostly because we don't have HWDB
entries for them), still check for more than a single pipe as an
additional clue to use new states.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
We used the number of pipes to determine which state registers to use
for the PE pipe address configuration, as the dual pipe GPUs were the
first one where those new states were used. Now there are some new
single pipe GPUs where this logic breaks. HALTI0 added the new PE
address states and all GPUs with at least this feature level are using
the new states exclusively, even if they only have a single PE pipe.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
Bits per tile and the tile clear value are not determined by the
HALTI version, but by two separate feature bits that are not always
present on HALTI5 GPUs. With big 128B/256B tile support the bits
per tile are always 4.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
Update to rnndb commit ad665b720421.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
The kernel exposes more minor GPU feature registers. Fill them
all into our internal feature struct.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
The debug option only disables the general can_supertile spec of the GPU, so
we should also take this into account when deciding about the layout of a
sampler resource.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9255>
New blob versions always emit this state on GPUs that don't have the
NEW_GPIPE feature bit before drawing a primitive, as it needs to be
set according to the primitive type.
Closes: #2933
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16094>