Support PRIME render offload between a Wayland server gpu and a Wayland
client gpu with different channel ordering for their color formats,
e.g., between Intel drivers which currently only support ARGB2101010
and XRGB2101010 import/display and nouveau which only supports ABGR2101010
rendering and display on nv-50 and later.
In the wl_visuals table, we also store for each format an alternate
sibling format which stores colors at the same precision, but with
different channel ordering, e.g., ARGB2101010 <-> ABGR2101010.
If a given client-gpu renderable format is not supported by the server
for import, but the alternate format is supported by the server, expose
the client-gpu renderable format as a valid EGLConfig to the client. At
eglSwapBuffers time, during the blitImage() detiling blit from the client
backbuffer to the linear buffer, the client format is converted to the
server supported format. As we have to do a copy for PRIME anyway,
this channel swizzling conversion comes essentially for free.
Note that even if a server gpu in principle does support sampling
from the clients native format, this conversion will be a performance
advantage if it allows to convert to the servers preferred format
for direct scanout, as the Wayland compositor may then be able to
directly page-flip a fullscreen client wl_buffer onto the primary
plane, or onto a hardware overlay plane, avoiding an extra data copy
for desktop composition.
Tested so far under Weston with: nouveau single-gpu, Intel single-gpu,
AMD single-gpu, "Optimus" Intel server iGPU for display + NVidia
client dGPU for rendering.
v2: Implement minor review comments by Eric Engestrom: Add some
comment and assert, and some style fixes for clarity.
No functional change.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Surface reads don't need them because they just have the one address
payload. With surface writes, on the other hand, we can put the address
and the data in the different halves and avoid building the payload all
together.
The decrease in register pressure and added freedom in register
allocation resulting from this change reduces spilling enough to improve
the performance of one customer benchmark by about 2x.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We're about to add some more if cases so let's have the giant re-indent
in it's own patch to make review easier.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
These have clearly never seen any use.... On gen8, the bottom 4 bits are
missing so we need to shift them off before we call set_bits and shift
again when we get the bits. Found by inspection.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of fetching the information out of the instruction directly,
fetch the descriptor and then pluck the information out of the
descriptor. The current scheme works ok for SEND but with SENDS, it all
falls to pieces because the descriptor is completely shuffled around.
This commit doesn't actually convert everything. One notable exception
is URB messages which don't even use descriptors in emit_urb_WRITE yet.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to be able to extract data from descriptors as well as unify a
bit of the descriptor construction.
One of the unifications we do is to unify the read/write and dataport
descriptors. On gen4-5, read/write are substantially different and the
read descriptors change between gen4 and gen4.x. On gen6, they unified
layouts between read, write, and dataport. Then, on gen8, they added
one bit to the message type field but left it reserved MBZ for
read/write messages. This commit chooses to treat that as if they
expanded the field everywhere and just didn't have enough enum values
for read/write to bother with the extra bit.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This commit pulls the surface descriptor helpers out into brw_eu.h and
makes them no longer depend on the codegen infrastructure. This should
allow us to use them directly from the IR code instead of the generator.
This change is unfortunately less mechanical than perhaps one would like
but it should be fairly straightforward.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of magically falling back to SIMD8 for atomics and typed
messages on Ivy Bridge, explicitly figure out the exec size and pass
that into brw_surface_payload_size.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Like all the other sends, it's just mlen * REG_SIZE.
Fixes: 3cbc02e469 "intel: Use TXS for image_size when we have..."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
If you pass a bool in as the value to set, the C standard says that it
gets converted to an int prior to shifting. If you try to set a bool to
bit 31, this lands you in undefined behavior. It's better just to add
the explicit cast and let the compiler delete it for us.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
There are piles of fields that it doesn't check so using it is a lie.
The only reason why it's not causing problem is because it has exactly
one user which only uses it for MOV instructions (which aren't very
interesting) and only on Sandy Bridge and earlier hardware. Just get
rid of it and inline it in the one place that it's actually used.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously we tried to normalize nr_samples to MAX2(1, nr_samples) to
avoid having to deal with 0 vs 1 everywhere. But this causes problems
in mesa/st, for example st_finalize_texture() will think there is a
nr_samples mismatch and recreate the texture. Somehow this manifests
as corrupt x11 font rendering on generations that do not support MSAA
(but apparently works fine on a5xx and a6xx which do support MSAA.)
Fixes: cf0c7258ee freedreno/a5xx: MSAA
Signed-off-by: Rob Clark <robdclark@gmail.com>
Switched to the raw bo list api to avoid having to use 2 arrays for
everything.
This was introduced in libdrm 2.4.97 which we already depend upon.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This has been disabled some months ago because it introduced
rendering issues with Shadow Of Warrier II (DXVK). This game is
no longer affected, I wonder if 824cfc1ee5 ("radv: rework the
TC-compat HTILE hardware bug with COND_EXEC") fixed the problem.
I checked The Forest on my Polaris, and it renders fine too.
According to Phillip, this gives +5.5% with Rise Of The Tomb
Raider and DXVK. This is because DXVK uses 16-bit depth surfaces
while the native port from Feral uses 32-bit depth surfaces.
Unfortunately, Shadow Of The Tomb Raider isn't affected because
it clears each layer of a D16 array texture individually. So it
doesn't hit the fast clear path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson
cross-builds because of the need to run host binaries on the build system.
vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my
cross-builds.
Fixes: ebcb4c2156 ("meson: Enable VC4's NEON assembly support.")
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified. Fixes
crashes on texture upload/download with current gcc.
We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.
(commit message by anholt)
Fixes: 4d30024238 ("vc4: Use NEON to speed up utile loads on Pi2.")
This fixes the depth/stencil clear on a20x, and adds a fast clear path.
The fast clear path is only used for a20x, needs performance tests on a22x.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
This allows us to avoid expensive string compares since we already have
a map to the pointers.
These compares were taking ~30 seconds for a single shader compile
in Godot due to it using 64,000+ uniforms.
Fixes: c4cff5f402 ("glsl: add basic support for resource list to shader cache")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109229
meson.build:166:21: ERROR: Unknown method "verson_compare" for a string.
Fixes: c1efa240c9 ("meson: Add warnings and errors when using ICC")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Enable using etnaviv for KMS renderonly. This still needs KMS driver
name mapping to kmsro to be used automatically.
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Herring <robh@kernel.org>
This allows vc4 to initialize on the Adafruit PiTFT 3.5" touchscreen with
the hx8357d tinydrm driver
v2: Whitespace fix noted by Eric Engestrom, update commit message for the
driver being merged.
v3: Rebase on Rob Herring's pipe-loader changes.
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Emil Velikov <emil.velikov@collabora.com> (v1)
If we can't find a driver matching by name, then use the kmsro driver.
This removes the need for needing a driver descriptor for every possible
KMS driver.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
The vc4 driver can do prime sharing to many different KMS-only devices,
such as the various tinydrm drivers for SPI-attached displays. Rename the
driver away from "pl111" to represent what it will actually support:
various sorts of KMS displays with the renderonly layer used to attach a
GPU.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Instead of using this useless array_params_mask variable.
This should set these two attributes to streamout buffers too.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>