Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Fixes truncation warning in gcc 8.1
Fixes: 8539c9bf31 ("gallium/radeon: add the kernel version into the renderer string")
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The compiler queue was limited to 3 threads, so shader-db running
on a 16-thread CPU would have a bottleneck on the 3-thread queue.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It will contain more variables.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reduce swizzle constraints to the ALPHA_IS_ON_MSB constraint and the clear
value of 1.
This significantly changes the DCC fast clear code, and fixes fast clear
for RGB formats without alpha.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(This patch doesn't enable the behavior. It will be enabled in a later
commit.)
Draw calls from multiple IBs can be executed in parallel.
v2: do emit partial flushes on SI
v3: invalidate all shader caches at the beginning of IBs
v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
only do this for flushes invoked internally
v5: empty IBs should wait for idle if the flush requires it
v6: split the commit
If we artificially limit the number of draw calls per IB to 5, we'll get
a lot more IBs, leading to a lot more partial flushes. Let's see how
the removal of partial flushes changes GPU utilization in that scenario:
With partial flushes (time busy):
CP: 99%
SPI: 86%
CB: 73:
Without partial flushes (time busy):
CP: 99%
SPI: 93%
CB: 81%
Tested-by: Benedikt Schemmer <ben@besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that the draw is started as soon as possible.
v2: only prefetch the API VS and VBO descriptors
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Using mesa OpenCL failed on a big endian PowerPC machine because
si_vgt_param_key is using bitfields and a 32 bit int for an
index into an array.
Fix si_vgt_param_key to work correctly on both little endian
and big endian machines.
Signed-off-by: Bas Vermeulen <bas@daedalean.ai>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Most files in gallium/radeon now include si_pipe.h.
chip_class and family are now here:
sscreen->info.family
sscreen->info.chip_class
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.
v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It appears that flushing the DB metadata is actually not sufficient
since the driver uses the new VS blit shaders. This looks quite
strange though, but it seems like we need to flush DB for fixing
the corruption.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102955
Fixes: 69ccb9dae7 (radeonsi: use new VS blit shaders (VS inputs in SGPRs)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We'll use it in the scissors / clip / guardband state.
v2: avoid a performance regression on r600 when applied to
(pre-fork) stable branches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes the extremely unlikely case that an application uses
0x80000000 or 0x3f800000 as border color for an integer texture and
helps in the also, but perhaps slightly less, unlikely case that 1 is
used as a border color.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The hardware usually does this automatically. However, we upgrade
depth to Z32_FLOAT to enable TC-compatible HTILE, which means the
hardware no longer clamps the comparison value for us.
The only way to tell in the shader whether a clamp is required
seems to be to communicate an additional bit in the descriptor
table. While VI has some unused bits in the resource descriptor,
those bits have unfortunately all been used in gfx9. So we use
an unused bit in the sampler state instead.
Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f
and many other tests in dEQP-GLES3.functional.texture.shadow.*
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
We do not enable this by default for additive blending, since it slightly
breaks OpenGL invariance guarantees due to non-determinism.
Still, there may be some applications can benefit from white-listing
via the radeonsi_commutative_blend_add drirc setting without any real
visible artifacts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This option enables a performance optimization where typical non-blending
draws with depth buffer may be rasterized out-of-order (on VI+, multi-SE
chips).
This optimization can lead to incorrect results when an applications
renders multiple objects with the same Z value at the same pixel, so we
will never enable it by default. But there may be applications that could
benefit from white-listing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This does not take commutative blending into account yet.
R600_DEBUG=nooutoforder disables it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Workaround by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.
According to the hardware team, this will be fixed in future chips,
so take that into account already.
Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd2 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround by one that should be better.
v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes GL45-CTS.shader_image_load_store.basic-glsl-earlyFragTests.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>