Straightforward by using the pixel hashing table computation helper
previously introduced, assuming we know the fraction of work that
needs to be submitted to each pixel pipe. Note that AFAIA the
hardware maps indices in the table to pixel pipes from largest to
smallest, so it shouldn't be necessary to permute indices based on the
physical IDs of the pixel pipes as we are doing on Gen11.
Improves performance of most non-trivial graphics workloads I've tried
on an 80 EU TGL. E.g. the following testcases improve performance
significantly with sample size 27 and statistical significance 1%:
gputest/pixmark_piano: 62.89% ±0.10%
gputest/pixmark_volplosion: 61.51% ±0.06%
unigine/valley: 26.72% ±0.25%
gfxbench/gl_5_high: 24.70% ±0.19%
unigine/heaven: 23.54% ±0.17%
steam/csgo: 22.75% ±4.36%
gfxbench/gl_manhattan31: 22.43% ±0.29%
gfxbench/gl_4: 20.92% ±0.35%
warsow/benchsow: 19.15% ±2.53%
gfxbench/gl_trex_off: 18.84% ±0.27%
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8749>
Pixel hashing tables are a pain to type in, review and maintain IMHO.
In order to obtain satisfactory load balancing on all Gen12 parts
currently in production this series would need to add 5 different
additional tables. Instead this introduces a simple algorithm able to
calculate a table on the fly based on a handful of parameters.
Note that the Gen11 tables generated with this algorithm are not
identical to the hardcoded ones, however the only difference should be
a phase shift that isn't expected to have any effect on performance,
since it shouldn't change the fraction of work submitted to each pixel
pipe.
The CPU overhead from this change is negligible since the tables only
need to be programmed once at context init time.
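
As a rough illustration only (this is not the algorithm added by this
series), a table whose entries are split between pixel pipes in
proportion to a set of weights can be produced with a simple
credit-accumulation scheme. The function name, the linear fill order
and the weight parameter are all assumptions made for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: fill a width x height table with pipe indices so
 * that pipe i receives roughly weight[i] / sum(weight) of the entries.
 * This is a generic weighted round-robin, not the Mesa helper. */
void
sketch_fill_hash_table(uint8_t *table, unsigned width, unsigned height,
                       const unsigned *weight, unsigned num_pipes)
{
   long credit[8] = { 0 };
   unsigned total = 0;

   assert(num_pipes > 0 && num_pipes <= 8);
   for (unsigned i = 0; i < num_pipes; i++)
      total += weight[i];
   assert(total > 0);

   for (unsigned n = 0; n < width * height; n++) {
      unsigned best = 0;

      /* Every pipe earns credit in proportion to its weight... */
      for (unsigned i = 0; i < num_pipes; i++) {
         credit[i] += weight[i];
         if (credit[i] > credit[best])
            best = i;
      }

      /* ...and the pipe with the most credit gets the next entry. */
      credit[best] -= total;
      table[n] = (uint8_t)best;
   }
}
```

With weights { 2, 1 } this repeats the pattern 0, 1, 0, so two thirds
of the entries go to pipe 0. Shifting the starting point of the pattern
changes the phase but not the fraction of entries per pipe.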
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8749>
Unlike Gen11, Gen12 hardware supports up to three pixel pipes per
slice.
Unfortunately the kernel interface is somewhat inconsistent between
Gen11 and Gen12: I915_PARAM_SUBSLICE_MASK returns a mask of enabled
*dual* subslices since TGL, so there is half the number of bits per
pixel pipe in the mask. This is worked around here so we're able to
calculate the correct size of each pixel pipe, but the result is
returned in dual subslice units, inheriting the inconsistency from the
kernel. The reason is that as of now all our Gen12 subslice counts
returned by gen_device_info.c are really dual subslice counts, and the
num_eu_per_subslice counts are also scaled accordingly, so it seems
like it would only make matters worse if I fixed the units of this
field without also fixing the rest.
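
To illustrate the unit issue, here is a sketch under assumed names
(none of these are gen_device_info.c fields, and the layout of one
contiguous group of mask bits per pixel pipe is an assumption): the
size of a pixel pipe is the number of set bits in its part of the mask,
and since each bit stands for a dual subslice on Gen12, the count comes
out in dual subslice units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: count the enabled (dual) subslices belonging to
 * pixel pipe `pipe`, assuming each pipe owns `bits_per_pipe` consecutive
 * bits of the mask.  With a dual subslice mask the result is in dual
 * subslice units. */
unsigned
sketch_pixel_pipe_size(uint32_t mask, unsigned bits_per_pipe, unsigned pipe)
{
   assert(bits_per_pipe > 0 && (pipe + 1) * bits_per_pipe <= 32);

   const uint32_t pipe_bits = bits_per_pipe == 32 ? ~0u :
                              (1u << bits_per_pipe) - 1;
   uint32_t bits = (mask >> (pipe * bits_per_pipe)) & pipe_bits;
   unsigned count = 0;

   /* Portable popcount: clear the lowest set bit until none remain. */
   for (; bits; bits &= bits - 1)
      count++;

   return count;
}
```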
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8749>
This command allows programming custom pixel hashing tables
controlling the balancing of load across pixel pipes. Rather
confusingly 3DSTATE_SLICE_TABLE_STATE_POINTERS was serving the same
purpose on Gen11: A pixel is mapped to the pixel pipe with index
specified by the entry in the table corresponding to the LSBs of the
pixel coordinates [Yes, you read that right: the entries are neither
subslice nor slice indices!]. Either a 2-way or a 3-way table can be
programmed based on whether the platform has two or three pixel pipes
per slice. In addition the 16x8 tables defined below can hold two
separate 8x8 tables when in DUAL_TABLE mode (which AFAIA is only
useful for platforms with multiple asymmetric slices -- I.e. no
production platforms as of today to my knowledge).
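
A sketch of the lookup described above, assuming a table that is 16
entries wide and 8 entries tall, stored row by row, and indexed directly
by the low bits of the pixel coordinates. Exactly which coordinate bits
the hardware uses is not stated here, so treat the indexing and all the
names as assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TABLE_WIDTH  16
#define SKETCH_TABLE_HEIGHT 8

/* Hypothetical sketch: return the pixel pipe index for pixel (x, y).
 * Each table entry is a pixel pipe index, not a slice or subslice
 * index. */
unsigned
sketch_pixel_pipe_for(const uint8_t *table, unsigned x, unsigned y)
{
   assert(table);

   const unsigned col = x % SKETCH_TABLE_WIDTH;
   const unsigned row = y % SKETCH_TABLE_HEIGHT;

   return table[row * SKETCH_TABLE_WIDTH + col];
}
```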
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8749>
The former "Subslice Hashing Mode" field is no longer used by the
hardware: Gen12 parts always do 16x16 subslice pixel hashing. Remove
it since it's no longer useful. In addition add a couple of bits that
will be useful in order to make some adjustments to the default pixel
pipe hashing behavior.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8749>
Enable the pipe capability for exporting stencil from the shader when
the Vulkan extension is available.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9244>
Enable SPV_EXT_stencil_export and SpvCapabilityStencilExportEXT, and
mark the output with FragStencilRefEXT when the fragment shader writes
the stencil reference value.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9244>
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9244>
this attempts to dynamically establish an upper bound for per-batch descriptor
use, flushing all batches and resetting the pools on alloc failure in
an attempt to be more robust about it
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9117>
previously if any of the pending clears required an explicit clear then
we'd clear them explicitly, but with this patch we're shifting the first
pending clear into the renderpass begin if possible and then applying the
remaining clears on top of that in order to reduce gpu operations
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9206>
we have src regions for all the blit/copy/map calls, so we can use those to
verify whether we actually need to apply the clears now or if we can keep
sitting on them a while longer
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9206>
if we know we're going to be reading from a region then we can examine the
pending clears to see if there's any overlap, which helps us decide whether
we need to apply them immediately
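
the overlap test can be sketched as a plain rectangle intersection
check. the struct and function names are made up for illustration and
are not the ones used by the driver:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 2D region, half-open: [x0, x1) x [y0, y1). */
struct sketch_rect {
   int x0, y0, x1, y1;
};

/* True if a read from `src` would touch any pixel covered by `clear`,
 * in which case the pending clear has to be applied before the read.
 * Both regions are assumed to be non-empty. */
bool
sketch_read_needs_clear(const struct sketch_rect *src,
                        const struct sketch_rect *clear)
{
   assert(src->x0 < src->x1 && src->y0 < src->y1);
   assert(clear->x0 < clear->x1 && clear->y0 < clear->y1);

   return src->x0 < clear->x1 && clear->x0 < src->x1 &&
          src->y0 < clear->y1 && clear->y0 < src->y1;
}
```

regions that only share an edge do not overlap, so a clear sitting right
next to the read region can stay deferred.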
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9206>
this just requires adding an explicit flush for the deferred clears in
case conditional rendering is disabled before a draw happens
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9206>
instead, we can attach the clear to the next renderpass start and even add it to
the renderpass cache for reuse
also add handling for flushing clears on map or fb switching to avoid breaking behavior
this should save us a lot of time with potentially beginning/ending renderpasses as well
as allowing drivers to do better batching of clears by passing in all the buffers at
once
this doesn't handle deferring conditional renders yet in a futile attempt to try and keep
the size of the patch down
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9206>
Dead-cells (and perhaps others) does MapBufferRange(UNSYNC|DISCARD_RANGE)
to update single buffered VBOs every frame in the opening menu screen,
and because we were considering UNSYNC with higher priority than
DISCARD_RANGE, this would result in the game racing VBO updates between
binning and tile passes, causing visibility stream inconsistency between
the two passes, resulting in "tile flicker".
The letter of the gl spec implies this is undefined behavior (at least
to my reading of it). But we already handle DISCARD_RANGE in the !UNSYNC
case, so just go down this path instead. It means we could potentially
end up invalidating (and back-blitting) in cases where the app really
knows what it is doing, but oh well.
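
In other words, the priority of the two flags changes roughly as in
this sketch. The flag names and values are placeholders chosen for
illustration, not the gallium definitions, and the real transfer-map
path does much more than this:

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits for illustration only. */
#define SKETCH_MAP_UNSYNC         (1u << 0)
#define SKETCH_MAP_DISCARD_RANGE  (1u << 1)

/* Returns true if the map may skip synchronization.  DISCARD_RANGE now
 * takes priority, so UNSYNC|DISCARD_RANGE goes down the same path as a
 * synchronized DISCARD_RANGE map. */
bool
sketch_map_is_unsynchronized(unsigned usage)
{
   /* This sketch only knows about the two flags above. */
   assert((usage & ~(SKETCH_MAP_UNSYNC | SKETCH_MAP_DISCARD_RANGE)) == 0);

   if (usage & SKETCH_MAP_DISCARD_RANGE)
      return false;

   return (usage & SKETCH_MAP_UNSYNC) != 0;
}
```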
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4337
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9223>
At this stage I can't think of any reason a swrast would need to
limit descriptor sets. I'm sure there will be some in the future.
Otherwise it just fixes up a lookup that really is a noop, but
may as well make it correct.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9238>
Previously can_do_source_mods was used to determine whether a value with
a source modifier or a value from a scalar source (e.g., a uniform)
could be copy propagated. The former is a superset of the latter, so
this always produces correct results, but it is overly restrictive. For
example, a BFI instruction can't have source modifiers, but it can have
scalar sources.
This was originally authored to prevent a small number of shader-db
regressions in a commit that marked SHR as not being able to have
source modifiers. That commit has since been dropped in favor of a
different method.
v2: Refactor register region restriction detection to a helper function.
Suggested by Jason.
No fossil-db changes on any Intel platform.
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20039111 -> 20038943 (<.01%)
instructions in affected programs: 31736 -> 31568 (-0.53%)
helped: 104
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 1.62 x̃: 1
helped stats (rel) min: 0.30% max: 0.88% x̄: 0.45% x̃: 0.42%
95% mean confidence interval for instructions value: -2.03 -1.20
95% mean confidence interval for instructions %-change: -0.47% -0.42%
Instructions are helped.
total cycles in shared programs: 980309750 -> 980308897 (<.01%)
cycles in affected programs: 591078 -> 590225 (-0.14%)
helped: 70
HURT: 26
helped stats (abs) min: 2 max: 622 x̄: 23.94 x̃: 4
helped stats (rel) min: <.01% max: 2.85% x̄: 0.33% x̃: 0.12%
HURT stats (abs) min: 2 max: 520 x̄: 31.65 x̃: 6
HURT stats (rel) min: 0.02% max: 2.45% x̄: 0.34% x̃: 0.15%
95% mean confidence interval for cycles value: -26.41 8.64
95% mean confidence interval for cycles %-change: -0.27% -0.03%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on earlier Intel platforms.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9237>
We have the software decode hooked up now, and it's not like this
decompression is more expensive than a lot of other software decompression
we have. One less feature to be missing from the old swrast classic driver.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9194>
The size is in bytes, pos is a dword index.
Fixes these asan failures (not caught in CI since we only run a fraction there):
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.const_expression_compute,Crash
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.const_literal_compute,Crash
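
The unit mismatch can be sketched as follows, with hypothetical names
rather than the code touched by this change: a dword index has to be
compared against the size converted to dwords, otherwise indices up to
four times too large pass the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: `size` is the buffer size in bytes, `pos` indexes
 * 32-bit dwords, so it has to be compared against size / 4.  Comparing
 * pos against size directly would allow reads past the end. */
bool
sketch_dword_in_bounds(size_t size, size_t pos)
{
   return pos < size / sizeof(uint32_t);
}

/* Read one dword after checking the index in the right units. */
uint32_t
sketch_read_dword(const uint32_t *buf, size_t size, size_t pos)
{
   assert(sketch_dword_in_bounds(size, pos));
   return buf[pos];
}
```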
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9162>
xmesa_copy_st_framebuffer would attempt to flush from the front buffer
to the display, but we don't actually have an attachment for the front
buffer (just the back) so nothing would happen. Fix this by flushing
from the back to the display, threading the dirty box through so we
don't update more than we were told to.
This has the virtue of displaying correctly, but glx-copy-sub-buffer
still fails: since there is no real front buffer, reads from GL_FRONT
actually read from the back buffer. The test does: clear to red, swap,
clear to green, copy sub-buffer, expect a green square inside of a red
one from the front buffer. Since you're really reading from the back you
instead get solid green.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9140>
Include "util/list.h" as per "util/simple_mtx.h" one line later
Fixes the following build error on Android:
In file included from external/mesa/src/amd/common/ac_rgp.c:24:
external/mesa/src/amd/common/ac_rgp.h:31:10: fatal error: 'list.h' file not found
^~~~~~~~
1 error generated.
Fixes: 12515d6b ("ac/rgp: add rgp co, col, pso data structures")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4334
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9202>
dEQP in CI was hitting these, and debug_printf is not enabled on non-debug
(such as debugoptimized or release) builds. Besides, mesa_loge() gets you
logging on Android, should someone ever do zink for that.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8891>