The term 'last' may be misleading because the offset represents
the current unifa offset, which is the offset used by the last
load plus 4 bytes, so rename these to use the term 'current'
instead.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This implements a NIR pass that groups together constant UBO loads
for the same UBO index in order of increasing offset when the distance
between them is small enough that it enables the "skip unifa write"
optimization.
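A rough sketch of why grouping helps, in C. The rule that the current unifa offset is the previous load's offset plus 4 comes from the first commit message above. The `struct ubo_load` record, `MAX_SKIP_DISTANCE`, both functions and the exact skip condition are hypothetical. The sketch also ignores instruction dependencies, which a real pass must respect when it moves loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for a UBO load with a constant offset. */
struct ubo_load {
   unsigned index;   /* UBO index */
   unsigned offset;  /* constant byte offset */
};

/* Assumed maximum forward distance (bytes) from the current unifa offset
 * that can still reuse it; the real limit is target-specific. */
#define MAX_SKIP_DISTANCE 16u

static int cmp_load(const void *a, const void *b)
{
   const struct ubo_load *la = a, *lb = b;
   if (la->index != lb->index)
      return la->index < lb->index ? -1 : 1;
   if (la->offset != lb->offset)
      return la->offset < lb->offset ? -1 : 1;
   return 0;
}

/* Counts the explicit unifa writes needed to issue `loads` in order.
 * After each load, the current unifa offset is that load's offset + 4. */
unsigned count_unifa_writes(const struct ubo_load *loads, size_t n)
{
   unsigned writes = 0, cur_index = 0, cur_offset = 0;
   int have_current = 0;

   for (size_t i = 0; i < n; i++) {
      int can_skip = have_current &&
                     loads[i].index == cur_index &&
                     loads[i].offset >= cur_offset &&
                     loads[i].offset - cur_offset <= MAX_SKIP_DISTANCE;
      if (!can_skip)
         writes++;
      cur_index = loads[i].index;
      cur_offset = loads[i].offset + 4;
      have_current = 1;
   }
   return writes;
}

/* The grouping step: order loads by (index, offset) so that nearby loads
 * from the same UBO become consecutive. */
void group_ubo_loads(struct ubo_load *loads, size_t n)
{
   qsort(loads, n, sizeof(*loads), cmp_load);
}
```

With loads issued as {0:8, 1:0, 0:0, 0:4, 0:64} (index:offset), `count_unifa_writes` reports 4 writes; after `group_ubo_loads` it reports 3.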
This may increase register pressure because it can move UBO loads
earlier, so we also add a compiler strategy fallback to disable the
optimization if we need to drop thread count to compile the shader
with this optimization enabled.
total instructions in shared programs: 13557555 -> 13550300 (-0.05%)
instructions in affected programs: 814684 -> 807429 (-0.89%)
helped: 4485
HURT: 2377
Instructions are helped.
total uniforms in shared programs: 3777243 -> 3760990 (-0.43%)
uniforms in affected programs: 112554 -> 96301 (-14.44%)
helped: 7226
HURT: 36
Uniforms are helped.
total max-temps in shared programs: 2318133 -> 2333761 (0.67%)
max-temps in affected programs: 63230 -> 78858 (24.72%)
helped: 23
HURT: 3044
Max-temps are HURT.
total sfu-stalls in shared programs: 32245 -> 32567 (1.00%)
sfu-stalls in affected programs: 389 -> 711 (82.78%)
helped: 139
HURT: 451
Inconclusive result.
total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%)
inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%)
helped: 4478
HURT: 2395
Inst-and-stalls are helped.
total nops in shared programs: 354365 -> 342202 (-3.43%)
nops in affected programs: 31000 -> 18837 (-39.24%)
helped: 4405
HURT: 265
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This adds a minimum thread count parameter to each compilation strategy, with
the intention of limiting the minimum thread count that register allocation
may use with that strategy.
For now, all strategies allow the minimum thread count supported by the
hardware, but we will be using this infrastructure to impose a stricter
limit in an upcoming optimization.
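A minimal sketch of how a per-strategy minimum thread count can drive the fallback described in the UBO grouping commit above. The struct, its fields, the 4 and 1 thread values and `pick_strategy` are all hypothetical. Register allocation is reduced to a precomputed result per strategy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strategy descriptor; field names are illustrative. */
struct compile_strategy {
   const char *name;
   int ubo_grouping;       /* assumed: whether the optimization is enabled */
   unsigned min_threads;   /* lowest thread count allowed for RA */
};

/* `ra_threads[i]` is the highest thread count at which register allocation
 * succeeded for strategy i (0 if it never did). Returns the index of the
 * first strategy whose result respects its minimum, or -1. */
int pick_strategy(const struct compile_strategy *strategies,
                  const unsigned *ra_threads, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      if (ra_threads[i] != 0 && ra_threads[i] >= strategies[i].min_threads)
         return (int)i;
   }
   return -1;
}
```

If register allocation only succeeds at 2 threads, the first strategy (assumed minimum of 4) is rejected and the one with the optimization disabled is chosen.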
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
We will be using this distance to set up another optimization in a
follow-up patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This can be called outside a render pass, so we should not expect to have
a job available. Also, we should not be emitting state here; instead, we
should do it in the pre-draw handler with all the other draw call state.
Fixes crashes in RenderDoc when selecting elements in the
Event Browser.
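A minimal sketch of the pattern described above. The state-setting entry point only records the value and marks it dirty. The pre-draw handler, where a job is guaranteed to exist, emits it. All names here are hypothetical, and line width is used only as an example of dynamic state; the commit does not say which state is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command buffer state; names are illustrative only. */
struct cmd_state {
   float line_width;
   bool line_width_dirty;
};

struct job { unsigned emitted; };   /* stand-in for a recorded GPU job */

struct cmd_buffer {
   struct cmd_state state;
   struct job *job;   /* NULL when outside a render pass */
};

/* May be called outside a render pass: only record the value and mark it
 * dirty, never touch cmd->job. */
void cmd_set_line_width(struct cmd_buffer *cmd, float width)
{
   cmd->state.line_width = width;
   cmd->state.line_width_dirty = true;
}

/* Pre-draw handler: a job exists here, so dirty state is emitted together
 * with the rest of the draw call state. */
void cmd_pre_draw(struct cmd_buffer *cmd)
{
   assert(cmd->job != NULL);
   if (cmd->state.line_width_dirty) {
      cmd->job->emitted++;   /* placeholder for emitting the packet */
      cmd->state.line_width_dirty = false;
   }
}
```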
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10130>
first_component is a uint, so if it takes the value 0 we can't know
whether that is because writemask has its first bit set to 1 or all bits
set to 0.
As we want to ensure that at least one bit is set, apply the assertion
to writemask.
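A small illustration of why the original assertion was a no-op and what checking the mask does instead. `first_component_of` is a hypothetical helper written to show the ambiguity described above; it is not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: index of the lowest set bit in `writemask`, or 0 if
 * no bit is set, so a result of 0 is ambiguous. */
unsigned first_component_of(unsigned writemask)
{
   for (unsigned i = 0; i < 32; i++) {
      if (writemask & (1u << i))
         return i;
   }
   return 0;
}

bool writemask_is_valid(unsigned writemask)
{
   /* `first_component >= 0` is always true for an unsigned value, which is
    * what Coverity flags. Checking the mask itself catches the empty case. */
   return writemask != 0;
}
```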
Fixes CID#1472829 "Macro compares unsigned to 0 (NO_EFFECT)".
v2:
- Restore "first_component <= last_component" assertion (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10103>
SB is known to be buggy and the ultimate aim is to make it go away. To
test workloads with better optimizations it makes sense to be able to
enable SB, but for the NIR backend it should not be enabled by default.
Therefore, add a specific debug option "nirsb" that
enables NIR with SB.
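A sketch of the intended flag semantics, assuming a comma-separated debug option list. The flag names other than "nirsb", the parsing helper and the non-NIR default (SB enabled unless "nosb" is given) are assumptions for illustration, not the driver's actual option table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical debug flag bits. */
enum {
   DBG_NIR   = 1u << 0,
   DBG_NIRSB = 1u << 1,
   DBG_NOSB  = 1u << 2,
};

/* True if `name` is a whole comma-separated token in `list`. */
static bool has_token(const char *list, const char *name)
{
   size_t len = strlen(name);
   for (const char *p = list; p && *p; ) {
      const char *end = strchr(p, ',');
      size_t tok_len = end ? (size_t)(end - p) : strlen(p);
      if (tok_len == len && strncmp(p, name, len) == 0)
         return true;
      p = end ? end + 1 : NULL;
   }
   return false;
}

unsigned parse_debug_flags(const char *list)
{
   unsigned flags = 0;
   if (has_token(list, "nir"))
      flags |= DBG_NIR;
   if (has_token(list, "nirsb"))
      flags |= DBG_NIR | DBG_NIRSB;   /* "nirsb" enables NIR with SB */
   if (has_token(list, "nosb"))
      flags |= DBG_NOSB;
   return flags;
}

/* With NIR, SB is opt-in via "nirsb"; the non-NIR default is assumed. */
bool use_sb(unsigned flags)
{
   if (flags & DBG_NIR)
      return (flags & DBG_NIRSB) != 0;
   return (flags & DBG_NOSB) == 0;
}
```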
Fixes: 3b27243b01 ("r600: Enable sb also for NIR")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10108>
CopyImage supports copying MSAA images if the number of samples matches.
Found by inspection because this is untested by CTS for some reason.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10055>
This shouldn't be needed, and it is going to be wrong with VRS
attachments because their dimensions are divided by the VRS texel size.
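For reference, a VRS attachment texel covers several framebuffer pixels, so the attachment's dimensions are the framebuffer dimensions divided by the texel size, rounded up. A minimal sketch of that arithmetic; the function name is hypothetical and the texel sizes in the usage note are only examples.

```c
#include <assert.h>

/* Number of VRS attachment texels needed to cover `pixels` framebuffer
 * pixels when each texel covers `texel_size` pixels (rounded up). */
unsigned vrs_attachment_dim(unsigned pixels, unsigned texel_size)
{
   assert(texel_size != 0);
   return (pixels + texel_size - 1) / texel_size;
}
```

For a 1920x1080 framebuffer with 8x8 texels this gives 240x135, so code that assumes the attachment has the framebuffer's extent would be wrong.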
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10111>
It's really hard to make sure we have the right number of %s in the
format string, so let's change how we generate this string.
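One way to avoid counting `%s` by hand is to build the string from a table of flag names, appending only the names whose bits are set. A minimal sketch; the flag table, bit layout and `flags_to_string` are hypothetical and do not reflect the real qualifier flags or the approach taken in the actual change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag names, indexed by bit position. */
static const char *const flag_names[] = {
   "invariant", "constant", "attribute", "varying",
   "in", "out", "volatile", "coherent",
};

#define NUM_FLAGS (sizeof(flag_names) / sizeof(flag_names[0]))

/* Writes the names of all bits set in `mask` into `buf`, separated by
 * spaces, so no "%s%s%s..." format string has to be kept in sync by hand.
 * Stops early, still NUL-terminated, if `buf` is too small. */
void flags_to_string(unsigned mask, char *buf, size_t size)
{
   size_t used = 0;

   if (size == 0)
      return;
   buf[0] = '\0';

   for (size_t i = 0; i < NUM_FLAGS; i++) {
      if (!(mask & (1u << i)))
         continue;
      int n = snprintf(buf + used, size - used, "%s%s",
                       used ? " " : "", flag_names[i]);
      if (n < 0 || (size_t)n >= size - used)
         break;
      used += (size_t)n;
   }
}
```

With bits 0 and 6 set, the buffer holds "invariant volatile".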
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9896>
The member has a leading underscore because volatile is a
keyword in C. We don't want to carry that detail into the error string,
so let's drop the underscore.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9896>
This format string seems to have been incorrect since its inception.
There have also been commits that forgot to add or remove
flags as appropriate.
Let's correct the format list. This was done by counting by hand. A
better long-term solution is coming in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9896>
First, we need to lower alu to scalar so that all alu ops on doubles
only take one input. Then, we can use our new double lowering pass.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Reviewed-by: Michael Tang <tangm@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
MakeDouble is pretty straightforward, but SplitDouble is interesting
since it returns a unique 2-element struct.
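A sketch of the bit-level semantics, assuming IEEE 754 doubles and low word first. MakeDouble builds a double from two 32-bit words, and SplitDouble returns both words as a 2-element struct. The C names below are hypothetical stand-ins for the DXIL operations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the 2-element {i32, i32} struct returned by SplitDouble. */
struct double_halves {
   uint32_t lo;
   uint32_t hi;
};

/* MakeDouble-like: reinterpret two 32-bit words as a double. */
double make_double(uint32_t lo, uint32_t hi)
{
   uint64_t bits = ((uint64_t)hi << 32) | lo;
   double d;
   memcpy(&d, &bits, sizeof(d));
   return d;
}

/* SplitDouble-like: reinterpret a double as its low/high 32-bit words. */
struct double_halves split_double(double d)
{
   uint64_t bits;
   memcpy(&bits, &d, sizeof(bits));
   struct double_halves h = { (uint32_t)bits, (uint32_t)(bits >> 32) };
   return h;
}
```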
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Reviewed-by: Michael Tang <tangm@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
Whenever we have an ALU op that's operating on a double, we'll unpack
it as an integer, then repack it as a float. When we have an ALU op that
returns a double, we'll unpack it as a double, then repack it as an integer.
Then, simple algebraic opts will remove any redundant unpack/repack ops,
so we should be left with constructing and deconstructing doubles using
the right operations.
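A toy model of the cleanup described above. The lowering turns a double-producing op into unpack_double followed by a generic pack. It turns a double-consuming op's source into a generic unpack followed by pack_double. Cancelling back-to-back inverse pairs then leaves the consumer reading the producer directly. The opcodes and `resolve` are hypothetical; in NIR this is done by algebraic optimizations, not by this code.

```c
#include <assert.h>

enum op {
   OP_DEF,            /* some value-producing instruction */
   OP_UNPACK_64,      /* generic 64 -> 2x32 */
   OP_PACK_64,        /* generic 2x32 -> 64 */
   OP_UNPACK_DOUBLE,  /* double -> 2x32 */
   OP_PACK_DOUBLE,    /* 2x32 -> double */
};

/* One source, given as the index of an earlier instruction (-1 if none). */
struct instr { enum op op; int src; };

/* Inverse of a pack/unpack opcode, or -1 for other opcodes. */
static int inverse_of(enum op op)
{
   switch (op) {
   case OP_UNPACK_64:     return OP_PACK_64;
   case OP_PACK_64:       return OP_UNPACK_64;
   case OP_UNPACK_DOUBLE: return OP_PACK_DOUBLE;
   case OP_PACK_DOUBLE:   return OP_UNPACK_DOUBLE;
   default:               return -1;
   }
}

/* Returns the instruction that really produces the value of `i`, skipping
 * back-to-back inverse pairs such as pack_64(unpack_64(x)). */
int resolve(const struct instr *prog, int i)
{
   int inv = inverse_of(prog[i].op);
   if (inv < 0 || prog[i].src < 0)
      return i;
   int s = resolve(prog, prog[i].src);   /* simplify the source first */
   if ((int)prog[s].op == inv && prog[s].src >= 0)
      return resolve(prog, prog[s].src);
   return i;
}
```

For the chain def -> unpack_double -> pack_64 -> unpack_64 -> pack_double, `resolve(prog, 4)` returns 0, the original def.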
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Reviewed-by: Michael Tang <tangm@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
HLSL doesn't support bitcasting a 64-bit integer to a double. DXIL
doesn't have generic pack/unpack instructions, so we lower those to
integer bitwise ops. As a result, NIR generic double pack/unpack would
require our backend to emit a bitcast to get a double, but we want
to match HLSL semantics and emit MakeDouble/SplitDouble.
Adding a dedicated opcode for double pack/unpack allows us to add a
pass to emit that instead, which lets our backend emit the right
instruction to pack and unpack doubles.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
Unsetting zink from GALLIUM_DRIVER is required in order for lavapipe to
work, but setting it back is totally broken in the case where an app
creates a ton of screens simultaneously.
Instead, just leave it set to llvmpipe; if a race condition occurs,
at least llvmpipe isn't going to fail a test that zink passes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10120>
Again, NVIDIA supports this on every fbconfig, while Mesa alternates it on
and off for some reason, but only on some drivers; in particular, llvmpipe
doesn't try to create sRGB-ful fbconfigs. Nerf it out of the fbconfig.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1648>
Mesa marks accumful fbconfigs as slow (for no especially good reason,
it's only accum operations that aren't accelerated, and we could fix
that). NVIDIA doesn't. All that mismatching on this attribute can do is
prevent a config from working exactly as well as it possibly can.
Trust the server's opinion here (but warn if you ask for warnings).
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1648>
For hardware drivers we've never set this to anything interesting so
there's no benefit to validating it. For drisw we're at the mercy of
whatever the X server sent to us anyway, and it's not like any server is
going to vary two fbconfigs by _just_ these values.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1648>
Let's preface this all by noting that the accum buffer is unused even by
legacy feature standards, and that anybody needing it to be performant
is probably not using Mesa to begin with since we've never accelerated
it. This fix is really about making drisw work under hostile GLX
environments, since it doesn't have any control over what's running on
the server side.
NVIDIA's driver simply lists RGBA16 accumulation buffers for every
fbconfig, and the accum buffer's alpha channel is always non-zero even
if the color buffer is RGBX. If we try to point llvmpipe at such a
screen, then _none_ of the depth-24 fbconfigs will find a matching DRI
config, since DRI's accumful config will have 0 accum alpha bits. This
is somewhat limiting since most X applications are expecting an RGBX
config and will be accidentally translucent at depth 32.
Due to the somewhat ugly nature of how xserver constructs fbconfigs, if
you run a driver with this fix against a server from before this fix (or
vice versa), you will find the opposite result: none of your RGBX
fbconfigs will have an accum buffer, though the RGBA ones still will.
That's a pretty acceptable tradeoff to me since what we're gaining is
the ability to use llvmpipe at all.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1648>
A synchronous driver can use PIPE_MAP_ONCE to infer that a buffer is
guaranteed to not be mapped multiple times, as this flag is only used when
doing map -> memcpy -> unmap directly.
A threaded driver performs maps/unmaps asynchronously, so this flag
can only be used by the driver to confirm that the mapped region is accessed
exactly once, not that it will not need to remain mapped for other transfer_map
uses after it is unmapped.
In short, consider this scenario:
transfer_map(A) -> memcpy(map, data) -> transfer_unmap(map_A) ->
transfer_map(A) -> memcpy(map, data) -> transfer_unmap(map_A)
When a synchronous driver executes this, the call chain is unmodified.
When a threaded-context (tc) driver executes this, the call chain may become:
transfer_map(A) -> memcpy(map, data) ->
transfer_map(A) -> memcpy(map, data) ->
transfer_unmap(map_A) -> transfer_unmap(map_A)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10113>
GPUs with the feature bit PE_NO_ALPHA_TEST set have no fixed-function
alpha test unit and we want to let st lower it with a shader variant.
For GC7000K this fixes all fbo-alphatest-formats piglits like:
spec@ext_framebuffer_object@fbo-alphatest-formats
spec@ext_packed_float@fbo-alphatest-formats
spec@ext_texture_srgb@fbo-alphatest-formats
This only works with the NIR compiler backend.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Lukas F. Hartmann <lukas@mntmn.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9871>
getenv does not work well in an Android VM. Instead, use os_get_option
to read the Android system property.
e.g. adb shell setprop mesa.vn.debug drm
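The example suggests a simple naming rule: prefix the option name with "mesa.", lowercase it and turn underscores into dots. A minimal sketch of that mapping, inferred from the example rather than taken from the os_get_option implementation; `option_to_property` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Builds an Android property name from an environment-variable-style
 * option name, e.g. "VN_DEBUG" -> "mesa.vn.debug".
 * Returns 0 on success, -1 if `out` is too small. */
int option_to_property(const char *name, char *out, size_t size)
{
   int n = snprintf(out, size, "mesa.%s", name);
   if (n < 0 || (size_t)n >= size)
      return -1;
   for (char *p = out + 5; *p; p++) {   /* skip the "mesa." prefix */
      if (*p == '_')
         *p = '.';
      else
         *p = (char)tolower((unsigned char)*p);
   }
   return 0;
}
```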
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10112>
They were used for tracking whether SSA needed to be repaired,
but now the repair is done for all functions with structured control flow.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7755>
This fixes an OpSwitch corner case where the switch has no targets other
than `default` and SSA values defined in it are used directly after the
switch block without phis.
v2: Just use `repair_ssa` for all structured control-flow cases
( - Jason Ekstrand <jason@jlekstrand.net>
- Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> )
Closes: #3787
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7755>
v2: Instead of fixing the uninitialized member 'fs_visitor::input_vue_map'
(as reported by Coverity Scan in defect CID 1474559),
remove the unused members 'vec4_tcs_visitor::input_vue_map' and
'fs_visitor::input_vue_map'.
Also fix the 'debug_enabled' argument that was skipped in an fs_visitor
constructor call from brw_compile_tes().
Signed-off-by: Yevhenii Kharchenko <yevhenii.kharchenko@globallogic.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10040>