Even if the data race wasn't real (I'm not great at reasoning about
this), helgrind is a nice enough tool that keeping noise out of it is
probably worthwhile. Besides, typing out the numbers keeps the data
in the read-only data section instead of emitting code to initialize
it every time.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Shaders are shared across contexts in gallium (part of making it so
that you get shader compile at link time, for shader-db and to reduce
compiles at draw time). So, we need to protect from variant creation
for a shader from multiple threads at the same time.
Reviewed-by: Rob Clark <robdclark@gmail.com>
There is a single ir3_compiler in the screen, and each context may be
compiling ir3 shaders, which call ir3_create. ralloc doesn't do any
locking on its own, so eventually you can end up racing to break
ralloc's linked lists.
We really don't want struct ir3 to live as long as the compiler (maybe
struct ir3_shader's lifetime, if anything), so you'd better be freeing
it anyway.
Fixes: 8fe2076243 ("freedreno/ir3: convert over to ralloc")
Reviewed-by: Rob Clark <robdclark@gmail.com>
If the variable's going to be static, we shouldn't be memsetting it
from every thread and instead just have it in the data section.
Reviewed-by: Rob Clark <robdclark@gmail.com>
On Haswell, the format works but it doesn't properly do an sRGB decode.
It appears to act identically to R8G8B8_UNORM. Only Vulkan uses this
format so this only affects Vulkan on HSW.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
To select the correct layer the z-coordinate must be rounded before it
is multiplied by six.
Fixes a number of tests out of
dEQP-GLES31.functional.texture.filtering.cube_array.formats.*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
v2: Eric's nits
v3: Reuse timespec utils (Daniel)
Deal with ppoll being interrupted by a signal (Daniel)
v4: Remove unnecessary time check
v5: Deal with EAGAIN from wl_display_prepare_read_queue() (Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Daniel Stone <daniels@collabora.com>
Copied from Weston, upon Daniel's suggestion
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
For now, this keeps the "100 bytes" allocation; we can try to figure out
the correct size as a follow up.
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It causes issues on GFX10.
This fixes rendering issues with vkmark and Wreckfest at least.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
This might happen when a pipeline doesn't define the vertex input
state, so the buffer data format is 0 (aka INVALID).
This fixes crashes when compiling some shaders on GFX10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Seems to fix some hair artifacts in Max Payne 3:
https://github.com/daniel-schuermann/mesa/issues/76
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f4e499ec79 ('radv: add initial non-conformant radv vulkan driver')
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:
foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...
that wasn't recognized by the previous pass.
In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.
All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:
Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were missing handling for a few other ops that rearrange their
sources somehow in codegen, namely complex2 and select.
This should fix spec@glsl-1.10@execution@built-in-functions@vs-asin-vec3
and possibly other random regressions from the new scheduler which were
supposed to be fixed in the commit right after.
Fixes: 54434fe670 ("lima/gpir: Rework the scheduler")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
Remove an unnecessary nir_lower_regs_to_ssa as that should be done by
the state tracker, and add a missing DCE pass after running copy
propagation in order to remove the dead copies. This shouldn't fix
anything but the second part will reduce shader sizes.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
In try_node(), we assume that the node we pick can still be scheduled
successfully after speculatively trying all the other nodes. Normally we
always undo every node after speculating it, so that when we finally
schedule best_node the scheduler state is exactly the same and it
succeeds. However, we also try to spill nodes, which can change the
state and in a corner case that can make scheduling best_node fail. In
particular, the following sequence of events happened with piglit
shaders@glsl-vs-if-nested: a partially-ready node N was spilled and a
register store node S, which is a use of N, was created and then later
the other uses of N were scheduled, so that S is now ready and N is
partially ready. First we try to schedule S and succeed, then we try to
schedule another node M, which fails, so we try to spill the remaining
uses of N. This succeeds, but scheduling M still fails so that best_node
is still S. However since one of the uses of N is one cycle ago, and
therefore we inserted a read dependent on S one cycle ago when spilling
N, S can no longer be scheduled as read-after-write latency is three
cycles.
While we could ad-hoc try to catch cases like this, or (the best option
but very complicated) treat the spill as speculative and roll it back if
we decide not to schedule the node, a simpler solution is to just
give up on spilling if we've already successfully speculatively
scheduled another node. We'd give up a few cases where we discover that
by spilling even harder we could schedule a more desirable node, but
that seems like it would be pretty rare in practice. With this we
guarantee that nothing has been touched after best_node was successfully
scheduled. We also cut down on pointless spilling, since if we already
scheduled a node it's unlikely that spilling harder will let us schedule
an even better node, and hence any spilling at this point is probably
useless.
While we're here, clean up the code around spilling by flattening the
two if's and getting rid of the second unnecessary check for INT_MIN.
Fixes: 54434fe670 ("lima/gpir: Rework the scheduler")
Acked-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
With OpenCL, kernels can take arguments and return values (?). However
in practice, there is no more TGSI compute implementation, and even if
there were, it would probably have named functions and no explicit main.
This improves RA considerably for compute shaders, since temps are not
kept around as return values.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
The meson conversion chose to change the meaning of DEBUG to "used for
debugging" to be "used for expensive things for debugging", primarily
for nir_validate. Flip things over so that we get nice things with
optimizations enabled.
While we're at it, also kill off nouveau_statebuf.h which is unused (and
has a mention of DEBUG which is how I found it).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Previously the code only handled it for positions 1 and up (as would be
for UBO's in GL). It's not a lot of trouble to handle this, and vl or
vdpau want this.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
This caused nouveau's function handling logic to think that the MAIN
function was due to receive external parameters, and cascaded some
failures after that. Instead avoid having the undefined components in
the first place.
Fixes: f6ac0b5d71 (gallium/auxiliary/vl: Add compute shader to support video compositor render)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will enable us to fuse inverts in various ways. Marginal hurt:
total instructions in shared programs: 3610 -> 3611 (0.03%)
instructions in affected programs: 67 -> 68 (1.49%)
helped: 0
HURT: 1
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than putting registers after SSA in the MIR indexing, put them
side-by-side, shifted 1, using the bottom bit as the SSA/reg select.
This will allow us to generate SSA temps in the compiler.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Set cu_qp_delta_enable_flag on when rate control is enabled, and set it
off when rate control is disabled (e.g. constant qp).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: fix typo and add bugzilla info
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Set cu_qp_delta_enable_flag on when rate control is enabled, and set it
off when rate control is disabled (e.g. constant qp).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: fix typo and add bugzilla info
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
MaxPicOrderCntLsb should be at least 16 according to the spec,
therefore add minimum value check.
Also use poc value passed from st instead of calculation
in slice header encoding.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: Fix typo
V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb
should be power of 2 according to spec.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
MaxPicOrderCntLsb should be at least 16 according to the spec,
therefore add minimum value check.
Also use poc value passed from st instead of calculation
in slice header encoding.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: Fix typo
V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb
should be power of 2 according to spec.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
We don't have calculate final quotient in order to calculate unsigned
modulo result. Once we are done with error correction we have partial
result which can be used to find out modulo operation result
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rather than bailing if we see something that's not SSA, do out the
analysis to check if we can pipeline and do so if we can.
total registers in shared programs: 392 -> 391 (-0.26%)
registers in affected programs: 3 -> 2 (-33.33%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than always emitting an extra move for fragments, check the
actual criteria and emit accordingly. (This was lost during the RA
improvements at the end of May).
total bundles in shared programs: 2210 -> 2176 (-1.54%)
bundles in affected programs: 501 -> 467 (-6.79%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.59% max: 33.33% x̄: 13.13% x̃: 12.50%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -16.06% -10.21%
Bundles are helped.
total quadwords in shared programs: 3639 -> 3605 (-0.93%)
quadwords in affected programs: 795 -> 761 (-4.28%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.96% max: 33.33% x̄: 11.22% x̃: 8.33%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -14.31% -8.13%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Think of this pass as register coalescing part 2. After RA runs, but
before scheduling, we scan for code of the form:
mov rN, rN
and delete the move, since it's totally redundant. This pass helps
already, but it'd of course be much more effective paired with
register coalescing to encourage moves in general to end up in this
form. Nevertheless, even by itself:
total instructions in shared programs: 3665 -> 3613 (-1.42%)
instructions in affected programs: 2046 -> 1994 (-2.54%)
helped: 52
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 8.02% x̃: 4.00%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -10.26% -5.79%
Instructions are helped.
total bundles in shared programs: 2256 -> 2213 (-1.91%)
bundles in affected programs: 1154 -> 1111 (-3.73%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.33% max: 25.00% x̄: 9.10% x̃: 5.56%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -11.60% -6.60%
Bundles are helped.
total quadwords in shared programs: 3689 -> 3642 (-1.27%)
quadwords in affected programs: 2025 -> 1978 (-2.32%)
helped: 47
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 7.86% x̃: 3.85%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -10.30% -5.42%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The source and destination were incorrectly flipped in the move, but
some details of our internal regalloc made this function anyway. Now
that we're changing the regalloc, we need to fix this to avoid
regressing blend shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is a special case of DCE designed to run after the out-of-ssa pass
to cleanup special register lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>