The load-store vectorizer can create a large amount
of unnecessary nir_op_vec and nir_op_mov instructions.
This prevents nir_opt_move from stalling to much and
potentially also helps other passes.
Closes: #4778
Fixes: 1958381c9a ('radv: Reorder some NIR optimizations in preparation for the I/O changes.')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10804>
It could happen that we tested negative out-of-range
registers for p_create_vector operands resulting in a crash.
Fixes: 8962510e38 ('aco/ra: Conservatively refactor get_reg_specified to use PhysRegInterval')
Closes: #4697
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10799>
Not every MR, but several per day since I landed the apitrace switch which
increased trace replay resolution to 1080p. Hopefully some day we can
tune the hangcheck to be less aggressive.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10793>
VS outputs are "per vertex" but not the kind of I/O we want to match
with this helper. Change to a name that covers the "arrayness"
required by the type.
Name inspired by the GLSL spec definition of arrayed I/O.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10493>
Otherwise the lowering pass might try to lower any other load from
a deref if its data.location value happens to be zero.
Fixes: 418c4c0d7d
compiler/nir: extend lower_fragcoord_wtrans to support VARYING_SLOT_POS
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10577>
when emitting barriers, we don't need to end the renderpass in some cases
this handles the case of emitting barriers for new resources which have
never before been accessed and thus do not require synchronization at a
specific point in the batch
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10700>
we've been tracking which attachments are deferring clears but not
specifically the ones that are going to be cleared in a renderpass
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10761>
this is a small optimization for the no-wait case when unflushed usage
exists since it's impossible for a qbo update to happen instantly
no functionality will be fixed by this, it's just a very minor optimization
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10796>
a fence applies to all the submitted cmdbufs, so it's necessary to do
the flush which creates the user fence after all the cmdbufs have been
processed in order to avoid creating a fence that only applies to the
first cmdbuf
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10795>
this breaks the driver!
the uploader always maps its own pointer, so modifying that at any
point just explodes things later
Fixes: d179c5d28e ("zink: implement threaded context")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10787>
FS will get the last geometry VUE, but it still needs to recompute in
case the number of position slots assigned by geometry is larger than
one -- this happens when Primitive Replication is used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10653>
We've only considered the perf query pool change previously. But we
also need to pay attention to the pass index.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0a7224f3ff ("anv: group as many command buffers into a single execbuf")
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10301>
Giving NULL for anv_combine_address() triggers an assert in that
function.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8525ebe6e3 ("intel/mi_builder: Return an address from __gen_get_batch_address")
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10301>
The initial value needs to be taken from the instruction that is being
moved over, not the one to be moved.
Additionally the parameter of this function was removed because it was
misleading. Setting it to any value other than source_idx would cause
register_demand to be initialized incorrectly. (Instead, the maximum
demand among the covered instructions would need to be determined.)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10644>
By exposing a subgroupSize of 64, reductions with
cluster_size 32 in wave32 might be considered divergent,
and thus, result in a VGPR.
Fixes: dEQP-VK.subgroups.clustered.graphics.subgroupclustered* with wave32
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10769>
MinGW DEF parsing is even more broken than before, and 32-bits import
libs are broken regardless one uses opengl32.mingw.def or opengl32.def.
This change removes opengl32.mingw.def and addresses the issue differently:
- link opengl32.dll with --enable-stdcall-fixup
- use the systems opengl32 import lib (libopengl32.a/opengl32.lib)
instead of our own
This change also gets test_wgl built with MinGW (even if it's never tested),
which I used to verify this; and to not link against internal libraries.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
v2: Revert back to shared_library.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10767>
From the ARB_enhanced_layouts spec:
"As with input layout qualifiers, all shaders except compute shaders
allow *location* layout qualifiers on output variable declarations,
output block declarations, and output block member declarations. Of
these, variables and block members (but not blocks) additionally
allow the *component* layout qualifier."
We previously had compile tests in piglit to make sure this was not a
compile error but no execution tests.
Fixes: d99a040bbf ("i965: enable ARB_enhanced_layouts for gen8+")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10763>
With the idea of branching classic device support in to its own tree now would be a good time to also raise the minimum
requirements to something that is more "modern" on x86.
SSE2 was introduced in 2000(!) by default let's make it the minimum spec now
All the old hardware that is moving to the maintenance branch will finally be out of the way.
For the 64-bit side of the discussion there isn't much changed.
* GCC already enables -msse and -msse2 by default
* Same with clang
* fpmath=sse might remove some extraneous x87 usage
** Clang implies fpmath=sse ALWAYS
For the 32-bit side of things is where the exciting details change
* GCC by default doesn't enable sse1 or sse2
** Does all `float`, `double`, and `long double` math with x87
** -msse2 enables sse2 and sse1, gcc still uses x87 even with those enabled
** -mfpmath=sse moves away from using x87 and instead uses sse1 and sse2
* Clang already default enables sse1/sse2 which then turns on their implied fpmath=sse
What does this mean for users?
On Linux raises the default minimum processor spec to SSE2 supporting CPUs
* Intel requirements raise from P5 (1993) to Netburst (2000)
* AMD requirements raise from Athlon(1999/2000) to Athlon 64 (2003)
* Via requirements raise from C3(2001) to C7 (2005)
What does it mean for package maintainers?
For x86-64 distributions that have i386/i686 multilib, then nothing changes. You're already on a platform guaranteed to support SSE2.
For i386/i686 distributions they will need to weigh their min spec against this. Not sure how many still support classic processors.
Who is left out in the cold?
* Intel Quark (2013)
** Embedded board, doesn't have a GPU, Technically has 1x PCIe 2.0 lane that someone could plug a GPU in to
* Some older transmeta CPUs, but they had a followup that also had SSE2.
** Anyone hacking on these with a modern GPU? I'm guessing they know how to turn this option off
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9868>
Fix defect reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
member_not_init_in_gen_ctor: The compiler-generated constructor for this
class does not initialize foldCount.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10713>
Fix defect reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member progType is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10715>
This is most likely a rebase mistake :(
Fixes: f3e5cd813a ("intel/fs: Handle regioning restrictions of split FP/DP pipelines.")
Ref: aa53665fda ("intel/fs/copy_prop: check stride constraints with actual final type")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10764>