By exposing a subgroupSize of 64, reductions with
cluster_size 32 in wave32 might be considered divergent,
and thus, result in a VGPR.
Fixes: dEQP-VK.subgroups.clustered.graphics.subgroupclustered* with wave32
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10769>
MinGW DEF parsing is even more broken than before, and 32-bits import
libs are broken regardless one uses opengl32.mingw.def or opengl32.def.
This change removes opengl32.mingw.def and addresses the issue differently:
- link opengl32.dll with --enable-stdcall-fixup
- use the systems opengl32 import lib (libopengl32.a/opengl32.lib)
instead of our own
This change also gets test_wgl built with MinGW (even if it's never tested),
which I used to verify this; and to not link against internal libraries.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
v2: Revert back to shared_library.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10767>
From the ARB_enhanced_layouts spec:
"As with input layout qualifiers, all shaders except compute shaders
allow *location* layout qualifiers on output variable declarations,
output block declarations, and output block member declarations. Of
these, variables and block members (but not blocks) additionally
allow the *component* layout qualifier."
We previously had compile tests in piglit to make sure this was not a
compile error but no execution tests.
Fixes: d99a040bbf ("i965: enable ARB_enhanced_layouts for gen8+")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10763>
With the idea of branching classic device support in to its own tree now would be a good time to also raise the minimum
requirements to something that is more "modern" on x86.
SSE2 was introduced in 2000(!) by default let's make it the minimum spec now
All the old hardware that is moving to the maintenance branch will finally be out of the way.
For the 64-bit side of the discussion there isn't much changed.
* GCC already enables -msse and -msse2 by default
* Same with clang
* fpmath=sse might remove some extraneous x87 usage
** Clang implies fpmath=sse ALWAYS
For the 32-bit side of things is where the exciting details change
* GCC by default doesn't enable sse1 or sse2
** Does all `float`, `double`, and `long double` math with x87
** -msse2 enables sse2 and sse1, gcc still uses x87 even with those enabled
** -mfpmath=sse moves away from using x87 and instead uses sse1 and sse2
* Clang already default enables sse1/sse2 which then turns on their implied fpmath=sse
What does this mean for users?
On Linux raises the default minimum processor spec to SSE2 supporting CPUs
* Intel requirements raise from P5 (1993) to Netburst (2000)
* AMD requirements raise from Athlon(1999/2000) to Athlon 64 (2003)
* Via requirements raise from C3(2001) to C7 (2005)
What does it mean for package maintainers?
For x86-64 distributions that have i386/i686 multilib, then nothing changes. You're already on a platform guaranteed to support SSE2.
For i386/i686 distributions they will need to weigh their min spec against this. Not sure how many still support classic processors.
Who is left out in the cold?
* Intel Quark (2013)
** Embedded board, doesn't have a GPU, Technically has 1x PCIe 2.0 lane that someone could plug a GPU in to
* Some older transmeta CPUs, but they had a followup that also had SSE2.
** Anyone hacking on these with a modern GPU? I'm guessing they know how to turn this option off
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9868>
Fix defect reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
member_not_init_in_gen_ctor: The compiler-generated constructor for this
class does not initialize foldCount.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10713>
Fix defect reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member progType is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10715>
This is most likely a rebase mistake :(
Fixes: f3e5cd813a ("intel/fs: Handle regioning restrictions of split FP/DP pipelines.")
Ref: aa53665fda ("intel/fs/copy_prop: check stride constraints with actual final type")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10764>
Rockchip SoCs RK3566 and RK3568 have a Gondul with
one shader core and two execution engines, with product ID 0x7402.
Quoting the datasheet:
Mali-G52 1-Core-2EE
* Support 1600Mpix/s fill rate when 800MHz clock frequency
* Support 38.4GLOPs when 800MHz clock frequency
To distinguish it from other variants of G52, we agreed
to call this "G52L", L is for Little.
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10771>
If we only read 8 bytes from a UBO, we need to use LD_UNIFORM.64 as
opposed to LD_UNIFORM.128. In addition to probably being faster, this
fixes a buffer overrun manifesting as MMU faults with source ID
0x500/0x600/0x700, visible in WebGL Aquarium.
This is essentially the same patch as 616394cf31 ("pan/mdg: Use
appropriate sizes for global loads/stores"), only this is for UBOs where
that was for SSBOs.
Before enabling PACKED_UNIFORMS, this bug was not visible since we could
guarantee the UBO size was a multiple of 16. We no longer have that
invariant, and in rare cases the last 8 bytes of the last 16-byte slot
of a mapped uniform buffer would overrun the BO and trigger a fault,
even if the result is unused.
Fixes: 24d7c413fe ("panfrost: Enable packed uniforms.")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10772>
The GLSL precision lowering is still buggy (see !10729), no other driver
enables all the CAPs yet. I don't know enough GLSL IR to be the one to
fix this. In the mean time, this is a hotfix to expose the same set of
CAPs that radeonsi does.
By keeping it with is_deqp, we still get CI coverage of int16.
Closes: #4759
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10777>
Forcing round-to-nearest-even results in loss of opportunities for
conversion folding, causing a regression in gfxbench gl_alu2.
Fixes: de195671bd ("ir3: nir_op_f2f16 should round to even")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10773>
For the multi planes case, only the first plane is required with the
template buffer formats, and shouldn't fail for other planes.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10751>
Add helper for casting the call, which when asserts are enabled will
sanity check the call size to detect corruption.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10753>
This pulls in a fix for the max-texture-size test using piglit-runner,
among other things.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10749>
It could happen that VGPR spilling without SGPR spilling
calculated a negative spills_to_vgpr number and then
increasing the VGPR target demand above the limit.
Cc: mesa-stable
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10756>
This allows us to emit the gs_alloc_req independently of the
wave ID check, which is what the NIR lowering will need.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
sanitize_cf_list can in fact invalidate the dominance metadata,
which we need to use eg. nir_unsigned_upper_bound.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
NGG already needs to use workgroup barriers, but this commit allows
them to come from NIR as opposed to just emitting it in ACO
instruction selection.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
These intrinsics represent what the hardware can actually do.
Lowering our shaders to use these intrinsics will allow us to
deal with mapping the classic VS, TES, GS (and the future MS)
stages to the hardware capabilities using NIR, which makes our
backend compilers simpler.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
The lowered NIR code of NGG VS shaders uses this intrinsic
when the VS has to export the primitive ID.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
These allow us to generate slightly better code in some cases,
eg. multiplications in ACO.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
These intrinsics will be used when lowering NGG shaders, including
currently supported stages like VS, TES, GS and also by mesh shaders
in the future.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10740>
The strategy is lifted from Freedreno. The trick is to remove buffers
from the resolve set, rather than add buffers to a discard set (as you
would naively try). The latter is wrong -- draws after the
glInvalidateFramebuffer() still need to be respected.
glmark2 -btexture on-screen with sway on Mali T860 from 1393fps to
1998fps
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #2407
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6295>
This is a bit simpler and will allow resolve to be disabled
independently.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6295>
This just ensures that we don't end up reading/writing outside of the
space reserved in the key. This would have made it easier to to track
down the issue in the previous commit faster.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10730>