this can get called from multiple threads with the recent llvmpipe
overlapping rendering changes, so make sure to lock around the
map/unmapping so they can't race.
This should fixes some crashes seen with kwin.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Tested-by: Adam Williamson (Fedora)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17531>
If the fragment shader exports MRTZ and the epilog some color exports,
DONE/VM should be added to the last export.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
The main fragment shader can only export MRTZ (if present) and the
epilog will export colors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
The first operand of this new pseudo-instruction is a 64-bit SGPR for
the continue PC, followed by a variable list of fixed VGPRS for the
color exports which are the PS epilog inputs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
The PS epilog would be a "normal" compiled shader using RA, etc. It
will declare up to 8x4 VGPRs for all color exports.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
The main FS would have to jump to the PC of the PS epilog. Given that
shaders are allocated in the 32-bit addr space, one user SGPR is fine.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
To pass the necessary pipeline information for compiling PS epilogs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
This is currently disabled but it will be used for testing first,
and then for graphics pipeline libraries.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
When doing 10bit encoding in ffmpeg it uses
VaDeriveImage, and that could result in missing
mapping the chroma buffer of the input frame.
This WA to disallow ffmpeg using VaDeriveImage
function, so that VaCreateImage and VaPutImage can
be used and WA the chroma buffer mapping issue.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17472>
We're not quite conformant with the extension, because we don't have
a valid conformance version.
That's not a quick-fix, so we should probably just accept some failures
for now.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16998>
This is just a bag of misc properties that we should fill in.
Not all of them are filled out super accurately, but this is the best we
can do for now.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16998>
This should be possible to support, but we don't support minmax blending
at all yet, so let's leave these as unsupported for now.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16998>
Before enabling Vulkan 1.2 support, we need to fix the TODO in here.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16998>
These might not be exactly right, but they are good enough for now.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16998>
We can do better here in the future, but this is what's supported right
now.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16998>
When using a static callable stack, the required scratch has already
been allocated.
Dynamic stacks are located at the end of scratch memory
and are allocated on demand using radv_set_rt_stack_size.
Static stacks live at the start of scratch memory and are allocated in
create_rt_shader by setting scratch_size.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17579>
When culling enabled, it will use LDS space, which overlap with
the prim id export.
Fixes: e97f0463a8 ("ac/nir: Implement NGG deferred attribute culling in NIR.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17593>
This function causes a crash with RADV_DEBUG=llvm and this commit
works around that crash.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17580>
this ensures the alpha component is full if it must be read for fbfetch
fixes (RGBX swapchain config):
KHR-GL46.blend_equation_advanced*
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
these are only ever going to hurt tiler perf, so remove the footgun
this also means there's no more srgb format conversion needed, so delete
all of that too
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
in the scenario where:
* at least 1 color buffer was bound and a depth buffer was bound
* no color clear was enabled
* a zs clear was enabled
* the zs clear was never flushed
* the zs clear needs a renderpass
* the fb state changes
the color buffer(s) would be unbound, following which the depth buffer unbind
would trigger a renderpass, which would utilize the just-unbound color buffers,
which have no batch tracking, thus creating a case where the surface was destroyed
while it was still in use
cc: mesa-stable
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
formats like GL_RGB10_A2UI can be cleared with out of range values,
so to ensure consistent driver behavior, pre-clamp to the valid range
affects:
KHR-GL46.direct_state_access.renderbuffers_storage_multisample
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
it's unknown whether there may be clears to the depth attachment at the start of
a renderpass, so always assume there will be
Fixes: c132a28745 ("zink: use store op NONE when necessary for depth usage")
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
this was really stupid: instead of just binding a new fb and firing off
a clear, the code was calling u_blitter to bind a new fb and do actual
draws
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17366>
These are reachable, and
dEQP-VK.api.smoke.triangle_ext_structs,Crash is why.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17586>
LLVM-15 enables opaque pointers by default. We temporarilly request
non-opaque pointers while we migrate our code to support non-opaque pointers.
This workaround needs to be removed before LLVM-16.
See #6615
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17514>
Similar motivation as c426e21ff1 ("turnip: Reverse the order of walking
pipes or tiles on odd rows."), but instead we just swap the order of
alternate rows of fd_tile the the gmem stateobj.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17303>
When point smoothing is enabled then this lowering pass will
modifies the alpha component of every write to fragment output.
Anti-aliased points get rounded with respect to their radius instead
of square.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15117>
When poly_line smoothing is enabled then this lowering pass will
modify the alpha component of every write to fragment output
using sample coverage mask.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16245>
You'll get all this and more anyway once you're in NIR. This lets us GC a
bunch more ARB program transformation code.
No effect in shader-db on softpipe.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17528>
This replaces our mesa_remove_output_reads(), which in turn GCs some other
ARB program transformation code.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17528>
We don't need to go grubbing around in the ARB program when we can use the
right variable type at prog_to_nir time. This does leave
fp->system_values_read/inputs_read as they were, but I don't see anywhere
that that matters (the NIR will have its info gathered appropriately, and
other lowering may also cause mismatch between the gl_program and the
NIR).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17528>
This uses APIs that are not available on Win7. Since this is a build-time
configuration, and since we can't use the SDK version as an indicator
(since you can support Win7 via new SDKs), a new option is added to allow
disabling it, to maintain Win7 support if desired.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17431>
This change adds the tracepoints that can help understand app behavior
for debugging and performance optimization purposes.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17497>
Previously we suballocate only for host visible memory type to reduce
the kvm mem slot usage. That is no longer an issue given the limit has
been raised. However, we should still suballocate to make layering
clients performant. So we just suballocate regardless of mem type.
This change also increases the allowed suballocation size request from
64K to 128K, which makes layering clients happier.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17497>
binding/unbinding a texture affects the previously specified parameters,
so ensure the driver flag for clamp emulation is also set to perform
updates as needed
Fixes: e8f71f6ac4 ("mesa/st: add PIPE_CAP_GL_CLAMP")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17459>
nir->info.has_transform_feedback_varyings is set for all stages in the
pipeline when xfb is present, so it can't be used for this
harmless, but this is more correct
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17404>
this splits all the members of a struct into separate variables to
improve xfb inlining and reduce the number of locations consumed by
xfb outputs, reducing the chances of running out of shader outputs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17404>
get_slot_components() returns the total number of output components
for arrays for initial evaluation phase, but during the packed->inlined
conversion the arrayed size must be normalized to the slot's component count
in order to effectively catch and inline the array
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17404>
these are not necessarily the same case even if in glsl they are the same,
and by splitting it out a bunch of redundant array[scalar] code can be deleted
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17404>
We used to do it for every queue, which was duplicate work as pstate is
per-device. It could also cause trouble when multiple hw_ctx are created as
the call will succeed for only one of them and the rest will return -EBUSY.
Simplify and fix this by only setting for the first non-null hw_ctx.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17541>
In the direct path we already skipped draws, but in DGC I noticed
that just emitting these packets can cause issues ...
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17269>
Got to put the commandbuffers & uploadbuffers there. With DGC
those can be allocated by the application.
Excluding it from all other buffers/images to avoid using the
precious 32bit address space.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17269>
while some (tg4) sample ops can use different bit sizes in spirv, most
cannot, and all the shader variables are always emitted as 32bit, so
ensure the 32bit type is always what's being used for sampling
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17427>
It's everyone's favourite day, infrastructure maintenance Friday.
This includes manual disables for a618-vk and zink-anv-tgl, because
apparently the disable-on-variable rules don't carry through to those
jobs for ... some reason.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17553>
When the shader state is destroyed before the async shader compile is
done, we get a use after free in the async thread, as the shader state
it is operating on is gone. Fix this by dropping any pending job from
the async queue or wait for it to finish before destroying the state
by calling util_queue_drop_job.
Also call util_queue_fence_destroy, which would have caught this issue
by asserting that the queue_fence is in signalled state when the
shader state is destroyed.
Fixes: 1141ed5859 ("etnaviv: async shader compile")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17534>
The two macros introduced here form a (hopefully) unobjectionable
subset of those added in !17203.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17488>
This type is useful beyond the scope of winsys.
It can now be used without being lumbered with a dependency on
pvr_winsys.h. Since pvr_winsys.h is used by pvr_private.h, this can be
a common cause for circular dependencies during development.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17488>
It's still 1.17 but the version is changed due to the fixed size
fw struct update.
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17491>
Instead of trying to compact the surface state table to get rid of any
unused render targets, emit MAX(1, colorAttachmentCount) surface states
always. This ensures that secondaries will always match with primaries
when we go to do the copy since there's no rule requiring the secondary
to have VK_FORMAT_UNDEFINED when the primary has a NULL image view.
Fixes: 3501a3f9ed ("anv: Convert to 100% dynamic rendering")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17543>
On Linux, the static glapi path sees libGL.so implementing the static
glapi, and the drivers (libgallium_dri.so) updating/reading the TLS
vars.
On Windows, to allow libgallium_wgl.dll to be a full ICD, it's
responsible for implementing the actual static glapi. However, before
this change, OpenGL32.dll was also implementing the static glapi,
meaning that GL API calls from OpenGL32.dll didn't route to the driver
correctly because the TLS vars were never actually set - the driver set
its copy, and OpenGL32.dll read its own copy.
Now, always build a bridge and static version of glapi when not using
shared. The bridge version is linked into OpenGL32.dll, and the static
version is linked into the driver on Windows. GLES only builds with
shared glapi - but after this, shared glapi is not really needed on
Windows for GLES, since the driver has all of the data.
Fixes: f36921ef ("wgl: Refactor drivers to a libgallium_wgl.dll")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6560
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16713>
The sampler views array needs to be dimensioned by
PIPE_MAX_SHADER_SAMPLER_VIEWS, not PIPE_MAX_SAMPLERS.
This fixes out-of-bounds array writes when using more than 32
textures in a shader.
Also add some assertions to check array indexing elsewhere.
And change loop limits to be based on ARRAY_SIZE().
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
This fixes another VMware test (dx9-stretch-formats-copy-a8r8g8b8-x8r8g8b8).
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
The logicop_enable and independent_blend_enable vars need to always
be assigned, otherwise, once turned on, they could never be disabled.
This fixes a number of failures in VMware's test suite.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
This allows a VMware test to pass. The comments in lp_bld_limits.h
mention SM 3.0 requirements, but we're in the SM 5.0 era...
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
For the linear rendering fast path, we need to know whether the
texcoord argument to tex instructions comes directly from an FS input
(swizzling OK, but no arithmetic, etc). Use the input register info
to fill in the tex_info object.
This is part of a fix for some linear rendering bugs.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
We were saving the address of the constants[] and nir_constants[]
arrays in the jit structure. But those arrays went out of scope
before use.
This patch moves the constants[] array to the function scope and
consolidates the TGSI/NIR paths.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
The texture_unit field needs to be set like the sampler_unit field.
Also, add a swizzle initialization and some comments.
Signed-off-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
And GET_DADX(), GET_DADY(), GET_PLANES(). This is a bit more
readable, making the expected parameter types explicit.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
The code did not handle more than 32 textures. We have a test that
exercises 128 textures (views) and crashed w/ memory corruption down
in the llvm code generator because of this.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
To get the size (in bits) of a bitset.
And minor clean-up in __bitset_ffs().
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
Use the existing 'set_state' field. Some code was using 'state'
and other code was using 'set_state'. It didn't really matter
since lp_rast_cmd_arg is a union, but this removes some potential
confusion.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17062>
Streamout is not L3 coherent so previous writes to the same address
might be pending and overwrite the SO writes later when they get
flushed from L3, even though the SO write happened later in the batch.
v2: Use the right flag (not COUNTER)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6680
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17503>
Unlike image writes, buffer writes may access the same memory in
different ways, which we've seen in the past can cause problems. Use an
incoherent access to force flush/invalidate between accesses to the same
buffer, unless we know the barrier applies to images only.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17193>
Required for indirect(2) ray tracing to work.
Fixes the following tests:
dEQP-VK.ray_tracing_pipeline.trace_rays_indirect2.indirect_*
dEQP-VK.ray_tracing_pipeline.trace_rays_cmds_maintenance_1.indirect2_*
Fixes: 16585664 ("radv: vkCmdTraceRaysIndirect2KHR")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17316>
Required for indirect(2) ray tracing to work.
Fixes: b30f96dd ("radv,aco: Use ray_launch_size_addr")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17316>
Certain player has hard coded max_transform_hierarchy_depth_inter and
max_transform_hierarchy_depth_intra values set through VA-API, which
doesn't work on radeon HW. Until properly fixing it on player side,
temporarily adding this workaround to use calculated values instead.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17489>
the si_pm4_clear_state() initializes the variable in struct si_pm4_state which
anyway gets freed in si_pm4_free_state(). Hence no need to call
si_pm4_clear_state() in si_pm4_free_state().
Signed-off-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17504>
Fixes dEQP-VK.ray_query.advanced.using_wrapper_function.comp.*
An empty struct is causing problems because when passing it as
argument the spirv parser will just drop the argument, considering it
does not hold any data.
v2: update radv CI
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4c703686db ("spirv: handle ray query intrinsics")
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17420>
Format on buffers is irrelevant and bind flag validation needs to be disabled.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-by: Giancarlo Devich <gdevich@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17446>
If the memobj wraps a resource, then we only succeed the
mapping operation if the gallium desc matches the D3D12
resource desc.
If the memobj wraps a heap, then we can place whatever
gallium is describing on the heap.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Reviewed-by: Giancarlo Devich <gdevich@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17446>
Win32 memory objects can be imported by name (const void *
that will be interpreted as const wchar_t *)
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Marek Olák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17446>
Remove the previous compile-time early-ZS implementation and replace it with the
decoupled early-ZS implementation. This uses more efficient settings in some
cases (depth/stencil tests always passes or do not write), and fixes the
settings used in another case (alpha-to-coverage enabled with an otherwise
early-ZS shader.)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #6206
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
If we know ahead-of-time that depth/stencil testing will always pass, it's
better to use weak_early than force_early. However, if depth/stencil testing
could fail (discarding pixels), we'd rather use force_early. Determine which
case we're in at CSO create time.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
The new early-ZS helpers are pure functions, leaf nodes of the call graph, and
implemented with a different algorithm from the "oracle" table of correct values
for various combinations of states. Further, incorrect settings often still pass
CTS while causing game bugs or inefficiencies. That combination makes the
helpers an excellent candidate for unit tests. Add some.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
Bifrost (and Valhall) separate early-ZS configuration into two fields: when does
the depth/stencil buffer update happen? and when are pixels killed by the
depth/stencil tests? The driver separately configures these to occur early
(before the shader executes) or late (after the ATEST instruction executes at
the end of the shader). Early tests are generally more efficient, but various
combinations of API state and fragment shader properties can require late
updates and/or late kills for correctness. Determining how to configure these
fields is nontrivial.
Our current implementation (on Bifrost) configures these fields at fragment
shader compile time and bakes the settings into the RSD. This is both wrong
(using early testing when late testing is required) and suboptimal (using late
testing when early testing would suffice). We need to defer this configuration
until draw time, when we know rasterizer and Z/S state.
Reclassifying at draw time (as we currently do on Valhall) would be expensive,
especially with the extra terms added in here. To cope, decouple the shader
classification from the draw-time configuration. Since there are only a few bits
of draw state involved, this implementation just calculates all possible states.
Then the draw time classification is just indexing into a lookup table.
The actual algorithm used to classify is written with correctness and clarity in
mind. Unlike the current classification algorithm (which tries to match what the
DDK does, poorly), this algorithm embeds its proofs of correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
In theory this wait is required for correct behaviour of discarded threads with
ATEST. Mesa usually waits before the instruction after ATEST, so this wait will
get optimized out by va_merge_flow, but as our scheduler gets more sophisticated
this could become an issue.
Let's stay on the safe side and insert the recommended wait.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
In theory, ATEST can take any combination of registers for inputs.
Experimentally, however, ATEST requires the coverage mask in R60. This avoids
regressing the following dEQP tests, which write their coverage mask with
pixel-frequency-shading but without writing to the depth/stencil buffer.
dEQP-GLES31.functional.shaders.sample_variables.sample_mask.discard_half_per_pixel.*
This issue is known to affect both Mali-G52 (v7) and Mali-G57 (v9). I am unsure
if this is a silicon bug or just an obscure implementation detail.
No shader-db changes.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17428>
Nos that glsl_to_nir is setting sample_shading_enable whenever FB fetch
is used, we don't need to duplicate it here.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
Now that this information is accurately gathered by spirv_to_nir, we no
longer need the hack. We just need to fix up the way we handle some of
the key bits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
Thanks to the previous commit, we no longer need this check.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
If framebuffer fetch is used, we have to enable sample shading because
the fetched framebuffer value is per-sample.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
We don't really want to base this on a late nir_gather_info for two
reasons:
1) The Vulkan spec says that if a sample-qualified input, SampleID, or
SamplePosition are in the entry-point's interface, you get
per-sample dispatch. This means we really should gather this
information before dead-code has a chance to delete anything.
2) We want to be able to add nir_intrinsic_load_sample_pos intrinsics
as part of lowering passes without causing per-sample interpolation.
This means nir_gather_info needs to stop gathering it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
We don't really want to base this on a late nir_gather_info for two
reasons:
1) The GL spec says that any static use of a sample-qualified input,
gl_SampleID, or gl_SamplePosition causes per-sample dispatch. This
means we really should gather this information before dead-code has
a chance to delete anything.
2) We want to be able to add nir_intrinsic_load_sample_pos intrinsics
as part of lowering passes without causing per-sample interpolation.
This means nir_gather_info needs to stop gathering it.
For 1, this doesn't actually get us quite there as GLSL IR may have
deleted something already. However, it does get us closer.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
This lets us drop demote_sample_qualifiers as well as a back-end check
for key->multisample_fbo.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
On Intel, we have to do this because we can't ask for the per-sample
barycentrics without setting the per-sample dispatch bit or the GPU will
hang. However, nothing we're doing in this pass is Intel-specific and
it may be a useful optimization for someone else so we may as well make
it a generic NIR pass. This version actually does a bit more than the
current brw_nir_demote_sample_qualifiers() pass as it also handles
pre-nir_lower_io interp_dref_at* as well as a couple system values which
we can easily constant-fold.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
NIR constructs this information for us as part of nir_gather_info these
days so we can simplify our logic a bit. This will also let us be more
correct once we move uses_sample_shading scraping earlier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
Pandecode is not thread-safe (for a large number of reasons) and does not even
try to be. This is a problem when tracing (or just using PAN_MESA_DEBUG=sync)
multithreaded applications. The most common symptom of the problem are assertion
failures deep in the red-black tree implementation, which is not thread-safe.
Just protect the whole thing by a "in pandecode?" mutex, since this is not
performance sensitive code and we don't really care about the extra
serialization incurred. As pandecode does not recurse into itself, we may simply
lock at the beginning and unlock at the end of each entrypoint in pandecode,
which is thread-safe regardless of how pandecode is used. A few entrypoints are
refactored to avoid early returns to keep the lock/unlock calls in obvious
visual pairs.
Fixes flakes when running the CL CTS with PAN_MESA_DEBUG=sync like we would in
CI (e.g: events.event_flush)
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Icecream95 <ixn@disroot.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17409>
The physical tile buffer size (and hence the maximum available tilebuffer size)
are implementation-defined. Track this information on the device so we can
correctly select tile sizes, instead of hardcoding the value for Midgard.
Implementation values are pulled from the "Tile bits/pixel" row of the public
Mali data sheet [1]. That row lists the maximum number of bits available for a
pixel given the maximum tile size and pipelining. For currently supported
hardware (v9 and older), that maximum tile size is 16x16. So those values should
be multiplied by (16 * 16 * 2) / 8 to get the physical size in bytes.
This may improve Bifrost/Valhall performance on workloads using multiple render
targets. It also gets us ready for the dazzling array of tile sizes available
with v10.
[1] https://developer.arm.com/documentation/102849/latest/
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>
Separate out "calculating the size of each pixel", "selecting a tile size", and
"calculating the colour buffer allocation". Then implement the middle (selecting
a tile size) with a simple constant time expression, rather than a loop. There's
a bit of related clean up in here.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17432>
The if statement we insert would insert a new block before the end block
(and remove the old pre-end-block). If the new block ended up later in
the HT due to its pointer's hash value, you'd emit another copy of the if
statement after the last one. I saw this happen up to 4 times in testing.
The worst case would be if all those additions and removals ended up
reallocating the HT, at which point we might use-after-free.
Fixes inconsistent shader-db results with geometry shaders.
Cc: mesa-stable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17501>
va_pack asserts a large number of invariants about the instruction being packed.
If one of these fails (due to an invalid instruction), it's helpful to inspect
the failing instruction, as it may not be apparent in a large shader. Pass the
instruction through with all the assertions in va_pack for easier debugging.
Now assertion failures when packing are easier to debug:
Invalid invariant lo.type == hi.type:
= STORE.i32.flow2.wls br5, br2, wls_ptr[1], byte_offset:0
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>
When we assert out due to certain invalid encoding, it's helpful to know what
instruction is causing the failure, since it may not be obvious from the
assembly for large shaders. Now we get nice errors when failing:
Invalid opcode:
br0 = VAR_TEX.f32.flow8.store.skip.lod_mode.center , texture_index:0, varying_index:0
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17224>
Apparently, the point coord origin within a batch can change with Gallium
(seemingly even with GLES? where that's impossible at an API level?) but that
doesn't matter if we're not drawing points. This might have to do with internal
Gallium CSOs (e.g. u_blitter). Issue noticed in SuperTuxKart, which was getting
state change flushes. Performance on one level on a Valhall GPU improved from
26fps to 29fps.
Fixes: 3641dfe436 ("panfrost: Flip point coords in hardware")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17430>
b6a30b72ab ("panfrost: Implement provoking vertices on Valhall") added an
assertion that every draw selects a particular provoking vertex. The intent was
to ensure provoking vertex selection actually happened. Unfortunately, the
assertion is too strong, as the provoking vertex is irrelevant for some (most)
draws. For those, we don't *want* to commit to a particular provoking vertex for
those to avoid flushing.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17430>
driFetchDrawable is only ever called from the MakeCurrent path, which
means it has to handle the case of pre-GLX-1.3 Windows being named as
the drawable. When it finds the drawable in the hash, it increments its
refcount before returning it, so for a GLXWindow it would be 2 on first
return, one from glXCreateWindow and one from glXMakeCurrent. But when
it does not find the drawable and creates one for the naked Window, the
reference count on first return would only be 1. As a result, if this
context was then ever bound to a different drawable, the old Window's
DRI drawable state (like the back buffer) would be destroyed.
Fixes piglit's glx-multi-window-single-context and glx-make-current for
a variety of drivers.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6713
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17479>
There is nothing more TODO here. With softpin, which is available on all
GPUs using texture descriptors, there is no need for the kernel to patch
the descriptor, as the proper GPU virtual address is filled in by userspace.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
There already is a error handling label to free the sampler view
struct and return failure. Consistently use this label to make
error handling more uniform.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
Texture descriptors currently waste a massive ammount of memory, as every
one is allocated via a separate BO. As the allocation granularity of the
kernel is 4KB and the descriptor is only 256B, 93.75% of the allocated
memory is wasted.
Add a simple suballocator for the texture descriptors, to allocate multiple
ones out of a single kernel BO. This isn't perfect, as freed slots in the
suballocated resource are not reused, but worst-case we end up with the
same waste as we had before. This also potentially improves efficiency at
the kernel side, as this reduces the number of BOs needed for the sampler
views in each submit.
As the BO is now used by multiple descriptors, avoid syncing with the GPU
via the cpu_prep/fini calls, as to not introduce stalls between pending
rendering and new descriptors being filled. This is safe, as each
suballocation slot is only used once, so newly filled slots are certainly
not in use by the GPU.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
The dummy texture descriptor and the dummy render target relocs are not ever
changed by a context operation, so we can save some space by moving them to
the screen and potentially share them and the BOs backing them between
multiple contexts.
Also don't hold two pointers to the same BO, one in the reloc and one raw,
but always just use the reloc one.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17448>
libc++ does not yet implement the c++17 monotonic_buffer_resource,
so emulate it by doing normal allocations that are cleaned up
when the resource is destroyed.
v2: - Use C include and version without namespace aligned_alloc
- add include for sstream needed with clang++ (maurossi)
Closes: #6836
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17452>
With that most piglits for compatibility contexts are
passing, so enable higher compatibility profile support.
v2: fix formatting ahd fallthrough tag (Filip)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17484>
Another hot-fix: when a local register is written to, it is
actually unlikely that the value is never used, so just make
sure that this is never done.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17484>
On Cayman bank_swizzle[4] is never counted up, so add an
additional condition to make sure the loop is finished
at one point. This is a hot-fix, the logic below should be
improved.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17484>
Now that we have separate C types for the different sub-command types,
we can require a pointer to that type to be passed into functions
which expect the current sub-command to be of a specific type.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17458>
This is a simple optimization to make type-specific uses of struct
pvr_sub_cmd slightly less verbose.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17458>
The compiler has improved significantly since we found this issue
and this is no longer required.
Notice that because we are increasing the number of samplers
supported beyond what we can loop unroll (currently capped at 16),
some piglit tests that test the maximum number of samplers supported
start to fail because they use indirect indexing on a sampler array
and we don't support that (previously the indirect indexing was
removed by loop unrolling). This is a bug in tests which the
GLSL linker detects, failing to compile the shaders.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17509>
Also, remove the FIXME to pre-compute this in images. We only use
this helper from copy/clear operations where we may be working
with a compatible framebuffer format instead of the original image.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17509>
Correct the max thread number for DG2+ platforms according
to below bspec.
Ref: Bspec: 47202
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17506>
Before, if the ssbo is too large this would always return 0.
Also, this code is easier to optimize, so the common case of offset 0
and pot stride results in one ushr instead of 5+ instructions.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17468>
SIGSEGV is used by Vulkan API trace layers to track user changes in
device memory mapped to user space. Now with drivers such as Zink, GLES
applications are translated into Vulkan API calls and therefore it is
possible to be tracked by Vulkan api trace layers.
Blocking SIGSEGV hinders one of the memory tracking mechanisms used by
such layers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17273>
With dynamic vertex bindings the vertex format lookups are a lot
more frequent (vs being baked in the pipeline). Add a simple lookup
cache using a dynamic array to keep track of the hw values, and
avoid repeated translation.
This also reduces the memset to just the bitfields since all
the others will be overwritten.
Seen in perf traces gputest gimark with zink on radv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15846>
Ensure that if we have swizzle on the initial format, that the
component bits are identical with the lowered format.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17385>
For some formats like VK_FORMAT_B5G6R5_UNORM_PACK16, we have no direct
matching HW format. We can support it by swizzling.
We already apply those swizzles for image views. We just forgot to
deal with buffer views.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6235
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17385>
We were defaulting to a swap interval of 1, but we can follow dri2/dri3's
lead and respect the driconf var.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17470>
We need to track what the caller has given us for swap interval, and use
that to set the present mode at startup.
Fixes incorrect vblank syncing in apitrace's glretrace, which sets the
swap interval to 0 before the swapchain is made.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17470>
Compare to dri2SetSwapInterval() and dri3_set_swap_interval()
implementations of the same method.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17470>
Fill in support for TXD instruction which emits shader TEXLDD opcode.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17500>
Rename the args from low_bias/compare to src1/src2, since they
are used for different purposes depending on the texture load
type. No functional change.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17500>
While SPIR-V's OpKill is block terminating, the converted discard
intrinsic is not block terminating. This can lead to issues where
instruction could be placed after discard.
This patch adds an extra pass that drops all instructions after discard
before we convert discards.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17474>
It was checking "mesa's theoretical max attributes" rather than "the
driver's max attributes."
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17449>
Fixes arb_timer_query-timestamp-get on my radv system, where the GPU has
been on for many days and the timestamp would only increment every once in
a while.
Part of fixing #6808
Fixes: 7a40b734ee ("zink: handle timestamp queries")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17477>
glsl_get_explicit_size can return non-16 byte aligned sizes.
Therefore, make sure the sure the size isrounded up so that OOB does not happen.
Fixes: ea8a0654f5 ("zink: further improve bo sizing")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17460>
From looking at the CTS,
VK_QUERY_TYPE_ACCELERATION_STRUCTURE_SIZE_KHR
refers to the serialization size and not to the
actual, current size.
Fixes the following CTS:
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.cpu.buffer.size
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.cpu.memory.size
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.gpu.buffer.size
dEQP-VK.ray_tracing_pipeline.acceleration_structures.query_pool_results.gpu.memory.size
Fixes: 5d56c2c ("radv: Add accel struct queries for maintenance1")
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17444>
This intrinsic is only produced when the compiler is instructed
to handle layer id as a system value, which we don't use. Also,
we have been supporting layered rendering for a while and passing
all the relevant tests which would've failed if we were hitting
this lowering.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17483>
When using the texel buffer copy path to copy a buffer we need to
sample from the buffer and for that we need a texture shader state
record where we specify the base offset of the texture (the buffer).
If the copy operation has a start offset we can't add that offset
to the base address of the buffer because the texture state record
requires the base pointer to be 64-byte aligned, so it would only
work for offsets that are multiple of 64B. Instead, we pass the
offset (in elements) to the shader and we use that to shift the
indices into the buffer when selecting the source texel to copy.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17482>
This hasn't actually been exported for a while. I think I probably broke
this in
commit 63a6b719d9
Author: Adam Jackson <ajax@redhat.com>
Date: Tue Dec 5 11:10:09 2017 -0500
glx: GLX_MESA_multithread_makecurrent is direct-only
in which I made it no longer default to having client support, but
failed to instruct dri{2,3,sw} to enable it. In any case, it was never
widely used, there is no EGL equivalent, and we've had zero complaints
about it getting nerfed.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17473>
Those tests either fail or hang so just exclude
all of them for now to make ray tracing CTS usable
again.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17443>
The GL extension, EXT_texture_compression_astc_decode_mode, enables
applications to specify the desired decoding precision when decoding
non-sRGB ASTC textures. The options for the channels are FP16 (the
default), UNORM8, and RGB9_E5.
The ASTC LDR decoder outputs to UNORM8 by doing the following
conversions: UNORM16 -> FP16 -> UNORM8. This doesn't seem to be defined
by any specification and is costly according to perf profiles. To
conform to the decode mode spec (and for better performance), we convert
UNORM16 to UNORM8 by simply storing/keeping the top 8 bits.
In a texture upload microbenchmark, this decreases the upload time for
textures in the linear color space by about 34%.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17195>
The ASTC extension specs state that a vector of UNORM8 values are
returned when decoding sRGB ASTC textures. For the alpha channel
however, they don't seem to specify how to get there from the UNORM16
produced after interpolation (or returned from a void-extent block).
The ASTC decoder in the VK-GL-CTS project treats the alpha channel like
the RGB channels and simply uses the top 8 bits of the UNORM16. For
better performance, we choose to do the same.
In a texture upload microbenchmark, this decreases the upload time for
textures in the sRGB color space by about 13%.
Ref: https://gitlab.khronos.org/egl/DataFormat/-/merge_requests/32
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17195>
There's slots in the glapi static dispatch table (which is still
arguably ABI) which we need to preserve, but we can stop exposing the
extension string or doing anything in the added functions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17425>
Because clear colors are stored as 4 32bit component values, there is
an issue if you try to format instance :
- clearing in R16G16_UNORM
- draw in R32_UINT
Clear will use 2 components of the clear color in dword0 & dword1.
While draw will use only one component of dword0.
This change uses the mutable format information to track whether clear
colors can be non-zero for fast clears.
With :
- non mutable formats, we can fast clear with any color on Gfx > 8
- mutable formats with incompatible component sizes, we can only
fast clear with 0 color
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5930
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17329>
Add a double-underscore prefix to mark this macro as internal-only. It
should not be used directly, and only exists as a helper for the
pvr_csb_*() macros.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17359>
These do not modify the input struct pvr_csb at all, so the function
signature should reflect this.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17360>
Over-estimating latency can cause us to delay the critical paths of
the shader unnecessarily, producing larger QPU programs that take more
time to execute as a result (and it also adds register pressure) so
striking a balance is important. The thread switching model in V3D
is quite effective at hiding latency and usuallly we just need to
hint it to delay TMU instructions a little bit to find the best
compromise for performance.
The new latency numbers have been chosen empirically by testing
V3DV with Sponza and a few UE4 samples.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>
Based on empirical testing with Sponza and a few UE4 samples this is
consistently slightly benefitial for performance.
The most likely reason why this helps is that thrsw is probably
already quite effective at hiding latency and we are already trying
to hide latency at NIR scheduling and also via TMU pipelining, so
piling up on this when scheduling QPU typically ends up providing no
benefit at all for latency and is instead possibly preventing us to
unblock critical paths in the shader that depend on the TMU result,
requiring us to execute more cycles to complete the program.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17451>
This workaround looks actually broken. We added it in the past
because otherwise the game would just report 3GiB of video memory
(ie. size of GTT on SD). Though, with this workaround enabled, the
game explodes in memory easily.
One theory is that because we fake integrated GPUs as discrete GPUS,
and because we report 6GiB of VRAM (ie. driver redistributes memory
for small carveout), the game thinks there is 6GiB of VRAM only and
then keep allocating stuff.
People reported that the memory explosion is gone without this
workaround applied and I confirmed this myself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17421>
This pass tries to move register usage closer to SSA, and for large
shaders this means we can overflow the register index, which only has
RC_REGISTER_INDEX_BITS size. This creates invalid code and leads to
crash at a later stage. Limit the pool of available registers to
RC_REGISTER_MAX_INDEX, currently is was two times the number of
shader instructions.
This means we'll fail the compile right away if we wanted more than
RC_REGISTER_MAX_INDEX temps, but when we've got that many we're
already well past how many instructions we can support anyway.
CC: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6017
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17393>
The set of supported vector sizes in NIR has holes in it. For example, we
support vec5 and vec8, but not vec6 or vec7. However, this pass did not take
that into account, and would happily shrink a vec8 down to a vec7, causing NIR
validation to fail. Instead, the pass should round up to the next supported
vector size.
Fixes NIR validation fail in OpenCL's test_basic hiloeo subtest.
v2: Clamp -> round rename.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17194>
the EXT_external_object spec originally was underspecified with regards
to this function, leaving room for synchronization errors where:
* app calls SignalSemaphoreEXT to signal a semaphore
* mesa defers pipe_context::fence_server_signal with threaded context
* driver defers gpu submission
* SignalSemaphoreEXT has long since returned, app submits vk cmdbuf waiting on semaphore
* spec violation / device lost
to prevent this, the spec is being changed to:
1) require an implicit flush when calling SignalSemaphoreEXT
2) require that this implicit flush also forces GPU submission before SignalSemaphoreEXT returns
all affected drivers have been updated
fixes#6568
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17376>
The hardware doesn't support 3D textures. We had been lying about 3D
texture level support in the past so that we got GL 2.1, but now reporting
levels==0 doesn't disable GL 2.1 (since we don't check for GL2 extensions
any more). But, by not lying, we now fix the majority of the remaining
GLES2 deqp failures.
This regresses a few desktop GL piglits which get GL errors that they
notice instead of what would be silent rendering failures on 3D texturing
operations.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17350>
This will be used for vc4, where incorrectly exposing 3D textures accounts
for most of the GLES2 conformance failures it has. This leaves
EXT_texture3d exposed in the (already non-conformant) GL2.1 support it
exposes, which has always been a best-effort thing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17350>
The dri2_allocate_buffer() can be called with arbitrary height, however
the struct pipe_resource .height0 member is uint16_t. Check height for
maximum size to avoid overflow. Note that .width0 is unsigned int, so
it does not have the same issue.
The uint16 limit comes from commit:
e6428092f5 ("gallium: decrease the size of pipe_resource - 64 -> 48 bytes")
The overflow can be triggered e.g. by requesting large BO:
```
gbm_bo_create(dev, 1, 640*480*4, GBM_FORMAT_R8, GBM_BO_USE_LINEAR);
```
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16513>