This is enough for zink to expose ARB_texture_buffer_object_rgb32 and
therefore GL 4.0. We could enable sampled images with a few more
workarounds, but the blob doesn't bother and there isn't any need at the
moment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16980>
If we don't do that, and we get passed a dummy geometry shader (one
that has no EmitVertex() calls) we get a DXIL validation error.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17039>
We currently have two implementations of the same logic. Let's pick
the d3d12 one, move it to dxil_nir.c and let nir_to_dxil() call it
when appropriate.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17039>
This fixes the following errors when compiling Mesa with Clang 14:
../mesa-9999/src/util/u_cpu_detect.c:368:5: error: expected ';' after struct
} alignas(16) fxarea;
^
;
This has been tested with Clang 14.0.5 and GCC 12.1
Fixes: e3bc78b8e3 ("util: use c11 alignas instead of rolling our own")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6667
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17035>
GDS is basically scalar in gfx11.
This is not exactly how it's supposed to be done (we should be using
the GDS_STRMOUT registers), but it works.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16990>
These are needed to address the task draw and payload ring buffers
when the task shader dispatch is 3 dimensional.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17023>
NV_mesh_shader workgroups are only 1 dimensional, so it's OK to
only add firstTask to the X dimension.
However, let's keep the Y and Z dimensions so this doesn't mess up
the workgroup ID for future 3 dimensional task shader dispatches.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17023>
This allows future mesh shaders to use a 3D workgroup ID.
Also changes how the NV_mesh_shader first_task is emulated.
The new code moves the responsibility from ac_nir into radv.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17023>
This wave ID is used to address the scratch ring, so it should be
enabled when the scratch ring is used.
Note: the naming is confusing here, as this ID is actually the same
accross the whole workgroup. (It would be more appropriate to call
it workgroup ID instead.)
Fixes: 0280b526d5
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17022>
Do not assume thrd_t to be a pointer or integer, as the C11 standard tells us:
thrd_t: implementation-defined complete object type identifying a thread
At https://en.cppreference.com/w/c/thread
So we always return the thread creation return code instead of thrd_t value, and judge the return
code properly.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15087>
By doing this, now the global variable impl_tss_dtor_tbl are only defined one time.
So the memory usage would reduced
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15087>
This is a prepare for move the c11 threads implementation into c source code
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15087>
Only two tests have been fixed and the list has been incorrectly
updated.
Fixes: b240be28e3 ("zink: check for pending clears to determine write status of zs attachments")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16975>
This was for vk-cts-image, but it seems like Mesa CI only supports
pbuffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16975>
Fixes following errors:
../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c
In file included from ../src/compiler/nir/nir_builder.h:365,
from ../src/microsoft/compiler/dxil_nir.h:29,
from ../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c:28:
../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c: In function 'dxil_spirv_nir_passes':
src/compiler/nir/nir_builder_opcodes.h:1321:11: error: 'dyn_yz_flip_mask' may be used uninitialized in this function [-Werror=maybe-uninitialized]
1321 | return nir_build_alu2(build, nir_op_iand, src0, src1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/microsoft/spirv_to_dxil/dxil_spirv_nir.c:290:59: note: 'dyn_yz_flip_mask' was declared here
290 | nir_ssa_def *y_flip_mask = NULL, *z_flip_mask = NULL, *dyn_yz_flip_mask;
| ^~~~~~~~~~~~~~~~
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16671>
Add STATIC_ASSERT to guard `dxil_spirv_shader_stage` and `gl_shader_stage`
to be same for each enum value.
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16671>
This context needs to be locked before usage, and flushed after.
If it's forgotten, radeonsi may crash (eg #6666).
To avoid this kind of error, introduce 2 helpers.
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17032>
If this environment variable is set, then a detected compute engine
will be used as described in docs/envvars.rst.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14395>
This is now needed following Ken's 8831cb38aa.
Ref: 8831cb38aa ("anv: Stop updating STATE_BASE_ADDRESS on XeHP")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14395>
zink-anv-tgl wasn't using Zink at all because this variable was missing
and then not passed to the runners...
This introduces a list of failures for Zink/Anv and also few tests
are skipped because they take too long (> 60s).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17031>
piglit select tests fail, so add a gallium cap to disable
for crocus explicitly.
crocus may choose to enable hardware select only for GPU
SKU which tested to be OK again.
Fixes: 6489af145c ("mesa: enable HardwareAcceleratedSelect")
Closes: #6644
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16955>
This set of changes:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15649
caused a regression in Xorg when using swrast_kms:
(EE) AIGLX error: Calling driver entry point failed
This commit changes the swrast_kms driver to use a dedicated screen init function
(which I believe was overlooked); I also took the opportunity to rename the
associated plumbling to have swrast-specific names.
Signed-off-by: Alexander Kanavin <alex@linutronix.de>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16942>
.wideLines = false, which forbids the user to set the line width
to something different than 1. We're thus safe to claim support
for dynamic line width and do nothing in CmdSetLineWidth() other
than checking the value passed is 1.0f.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16971>
Now that we have support for pipeline variants, we can take the dynamic
depth testing parameters into account and create a new pipeline state
using those dynamic parameters.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16971>
We were completely ignoring the primitive-restart case in the
index-rewrite logic used to emulate triangle fans. Unfortunately, this
case is way more complicated than a regular index rewrite:
- we need to skip all primitive-restart entries when turning the triangle
fan into a triangle list, which implies serializing the index buffer
rewrite procedure (at least I didn't find any clever way to parallelize
things)
- the number of triangles can no longer be extrapolated from the number
of indices in the original index buffer, thus forcing us to lower
direct indexed draws into indirect draws and patching the indexCount
value when the new index buffer is forged
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16971>
We can't hardcode the strip cut value to 0xffffffff, otherwise we break
support for 16-bit index buffers. Let's use the pipeline variant
infrastructure to deal with that case.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16971>
Some D3D12 states can't be updated dynamically and require the creation
of a new pipeline state. In order to support setting those dynamically
we will have to support creating pipeline variants at draw time.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16971>
Iterating over a util_sparse_array is very expensive; replace this
with a standard dynarray.
Using the sparse 'nodearray' datastructure instead was tested, but
found to be slower in some cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16988>
Because this impacts most of the registers in the BLEND draw state, we
make the entire draw state dynamic so that it all gets re-emitted when
the logicOp changes. This also lays the groundwork for
VK_EXT_color_write_enable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16936>
There were a few problems with this:
- It wasn't considering logic op at all, which is another source of
reading from the destination.
- It was conditioned on the blend_enable_mask, so it was missing the
case where there's no blending but some of the outputs were masked
out.
- It wasn't considering attachments with less than 4 components (for
example, normals in a typical deferred rendering setup) and would
always consider them partially written unless the user added extra
unnecessary components.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16936>
The 2048 descriptors limit comes from the maximum number of samplers
per heap, but the limit for other descriptors is actually much bigger.
Let's implement GetDescriptorSetLayoutSupport() to reflect that.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
Add a dzn_desc_type_has_sampler() helper instead of duplicating
the SAMPLER || COMBINED_IMAGE_SAMPLER test everywhere.
Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
VkMemoryDedicatedAllocateInfo, when present, provides us with extra
information about the memory usage, which allow us to lower the alignment
requirements.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
We don't support exporting memory objects yet, so let's make sure the
user doesn't request that.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
We don't support Ycbcr sampler conversion. Add dummy implementations to
make us Vulkan 1.1 compliant.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
We don't support sparse memory yet, but this function needs to be
present.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
We don't support device groups, but Vulkan 1.1 requires a
GetDeviceGroupPeerMemoryFeatures() implementation, so let's just
advertise no peer-memory features.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
We don't support importing/exporting images yet, so let's zero out the
whole external properties struct, if present.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
It's like load_sample_id except it shouldn't force per-sample shading
when not already enabled. In that case, we simply return 0.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
Use an unreachable() statement instead of the incorrect assertion in the
unsupported intrinsic path.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
This flag should be set to true when the RWTexture is attached a vector,
and we always declare a vec4 right now, so it should always be true.
Might be worth reworking the dxil_module_get_res_type() to use a scalar
when the image only has one component.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
emit_barrier_impl() was still checking the nir_var_uniform flag to
detect images, which is no longer correct.
Fixes: cfdc7ee066 ("spirv: Use nir_var_mem_image")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
nir_to_dxil() uses those types to pick the right operation overload,
and atomic loads/stores are no different from their non-atomic
counterpart apart from the atomicity property, so it makes sense to
pass a type to the deref_{load,store} intrinsic in that case too.
Suggested-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
This field copy was missing in
vk_get_physical_device_core_1_1_property_ext().
Fixes: 19ff5019b7 ("vulkan: Add helpers for filling exts for core features and properties.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16926>
as @Venemo discovered, zs layouts were being incorrectly set to readonly
in the case where the attachment was only used for an explicit clear,
so ensure that gets taken into account
cc: mesa-stable
fixes (radv):
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_clear
dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_clear
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17033>
it's illegal to have a renderArea larger than the attachments, so perform
clamping to avoid that scenario
Fixes: c81cd989c8 ("zink: use dynamic rendering (most of the time)")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16947>
this isn't always 32
Fixes: 2d745904ca ("zink: add a gently mangled version of the d3d12 cubemap -> array compiler pass")
fixes:
dEQP-GL45-ES31.functional.program_uniform.by_pointer.render.struct_in_array.sampler2D_samplerCube_*
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17008>
each sampler is 1 driver location, so use the base variable
Fixes: 2d745904ca ("zink: add a gently mangled version of the d3d12 cubemap -> array compiler pass")
fixes:
dEQP-GL45-ES31.functional.shaders.opaque_type_indexing.sampler.const_expression.*.samplercubearray
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17008>
deleting the generated shader on the first loop iteration like this was
broken if the shader was used in multiple programs, so delete at the end
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17010>
This requires shader-side lowering, which is handled in
dxil_nir_lower_vs_vertex_conversion().
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15955>
There's really just one case where this is supported; on GCC for x86.
All other cases do nothing, so let's remove the complexity that is no
longer needed.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16908>
There's no reason to have standard includes in two different sections of
the header, let's merge them. While we're at it, let's sort the includes
as well.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16908>
Due to how alignas is defined, it itsn't allowed to use it on a struct,
it needs to be used on the first member instead. So move the declaration
in those cases.
This still leaves the ALIGN16 macro using compiler-specific directives,
because it's a lot of work to untangle the above. This probably deserves
its own MR.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16908>
While MSVC doesn't do __STDC_VERSION__ correctly for C99, it does for
C11, which is what we now require. So we can remove this hack.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16908>
This fixes a crash in this test:
dEQP-VK.renderpass2.suballocation.simple.color_unused_omit_blend_state
Fixes: 2d0798440b ("dzn: Add support for dynamic rendering")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17027>
DXIL metadata strings and function names have a limited size. Truncate
the name when they don't fit. This is a quick&dirty workaround since it
doesn't address the problem for all kind of strings, and doesn't ensure
there's no collision in the function names after the truncation. That's
not an issue right now because I don't think we have implementations
keeping more than one function (the entrypoint), but it might be a
problem at some point.
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16961>
We can't use linear interpolation on integer types, and varyings using
a struct type might actually contain only fp32 members, in which case
interpolation should happen as requested.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16961>
Right now we had two methods that tries to optimize the nir shader,
nir_optimize and st_nir_opts. The latter is being used when we are
linking, but again, it has basically the same purpose that
nir_optimize.
So this commit adds more lowerings to nir_optimize_nir, add some extra
comments on the method, and replaces st_nir_opts with nir_optimize.
Ideally we would like to just use the already existing
v3d_optimize_nir that we have at the backend But:
* Using it leads to some regressions on Vulkan CTS tests, due some
lowerings that are already there.
* We would need to move to the backend some additional
lowerings/optimizations that are used on the Vulkan
frontend. That would require to check that we are not getting any
regression or performance drop on OpenGL
So for now we are keeping a Vulkan specific nir_optimize method.
Additionally this fixes the following test:
dEQP-VK.graphicsfuzz.cov-loop-condition-clamp-vec-of-ones
Shaderdb stats, using some well known Vulkan apps (ue4 demos, Quake3e,
etc):
total instructions in shared programs: 124974 -> 125108 (0.11%)
instructions in affected programs: 50328 -> 50462 (0.27%)
helped: 4
HURT: 79
total uniforms in shared programs: 19019 -> 19020 (<.01%)
uniforms in affected programs: 60 -> 61 (1.67%)
helped: 0
HURT: 1
total max-temps in shared programs: 13438 -> 13444 (0.04%)
max-temps in affected programs: 85 -> 91 (7.06%)
helped: 0
HURT: 2
total inst-and-stalls in shared programs: 125715 -> 125849 (0.11%)
inst-and-stalls in affected programs: 50429 -> 50563 (0.27%)
helped: 4
HURT: 79
total nops in shared programs: 8203 -> 8204 (0.01%)
nops in affected programs: 732 -> 733 (0.14%)
helped: 7
HURT: 9
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16986>
That is what most others Vulkan drivers do (radv, anv, turnip at
least).
The origin of this change cames from a CTS test where the loop
unrolling converted a ubo index defined inside a loop from constant to
non constant. That is not desiderable on any driver, but a problem on
v3dv, as v3dv doesn't support that case.
Although we initially tried to fix it on the loop unroll, we discarded
that approach, and focused on the existing nir lowerings/optimizations
as this was not happening with other drivers.
We noted that in other drivers this case of a ubo index going from
const to non-const were also happening with nir_lower_explicit_io, but
in that case it was able to be converted back to a const on following
lowerings. The only difference with other drivers is that we were
calling it before the first nir optimization loop.
So this change helps with fixing the following CTS test (for that we
also need to run additional lowerings, which we do in a later patch):
dEQP-VK.graphicsfuzz.cov-loop-condition-clamp-vec-of-ones
You can get further details on the following issue and RFC merge
request, specially the merge request:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6051https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15391
We also made some shaderdb stats with our usual Vulkan apps (ue4
demos, quake3, etc):
Total instructions in shared programs: 125014 -> 124974 (-0.03%)
instructions in affected programs: 7544 -> 7504 (-0.53%)
helped: 7
HURT: 4
total uniforms in shared programs: 19026 -> 19019 (-0.04%)
uniforms in affected programs: 514 -> 507 (-1.36%)
helped: 5
HURT: 0
total max-temps in shared programs: 13430 -> 13438 (0.06%)
max-temps in affected programs: 270 -> 278 (2.96%)
helped: 0
HURT: 8
total sfu-stalls in shared programs: 739 -> 741 (0.27%)
sfu-stalls in affected programs: 30 -> 32 (6.67%)
helped: 0
HURT: 2
total inst-and-stalls in shared programs: 125753 -> 125715 (-0.03%)
inst-and-stalls in affected programs: 7685 -> 7647 (-0.49%)
helped: 7
HURT: 4
total nops in shared programs: 8228 -> 8203 (-0.30%)
nops in affected programs: 546 -> 521 (-4.58%)
helped: 9
HURT: 2
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16986>
Needed to be able to call nir_opt_gcm on the v3dv driver. This change
is needed as on v3dv we honor vulkan resource index returning a vec2.
See commit 21b0a4c80c for more info.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16986>
Currently uint64_t_to_bitmask(..) is used in combination with
the pattern 'match'. This only works for values smaller then
64 bit. Add support for bigger isa sizes.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16996>
Since we only consume barriers at the beginning of a new job, if
a command buffer ends with a barrier we will not handle it. Fix
this by emitting a noop job in that case to consume it. Ideally,
we could do better and check the pending barrier state to fine
tune the noop job so we don't wait on all queues, but for now
this fixes flakyness with some CTS pipeline barrier tests that
started to show up after we optimized binning sync barriers. It
is likely that the additional sync we had before that change was
enough to prevent the problem from showing up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
When we switched to using structs to track barrier state we made a mistake
and started to overwrite barrier state in primary command buffers with
the pending state from secondary command buffers executed inside them, when we
should've been merging the state instead.
Fixes flakyness with some CTS barrier tests.
Fixes: f7ce42636c ('v3dv: use an explicit struct type to track barrier state')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
This is not safe because it may skip regenerating the flags for the
loop condition in the loop continue block and these flags may be
stomped in the loop body by other conditionals.
Fixes: 9909fe6ba ('broadcom/compiler: Skip bool_to_cond where possible')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17020>
This logic accidentally got flipped in a refactoring. Let's correct it!
Fixes: e293691a99 ("dzn: Get rid of the render pass logic")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16997>
We are already doing the 0*anything = 0 by default and we are also
using the DX versions of math ops like RCP. It looks like R300 and
R400 can't do IEEE math anyway (but its hard to tell without docs).
For R500 we can do IEEE math, but testing showed that some apps
are dependent on the DX behavior, so considering we only advertise
GLSL 1.20 where this is left ot the driver, just keep the curent
status and expose PIPE_CAP_LEGACY_MATH_RULES so that nine can stop
emiting math workarounds.
Also fixes two Xnine tests.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17007>
I don't know when this was added but it's really neat and we should use
it instead of NIR_PASS_V since NIR_DEBUG=print and a few validation
things will work better.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17014>
I don't know when this was added but it's really neat and we should use
it instead of NIR_PASS_V since NIR_DEBUG=print and a few validation
things will work better.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17014>
implicit sync is hard, and many drivers get it wrong, so assume that
anyone who isn't mesa might need some hand-holding
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17009>
If we have a combined Z/S image, the image has depth, so we proceed down the
depth path, which does not set clear.s even though there's *also* a stencil
component. Unify the control flow to fix this.
Fixes (among others):
dEQP-VK.api.image_clearing.core.clear_depth_stencil_image.single_layer.d24_unorm_s8_uint_multiple_subresourcerange
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
Rather than generating shaders to clear depth and stencil attachments, run the
rasterizer without a shader and configure the depth/stencil hardware to do the
clear. These settings are known to be efficient on Valhall, presumably the
depth/stencil pipeline on Bifrost is similar enough that it is also the
efficient way there. It's certainly much simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
These were removed in an earlier series containing ae77c207e0 ("panvk: Use push
constants for copy shaders"), but the unused variables hung around.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
On Bifrost and newer, blend descriptors are decoupled from render target. That
means we can always use a clear shader reading from blend_descriptor_0 and
specify the desired render target in the sole blend descriptor we pass.
Likewise on Bifrost and newer we don't need blend descriptors when we don't
blend, which is the case for the Z/S clears.
This reduces the number of shaders compiled on startup from 468 to 426.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16950>
Use radv_graphics_pipeline_info instead of pCreateInfo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
Use radv_graphics_pipeline_info instead of pCreateInfo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
Use radv_graphics_pipeline_info instead of pCreateInfo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
pCreateInfo pointers have to be completely replaced for graphics
pipeline library.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16958>
usually inlining is optimal for cpu drivers since the majority of
time is spent in the shaders, and any amount of reduction to shader code
will be optimal
if, however, the shaders are still really big after inlining, this improvement
will be negated by the insane amount of time spent doing stupid llvm optimizer
passes, so check post-inline size to see whether it exceeds a size threshold
lavapipe release build - 1700% improvement
* spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec4-index-rd-after-barrier
before: 142.15s user 0.42s system 99% cpu 2:23.14 total
after: 8.60s user 0.07s system 99% cpu 8.677 total
fixes#6647
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16977>
The hardware writes one CRC per (effective) tile, the tile size of the CRC
buffer is the same as the configured effective tile size. However, all our CRC
infrastructure assumes 16x16 tiles. In case CRC is used with smaller tiles,
buffer overflows and incorrect rendering are all possible. Don't use CRC at
smaller tile sizes. Note disabling CRC correctly invalidates any bound CRC
buffers.
Fixes: 2e97d7c835 ("panfrost: Transaction elimination support")
Closes: #6332
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
It has a single user -- in a section of code that only runs for MFBD GPUs and
that has already decided whether to use CRCs -- so inlining it simplifies its
definition greatly and may avoid redeciding the CRC setting.
[Note for mesa-stable maintainers: This is not a bug fix but is marked for
backport so the next patch applies cleanly.]
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16983>
info.fs.sidefx considers discard() to be a side effect. That definition is...
dubious at best. It certainly isn't the definition needed for forward pixel
kill. The only reason pixels couldn't be killed by FPK is if the shader has side
effects in the sense of writing to memory. Use that more precise condition so
FPK works more often.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Closes: #5607
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16984>
This fixes many tests in following groups on DG2:
dEQP-VK.memory_model.*
dEQP-VK.fragment_shader_interlock.*
v2: use memory scope and setup descriptor also
for barriers without defined scope (Curro),
use local scope and flush type none with
NIR_SCOPE_NONE scope, cleanups (Lionel)
v3: use LSC_FENCE_THREADGROUP for NIR_SCOPE_WORKGROUP,
remove default case (Curro), use eviction if scope
was not defined, use LSC_FENCE_GPU scope for vertex
stage
v4: use LSC_FENCE_TILE independent of stage for device
scope (Curro)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>
amdgpu returns 12 TB of GTT on Kaveri, which resulted in 0 KB of GTT
after the conversion to uint32_t, which caused us to report 0 as the UBO
size, which disabled UBOs and downgraded the driver to OpenGL 3.0.
Fixes: aee8ee17a5 - radeonsi: change max TBO/SSBO sizes again and rework max alloc size
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6642
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
We just need to pass the uses_discard flag to the epilog.
The hw skips the export anyway. This will hang if SPI registers declare
an output format or KILL_ENABLE is set because those cases require
an export with done=1.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
If tmp_buffer (in ssbo[1]) is NULL, setting the writable bit causes
the called function to access the NULL buffer.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
ac_llvm_add_target_dep_function_attr has no effect if the function is
inlined.
amdgpu-gds-size determines m0 for ds_sub_u32 gds, which hangs if it's 0.
This helps both gfx10 and gfx11, though it will only be used by gfx11
after we enable streamout.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
We need more GS/NGG bits, so we need to add current_gs_state for that.
This simplifies the logic in the draw code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
We need to handle the fact that it kills pixels.
v2: also update si_update_ps_inputs_read_or_disabled
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
This skips the preamble for following IBs if the queue receives IBs from
the same context back-to-back. This eliminates VGT_FLUSH (for tess and
legacy GS) and PS_PARTIAL_FLUSH (for gfx11) in those cases if the preamble
contains them.
v2: only use this on gfx10+ due to stability issues on Stoney and limited
testing
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16885>
We need to move tr intrinsics to the top of the
shader that might be overwritten by
nir_intrinsic_rt_trace_ray.
Fixes the Khronos reflection sample.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16889>
textureLoad() doesn't work on cube images. We need to lower cube
images to 2D arrays.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16904>
These functions only deal with images, so let's make that clear.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16904>
If the first time the preamble is written, one of the rings
isn't allocated, we wouldn't write the RING_SIZE to the preamble.
Later, when the preamble gets updated after the ring allocation,
the new RING_SIZE packet would overwrite other packets.
To prevent this, always write the RING_SIZE (the alternative would
be to write NOP packets).
This fix "*ERROR* Illegal register access in command stream" hangs
I observed on GFX8.
Fixes: 32c7805ccc ("radeonsi: merge all preamble states into one")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16962>
These both exist since Navi and we can have instructions which are one but
not the other.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16969>
This looks clearer and will avoid confusion with future Zink CI testing.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16943>
src/freedreno/vulkan/tu_pipeline.c:1722:72: runtime error: index 5 out of bounds for type 'uint64_t [5]'
Fixes: 05329d7f9a
("tu: Implement pipeline caching with shared Vulkan cache")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16967>
Sample shading has similiar definitions in Vulkan and OpenGL, and they
both require unique associated data. While the definition for Vulkan
might change, we should stick to the current definition until the change
takes place and until apps (i.e., ANGLE) are updated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16700>
If the type is not an array, glsl_get_length() returns 0 and we don't
update the new_vars[]/flat_vars[] entries.
Fixes: bcd14756ee ("nir/lower_io_to_vector: add flat mode")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16960>
As far as I can't tell, there's no native operation doing this
equivalent of fquantize2f16. Let's lower this operation to
if (val < MIN_FLOAT16)
return -INFINITY;
else if (val > MAX_FLOAT16)
return -INFINITY;
else if (fabs(val) < SMALLER_NORMALIZED_FLOAT16)
return 0;
else
return val;
which matches the definition of OpQuantizeToF16:
"
If Value is an infinity, the result is the same infinity.
If Value is a NaN, the result is a NaN, but not necessarily the same NaN.
If Value is positive with a magnitude too large to represent as a 16-bit
floating-point value, the result is positive infinity. If Value is negative
with a magnitude too large to represent as a 16-bit floating-point value,
the result is negative infinity. If the magnitude of Value is too small to
represent as a normalized 16-bit floating-point value, the result may be
either +0 or -0.
"
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16959>
Using 'initialized' to guard the one-time init, means it can be set to
false as part of .bss instead setting 'first' to true in .data. This
is more efficient and works at .ctor time.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16952>
Now that TTN hooks this up to use_legacy_math_rules, we can flip the
switch and gallium nine can get the desired behavior from the hardware
instead of emitting math workarounds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
Now that TTN hooks this up to use_legacy_math_rules, we can flip the
switch and gallium nine can get the desired behavior from the hardware
instead of emitting math workarounds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
This reverts commit 7f01299c40.
Now that I've got it hooked up to use_legacy_math_rules on the NIR side
and made sure that NIR frontends on drivers with
PIPE_CAP_LEGACY_MATH_RULES handle it, we should be able to enable this
again.
Fixes: #5678
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
These control the same behavior, now that we've clarified what the flags
do.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
This is the same flag TGSI sets for LEGACY_MATH_RULES.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Tested-by: Mobin Aydinfar <mobin@mobintestserver.ir>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
The TGSI backend chooses these opcodes for LEGACY_MATH_RULES, so do the
same thing here for ARB programs which want the same behavior.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
This should help get correct math for ARB_fp/vp after the NTT transition,
and will be used for wine nine shortly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
This is a clearer name for what it does than MUL_ZERO_WINS, and matches up
to the new name in shader_info.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
On iris and crocus, this flag is used to set "alt mode" math on the shader
as a whole. Some other drivers have a similar mode for DX9/ARB-program
behavior, so document what it does so we can start using it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16176>
this was correct for 64bit loads and manually converted 32bit loads (e.g., bindless),
but it was broken for the case where 64bit was not supported, as the offset wasn't
being correctly adjusted
break out the offset division to hopefully make this a little clearer
Fixes: 150d6ee97e ("zink: move all 64-32bit shader load rewriting to nir pass")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
all of ntv requires scalarized io since the offsets are now array indices
instead of byte offsets, so enforce scalarization here to avoid breaking
the universe
Fixes: 150d6ee97e ("zink: move all 64-32bit shader load rewriting to nir pass")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
zink can't use lower_io, so this all has to be done manually and in
excruciating depth and detail
fixes (tu):
KHR-Single-GL46.arrays_of_arrays_gl.InteractionFunctionCalls2
KHR-GL46.gpu_shader_fp64.fp64.named_uniform_blocks
KHR-GL46.gpu_shader_fp64.fp64.varyings
KHR-GL46.vertex_attrib_binding.advanced-bindingUpdate
KHR-Single-GL46.enhanced_layouts.varying_array_components
KHR-Single-GL46.enhanced_layouts.varying_array_locations
KHR-Single-GL46.enhanced_layouts.varying_components
KHR-Single-GL46.enhanced_layouts.varying_locations
KHR-Single-GL46.enhanced_layouts.xfb_explicit_location
dEQP-GLES3.functional.transform_feedback.basic_types.interleaved.lines.highp_mat3x4
dEQP-GLES3.functional.transform_feedback.basic_types.interleaved.lines.mediump_vec3
dEQP-GLES3.functional.transform_feedback.basic_types.interleaved.points.mediump_vec4
dEQP-GLES3.functional.transform_feedback.basic_types.separate.points.highp_vec3
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
I left this semi-unfinished back when I discovered that I could blast
out xfb values inline with variable declarations, but this is not viable
for all scenarios, so it has to work and it has to be able to pass cts
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
the previous conditional here was stupid and wrong: it should be comparing
to see whether the writemask is the full mask of the type's size
cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16669>
fs_visitor::assign_curb_setup() maps UNIFORM registers to HW regs,
and contains the following assert:
assert(inst->src[i].stride == 0);
emit_a64_oword_block_header's striding tricks run afoul of this
restriction, by producing stride 1 values on a 64-bit UNIFORM source.
Work around this by copying the UNIFORM value to a VGRF first.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16938>
Instead of attempting to signal based on the memory object, use the new
DMA_BUF_IOCTL_EXPORT_SYNC_FILE to get a sync_file for the dma-buf and
use that to signal the semaphore or fence. Because this happens before
we transfer ownership back to the driver, the resulting sync_file should
only contain dma_fences from the compositor and/or display and shouldn't
be mixed up with the driver in any way. This gives us a real semaphore
and fence (as opposed to the dummy objects we've used int the past)
without over-synchronization.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
This isn't a functional change today because the set of drivers which
use set_ownership and those that use signal_fence/semaphore_for_memory
are mutually exclusive. It's important for the next commit, though.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
Instead of treating the blit submit specially in the buffer_blit_queue
case, treat the dummy submit as special. This lets us keep all the
handling of special-queue blits together. It also means that the
wsi_memory_signal_submit_info gets chained into the final submit which
is what we want if we're to rely on it for implicit sync. If we chain
it into the dummy submit, we'll implicit sync on all work previous to
the blit but not the blit. This won't work if X11 or a Wayland
compositor is depending on that to synchronize the linear copy.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
The WSI code is about to start querying for available semaphore handle
types via GetPhysicalDeviceExternalSemaphoreProperties in wsi_init().
For drivers that use vk_sync, supported_sync_types needs to be
initialized before GetPhysicalDeviceExternalSemaphoreProperties is
called. Really, wsi_init() should be the very last step of physical
device setup.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
The only reason for the wrapper was so that we could dummy signal the
semaphore and fence. Now that the WSI code always dos this for us, we
can drop our wrapper.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
The only reason for the wrapper was so that we could dummy signal the
semaphore and fence. Now that the WSI code always dos this for us, we
can drop our wrapper.
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
The only reason for the wrapper was so that we could dummy signal the
semaphore and fence. Now that the WSI code always dos this for us, we
can drop our wrapper.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
Also, stop setting wsi_device::signal_semaphore/fence_with_memory
because those cause the WSI code to call the function we just dropped.
Since the core WSI code is now setting dummy syncs by default, we don't
need any of this anymore.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
The only reason for the wrapper was so that we could dummy signal the
semaphore and fence. Now that the WSI code always dos this for us, we
can drop our wrapper.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
The only reason for the wrapper was so that we could dummy signal the
semaphore and fence. Now that the WSI code always dos this for us, we
can drop our wrapper.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
If the driver wants to do something special, it can reset the semaphore
or fence again and re-signal it. Sure, that wastes a malloc/free but
this is the window-system path. It'll be fine.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
This will happen automatically when they're waited on by the dummy
submit in wsi_common_queue_present().
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037>
With the old value, anv didn't think that the hardware supported 48-bit
addresses, and hit this assert:
assert(device->supports_48bit_addresses == !device->use_relocations);
The new value of 1ull << 48 is the one reported on my Icelake machine.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16933>
If the executables are still hanging out,
anv_GetPipelineExecutableStatisticsKHR will try to dereference NULL
pointers in pipeline->shaders[MESA_SHADER_FRAGMENT].
At least in terms of fossil-db output, this matches the behavior from
before 73b3efcd59.
Fixes: 73b3efcd59 ("anv: Handle the null FS optimization after compiling shaders")
Closes: #6590
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16898>
It should be more efficient than u_blitter since it skips
vertex shader stage. Also it's a prerequisite for supporting
MSAA since u_blitter can't do MSAA resolve for us.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16686>
With the following structures :
struct StructA
{
uint64_t value0;
uint8_t value1;
};
struct TopStruct
{
struct StructA a;
uint8_t value3;
};
Currently offsetof(struct TopStruct, value3) = 9. While the same code
on the CPU gives offsetof(struct TopStruct, value3) = 16.
This is impacting OpenCL kernels we're trying to use to build
acceleration structures.
v2: Add comment/link to some description of the alignment/size
computation
Cc: mesa-stable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16940>
If the driver said it can't do the shader, then listen to it and don't ask
it to create the shaders anyway. Fixes a bunch of spam on i915/r300 (with
!16878) about unsupported opcodes during dEQP runs.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16895>
As the default option for msvc 2019 does support designated initializers
```
../src/util/tests/timespec_test.cpp(302): error C7555: use of designated initializers requires at least '/std:c++20'
../src/util/tests/timespec_test.cpp(303): error C7555: use of designated initializers requires at least '/std:c++20'
../src/util/tests/timespec_test.cpp(312): error C7555: use of designated initializers requires at least '/std:c++20'
../src/util/tests/timespec_test.cpp(313): error C7555: use of designated initializers requires at least '/std:c++20'
```
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15497>
Create c11/time.h instead of put timespec_get in `c11/threads.h`
Creating impl folder is used to avoid `#include <time.h>` point the c11/time.h file
Detecting if `struct timespec` present with meson
Define TIME_UTC in `c11/time.h` instead `c11/threads.h`
Define `struct timespec` in `c11/time.h` when not present.
Implement timespec_get in c11/impl/time.c instead threads.h
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15497>
Looks like some hardware needs this info in the shader to match the
topology. Since there's no spot in the shader info for it, we're
currently using the array size of the TCS input vars to store it.
Cc: mesa-stable
Reviewed-by: Paul Dodzweit <paul.dodzweit@amd.com>
Tested-by: Paul Dodzweit <paul.dodzweit@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16920>
Perform address calculation in 32 bits when
dealing with inbounds array derefs.
Closes: #6562
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16729>
Preserving information about inbounds access and
the required bit size for the bounds will help
with avoiding 64-bit operations when lowering io.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16729>
the 'notemplates' debug mode is somewhat misleading since there's no
uncached+notemplates mechanism, meaning that if the descriptor cache
explodes it'll still use templates for updating in the fallback path
Fixes: 4e3768914d ("zink: add ZINK_DESCRIPTORS env var to explicitly set a mode")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16927>
since all the buffers are now an array descriptor, hashing needs
to iterate over all the descriptors in order to be accurate
Fixes: a7327c7cac ("zink: implement indirect buffer indexing")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16927>