This will host some code that's moving and ported to match style
with the rest of the driver, and other code that will be re-written.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
Otherwise we will mix and match mesa's custom cross slot packing
with arb_enhanced_layouts style packing and we won't correctly
handle the size of the vars needed for the mesa custom packing.
The code was working correctly if the shader interface had both
a matching input and output but when we only had one side of
the interface we were only marking a single slot location as
packed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: e5122a5543 ("glsl: add a NIR based varying linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6853
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17550>
It does the same thing as align() from u_math.h, no need to
have a etnaviv specific version.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17695>
We're already handling all the meaningful types here. The other types
like samplers, images, structs etc aren't really appropriate here.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17676>
These are already all the bitsizes there are. No need to test for them.
Besides, get_uvec_type already contains an assert for the same
condition anyway.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17676>
The LLVM backend keeps track of 16-bit output variables and it will
miscompile shaders when these outputs aren't the correct bitsize.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17706>
LLVM always believes that this instruction's upper bound is the wave
size, regardless of ac_set_range_metadata and regardless of whether
the add source is used.
As a workaround, emit an extra add instruction.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17706>
There are a bunch of places, where this is done manually.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17625>
When we initialize the graphics pipeline state, we skip VI and FSR state
if they're 100% dynamic. We need to do this if the current set of
dynamic things contains VI/FSR or if the set of dynamic state already in
the vk_graphics_pipeline_state has them dynamic. Look state->dynamic
after we've merged instead of just looking at the dynamic set from the
VkGraphicsPipelineCreateInfo we were passed.
Also, when we validate, we need to assume that VI and FSR exist or else
we'll assert if dynamic VI or FSR are set.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17696>
Some of our asserts and other checks depend on the total set of stages,
not just the stages set in the current pCreateInfo. Recording the stage
mask lets us combine them in vk_graphics_pipeline_state_merge().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17696>
Test runtime has crept up with more CTS tests and more features. The last
vk_full 1/2 run I tried timed out at:
Pass: 268488, Fail: 2, ExpectedFail: 7, Warn: 1, Skip: 602571, Duration: 1:29:29, Remaining: 45
Rude.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17662>
Add process_frame to pipe_video codec
Add new structures/caps for video post-processing with rotation,
flip, alpha blending, crop, and scaling, via the video engine.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17557>
Previously, it was in a divergent branch, therefore
it could hang the GPU when a workgroup had a primitive-only wave.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17581>
There's no dependency between them.
This can simplify the compiler backend translation by
always storing prim id before vertex export, which also
benefits the LLVM backend in latter changes.
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17581>
In particular, we now call it before running dead variables so we get
the XFB info even for things which are never written. This fixes a 102
Vulkan CTS tests on ANV and probably turnip as well.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17644>
Rework:
* Ken: Check bo for IRIS_MMAP_NONE rather than the global
intel_vram_all_mappable
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17349>
This can be false on systems where the PCI Base Address Register (BAR)
is too small for the amount of VRAM. Eventually the kernel will be
able to tell us that a system can't map all of VRAM, and
`all_vram_mappable` will then be false.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17349>
We might not be able to map all vram buffers in the future, so only
map the buffer when actually required.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17349>
We cannot rely on unallocated_size on system memory for
VK_EXT_memory_budget.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4aecfbf0f4 ("intel/dev: Add devinfo::mem to store i915 regions information")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17349>
Allow folding constants/undef sources by sharing more code with the image_store
16bit folding pass.
Allow more than one set of sources because RADV wants two, one for
G16 (ddx/ddy) and one for A16 (all other sources).
Allow folding cube sampling destination conversions on radeonsi/radv because
I think the limitation only applies to sources.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16978>
the alternative here is to just spin aimlessly until the process ooms,
which causes problems when trying to detect failures in cts caselists
a separate env var is used so that it can be exported without affecting
ZINK_DEBUG
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17525>
It should be the responsibility of the driver to make sure, that "format" is a valid pipe_format.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17490>
The script that generates the format tables does not set every pipe_format.
In practice, the length of the format tables is equal to PIPE_FORMAT_COUNT.
I just added the explicit size to future-proof it.
(If the largest valid format is not part of the format tables,
there will be a mismatch between the array length and PIPE_FORMAT_COUNT)
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17490>
The shared reg usage involved in the subgroup-related macros can cause
trouble for the spiller, and spilling may be implicated in CTS failures
with old versions of the subgroup tests, so let's make sure we get some
coverage. It does seem to catch a couple of failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17642>
..Instead of DEBUG so these work in debugoptimized builds.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17408>
In debugoptimized builds, DEBUG is not set (and neither is NDEBUG). The
intention of NDEBUG is to disable assertions. As such, list assertions should be
gated on !NDEBUG as opposed to on DEBUG.
But assert() is already disabled in that case, so we don't need our own special
assert (Eric).
This would have caught an assertion failure (due to the wrong iterator used)
sooner for the Valhall compiler.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17408>
It will get turned into SSA and copy-propagated in NIR, no need to walk
the IR collapsing it here.
iris shader-db results appear to be noise:
total instructions in shared programs: 8932195 -> 8932147 (<.01%)
instructions in affected programs: 537 -> 489 (-8.94%)
LOST: 12
GAINED: 11
lost/gained are simd32 switches in unigine, l4d2, portal2, asphalt9.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17613>
Now that we have no non-NIR drivers, we can retire the old code. We just
need to pass the variable accesses through to it.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17610>
The DISPATCH_TASKMESH_INDIRECT_MULTI_ACE packet has a firmware bug,
it hangs the GPU when the draw count is zero.
This commit adds a workaround sequence using COND_EXEC packets
which make sure that this indirect packet is never executed when
the draw count is zero.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
Add a separate flush_bits field for tracking cache
flushes in the ACE internal cmdbuf.
In barriers and image transitions we add these flush bits to ACE.
Create a semaphore in the upload BO which makes it possible
for ACE to wait for GFX for the purpose of synchronization.
This is necessary when a barrier needs to block task shaders.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This implements NV_mesh_shader draw calls with task shaders.
- On the GFX side:
DISPATCH_TASKMESH_GFX for all draws
- On the ACE side:
DISPATCH_TASKMESH_DIRECT_ACE for direct draws
DISPATCH_TASKMESH_INDIRECT_MULTI_ACE for indirect draws
Additionally, the NV_mesh_shader indirect BO layout is
incompatible with AMD HW, so we add a function that copies
that into a suitable layout.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This is mainly going to be used by task shaders, because
the HW implementation mismatches the API:
- In the API, task shaders are considered graphics shaders which
are part of a graphics pipeline and the draws are submitted to
a graphics queue.
- The HW requires the driver to dispatch task shaders on
an async compute queue.
When a pipeline is bound that has a task shader, create a
driver-internal ACE (async compute engine) cmdbuf which
we are going to submit to an ACE queue.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This is going to be used with task shader dispatches.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
We are going to reuse them outside of radv_pipeline.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
- Move the uses_perf_counters ternary expression out of
the loop into a variable called cs_offset.
- Constify cmd_buffer_count.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
This cleans up radv_flush_constants and also
the new function will be reused later.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
Allow emitting these packets without a radv_cmd_buffer object.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
Initialize the inverted predication VA only when it is used
for the first time.
This is needed to get conditional rendering work correctly with
task shaders because the internal compute cmdbuf may not exist
yet when conditional rendering starts.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16531>
in some cases it becomes desirable to "maybe" stop and start the current
renderpass, such as when updates MAY result in layout changes for attachments
for such cases, avoid splitting the renderpass unless it actually needs to
be split
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17640>
this is more accurate and fixes usage with lavapipe
cc: mesa-stable
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17627>
in this case, lying about having multiple images and then returning the
same image every time doesn't work, so use the busy flag
and return an available image when possible
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17590>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member sgpr_spill_slots is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member vgpr_spill_slots is not
initialized in this constructor nor in any functions that it calls.
Fixes: 7d34044908 ("aco: refactor VGPR spill/reload lowering")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17583>
the number of viewports in use is based on the outputs of the last vertex
stage, not the viewports passed by the state tracker
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17639>
this is a weird corner case where glsl permits a zero value, so clamp to 1
and then don't emit any vertices to avoid driver hangs
affects:
dEQP-GL45-ES31.functional.geometry_shading.emit.points_emit_0_end_0
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17639>
Two advantages:
* When using NIR_DEBUG=nir_print_xx, will print outcome only if
there is a change
* We can use NIR_PASS(_, ...) instead of NIR_PASS_V, that has
slightly more validation checks.
This includes:
* v3d_nir_lower_image_load_store
* v3d_nir_lower_io
* v3d_nir_lower_line_smooth
* v3d_nir_lower_load_store_bitsize
* v3d_nir_lower_robust_buffer_access
* v3d_nir_lower_scratch
* v3d_nir_lower_txf_ms
As we are here we also simplify some of them by using the
nir_shader_instructions_pass helper.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
The trigger for this commit was when we found that we were not calling
nir_metadata_preserve when lowering the layout code. But then I found
that it would be better to just update the code to use
nir_shader_instructions_pass, so we can avoid to manually:
* Initialize the nir_builder
* Call nir_foreach functions (we pass the callback)
* Call nir_metadata_preserve functions (that as mentioned we were not calling)
We also get a nice cleanup of several functions by reducing the number
of parameters (we pass a state struct).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
Without it we got a metadata assert:
deqp-vk: ../src/compiler/nir/nir_metadata.c:108: nir_metadata_check_validation_flag: Assertion `!(function->impl->valid_metadata & nir_metadata_not_properly_reset)' failed
if we try to use NIR_PASS(_, instead of NIR_PASS_V (that among other
things, do more validations).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
VK_NULL_HANDLE descriptor set layouts are allowed when creating pipeline
layouts without VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT.
From VUID-VkPipelineLayoutCreateInfo-graphicsPipelineLibrary-06753:
> If graphicsPipelineLibrary is not enabled, elements of pSetLayouts
> must be valid VkDescriptorSetLayout objects
From VUID-VkPipelineLayoutCreateInfo-pSetLayouts-parameter:
> If setLayoutCount is not 0, pSetLayouts must be a valid pointer to an
> array of setLayoutCount valid or VK_NULL_HANDLE VkDescriptorSetLayout
> handles
Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17629>
This represents an offset from the actual start of the image data,
not from the start of the memory allocation bound to the image.
Fixes:
dEQP-VK.image.subresource_layout.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17648>
From the Vulkan spec:
"If poolSizeCount is not 0, pPoolSizes must be a valid pointer to an
array of poolSizeCount valid VkDescriptorPoolSize structures"
So 0 is actually allowed and there is a CTS to check it is handled gracefully.
Fixes:
dEQP-VK.api.descriptor_pool.zero_pool_size_count
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17648>
cached mode was great 2 years ago when template support was less widespread,
but now that templates are everywhere, caching is less performant in
every scenario
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17636>
Previously we would just unroll the loop one extra iteration and let
other optimisation passes clean up the mess. This worked to a degree
but if the loop happened to be nested inside another loop we would
end up with phi chains that would block other passes from being able
to do the cleanup.
With this commit we explicitly clone the variables create by lcsaa
and insert them directly in the last continue branch after we are done
unrolling. With this optimisation passes can recognise both sides
of the if output the same values and can progress further.
Help with the issues described in:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6051
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17611>
LRZ fast-clear works on all gens, however blob disables it on
gen1 and gen2. We also elect to disable fast-clear on these gens
because for close to none gains it adds complexity and seem to work
a bit differently from gen3+. Which creates at least one edge case:
if first draw which uses LRZ fast-clear doesn't lock LRZ direction
the fast-clear value is undefined.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6829
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17599>
We're getting several more 630s in the lab, but need a bit of time to swap
out some broken old ones and stabilize the new ones. Fritz thinks this
should be done in an hour or so, but I want to turn off the CI for main so
that we don't block anyone else.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17641>
We should not scrub raster dedicated states when dynamic state includes
VK_DYNAMIC_STATE_RASTERIZER_DISCARD_ENABLE.
Test:
- dEQP-VK.pipeline.extended_dynamic_state.*_raster
- dEQP-VK.api.pipeline.pipeline_invalid_pointers_unused_structs.graphics
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17582>
zink_kopper_acquire_readback will flush any outstanding clears, this means that
the current clears need to be applied first before calling zink_kopper_acquire_readback.
This was already done for native_blit and native_resolve, also do this for the emulated draw path.
Seen as intermittent failures in cts case GTF-GL33.gtf21.GL2FixedTests.buffer_clear.buffer_clear.
Cc: mesa-stable
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17631>
Spirv spec does not allow the use of OpEmitVertex or OpEndPrimitive when there are multiple streams.
Instead emit the multi-stream version of these with stream set to 0.
This issue was seen when testing cts case KHR-GL46.transform_feedback.draw_xfb_stream_test
Fixes: 35e346f428 ("zink: handle vertex streams")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17513>
When we made depth/stencil dynamic, we lost the optimization. This is
particularly important for cases where the stencil test is enabled but
never writes anything as certain combinations with discard can cause
the stencil write (which doesn't do anything) to get moved late which
can be a measurable perf hit. According to 028e1137e6 ("anv/pipeline:
Be smarter about depth/stencil state", it was a couple percent for DOTA2
on Broadwell back in the day. No idea how it affects current titles.
This may also improve the depth/stncil PMA workarounds on Gen8 and Gen9
since they're now looking at optimized depth/stencil state.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
Now that we've stopped trying to do dynamic stuff up-front, we're only
merging in one bit: DoubleSidedStencilEnable. There's no point in all
the merging code for one bit which is a constant anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
For one thing, we were deceptively setting it wrong in genX_cmd_buffer.c
and then overwriting it in each of of gfx7_cmd_buffer.c and
gfx8_cmd_buffer.c. Pull it all into genX_cmd_buffer.c so it's no longer
duplicated. Also, stop doing the PATCHLIST conversion in anv_pipeline.c
and just store the number of patch vertices.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
This attempts to be a pretty minimal refactor. We just switch all the
state emit code to use the new graphics pipeline state object.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
Following our rule that anything that can be dynamic should be set
(and is this case is already) in the following files :
- genX_cmd_buffer.c
- gfx8_cmd_buffer.c
- gfx7_cmd_buffer.c
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
The only reason why we recorded them per-sample-count is because Intel
hardware is weird starting with Broadwell. The API, requires that the
dynamic sample pattern be reset every time the sample count changes so
we only need to record the pattern for the current sample count.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
There's no good reason to defer figuring out the size until we emit the
packet. We know everything when the bind happens.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
If the driver can do 16-bit ALU ops, then store RelaxedPrecision phi
values into 16-bit NIR variables with downconverts/upconverts on the way
in/out.
This has no impact on shader-db on freedreno (not that we have a ton of
GLES content there), but it does cause an ANGLE-translated CTS shader on
vulkan to get consistent conversions between two copies of a value, and
avoid a test bug.
Reviewed-by: Emma Anholt <emma@anholt.net>
Closes: #6585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14018>
This can't do it in the loop because it doesn't easily know what
attributes use a binding.
We could do it in a separate loop, but there's no point, especially since
zink does CmdSetVertexInputEXT() after CmdBindVertexBuffers2().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: c335a4d70e ("radv: dynamically calculate misaligned_mask for dynamic vertex input")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17521>
We can't do uniform buffer objects and from the hardware
perspective constants (uniforms) and immediates are treated in
the same way. They are uploaded together and fit together into the
(rather low) total constant limit. Therefore, there is actually no
advantage in converting immediates to uniforms, and a whole lot of
disadvantages (less possible optimizations and no inlining).
Fixes the dEQP regressions from https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770
The tests failed because when there is an indirect array access the
compiler inserts a big if ladder and converting the temp to a uniform
means more instructions because the ifs cant be lowered at least
partially to selects. It is particularly visible, because the code
NIR currently emits for the indirect access doesn't really fit the
hardware, see: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6366
No change in my shader-db, so this concerns the tests only.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17576>
This implements the "tortoise and the hare" algorithm for detecting
cycles in graphs. We use the caller's iterator as the hare and our own
internal copy as the tortoise. Conveniently, VkBaseOutStructure (and
VkBaseInStructure which is identical except the pointer type on pNext)
have a pointer we can use for the tortoise and an sType which we can use
for a counter to ensure we only increment the tortose every other loop
iteration.
There are more efficient algorithms than tortoise and hare but they
require allocating memory for something like a hash set of seen nodes.
Since this for debug purposes only, it's ok for it to be a bit
inefficient in the case where it hits the assert. In the usual case of
no loops, it's the same runtime efficiency as the unchecked version
except that it does a tiny bit of math and 50% more pointer chases.
Version 1 worked fine with clang and with GCC 12.1 with -O0 but not with
optimizations. See https://gitlab.freedesktop.org/mesa/mesa/-/issues/6895.
Unfortunately, the first version required modifying a temporary declared
const inside the for loop and that seems to have been the problem. This
version instead has an iterator struct which is managed by an outer for
loop and the inner for loop exists to declare the user's requested
iteration variable and manage the actual iteration. Because the outer
for loop is effectively `for (bool done = false; !done; done = true)`,
it will execute exactly once, regardless of the inner loop, so break and
continue inside the inner loop should work the same as if it's a single
for loop.
The other major difference with the new version is that the code is the
same for debug and release except the half_iter and loop check are gone.
I've verified by hand that this produces virtually identical code to the
old simple iterators on both GCC andl clang with an optimized build.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17630>
We're about to make it so that the compiler warns/errors if you use the
wrong iterator macro. Fix up a bunch of places where someone used the
wrong one before we break anything.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17630>
Instead of having stencil writes as an out parameter of the optimization
function, we add a new write_enable field for stencil that's equivalent
to the similarly named field for depth. This doesn't mean drivers must
actually support disabling stencil writes independently but the
information may be helpful on some hardware.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17328>
Otherwise, in presence of p_exit_early_if the main FS will always
jump to the PS epilog regardless the exact mask.
This fixes dEQP-VK.draw.renderpass.shader_invocation.helper_invocation
and few vkd3d-proton regressions when PS epilogs are forced.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17617>
Because the data register of atomic VMEM instructions
is shared between src and dst, it might be necessary
to create live-range splits during RA.
Make the live-range splits explicit in WQM mode.
Totals from 7 (0.01% of 134913) affected shaders: (GFX10.3)
Latency: 17209 -> 17210 (+0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15347>
When the application uses depth values outside of the [0.0,1.0] range
with VK_EXT_depth_range_unrestricted, or when explicitly disabled.
Otherwise, the driver can clamp to [0.0,1.0] internally for optimal
performance.
From the Vulkan spec "in 28.10.1. Depth Clamping and Range Adjustment":
"If depth clamping is not enabled and zf is not in the range [0, 1]
and either VK_EXT_depth_range_unrestricted is not enabled, or the
depth attachment has a fixed-point format, then zf is undefined
following this step."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16349>
This feature allows shaders to use pointers to buffers which may
not be bound via descriptor sets. Access to these buffers is done
via global intrinsics.
Because the buffers are not accessed through descriptor sets, any
live buffer flagged with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT_KHR
can be accessed by any shader using global intrinsics, so the driver
needs to make sure all these buffers are mapped by the kernel when
it submits the job for execution.
We handle this by tracking if any draw call or compute dispatch in
a job uses a pipeline that has any such shaders. If so, the job is
flagged as using buffer device address and the kernel submission
for that job will add all live BOs bound to buffers flagged with the
buffer device address usage flag.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
This adds support for global 64-bit GPU addresses as a pair of
32-bit values. This is useful for platforms with 32-bit GPUs
that want to support VK_KHR_buffer_device_address, which makes
GPU addresses explicitly 64-bit.
With the new format we also add new global intrinsics with 2x32
suffix that consume the new address format.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
If we have 2 pipelines that consume the same push constant data
but where one of them only uses direct access and the other has
indirect access, a draw with the first pipeline would clear the
dirty flag without updating the UBO and by the time we bind and
draw with the second pipeline we won't upload the constants either
because the first draw cleared the dirty flag.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
Since we allocate this ourselves we can immediately add it to the
job at the time we allocate it.
This also fixes a bug we introduced when we implemented inline
uniforms because since that commit, if we had an inline uniform
buffer at index 1 which happend to have indirect access we would
track it in slot 0 instead of slot 1, potentially overwriting
the push constant buffer reference.
Fixes: ea3223e7a4 ('v3dv: implement VK_EXT_inline_uniform_block')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
We have code in there to allocate various segments of MAX_PUSH_CONSTANTS_SIZE
to handle the case of various draw calls in the same command buffer requiring
different push constants, so we are implicitly expecting it to be larger than
this. In fact, this only works now because when we allocate a BO we are always
at least allocating a full page, so the least we ever allocate is 4096 bytes,
so be explicit about it to avoid confusion.
Also, since we were always mapping MAX_PUSH_CONSTANTS_SIZE and the mapping
always starts at the beginning of the BO, it looks like after the first copy
when the resource offset is not zero, we would be writing outside the mapped
range. Always map the full size of the BO instead to ensure this doesn't
happen.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
We have been always uploading MAX_PUSH_CONSTANTS_SIZE but now that
we track the actual size of the push constant buffer we can use
this instead.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
If the command buffer didn't have any push constants or the meta
operation didn't write any new constants we don't need to restore
the state.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17536>
This dynamic state uses the pipeline information so it needs to be
reemitted whenever the pipeline changes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17601>
Those fields have confusing names. They hold non dynamic state,
specified at pipeline creation that we copy into the dynamic state of
the command buffer whenever we bind a pipeline. This non dynamic state
might get picked up in the dynamically emitted instructions of the 3D
pipeline because our HW packets are not exactly splitted like the
Vulkan API.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17601>
Even though this doesn't change anything, it's not good to use an
ARRAY_SIZE for one array to iterate over another.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17601>
anv_descriptor_set_layout already has the information we're gather
here.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17601>
This bit mask is already computed in
anv_graphics_pipeline::dynamic_states in anv_graphics_pipeline_init().
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17601>
This implements the "tortoise and the hare" algorithm for detecting
cycles in graphs. We use the caller's iterator as the hare and our own
internal copy as the tortoise. Conveniently, VkBaseOutStructure (and
VkBaseInStructure which is identical except the pointer type on pNext)
have a pointer we can use for the tortoise and an sType which we can use
for a counter to ensure we only increment the tortose every other loop
iteration.
There are more efficient algorithms than tortoise and hare but they
require allocating memory for something like a hash set of seen nodes.
Since this for debug purposes only, it's ok for it to be a bit
inefficient in the case where it hits the assert. In the usual case of
no loops, it's the same runtime efficiency as the unchecked version
except that it does a tiny bit of math and 50% more pointer chases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17596>
[anholt: changed to make all drivers do the right thing by moving the
payload barycentric check into the compiler]
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17381>
In addition to hopefully generating shorter code, this optimizes out a
comparison of a mediump-cast value in
dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_fragment passed
through ANGLE, and allows the test to pass. We believe it to be a
test bug, but emitting better code like apparently everyone else does
is also a fine result.
No change on GLES gfxbench shaders.
Fixes: #6585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17546>