Those shaders are just like the blorp ones.
v2: Use a single internal cache for blorp/RT (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7f1e82306c ("anv: Switch to the new common pipeline cache")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16741>
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.
Shader-db reports no instruction or cycle-count changes. However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.
v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization. However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
It turns out that we need a fragment shader for streamout. Whh? From
Lionel's reading of simulator sources, it seems the streamout unit is
looking at enabled next stages. It'll generate output to the clipper in
the following cases :
- 3DSTATE_STREAMOUT::ForceRendering = ON
- PS enabled
- Stencil test enabled
- depth test enabled
- depth write enabled
- some other depth/hiz clear condition
Forcing rendering without a PS seems like a recipe for hangs so it's
probably better to just enable the PS in this case.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
Actually compile and cache the no-op fragment shader but remove it from
the pipeline if we determine it's a no-op. This way we always have it
even if it's not strictly needed.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
Starting with Ivy Bridge, we implement alpha-to-coverage by writting
gl_SampleMask with a pattern based on alpha. This will show up in
wm_prog_data::uses_omask so we don't need to look at the key.
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends. Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.
But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.
So counting stateless messages is not accurate. Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
If we saw a HALT instruction, we would forget to invalidate our analysis
pass information before returning progress.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16677>
The game Batman: Arkham Knight expects OpenGL behavior
with sample mask and multisampling which is different
from the Vulkan one.
This workaround fix changes key->ignore_sample_mask_out
value that is used for
prog_data->uses_omask definition in brv_fs.cpp(9740)
In that way prog_data->uses_omask also changes it value
and the cloak stops flickering.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6078
Signed-off-by: Viktoriia Palianytsia <v.palianytsia@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16551>
Production Tigerlake and DG1 hardware shouldn't need this workaround.
It was only needed on the very first steppings which never went public.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16575>
OA reports on XeHP have their timestamp shifted to the left by 1. To
get that back in the same time domain as the REG_READ you need to
shift it back to the right and you're loosing the top bit.
v2: use ull for 64bit constant (Ian)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
New counters will use those from inside their read function to
generate percentage numbers.
v2: Forgot to update Iris (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
On Gfx12.5 products, we'll need to capture a couple of A counters that
are not captured in MI_RPC reports. Those are actually global,
previously all A counters were per context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
This already set in the intel_perf_setup.h file at metric set
creation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
In the future we'll pull more information off devinfo.
v2: Constify pointers (Ian)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16144>
We already had a little workaround for v3dv where, for some if its meta
ops, it had to bind a depth/stenicil image as color. Instead of
special-casing binding depth/stencil as color, let's flip on the
drier_internal flag and get rid of most of the checks in that case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16376>
XeHP supports a 20:5 pointer format, so the offset can legitimately
be more than UINT16_MAX. Likewise, with 256B binding table mode on
Icelake/Tigerlake, we might have 18:8 pointers that exceed UINT16_MAX.
Thanks to Felix DeGrood for catching this!
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16538>
Import the intact whole rendertest file from skqp (branch
android-cts-12.1_r1) to be able remove the offending test line in the
following commit.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16407>
This is set to true for all drivers that have a GLSL level
of support lower than 4.00. This matches the rule for setting the
GLSL IR option EmitNoIndirectSampler.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
With this option enabled range of input values for fsin and fcos is
limited to [-2*pi : 2*pi] by calculating the reminder after 2*pi modulo
division. This helps to improve calculation precision for large input
arguments on Intel.
-v2: Add limit_trig_input_range option to prog_key to update shader
cache (Lionel)
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>
We no longer emit STATE_BASE_ADDRESS in every batch on XeHP, so the
decoder might not know what the various base addresses are if it's only
looking at a single batch. Fortunately, they also never change, so we
can just emit them once here.
On earlier platforms, initializing them here should be harmless. We'll
emit STATE_BASE_ADDRESS if we change them, which will update these.
Thanks to Iván Briano for catching this.
Fixes: 8831cb38aa ("anv: Stop updating STATE_BASE_ADDRESS on XeHP")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16287>
Mostly Matt Roper's kernel patch commit message:
The IDs added here are the subset reserved for 'motherboard down'
designs of DG2. We have all the necessary support upstream to enable
these now.
The remaining DG2 IDs for add-in cards will be enabled in a future
patch once some additional required functionality has fully landed.
Ref: https://patchwork.freedesktop.org/patch/msgid/20220425211251.77154-3-matthew.d.roper@intel.com
Cc: 22.1 <mesa-stable>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16449>
This pattern pops up a bunch and the semantics of nir_channels() aren't
very convenient much of the time. Let's add a nir_trim_vector() which
matches nir_pad_vector().
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
If the secondary command buffer executed used push constants on a
different set of stages than the primary is using, we may end up not
reallocating them for the primary, getting misrender artifacts at best,
or a nice GPU hang at worst.
Fixes the tests from a CTS from the future:
dEQP-VK.dynamic_rendering.random.*
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16439>
This code just got duplicated a lot. There is still more, but the
remaining instances do a bit more than just removing other functions.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16348>
We use the non-strict algorithm (with parallelograms) prior to Gen10 for
wide lines. We can not advertise rectangularLines.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15432>
it's been flaking with "2022-05-05 16:29:49.055151: [0m[31mERROR - Failure
getting run results: parsing results: Reading from dEQP: timed out waiting
for fd to be ready (See \"//results/c32.r1.log\")" and a pile of missings
since the brief "whoops, HW CI failed to listen to the test exit code"
regression.
The only ways I know of to hit this case would be:
1) The deqp binary abruptly wedges on its own. This happens with NFS
failures sometimes, but the rest of the run went fine and we never got the
kernel complaining about NFS, so that seems unlikely.
2) The stderr pipe filled up before stdout was completed, and deqp got
wedged trying to output stderr (happens sometimes when you do like
NIR_DEBUG=print in your run).
Both of these seem unlikely, given that we've got a big .qpa file that
made it all the way to writing out test case durations at the end of the
run before abruptly terminating. Why didn't we have at least some of the
test results parsed?
The next deqp-runner release we integrate will solve #2, and cleans up
these error paths a bunch, so I'm hoping we get more information soon.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16350>
This is now unnecessary. Either an instruction is never dynamic and
it's emitted in genX_pipeline.c or it can be and it's emitted in
genX_cmd_buffer.c/gfx8_cmd_buffer/gfx7_cmd_buffer.c
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
On Gfx7 we can only give the sample location for a given multisample
number. This means everytime the multisampling value changes, we have
to re-emit the locations. It's fine because it's also where
(3DSTATE_MULTISAMPLE) the number of samples is stored.
On Gfx8+ though, 3DSTATE_MULTISAMPLE only holds the number of samples
and all the sample locations for all number of samples are located in
3DSTATE_SAMPLE_PATTERN. So to be more effecient there, we need to
track the locations for all sample numbers and compare new values with
the relevant sample count when touching the dynamic state.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
We don't know in what state the secondary buffer will leave the HW
when it ends. It's easier to consider everything needs to be reemitted
for now.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
device->l3_config is only valid on Gfx11+
This only fixes using GPU_TRACE=1
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 02a4d622ed ("anv: expose a couple of emit helper to build utrace buffer copies")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16291>
Using the bo pointer address as the offset doesn't go over well when
someone is fuzzing you. But we already have the mem_addr, we can simply
use that instead.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16250>
Ken performed some tests with shader-db to evaluate the effects
```
Across all 145,848 shaders generated, the results were:
Total bytes compacted before: 3,326,224
Total bytes compacted after: 60,963,280
```
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15399>
Discrete platforms don't have LLC, but on those, we mmap our buffers
with WC. So we shouldn't need to clflush there.
Anv already had a boolean field on the physical device to know whether
we need to use clflush(), based off the memory heaps available. So use
that instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15780>
This changes the intel_device_info calculation to call an additional
DRM query requesting the geometry topology from the kernel, which may
differ from the result of the current topology query on XeHP+
platforms with compute-only and 3D-only DSSes. This seems more
reliable than the current guesswork done in intel_device_info.c trying
to figure out which DSSes are available for the render CS.
Cc: 22.1 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14143>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
This patch adds Tile 4 modifier support to Mesa and allows Mesa to
use Tile 4 on gen12-hp with GBM.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: 22.1 <mesa-stable>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14521>
This doesn't fix anything because memcpy is only used before secondary
buffer execution and we dirty everything after that.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16189>
This commit make simple adding tests which use both GL(ES) and VK.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16048>
Without this we might choose 8 or 16 width, while the app assumes 32.
With subgroup operations it may cause wrong calculations and thus bugs.
Examples of such games are Aperture Desk Job and DOOM Eternal.
v2: Make it a driconf option instead of applying unconditionally, move
from brw_required_dispatch_width to brw_compile_cs
v3: Rename allow_assuming_full_subgroups -> assume_full_subgroups.
Include assume_full_subgroups value in anv_pipeline_hash_compute().
v4: Move actual workaround code from brw_fs.c -> anv_pipeline.c.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6171
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15708>
With mutable descriptor types, we can end up in a situation where a
binding can be, for instance, both a UBO and an acceleration
structure.
While we can promote the UBO to a binding table entry and the shader
can use it, this isn't true of acceleration structures that have no
surface state. In that case just skip the entry. The shader is already
compiled to use the descriptor entry.
In the non mutable case, the entry will not be created by
anv_nir_apply_pipeline_layout.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 63e91148b7 ("anv: Enable VK_VALVE_mutable_descriptor_type")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15969>
Computations for indexing in-memory data structures for ray queries
depend on this.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4f9141607f ("intel: Add device info for DG2")
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15925>
Instead of having two different helpers, delete the pipeline_cache ones.
Also, instead of manually handling the cache == NULL case in every
vkCreateFooPipelines call, handle it inside the helpers. This means
that BLORP can use them too by passing cache=NULL.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13184>
This patch is intended to be somewhat minimal. There's a lot of cleanup
work that can be done but we'll leave that to later patches.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13184>
Call flush_pipeline_select_3d in CmdBeginRendering() to emit a dummy MEDIA_VFE_STATE
before switching from GPGPU to 3D.
Original commit with the fix:
bc612536 ("anv: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D")
Fixes: 3501a3f9ed ("anv: Convert to 100% dynamic rendering")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6201
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15954>
There is no reason not to be able to get it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 34a0ce58c7 ("anv: add a new execution mode for secondary command buffers")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15968>
fix brw_kernel::stats member that was declared as a variable
but used as a pointer to array of 3 elements
CID: 1503279
Signed-off-by: Bozhenko Alexey <oleksii.bozhenko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15975>
Useful for hang debugging. Previously Anv incorrectly used DEBUG_SYNC
for this.
v2: Update documentations for sync/stall (Jordan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15950>
Fix invalid usage of meson objects which violates official meson
specification and thus breaks muon, an implementation of meson
written in C.
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15715>
Fixes a hang on Gfx9 GT1 : dEQP-VK.compute.zero_initialize_workgroup_memory.max_workgroup_memory.128
Tested-by: Mark Janes <markjanes@swizzler.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15596>
This fixes a number of tests like :
dEQP-VK.renderpass*.suballocation.multisample.s8_uint.*
dEQP-VK.renderpass*.suballocation.multisample.separate_stencil_usage.d24_unorm_s8_uint.*.test_stencil
dEQP-VK.renderpass*.suballocation.multisample.d24_unorm_s8_uint.*
dEQP-VK.renderpass*.suballocation.multisample.d32_sfloat_s8_uint.*
Because the driver asserts when generating RENDER_SURFACE_STATE with a
8 Valign value for stencil buffer (only 2 & 4 are supported).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12670>
c78be5da30 ("intel/fs: lower ray query intrinsics") introduced a
helper function using nir_(push|pop)_if which invalidated dominance &
block_index for the replacement of nir_intrinsic_rt_trace_ray.
We can still keep dominance/block_index metadata for the lowering of
nir_intrinsic_rt_execute_callable though.
This change uses 2 different lowering function with correct metadata
preservation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
When IndirectParameterEnable==true it's not actually used by the hardware,
but if it's not initialized and INTEL_DEBUG=bat is set, then Valgrind complains.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15850>
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
This enables a lower power mode in the sampler hardware in certain
common scenarios. On Tigerlake, SAMPLER_MODE is not programmable by
userspace but the kernel already sets this bit for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
While this register still exists, it's no longer a per-context register.
Instead, on Gfx12+, SAMPLER_MODE exists per dual-subslice and is
accessed as a "multicast" register, where you write control which
version is accessed by the "steering control register".
At any rate, userspace cannot write it any longer, and so there's not
much point to it existing in our genxml (which was missing most of the
fields anyway).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
This allows the sampler to perform faster filtering of 8-bit UNORM
textures by filtering them at a different precision. The filtering
is intended to still be OpenGL and DirectX spec compliant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
It was originally introduced in ca791f5c but it was never actually set
anywhere. It doesn't serve any purpose other than some sanity checking
so let's clean it up for now.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15799>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. The
maximum number of gradient components and coordinate components should
be 2. In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment
The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit. I just want to make
sure this commit gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
Otherwise the driver setting interacts with it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 939ddccb7a ("anv: Add support for depth bounds testing.")
Fixes: 1df871f8ff ("iris: Add support for depth bounds testing.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15763>
Signed-off-by: Omar Akkila <omar.akkila@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15668>
These both require swizzling so border colors won't work. However,
they're conveniently in the list of formats for which custom border
colors require you to specify a format in the sampler. That list
constists of:
- VK_FORMAT_B4G4R4A4_UNORM_PACK16
- VK_FORMAT_B5G6R5_UNORM_PACK16
- VK_FORMAT_B5G5R5A1_UNORM_PACK16
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6226
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15624>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube
maps and 3D surfaces were already handled, but 1D array and 2D array
surfaces were not.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment
The Fixes: tag below is a bit misleading. This commit adds another
lowering, similar to the one in the Fixes: commit, that probably should
have been added at the same time. I just want to make sure this commit
gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
according to KHR_gl_texture_3D_image:
If <target> is EGL_GL_TEXTURE_3D_KHR, <buffer> must be the name of a
complete, nonzero, GL_TEXTURE_3D (or equivalent in GL extensions) target
texture object, cast
into the type EGLClientBuffer. <attr_list> should specify the mipmap
level (EGL_GL_TEXTURE_LEVEL_KHR) and z-offset (EGL_GL_TEXTURE_ZOFFSET_KHR)
which will be used as the EGLImage source; the specified mipmap level must
be part of <buffer>, and the specified z-offset must be smaller than the
depth of the specified mipmap level.
thus a 2d view of a 3d surface is not only legal, it's part of the spec and
must be supported when available
cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15584>
Now that we're using 3DSTATE_BINDING_TABLE_POOL_ALLOC to set the base
address for the binding table pool separately from surface states, we
don't actually need to update surface state base address anymore.
Instead, we can just set STATE_BASE_ADDRESS once at context creation,
and never bother updating it again, saving some heavyweight flushes
and freeing us from the need for address offsetting trickery.
This patch was originally written by Jason Ekstrand, with fixes from
Lionel Landwerlin, but was targeting Icelake. Doing it there requires
additional changes (15:5 -> 18:8 binding table pointer formats) which
also involve some trade-offs, whereas the XeHP change is purely a win,
so we'll do it here first.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15616>
You should probably resize the sources array before accessing entries
that might be out of bounds. inst->resize_sources() always allocates
enough space for at least 3 sources, so this is really only an issue
when there are 4+ sources.
Fixes: a920979d4f ("intel/fs: Use split sends for surface writes on gen9+")
Fixes: 4f86a70599 ("intel/fs: Lower DW untyped r/w messages to LSC when available")
Fixes: d372abe397 ("intel/fs: Add surface OWORD BLOCK opcodes")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15632>
In general, an atomic intrinsic may perform separate atomics for every
enabled SIMD channel, as each channel may operate on different memory.
However, an extremely common case is for all channels to access the same
memory location. In this case, we can simply perform a reduction/scan
across the subgroup, and perform one atomic for the whole subgroup,
rather than one per channel. For example, if an intrinsic says to take
the minimum value of the existing memory and the value in each channel,
we can do a thread-local minimum of all enabled channels, then do a
single atomic to take the minimum of that and the existing memory.
Our hardware doesn't optimize the case where multiple channels ask for
atomics on the same memory location; it assumes the compiler will do so.
nir_opt_uniform_atomics() uses divergence analysis to detect this case,
adds the necessary subgroup operations, and moves the atomic inside a
conditional that disables all but a single invocation. It even detects
cases where the shader code already performs this kind of optimization,
and avoids doing it a second time.
This may not be the optimal solution for us. In the backend, we could
detect this case and emit send(1) instructions with NoMask, rather than
generating if...send(16)...endif, and a lot of unnecessary ALU ops. But
it's simple to do, reuses the same path as ACO, and still provides most
of the benefit by cutting up to 16x atomics down to a single atomic,
which is more merciful to the memory bus.
Improves performance of Shadow of the Tomb Raider by 5.5% on XeHP.
Improves performance of a customer-internal benchmark on XeHP at
3840x2160 and low settings by approximately 30%.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We'll use this more shortly. For now, enable it to separately in case
anything bisects to this.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
Although we don't use divergence analysis yet, we've had several
work-in-progress series that make use of it. We may as well set
our options so that those series can assume they're in place.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
We haven't exposed this intrinsic as it doesn't directly correspond to
anything in SPIR-V. However, it's used internally by some NIR passes,
namely nir_opt_uniform_atomics().
We reuse most of the infrastructure in brw_find_live_channel, but with
LZD/ADD instead of FBL. A new SHADER_OPCODE_FIND_LAST_LIVE_CHANNEL is
like SHADER_OPCODE_FIND_LIVE_CHANNEL but from the other side.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15484>
Instead of reusing the in/out slot mechanism, use a separated NIR
variable mode. This will make easier later to implement staging the
output in shared memory (and storing all at the end to the URB).
Note to get 64-bit type support we currently rely on the
brw_nir_lower_mem_access_bit_sizes() pass.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15022>
We're trying to replace VK_OUTARRAY_MAKE() by VK_OUTARRAY_MAKE_TYPED()
so people don't get tempted to use it and make things incompatible with
MSVC (which doesn't support typeof()).
Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15522>
Use emit_predicate_on_sample_mask() helper that does check where to
get the correct mask depending on whether discard/demote was used or
not.
Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
Without this change, a check for "is helper invocation" could read
uninitialized values.
Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
This information should match the current pipeline sample count.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
3DSTATE_MULTISAMPLE should be baked into the pipeline if not dynamic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 27ee40f4c9 ("anv: Add support for sample locations")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
Color writes & color masks occupy the same fields in the BLEND_STATE
structure. So we need to store color mask (which are not dynamic) on
the pipeline to merge that information with color writes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b15bfe92f7 ("anv: implement VK_EXT_color_write_enable")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6111
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
First, there is a problem if you do the following
vkCmdSetColorWriteEnableEXT(attachmentCount = 8)
vkCmdBindPipeline(GFX, with attachmentCount = 4)
vkCmdDraw()
vkCmdBindPipeline(GFX, with attachmentCount = 8)
vkCmdDraw()
Because in the dynamic state emission code we rely on the first
pipeline to figure the number of BLEND_STATE entries to prepare. This
is wrong, we should fill all entries so that the dynamic state works
regardless of the number of attachments in the pipeline. With regard
to the dynamic values, we should retain enable/disable values that do
not concern the current pipeline.
Second, 3DSTATE_WM was not always reemitted when the pipeline changed.
But since it is not emitted as part of the pipeline, this results in
inconsistent state being programmed.
Third, we end up disabling the fragment stage completely in some
cases. And that is programming the pipeline inconsistently and
triggering a hang on TGL.
v2: Fix comment (Tapani)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b15bfe92f7 ("anv: implement VK_EXT_color_write_enable")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
The problem is that we missed looking at pipeline changes. Pipelines
hold bits of dynamic states and when it changes we might need to
reemit a packet.
v2: fix comment (Tapani)
Add missing anv_cmd_buffer_needs_dynamic_state() use (Tapani)
Cc: mesa-stable
Fixes: 505d176a8e ("anv: disable baked in pipeline bits from dynamic emission path")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15310>
As this function doesn't check for any control-flow
dependence, it only returns true for statically
(or globally) uniform values.
The same holds true for is_binding_dynamically_uniform()
in nir_opt_gcm().
Rename to better reflect that property.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14994>
We're now required to provide all modules to link at the same SPIRV
version.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>
We're running the io lowering twice so need to reset some fields so
the offset don't go over what is really needed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>
Linkage should have happened before this in intel_clc. This just
silence a parser warning.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>
This is roughly the same as SpvCapabilityGroupNonUniform
(subgroup_basic).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15486>
When CCS compression first came out on Skylake, we referred to it as
"renderbuffer compression", or RBC for short. However, that name has
long since fallen out of favor, and we refer to it as CCS nearly
everywhere.
This patch renames DEBUG_NO_RBC to DEBUG_NO_CCS inside the codebase
for clarity, and adds INTEL_DEBUG=noccs. The legacy INTEL_DEBUG=norbc
name continues to work, because it's one line of code and having both
names makes our lives easier in the interim.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15447>
This reverts commit d68b2db89c.
With this change, no regressions have been observed with the
dEQP-VK.synchronization* test group. There are regressions with
dEQP-VK.drm_format_modifiers.export_import.*, but those have been
root-caused to be test issues (see 3575).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6125
Fixes: 57445adc89 ("anv: Re-enable CCS_E on TGL+")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15420>
This tool is currently only aimed at Gfx version 12.5+ with
COMPUTE_WALKER. We could make it work on earlier platforms but they
require pushing gl_SubgroupInvocation and the CLC code is missing the
back-end compiler set-up bits for that.
v2: Commit description by Jason
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>
Having everything ever known to man is confusing our SPIRV parser.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>
This is for Gfx12.5 with the COMPUTE_WALKER::Inline Data payload. We
do this in a similar way to the compute kernels.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13171>
The Vulkan spec was recently clerified to say that transitions only
happen to the bound layers:
"Automatic layout transitions apply to the entire image subresource
attached to the framebuffer. If multiview is not enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the number of layers specified by
VkFramebufferCreateInfo::layers. If multiview is enabled and the
attachment is a view of a 1D or 2D image, the automatic layout
transitions apply to the layers corresponding to views which are
used by some subpass in the render pass, even if that subpass does
not reference the given attachment."
This is in the context of render passes but it applies to dynamic
rendering because the implicit layout transition stuff is a Mesa pseudo-
extension and inherits those rules.
For clears, the Vulkan spec says:
"renderArea is the render area that is affected by the render pass
instance. The effects of attachment load, store and multisample
resolve operations are restricted to the pixels whose x and y
coordinates fall within the render area on all attachments. The
render area extends to all layers of framebuffer."
Again, this is in the context of render passes but the same principals
apply to dynamic rendering where the layerCount and renderArea are
specified as part of the vkCmdBeginRendering() call.
Fixes: 3501a3f9ed ("anv: Convert to 100% dynamic rendering")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15441>
This fixes the test_resolve_non_issued_query_data vkd3d-proton test.
This change is required on TGL+ (maybe ICL?) because on all platforms
3D pipeline writes are not coherent with CS. On previous platform we
fixed this by flushing the render cache to make sure data is visble to
CS before it writes to memory. But on more recently platforms,
flushing the render cache leaves the data in the tile cache which is
still not coherent with CS, so we need to flush that one too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14552>
We've run into issues before where PIPE_CONTROL races MI_STORE_*
commands. So make sure we emit the availability using the same type of
CS so that memory writes are properly ordered.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14552>
Most of the time, this doesn't matter. On the versions with _sat, if
the destination type is incorrect, the clamping will not happen
correctly.
Fixes the following CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.all_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.limits_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_packed_uu_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_ss_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_su_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_us_v4i8_out32
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.small_uu_v4i8_out32
v2: Update anv-tgl-fails.txt.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Fixes: 0f809dbf40 ("intel/compiler: Basic support for DP4A instruction")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15417>
Increase default batch_size and buffer_size from 16 -> 64. These
are sized to be big enough to service most games. As games have
become more demanding, larger sizes become necessary.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15348>
anv_batch_bo has a length field that we use to flush cachelines. Not
having that field initialized properly leads us to access out of bound
memory.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15425>
For any fence greater than local scope, always set flush type to at
least invalidate so that fence goes on properly.
v2: Fixup condition to trigger workaround (Lionel)
v3: Simplify workaround (Curro)
v4: Don't drop the existing WA (Curro)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: 22.0 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14947>
v2: Use CS_STALL instead of FLUSH_ENABLE in Iris (Lionel)
Add missing CS_STALL after SO_BUFFER change in Anv (Lionel)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: 22.0 <mesa-stable>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14947>
It's tricky to always get the render area to the viewport code. In
particular, it's not provided to secondary command buffers as part of
the inheritance info so we have to bend over backwards and look for a
framebuffer. With VK_KHR_dynamic_rendering, there is no framebuffer and
this approach won't work and we'll need something better if we want
competent guardbands in secondary command buffers.
The good news is that any client that's sloppily rendering and trusting
the clipper to keep things inside the render area will set a scissor and
that's something they have to set inside the secondary. We can dig
through the scissor state and also include the corresponding scissor (if
any) and use that for our render area. This should give us the same
secondary command buffer performance with VK_KHR_dynamic_rendering.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
There's never been a particularly good reason to stick these in gfx7/8.
We mostly did it to deduplicate the binary a bit but this shouldn't emit
all that much code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This is about the only "small" change we can make in the process of
converting from render-pass-based to dynamic-rendering-based. Make
everything in pipeline creation work in terms of dynamic rendering and
create the dynamic rendering structs from the render pass as-needed.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
This was ill-conceived at best. Yes, it checks for a few error
conditions but it doesn't check much and what checks it has are very far
away from the code that relies on those invariants. If we care about
these invariants, we should add asserts near the code that makes those
assumptions rather than pretending to be the validation layers.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14961>
A new extension allowing the application to use the OpenGL depth
range in NDC, i.e. with depth in range [-1, 1], as opposed to
Vulkan’s default of [0, 1].
v2:
- call gfx8_cmd_buffer_emit_viewport on ANV_CMD_DIRTY_PIPELINE (Jason)
- remove redundant !! operator since negativeOneToOne must be true or
false (Tapani)
- coding style changes (Lionel)
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6070
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15304>
The Vulkan spec for VK_KHR_depth_stencil_resolve allows a format
mismatch between the primary attachment and the resolve attachment
within certain limits. In particular,
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03181
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED and VkFormat of
pDepthStencilResolveAttachment has a depth component, then the
VkFormat of pDepthStencilAttachment must have a depth component with
the same number of bits and numerical type
VUID-VkSubpassDescriptionDepthStencilResolve-pDepthStencilResolveAttachment-03182
If pDepthStencilResolveAttachment is not NULL and does not have the
value VK_ATTACHMENT_UNUSED, and VkFormat of
pDepthStencilResolveAttachment has a stencil component, then the
VkFormat of pDepthStencilAttachment must have a stencil component
with the same number of bits and numerical type
So you can resolve from a depth/stencil format to a depth-only or
stencil-only format so long as the number of bits matches.
Unfortunately, this has never been tested because the CTS tests which
purport to test this are broken and actually test with a destination
combined depth/stencil format.
Fixes: 5e4f9ea363 ("anv: Implement VK_KHR_depth_stencil_resolve")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15333>
We add an assert to verify that those are not bound.
v2: Drop != 0 (Tapani)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15241>
In 4001d9ce1a we started lazily allocating surface states in the
descriptor sets rather than upfront in the descriptor pool. This was
to workaround vkd3d-proton allocating more than we could handle at the
HW level.
The issue introduced in that change is that we didn't protect the
descriptor pool free list as well as the anv_state_stream which are
now potentially used from different threads through the descriptor set
write functions.
This reverts the lazy allocation part of that change. Host only
descriptor sets changes remain.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4001d9ce1a ("anv: Handle VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE for descriptor sets")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15241>
We're not supposed to have a
VkWriteDescriptorSetAccelerationStructureKHR when doing a copy. We
should instead get the acceleration structure object from the source
descriptor.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 03e1e19246 ("anv: Refactor descriptor copy")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15241>
in the initial implementation, a stream like:
* CmdBeginTransformFeedbackEXT
* CmdSetRasterizerDiscardEnableEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
* CmdBeginTransformFeedbackEXT
* CmdDraw
* CmdEndTransformFeedbackEXT
would never enable transform feedback, as it only checked for the change
in rasterizer_discard state
Fixes: 4d531c67df ("anv: support rasterizer discard dynamic state")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15269>
This workaround is needed on all Gfx12.0 parts, but doesn't appear to be
necessary on XeHP. The other drivers do not appear to be applying this
workaround on those parts. As further evidence, we accidentally added
the 3DSTATE_BINDING_TABLE_POOL_ALLOC commands after switching back to
GPGPU mode, which would be an incorrect way to implement the workaround,
and things seem to be working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
On Gen11+, we have a feature that requires us to shift binding table
offsets by 3. This adds a helper which gives the driver a hook to do
this if it so chooses.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
Along with fixing the grammar, this allows it to be deduplicated since
the properly worded description exists in later generations' XMLs.
Cuts 96 B from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
917613 0 0 917613 e006d meson-generated_.._intel_perf_metrics.c.o (before)
917511 0 0 917511 e0007 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14131044 365708 210004 14706756 e06844 iris_dri.so (before)
14130948 365708 210004 14706660 e067e4 iris_dri.so (after)
text data bss dec hex filename
8124321 214264 22820 8361405 7f95bd libvulkan_intel.so (before)
8124225 214264 22820 8361309 7f955d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
The compiler does a good job of deduplicating strings already, but we
can eliminate the pointers to each string by combining the strings into
a single char array and storing only an index into that array.
The longest of the char arrays is the descriptions array, which is a
little over 45 KiB, so still under MSVC's 64 KiB string literal limit
[0]. Because the string length is under 64 KiB we can use uint16_t as
the index type, which roughly doubles our savings as compared to an int.
This cuts 77 KiB from iris_dri.so (0.5%) and libvulkan_intel.so (0.9%).
text data bss dec hex filename
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (before)
924401 0 0 924401 e1af1 meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 391628 210004 14792484 e1b724 iris_dri.so (before)
14137732 365708 210004 14713444 e08264 iris_dri.so (after)
text data bss dec hex filename
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (before)
8131009 214264 22820 8368093 7fafdd libvulkan_intel.so (after)
relinfo:
iris_dri.so (before): 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
[0] https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170&viewFallbackFrom=vs-2019
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
intel_perf_query_counter contains fields for things we can't or don't
want to store in our static data (like runtime-determined max values) or
oa_read_counter function pointers which are dependent on the GPU gen and
would make deduplication very ineffective.
Cuts 16 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (before)
926811 25920 0 952731 e899b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14190852 408908 210004 14809764 e1faa4 iris_dri.so (before)
14190852 391628 210004 14792484 e1b724 iris_dri.so (after)
text data bss dec hex filename
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (before)
8184097 240184 22820 8447101 80e47d libvulkan_intel.so (after)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
And specifically mark it with ATTRIBUTE_NOINLINE. Otherwise it will be
inlined and actually slightly increase code size.
Cuts 505 KiB from iris_dri.so and libvulkan_intel.so.
text data bss dec hex filename
1538720 0 0 1538720 177aa0 meson-generated_.._intel_perf_metrics.c.o (before)
926811 43200 0 970011 ecd1b meson-generated_.._intel_perf_metrics.c.o (after)
text data bss dec hex filename
14751700 365708 210004 15327412 e9e0b4 iris_dri.so (before)
14190852 408908 210004 14809764 e1faa4 iris_dri.so (after)
text data bss dec hex filename
8744913 214264 22820 8981997 890ded libvulkan_intel.so (before)
8184097 257464 22820 8464381 8127fd libvulkan_intel.so (after)
Relocations increase because the counter initializations are moved from
code (in .text) to pointers (in .text) to .rodata, which require
relocations.
relinfo:
iris_dri.so (before): 15605 relocations, 15385 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
iris_dri.so (after) : 17765 relocations, 17545 relative (98%), 452 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (before): 8560 relocations, 4829 relative (56%), 355 PLT entries, 1 for local syms (0%), 0 users
libvulkan_intel.so (after) : 10720 relocations, 6989 relative (65%), 355 PLT entries, 1 for local syms (0%), 0 users
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>
No changes in resulting code (yes, seriously!). GCC constant propagates
the static const arrays into the code, yielding bit for bit identical
results. This will however enable further cleanups.
Before this patch, we emit 11916 different initializations of
intel_perf_query_counter. With this patch we emit an array of 539 and
initialize the intel_perf_query_counters in terms of those.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15237>