The limit here is from the RENDER_SURFACE_STATE height/width/depth
fields - it's 2^30 for ISL_FORMAT_RAW buffers, and 2^27 otherwise.
anv_isl_format_for_descriptor_type() uses ISL_FORMAT_R32G32B32A32_FLOAT
for uniform buffers when compiler->indirect_ubos_use_sampler is set
(Icelake and earlier), but ISL_FORMAT_RAW when it isn't (Tigerlake+).
So we can increase the limit on Tigerlake and later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14892>
Meson devenv is a feature added in meson 0.58 (thus the features is
version guarded) that allows creating a shell environment with
environment variables automatically setup for running the project inside
the build dir. Some variables (such as LD_LIBRARY_PATH and PATH) are set
automatically, others must be added by the project.
For vulkan is is relativley simple, we create a new, uninstalled, icd
file for each driver and set the VK_ICD_FILENAMES variable
appropriately. This can be used with:
```sh
meson devenv -C $builddir
```
then, vulkan applications will automatically use the uninstall vulkan
driver, no need to install.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14826>
This brings in some interesting new vulkan tests and fixes for the
spurious KHR-GL TF failures. Also, reduces the runtime of
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36 so that it
should stop timing out.
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13779>
The Mesh pipeline is implemented as a variant of the
regular (primitive) Graphics Pipeline.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13662>
Per primitive & attachment shading rate support added.
v2: Rebase on KHR_dynamic_rendering
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>
For instance, the current code in genX_cmd_buffer.c assumes that the
depth/stencil attachments & resolves will be at the end of all
attachments, but that won't be the case anymore with fragment rate
shading.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>
Those surfaces are used as attachment to rendering passes and describe
the rate of coarse pixel shading for the pass.
v2: Move CPB_BIT tile filtering to isl_gfx125_filter_tiling() (Nanley)
v3: Drop unused macro (Nanley)
s/isl_to_gen/isl_encode/ (Nanley)
Remove pitch alignment 128B constraint already covered by tiling (Nanley)
Move some asserts together (Nanley)
v4: Disable miptail for now (Nanley)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>
DG2 introduces per primitive coarse pixel settings (in stages
preceding the PS shader) and also a control surface specifying the
rate at through the resulting surface.
v2: update comment (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13739>
Add text describing why HierarchicalDepthBufferEnable must be set along
with SeparateStencilBufferEnable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14825>
The assert for the TiledSurface field caught a programming error, but
with a segfault instead of the usual route of assert-failing. We only
set this field when we have a depth surface, but we also need to set it
when one isn't provided. Fix this issue and drop the assert.
Fixes: b77d694223 ("intel/isl: Allow HiZ with Tile4/64 surfaces")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5950
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14825>
For SNB and prior, assert that the surface is Y-tiled and use constants
when configuring the tiling parameters. This makes a follow-on commit
clearer.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14825>
This option only applies to relative shuffles (up/down/xor), and in a
moment we're going to add an option to lower normal shuffles, so rename
it.
While we're here, rename lower_shuffle() to lower_to_shuffle() for
similar reasons.
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14412>
This guarantees that we get an image that's created with exactly the
swapchain image creation parameters instead of trying to emulate it
inside the driver. Ideally, we'd use the fancy new bind helper too but
our magic ANV_IMAGE_MEMORY_BINDING_PRIVATE gets in the way and we really
do want to re-bind ourself.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12031>
With Vulkan 1.1, we have a VkImageSwapchainCreateInfoKHR struct which
lets you create a new VkImage which aliases a swapchain image. However,
there is no corresponding swapchain create flag so we have to set
VK_IMAGE_CREATE_ALIAS_BIT all the time.
We need to do a bit of work in ANV to prevent it from asserting the
moment it sees one of these. Fortunately, they're already safe because
WSI images go through a different bind path for
VkBindImageMemorySwapchainInfoKHR.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12031>
Replace load_mesh_global_arg_addr_intel with a more general intrinsic
load_mesh_inline_data_intel, since inline data now hold both
a pointer descriptor information and the first few push constants.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14788>
The format on this platform is slightly different from the one used on
TGL. Also it's part of the surface state instead of an aux-map.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14355>
The name of the bit field is CompressionFormat. The format subsections
of the field specify the alternate names of RenderCompressionFormat or
MediaCompressionFormat depending on the compression type.
We're going to start programming this field for media compression, so
we'd like to use either the bit field name or a new
MediaCompressionFormat field. Either option seems fine, so we go with
the first.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14355>
We haven't tested the single-sampled case, so this doesn't enable any
more uses of standalone CCS. This does however, allow multisampled
surfaces with MCS or HIZ to get upgraded to MCS_CCS and HIZ_CCS,
respectively.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14431>
On XeHP, CCS doesn't require VMA on XeHP. The HW provides anything
allocated in LMEM a mapping to a CCS memory range for free. So, we:
1) use the implicit CCS framework to avoid adding an image memory
binding for the CCS surface.
2) leave each BO sized as-is instead of adding on space for the CCS.
Thankfully the framework only adds on space if an aux-map is present.
XeHP has no aux-map, so this patch doesn't explicitly do anything for
this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14431>
The fallback is incompatible with allocations that use CCS on XeHP. On
that platform, compression can't be used in SMEM.
Apps should be okay with this change. They're able to manage local and
system memory heaps directly (see VK_EXT_memory_budget).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14431>
ISL reports no CCS support for these formats, except for
R10G10B10A2_UNORM_SRGB. Anv doesn't support that format at all however.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14431>
this is triggered by mesa's own wsi handling, so stop printing nonsense
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14743>
Commit e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
accidentally disabled CCS_E on TGL+ because it checked for
image->vk.mip_levels > 0 instead of image->vk.mip_levels > 1.
Instead of reverting it, we remove the code which disables CCS_E for
mipmapped or arrayed images now that we've sufficiently handled the
clear color issue in other ways.
Fixes: e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14723>
On TGL, if a block of fragment shader outputs match the surface's clear
color, the HW may convert them to fast-clears (see HSD 14010672564).
This can lead to rendering corruptions if not handled properly. We
restrict the clear color to zero to avoid issues that can occur with:
- Texture view rendering (including blorp_copy calls)
- Images with multiple levels or array layers
Fixes: e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14723>
CCS_E is currently disabled on TGL+, but we'll enable it soon. We choose
to explicitly disable it for certain copy operations to avoid CTS
failures in the following groups:
- dEQP-VK.drm_format_modifiers.export_import.*
- dEQP-VK.synchronization*
Fixes: e614789588 ("anv: Also disallow CCS_E for multi-LOD images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14723>
Includes a fake framebuffer allocation that's necessary for the code we
still use from the regular render passes.
v3: (Lionel)
- Reuse the attachment count from the faux render pass, remove now
unused function
- Add a cmd_buffer_end_rendering function to match begin_rendering,
making use of the split stuff from end_subpass
v4: (Lionel)
- Don't bother with mark_images_writen or resolves on suspend case
- Remove flush at the end of end_rendering, it's not needed
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13980>
v3: (Lionel)
- Handle VkPipelineRenderingCreateInfoKHR not being present
- Rename dynamic_pass and set it for regular render passes too
v4: C99 is good (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13980>
There's two of them because they can be created from three points in the
code that provide different details and this is the least ugly way I
could think of for now.
v2: Avoid allocations (Lionel)
v3: Move definition closer to its usage (Lionel)
v4: (Lionel)
- Simplify anv_dynamic_pass_init_full
- Zero out pass/subpass to avoid stall pointers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13980>
On platforms where we don't support 64 bit instructions we shouldn't
pass such instructions for the code generator to lower into supported
instructions, because this makes their execution pipeline
unpredictable to the scoreboard lowering pass on XeHP+ platforms.
We really should be reducing all these 64 bit instructions before code
generation, so here we add an assert to help us catch and fix these
cases more easily.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[ Francisco Jerez: Also allow has_integer_dword_mul. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This fixes a bug in the CLUSTER_BROADCAST code generation that causes
the original IR region to be ignored, this will be a problem when we
start lowering 64-bit CLUSTER_BROADCAST instructions at the IR level,
since it will lead to instructions with non-trivial regioning.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
One of the two SHUFFLE implementations wasn't taking into account the
destination stride at all, and the other (more commonly used) one was
taking it into account incorrectly since brw_reg::hstride represents
the stride logarithmically, so we need to use a left-shift operator
instead of product. Found by inspection.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This fixes a bug in the handcrafted SIMD lowering done by the SHUFFLE
code generation, which wasn't taking into account the source and
destination region strides while deciding whether it needs to split an
instruction.
v2: Use new element_sz() helper instead of left shift. (Lionel)
Fixes: 90c9f29518 ("i965/fs: Add support for nir_intrinsic_shuffle")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
This adds some generic infrastructure that allows splitting any
instruction into a number of instructions of a smaller legal execution
type. This is meant to replace several instances of handcrafted 64bit
type lowering done manually in the code generator, which is rather
error-prone, prevents scheduling of the lowered instructions, and
makes them invisible to the SWSB pass on Gfx12+ platforms, which will
become especially problematic on Gfx12.5+ since the EUs introduce
multiple asynchronous execution pipelines which the SWSB pass needs to
be able to synchronize to one another, so it's critical for the real
execution type of the instruction to be visible to the SWSB pass.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Right now the execution type lowering functionality of this pass
assumes that an integer type of the original bit size is always
acceptable, however we'll want more complex behavior than that in
order to leverage this pass to automate the lowering of unsupported
64-bit operations into multiple 32-bit operations.
In order to do that calculate the closest legal execution type from a
new helper function, and take advantage of that function from the
has_invalid_exec_type() helper, along the lines of other
lower_regioning() helpers structured as a pair of has_invalid_foo() +
required_foo() functions.
This shouldn't have any functional changes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
Previously the software scoreboard structure would drop previous
dependencies for a given register and replace them with the most
recent one for the same register when a new instruction (or set of
instructions) is processed. This worked correctly on the Gfx12LP
platforms this code was originally designed for, because a repeated
dependency on the same register would either require the second
instruction to synchronize against the first (so the first dependency
could be disregarded from that point on) *or* require the dependency
to be RaR and in-order, which allows the synchronization to be
optimized out (the first dependency could still be disregarded as
well, since the pipeline is in-order). However the latter assumption
will break on upcoming Gfx12HP platforms, because they have multiple
asynchronous FPU pipelines, so whenever we hit a RaR dependency we
need to propagate forward both dependencies, since the order in which
both reads will complete is not guaranteed by the hardware in cases
where they occur from different asynchronous pipelines.
Note that this dependency propagation change requires us to change the
definition of dependency::done as well, since that constant is defined
to discard any previous dependency information when used as argument
for shadow().
This has been reported to fix the following conformance failures on DG2:
KHR-GL46.shaders.uniform_block.random.all_per_block_buffers.19
dEQP-GLES3.functional.shaders.derivate.fwidth.*
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5670
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
We tightened the requirements for multisampling on Gfx7 but didn't
format that at the Vulkan level.
This will break more conformance tests on Gfx7, but we weren't
conformant anyway.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 531b1b7511 ("intel/isl: Strengthen MCS SINT format restriction")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14679>
This introduces a new blorp_copy() path using the new XY_BLOCK_COPY_BLT
blitter command introduced on Tigerlake. Unlike the blitter commands of
old, this one is actually fast and worth using. Although it doesn't use
shaders like the rest of BLORP, we still can use some surface-munging
code from there, and BLORP also provides a nice place to put this which
is shared among the drivers.
To use the new path, set BLORP_BATCH_USE_BLITTER (much like Jordan's
recent BLORP_BATCH_USE_COMPUTE bit) and target the batch at the copy
engine.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This will be used as a performance hint for XY_BLOCK_COPY_BLT to
indicate whether the source/destination surfaces are (likely) in
device-local memory or system memory. We don't need to be precise
here - it's okay to set the fields to LOCAL even if a buffer has
been evicted out to system memory.
We should set this from Vulkan too, but I haven't yet. There isn't
a convenient anv_bo field like there is in iris...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This should only be set on XeHP. It implies that CCS works via based on
the virtual addresses involved and a flat memory carve-out, rather than
treating CCS like a surface, or using auxiliary maps.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14687>
This change introduces the anv_descriptor_size_for_mutable_type and
anv_descriptor_data_for_mutable_type helpers to compute the size and
data flags respectively for mutable descriptor types.
In order to make handling these types easier we now store a precomputed
descriptor stride for all types and use in the in appropriate places.
We also need to adjust the compiler to take into account this descriptor
stride. To that extent, we now pack the stride into the upper 16 bits
alongside the index and the dynamic offset index and use it later to
compute the correct offset.
Closes: #4250
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14633>
Simplify the buffer chaining process with a single loop and
a helper function from Lionel Landwerlin's input.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jianxun Zhang <jianxun.zhang@linux.intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14578>
Add L1 cache control bit field to RENDER_SURFACE_STATE and
STATE_BASE_ADDRESS instruction.
v1: (Jason)
- Add prefix to bit field
- Don't miss out STATE_BASE_ADDRESS instruction
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14676>
This enables MCS support on XeHP, now that MCS can be created with a
tiling supported by that platform.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14464>
This reverts commit 2f0fbe06e6.
We don't handle the reconfiguration of existing HiZ surfaces, nor do we
do so for CCS surfaces. This code path is unused, so we remove it.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14464>
The gfx7 MCS restriction for SINT formats actually applies to
multisampling in general. Place the stronger restriction in the format
support check and assert this in isl_surf_get_mcs_surf.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14464>
The pitch check for 16x multisampled surfaces should already be covered
by the pitch_in_range check in isl_calc_row_pitch.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14464>
It's not obvious why the (gl_FrontFacing ? -1.0 : 1.0) case was handled
different for Gfx12+ than for previous generations, and it's not
correct. It tries to negate the result as an integer, and it does this
before the mask operation that clears the other bits in the value.
When we eventually support dual-SIMD8 dispatch, the other front-facing
bit is in g1.6 at bit 15, so similar code should be possible there.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: c92fb60007 ("intel/fs/gen12: Implement gl_FrontFacing on gen12+.")
Closes: #5876
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14625>
AUX and clear state is stored in the VkDevice private binding
Signed-off-by: Renato Pereyra <renatopereyra@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14416>
With the proper version checking in the common vulkan instance code
(commit 88b9b68) it is now possible to bring the reported interface
version up to v5.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14563>
This cuts the compile time down for this file on my ryzen from
real 1m4.077s
to
real 0m30.827s
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14630>
This creates an internal shader_prim enum, I've fixed up most
users to use it instead of GL types.
don't store the enum in shader_info as it changes size, and confuses
other things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
To avoid dragging gl.h into places it has no business being,
defined tessellation primitive mode to an enum.
This has a lot of fallout all over the place.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
When doing copies of descriptors from one set to another, that contain
either a UNIFORM_BUFFER or STORAGE_BUFFER, both the buffer view &
surface state are allocated from the source descriptor. Therefore we
need to copy their content otherwise we could run into lifecycle
issues when the source descriptor is destroyed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14585>
VARYING_SLOT_{VIEWPORT,LAYER,PSIZ} all live in the same VUE header slot,
and the FS is already set up to read the x/y/z/w component of that vec4.
However, we were setting up the SBE to pass each of those items as a
separate FS input, so hypothetically if a shader read all three, we
would burn 3 FS inputs with redundant data. Not only was this passing
extra data to the FS, but it would count as extra input slots for the
"Do we have 16 or fewer attributes?" check for using SBE swizzling to
rearrange them in a convenient manner.
Now we make them share a single FS attribute and only count them once.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14210>
During debug builds, if apply_hwconfig is not set, then the devinfo
value will be compared with the hwconfig value. If they don't match
then a warning message will be logged to stderr.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13866>
This will be used to conditionally use hwconfig values to update
intel_device_info at runtime.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13866>
This will be used by crocus and iris to clamp pointsizes only
on the last stage of the shader compile.
Fixes: 3077d96856 ("crocus: Clamp VS point sizes to the HW limits as required.")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14359>
The switch statement in anv_descriptor_data_for_type() shows that this
field isn't used on SKL+.
On XeHP, this avoids assert failures by preventing
isl_surf_fill_image_param() from being called. That function doesn't
expect Tile4 surfaces.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14546>
v2: Fixup gpu_id computation, use minor of /dev/dri/* % 128 since we
don't know whether we get card0 or renderD128 for instance.
(Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com> (v1)
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13996>
This could lead to confusing if the 32bits roll over (every ~6mn or
so).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ef6698a26 ("intel/ds: drop timestamp correlation code")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13996>
Rather than using always the same metric set, let the user choose when
starting the producer with :
INTEL_PERFETTO_METRIC_SET=RasterizerAndPixelBackend ./build/src/tool/pps/pps-producer
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13996>
Will be useful to figure out when blorp operations end.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13996>
As indicated by
VkPhysicalDeviceFragmentShadingRatePropertiesKHR::fragmentShadingRateWithShaderSampleMask
our implementation will clamp to 1x1 when reading samplemask or
writing to samplemask.
This fixes vkd3d-proton tests test_sample_mask_dxbc & test_sample_mask_dxil
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b6332fc4a8 ("intel/compiler: handle coarse pixel in render target writes descriptors")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14553>
Because 0 is no longer a recognizable value (it's NEVER, which isn't a
good default), we add an emit_alpha_test bool to tell the back-end when
to bother alpha testing. This lets us only touch crocus with the
change.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14157>
i915 does not report slices accurately anymore on Gfx12.5+. Since this
is information we need to have for performance queries, we need to
rebuild it here.
v2: Remove invalid change to pixel pipes computations (Jordan)
v3: Fix index calculation (Curro)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14297>
We're not saving the first pool.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 36ea90a361 ("anv: Convert to the common sync and submit framework")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14420>
Suggested by Lionel Landwerlin, we add has_bit6_swizzle as
another input when computing driver uuid.
Also fix miscalculation of the length of driver tag.
Signed-off-by: Jianxun Zhang <jianxun.zhang@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13936>
The chipset_id should be named after i915 ioctl that's called
to get the device id. In user space this field holds pci device
id in reality. We now have a pci_device_id queried from drm
instead using the ioctl, so there is no much reason to keep
the chipset_id for the same purpose.
Signed-off-by: Jianxun Zhang <jianxun.zhang@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13936>
Dump PCI bus and device info so that we can easily identify output
in a multi-gpu system.
Signed-off-by: Jianxun Zhang <jianxun.zhang@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13936>
With the new input from PCI bus and device fields, we can compute
device uuids in a multi-gpu system.
Signed-off-by: Jianxun Zhang <jianxun.zhang@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13936>
Having PCI bus and dev info in the base struct
'intel_device_info' enables us to utilize the info across
multiple drivers for several purposes, such as computing
device uuids in a multi-gpu system.
Signed-off-by: Jianxun Zhang <jianxun.zhang@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13936>
This field is an average computation that is not actually useful for
any of our driver code.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14510>
Follow the CCS and MCS functions by returning false for unsupported
cases. This reduces the burden on the caller.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14091>
The new HiZ compresses twice as many rows of the depth surface compared
to TGL (Bspec 47009). Also, its tiling needs to be specified in
3DSTATE_HIER_DEPTH_BUFFER_BODY::TiledMode.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14091>
An 8x4 HiZ block doesn't fit in with the new formulas for sizing HiZ on
XeHP. Update a comment which assumed this block size on SKL+.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14091>
* Check the format's compression type instead of the format directly to
prepare for a new HiZ format on XeHP.
* Adjust the gfx12+ calculations so that XeHP will automatically be
handled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14091>
Anv allows non-8x4-aligned depth buffer clears, but it has multisampled
HiZ disabled for BDW. iris allows multisampled HiZ on BDW, but disallows
non-8x4-aligned depth buffer clears.
Drop the unused optimization for non-8x4-aligned clears of multisampled
surfaces on BDW and use this opportunity to use some PRM text in the
code comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14091>
Binding table pool runs out of capacity quickly on modern games,
requiring new Surface Base Address instructions to be sent. That
is costly due to flushes and stalls. Increasing BT pool capacity
to 64KB improves performance several workloads.
Fallout4 +4%
Shadow of the Tomb Raider +4%
Borderlands3 +3%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14483>
We're using the wrong helper to get the subslice total count.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c24ba6cecb ("intel/dev: Handle CHV CS thread weirdness in get_device_info_from_fd")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14492>
The coarser 32x32 cross-slice hashing mode seems to lead to better L1
and L2 utilization due to the improved execution locality, however it
can also lead to a bottleneck in a single slice, especially in
workloads that concentrate heavy rendering in small areas of the
screen (e.g. SynMark2 OglGeomPoint, OglTerrain*) -- This effect is
mitigated here by performing a permutation of the pixel pipe hashing
tables that ensures that adjacent rows map to pixel pipes as far away
as possible in the caching hierarchy.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
Note that this has an effect even for unfused native die platforms,
since the pixel pipe hashing tables we intend to program aren't
equivalent to the hardware's defaults on such configs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
This starts off with the simplest possible pixel hashing table
calculation that just assigns consecutive indices (modulo N) to
adjacent entries of the table, along the lines of the existing
intel_compute_pixel_hash_table(). The same function will be improved
in a future commit with a more optimal calculation.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
In order to avoid some duplication between the GL and Vulkan driver,
which will get worse as we introduce additional code in order to
handle more recent generations.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
It's now an array with 7 tables, each table is intended to specify the
pixel pipe hashing behavior for every possible slice count between 2
and 8, however that doesn't actually work, among other reasons due to
hardware bugs that will cause the GPU to erroneously access the table
at the wrong index in some cases, so in practice all 7 tables need to
be initialized to the same value.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
The state cache invalidation shouldn't be necessary on recent
platforms. On ICL it *seems* to be required to get the hardware to
pick up an updated indirect clear color, so this change is only
applied to TGL platforms and later for the moment.
On some DG2 configs this seems to improve SynMark2/OglDrvRes by 16.0%
±0.1%, n=8.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
The current packed dispatch assumptions for fragment shaders seem to
be the reason that the fs-readFirstInvocation-uint-loop Piglit
test-case for the ARB_shader_ballot extension fails on DG2 in
combination with the patches in this series that enable pixel pipe
hashing (thanks Jordan for reporting the regression). I've confirmed
that the brw_fs_test_dispatch_packing() test fails on DG2 hardware for
fragment shaders, while it succeeds for other shader stages,
indicating that the PSD hardware no longer guarantees packed dispatch.
Disable it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13569>
Actually, no, there's no need to do anything, just update some
comments for the record. An earlier revision of this change that
implemented the workaround text to the letter required no less than 8
new PIPE_CONTROLs throughout the tree. However Felix Degrood noticed
that the cost of some of the PIPE_CONTROLs was showing up in workloads
like Shadow of the Tomb Raider. The Windows driver wasn't emitting
many of those pipe controls, contrary to the W/A instructions, so we
engaged in a back and forth with the hardware team, who concluded that
the original suggested workaround was unnecessarily strict, and the
Windows driver's behavior acceptable. It turns out that Wa_1408224581
we had already implemented for TGL is roughly equivalent to the
Windows behavior, so no need to do anything new after all.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14278>
XeHP platforms require the invalidation of the instruction cache after
a STATE_BASE_ADDRESS change due to a hardware bug potentially leading
to instruction cache pollution. Note that the workaround text says
it's applicable "DG2 128/256/512-A/B", however it's also marked as
permanent and not confirmed to be fixed in any specific steping, so we
apply it to all Gfx12HP platforms.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14278>
The MOCS entry used for this on Tigerlake doesn't exist on DG2.
Ref: aca31baafc ("isl: Enable Tigerlake HDC:L1 caches via MOCS in various cases.")
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14467>
Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but
not signed dot product.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>