We can actually create array surfaces instead of requiring single-slice
in a few cases. This does require us to be very careful about our
checks, though.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11647>
It doesn't matter for the actual copy rectangle and this makes the
asserts a bit nicer as we don't need to bother with the intratile
offsets because there aren't any yet.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11647>
This adds a helper isl_surf_get_uncompressed_surf for creating a surface
which provides an uncompressed view into a compressed surface. The code
is basically a direct port of the uncompressed surface code from the
Vulkan driver which, in turn, was a port from BLORP.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11647>
The unused bo_flags here is a leftover from the past. A similar
setup of bo_flags is now performed within anv_device_alloc_bo
via a call to anv_bo_alloc_flags_to_bo_flags.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11645>
We set the ex_desc to 0, since the address surface type is FLAT.
v2 (Sagar Ghuge):
- Fix message descriptor encoding
v2 (Jason Ekstrand):
- Drop support for block messages
Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
Bspec programming note metions that "Atomic messages are always forced
to "un-cacheable" in the L1 cache". We can make the L1 cache
un-cacheable and L3 with write-back policy.
v2: (Sagar Ghuge):
- Fix caching policy for atomic messages
- Fix simd exec size
v3: (Sagar Ghuge):
- Add atomic messages to brw_schedule_instructions
v4: (Jason Ekstrand):
- Rebase on lsc_msg_desc reworks
Co-authored-by: Sagar Ghuge <sagar.ghuge@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
This puts the basic infrastructure in place for lowering logical
dataport messages to LSC messages. We start with the two most obvious
opcodes and add more in later patches.
v2 (Sagar Ghuge):
- Pass required params to message desc
- Remove duplicate mlen calculation
- Change commit message.
v3 (Jason Ekstrand):
- Drop TGM support
Co-authored-by: Jason Ekstrand <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
v2 (Jason Ekstrand):
- Use lsc_msg_desc_opcode()
- Drop all opcodes for now and add them in later patches.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
v2: (Sagar Ghuge):
- rename addr_reg_size to src0_len to match with bspec
v3 (Jason Ekstrand):
- Re-arrange things in increasing bit order
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
v2 (Jason Ekstrand):
- Squash all the similar patches together
- Re-arrange and rename some things to be more consistent
- Add a lsc_opcode_has_cmask helper
- Drop is_one_addr_reg
v3 (Jason Ekstrand):
- Add transpose
- Re-order arguments to make more logical sense
- Switch from `write` to `has_dest`
Co-authored-by: Mark Janes <mark.a.janes@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
Xe-HPG comes with a massively reworked dataport. The new thing, called
Load/Store Cache or LSC, has a significantly improved interface.
Instead of bespoke messages for every case, there's basically one or two
messages with different bits to control things like address size, how
much data is read/written, etc. It's way nicer but also means we get to
rewrite all our dataport encoding/decoding code. This patch kicks off
the party with all of the new enums.
v2 (Jason Ekstrand, Mark Janes):
- Rename to LSC
v3 (Jason Ekstrand):
- Add numbers to all enums
Co-authored-by: Mark Janes <mark.a.janes@intel.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11600>
When vkCmdPushDescriptorSetKHR is used, the descriptor set is allocated
internally without belonging to any pool. Such descriptor set will be
visible on the GPU side because it's a part of the dynamic state stream,
but we still have to store its address in the array of descriptor sets.
Complements: 379b9bb7b0 ("anv: Support fetching descriptor addresses from push constants")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11577>
Surely the compiler would sort that out, you would think. But no, my
debugoptimized build improves
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime by 25%
on my SKL from this change.
This was the slowest test in the GLES31 tests on APL in CI, at 22s. And
yes, we were spending around half of our runtime in this function.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11631>
We're currently using the 32bit version which is dropping half the
bits of the 64bits values.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11607>
iris, i965, and anv define the PAGE_SIZE in anv_allocator and bufmgr
files. As on FreeBSD the page size is defined in machine/param.h that is
indirectly included by those files, we'd rather define it only when the
system is not FreeBSD to avoid compile errors.
v2: Changed the path in the comment to make clear that machine/params.h
is a FreeBSD system file.
Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11203>
"regs" is an array of 2 ->
"m" must be <= 2 ->
"components" array can be allocated on the stack
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11575>
Coverity complains that memset has no effect, because of size 0.
Size of BLEND_STATE struct is 0 on gfx [6, 7.5], so memset has
nothing to do there. This is of course harmless, but we can make
code simpler by replacing memset with an empty initializer list
and at the same time avoid a warning from Coverity.
CID: 1486015
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11574>
Also mark parse_error as printf-like to catch such errors with gcc.
CID: 1473100
CID: 1473101
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11574>
gcc complains:
../src/intel/tools/aub_write.c: In function ‘populate_ppgtt_table’:
../src/intel/tools/aub_write.c:254:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
254 | (void *)(aub->phys_addrs_allocator++ << 12);
| ^
../src/intel/tools/aub_write.c:258:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
258 | i, (uint64_t)table->subtables[i]);
| ^
../src/intel/tools/aub_write.c:273:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
273 | (level == 1 ? (uint64_t)table->subtables[i] :
| ^
../src/intel/tools/aub_write.c: In function ‘ppgtt_lookup’:
../src/intel/tools/aub_write.c:346:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
346 | return (uint64_t)L1_table(ppgtt_addr)->subtables[L1_index(ppgtt_addr)];
| ^
../src/intel/tools/intel_sanitize_gpu.c: In function ‘bo_size’:
../src/intel/tools/intel_sanitize_gpu.c:99:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
99 | return e ? (uint64_t)e->data : UINT64_MAX;
| ^
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11574>
In all cases both variables has a type of uint32_t, so multiplying
them will also generate uint32_t. The results of those multiplications
are used as uint64_t's, so Coverity thinks there might be integer
overflows here.
I don't think it's possible to hit them (query BOs should be relatively
small), but let's avoid those overflows.
CID: 1472820
CID: 1472821
CID: 1472822
CID: 1472824
CID: 1475934
CID: 1475927
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11574>
Coverity complains about out-of-bounds access in
intel_field_iterator_init, because it doesn't know that the GT_MODE
register has a size of 4 bytes. Add an assertion to verify that.
CID: 1474552
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11574>
This is the first time we see an application running out of mmap().
We essentially allocate too many batches (+65k) and end up not being
able to mmap them, at which point we can't mmap anything anymore and
things go sideways.
This change allocates bigger batch BOs as we grow an existing command
buffer. This drastically reduces the number of BOs we need to allocate
(the benchmark that reported the issue now reaches a max of ~630 BOs,
instead of reaching 65k and failing previously).
v2: Track the total batch size of command buffers (Jason)
Just give 0 for batch_len to i915 (Jason)
v3: Fix indentation (Jason)
v4: Drop uncessary reshuffling of error labels (Jason)
v5: Remove empty lines (Marcin)
v6: Limit BO growing to chunks of 16Mb (Jason)
v7: Add assert on initial size (Jason)
v8: Add define for max size (Jason)
v9: Fixup v7 assert for non softpin platforms (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4956
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11482>
VkGraphicsPipelineCreateInfo.pMultisampleState is a pointer to a
VkPipelineMultisampleStateCreateInfo structure, and is ignored if the
pipeline has rasterization disabled.
Fixes a crash in one CTS tests that checks this.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11601>
Rework:
* Jordan: Handle per_thread_scratch==0 in anv_scratch_pool_get_surf
* Jordan: Update subslices in anv_scratch_pool_alloc
* Jason: Clean up the patch a bit
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11582>
XeHP adds support for a new surface type for scratch. It's similar to
SURFTYPE_STRBUF in that it's a 2D array-of-struct format but the one
key difference is that the U coordinate is computed automatically based
on the thread ID and only the V coordinate is provided in the dataport
message.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11582>
v2 (Jordan Justin):
- add anv_gem_stubs.c impl
v3 (Jason Ekstrand):
- Use the upstream uAPI
- Rework the interface a bit
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599>
Create additional memory type with DEVICE_LOCAL_BIT if we have local
memory region aviable.
v2 (Jason Ekstrand):
- Don't leak mem_regions if the second ioctl fails
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599>
This gets rid of the awkward interface for isl_surf_get_ccs_surf where
we passed it two aux surfaces and it was supposed to fill out the second
one based on whether or not the first one already had stuff in it.
Instead, we now pass it three well-labled surfaces: surf,
hiz_or_mcs_surf, and ccs_surf which have obvious meanings. This does
mean that iris has to carry a bit of logic and we have to flip
parameters around in all the callers. But the resulting interface is
much cleaner.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11479>
Whether or not a surface supports CCS on Tigerlake and later is
dependent not only on the main surface but also on the MCS or HiZ
surface, if any. We were doing some of these checks in
isl_get_ccs_surf based on the extra_aux parameter but not as many as we
probably should. In particular, we were really only checking HiZ
conditions and nothing for MCS. It also meant that, in spite of the
symmetry in names, the checks in isl_surf_get_ccs_surf were more
complete than in isl_surf_supports_ccs.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11479>
The only driver which calls isl_surf_get_ccs_surf with extra_aux != NULL
is iris and it always calls it with two aux surfaces and never calls it
for CCS twice. We can turn those checks into asserts.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11479>
Initial implementation missed various fields that derive from the
primitive topology. This patch fixes 3DSTATE_RASTER/3DSTATE_SF,
3DSTATE_CLIP and 3DSTATE_WM (gen7.x) emission in the dynamic case.
Fixes: f6fa4a8000 ("anv: add support for dynamic primitive topology change")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4924
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11379>
Validation layers should warn you about this
(VUID-VkBindBufferMemoryInfo-size-01037) but this would be useful for
zink debugging.
Requested by Zmike.
v2: Also check memoryOffset (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11542>
Right now the accumulator-clearing move emitted by the generator for
Wa_14010017096 inherits the SWSB field from the previous instruction.
This can lead to redundant synchronization, or possibly more serious
issues if the previous instruction had a TGL_SBID_SET SWSB
synchronization mode. Take the SWSB synchronization information from
the IR.
Fixes: a27542c5dd ("intel/compiler: Clear accumulator register before EOT")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
This is unlikely to have had any negative side effect on the original
TGL, but will lead to issues on XeHP+ if the software scoreboard pass
isn't able to synchronize the accumulator writes.
Fixes: a27542c5dd ("intel/compiler: Clear accumulator register before EOT")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
In cases where an in-order instruction is overwriting a register
previously read by another in-order instruction, drop the dependency
iff the previous read is guaranteed to have occurred from the same
in-order pipeline. This should only have an effect on XeHP+ since
previous Xe platforms only had one in-order FPU pipeline.
The previous workaround we were using for this treated all ordered
read dependencies as write dependencies to avoid noise from our
simulation environment. Relative to our previous workaround this
improves performance of GFXBench5 gl_tess by ~7% on a DG2 system
among other single-digit percentual FPS improvements.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
The hardware fails to provide the expected data coherency guarantees
for accumulator registers when accessed from multiple FPU pipelines.
Fix this by tracking implicit accumulator accesses just like we do for
regular GRF registers, but instead of adding synchronization
annotations for any dependency we only do it for dependencies with a
pipeline mismatch, since the hardware should be able to guarantee
proper synchronization for matching pipelines.
Note that this workaround handles RaW and WaW dependencies in addition
to the WaR dependencies described in the hardware bug report even
though cross-pipeline RaW accumulator dependencies should be extremely
rare, since chances are the hardware will also hang if we ever hit
such a condition. This only affects XeHP+, since all FPU instructions
are executed as a single in-order pipeline on earlier Xe platforms.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
This change reduces the precision of the scoreboard data structure for
accumulator registers, because the rules determining the aliasing of
accumulator registers are non-trivial and poorly documented (e.g. acc0
overlaps the storage of acc1 when the former is accessed with an
integer type). We could implement those rules but it wouldn't have
any practical benefit since we currently only use acc0-1, and for the
most part we can rely on the hardware's accumulator dependency
tracking. Instead make our lives easier by representing it as a
single register.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11433>
This is now 100% equivalent to the new rt_resume intrinsic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
v2: Turn a bunch of pointer checks into checks against NULL (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
Upon looking at caching the raytracing shader (in particular the
trampoline one) I kind of got afraid that some of the keys used for
blorp would end up matching other keys. This is because blorp keys are
fairly simple. There is no SPIRV module hash included.
This change includes a "blorp" string at the beginning of the queue to
ensure we don't collide with other keys.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This allows us to convert a 64-bit address to an anv_address which is
useful for working with device addresses.
v2: switch to int64_t to keep state pool relative relocation working
on non-softpin platforms
v3: Update assert to reflect relative offsets (Jason)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This is required in order to be able to use GenXML pack functions for
structs with addresses when you're not packing into a batch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
These don't necessarily go in any group but are required for dispatch to
work properly. The trampoline is a compute shader that is the initial
start point for the trace. It's in charge of invoking the actual
ray-gen shader. The trivial return shader is used whenever another
shader is missing and it does no work except the minimum required to do
a stack return.
v2: Rebase on upstream changes (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This doesn't look too different from other compile functions we have in
anv_pipeline.c. The primary difference is that ray-tracing pipelines
have this weird two-stage thing where you have "stages" which are
individual shaders and "groups" which are sort of mini pipelines that
are used to handle hits. For any given ray intersection, only the hit
and intersection shaders from the same group get used together. You
can't have an intersection shader from group A used with an any-hit from
group B. This results in a weird two-step compile.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
Bindless shaders don't have binding tables so they have to get at the
descriptor sets via a different mechanism.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
They don't have binding tables so they have to use A64 descriptor set
access and everything has to be bindless all the time.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
Instead of depending on the driver to compile each resume shader
separately, we compile them all in one go in the back-end and build an
SBT as part of the shader program. Shader relocs are used to make the
entries in the SBT point point to the correct resume shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This commit adds a delta to be added to the relocated value as well as
the possibility of multiple types of relocations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
They're common between the two drivers and we want to add a couple more
that get emitted from code in src/intel/compiler.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This just adds the core data structure which we'll build on going
forward.
v2: Add VK_EXT_pipeline_creation_cache_control handling (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This makes a bunch of loops use ARRAY_SIZE instead of MESA_SHADER_STAGES,
extends a few arrays, and adds a bunch of array length asserts.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This just adds a base struct and trivial implementations of all the
create/destroy/bind functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
This makes dEQP-VK.api.version_check.entry_points pass and matches how
other drivers are handling this case. We do not support the feature but
still need to provide a dummy entrypoint.
v2: throw error if/when called (Jason)
Fixes: 0d031d1da3 ("anv: toggle on VK_EXT_extended_dynamic_state2")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11503>
For debug on Android, it's useful to be able to print shaders to the
android log interface, since you don't usually have stdout/stderr.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9262>
Fix defect reported by Coverity Scan.
Missing break in switch (MISSING_BREAK)
unterminated_case: The case for value
VEC4_OPCODE_ZERO_OOB_PUSH_REGS is not terminated by a break
statement.
Fixes: 89fd196f6b ("intel/vec4: Add support for masking pushed data")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11347>
This patch only enables the below VkFormat:
- VK_FORMAT_G8_B8R8_2PLANE_420_UNORM
This patch ensures the proper behavior of the below APIs:
- vkGetPhysicalDeviceFormatProperties2
- vkGetPhysicalDeviceImageFormatProperties2
- vkCreateImage
- vkGetImageSubresourceLayout
- vkGetImageDrmFormatModifierPropertiesEXT
- vkGetImageMemoryRequirements
- vkGetImageMemoryRequirements2
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11281>
Add initial multi-planar format support on the images with modifiers:
- With aux usage,
- Format plane count must be 1.
- Memory plane count must be 2.
- Without aux usage,
- Each format plane must map to a distinct memory plane.
For the other cases, currently there is no way to properly map memory
planes to format planes and aux planes due to the lack of defined ABI
for external multi-planar images.
This patch doesn't include some potentially supported cases like all
format planes mapping to a single memory plane, additional refactoring
is needed to workaround explicit base offset + ANV_OFFSET_IMPLICIT.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11281>
Commit 24342e499b changed how primitive
topology is handled on Gfx8+ but missed updating the Gfx7.x code.
As a result, tests which previously used topologies like PATCHLIST_3
instead started using bogus ones like LINESTRIP_ADJ. This caused a
GPU hangs in a bunch of Vulkan conformance tests involving tessellation.
This fixes those hangs.
Fixes: 24342e499b ("anv: fix dynamic primitive topology for tess")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11434>
This has two steps. First, for each range we look at the memory object
and see if it actually needs flushing before we start throwing CLFLUSH
instructions. Second, we look at the whole list of types on device
initialization and decide whether or not we need CLFLUSH at all. The
first part should speed up atom chips a bit since we're currently
CLFLUSHing everything even when we don't need to. The second isn't
needed on most of today's parts because we base it on !has_llc but it is
needed for discrete parts. It's also over-all cleaner.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11364>
HDC Pipeline Flush is the correct method for flushing HDC
pipeline on Gfx12+ HW. Continue using DC Flush for earlier HW.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9834>
Tile Cache flush flushes all Color/Depth values from L3 cache
to memory in Unified Cache mode. This is only required when
CPU access is required.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9834>
On Gfx12+, flushing tile cache ensures color/depth values are
globally visible, but that's expensive. Most operations only
need values to be GT-visible which can be achieved with depth
or rt flush. Remove a bunch of unnecessary Tile Cache flushes.
Fast clears and slow depth clears still require Tile Cache flush.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9834>
This is a gallium driver for the Intel gfx 4-7 GPUs.
It was initially cloned from the iris driver by Ilia Mirkin,
then I ported over large reams of code from i965 until it worked.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11146>
965 and the mesa st disagree on how vertex elements are ordered when
edgeflags are involved. 965 wants them in gl_vert_attrib order,
but gallium supplies the edgeflag as the last vertex element regardless.
This adds a flag which is enabled for gen4/5 to denote that the
edgeflag is at the end. When we reap 965 later we can resolve this
better.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11146>
Now that there's a common NIR pass, there's no point in us doing this in
the back-end anymore. In order to use this pass in i965, we do have to
make one tiny change. Gallium runs the pass after assigning input and
output locations and so needs the pass to respect those locations and
num_inputs. i965, however, runs it before any location assignment or
I/O lowering so we don't care. We do, however, need the pass to succeed
with num_inputs == 0 because we set that later.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11313>
With Yf and Ys tiling, everything is actually four dimensional because
we can have multiple depth or multisampled array slices in the same
tile. This commit just enhances the calculations so they can handle it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11330>
We need to do this in order to handle Yf and Ys tiling because they use
a four-dimensional tile instead of laying everything out in two
dimensions.
v2 (Jason Ekstrand):
- Update functions added since v1:
- isl_surf_get_image_range_B_tile
- blorp_can_hiz_clear_depth
- get_image_offset_el
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11330>
This should drop the CPU overhead of processing buffers on SKL+ by
dropping some of the logic contained in anv_reloc_list_add() whenever we
have enough compile-time information to know we have softpin.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11236>
The relocation list currently serves two purposes. One is for
relocations on older non-softpin platforms. The second is to keep track
of driver-managed BOs which are used by the given command buffer. We
going to need a mechanism to add BOs to the command buffer without doing
a relocation into the batch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11236>
Whenever we have the GFX_VERx10 macro available, we can make use_softpin
a compile-time thing for everything but Broadwell and Cherryview. This
should save us some CPU cycles especially on SKL+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11236>
Softpin was added to i915 in
commit 506a8e87d8d2746b9e9d2433503fe237c54e4750
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Dec 8 11:55:07 2015 +0000
drm/i915: Add soft-pinning API for execbuffer
which was included in Linux 4.5. It's been over 5 years so it's
probably reasonable to make it a hard requirement.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11236>
Mesh and Task shaders can use workgroup memory, so generalize its
handling in anv by moving it from anv_pipeline_compile_cs() to
anv_pipeline_lower_nir().
Update Pipeline Statistics accordingly.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11230>