Just check across all contexts if a resource is referenced.
Fixes: 6bbbe15a78 ("Reinstate: llvmpipe: allow vertex processing and fragment processing in parallel")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17702>
When a flush happens the per-context setup is used to hold the fence
for the last scene sent to the rasterizer. However when multiple
contexts are in use, this fence won't get returned to be blocked on.
Instead move the last fence to the rasterizer object, and return
that instead as it should be valid across contexts.
Fixes gtk4 bugs on llvmpipe since overlapping vertex/fragment.
Fixes: 6bbbe15a78 ("Reinstate: llvmpipe: allow vertex processing and fragment processing in parallel")
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17702>
Semaphore waits in a command buffer only affect the first jobs we execute
in each hardware queue since jobs in the same queue are serialized against
each other. Binning syncs in particular, only affect CL jobs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17594>
We only want to cleant BCL barrier flags if we consume a BCL barrier.
For example, if the client records a barrier for an index buffer
it should apply to the next draw call that uses an index buffer
which may not be the current draw call but one coming after it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17594>
if the src and dst for these operations can both be promoted, then the
entire operation can be promoted to potentially avoid splitting renderpasses
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17667>
previously this would opportunistically promote barriers to the unordered
cmdbuf only if a renderpass was active or there was no access, which
was the wrong approach
instead, opportunistically promote barriers to the unordered cmdbuf
any time it's possible to do so, which is when one of these conditions is true:
* when there is no access to the resource on the current cmdbuf
* when the only access to the resource is in the unordered cmdbuf
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17667>
Instead of having 2 VkMemoryType pointing to the same VkMemoryHeap, we
have each VkMemoryType with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT (one
host visible, the other not) point to its own VkMemoryHeap. For the
local heap that is host visible, we'll use the
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag at GEM BO creation.
When the smallbar uAPI is not available we fallback to a single heap
and do not use I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS.
v2: Handle probed_cpu_visible_size == probed_size (Matthew)
v3:
* Jordan: Use region info from devinfo
v4: Also make the vram host visible heap as local (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739>
if this isn't supported, don't use rect-related sampling
cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17714>
The CTS does a lot of 1x1x1 compute shaders (all that stuff like
dEQP-GLES31.functional.shaders.builtin_functions.precision.mul.highp_compute.scalar)
which finish with store_ssbos. Instead of doing the invocation loop in
that case (which LLVM has to later unroll), just emit the single
invocation's store.
Fixes timeouts running
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.36, which does
a spectacular number of SSBO stores in a long 1x1x1 compute shader.
Reduces runtime of on llvmpipe from 66s to 29s locally, and virgl from
1:38 to 43s. virgl
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.22
goes down to 7 seconds.
Fixes: #6797
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17730>
Let's be slightly more defensive here. If a texture image doesn't have
an associated pipe_resource allocated, then render_texture() will pass
that along to _mesa_update_renderbuffer_surface(), which will crash on a
NULL pointer dereference. So, if there isn't a pipe_resource, then we
should just skip this altogteher.
Today, this isn't an issue, because each gl_texture_image always
allocates a pipe_resource up front. On a branch of mine, I prototyped
some improvements to the compressed texture fallback handling, where it
would defer resource allocation, examine the source image's block data,
and dynamically select a format based on that, then allocate it later.
With that prototype in place, we saw crashes the Android "My Talking
Tom" series of games, which appear to be attaching ASTC textures to a
framebuffer color attachment. That FBO would be incomplete anyway, as
ASTC textures aren't renderable, but we got into a situation where the
render-to-texture code was crashing due to the lack of pt before it
could properly signal that it was incomplete and bailing.
Technically, we don't need this now, but I figure that being defensive
won't hurt and this would probably save whoever encounters such an issue
in the future a bunch of frustrating debugging.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17508>
Fixes crash when capturing with RenderDoc.
From VK spec:
descriptorSetLayout [...] This parameter is ignored if templateType
is not VK_DESCRIPTOR_UPDATE_TEMPLATE_TYPE_DESCRIPTOR_SET.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17751>
If we don't append subpass->self_dep_info last, other __vk_append_struct()
calls will update its pNext chain which lives in the subpass which
should be treated as immutable. This is easily fixable by just making
it the last thing we append to the chain.
Fixes: 7e11cdc77a ("vulkan/render_pass: Pass sample locations to barriers")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17748>
No shader-db changes on any Intel platform
Fossil-db results:
Tiger Lake
Instructions in all programs: 156926440 -> 156926470 (+0.0%)
Instructions hurt: 15
Cycles in all programs: 7513099349 -> 7513099402 (+0.0%)
Cycles hurt: 15
Ice Lake and Skylake had similar results. (Ice Lake shown)
Cycles in all programs: 9099036492 -> 9099036489 (-0.0%)
Cycles helped: 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
The changes to fs_visitor::validate() helped track down a place where I
initially forgot to convert a message to the new sources layout. This
had caused a different validation failure in
dEQP-GLES31.functional.tessellation.tesscoord.triangles_equal_spacing,
but this were not detected until after SENDs were lowered.
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 19951145 -> 19951133 (<.01%)
instructions in affected programs: 2429 -> 2417 (-0.49%)
helped: 8 / HURT: 0
total cycles in shared programs: 858904152 -> 858862331 (<.01%)
cycles in affected programs: 5702652 -> 5660831 (-0.73%)
helped: 2138 / HURT: 1255
Broadwell
total cycles in shared programs: 904869459 -> 904835501 (<.01%)
cycles in affected programs: 7686744 -> 7652786 (-0.44%)
helped: 2861 / HURT: 2050
Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442032 (-0.0%)
Instructions helped: 337
Cycles in all programs: 9099270231 -> 9099036492 (-0.0%)
Cycles helped: 40661
Cycles hurt: 28606
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17605>
The code assumed that if the source was d32s8 then the destination would
also be d32s8, in particular that depth_base_addr/stencil_base_addr
would also be filled out. Move the destination and source handling into
two different ifs with different conditions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
This was missed when we added support for VK_KHR_depth_stencil_resolve.
There is a similar feature where the stencil aspect of a D24S8 can be
copied "tightly" in CopyImageToBuffer, but it used the texture swizzle
and so required the 3d path. To get it to work with the 2D path, which
is required for resolves, we have to instead use the A8_UNORM format,
which works for texture sampling even for tiled images. We also have to
reuse the pre-existing image views because subpass resolves work on
image views rather than images, whereas before the fixup was applied
while creating the image view. This means threading through the
corresponding "opposite" format through setup, src, and dst functions,
doing the fixup there (through some shared helpers), and then getting
every user to specify the right format. As a bonus, we no longer need to
force the 3d path for the CopyImageToBuffer and CopyBufferToImage
special cases.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17684>
primitive_param takes up 2 vec4's. Remove an align that I don't
understand.
The align upset
Test case 'dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_vec4'..
deqp-vk: ../src/freedreno/ir3/ir3_nir.c:1039:
void ir3_setup_const_state(nir_shader *, struct ir3_shader_variant *, struct ir3_const_state *):
Assertion `constoff <= ir3_max_const(v)' failed.
with an older version (android11-tests-dev branch) of deqp-vk. This is
because ir3_nir_opt_preamble uses the function for the worst case but
the function fails to replace the align by the worst case.
No regression with dEQP-VK.*tess*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17570>
This is more accurate because it's computed directly in spirv_to_nir and
takes even unused SampleID and SamplePos builtings into account.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17669>
if this is only going to do a sample0 resolve, the functionality is
equivalent to just copying the first sample, and in llvmpipe terms,
this just means doing a direct copy at offset=0
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17705>
RECT textures used to be required to be supported by drivers. But since
the state-tracker learned how to lower these to 2D textures, some
drivers no longer support them.
While we have lowering in place for this, lowering it involves some
needless overhead. So let's just use a 2D texture instead of a RECT
texture.
Because having two versions and switching between them is more
complicated than it needs to be, let's just always use a 2D texture.
Similarly, let's just always multiply the reciprocal here, so we don't
have to test for PIPE_CAP_TGSI_DIV first.
Cc: mesa-stable
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17707>
these states were erroneously assigned to the pre-rasterization stage
for pipeline libraries when they instead belong to the vertex input stage
cc: mesa-stable
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17738>
this adds a mangled variation on nir_inline_uniforms that enables inlining
from any uniform buffer in order to try inlining every possible load
if the shader is too small or the ssa_alloc delta from inlining is too small,
then inlining is disabled for that shader to avoid pointlessly churning
the same shaders for no gain
with certain types of shaders, the speedup is astronomical
before:
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays (4750.76s)
after:
dEQP-VK.graphicsfuzz.cov-int-initialize-from-multiple-large-arrays (0.505s)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17722>
Yes this is a round trip, but X_PresentPixmap is not itself a blocking
operation, it just instructs the server to do the next presentation at
some time. More importantly, if _we_ don't catch the presentation error,
xlib's error queue will, and the calling code is certainly not prepared
to handle errors from Present.
Forcing the round trip here is also a bit more correct semantically.
This is the end of the Vulkan client part of the present queue, and the
X_PresentPixmap request transfers the queue operation to the server, so
we should not return until we are sure the handoff has happened.
Fixes some flakiness with piglit@glx-visuals-* with zink+radv.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17685>
Checking for the extension isn't enough, we also need to check for the
feature-bit.
Fixes: 49a20e0981 ("zink: start a unified driver workarounds struct")
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17709>
Set cabac entropy mode if enabled.
v2: add extra check on radeon driver side, disable cabac if profile is
baseline or extended.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16113>
Get entropy mode and cabac init idc from VAAPI interface.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16113>
This moves some common code into shared locations, limits the scope of
some variables, switches some booleans for bools, and cleans up some
whitespace.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17653>
This makes it more obivous when what state changes, and they are always
just called in order.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17653>
some apps trigger the texture update path far in advance of when the
texture will be used, so don't crash and wait to do the update
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17665>
When "Depth of Field" is set to Ultra, a compute shader is emitted that
results in Hardware hangs when OpenGL > 4.3 is available.
If the option is enabled, the game will hang at the menu screen so that
it is no longer possible to simply change the option back. To avoid this
disable the extension for this game until the shader emission can be fixed.
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6857
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17728>
With shader variants it may happen that we already translated a TGSI
shader for the current selector, so delete the old nir shader if we
already had one.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17729>
VK_KHR_format_feature_flags2 is core and implicitly enabled in 1.3.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17442>
Now there are two paths for push constants.
When it's range is under 128b, we can use shared consts.
When it's over 128b, we can instead do loading data through
regular path, which is same as the previous way.
Now we can satisfy emulations like vkd3d that requires 256b for
its root signatures and we think it fairly maps to push constants
rather than inline uniform blocks that requires one indirection.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Follow the way blob is doing for PushConstants though it supports only
128b, same as previous.
v1. Rename tu_push_constant_range.count into dwords to redue confusion.
( Danylo Piliaiev <dpiliaiev@igalia.com> )
v2. Enable shared constants only if necessary.
v3. Merge the two draw states TU_DRAW_STATE_SHADER_GEOM_CONST and
TU_DRAW_STATE_FS_CONST as shared constants are used.
Note that this leaves tu_push_constant_range in tu_shader so we could
use it again in the following patch.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
Adds a shared consts base offset and a size of it(dwords) to ir3_compiler
since they might be depending on gpu generations. (Danylo Piliaiev <dpiliaiev@igalia.com> )
Adds a flag to present whether shared consts are enabled to
ir3_shader_options and then it sets to ir3_const_state when creating
an ir3 variant. Although this state is not per-shader state, this is
necessary when figureing out real constlens.
v1. Define a hw quirk for geometry shared const files and use it when
calculating const length.
v2. Don't hardcode when calculating a safe const length.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
According to the observation on a630/a650/a660, max_const_pipeline has
to be 512 when all geometry stages are present. Otherwise a gpu hang
happens. Acoordingly maximum safe size for each stage should be under
(max_const_pipeline / 5 (stages)).
Only when VS and FS stages are present, the limit is 640.
v1. Align max_const_safe to 4 vec4's.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15503>
The asserts that check the pointer element type can't be used on LLVM >= 15.
Instead of using precompiler #if, use boolean shortcut in assert.
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17650>
This is needed for LLVM-15 opaque pointers.
The new variants taking the type are named with the suffix "2", using
the same naming pattern LLVM (e.g. LLVMBuildGEP2 vs. LLVMBuildGEP).
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17650>
LLVM 15 requires transition to opaque pointers; factorize a bit the cache
memthods to help this transition.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17650>
As LLVM 15 transitions to opaque pointers, we need to update
the deprecated methods dealing with non-opaque pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17650>
For platforms where there's no validator available, leave the field zero-initialized
to let the DXIL backend choose whatever target validator version it wants.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
For platforms where there's no validator available, leave the field zero-initialized
to let the DXIL backend choose whatever target validator version it wants.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
This version claims to support validator version 1.6, but doesn't
actually have the 1.6 changes (PSV v2, PSV resource v1, barycentrics).
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
This is a huge pain because it's an array, meaning that accessing
an entry in the array now depends on the validator version to use
the right element stride.
We could always just store the v1 and downconvert if needed... but
this isn't *that* bad that I felt I had to do it that way.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
Instead of counting the number of UAV arrays, it counts the
number of actual UAVs declared. This is more correct, but we
need to do the same accounting to set the 64 UAVs flag.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
This starts actually updating the always-read/never-written
masks while processing the shader. Note that we follow DXC's
lead here and treat "always read" as "sometimes read."
This isn't strictly required, but might help drivers out.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
This version of the validator starts adding usage masks into
the DXIL, which then are expected to match the PSV and signature
data. The usage masks are "correct" meaning that the never-writes
mask no longer includes bits outside of components 0-3.
A future change will actually compute useful masks.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
A future change will start computing component masks while
processing I/O instructions, and only having to compute
a mask for one component per instruction simplifies things.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
We're about to lower I/O to scalar, which means we'll end up with
multiple writes to position, and none of them has enough info to
fill in the blanks.
This causes a test that previously crashed on WARP (due to
StoreOutput with an undef not being handled) to fail more
gracefully - but that failure means that the test spends
forever just outputting errors, so explicitly skip it.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
First, preprocess the signatures, strictly based on the variables
in the nir shader. Then, later, after the actual shader contents
have been processed, we emit the metadata.
This lets shader processing rely on the pre-processed data (e.g.
the row -> ID mapping needed for large VS inputs) while also allowing
the signature data to rely on data gathered during the shader traversal
(e.g. which components are actually used).
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
Instead of using the short-lived semantic structure (that's used to
fill out the long-lived signature and PSV data), use the long-lived
ones. This is staging so we can hold off on emitting the metadata
until later.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17603>
Instead of starting a new block when the kcache handling failed,
try to continue scheduling instructions until kcache allocation
fails for all ready instruction.
With that we avoid a CF split withing an LDS fetch/read group.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17678>
It's what the backend would do anyway, so let's do it in nir and
give the optimizer some chance to profit from possible improvements.
Fixes a bad shader with "The Raven Remastered"
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17678>
When lowering gl_Clipertex the driver_location may no longer correspond
to the array index, so fill the array by counting the array index up
according to outputs that need to be handled by the state setup.
Fixes: 3340c7ce35
r600/sfn: lower CLIPVERTEX to clip planes
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17678>
Fix defect reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member m_instr_factory is not
initialized in this constructor nor in any functions that it calls.
Fixes: 79ca456b48 ("r600/sfn: rewrite NIR backend")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17584>
PAN_MESA_DEBUG=overflow will place objects as close as possible to a
protected region at the end of the buffer, so that overflows segfault.
Caught the bugs in all four of the preceding commits.
v2: memset the BO to 0xbb to catch code expecting zeroed allocations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
The xfb_base is a base index, it makes no sense to multiply that with
the number of streamout targets. Use addition instead to fix a buffer
overflow.
Fixes: 557633b142 ("panfrost: Suppress Bifrost prefetching")
Reported-by: Luc Ma <onion0709@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
Otherwise the indirect draw shader can read uninitialised data for the
stride, and the position varying buffer may be outside the heap BO.
The next commit fixes a bug that masked this one.
Fixes: 2e6d94c198 ("panfrost: Add helpers to support indirect draws")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
create_vertex_elements_state is sometimes called with a too large
num_elements argument, for example with util_blitter, which causes a
buffer overflow.
There is no documentation to forbid this practice, so don't rely on
so->num_elements being correct and instead use the vertex shader
attribute count, which matches the value used to allocate the
descriptors.
Use attributes_read_count rather than attribute_count because the
latter also includes images and PAN_VERTEX_ID/PAN_INSTANCE_ID.
Fixes: 76de3e691c ("panfrost: Merge attribute packing routines")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
nr_images is the trigger for allocating double the number of buffers
for attributes. When there are no images, there is not always enough
space for ALIGN_POT(k, 2) to not move k out of bounds, so don't
execute the line in that case.
Fixes: dc85f65e05 ("panfrost: emit shader image attribute descriptors")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17447>
Like 90a8fb0355.
fossil-db results:
All Skylake and newer Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141442369 -> 141442363 (-0.0%)
Instructions helped: 1
Cycles in all programs: 9099270231 -> 9099270187 (-0.0%)
Cycles helped: 1
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17637>
In iris_create_surface, use the fill_surface_states helper function instead of
an open-coded solution for compressed resources.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
Before this patch, we were leaking compressed resources in iris_create_surface.
Specifically, when we failed to create an uncompressed ISL surface and view for
a compressed resource, we didn't unreference the resource pointer we referenced
into the pipe_surface.
Fix this by delaying the pipe_surface initialization code to after attempting
to create the uncompressed surface and view.
Cc: 22.1 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
Before this patch, we were leaking surface states in iris_create_surface.
Specifically, when we failed to create an uncompressed ISL surface and view for
a compressed resource, we didn't free surface states we allocated for it.
Fix this by attempting to create the uncompressed surface and view before we
allocate the surface states.
Cc: 22.1 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17598>
Now that the old state tracking code is removed, implementation details
no longer need to be leaked out of this single source file. Remove structs,
function declarations, 'd3d12_' prefixes, and add static when possible.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
Most call sites for transitions will only apply transitions to one or two
resources, and don't need to use the bo set, where each call is guaranteed
to insert the bo, only to walk the set immediately afterwards. Instead, they
can just append the barriers to the dynarray directly and skip the bo set.
Draws and dispatches still use the append approach, to accumulate the full
set of state needed for each subresource for the case where a single
[sub]resource is bound to the pipeline in multiple places.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
Uses a set of d3d12_bo on the context to track which bos are pending
a transition instead of an intrusive linked list, since the bo may
need to be pending on multiple contexts at once.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
When a resource is destroyed, we'll need to let the contexts know.
This is guarded by the submit mutex, because we'll already be holding
that for at least one place where we want to iterate this list, and
it's low-frequency enough that re-using it is simpler than adding more
locks and creating confusing lock ordering.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
This will host some code that's moving and ported to match style
with the rest of the driver, and other code that will be re-written.
Reviewed-by: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17688>
Otherwise we will mix and match mesa's custom cross slot packing
with arb_enhanced_layouts style packing and we won't correctly
handle the size of the vars needed for the mesa custom packing.
The code was working correctly if the shader interface had both
a matching input and output but when we only had one side of
the interface we were only marking a single slot location as
packed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: e5122a5543 ("glsl: add a NIR based varying linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6853
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17550>
It does the same thing as align() from u_math.h, no need to
have a etnaviv specific version.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17695>