If gsprim_lds_size is larger than target_lds_size then gfx10_ngg_calculate_subgroup_info
will fail.
This commit adds a logic to try the multi-cycling in this case because it's
using less memory.
This fix glsl-1.50-gs-max-output when using NGG.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5401>
In case shader contains two equal macro defines, first one with trailing spaces
and the second one without.
`#define A 1 `
`#define A 1`
The parser crashes
Fixes: 0346ad3774 ("glsl: ignore trailing whitespace when define redefined")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5312>
In d11e4738a8, we started using a start_offset to allow us to
allocate pools where the base address isn't at the start of the pool.
This is useful for binding table pools which want to be relative to
surface state base address (more or less), among other things. However,
we had a bug where, if you have a negative offset, everything returned
to the pool would end up being returned to the "back" of the pool. This
isn't what we want for binding tables in the softpin world. This was
causing us to never actually re-use any binding table blocks. How this
passed CTS, I have no idea.
Closes: #3100
Fixes: d11e4738a8 "anv/allocator: Add a start_offset to anv_state_pool"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5395>
Even through the resouce requested has a BIND_SCANOUT or related tag,
this does not mean that we have a render-only driver.
This can trivially happen as one requests such resource from GBM, while
using the panfrost fd (and hence panfrost_dri.so)
Forward port of !3000
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Closes: #2664
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5410>
I think this case was just missing.
This fixes a bunch of 16-bit storage related CTS failures like
dEQP-VK.ssbo.phys.layout.single_basic_type.std430.u16vec4.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>
Use v_bfe to implement small bitsize conversions because the
compiler probably optimizes this better.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>
This is needed to lower some corner cases correctly,
in case the same operand occurs multiple times:
e.g. v0 = p_create_vector(v0[0:8], v0[0:8], v0[0:8], v0[0:8])
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>
As there are no SDWA instructions, we need to take care not to overwrite
the upper bits of other copy_operation's operands.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>
when shaders are created and destroyed in large numbers, the same pointers
get reused for different shaders, which can lead to bad lookups in the
program_cache hash table.
now each shader tracks its program usage to automatically remove itself from
that program in order to avoid hash collisions
fixesmesa/mesa#3053
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5315>
mpv is passing in a NULL destination_video_rect, which results in a
black screen when playing videos using VDPAU in some cases.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5386>
This seems to be working now, but it wasn't working before.
I don't know what fixed this. Tested on Raven and Navi14.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5402>
The app is allowed to never bind descriptor sets that are statically
unused by the pipeline, which would've caused a context fault since
CP_LOAD_STATE6 would try to load the descriptors that don't exist. Fix
this by not preloading descriptors from unused descriptor sets. We could
do more fine-grained accounting of which descriptors are used, but this
is enough to fix the problem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5400>
Unreal Engine 4 has a bug in usage of glDrawRangeElements,
causing it to be called with a number of vertices in place
of "end" parameter (which specifies the maximum array index
contained in indices).
Since there is unknown amount of games affected and we
could not identify that a game is built with UE4 - we are
forced to make a blanket workaround, disregarding max_index
in range calculations. Fortunately all such calls look like:
glDrawRangeElements(GL_TRIANGLES, 0, 3, 3, ...);
So we are able to narrow down this workaround.
This was uncovered after b684030c3a
broke a bunch of UE4 games.
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2917
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5203>
Replace the various ad-hoc flushes that we've inserted, copied from
freedreno, etc. with a unified system that uses the user-supplied
information via vkCmdPipelineBarrier() and subpass dependencies.
There are a few notable differences in behavior:
- We now move setting RB_CCU_CNTL up a little in the gmem case, but
hopefully that won't matter too much. This matches what the Vulkan blob
does.
- We properly implement delayed setting of events, completing our
implementaton of events.
- Finally, of course, we should be a lot less flush-happy. We won't emit
useless CCU/cache flushes with multiple copies, renderpasses, etc. that
don't depend on each other, and also won't flush/invalidate the cache
around renderpasses unless we actually need to.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4964>
We just dropped the last user which actually cared about the seqno.
This never worked anyway, since the seqno was never reset between
multiple executions of the same command buffer. Turn the part of the
control buffer which used to track the seqno into a dummy dword, and
figure out automatically whether we need to include it. We will
implement seqnos again eventually, with timline semaphores, but that
will likely be totally different.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4964>
The Vulkan blob doesn't do this, and based on my understanding of how
the blob works this is unnecessary. CACHE_FLUSH is already serialized
against all 3d commands so you don't need to wait for rendering commands
to finish before issuing it, and the subsequent wfi + WAIT_FOR_ME will
cause the CP to wait for the CACHE_FLUSH to finish, so there's also no
need to wait for it to complete. The CACHE_INVALIDATE also seems
unnecessary, and also isn't done by the blob.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4964>
NVIDIA hardware doesn't have an equivilant to bfi, but we do already have
a lowering for bitfield_insert->bfi.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5373>
Note this changes the value of SP_GS_CTRL_REG0, by using FOUR_QUADS and
setting MERGEDREGS. ir3 expects MERGEDREGS, and using FOUR_QUADS instead
of TWO_QUADS doesn't seem to hurt.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5370>
Fixes failures with TU_DEBUG=forcebin and geometry shaders, for example:
dEQP-VK.binding_model.*geometry*
dEQP-VK.transform_feedback.simple.query*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5370>
The current "ds = state->bs" seems broken, and the "vs = state->bs" is
unnecessary (already set above). Since it was added as part of a GS-related
patch, I think this is what was intended.
Note: tesselation disables GMEM rendering so we shouldn't have to worry
about hs/ds + binning interaction.
Fixes: 0eebedb619 ("freedreno/a6xx: Emit program state for GS")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5370>
Otherwise A6XX_TEX_SAMP_1_{MIN,MAX}_LOD silently overflows.
This fixes these tests:
dEQP-VK.texture.explicit_lod.2d.derivatives.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5375>
The combination must stop when we see a scoped barrier that have
execution scope, i.e. it has control barrier behavior. The code was
mistakenly looking at the wrong scope.
Fixes: 345b5847b4 ("nir: Replace the scoped_memory barrier by a scoped_barrier")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
Just to clarify the missing break is intentional.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
Fixes: 3ed2123d77 ("spirv: Use scoped barriers for SpvOpControlBarrier")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
Fixes: 345b5847b4 ("nir: Replace the scoped_memory barrier by a scoped_barrier")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5365>
Fixes RADV max_waves reporting for GFX10
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5356>
Make sure to propagate the NON_UNIFORM access for UBO loads, so
that non-uniform loads are correctly lowered.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5311>
asin(x) is now implemented using a piecewise approximation, which
improves the precision for |x| < 0.5
Previously, we were using a polynomial approximation for both the
asin() and acos() functions. Unfortunately, for asin(), this polynomial
does not have enough precision to satisfy the Vulkan CTS requiremenents,
which define the asin() precision based on the precision of
atan2(x, sqrt(1.0 - x*x)). The piecewise approximation gives the needed
precision in the problematic range.
v2: Skip the piecewise approximation for acos
Closes: #1843
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3809>
16-bit types can be used with MUBUF on GFX6-GFX7.
Fixes: c3e0ba52a0 ("ac/nir: support 16-bit data in buffer_load_format opcodes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5325>
Fixes:
error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘time_t’ {aka ‘long long int’}
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4279>
Fixes:
error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘time_t’ {aka ‘long long int’}
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4279>
Tessellation shader in llvmpipe depends on llvm coroutines support. So do not
advertise tessellation shader support in llvmpipe if GALLIVM_HAVE_CORO is FALSE.
This fixes assertion in LLVMTokenTypeInContext() running tessellation shader
tests with llvm version < 6.
Fixes: eb522717 "llvmpipe: add support for tessellation shaders"
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5366>
xml.etree.ElementTree getchildren was deprecated since Python 2.7 and
will be removed in Python 3.9.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5348>
Fixes the following building errors:
external/mesa/src/gallium/drivers/svga/svga_context.c:184: error: undefined reference to 'svga_init_ts_functions'
external/mesa/src/gallium/drivers/svga/svga_context.c💯 error: undefined reference to 'svga_cleanup_tcs_state'
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_pipe_svga_intermediates/libmesa_pipe_svga.a(svga_state.o):svga_state.c:hw_draw_state_sm5: error: undefined reference to 'svga_hw_tes'
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_pipe_svga_intermediates/libmesa_pipe_svga.a(svga_state.o):svga_state.c:hw_draw_state_sm5: error: undefined reference to 'svga_hw_tcs'
Fixes: ccb4ea5a "svga: Add GL4.1(compatibility profile) support in svga driver"
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5364>
The xml.etree.cElementTree module will be removed in Python 3.9. Since
Python 3.3 the xml.etree.cElementTree module has been deprecated, the
xml.etree.ElementTree module uses a fast implementation whenever
available.
Builds using Python 2.7 can still work but with the slower
implementation.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5349>
This reverts commit 794c239a99.
A kernel bug causes cached BOs to not be unmapped correctly,
triggering "bad page cache" kernel messages and causing short hangs.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5355>
The GLES blob on the p3a limits constlen to 512 between VS and FS across
a6xx gpu ids (615, 630, 640, and 650). Experimentally, exceeding that
limit in any one stage results in rendering corruption or GPU hangs
(though my most detailed testing had a loop limit in a uniform, so that
may the cause of the hang). Clamp the limit we use inside of a shader so
we don't exceed it within a stage.
This commit doesn't resovle limiting inter-stage. Experimentally, I've
found that I can push up to a total of ~768 vec4s between VS and FS on
a630, with or without uniform updates between each draw. We'll need to do
some shader key-based limiting of constlen at draw time to respect that
limit, but that's left for future work, and this commit is enough for the
google earth case that initiated this work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
It turns out the GL uniforms file is larger than the hardware constant
file, so we need to limit how many UBOs we lower to constbuf loads. To do
actual UBO loads, we'll need to be able to upload UBO 0's pointer or
descriptor.
No difference on nohw 1 UBO update drawoverhead case (n=35).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
For now we never ask to set up UBO 0 as a real UBO, so this doesn't
trigger, but it gets us ready for handling the case where UBO 0 is too big
to be push constants in the HW.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
Comparing a blob trace using the feature to one not, the difference was
pretty obvious and in the spot you'd expect compared to alphaToCoverage.
The SP_ reg didn't have a corresponding bit set, though it also has an
alphaToCoverage.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5343>
Needed for RADV_DEBUG=shaderstats to dump ACO statistics.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5358>
The script was failing for me (python 3.8), not sure if this is a recent
python version break or not as I don't know how often people have been
running this script:
Processing ./gen9.xml... Traceback (most recent call last):
File "./gen_sort_tags.py", line 177, in <module>
main()
File "./gen_sort_tags.py", line 170, in main
genxml[:] = enums + sorted_structs.values() + instructions + registers
TypeError: can only concatenate list (not "odict_values") to list
Turning the odict into a list fixes it for me, and the resulting xml
file are identical to before :)
Fixes: 903e142f0d ("genxml: add a sorting script")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5352>
CTS pass on Pitcairn (GFX6). This extension isn't really useful
without 8-bit/16-bit storage though but this is going to be exposed
soon.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5327>
Also do not allocate aux surfaces for multi-plane images. I may
have messed up and used plane 1 offsets for the other planes as well.
I cannot imagine that sharing aux surfaces between the planes will
work well.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5194>
Just in case someone links RADV with this old LLVM 8 and wants ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5331>
This is a squash commit of in house performance fixes and misc bug fixes
for GL4.1 support.
Performance fixes:
* started using system memory for constant buffer to gain 3X performance boost with metro redux
Misc bug fixes:
* fixed usage of vertexid in shader
* added empty control point phase in hull shader for zero ouput control point
* misc shader signature fixes
* fixed clip_distance input declaration
* clearing the dirty bit for the surface while using direct map if surface is already flushed
and there is no pending primitive
This patch also uses SVGA_RETRY macro for commands retries. Part of it is already
used in previous patch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5317>
This patch is a squash commit of a very long in-house patch series.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5317>
This brings in the new types, enums and #defines for GL 4.1
features in the virtual device.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5317>
This is to check whether virtual hardware has SM5 support
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5317>
This patch adds the following tgsi utilities
* tgsi_dynamic_indexing: This utility flattens out the dyanamic indexing of constant buffers
* tgsi_vpos: This utility writes zeros to position at index 0 in vertex shader.
This utility can be used if there is no shader output in vertex shader
* util_make_tess_ctrl_passthrough_shader: This adds passthough tessellation control shader.
Input of passthrough tess ctrl shader is output of vertex shader
and output is input of tessellation eval shader.
If program has tessellation eval shader but no tessellation control shader,
this utility can be used to create passthrough tessellation control shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5317>
Technically we only have to do late-z in the alpha-test or discard case
if depth-write is enabled. If depth write is disabled, the depth read /
test / conditional-write interlock that we need to emulate is not a
problem, so we can still use early-z test.
There is a slightly weird case when there is no zsbuf attachment (see
dEQP-GLES31.functional.fbo.no_attachments.*) where the hw wants us to
use LATE_Z.. not entirely sure if this is an interaction with occlusion
query or just a pecularity of how the hw works when there is no depth
buffer.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5336>
We were remapping the bindings so the HW binding points were consecutive,
which there's no need for. Now that we don't shuffle, we can mostly drop
the dependency on the pipeline for this SDS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5321>
the virgl CI code was using the noopt path and crashing with a
wierd can't select llvm.coro.subfn.addr error, turns out we have
to call the cleanup pass no matter what.
This enable a lot more virgl gles31 passes, but we have
to disable tessellation shaders as now they executed, they
crash due to missing OES_gpu_shader5, I should try and reenable
them when llvmpipe is further along
Fixes: d32690b43c ("gallivm: add coroutine pass manager support")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5320>
The FETCH_CNT field isn't actually the FETCH count. We don't have a
lot of data where it's different from DECODE_CNT, so there's not much
to go by. It could be number of VFD_DEST_CNTL or maybe DECODE_CNT for
binning. For now, setting both to number of DEST_CNTL gets Google
Earth working again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5324>
We need to force src_invert to be in the right place even if we flip
when lowering an embedded->inline constant.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 449e5ded93 ("pan/mdg: Treat inot as a modifier")
Reported-by: Icecream95 <ixn@keemail.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5299>
For shaders where there's already a psiz-variable, we should rather
reuse it than create a second one. This can happen if a shader writes
gl_PointSize, but disables GL_PROGRAM_POINT_SIZE.
Fixes: 878c94288a ("nir: add lowering-pass for point-size mov")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5328>
We reuse DRM file descriptors internally. Therefore when we export a
GEM handle we must do so in the file descriptor used externally.
v2: Fix dmabuf leak
Fix GEM handle leaks by tracking exported handles
v3: Check os_same_file_description error (Michel)
Don't create multiple exports for a given GEM table
v4: Add WARN_ONCE (Ken)
v5: Remove blank line (Ian)
Remove unused field (Ian)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2882
Fixes: 4094558e86 ("i965: share buffer managers across screens")
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4861>
We reuse DRM file descriptors internally. Therefore when we export a
GEM handle we must do so in the file descriptor used externally.
This change also fixes a file descriptor leak of the FD given at
screen creation.
v2: Don't bother checking fd equals, they're always different
Fix dmabuf leak
Fix GEM handle leaks by tracking exported handles
v3: Check os_same_file_description error (Michel)
Don't create multiple exports for a given GEM table
v4: Add WARN_ONCE (Ken)
Rename external_fd to winsys_fd
v5: Remove export lock in favor of bufmgr's
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2882
Fixes: 7557f16059 ("iris: share buffer managers accross screens")
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4861>
We'll start using this field more for querying image properties.
Without it we run into a crash.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4861>
Looks like we don't have CI coverage for this (since deqp==GLES) but
alpha test is conceptually the same as frag shaders with discard, and
should be handled as such.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
Now that we are doing a better job of managing LRZ, add support for the
EARLY_LRZ_LATE_Z mode. Since we properly disable LRZ write in cases
where we don't know a fragment's z value during the binning pass (or
when blend is enabled in a later draw, meaning we will need the earlier
fragment's color), we can enable a mode that keeps the early-lrz test
when the frag shader has kill/discard. This will only discard geometry
that is definitely not visible.
This is a pretty big win for games/benchmarks that have a lot of frag
shaders with kill/discard. More than 10% gain for gfxbench trex/mh and
40% gain for mh31.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
In particular, properly detect reversal of depth-test direction.
With that we can remove a lot of cases where we were unnecessarily
invalidating LRZ, which was simply papering over the direction-
reversal issue in deqp.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
And document the early-lrz-late-z mode.
Initially I thought this would be two bits to control early-lrz vs
early-z. But having early-z without early-lrz does not make sense,
and the way the values line up makes an enum fit better.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
Unlike other conditions which prevent early-discard of fragments, kill
does not prevent early LRZ test. Split `has_kill` from `no_earlyz` so
we can take advantage of this.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
With UBO access going through LDC, all memory access uses buffer based
io primitives. We can then advertise
PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR and
PIPE_CAP_DEVICE_RESET_STATUS_QUERY, which turn on GL_EXT_robustness,
GL_KHR_robust_buffer_access_behavior and GL_KHR_robustness.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5319>
Fix warning reported by Coverity Scan.
Wrong sizeof argument (SIZEOF_MISMATCH)
suspicious_sizeof: Passing argument 3544UL (sizeof
(vlVdpPresentationQueue)) to function calloc that returns a pointer of
type vlVdpPresentationQueueTarget * is suspicious because a multiple of
sizeof (vlVdpPresentationQueueTarget) /*16*/ is expected.
Fixes: 65fe0866ae ("vl: implemented a few functions and made stubs to get mplayer running")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3026
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5182>
The iris_blorp_exec() hook needs to be executed under a single
indivisible sync region, which means that in cases where we need to
emit a PIPE_CONTROL for a buffer barrier we won't be able to track the
subsequent commands separately from the previous commands, which will
prevent us from optimizing out subsequent PIPE_CONTROLs if we
encounter the same buffers again. In particular I've encountered this
situation in some SynMark test-cases which perform lots of BLORP
operations with the same buffer bound as both source and destination
(in order to generate mipmaps): In such a scenario if the source
requires flushing we'd also end up flushing for the destination
redundantly, even though a single PIPE_CONTROL would have been
sufficient.
This avoids a 4.5% FPS regression in SynMark OglHdrBloom and a 3.5%
FPS regression in SynMark OglMultithread.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The big-hammer iris_flush_depth_and_render_caches() is largely
redundant whenever a format mismatch is detected from
iris_cache_flush_for_render(). There is no need to kick the depth,
sampler nor constant caches in that case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The render cache hash table is now *mostly* redundant with the more
general seqno matrix-based cache tracking mechanism. Most hash table
operations are now gone except for the format mismatch checks done in
iris_cache_flush_for_render(). Redundant code removed as a separate
patch for bisectability.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The depth cache set is now redundant with the more general seqno
matrix-based cache tracking mechanism. Removed as a separate patch
for bisectability.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Whenever iris_predraw_resolve_inputs() ends up doing a flush or
invalidate, we really want it to be on the same batch which is going
to consume the result. Any resolves should still be performed from
the render batch thanks to the previous patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The resolves performed by this function are only expected to work from
the render batch, so make sure we use it independently of the batch
the caller wants to use. This function provides no synchronization
guarantees anyway, the caller is expected to insert any cache flushing
and synchronization required for the resolved surface to be visible to
the target batch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This takes advantage of the previously introduced cache tracking
infrastructure in order to define a multi-purpose barrier operation
that allows the caller to order memory operations with respect to
previous operations performed on the same buffer from any other cache
domain.
v2: Assorted CPU overhead micro-optimizations (Francisco).
v3: Use C99 designated initializers (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces a batch synchronization boundary at every PIPE_CONTROL
command, and updates the cache coherency status tracked during batch
construction according to the specified control bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces a representation of the cache coherency status of the
GPU at any point in the batch. This is done by defining a matrix C of
synchronization sequence numbers such that at any point of batch
construction, a memory operation from domain i introduced into the
batch is guaranteed to be ordered after any memory operation from
domain j in a previous batch section with seqno n if the following
condition holds:
C_i_j >= n
This allows us to efficiently determine whether additional flushing
and/or invalidation is required in order to access a buffer object
from some arbitrary domain.
Except for batch buffer reset which requires clearing the whole
matrix, all operations on the matrix are either O(n) or O(1) on the
number of caching domains (which is basically constant).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This is the main performance trade-off of this cache tracking
mechanism: In order for the seqno vector of buffer objects to be
accurate, they need to be marked as used again every time the batch is
split into a new synchronization section if they remain bound to the
pipeline. This can be achieved easily by re-using
iris_restore_render_saved_bos() and iris_restore_compute_saved_bos(),
which currently serve a similar purpose across batch buffer
boundaries.
The impact on Piglit drawoverhead results seems to be within a
standard deviation of the current results.
XXX - It might be possible to completely remove the current
iris_batch::contains_draw flag at a small additional performance
cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The write flag is redundant since it can be inferred easily from the
iris_address::access domain. This allows the iris_address struct to
be laid out more efficiently in memory, leading to a measurable
improvement in several Piglit Drawoverhead test-cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Probably the most annoying patch to review from the whole series --
Mark every buffer object use as accessed through some caching domain
with the sequence number of the current synchronization section of the
batch. The additional argument of iris_use_pinned_bo() makes sure I'd
have gotten a compile error if I had missed any buffer added to the
batch validation list.
There are only a few exceptions where a buffer is left untracked while
adding it to the validation list, justified below:
- Batch buffers: These are strictly read-only for the moment.
- BLORP buffer objects: Their seqnos are bumped manually at the end
of iris_blorp_exec() instead, in order to avoid plumbing domain
information through BLORP address combining.
- Scratch buffers: The contents of these are strictly thread-local.
- Shader images and SSBOs: Accesses of these buffers are explicitly
synchronized at the API level.
v2: Opt out of tracking more aggressively (Ken): In addition to the
above, surface states, binding tables, instructions and most
dynamic states are now left untracked, which means a *lot* more BO
uses marked IRIS_DOMAIN_NONE which need to be reviewed extremely
carefully, since the cache tracker won't be able to provide any
coherency guarantees for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This delimits all batch operations which access memory between
iris_batch_sync_region_start() and iris_batch_sync_region_end() calls.
This makes sure that any buffer objects accessed within the region are
considered in use through the same caching domain until the end of the
region.
Adding any buffer to the batch validation list outside of a sync
region will lead to an assertion failure in a future commit, unless
the caller explicitly opted out of the cache tracking mechanism.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces some minimalistic infrastructure which will be used in
order to partition the batch into a series of sections, each one with
a unique, monotonically-increasing sequence number. Section
boundaries will typically lie at points in the batch where the
execution and memory coherency status of some previous commands are
known, e.g. at batch buffer boundaries or PIPE_CONTROL commands.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The purpose of this is to represent the cache coherency state of a
buffer as a vector of integers (AKA seqnos), one for each incoherent
caching domain of the GPU. A seqno will identify a single section of
a batch buffer uniquely across the whole pipe_screen (which means that
there will be no ambiguity about what context a given seqno belongs to
even if there are multiple threads accessing the same buffer in
parallel), and is guaranteed to be allocated in monotonically
increasing order within any given context. The iris_bo_bump_seqno()
helper is provided for marking the last update of a buffer from a
given caching domain in a lockless manner.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Emulating them will be a rather annoying dance. Let's not worry about
this until further down the line when we have a better sence of how to
do handle them efficiently.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5290>
In this case, vs->writes_point_size is true as the VS writes
gl_PointSize, but panfrost_writes_points_size() is false as we are not
drawing points so the hardware doesn't process it. Thus the varying
descriptor is emitted but elements is never written. When the VS runs,
it will attempt to write to elements, a NULL pointer.
The behaviour is architecture-independent. On Midgard, the write
silently fails, hence why this bug was never noticed before. On Bifrost,
this raises an MMU fault.
The fix is to set the format to VARYING_DISCARD to ignore the write.
Noticed on Neverball.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5290>
It's like gl_FragCoord. Still not implemented. This unfortunately makes
point sprites a lot more complicated.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5290>
We're nearly out of dirty bits, and some patches pending review on
GitLab no longer apply due to that. Make room for them by splitting
off shader stage-specific bits into a separate stage_dirty mask.
An alternative would be to split compute-related bits into a separate
mask, but that would prevent the '<< stage' indexing done in various
parts of the driver from working.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
This makes iris_batch_prepare_noop() return a boolean instead of
passing through the relevant set of dirty flags. It will make it
easier to change the representation of dirty flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
Needed by the next patch, for c++ code which is more strict about
conversions between integers and enums.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5318>
We need to pass it thru to EARLY_Z and WRITES_GLOBAL instead of ignoring
and assuming respectively. Nontrivial performance fix.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5300>
Via the ES3.1 early-z testing force, I've confirmed this bit is e-z.
I've also confirmed e-z must be disabled for global writes, as expected.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5300>
16-bit subgroup ops are implemented with 32-bit instructions
on GFX6-GFX7.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5227>
SDWA is only GFX8+. Use v_mov_b32 since the upper 16 bits don't matter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5227>
At the NIR level this is a second vector source of the first (only)
argument; at the BIR level this is a pair of scalars.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5307>
Instead of the vendored version. Only for blend shaders at the moment,
frag shaders fb_fetch has a lot more going on.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5285>
Now that we can handle destination sizes directly, this keeps us from
needing to chew through so many conversions.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5285>
Since these shaders are purely internal, the optimization criteria are a
bit different, so it's worth calling attention to this when dumping.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5285>