This just removes a hole in the cmd_state and packs some bools
together.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.
Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.
Fixes a memory leak with multithreading demo.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
uint32_t data[MAX_SETS * 2] = {}; was getting executed before
the exit and took significant amounts of time. By having the
check outside the function, we skip the execution of the clear.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The vram_list linked list resulted in lots of pointer chasing.
Replacing this with an array instead improves descriptor set
allocation CPU usage by 3x at least (when also considering the free),
because it had to iterate through 300-400 sets on average.
Not a huge improvement as the pre-improvement CPU usage was only
about 2.3% in the busiest thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
In OpenCL/CUDA kernels, shared memory usage can be defined within the
kernel code. Those usage will only be picked up while parsing the
SPIR-V, during the translation phase of the program.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Remove the bits enabling Float blend optimization. It is
enabled through CACHE_MODE_SS register.
Update the comment.
Move gen10 if block on top of gen9 if block.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This optimization is enabled for previous generations too.
See Mesa commit c17e214a6b
On CNL this bit has been moved to CACHE_MODE_SS register.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Add the check for Post Sync Operation.
Update the workaround comment.
Use braces around if-else.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
There are few other (duplicate) workarounds which have similar recommendations:
WaFlushHangWhenNonPipelineStateAndMarkerStalled
WaCSStallBefore3DSamplePattern
WaPipeControlBefore3DStateSamplePattern
WaPipeControlBefore3DStateSamplePattern has some extra recommendations if
driver is using mid batch context restore. Ignoring it for now because We're
not doing mid-batch context restore in Mesa.
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Use brw_load_register_imm32() to program CACHE_MODE_0.
Get rid of brw_flush_gpu_caches().
V3: Make the workaround helper functions static.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by :Nanley Chery <nanley.g.chery@intel.com>
Fixes reverted patch f03b7c9 by doing VMID reservation per
process and not per context.
Also updates required amdgpu libdrm version since the change
involved interface updates in amdgpu libdrm.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This will be used in the next commit to build up register programming.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.
So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):
"For all of the interpolation functions, interpolant must be an l-value
from an in declaration; this can include a variable, a block or
structure member, an array element, or some combination of these.
Component selection operators (e.g., .xy) may be used when specifying
interpolant."
For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.
v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
(the ES WG decided that the relaxed rules may apply in a future version
but not retroactively; see also
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We need to validate some structs exist before we dirty the states, and
avoid the problem in some other places.
Fixes: e027935a7 ("st/mesa: don't update unrelated states in non-draw calls such as Clear")
These are produced by nir_lower_bitmap(), adding the missing derefence
would cause other issues that need to be hacked around such as
skipping sampler lowering and uniform location assignment, so this
change seems the correct way to go.
Fixes 194 piglit crashes on radeonsi using NIR.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This add support for the early depth/stencil property found
on image shaders.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for emitting RAT instructions to the assembler.
RAT instructions are used to implement image accessors.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the assembler for the mark bit
on the export word1.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support to the assembler for setting the valid
pixel mode on the CF clause.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These special ALU sources provide the shader engine,
simd and hw wave ids.
These are required for images support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should reduce the time where compute units are idle, mainly
for meta operations because they use a bunch of compute shaders.
This seems to have a really minor positive effect for Talos, at least.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Currently this ABI check only checks for es2 symbols, but es3.x symbols
are also exposed. Exposing these symbols is recommended by Khronos, and
as such the test should accept that as ABI.
see: https://lists.freedesktop.org/archives/mesa-stable/2016-June/004545.html
for the discussion about exposing these symbols
cc: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Because otherwise gbm will expose wayland symbols that it shouldn't.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>