Doesn't work so well when you start having more than one possible
attrib. Prep-work for next patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
Reduce the noise in the next patch. For EGL_SYNC_NATIVE_FENCE_ANDROID
the sync condition is conditional on EGL_SYNC_NATIVE_FENCE_FD_ANDROID
attribute.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
We were previously also verifying that no backing buffers were available
when an array wasn't enabled. This is has no basis in the spec, and it
causes GLupeN64 to fail as a result.
Fixes: c2e146f487 ("mesa: error out in indirect draw when vertex bindings mismatch")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Noticed in shaders with branching, where we ended up scheduling delay
slots near the start of a block for the uniforms reset setup.
total instructions in shared programs: 93970 -> 93951 (-0.02%)
instructions in affected programs: 3117 -> 3098 (-0.61%)
3DMMES performance +0.423087% +/- 0.133521% (n=9,10)
This helps us get the delay slots between SFU writes and reads filled.
total instructions in shared programs: 94494 -> 93970 (-0.55%)
instructions in affected programs: 59206 -> 58682 (-0.89%)
3DMMES performance +1.89967% +/- 0.157611% (n=10,9)
The latency_between was trying to handle the delay between the coordinate
write ("before") and the corresponding sample read ("after"), but we were
handing in the two instructions swapped.
This meant that we tried to fit things between a tex_s and its *preceding*
tex_result. This made us only interleave normal texture coordinates by
accident, and pessimized UBO reads by pushing the tex_result collection
earlier until there was nothing but it (and then its preceding coordinate
setup) left.
In addition to latency reduction, things end up packing better (probably
due to reduced live ranges of the texture results):
total instructions in shared programs: 98121 -> 94775 (-3.41%)
instructions in affected programs: 91196 -> 87850 (-3.67%)
3DMMES performance +1.15569% +/- 0.124714% (n=8,10)
I'm not sure how I managed to write the SF merge code
(7d8b79f398) without allowing merges with
NOPs. *Everything* we try to merge with will have a NOP on one or the
other side of the instruction, and that's why that commit showed no
benefit.
total instructions in shared programs: 99347 -> 95128 (-4.25%)
instructions in affected programs: 91906 -> 87687 (-4.59%)
3DMMES performance +2.57105% +/- 0.135276% (n=6,8)
This should be a win for most loops, which tend to have uniform control
flow.
More importantly, it exposes important information to live variables: that
the break/continue here means that our jump target may have access to
values that were live on our input. Previously, we were just setting the
exec mask and letting control flow fall through, so an intervening def
between the break and the end of the loop would appear to live variables
as if it screened off the variable, when it didn't actually.
Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a
perturbing of register allocation caused a live variable to get stomped.
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
This is already supported in genX_state.c, expose the extension string.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In an attempt to fix 3DSTATE_DEPTH_BUFFER for stencil-only cases, I
accidentally kept setting the SurfaceType to 2D in the stencil-only case
thanks to a copy+paste error.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The buffer_size does not take the offset into account. Just add the
offset into the pointer which lines up the structures much better.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The number has to be less than or equal to the max, not just less than.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The components count the number of individual values, not the number of
slots.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
There is no support for resuming streamout. Furthermore, this also
controls glDrawTransformFeedback functionality which requires the same
ability to query how many primitives were sent out of TF.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
We need to take the instance divisor and number of instances into
account for instanced client-side arrays, rather than the vertex
parameters.
Loosely based on the comparable nvc0 logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We now support clearing these, and actually rendering to multiple layers
would require GS support, which will fail in much more spectacular ways
for now. Once that is hooked up, there won't be anything else to do
here.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Since we don't pass a renderTargetArrayIndex in, and the current hot
tile may be for a different index, we may end up loading the RTAI=0 into
the hot tile for no reason.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Internal docs don't mention it, but they also don't mention that the bug
has been fixed (like other CI bugs fixed in VI).
Vulkan does this too.
v2: also update r600_gfx_write_fence_dwords
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
34953f8907 added this bitmask but it wasn't being reset when
a program was relinked. If a stage was removed from the new
program then it could case a crash as we expect the linked shader
for that stage to not be null.
Fixes crashes in:
ESEXT-CTS.tessellation_shader.single.xfb_captures_data_from_correct_stage
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98917
Looks like immed branch offset size increased again.. making what we
think is a small negative number look to hw like a huge positive number.
And things go badly when shader tries to jump to hyperspace.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Android doesn't build all the files that normal linux/autotools build
does (mainly standalond ir3_compiler).. but possibly we should pull
C_SOURCES + aNxx_SOURCES into a single variable picked up by both
Android.mk and Makefile.am? (Suggested by Rob H.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Set the include paths to consider in-tree headers before out-of-tree
headers.
Avoids the build failing due to stale headers being present in
$prefix. Previosuly 'make -ki install' or something similar was required
to update the out-of-tree headers to allow the build to succeed.
Also avoids having to rebuild the entire thing after every 'make
install'.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
On a3xx/a4xx, the SP_VS_VPC_DST_REG.OUTLOCn is offset by 8, so we used
to add this offset into fs->inputs[n].inloc. But a5xx drops this extra
offset-by-8. So instead make inloc zero based and add the offset when
we emit OUTLOCn values (for the gen's that need the offset).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Helps simplify things on a5xx, where pos/psize get added to the vs-out
map. And anyways, simplifies a3xx and a4xx.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Drivers that support this benefit by saving one lowering pass in the
GLSL-to-TGSI conversion.
radeonsi already supports this because all outputs are stored in temporary
variables before the export (except for TCS outputs, which have always
been readable in TGSI anyway due to their special semantics).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With nir_intrinsic_ssbo_atomic_comp_swap we run out of params.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This one was split across two dwords as "Kernel Start Pointer" and
"Kernel Start Pointer High", which looks like it works when the driver
only accesses "Kernel Start Pointer". This breaks, of course, with BO
offsets > 4G.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>