Commit Graph

87092 Commits

Author SHA1 Message Date
Rob Clark 2ba4c7e154 egl: un-fallthrough sync attr parsing
Doesn't work so well when you start having more than one possible
attrib.  Prep-work for next patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
2016-12-01 10:57:24 -08:00
Rob Clark cce04a4630 egl: initialize SyncCondition after attr parsing
Reduce the noise in the next patch.  For EGL_SYNC_NATIVE_FENCE_ANDROID
the sync condition is conditional on EGL_SYNC_NATIVE_FENCE_FD_ANDROID
attribute.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Tested-by: Chad Versace <chadversary@chromium.org>
2016-12-01 10:52:55 -08:00
Tim Rowley 05f35a868c tgsi: store writes_primid when scanning tgsi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-01 11:33:01 -06:00
Ilia Mirkin 7c16552f8d mesa: only verify that enabled arrays have backing buffers
We were previously also verifying that no backing buffers were available
when an array wasn't enabled. This is has no basis in the spec, and it
causes GLupeN64 to fail as a result.

Fixes: c2e146f487 ("mesa: error out in indirect draw when vertex bindings mismatch")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-12-01 06:35:13 -05:00
Eric Anholt 51244859e3 vc4: Avoid false scheduling dependencies for LOAD_IMMs.
Noticed in shaders with branching, where we ended up scheduling delay
slots near the start of a block for the uniforms reset setup.

total instructions in shared programs: 93970 -> 93951 (-0.02%)
instructions in affected programs:     3117 -> 3098 (-0.61%)

3DMMES performance +0.423087% +/- 0.133521% (n=9,10)
2016-11-30 19:58:09 -08:00
Eric Anholt 6c34084d8e vc4: Try to schedule QIR instructions between writing to and reading math.
This helps us get the delay slots between SFU writes and reads filled.

total instructions in shared programs: 94494 -> 93970 (-0.55%)
instructions in affected programs:     59206 -> 58682 (-0.89%)

3DMMES performance +1.89967% +/- 0.157611% (n=10,9)
2016-11-30 19:58:09 -08:00
Eric Anholt d182740ac8 vc4: Improve interleaving of texture coordinates vs results.
The latency_between was trying to handle the delay between the coordinate
write ("before") and the corresponding sample read ("after"), but we were
handing in the two instructions swapped.

This meant that we tried to fit things between a tex_s and its *preceding*
tex_result.  This made us only interleave normal texture coordinates by
accident, and pessimized UBO reads by pushing the tex_result collection
earlier until there was nothing but it (and then its preceding coordinate
setup) left.

In addition to latency reduction, things end up packing better (probably
due to reduced live ranges of the texture results):

total instructions in shared programs: 98121 -> 94775 (-3.41%)
instructions in affected programs:     91196 -> 87850 (-3.67%)

3DMMES performance +1.15569% +/- 0.124714% (n=8,10)
2016-11-30 19:58:09 -08:00
Eric Anholt 1f9daf7cd1 vc4: Fix stray "." on no-op MUL packs.
This happened when the PM bit was set for R4 unpacks, where the MUL pack
was NOP.
2016-11-30 19:58:09 -08:00
Eric Anholt 98d7e87488 vc4: Allow merging instructions with SF set where the other writes NOP.
I'm not sure how I managed to write the SF merge code
(7d8b79f398) without allowing merges with
NOPs.  *Everything* we try to merge with will have a NOP on one or the
other side of the instruction, and that's why that commit showed no
benefit.

total instructions in shared programs: 99347 -> 95128 (-4.25%)
instructions in affected programs:     91906 -> 87687 (-4.59%)

3DMMES performance +2.57105% +/- 0.135276% (n=6,8)
2016-11-30 19:58:09 -08:00
Eric Anholt 8e5ec33f11 vc4: In a loop break/continue, jump if everyone has taken the path.
This should be a win for most loops, which tend to have uniform control
flow.

More importantly, it exposes important information to live variables: that
the break/continue here means that our jump target may have access to
values that were live on our input.  Previously, we were just setting the
exec mask and letting control flow fall through, so an intervening def
between the break and the end of the loop would appear to live variables
as if it screened off the variable, when it didn't actually.

Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a
perturbing of register allocation caused a live variable to get stomped.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-11-30 19:58:09 -08:00
Ilia Mirkin fda1d0187d anv: expose support for VK_KHR_sampler_mirror_clamp_to_edge
This is already supported in genX_state.c, expose the extension string.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2016-11-30 20:49:04 -05:00
Jason Ekstrand 27433b26b1 anv/cmd_buffer: Actually use the stencil dimension
In an attempt to fix 3DSTATE_DEPTH_BUFFER for stencil-only cases, I
accidentally kept setting the SurfaceType to 2D in the stencil-only case
thanks to a copy+paste error.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2016-11-30 17:42:42 -08:00
Ilia Mirkin ef59cb0820 swr: add streamout buffer offset into pBuffer pointer
The buffer_size does not take the offset into account. Just add the
offset into the pointer which lines up the structures much better.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:36:03 -05:00
Ilia Mirkin 3d837a8871 swr: fix assertion for max number of so targets
The number has to be less than or equal to the max, not just less than.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:36:00 -05:00
Ilia Mirkin 02b2efa5eb swr: properly report max number of SO components
The components count the number of individual values, not the number of
slots.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:56 -05:00
Ilia Mirkin ab3bbe06ed swr: turn off queries around blits
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:53 -05:00
Ilia Mirkin d8ce8acdfa swr: don't advertise stream pause/resume
There is no support for resuming streamout. Furthermore, this also
controls glDrawTransformFeedback functionality which requires the same
ability to query how many primitives were sent out of TF.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:43 -05:00
Ilia Mirkin 632c11e857 swr: fix range computation for instanced client-side arrays
We need to take the instance divisor and number of instances into
account for instanced client-side arrays, rather than the vertex
parameters.

Loosely based on the comparable nvc0 logic.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-11-30 20:35:33 -05:00
Ilia Mirkin 3b736acf1b swr: [rasterizer memory] assert when trying to convert an unknown format
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:16 -05:00
Ilia Mirkin 763c015ce5 swr: remove warning about multi-layer surfaces
We now support clearing these, and actually rendering to multiple layers
would require GS support, which will fail in much more spectacular ways
for now. Once that is hooked up, there won't be anything else to do
here.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:35:06 -05:00
Ilia Mirkin a9d292f5bd swr: [rasterizer core] don't attempt to load another RTAI when storing
Since we don't pass a renderTargetArrayIndex in, and the current hot
tile may be for a different index, we may end up loading the RTAI=0 into
the hot tile for no reason.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-11-30 20:34:55 -05:00
Marek Olšák 77014a0ad3 radeonsi: document a CP DMA bug that doesn't need a workaround yet
This one is easy to miss, because it's not documented in any internal doc.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák bacf9b4e73 radeonsi: apply the double EVENT_WRITE_EOP workaround to VI as well
Internal docs don't mention it, but they also don't mention that the bug
has been fixed (like other CI bugs fixed in VI).

Vulkan does this too.

v2: also update r600_gfx_write_fence_dwords

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
2016-12-01 02:16:51 +01:00
Marek Olšák a816c7fe07 radeonsi: add a tess+GS hang workaround for VI dGPUs
ported from Vulkan

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák da7453666a radeonsi: don't apply the Z export bug workaround to Hainan
not needed

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák 78c4528ae7 radeonsi: apply a tessellation bug workaround for SI
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák 72e46c9889 radeonsi: apply a TC L1 write corruption workaround for SI
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák 72d48fcd8e radeonsi: apply a multi-wave workgroup SPI bug workaround to affected CIK chips
All codepaths are handled except for clover.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Marek Olšák ec36c63b4f radeonsi: consolidate max-work-group-size computation
The next commit will need this.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-01 02:16:51 +01:00
Timothy Arceri 966567aa12 mesa: reset linked_stages bitmask when re-linking
34953f8907 added this bitmask but it wasn't being reset when
a program was relinked. If a stage was removed from the new
program then it could case a crash as we expect the linked shader
for that stage to not be null.

Fixes crashes in:
ESEXT-CTS.tessellation_shader.single.xfb_captures_data_from_correct_stage
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98917
2016-12-01 10:24:16 +11:00
Rob Clark 45eef9af03 freedreno/a5xx: fix negative branches
Looks like immed branch offset size increased again.. making what we
think is a small negative number look to hw like a huge positive number.
And things go badly when shader tries to jump to hyperspace.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 17:32:54 -05:00
Rob Clark ef30e91fe6 freedreno: fix android build with a5xx
Android doesn't build all the files that normal linux/autotools build
does (mainly standalond ir3_compiler).. but possibly we should pull
C_SOURCES + aNxx_SOURCES into a single variable picked up by both
Android.mk and Makefile.am?  (Suggested by Rob H.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 17:32:54 -05:00
Rob Clark 8b6f8f2576 freedreno/a5xx: fix discard
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 17:32:54 -05:00
Ville Syrjälä 676c0cf287 anv: Prefer in-tree headers to out-of-tree headers
Set the include paths to consider in-tree headers before out-of-tree
headers.

Avoids the build failing due to stale headers being present in
$prefix. Previosuly 'make -ki install' or something similar was required
to update the out-of-tree headers to allow the build to succeed.

Also avoids having to rebuild the entire thing after every 'make
install'.

Cc: Rob Clark <robdclark@gmail.com>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-11-30 20:01:00 +02:00
Rob Clark 946cf4eb68 freedreno/a5xx: initial support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:35:49 -05:00
Rob Clark fcba3046e1 freedreno: update generated headers
Pull in a5xx

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark 8c56789f60 freedreno: make gmem tile size alignment configurable
a5xx seems to prefer 64 pixel alignment, in at least some cases.  Make
this configurable per generation.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark 728e2c4d38 freedreno/ir3: don't offset inloc by 8
On a3xx/a4xx, the SP_VS_VPC_DST_REG.OUTLOCn is offset by 8, so we used
to add this offset into fs->inputs[n].inloc.  But a5xx drops this extra
offset-by-8.  So instead make inloc zero based and add the offset when
we emit OUTLOCn values (for the gen's that need the offset).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark 7a59157287 freedreno/a3xx: use new shader linkage helper
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark 98c83b5d1c freedreno/a4xx: use new shader linkage helper
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Rob Clark 1be5670c8d freedreno/ir3: add new helper for shader linkage
Helps simplify things on a5xx, where pos/psize get added to the vs-out
map.  And anyways, simplifies a3xx and a4xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-30 12:25:48 -05:00
Nicolai Hähnle f60374aa68 st/mesa: skip lower_output_reads when possible
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-30 09:10:02 +01:00
Nicolai Hähnle 0a58b258ca st/glsl_to_tgsi: swizzle PROGRAM_OUTPUTs correctly in src_register translation
This is required for reading directly from fragment shader stencil and depth
outputs.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-30 09:09:59 +01:00
Nicolai Hähnle 611166b8ed gallium: add PIPE_CAP_TGSI_CAN_READ_OUTPUTS
Drivers that support this benefit by saving one lowering pass in the
GLSL-to-TGSI conversion.

radeonsi already supports this because all outputs are stored in temporary
variables before the export (except for TCS outputs, which have always
been readable in TGSI anyway due to their special semantics).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-11-30 09:09:50 +01:00
Bas Nieuwenhuizen abc887faa1 ac/nir: Fix out of bounds array access.
With nir_intrinsic_ssbo_atomic_comp_swap we run out of params.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-11-30 07:09:38 +01:00
Kristian H. Kristensen d3d7cab812 aubinator: Add support for enum types
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen 7fc659d8d5 intel/genxml: Fix ksp for INTERFACE_DESCRIPTOR_DATA
This one was split across two dwords as "Kernel Start Pointer" and
"Kernel Start Pointer High", which looks like it works when the driver
only accesses "Kernel Start Pointer". This breaks, of course, with BO
offsets > 4G.

Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen 99e573b4e0 intel/genxml: Use enum 3D_Logic_Op_Function where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen 374d19ac00 intel/genxml: Use blend function and factor enums where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00
Kristian H. Kristensen 09fe8ad010 intel/genxml: Use enum 3D_Vertex_Component_Control where applicable
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-11-29 22:02:49 -08:00