75866 Commits

Author SHA1 Message Date
Matt Turner e734fb0326 i965: Inform compiler of variable range to silence warning.
Extends commit 6531ccb70 to silence the warning in release builds as
well.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-19 12:08:59 -08:00
Matt Turner a439788c59 glsl: Restore Mesa-style to shader_enums.c/h. 2016-01-19 12:08:59 -08:00
Christian König f3b067af86 st/va: fix motion adaptive deinterlacing
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-01-19 17:28:38 +01:00
Nicolai Hähnle e6281a2850 util/u_pstipple.c: copy immediates during transformation
Apparently, nobody has combined stippling with a fragment shader
containing immediates in almost five years...

Fixes a bug in Kodi with radeonsi reported by Christian König.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-19 10:52:35 -05:00
Marta Lofstedt 2bcacc69b9 mesa: Move sanity check of BindVertexBuffer for OpenGL ES 3.1
Sanity check of BindVertexBuffer for OpenGL ES in
_mesa_handle_bind_buffer_gen breaks OpenGL ES 2 conformance.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93426
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2016-01-19 13:08:42 +01:00
Timothy Arceri d018619d7f glsl: fix interface block error message
Print the stream value not the pointer to the expression,
also use the unsigned format specifier.

Cc: 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-19 14:51:31 +11:00
Ilia Mirkin a31819cff8 nv50/ir: swap the least-ref'd source into src1 when both const/imm
The whole point of inlining sources is to reduce loads. We can end up in
a situation where one value is used a lot of times, and one value is
used only once per instruction. The once-per-instruction one is the one
that should get inlined, but with the previous algorithm, it was given
no preference.

This flips things around to preferring putting less-referenced values
into src1 which increases the likelihood of them being inlined.

While we're at it, adjust the heuristic to not treat 0 as an immediate,
as well as (effectively) check for situations where LIMMs can't be
loaded. All this yields improvements on nvc0:

total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs    : 945082 -> 943417 (-0.18%)
total local used in shared programs   : 30372 -> 30288 (-0.28%)
total bytes used in shared programs   : 50089256 -> 50047880 (-0.08%)

                local        gpr       inst      bytes
    helped          21         822        3332        3332
      hurt           0         278         565         565

And more importantly avoids generating really bad code with SSBOs, where
we end up checking a lot of different values (usually immediates) against
the length.

On nv50 we get comparable results, and even improve packing (bytes went
down more than instructions):

total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs    : 728719 -> 725131 (-0.49%)
total local used in shared programs   : 3552 -> 3552 (0.00%)
total bytes used in shared programs   : 43995688 -> 43932928 (-0.14%)

                local        gpr       inst      bytes
    helped           0        1380        3252        3774
      hurt           0         287        1710        1365

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-18 17:52:07 -05:00
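
Illustration for a31819cff8 (not part of the commit): a minimal sketch of the "less-referenced value goes into src1" preference, assuming a commutative instruction whose src1 slot is the one eligible for inlining. The types `struct value` and `struct insn` and the helper name are hypothetical stand-ins, not the nv50/ir classes, and the sketch leaves out the zero-immediate and LIMM checks the message mentions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an IR value; not the real nv50/ir type. */
struct value {
    int  ref_count;   /* how many instructions read this value */
    bool inlinable;   /* true for constant-buffer or immediate sources */
};

/* Hypothetical two-source commutative instruction. */
struct insn {
    struct value *src0;
    struct value *src1;   /* assumed to be the slot that can be inlined */
};

/* When both sources are const/imm, move the less-referenced one into src1.
 * Returns true if the sources were swapped. */
static bool prefer_less_referenced_in_src1(struct insn *i)
{
    if (!i->src0->inlinable || !i->src1->inlinable)
        return false;

    if (i->src0->ref_count < i->src1->ref_count) {
        struct value *tmp = i->src0;
        i->src0 = i->src1;
        i->src1 = tmp;
        return true;
    }
    return false;
}
```

A value read by many instructions is worth loading once into a register, while a value read once costs a load for a single use, so it is the better candidate for the inlinable slot.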
Ilia Mirkin af686e7de3 st/mesa: restore the stObj's size if it was cleared out
An issue could still occur if the base level is set, but fixing that
would require a lot more logic.

This fixes the recently-failing texelFetch 3D tests because the mipmaps
were no longer being generated, which in turn caused the copying logic
to be hit, which in turn didn't work because of the broken
width/height/depth.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-18 17:52:07 -05:00
Rob Clark 805e080ba0 freedreno/a4xx: use smaller threadsize for more registers
Once we go past half of the "GPR" register file, it seems like we need
to run frag shader with smaller threadsize.  (The vertex shader already
runs at TWO_QUADS, which is the minimum.)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-18 16:58:25 -05:00
Rob Clark 6062941e4d freedreno: per-generation OUT_IB packet
Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled)
version of the CP_INDIRECT_BUFFER packet.  So allow for PFD vs PFE per
generation.  Switch a3xx and a4xx over to using prefetch-enabled version
(which is also what blob does.. it seems only on a2xx we cannot use
PFE).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-18 16:58:25 -05:00
Emil Velikov c03f3dd0a5 gallium: bundle the compat header u_pwr8.h in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Emil Velikov 7bc714509b mapi: include gl.xml in the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Emil Velikov a78e08e88f i965: adding missing headers to the dist tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-01-18 13:37:58 +02:00
Christian König eaf7ec9cfc st/va: add motion adaptive deinterlacing v2
v2: minor cleanup

Signed-off-by: Christian König <christian.koenig@amd.com>
2016-01-18 10:59:32 +01:00
Michel Dänzer ad20be1f30 gallium/radeon: Rename do_invalidate_resource to invalidate_buffer
And only call it from r600_invalidate_resource for buffer resources.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer 0491dd1deb st/dri: Don't call invalidate_resource for NULL depth/stencil buffers
Fixes crash in 4 EGL piglit tests with radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer a9ab7172a6 radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDR
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-01-18 17:39:37 +09:00
Michel Dänzer 4297259fc8 radeonsi: Print "LLVM emitted unknown config register" warning only once
Say "LLVM" instead of "Compiler" for clarity.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-18 17:39:37 +09:00
Oded Gabbay 679a654a77 llvmpipe: use vpkswss when dst is signed
This patch fixes a bug when building a pack instruction.

For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.

This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale

I've also corrected some coding style issues in the function.

v2: Returned else statements to vmware style

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-18 09:45:25 +02:00
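
Illustration for 679a654a77 (not part of the commit): a scalar, per-lane model of the two pack flavours, assuming 32-bit sources and 16-bit results. The function names are hypothetical; the real instructions operate on whole vectors.

```c
#include <stdint.h>

/* Per-lane model of a signed saturating pack (the vpkswss behaviour):
 * a signed 32-bit value is clamped to the signed 16-bit range. */
static int16_t pack_s32_to_s16_sat(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Per-lane model of an unsigned saturating pack (the vpkuwus behaviour):
 * the source is treated as unsigned and clamped to 0..65535. */
static uint16_t pack_u32_to_u16_sat(uint32_t v)
{
    return v > UINT16_MAX ? UINT16_MAX : (uint16_t)v;
}
```

Feeding -5 through the unsigned variant shows the bug class: the bit pattern 0xFFFFFFFB is read as a huge unsigned number and saturates to 0xFFFF, whereas the signed variant keeps -5.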
Dave Airlie 119bef9543 glsl: fix subroutine lowering reusing actual parameters
One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).

This may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722. I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.

Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2016-01-18 15:02:34 +10:00
Timothy Arceri 9258d9f23d glsl: remove special case for detecting stream duplicates
Any duplicates in a single declaration will already fail the
generic duplicates test due to the explicit_stream flag being set.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-18 13:09:28 +11:00
Timothy Arceri eac2cece31 glsl: add missing explicit_stream flag to has_layout()
This will allow the ARB_shading_language_420pack rules in
glsl_parser.yy for catching duplicate layout qualifiers to be
triggered for the stream identifier rather than relying on the
code meant to catch duplicates within a single layout(...)

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-01-18 13:09:16 +11:00
Timothy Arceri 86677f1016 mesa: fix segfault in glUniformSubroutinesuiv()
From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:

   "The command

       void UniformSubroutinesuiv(enum shadertype, sizei count,
                                  const uint *indices);

   will load all active subroutine uniforms for shader stage
   shadertype with subroutine indices from indices, storing
   indices[i] into the uniform at location i. The indices for
   any locations between zero and the value of
   ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
   used will be ignored."

V2: simplify NULL check suggested by Jason.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=93731
2016-01-18 11:53:24 +11:00
Timothy Arceri 50376e0c0e glsl: fix segfault linking subroutine uniform with explicit location
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
2016-01-18 11:30:45 +11:00
Ilia Mirkin 4ac1274caa gm107/ir: don't do indirect frag shader inputs on GM107
Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Ilia Mirkin 3281ae96c8 tgsi: initialize Atomic field in tgsi_default_declaration
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-01-17 16:37:04 -05:00
Ilia Mirkin 5a81b48ad0 nvc0: bsp_bo can't be null
We already deref it earlier. And these are all allocated on load.
Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-17 16:37:04 -05:00
Oded Gabbay 529aa8249a llvmpipe: fix arguments order given to vec_andc
This patch fixes a classic "confuse the enemy" bug.

_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.

_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"

To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-17 21:07:27 +02:00
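
Illustration for 529aa8249a (not part of the commit): a scalar model of the two operand orders described above, with a wrapper that hides the swap. The function names are hypothetical; the actual wrapper in u_pwr8.h works on vector types.

```c
#include <stdint.h>

/* Scalar model of the SSE convention: r = (~a) & b */
static uint32_t model_mm_andnot(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* Scalar model of the VMX convention: r = a & (~b) */
static uint32_t model_vec_andc(uint32_t a, uint32_t b)
{
    return a & ~b;
}

/* Wrapper with SSE argument order built on the VMX operation:
 * the swap happens in exactly one place. */
static uint32_t model_vec_andnot(uint32_t a, uint32_t b)
{
    return model_vec_andc(b, a);
}
```

With a = 0xF0 and b = 0xFF the SSE form and the wrapper both give 0x0F, while calling the VMX form with the arguments unswapped gives 0.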
Rob Clark 02ac91d717 freedreno/ir3: fix mad 3rd src delay calc
In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-17 12:21:45 -05:00
Rob Clark 2a6ec1e061 freedreno/ir3: better array register allocation
Detect arrays which don't conflict with each other and allow overlapping
register allocation.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:52 -05:00
Rob Clark 6a33c5c0df freedreno/ir3: array offset can be negative
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], COLOR
  DCL CONST[0..7]
  DCL ADDR[0]
    0: ARL ADDR[0].x, IN[1].xxxx
    1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
    2: DP4 OUT[0].x, CONST[4], IN[0]
    3: DP4 OUT[0].y, CONST[5], IN[0]
    4: DP4 OUT[0].z, CONST[6], IN[0]
    5: DP4 OUT[0].w, CONST[7], IN[0]
    6: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:23:20 -05:00
Rob Clark ddede497b8 freedreno/ir3: workaround bug/feature
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions.  This may be slightly conservative, we may only have
this restriction when the first src is also const.

This fixes, for example, +24/-0 of the variable-indexing piglit tests.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:22:43 -05:00
Rob Clark ebd3a1fc17 ttn: use writemask for store_var
Only user is freedreno, and after array-rework it can cope.  Avoids
generating loads for a store.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:52 -05:00
Rob Clark fad158a0e0 freedreno/ir3: array rework
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:21:08 -05:00
Rob Clark cc7ed34df9 freedreno/ir3: refactor/simplify cp
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:46 -05:00
Rob Clark 680664dff9 freedreno/ir3: fix incorrect decoding of mov instructions
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:20:37 -05:00
Rob Clark 2809c87f90 freedreno/ir3: remove unused tgsi tokens ptr
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:59 -05:00
Rob Clark fc0d2f7e02 freedreno/ir3: bit of ra refactor
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:47 -05:00
Rob Clark d430f443de freedreno/ir3: cosmetic de-indent
Collapse two nested if's into one to reduce indent level.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-01-16 14:18:33 -05:00
Rob Clark 6f0377d651 ttn: add missing writemask on store_output
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-01-16 13:35:44 -05:00
Rob Clark 683794fd60 nir/print: const_index is signed
Noticed this with $piglit/bin/vp-address-01

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-01-16 13:35:44 -05:00
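
Illustration for 683794fd60 (not part of the commit): a small sketch of why a signed index needs a signed format specifier. The helper names are hypothetical and are not the nir_print functions.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats the index as signed: -1 prints as "-1". */
static int format_const_index_signed(char *buf, size_t len, int32_t idx)
{
    return snprintf(buf, len, "%" PRId32, idx);
}

/* Formats the same bits as unsigned: -1 prints as a large positive number. */
static int format_const_index_unsigned(char *buf, size_t len, int32_t idx)
{
    return snprintf(buf, len, "%" PRIu32, (uint32_t)idx);
}
```

An offset such as the -1 in CONST[ADDR[0].x-1] from vp-address-01 is only readable in the printed output with the signed variant.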
Rob Clark 211b0644e6 nir: few missing struct names
nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs
'typedef struct nir_foo {} nir_foo'.  But missing struct name tags is
inconvenient when you need a fwd declaration without pulling in all
of nir.

So add missing struct name tag for nir_variable, and a couple other
spots where it would likely be useful.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
2016-01-16 13:35:43 -05:00
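
Illustration for 211b0644e6 (not part of the commit): a sketch of what a struct name tag buys you. The type `demo_variable` is a hypothetical stand-in, not the real nir_variable.

```c
/* What a lightweight header can do: declare the tag and use pointers to it
 * without including the header that holds the full definition. */
struct demo_variable;
int demo_variable_location(const struct demo_variable *var);

/* The full definition as it would appear in the large header. Because the
 * struct carries a tag, the forward declaration above refers to this type.
 * With 'typedef struct { ... } demo_variable;' there would be no tag to
 * forward-declare. */
typedef struct demo_variable {
    int location;
} demo_variable;

int demo_variable_location(const struct demo_variable *var)
{
    return var->location;
}
```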
Ilia Mirkin 32a9fe013b nv50/ir: add saturate support on ex2
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-01-16 00:10:56 -05:00
Jeff Muizelaar e5fefe49f2 gallivm: avoid crashing in mod by 0 with llvmpipe
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-01-16 03:36:29 +01:00
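
Illustration for e5fefe49f2 (not part of the commit): one branch-free way to make a signed modulo safe against a zero divisor and return -1 in that case, written as a scalar model. The masking approach and the function name are assumptions for illustration; the message only says the code mirrors umod, udiv and idiv. The widening to 64 bits is an extra precaution of this sketch so that INT32_MIN % -1 stays well defined in C.

```c
#include <stdint.h>

/* Scalar model of a guarded signed modulo: returns -1 when b == 0. */
static int32_t mod_guarded(int32_t a, int32_t b)
{
    /* All ones when the divisor is zero, all zeros otherwise. */
    int32_t zero_mask = (b == 0) ? -1 : 0;

    /* A zero divisor becomes -1; any other divisor is unchanged. */
    int32_t divisor = b | zero_mask;

    /* Widen so that INT32_MIN % -1 does not overflow. */
    int32_t rem = (int32_t)((int64_t)a % (int64_t)divisor);

    /* Force the result to -1 when the original divisor was zero. */
    return rem | zero_mask;
}
```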
Kenneth Graunke d54a70aa18 glsl: Allow implicit int -> uint conversions for bitwise operators (&, ^, |).
The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions.  Implementations of
current language revisions may or may not perform them.

This patch makes Mesa apply implicit conversions even on current
language versions.  Applications appear to expect this behavior,
and there's really no downside to doing so.

Fixes shader compilation in Shadow of Mordor.

Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
2016-01-15 17:53:44 -08:00
Jason Ekstrand 61b0cfd84e i965/fs: Always set channel 2 of texture headers in some stages
In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zeroed out for us so we can use it for headers.  However, in compute,
geometry, and tessellation stages, the hardware is not so nice.  In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15.  As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask.  This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled.  Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.

Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand 9870f798be i965/fs/generator: Take an actual shader stage rather than a string
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-15 16:44:02 -08:00
Jason Ekstrand 0a6811207f i965/vec4: Use UW type for multiply into accumulator on GEN8+
BDW adds the following restriction: "When multiplying DW x DW, the dst
cannot be accumulator."

Cc: "11.1,11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-01-15 16:44:02 -08:00
Roland Scheidegger 03f66dfb4b llvmpipe: ditch additional ref counting for vertex/geometry sampler views
The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
Roland Scheidegger 2f9a325b6a llvmpipe: fix "leaking" textures
This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...

v2: rename var

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-01-15 20:13:45 +01:00
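
Illustration for 2f9a325b6a (not part of the commit): a minimal sketch of "remember how many slots were bound last time and drop references for slots that are now NULL or out of range". All names (`struct tex`, `struct setup`, `tex_reference`, `set_views`, `MAX_VIEWS`) are hypothetical and only model the idea; they are not the llvmpipe setup code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 32   /* hypothetical slot limit for this sketch */

/* Hypothetical reference-counted texture. */
struct tex {
    int refcount;
};

/* Hypothetical setup state holding one reference per bound slot. */
struct setup {
    struct tex *bound[MAX_VIEWS];
    unsigned    num_bound;   /* number of slots set by the previous call */
};

/* Point *dst at src, adjusting both reference counts. src may be NULL. */
static void tex_reference(struct tex **dst, struct tex *src)
{
    if (*dst == src)
        return;
    if (src)
        src->refcount++;
    if (*dst)
        (*dst)->refcount--;
    *dst = src;
}

/* Bind 'num' views. Slots that are NULL now, or that were bound last time
 * but lie beyond 'num', have their old reference released. */
static void set_views(struct setup *s, unsigned num, struct tex *const *views)
{
    unsigned max = num > s->num_bound ? num : s->num_bound;

    assert(num <= MAX_VIEWS);

    for (unsigned i = 0; i < max; i++) {
        struct tex *t = (i < num) ? views[i] : NULL;
        tex_reference(&s->bound[i], t);
    }
    s->num_bound = num;
}
```

Looping only up to the larger of the old and new counts avoids walking every possible slot on each call, which is the trade-off the message contrasts with always unreferencing up to PIPE_MAX_SHADER_SAMPLER_VIEWS.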