Commit Graph

66990 Commits

Author SHA1 Message Date
Roland Scheidegger fea5c2640b draw: (trivial): remove double semicolon 2014-12-09 00:10:41 +01:00
Abdiel Janulgue 49e0431211 st/mesa: For vertex shaders, don't emit saturate when SM 3.0 is unsupported
There is a bug in the current lowering pass implementation where we lower saturate
to clamp only for vertex shaders on drivers supporting SM 3.0. The correct behavior
is to actually lower to clamp only when we don't support saturate which happens
on drivers that don't support SM 3.0

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-12-08 20:14:26 +02:00
Abdiel Janulgue 4ea8c8d56c glsl: Don't optimize min/max into saturate when EmitNoSat is set
v3: Fix multi-line comment format (Ian)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-12-08 20:14:17 +02:00
Abdiel Janulgue 39f7b72428 ir_to_mesa: Remove sat to clamp lowering pass
Fixes an infinite loop in swrast where the lowering pass unpacks saturate into
clamp but the opt_algebraic pass tries to do the opposite.

v3 (Ian):
This is a revert of commit cfa8c1cb "ir_to_mesa: lower ir_unop_saturate" on
the ir_to_mesa.cpp portion. prog_execute.c can handle saturates in vertex
shaders, so classic swrast shouldn't need this lowering pass.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83463
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
2014-12-08 20:14:10 +02:00
Michael Forney 5d64da401c loader: Add missing EXPAT_CFLAGS to libloader.la CPPFLAGS
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-08 08:50:27 -08:00
Matt Turner f65200ccc9 i965: Remove default from brw_instruction_name switch to catch missing names.
The case-range extension is available in clang and gcc at least back to
3.4.0.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-08 08:50:26 -08:00
Matt Turner b6a71cbb64 i965: Add missing opcode names.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-08 08:50:26 -08:00
Matt Turner 6383e206c0 i965: Add opcode names for set_omask and set_sample_id.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-08 08:50:26 -08:00
Chad Versace 7e8ba77c49 egl: Expose EGL_KHR_get_all_proc_addresses and its client extension
Mesa already implements the behavior of EGL_KHR_get_all_proc_addresses
and EGL_KHR_client_get_all_proc_addresses. This patch just exposes the
extension strings.

See: https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_get_all_proc_addresses.txt
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-07 20:58:25 -08:00
Emil Velikov 0b6e0aa5ae docs: add news item and link release notes for mesa 10.3.5
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-12-07 19:22:11 +00:00
Emil Velikov 7409ad5147 docs: Add sha256 sums for the 10.3.5 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 1ba2029184d3e7b013e3fc322e80a761604495d4)
2014-12-07 19:22:11 +00:00
Emil Velikov 8d235e0c70 Add release notes for the 10.3.5 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit c90b0db1aef8f439b52b38ad58aac4ca202232a7)
2014-12-07 19:22:11 +00:00
Ilia Mirkin 043b79461f freedreno/a2xx: silence warning about missing DEPTH32X
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:53 -05:00
Ilia Mirkin c416f49ebe freedreno/a3xx: handle index_bias (i.e. base_vertex)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:50 -05:00
Ilia Mirkin b38b40d7bb freedreno/a3xx: add bgr565 texturing and rendering
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:47 -05:00
Ilia Mirkin e02ed16cb5 freedreno/a3xx: add support for SRGB render targets
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:43 -05:00
Ilia Mirkin 39a7c049d3 freedreno/a3xx: output RGBA16_FLOAT from fs for certain outputs
Fixes R11G11B10F rendering, and is required for SRGB format support.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:40 -05:00
Ilia Mirkin 3674c76edf freedreno/a3xx: re-enable rgb10_a2 render targets
There were previously regressions regarding border colors, which the
updated swizzle logic resolves.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:37 -05:00
Ilia Mirkin fc94b2c2a0 freedreno/a3xx: fix border color swizzle to match texture format desc
This is a hack since it uses the texture information together with the
sampler, but I don't see a better way to do it. In OpenGL, there is a
1:1 correspondence.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:33 -05:00
Ilia Mirkin 97fef2db5c freedreno/a3xx: fix alpha-blending on RGBX formats
Expert debugging assistance provided by Chris Forbes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2014-12-06 18:18:20 -05:00
Chris Forbes 6b01969345 glcpp: Fix `can not` to `cannot` in error message
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-07 11:49:28 +13:00
Chris Forbes b49a069bd3 glcpp: Disallow undefining GL_* builtin macros.
Fixes the piglit test: spec/glsl-es-3.00/compiler/undef-GL_ES.vert

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-07 11:47:45 +13:00
Chris Forbes ed56c16820 i965/Gen6-7: Fix point sprites with PolygonMode(GL_POINT)
This was an oversight in the original patch. When PolygonMode is
used, then front faces, back faces, or both may be rendered as
points and are affected by point sprite state.

Note that SNB/IVB can't actually be fully conformant here, for
a legacy context -- we don't have separate sets of pointsprite
enables for front and back faces. Haswell ignores pointsprite
state correctly in hardware for non-point rasterization, so can
do this correctly, but it doesn't seem worth it.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86764
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-07 11:46:42 +13:00
Chris Forbes 092c73a7c3 i965: Fix regs read for FS_OPCODE_INTERP_PER_SLOT_OFFSET
Dead code elimination was eating the Y offset.

Fixes the piglit test:
spec/ARB_gpu_shader5/arb_gpu_shader5-interpolateAtOffset-nonconst

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-07 10:29:26 +13:00
Chris Forbes 680f72d6f2 i965: Add opcode names for FS interpolation opcodes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-07 10:29:20 +13:00
Roland Scheidegger d8da6decea mesa/st: don't use CMP / I2F for conditional assignments with native integers
The original idea was to optimize away the condition by integrating it directly
into the CMP instruction. However, with native integers this requires an extra
I2F instruction. It is also fishy because the negation used didn't really honor
ieee754 float comparison rules, not to mention the CMP instruction itself
(being pretty much a legacy instruction) doesn't really have defined special
float value behavior in any case.
So, use UCMP and adjust the code trying to optimize the condition away
accordingly (I have absolutely no idea if such conditions are actually hit
or would be translated away somewhere else already).

v2: cosmetic changes

No piglit regressions on llvmpipe.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2014-12-06 18:03:25 +01:00
Roland Scheidegger 6f2cf5f3d0 llvmpipe: decrease MAX_SCENES from 2 to 1
Multiple scenes per context are meant to be used so a new scene can be built
while another one is processed in rasterization. However, quite surprisingly,
this does not actually work (and according to git log, possibly never did,
though maybe it did at some point further back (5 years+) but was buggy)
because we always wait immediately on the rasterizer to finish the scene when
contexts (and hence setup/scene) is flushed. This means when we try to get
an empty scene later, any old one is already empty again.
Thus using multiple scenes is just a waste of memory (not too bad, since the
additional scenes are guaranteed to be empty, which means their size ought to
be one data block (64kB) plus the size of some structs), without actually
really doing anything. (There is also quite some code for the whole concept of
multiple scenes which doesn't really do much in practice, but keep it hoping
the wait-on-scene-flush can be fixed some day.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-12-06 18:03:18 +01:00
Roland Scheidegger 1b6db3593e draw: use the prim type from prim_info not emit in passthrough emit
The prim assembler may change the prim type when injecting prim ids now,
which isn't reflected by what's stored in emit.
This looks brittle and potentially dangerous (it is not obvious if such prim
type changes are really supported by pt emit, the prim type is actually also
set in prepare which would then be different).

This fixes piglit primitive-id-no-gs-first-vertex.shader_test.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-12-06 18:03:11 +01:00
Roland Scheidegger fe86415beb draw: use correct output prim for non-adjacent topologies in prim assembler.
The decomposition done in the prim assembler will turn tri fans into tris,
but this wasn't reflected in the output prim type. Meaning with a tri fan
with 6 verts input, the output was a tri fan with 12 vertices instead of a
tri list with 12 vertices (not as bad as it sounds, since the additional tris
created would all be degenerate since they'd all have two times vertex zero
but still bogus).
This is because the prim assembler is used if either the input topology is
something with adjacency, or if prim id needs to be injected, and for the
latter case topologies without adjacency can be converted to basic ones.
Unfortunately decomposition here for inserting prim ids is necessary, at
least for the indexed case where we can't just insert the prim id at the
right place depending on provoking vertex.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-12-06 18:03:05 +01:00
Roland Scheidegger 3fdbad1142 draw: kill off unneded prim assembler code for handling adjacency verts
The default macros when the adjacency macros aren't defined will already
exactly do that (that is, drop the adjacent vertices and call the non-adjacent
macro).

Reviewed-by: Jose Fonseca <jfonseca@vmwarec.com>
2014-12-06 18:02:59 +01:00
Roland Scheidegger ec30c66b46 gallium/docs: (trivial) remove STR opcode description.
The opcode was removed alongside SFL by commit
ecfe9e2ad2.
2014-12-06 17:56:46 +01:00
Matt Turner a28ad9d4c0 i965/fs: Perform CSE on MOV ..., VF instructions.
Safe from causing optimization loops, since we don't constant propagate
VF arguments.

(for this and the previous patch):
total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
instructions in affected programs:     1616779 -> 1599636 (-1.06%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-05 16:43:31 -08:00
Matt Turner 963a3c7f90 i965/fs: Try to emit LINE instructions on Gen <= 5.
The LINE instruction performs a multiply-add instruction (a * b + c)
where b and c are scalar arguments. It reads b and c from offsets in
src0 such that you can load them (it they're representable) as a
vector-float immediate with a single instruction.

Hurts some programs, but that'll all get better once we CSE the
vector-float MOVs in the next patch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-05 16:43:31 -08:00
Matt Turner 6be863af0e i965/fs: Add support for generating the LINE instruction.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-05 16:43:31 -08:00
Matt Turner 92346db057 i965: Set the region of LINE's src0 to <0,1,0>.
The PRMs say that

   <src0> region must be a replicated scalar
   (with HorzStride = VertStride = 0).

but apparently that doesn't actually apply to all generations. I did
notice when implementing the optimization later in this series that G45
and ILK needed this regioning.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-05 16:43:31 -08:00
Matt Turner 9ed8d00ab5 i965: Give compile stats through KHR_debug.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-05 16:43:31 -08:00
Matt Turner 5b1e51bfbe mesa: Add a source parameter to _mesa_gl_debug.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-05 16:43:31 -08:00
Eric Anholt befdff8142 vc4: Try swapping the regfile A to B to pair instructions.
total instructions in shared programs: 56995 -> 56087 (-1.59%)
instructions in affected programs:     40503 -> 39595 (-2.24%)
2014-12-05 16:27:58 -08:00
Eric Anholt 7d8b79f398 vc4: Allow pairing of some instructions that disagree about the WS bit.
No difference on shader-db because we tend to have a lot of other
conflicts going on as well (like RADDR_A disagreements)
2014-12-05 16:27:06 -08:00
Matt Turner e36c6513ce configure.ac: Replace contraction to fix syntax highlighting. 2014-12-05 13:22:56 -08:00
Ben Widawsky f13870db09 i965/gs: Avoid DW * DW mul
The GS has an interesting use for mul. Because the GS can emit multiple
vertices per input vertex, and it also has a unique count at the top of the URB
payload, the GS unit needs to be able to dynamically specify URB write offsets
(relative to the global offset). The documentation in the function has a very
good explanation from Paul on the mechanics.

This fixes around 2000 piglit tests on BSW.

v2:
Reworded commit message (Ben) no mention of CHV (Matt)
Change SHRT_MAX to USHRT_MAX (Ken, and Matt)
Update comment in code to reflect the use of UW (Ben)
Add Gen7+ assertion for the relevant GS code, since it won't work on Gen6- (Ken)
Drop the bogus hunk in emit_control_data_bits() (Ken)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84777 (with many dupes)
Cc: "10.4 10.3 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-12-05 12:12:46 -08:00
Eric Anholt 6f32deb538 vc4: Add separate write-after-read dependency tracking for pairing.
If an operation is the last one to read a register, the instruction
containing it can also include the op that has the next write to that
register.

total instructions in shared programs: 57486 -> 56995 (-0.85%)
instructions in affected programs:     43004 -> 42513 (-1.14%)
2014-12-05 10:53:53 -08:00
Eric Anholt 042962df2d vc4: Fix inverted priority of instructions for QPU scheduling.
We were scheduling TLB operations as early as possible, and texture setup
as late as possible.  When I introduced prioritization, I visually
inspected that an independent operation got moved above texture results
collection, which tricked me into thinking it was working (but it was just
because texture setup was being pushed late).

total instructions in shared programs: 57651 -> 57486 (-0.29%)
instructions in affected programs:     18532 -> 18367 (-0.89%)
2014-12-05 10:43:14 -08:00
Eric Anholt bd4057a5d7 vc4: Refuse to merge two ops that both access shared functions.
Avoids assertion failures in vc4_qpu_validate.c if we happen to find the
right set of operations available.
2014-12-05 10:43:14 -08:00
Eric Anholt dadc32ac80 vc4: Allow dead code elimination of color reads.
This might happen if the blending functions are set up to not actually use
the destination color/alpha, for example.
2014-12-05 10:43:14 -08:00
Eric Anholt 34cf86bdc4 vc4: Add a debug flag for waiting for sync on submit.
This is nice when you're tracking down which command list is hanging the
GPU.
2014-12-05 10:43:14 -08:00
Matt Turner c0e26c5d27 i965/fs: Move brw_file_from_reg() higher in the file.
This was supposed to be part of the previous commit.
2014-12-05 09:53:35 -08:00
Matt Turner db186f2a38 i965/fs: Make brw_reg_from_fs_reg static and remove prototype.
And move it above its first use in brw_fs_generator.cpp.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-05 09:49:42 -08:00
Matt Turner 2881b123d0 i965: Use ~0 to represent true on all generations.
Jason realized that we could fix the result of the CMP instruction on
Gen <= 5 by doing -(result & 1). Also do the resolves in the vec4
backend before use, rather than when the bool was created. The FS does
this and it saves some unnecessary resolves.

On Ironlake:

total instructions in shared programs: 4289762 -> 4287277 (-0.06%)
instructions in affected programs:     619430 -> 616945 (-0.40%)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-05 09:49:42 -08:00
Matt Turner 05e2578cac i965: Change the type of booleans to D.
This is a revert of commit 4656c14e ("i965/fs: Change the type of
booleans to UD and emit correct immediates") plus some small additional
fixes, like casting ctx->Const.UniformBooleanTrue to int and changing UD
to D in the ir_unop_b2f cases. Note that it's safe to leave 0x3f800000
as UD and as a literal it's more recognizable than 1065353216.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-05 09:49:42 -08:00