Commit Graph

67770 Commits

Author SHA1 Message Date
Eric Anholt 8ab6759cef mesa: Move simple_list.h to src/util.
We have two copies of it in the tree, I'm going to delete one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28 16:33:34 -08:00
Tom Stellard 2397a72129 radeonsi: Enable VGPR spilling for all shader types v5
v2:
  - Only emit write SPI_TMPRING_SIZE once per packet.
  - Use context global scratch buffer.

v3:
  - Patch shaders using WRITE_DATA packet instead of map/unmap.
  - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
    VS_PARTIAL_FLUSH when patching shaders.

v4:
  - Code cleanups.
  - Remove unnecessary multiplies.

v5:
  - Patch shaders in system memory and re-upload to vram.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:47 +00:00
Tom Stellard 5dcd97f25c radeonsi/compute: Allocate the scratch buffer during state creation
This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state().  This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Tom Stellard 32206c5e56 radeonsi: Add radeon_shader_binary member to struct si_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Tom Stellard 37559f8dfc radeonsi/compute: Rename si_compute::program to si_compute::shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Marek Olšák 5935edd47c radeonsi: Avoid leaking memory when rebuilding shader states
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-01-28 21:03:46 +00:00
Jason Ekstrand bb26ebac13 nir/opcodes: Use a return type of tfloat for ldexp
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:21:40 -08:00
Jason Ekstrand 7ac79eea1a Revert "util: Move the alternate fpclassify implementation to util"
This reverts commits d6eb572905 and
58e8468d11.

This is no longer necessary as we aren't using it in NIR anymore.  Also, it
broke the build on some strange systems so let's put it back in querymatrix
where it came from.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852

Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:20:26 -08:00
Jason Ekstrand f0340ff625 Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"
This reverts commit d7d340fb2f.

We have an isnormal() implementation available, the only problem was that
we had the wrong return type (fixed in a later patch).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806

Acked-by: Matt Turner <mattst88@gmail.com>
2015-01-28 13:19:47 -08:00
Jason Ekstrand 58e8468d11 util: Predicate the fpclassify fallback on !defined(__cplusplus)
The problem is that the fallbacks we have at the moment don't work in C++.
While we could theoretically fix the fallbacks it would also raise the
issue of correctly detecting the fpclassify function.  So, for now, we'll
just disable it until we actually have a C++ user.

Reported-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: EdB <edb+mesa@sigluy.net>
2015-01-28 11:47:56 -08:00
Sven Arvidsson 3b7747c022 drirc: set allow_glsl_extension_directive_midshader for Dead Island.
Signed-off-by: Sven Arvidsson <sa@whiz.se>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-01-28 14:50:28 +01:00
Jason Ekstrand d7d340fb2f nir/opcodes: Use fpclassify() instead of isnormal() for ldexp
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28 03:42:41 -08:00
Jason Ekstrand d6eb572905 util: Move the alternate fpclassify implementation to util
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-28 03:42:41 -08:00
Jason Ekstrand 5e8468e6da i965/tex: Don't create read-write textures with non-renderable formats
I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do a S3TC PBO download or something.  This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-28 01:28:32 -08:00
Jason Ekstrand 34723c0861 i965/gen8: Include the buffer offset when emitting renderbuffer relocs
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-28 01:28:31 -08:00
Tapani Pälli 291d7ef84d mesa: improve error messaging for format CSV parser
Patch adds 2 error messages that point user directly to fix
mispelled or impossible swizzle field for a format.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-28 10:40:15 +02:00
EdB 6ee5effac1 clover/llvm: Dump the OpenCL C code earlier.
[ Francisco Jerez: As discussed on the mailing list, this is intended
  to produce more useful debug output in cases where the compilation
  terminates unexpectedly. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-28 02:27:41 +02:00
EdB 13d23a9a17 clover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.
[ Francisco Jerez: As we're at it make debug_options[] local to its
  only user and remove temporary. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-28 02:27:41 +02:00
Dave Airlie 349df23eb0 r600g: add support for primitive id without geom shader (v2)
GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.

On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.

This is a first pass attempt at this, and passes the piglit
tests that test for this.

v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-28 09:51:21 +10:00
Dave Airlie cc2fc095bf r600g: move selecting the pixel shader earlier.
In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-28 09:51:02 +10:00
Michel Dänzer 5c83a0d2ce st/clover: Pass target instead of target.begin() to std::string()
Fixes reading beyond allocated memory:

==1936== Invalid read of size 1
==1936==    at 0x4C2C1B4: strlen (vg_replace_strmem.c:412)
==1936==    by 0x9E00C30: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==1936==    by 0x5B44FAE: clover::compile_program_llvm(clover::compat::string const&, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&, clover::compat::string&) (invocation.cpp:698)
==1936==    by 0x5B39A20: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)
==1936==  Address 0x56fee1f is 0 bytes after a block of size 15 alloc'd
==1936==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==1936==    by 0x5B398F0: alloc (compat.hpp:59)
==1936==    by 0x5B398F0: vector<std::basic_string<char> > (compat.hpp:98)
==1936==    by 0x5B398F0: string<std::basic_string<char> > (compat.hpp:327)
==1936==    by 0x5B398F0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-01-27 16:55:29 +09:00
Michel Dänzer ee31c8d706 r600g,radeonsi: Fix calculation of IR target cap string buffer size
Fixes writing beyond the allocated buffer:

==31855== Invalid write of size 1
==31855==    at 0x50AB2A9: vsprintf (iovsprintf.c:43)
==31855==    by 0x508F6F6: sprintf (sprintf.c:32)
==31855==    by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526)
==31855==    by 0x5B2B7DE: get_compute_param<char> (device.cpp:37)
==31855==    by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)
==31855==  Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd
==31855==    at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324)
==31855==    by 0x5B2B7C2: allocate (new_allocator.h:104)
==31855==    by 0x5B2B7C2: allocate (alloc_traits.h:357)
==31855==    by 0x5B2B7C2: _M_allocate (stl_vector.h:170)
==31855==    by 0x5B2B7C2: _M_create_storage (stl_vector.h:185)
==31855==    by 0x5B2B7C2: _Vector_base (stl_vector.h:136)
==31855==    by 0x5B2B7C2: vector (stl_vector.h:278)
==31855==    by 0x5B2B7C2: get_compute_param<char> (device.cpp:35)
==31855==    by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-01-27 16:54:38 +09:00
Connor Abbott f1a9252def nir: fix a bug with constant folding non-per-component instructions
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.

v2: use new helper name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
Connor Abbott 816f0515a2 nir: add a helper function for getting the number of source components
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.

v2: use new name nir_ssa_alu_instr_src_components()

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 21:26:36 -05:00
Sisinty Sasmita Patra 90bd943f2a i965: Implemente a tiled fast-path for glReadPixels and glGetTexImage
Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.

On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
amount of time spent in glReadPixels by more than half and reduces the time
of the entire test by 10%.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Refactor to make the functions look more like the old
     intel_tex_subimage_tiled_memcpy
   - Don't export the readpixels_tiled_memcpy function
   - Fix some pointer arithmatic bugs in partial image downloads (using
     ReadPixels with a non-zero x or y offset)
   - Fix a bug when ReadPixels is performed on an FBO wrapping a texture
     miplevel other than zero.

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better documentation fot the *_tiled_memcpy functions
   - Add target restrictions for renderbuffers wrapping textures

v4: Jason Ekstrand <jason.ekstrand@intel.com>
   - Only check the return value of brw_bo_map for error and not bo->virtual

v5: Jason Ekstrand <jason.ekstrand@intel.com>
   - Don't unnecessarily repeat a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:35 -08:00
Sisinty Sasmita Patra b52959c602 i965/tiled_memcpy: Add tiled-to-linear paths
This commit addes tiled copy functions for coping from tiled memory to
linear memory.  These are very similar to the existing linear-to-tiled
paths.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - New commit message
   - Various whitespace fixes
   - Added ptrdiff_t casts as done in commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:34 -08:00
Sisinty Sasmita Patra 009be40b7d i965: Refactor tiled memcpy functions and move them into their own file
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own file intel_tiled_memcpy files.  Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively.  The *_faster functions are similarly renamed.

There was also a bit of logic to select between the the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better commit message
   - Fix up the copyright on the intel_tiled_memcpy files
   - Various whitespace fixes
   - Moved a bunch of stuff that did not need to be exposed from
     intel_tiled_memcpy.h to intel_tiled_memcpy.c
   - Added proper documentation for intel_get_memcpy
   - Incorperated the ptrdiff_t tweaks from commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment
   - Move the tile size constants into the .c file

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:34 -08:00
Jason Ekstrand f883aac06e i965/tex_subimage: Use the fast tiled path for rectangle textures
There's no reason why we should be doing this for 2D textures and not
rectangles.  Just a matter of adding another hunk to the condition.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
2015-01-26 17:29:34 -08:00
Dave Airlie ea9ae5d51a mesa/autoconf: attempt to use gnu99 on older gcc compilers
anonymous structs/union don't work with c99 but do work with gnu99
on gcc 4.4.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-01-27 10:27:56 +10:00
Felix Janda 2e2087a9eb mesa: simplify detection of fpclassify
Fixes compilation with musl libc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-01-26 14:07:57 -08:00
Jason Ekstrand dd74369a0a nir/opcodes: Don't go through doubles when constant-folding iabs
Previously, we called the abs() function in math.h.  However, this involves
unnecessarily going through double.  This commit changes it to use integers
directly with a ternary.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-01-26 11:25:02 -08:00
Jason Ekstrand 9bd28fe3a3 nir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressions
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 11:25:02 -08:00
Jason Ekstrand 27c6e3e4ca nir: Use pointers for nir_src_copy and nir_dest_copy
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-26 11:24:58 -08:00
Kenneth Graunke 9f5fee8804 i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.
"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.

Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value).  However, we wouldn't
delete CMP.nz instructions that served the same purpose.

We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.

This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.

No changes in shader-db without NIR.  With NIR,

total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
instructions in affected programs:     2087139 -> 2050350 (-1.76%)
helped:                                10704
HURT:                                  0
GAINED:                                2
LOST:                                  2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-01-26 10:13:18 -08:00
Jan Vesely 9cbb9165b9 clover: Fix build with llvm after r226981
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88783
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
2015-01-26 09:46:41 -05:00
Niels Ole Salscheider 4b94c3fc31 configure: Link against all LLVM targets when building clover
Since 8e7df519bd, we initialise all targets in
clover. This fixes bug 85380.

v2: Mention correct bug in commit message

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2015-01-25 18:11:03 +02:00
Connor Abbott 0aa31bf9c3 nir/constant_folding: use the new constant folding infrastructure
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-01-24 21:35:35 -08:00
Jason Ekstrand 89285e4d47 nir: add new constant folding infrastructure
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.

The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.

v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - Use mako to generate one function per opcode instead of doing piles of
   string splicing

v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - More comments and better indentation in the mako
 - Add a description of the constant expression language in nir_opcodes.py
 - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-01-24 21:35:35 -08:00
Connor Abbott fa4bc6c130 nir: use Python to autogenerate opcode information
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before.  In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.

v2:
 - make Opcode derive from object (Dylan)
 - don't use assert like it's a function (Dylan)
 - style fixes for fnoise, use xrange (Dylan)
 - use iterkeys() in nir_opcodes_h.py (Dylan)
 - use pydoc-style comments (Jason)
 - don't make fmin/fmax commutative and associative yet (Jason)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

v3 Jason Ekstrand <jason.ekstrand@intel.com>
 - Alphabetize source file lists
 - Generate nir_opcodes.h in the builddir instead of the source dir
 - Include $(builddir)/src/glsl/nir in the i965 build
 - Rework nir_opcodes.h generation so it generates a complete header file
   instead of one that has to be embedded inside an enum declaration
2015-01-24 21:33:56 -08:00
Emil Velikov d2811c29da docs: add news item and link release notes for mesa 10.4.3
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-01-24 13:18:10 +00:00
Emil Velikov 48818a0fc7 docs: Add sha256 sums for the 10.4.3 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 49a5bce7801651574d3f4841d7532d8b2b86af63)
2015-01-24 13:14:56 +00:00
Emil Velikov 9f35423270 Add release notes for the 10.4.3 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit e92bfa3f9512832b61706b76b5c9f7afa78199b0)
2015-01-24 13:14:54 +00:00
Matt Turner 94e7b59a75 i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.
total instructions in shared programs: 5952059 -> 5951603 (-0.01%)
instructions in affected programs:     138812 -> 138356 (-0.33%)
GAINED:                                1
LOST:                                  0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00
Matt Turner 40ae302a3c i965/fs: Add support for removing MOV.NZ instructions.
For some reason, we occasionally write the flag register with a MOV.NZ
instruction:

   add(8)          g25<1>F         -g6<0,1,0>F     g15<8,8,1>F
   cmp.l.f0(8)     g26<1>D         g25<8,8,1>F     0F
   mov.nz.f0(8)    null            g26<8,8,1>D

A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:

   add.l.f0(8)     null            -g6<0,1,0>F     g15<8,8,1>F

total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
instructions in affected programs:     302910 -> 298866 (-1.34%)
GAINED:                                1
LOST:                                  0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00
Matt Turner 9a3a294224 i965/fs: Allow flipping cond mod for negated arguments.
This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:

   add(8)       temp   a      b
   cmp.l.f0(8)  null   -temp  0.0

into

   add.g.f0(8)  temp   a      b

total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
instructions in affected programs:     466880 -> 464221 (-0.57%)
GAINED:                                0
LOST:                                  1

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00
Matt Turner d6317beb46 i965/fs: Propagate cmod across flag read if it contains the same value.
total instructions in shared programs: 5959463 -> 5958900 (-0.01%)
instructions in affected programs:     70031 -> 69468 (-0.80%)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00
Matt Turner 3fb5b2bc47 i965/fs: Add unit tests for cmod propagation pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00
Matt Turner 19f9cb72c8 i965/fs: Add pass to propagate conditional modifiers.
total instructions in shared programs: 5974160 -> 5959463 (-0.25%)
instructions in affected programs:     1743737 -> 1729040 (-0.84%)
GAINED:                                0
LOST:                                  12

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00
Matt Turner 3759a89ad3 i965/fs: Eliminate null-dst instructions without side-effects.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00
Matt Turner 7452f18b22 i965/fs: Apply conditional mod specially to split MAD/LRP.
Otherwise we'll apply the conditional mod to only one of SIMD8
instructions and trigger an assertion.

NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-01-23 17:57:40 -08:00