
32 Commits

Author SHA1 Message Date
Jason Ekstrand 700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticeable parts:
 - With the earlier header cleanups, nothing in src/intel depends on
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes throughout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00
Francisco Jerez b42c13a5b8 i965/fs: Drop fs_inst::overwrites_reg() in favor of regions_overlap().
fs_inst::overwrites_reg is rather easy to misuse because it cannot
tell how large the register region starting at 'reg' is, so in cases
where the destination region starts after 'reg' it may give a
misleading result.  regions_overlap() is somewhat more verbose to use
but handles arbitrary overlap correctly so it should generally be used
instead.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:55 -07:00
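A minimal sketch of the failure mode described above, assuming a simplified model in which a region is a register number plus a byte offset and a size. The Region struct and both helpers are hypothetical stand-ins, not Mesa's fs_reg, fs_inst::overwrites_reg() or regions_overlap(). The first helper only asks whether the start of 'reg' lies inside the written region, which is the limitation the commit describes.

```cpp
#include <cassert>

// Hypothetical simplified region: register number, starting byte offset
// within that register, and size in bytes.
struct Region {
   unsigned nr;
   unsigned offset;
   unsigned size;
};

// Start-only check (illustrative): does the first byte of 'reg' fall inside
// the written region 'dst'? The extent of 'reg' is ignored, so a write that
// begins after reg.offset is never reported.
bool start_inside(const Region &dst, const Region &reg)
{
   return dst.nr == reg.nr &&
          reg.offset >= dst.offset &&
          reg.offset < dst.offset + dst.size;
}

// General interval test: [r.offset, r.offset + r.size) and
// [s.offset, s.offset + s.size) intersect iff each starts before the
// other one ends.
bool regions_overlap(const Region &r, const Region &s)
{
   return r.nr == s.nr &&
          r.offset < s.offset + s.size &&
          s.offset < r.offset + r.size;
}
```

For a write to bytes 32..63 of a register whose read covers bytes 0..63, start_inside() returns false while regions_overlap() correctly reports the overlap.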
Francisco Jerez 86944e063a i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes.
The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past.  To make matters worse, the unit
reg_offset was expressed in was rather inconsistent: for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.

This encodes reg_offset as a new offset field expressed consistently
in byte units.  Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.

Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-09-14 14:50:52 -07:00
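The two rewrite rules above can be written out as a pair of helpers. This is a hypothetical sketch: Reg, get_reg_offset() and set_reg_offset() are illustrative names, and reg_unit is passed explicitly because, as the commit notes, its value depended on context.

```cpp
#include <cassert>

// Hypothetical register reference carrying a single byte offset.
struct Reg {
   unsigned offset; // in bytes
};

// rvalue rewrite:  x = r.reg_offset  ->  x = r.offset / reg_unit
unsigned get_reg_offset(const Reg &r, unsigned reg_unit)
{
   return r.offset / reg_unit;
}

// lvalue rewrite:  r.reg_offset = x
//             ->   r.offset = r.offset % reg_unit + x * reg_unit
// The '% reg_unit' term keeps the bytes below one unit (the part that
// used to live in subreg_offset).
void set_reg_offset(Reg &r, unsigned x, unsigned reg_unit)
{
   r.offset = r.offset % reg_unit + x * reg_unit;
}
```

Starting from byte offset 72 (unit 2 plus 8 bytes) and a 32-byte unit, setting the old-style offset to 5 gives 168 = 5 * 32 + 8, so the 8-byte sub-unit part survives the update.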
Francisco Jerez b054605722 i965/fs: Restrict inequality that can only hold equal in saturate propagation.
Should have no functional change.  The IP value of an instruction that
reads src_var cannot possibly be after the end of the live interval of
the variable it's reading from, by the definition of live interval.
Might save future readers a momentary WTF while trying to understand
this code.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-14 14:58:19 -07:00
Matt Turner 4009a9ead4 i965/fs: Allow saturate propagation to propagate negations into MADs.
Allows us to transform

   mad      res  src0   src1   src2
   mov.sat  dst  -res

into

   mad.sat  dst  -src0 -src1   src2

instructions in affected programs: 3712 -> 3688 (-0.65%)
helped: 24

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-25 10:51:15 -08:00
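A small numerical check of the identities behind this commit and the ADD and MUL commits listed below it. It assumes the Gen MAD operand order dst = src0 + src1 * src2 (which is why src2 is not negated in the MAD rewrite) and models .sat on a float destination as a clamp to [0, 1]. All helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// .sat on a float destination: clamp to [0.0, 1.0].
float sat(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Assumed Gen MAD semantics: src0 is the addend.
float mad(float s0, float s1, float s2) { return s0 + s1 * s2; }

// mad res, src0, src1, src2 ; mov.sat dst, -res
float mad_before(float s0, float s1, float s2) { return sat(-mad(s0, s1, s2)); }
// mad.sat dst, -src0, -src1, src2
float mad_after(float s0, float s1, float s2) { return sat(mad(-s0, -s1, s2)); }

// add res, src0, src1 ; mov.sat dst, -res  ->  add.sat dst, -src0, -src1
float add_before(float a, float b) { return sat(-(a + b)); }
float add_after(float a, float b) { return sat(-a + -b); }

// mul res, src0, src1 ; mov.sat dst, -res  ->  mul.sat dst, src0, -src1
// Only one factor is negated; negating both would cancel out.
float mul_before(float a, float b) { return sat(-(a * b)); }
float mul_after(float a, float b) { return sat(a * -b); }
```

Negating a sum negates every term, while negating a product negates exactly one factor, which is why ADD and MAD negate their addends and one multiplicand but MUL negates a single source.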
Matt Turner 65d3217cb0 i965/fs: Allow saturate propagation to propagate negations into ADDs.
Allows us to transform

   add      res  src0   src1
   mov.sat  dst  -res

into

   add.sat  dst  -src0 -src1

No shader-db changes.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-25 10:51:13 -08:00
Matt Turner 7b6113bc2d i965/fs: Allow saturate propagation to propagate negations into MULs.
Allows us to transform

   mul      res  src0   src1
   mov.sat  dst  -res

into

   mul.sat  dst  src0  -src1

instructions in affected programs: 45246 -> 45054 (-0.42%)
helped: 162

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-02-25 10:51:10 -08:00
Matt Turner b163aa0148 i965: Rename GRF to VGRF.
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.

Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Matt Turner 7638e75cf9 i965: Use brw_reg's nr field to store register number.
In addition to combining another field, we get to replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.

Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.

Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-11-13 11:27:50 -08:00
Neil Roberts 801f151917 i965: Remove block arg from foreach_inst_in_block_*_starting_from
Since 49374fab5d these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-10-21 11:07:04 +02:00
Matt Turner 35a2d259f2 i965/fs: Consider type mismatches in saturate propagation.
NIR considers bcsel to produce and consume unsigned types, leading to
SEL instructions operating on unsigned types when the data is really
floating-point. Previous to this patch, saturate propagation would
happily transform

   (+f0) sel      g20:UD, g30:UD, g40:UD
         mov.sat  g50:F,  g20:F

into

   (+f0) sel.sat  g20:UD, g30:UD, g40:UD
         mov      g50:F,  g20:F

But since the meaning of .sat is dependent on the type of the
destination register, this is not valid.

Instead, allow saturate propagation to change the types of dest/source
on instructions that are simply copying data in order to propagate the
saturate modifier.

Fixes bad code gen in 158 programs.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-10-19 10:19:32 -07:00
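A minimal model of why the rewrite above is invalid, assuming register contents are raw 32-bit patterns whose meaning depends on the instruction's type. The helpers are hypothetical. Saturating the UD-typed SEL is modeled as a no-op, since the selected value already lies in the unsigned type's range, while .sat on an F destination clamps to [0.0, 1.0].

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Register contents are just 32 bits; the instruction's type decides
// how they are interpreted.
std::uint32_t as_ud(float f)
{
   std::uint32_t u;
   std::memcpy(&u, &f, sizeof u);
   return u;
}

float as_f(std::uint32_t u)
{
   float f;
   std::memcpy(&f, &u, sizeof f);
   return f;
}

// .sat with a float destination: clamp to [0.0, 1.0].
float sat_f(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// .sat with an unsigned 32-bit destination, modeled as a no-op:
// a selected UD value is already within the UD range.
std::uint32_t sat_ud(std::uint32_t x) { return x; }

// (+f0) sel g20:UD, g30:UD, g40:UD ; mov.sat g50:F, g20:F
float original(bool f0, float g30, float g40)
{
   std::uint32_t g20 = f0 ? as_ud(g30) : as_ud(g40);
   return sat_f(as_f(g20));
}

// (+f0) sel.sat g20:UD, ... ; mov g50:F, g20:F   (invalid)
float bad_rewrite(bool f0, float g30, float g40)
{
   std::uint32_t g20 = sat_ud(f0 ? as_ud(g30) : as_ud(g40));
   return as_f(g20);
}

// The fix's idea: retype the copy-like SEL to F so .sat keeps its
// float meaning.
float retyped_rewrite(bool f0, float g30, float g40)
{
   float g20 = sat_f(f0 ? g30 : g40);
   return g20;
}
```

Selecting 2.5 yields 1.0 in the original sequence but 2.5 after the invalid rewrite, while the retyped version matches the original.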
Matt Turner 7f8dd91d16 i965/fs: Consider MOV.SAT to interfere if it has a source modifier.
The saturate propagation pass recognizes that the second instruction
below does not interfere with an attempt to propagate the saturate
modifier from instruction 3 to 1.

 1:  add(8)     dst0   src0  src1
 2:  mov.sat(8) dst1   dst0
 3:  mov.sat(8) dst2   dst0

Unfortunately, we did not consider the case of instruction 2 having a
source modifier on dst0. Take for instance:

 1:  add(8)     dst0   src0  src1
 2:  mov.sat(8) dst1  -dst0
 3:  mov.sat(8) dst2   dst0

Consider such an instruction to interfere. This increases instruction
counts in Anomaly 2, which could be a bug fix depending on the values
the first instruction produces.

instructions in affected programs:     53228 -> 53934 (1.33%)
HURT:                                  360

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-19 21:16:43 -08:00
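The interference rule can be sketched as a predicate over an instruction that reads the value between its producer and the mov.sat being propagated. The instruction model and names are hypothetical. The reasoning: a plain mov.sat of an already-saturated value gives the same result, since sat(sat(x)) == sat(x), but a negate or abs modifier breaks this. For example, with x = -0.5, sat(-sat(x)) is 0 while sat(-x) is 0.5.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, minimal instruction model for this sketch.
enum class Op { ADD, MOV };

struct Src {
   int reg;
   bool negate;
   bool abs;
};

struct Inst {
   Op op;
   bool saturate;
   int dst;
   std::vector<Src> src;
};

// Does 'reader' prevent moving .sat onto the producer of 'reg'?
bool interferes(const Inst &reader, int reg)
{
   for (const Src &s : reader.src) {
      if (s.reg != reg)
         continue;
      // A plain mov.sat of reg computes the same value whether or not reg
      // was already clamped; any other use, including a mov.sat with a
      // negate or abs source modifier, would see a different value.
      const bool harmless = reader.op == Op::MOV && reader.saturate &&
                            !s.negate && !s.abs;
      if (!harmless)
         return true;
   }
   return false;
}
```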
Matt Turner 871ad3f08b i965/fs: Use fs_inst::overwrites_reg() in saturate propagation.
This is safer and matches the conditional_mod propagation pass.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2015-02-19 21:16:43 -08:00
Matt Turner 2308b3bef2 i965/fs: Add a comment explaining what saturate propagation does.
2014-12-16 11:30:44 -08:00
Matt Turner b37273b924 i965/fs: Use const fs_reg & rather than a copy or pointer.
Also while we're touching var_from_reg, just make it an inline function.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-01 16:42:13 -08:00
Matt Turner e9aee2572a i965/fs: Don't invalidate live intervals in saturate propagation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-09-27 12:18:37 -07:00
Matt Turner b9689c6bda i965/fs: Ignore mov.sat instructions in interference check in sat prop.
When an instruction's result was consumed by multiple mov.sat
instructions, we would decide that we couldn't move the saturate
modifier because something else was using the result, even though it was
just another mov.sat!

total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
instructions in affected programs:     75634 -> 74878 (-1.00%)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-09-27 12:18:37 -07:00
Matt Turner 82bdb559a1 i965/fs: Walk instructions in reverse in saturate propagation.
When we find a mov.sat, we search backwards. We might as well search
everything else backwards as well and potentially look at fewer
instructions.

This change enables the next patch.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-09-27 12:18:37 -07:00
Matt Turner f0598d413b i965/fs: Don't iterate between blocks with inst->next/prev.
When instruction lists are per-basic block, this won't work.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-09-24 09:42:46 -07:00
Matt Turner 072ea414d0 i965: Remove cfg-invalidating parameter from invalidate_live_intervals.
Everything has been converted to preserve the CFG.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-09-24 09:42:46 -07:00
Matt Turner 20a849b4aa i965: Use basic-block aware insertion/removal functions.
Use them to avoid invalidating and recreating the control flow graph.
Also stop invalidating the CFG in places where we didn't add or remove
an instruction.

cfg calculations:     202951 -> 80307 (-60.43%)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-08-22 10:23:34 -07:00
Matt Turner 596990d91e i965: Add and use foreach_block macro.
Use this as an opportunity to rename 'block_num' to 'num'. block->num is
clear, and block->block_num has always been redundant.
2014-08-18 18:56:30 -07:00
Matt Turner 680fe0acb3 i965: Add cfg to backend_visitor.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2014-07-21 10:35:34 -07:00
Matt Turner 2e90d1fb62 i965/fs: Pass cfg to calculate_live_intervals().
We've often created the CFG immediately before, so use it when
available.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-07-01 08:55:52 -07:00
Matt Turner bc2fbbafd2 i965: Add and use foreach_inst_in_block macros.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-07-01 08:55:51 -07:00
Matt Turner bcbb7c41b7 i965/fs: Let sat-prop ignore live ranges if producer already has sat.
This sequence (where both x and w are used afterwards) wasn't handled.

   mul.sat x, y, z
   ...
   mov.sat w, x

We assumed that if x was used after the mov.sat, we couldn't
propagate the saturate modifier, but in fact x was already saturated.

So ignore the live range check if the producing instruction already
saturates its result. Cuts one instruction from hundreds of TF2 shaders.

total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
instructions in affected programs:     155248 -> 154568 (-0.44%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-06-30 22:31:05 -07:00
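A sketch of the decision this commit adds, assuming .sat clamps a float to [0, 1]. Because clamping is idempotent, a mov.sat whose source came from an already-saturating instruction is a plain copy, so its .sat can be dropped without consulting the source's live range. The can_remove_mov_sat() helper and its flags are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

float sat(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Hypothetical decision helper for "mov.sat w, x", where x is written by
// an earlier instruction in the same block.
//   producer_saturates: the writer of x already has .sat (mul.sat x, y, z)
//   x_read_later:       x is still read after the mov.sat
bool can_remove_mov_sat(bool producer_saturates, bool x_read_later)
{
   // x is already in [0, 1], so sat(x) == x and the clamp is a no-op.
   if (producer_saturates)
      return true;
   // Otherwise .sat has to move onto the producer, which changes x for
   // any later reader, so it is only allowed when x is dead after the mov.
   return !x_read_later;
}
```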
Matt Turner b1dcdcde2e i965/fs: Loop from 0 to inst->sources, not 0 to 3.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-06-01 13:29:24 -07:00
Matt Turner 92d03f7f28 i965/fs: Don't propagate saturation modifiers if there are source modifiers.
Doing so would lead to translating

   mad     vgrf9:F,  vgrf3:F, u0:F, vgrf6:F
   mov.sat vgrf7:F, -vgrf9:F

into

   mad.sat vgrf9:F,  vgrf3:F, u0:F, vgrf6:F
   mov     vgrf7:F, -vgrf9:F

Fixes some lighting effects in Dota2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-04-05 09:47:36 -07:00
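The translation above is wrong because, with a source negate, the shader asks for sat(-v) while the rewritten code produces -sat(v), and the two differ for every nonzero v. A minimal sketch, assuming .sat clamps a float to [0, 1]; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

float sat(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// mad vgrf9 ... ; mov.sat vgrf7, -vgrf9   (what the shader asks for)
float expected(float vgrf9) { return sat(-vgrf9); }

// mad.sat vgrf9 ... ; mov vgrf7, -vgrf9   (the invalid rewrite)
float rewritten(float vgrf9) { return -sat(vgrf9); }
```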
Matt Turner 7a7b8a02be i965/fs: Don't propagate saturate modifiers into partial writes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-04-05 09:47:36 -07:00
Matt Turner 86ae6f477d i965/fs: Fix off-by-one in saturate propagation.
ip needs to be initialized to start_ip - 1, since the first thing in the
main loop is ip++. Otherwise we would incorrectly propagate the saturate
from the mov to the mad:

   mad     a, b, c, d
   mov.sat x, a
   add     y, z, a

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-04-05 09:47:36 -07:00
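A sketch of the off-by-one, assuming the pass lets .sat move only when the source's live interval ends at or before the mov.sat's IP. visited_ips() and source_dead_after() are hypothetical helpers that model the loop's IP bookkeeping and that comparison.

```cpp
#include <cassert>
#include <vector>

// ip is incremented first thing in each iteration, so it must start one
// below start_ip for the k-th visited instruction to get start_ip + k.
std::vector<int> visited_ips(int start_ip, int count, bool init_minus_one)
{
   std::vector<int> ips;
   int ip = init_minus_one ? start_ip - 1 : start_ip;
   for (int i = 0; i < count; i++) {
      ip++;
      ips.push_back(ip);
   }
   return ips;
}

// Assumed check: the source is dead once its live interval ends at or
// before the mov.sat.
bool source_dead_after(int src_end_ip, int mov_ip)
{
   return src_end_ip <= mov_ip;
}
```

In the example (mad at IP 0, mov.sat at IP 1, add reading a at IP 2), a's interval ends at 2. With the fix the mov.sat is seen at IP 1 and propagation is blocked; without it the mov.sat is mistaken for IP 2 and the saturate is wrongly moved onto the mad.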
Kenneth Graunke 4d2e79269a i965/fs: Fix register comparisons in saturate propagation.
opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset
with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file
is GRF.  But nothing ensured that inst->src[0].file was GRF.

In the following program, this resulted in u1:F matching vgrf1:UW,
and a saturate being incorrectly propagated from instruction 8 to
instruction 1.

{  1}    0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V
{  1}    1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V
{  1}    2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F
{  2}    3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F
{  4}    4: mov vgrf10+0.0:F, vgrf6:F
{  3}    5: mov vgrf10+1.0:F, vgrf27:F
{  6}    6: tex vgrf8+0.0:F, vgrf10+0.0:F
{  5}    7: mov vgrf32:F, u1:F
{  5}    8: mov.sat vgrf12:F, u1:F

From shader-db:
   total instructions in shared programs: 1841932 -> 1841957 (0.00%)
   instructions in affected programs:     5823 -> 5848 (0.43%)
I inspected two of the 25 hurt shaders, and concluded that they were
both hitting this bug, and not legitimately optimized.

This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among
others.  The optimization pass didn't exist in 10.0, so this is only
a candidate for 10.1.

Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2014-03-14 13:17:57 -07:00
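A sketch of the comparison bug, using a hypothetical two-value subset of register files and a simplified register struct in place of Mesa's fs_reg. Register numbers are only meaningful within a single file, so comparing number and offset alone lets the uniform u1 match vgrf1.

```cpp
#include <cassert>

// Hypothetical subset of register files.
enum class File { GRF, UNIFORM };

struct Reg {
   File file;
   int nr;
   int reg_offset;
};

// Before the fix (illustrative): number and offset only.
bool same_reg_unchecked(const Reg &a, const Reg &b)
{
   return a.nr == b.nr && a.reg_offset == b.reg_offset;
}

// After the fix: both the destination and the mov's source must be GRF
// before their numbers and offsets are compared.
bool same_grf(const Reg &dst, const Reg &src)
{
   return dst.file == File::GRF && src.file == File::GRF &&
          same_reg_unchecked(dst, src);
}
```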
Matt Turner 947c828d5c i965/fs: Add a saturation propagation optimization pass.
Transforms, for example,

mul     vgrf3, vgrf2, vgrf1
mov.sat vgrf4, vgrf3

into

mul.sat vgrf3, vgrf2, vgrf1
mov     vgrf4, vgrf3

which gives register_coalescing an opportunity to remove the MOV
instruction.

total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
instructions in affected programs:     798586 -> 788181 (-1.30%)
GAINED:                                0
LOST:                                  4

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2014-01-28 17:47:41 -08:00
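A compact sketch of the transformation this original commit describes. It runs over a hypothetical straight-line instruction list rather than Mesa's fs_inst and CFG. For each mov.sat it walks backwards to the instruction that wrote the source and moves .sat there, but only when nothing else reads that value in between or afterwards. It is deliberately conservative and omits the refinements made by later commits in this log (type checks, source modifiers, partial writes, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical straight-line IR (not Mesa's fs_inst): one destination
// register and a list of source registers per instruction.
struct Inst {
   bool is_mov;
   bool saturate;
   int dst;
   std::vector<int> src;
};

// Does 'inst' read register 'reg'?
static bool reads(const Inst &inst, int reg)
{
   for (int s : inst.src)
      if (s == reg)
         return true;
   return false;
}

// Is 'reg' read by any instruction after position 'ip'? (Conservative:
// a redefinition followed by a read still counts as a read.)
static bool read_after(const std::vector<Inst> &block, std::size_t ip, int reg)
{
   for (std::size_t i = ip + 1; i < block.size(); i++)
      if (reads(block[i], reg))
         return true;
   return false;
}

// Move .sat from "mov.sat dst, src" onto the instruction that wrote src.
// Returns true if anything changed.
bool saturate_propagation(std::vector<Inst> &block)
{
   bool progress = false;
   for (std::size_t ip = 0; ip < block.size(); ip++) {
      Inst &mov = block[ip];
      if (!mov.is_mov || !mov.saturate || mov.src.size() != 1)
         continue;
      const int src = mov.src[0];
      // Later readers would observe the clamped value instead.
      if (read_after(block, ip, src))
         continue;
      // Walk backwards from the mov to the most recent write of src.
      for (std::size_t j = ip; j-- > 0;) {
         Inst &scan = block[j];
         if (scan.dst == src) {
            scan.saturate = true;   // the producer now clamps its result
            mov.saturate = false;   // plain copy, left for coalescing
            progress = true;
            break;
         }
         // An intervening reader would also observe the clamped value.
         if (reads(scan, src))
            break;
      }
   }
   return progress;
}
```

On the mul/mov.sat pair above, the sketch marks the mul as saturating and leaves an ordinary mov for register coalescing to remove.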