From 23278a75ce06c3c083892b2a20d9efdf794167d6 Mon Sep 17 00:00:00 2001
From: Iago Toral Quiroga <itoral@igalia.com>
Date: Mon, 18 Jul 2016 12:04:13 +0200
Subject: [PATCH] i965/vec4: teach register coalescing about 64-bit

Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice those of 32-bit instructions
(8 bytes instead of 4), so prevent coalescing between instructions whose
types differ in size.
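As a rough illustration of why the type sizes matter, here is a small
standalone C++ sketch. SketchType, sketch_type_sz(), bytes_for_channels()
and same_type_size() are hypothetical names made up for this note, not
Mesa code. Only the 4/8-byte channel sizes and the 32-byte register size
are taken from the hardware.

```cpp
#include <cassert>

// Hypothetical register types for illustration; this is not Mesa's brw_reg_type.
enum class SketchType { F, D, UD, DF };

// Bytes per channel, in the spirit of type_sz(): fp64 (DF) channels are 8
// bytes, the 32-bit types are 4.
constexpr unsigned sketch_type_sz(SketchType t) {
  return t == SketchType::DF ? 8u : 4u;
}

// Bytes covered by a given number of channels of type t.
constexpr unsigned bytes_for_channels(SketchType t, unsigned channels) {
  return channels * sketch_type_sz(t);
}

// Restatement of the first new check: only coalesce when both
// instructions use source types of the same size.
constexpr bool same_type_size(SketchType a, SketchType b) {
  return sketch_type_sz(a) == sketch_type_sz(b);
}

// The same 8 SIMD4x2 channels span one 32-byte register for 32-bit types
// but two registers for fp64.
static_assert(bytes_for_channels(SketchType::F, 8) == 32, "one register");
static_assert(bytes_for_channels(SketchType::DF, 8) == 64, "two registers");
```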

Also, when we coalesce a register from another MOV, we should check that
both operations write the same amount of data; otherwise we end up writing
more or less data than the original instruction. This can happen, for
example, when we have split fp64 MOVs with an exec size of 4 that only
write one register each, followed by a MOV with an exec size of 8 that
reads both. We want to prevent the pass from thinking that it can coalesce
from the first split MOV alone. Ideally the pass would see that it can
coalesce from both split MOVs instead, but for now we keep it simple.
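A simplified C++ model of the size check, using the split fp64 MOV case
described above. It assumes size_written is just exec_size times the
channel size, which is a simplification of Mesa's real bookkeeping.
SketchInst and writes_same_amount() are hypothetical names for this
illustration only.

```cpp
#include <cassert>

// Assumed GRF size in bytes (REG_SIZE in the i965 backend).
constexpr unsigned kRegSize = 32;

// Hypothetical, stripped-down instruction: not Mesa's vec4_instruction.
struct SketchInst {
  unsigned exec_size;   // channels executed
  unsigned type_bytes;  // 4 for 32-bit types, 8 for fp64
  // Simplifying assumption: bytes written = channels * channel size.
  constexpr unsigned size_written() const { return exec_size * type_bytes; }
};

// The v2 check: coalescing is only attempted when scan_inst writes exactly
// as many bytes as the copy instruction inst.
constexpr bool writes_same_amount(const SketchInst &scan_inst,
                                  const SketchInst &inst) {
  return scan_inst.size_written() == inst.size_written();
}

// Split fp64 MOV (exec size 4, one register) vs. the exec size 8 MOV that
// reads both halves (two registers).
constexpr SketchInst split_mov{4, 8};
constexpr SketchInst copy_mov{8, 8};
static_assert(split_mov.size_written() == kRegSize, "one register");
static_assert(copy_mov.size_written() == 2 * kRegSize, "two registers");
static_assert(!writes_same_amount(split_mov, copy_mov), "pass bails");
```

Because the first split MOV covers only half of what the exec size 8 MOV
reads, the sizes differ and the pass bails instead of coalescing a partial
write.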

Finally, the pass doesn't support coalescing multiple registers, but
normal SIMD4x2 double-precision instructions naturally write two registers
(one per vertex), and there is no reason not to allow coalescing in this
case. Change the restriction to bail if we see instructions that write
more than 8 channels, where each channel can be 32-bit or 64-bit.
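The old and new restrictions can be compared as standalone predicates.
div_round_up() stands in for Mesa's DIV_ROUND_UP macro, while
old_check_allows() and new_check_allows() are hypothetical helpers written
only for this illustration. Sizes are in bytes and REG_SIZE is taken to be
32.

```cpp
#include <cassert>

// Assumed GRF size in bytes (REG_SIZE in the i965 backend).
constexpr unsigned kRegSize = 32;

// Integer ceiling division, mirroring Mesa's DIV_ROUND_UP macro.
constexpr unsigned div_round_up(unsigned n, unsigned d) {
  return (n + d - 1) / d;
}

// Old rule: bail if the write spans more than one register.
constexpr bool old_check_allows(unsigned size_written) {
  return size_written <= kRegSize;
}

// New rule: bail if the write covers more than 8 channels (of whatever
// size the destination type has) or does not start at the copy's source
// offset.
constexpr bool new_check_allows(unsigned size_written, unsigned dst_type_bytes,
                                unsigned dst_offset, unsigned src_offset) {
  return div_round_up(size_written, dst_type_bytes) <= 8 &&
         dst_offset == src_offset;
}
```

With the old test, a two-register fp64 write was rejected even though it
covers only 8 channels. Counting channels instead lets it through while
still rejecting, for example, a 16-channel 32-bit write.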

v2:
 - Make sure that scan_inst and inst write the same amount of data.

Reviewed-by: Matt Turner <mattst88@gmail.com>
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index cc0a76a7eb4..42fde07c4ef 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1191,6 +1191,20 @@ vec4_visitor::opt_register_coalesce()
                   scan_inst->dst.type == scan_inst->src[0].type))
                break;
 
+            /* Only allow coalescing between registers of the same type size.
+             * Otherwise we would need to make the pass aware of the fact that
+             * channel sizes are different for single and double precision.
+             */
+            if (type_sz(inst->src[0].type) != type_sz(scan_inst->src[0].type))
+               break;
+
+            /* Check that scan_inst writes the same amount of data as the
+             * instruction, otherwise coalescing would lead to writing a
+             * different (larger or smaller) region of the destination.
+             */
+            if (scan_inst->size_written != inst->size_written)
+               break;
+
             /* If we can't handle the swizzle, bail. */
             if (!scan_inst->can_reswizzle(devinfo, inst->dst.writemask,
                                           inst->src[0].swizzle,
@@ -1198,10 +1212,12 @@ vec4_visitor::opt_register_coalesce()
                break;
             }
 
-            /* This only handles coalescing of a single register starting at
-             * the source offset of the copy instruction.
+            /* This only handles coalescing writes of 8 channels (1 register
+             * for single-precision and 2 registers for double-precision)
+             * starting at the source offset of the copy instruction.
              */
-            if (scan_inst->size_written > REG_SIZE ||
+            if (DIV_ROUND_UP(scan_inst->size_written,
+                             type_sz(scan_inst->dst.type)) > 8 ||
                 scan_inst->dst.offset != inst->src[0].offset)
                break;