freedreno/a3xx/compiler: new compiler
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigns a weight/depth to each node (based on the sum of
instruction cycles of a given node and all its dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.,
which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
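A minimal sketch of the depth idea, assuming the weight of a node is its own cycle count plus the deepest chain among the instructions it depends on. The struct node type, its fields and node_depth() are hypothetical names for illustration, not the ir3 data structures:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SRCS 4

/* Hypothetical node type for illustration; not the ir3 structures. */
struct node {
	unsigned cycles;               /* cost of this instruction */
	struct node *srcs[MAX_SRCS];   /* instructions this one depends on */
	size_t nsrcs;
	unsigned depth;                /* cached result, 0 = not computed yet */
};

/* Depth of a node: its own cycles plus the deepest chain among its
 * sources (assumption: the longest chain is what matters).
 */
unsigned node_depth(struct node *n)
{
	unsigned deepest = 0;

	if (n->depth)
		return n->depth;

	for (size_t i = 0; i < n->nsrcs; i++) {
		unsigned d = node_depth(n->srcs[i]);
		if (d > deepest)
			deepest = d;
	}

	n->depth = n->cycles + deepest;
	return n->depth;
}
```

A scheduler could then prefer the ready instruction with the largest depth, since it sits on the longest remaining chain.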
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple of new debug options for the FD_MESA_DEBUG env var:
  optmsgs - enable debug prints in optimizer
  optdump - dump instruction graph in .dot format, for example:
    http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png
    http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
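As a rough illustration of how such option names could be turned into flag bits, here is a small sketch. It assumes the variable holds a comma-separated list; the DBG_* values and both function names are hypothetical and are not the driver's actual code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits; the driver's real values are not shown here. */
#define DBG_OPTMSGS 0x1
#define DBG_OPTDUMP 0x2

/* Parse a comma-separated option string such as "optmsgs,optdump".
 * Unknown names are ignored.
 */
unsigned parse_debug_flags(const char *str)
{
	unsigned flags = 0;

	while (str && *str) {
		size_t len = strcspn(str, ",");

		if (len == 7 && !strncmp(str, "optmsgs", len))
			flags |= DBG_OPTMSGS;
		else if (len == 7 && !strncmp(str, "optdump", len))
			flags |= DBG_OPTDUMP;

		str += len;
		if (*str == ',')
			str++;
	}
	return flags;
}

/* Read the flags from the environment (unset means no flags). */
unsigned debug_flags_from_env(void)
{
	return parse_debug_flags(getenv("FD_MESA_DEBUG"));
}
```

With this sketch, running a program with FD_MESA_DEBUG=optmsgs,optdump in the environment would set both bits.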
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in its current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-29 22:18:49 +00:00

/* -*- mode: C; c-file-style: "k&r"; tab-width: 4; indent-tabs-mode: t; -*- */

/*
 * Copyright (C) 2014 Rob Clark <robclark@freedesktop.org>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 *
 * Authors:
 *    Rob Clark <robclark@freedesktop.org>
 */

#include "pipe/p_shader_tokens.h"
#include "util/u_math.h"

#include "ir3.h"
#include "ir3_visitor.h"

/*
 * Register Assignment:
 *
 * NOTE: currently only works on a single basic block.. need to think
 * about how multiple basic blocks are going to get scheduled.  But
 * I think I want to re-arrange how blocks work, ie. get rid of the
 * block nesting thing..
 *
 * NOTE: we could do register coalescing (eliminate moves) as part of
 * the RA step.. OTOH I think we need to do scheduling before register
 * assignment.  And removing a mov affects scheduling (unless we leave
 * a placeholder nop, which seems lame), so I'm not really sure how
 * practical it is to do both in a single stage.  But OTOH I'm not
 * really sure of a sane way for the CP stage to realize when it
 * cannot remove a mov due to multi-register constraints..
 *
 */

struct ir3_ra_ctx {
	struct ir3_block *block;
	enum shader_t type;
	bool half_precision;
	bool frag_coord;
	bool frag_face;
	bool has_samp;
	int cnt;
freedreno/ir3: fix lockups with lame FRAG shaders
Shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
1: MOV OUT[0], IMM[0].xyxx
2: END
cause unhappiness. They have an IN[], but once this is compiled the
useless TEX instruction goes away, leaving a varying that is never
fetched, which makes the hw unhappy.
In the process fix a signed vs unsigned compare. If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-03 15:02:31 +01:00
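The signed vs unsigned problem mentioned above can be reproduced in isolation. This is a standalone sketch: the MAX2() definition below has the usual shape of such a macro but is defined locally, and max_mixed() / max_signed() are hypothetical helper names:

```c
#include <assert.h>
#include <limits.h>

/* Usual shape of a MAX2() macro; defined here so the example stands alone. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Broken: max_reg is converted to unsigned for the comparison, so -1
 * becomes UINT_MAX and always "wins".
 */
unsigned max_mixed(int max_reg, unsigned other)
{
	return MAX2(max_reg, other);
}

/* Fixed: compare as signed values.  Assumes 'other' fits in an int. */
int max_signed(int max_reg, unsigned other)
{
	return MAX2(max_reg, (int)other);
}
```

With max_reg = -1 and other = 3, the mixed version yields UINT_MAX rather than 3, while the signed version yields 3.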
	int max_bary;
	bool error;
};

/* sorta ugly way to retrofit half-precision support.. rather than
 * passing extra param around, just OR in a high bit.  All the low
 * value arithmetic (ie. +/- offset within a contiguous vec4, etc)
 * will continue to work as long as you don't underflow (and that
 * would go badly anyways).
 */
#define REG_HALF  0x8000

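A small sketch of the high-bit trick described in the comment above. The helpers reg_make(), reg_is_half() and reg_num() are hypothetical names used only for illustration; only REG_HALF comes from the source:

```c
#include <assert.h>

#define REG_HALF 0x8000

/* Tag a register number as half-precision by OR-ing in the high bit. */
int reg_make(int num, int half)
{
	return half ? (num | REG_HALF) : num;
}

/* Nonzero if the register number carries the half-precision tag. */
int reg_is_half(int reg)
{
	return (reg & REG_HALF) != 0;
}

/* Register number with the tag stripped. */
int reg_num(int reg)
{
	return reg & ~REG_HALF;
}
```

Adding a small offset keeps the tag: reg_make(4, 1) + 3 is still half-precision and refers to register 7. Subtracting below zero does not: reg_make(0, 1) - 1 gives 0x7fff, which has lost the tag, which is the underflow case the comment warns about.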
struct ir3_ra_assignment {
	int8_t  off;        /* offset of instruction dst within range */
	uint8_t num;        /* number of components for the range */
};

static void ra_assign(struct ir3_ra_ctx *ctx,
		struct ir3_instruction *assigner, int num);
static struct ir3_ra_assignment ra_calc(struct ir3_instruction *instr);

/*
 * Register Allocation:
 */

#define REG(n, wm, f) (struct ir3_register){ \
		.flags  = (f), \
		.num    = (n), \
		.wrmask = TGSI_WRITEMASK_ ## wm, \
	}

/* check that the register exists, is a GPR and is not special (a0/p0) */
static struct ir3_register * reg_check(struct ir3_instruction *instr, unsigned n)
{
	if ((n < instr->regs_count) && reg_gpr(instr->regs[n]))
		return instr->regs[n];
	return NULL;
}

static int output_base(struct ir3_ra_ctx *ctx)
{
	/* ugg, for fragment shader we need to have input at r0.x
	 * (or at least if there is a way to configure it, I can't
	 * see how, because the blob driver always uses r0.x, ie.
	 * all zeros)
	 */
	if (ctx->type == SHADER_FRAGMENT) {
		if (ctx->half_precision)
			return ctx->frag_face ? 4 : 3;
		return ctx->frag_coord ? 8 : 4;
	}
	return 0;
}

/* live means read before written */
static void compute_liveregs(struct ir3_ra_ctx *ctx,
		struct ir3_instruction *instr, regmask_t *liveregs)
{
	struct ir3_block *block = instr->block;
	regmask_t written;
	unsigned i, j;

	regmask_init(liveregs);
	regmask_init(&written);

	for (instr = instr->next; instr; instr = instr->next) {
		struct ir3_register *r;

		if (is_meta(instr))
			continue;

		/* check first src's read: */
		for (j = 1; j < instr->regs_count; j++) {
			r = reg_check(instr, j);
			if (r)
				regmask_set_if_not(liveregs, r, &written);
		}

		/* then dst written (if assigned already): */
		if (instr->flags & IR3_INSTR_MARK) {
			r = reg_check(instr, 0);
			if (r)
				regmask_set(&written, r);
		}
	}

	/* be sure to account for output registers too: */
	for (i = 0; i < block->noutputs; i++) {
		struct ir3_register reg = REG(output_base(ctx) + i, X, 0);
		regmask_set_if_not(liveregs, &reg, &written);
	}
}

/* calculate registers that are clobbered before last use of 'assigner'.
 * This needs to be done backwards, although it could possibly be
 * combined into compute_liveregs().  (Ie. compute_liveregs() could
 * reverse the list, then do this part backwards reversing the list
 * again back to original order.)  Otoh, probably I should try to
 * construct a proper interference graph instead.
 *
 * XXX this needs to follow the same recursion path that is used to
 * rename/assign registers (ie. ra_assign_src()).. this is a bit
 * ugly right now, maybe refactor into a node iterator sort of thing
 * that iterates nodes in the correct order?
 */
static bool compute_clobbers(struct ir3_ra_ctx *ctx,
		struct ir3_instruction *instr, struct ir3_instruction *assigner,
		regmask_t *liveregs)
{
	unsigned i;
	bool live = false, was_live = false;

	if (instr == NULL) {
		struct ir3_block *block = ctx->block;

		/* if at the end, check outputs: */
		for (i = 0; i < block->noutputs; i++)
			if (block->outputs[i] == assigner)
				return true;
		return false;
	}

	for (i = 1; i < instr->regs_count; i++) {
		struct ir3_register *reg = instr->regs[i];
		if ((reg->flags & IR3_REG_SSA) && (reg->instr == assigner)) {
			if (is_meta(instr)) {
				switch (instr->opc) {
				case OPC_META_INPUT:
					// TODO
					assert(0);
					break;
				case OPC_META_FO:
				case OPC_META_FI:
					was_live |= compute_clobbers(ctx, instr->next,
							instr, liveregs);
					break;
				default:
					break;
				}
			}
			live = true;
			break;
		}
	}

	was_live |= compute_clobbers(ctx, instr->next, assigner, liveregs);

	if (was_live && (instr->regs_count > 0) &&
			(instr->flags & IR3_INSTR_MARK) &&
			!is_meta(instr))
		regmask_set(liveregs, instr->regs[0]);

	return live || was_live;
}

static int find_available(regmask_t *liveregs, int size, bool half)
{
	unsigned i;
	unsigned f = half ? IR3_REG_HALF : 0;
	for (i = 0; i < MAX_REG - size; i++) {
		if (!regmask_get(liveregs, &REG(i, X, f))) {
			unsigned start = i++;
			for (; (i < MAX_REG) && ((i - start) < size); i++)
				if (regmask_get(liveregs, &REG(i, X, f)))
					break;
			if ((i - start) >= size)
				return start;
		}
	}
	assert(0);
	return -1;
}

static int alloc_block(struct ir3_ra_ctx *ctx,
		struct ir3_instruction *instr, int size)
{
	if (!instr) {
		/* special case, allocating shader outputs.  At this
		 * point, nothing is allocated, just start the shader
		 * outputs at r0.x and let compute_liveregs() take
		 * care of the rest from here:
		 */
		return 0;
	} else {
		struct ir3_register *dst = instr->regs[0];
		regmask_t liveregs;

freedreno/a3xx/compiler: new compiler
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigned a weight/depth to each node (based on sum of
instruction cycles of a given node and all it's dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
Which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in it's current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-29 22:18:49 +00:00
|
|
|
compute_liveregs(ctx, instr, &liveregs);
|
|
|
|
|
|
|
|
// XXX XXX XXX XXX XXX XXX XXX XXX XXX
|
|
|
|
// XXX hack.. maybe ra_calc should give us a list of
|
|
|
|
// instrs to compute_clobbers() on?
|
|
|
|
if (is_meta(instr) && (instr->opc == OPC_META_INPUT) &&
|
|
|
|
(instr->regs_count == 1)) {
|
|
|
|
unsigned i, base = instr->regs[0]->num & ~0x3;
|
|
|
|
for (i = 0; i < 4; i++) {
|
2014-09-08 18:42:54 +01:00
|
|
|
struct ir3_instruction *in = NULL;
|
|
|
|
if ((base + i) < ctx->block->ninputs)
|
|
|
|
in = ctx->block->inputs[base + i];
|
freedreno/a3xx/compiler: new compiler
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigned a weight/depth to each node (based on sum of
instruction cycles of a given node and all it's dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
Which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in it's current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-29 22:18:49 +00:00
|
|
|
if (in)
|
|
|
|
compute_clobbers(ctx, in->next, in, &liveregs);
|
|
|
|
}
|
|
|
|
} else
|
|
|
|
// XXX XXX XXX XXX XXX XXX XXX XXX XXX
|
|
|
|
compute_clobbers(ctx, instr->next, instr, &liveregs);
|
2014-07-21 20:24:30 +01:00
|
|
|
|
|
|
|
return find_available(&liveregs, size,
|
|
|
|
!!(dst->flags & IR3_REG_HALF));
|
freedreno/a3xx/compiler: new compiler
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigned a weight/depth to each node (based on sum of
instruction cycles of a given node and all it's dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
Which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in it's current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-29 22:18:49 +00:00
|
|
|
}
|
|
|
|
}

/*
 * Constraint Calculation:
 */

struct ra_calc_visitor {
	struct ir3_visitor base;
	struct ir3_ra_assignment a;
};

static inline struct ra_calc_visitor *ra_calc_visitor(struct ir3_visitor *v)
{
	return (struct ra_calc_visitor *)v;
}

/* calculate register assignment for the instruction.  If the register
 * written by this instruction is required to be part of a range, to
 * handle other (input/output/sam/bary.f/etc) contiguous register range
 * constraints, that is calculated and handled here.
 */
static void ra_calc_dst(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_calc_visitor *c = ra_calc_visitor(v);
	if (is_tex(instr)) {
		c->a.off = 0;
		c->a.num = 4;
	} else {
		c->a.off = 0;
		c->a.num = 1;
	}
}

static void
ra_calc_dst_shader_input(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_calc_visitor *c = ra_calc_visitor(v);
	struct ir3_block *block = instr->block;
	struct ir3_register *dst = instr->regs[0];
	unsigned base = dst->num & ~0x3;
	unsigned i, num = 0;

	assert(!(dst->flags & IR3_REG_IA));

	/* check what input components we need: */
	for (i = 0; i < 4; i++) {
		unsigned idx = base + i;
		if ((idx < block->ninputs) && block->inputs[idx])
			num = i + 1;
	}

	c->a.off = dst->num - base;
	c->a.num = num;
}
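
/* Illustration only, not part of the driver: a standalone sketch of the
 * arithmetic in ra_calc_dst_shader_input(). Inputs are grouped in fours;
 * `off` is the component's position within its group and `num` is one
 * more than the highest component of the group that is present. The
 * struct and function names and the `present` array (standing in for
 * block->inputs) are hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>

struct sketch_assignment {
	int off;	/* this component's offset within its group of four */
	int num;	/* consecutive registers the group needs */
};

static struct sketch_assignment
sketch_calc_input(unsigned regnum, const int *present, unsigned ninputs)
{
	unsigned base = regnum & ~0x3u;	/* round down to a multiple of 4 */
	unsigned i, num = 0;
	struct sketch_assignment a;

	assert(present != NULL);

	/* the highest present component decides the size of the range: */
	for (i = 0; i < 4; i++) {
		unsigned idx = base + i;
		if ((idx < ninputs) && present[idx])
			num = i + 1;
	}

	a.off = (int)(regnum - base);
	a.num = (int)num;
	return a;
}
```

/* For example, with components 4 and 6 present out of eight inputs,
 * register number 6 belongs to the group based at 4, sits at offset 2
 * and needs a range of 3 registers (the unused component 5 still
 * occupies a slot).
 */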

static void ra_calc_src_fanin(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_calc_visitor *c = ra_calc_visitor(v);
	unsigned srcn = ir3_instr_regno(instr, reg) - 1;
	c->a.off += srcn;
	c->a.num += srcn;
	c->a.num = MAX2(c->a.num, instr->regs_count - 1);
}

static const struct ir3_visitor_funcs calc_visitor_funcs = {
		.instr = ir3_visit_instr,
		.dst_shader_input = ra_calc_dst_shader_input,
		.dst_fanout = ra_calc_dst,
		.dst_fanin = ra_calc_dst,
		.dst = ra_calc_dst,
		.src_fanout = ir3_visit_reg,
		.src_fanin = ra_calc_src_fanin,
		.src = ir3_visit_reg,
};

static struct ir3_ra_assignment ra_calc(struct ir3_instruction *assigner)
{
	struct ra_calc_visitor v = {
			.base.funcs = &calc_visitor_funcs,
	};

	ir3_visit_instr(&v.base, assigner);

	return v.a;
}

/*
 * Register Assignment:
 */

struct ra_assign_visitor {
	struct ir3_visitor base;
	struct ir3_ra_ctx *ctx;
	int num;
};

static inline struct ra_assign_visitor *ra_assign_visitor(struct ir3_visitor *v)
{
	return (struct ra_assign_visitor *)v;
}

static type_t half_type(type_t type)
{
	switch (type) {
	case TYPE_F32: return TYPE_F16;
	case TYPE_U32: return TYPE_U16;
	case TYPE_S32: return TYPE_S16;
	/* instructions may already be fixed up: */
	case TYPE_F16:
	case TYPE_U16:
	case TYPE_S16:
		return type;
	default:
		assert(0);
		return ~0;
	}
}

/* some instructions need fix-up if dst register is half precision: */
static void fixup_half_instr_dst(struct ir3_instruction *instr)
{
	switch (instr->category) {
	case 1: /* move instructions */
		instr->cat1.dst_type = half_type(instr->cat1.dst_type);
		break;
	case 3:
		switch (instr->opc) {
		case OPC_MAD_F32:
			instr->opc = OPC_MAD_F16;
			break;
		case OPC_SEL_B32:
			instr->opc = OPC_SEL_B16;
			break;
		case OPC_SEL_S32:
			instr->opc = OPC_SEL_S16;
			break;
		case OPC_SEL_F32:
			instr->opc = OPC_SEL_F16;
			break;
		case OPC_SAD_S32:
			instr->opc = OPC_SAD_S16;
			break;
		/* instructions may already be fixed up: */
		case OPC_MAD_F16:
		case OPC_SEL_B16:
		case OPC_SEL_S16:
		case OPC_SEL_F16:
		case OPC_SAD_S16:
			break;
		default:
			assert(0);
			break;
		}
		break;
	case 5:
		instr->cat5.type = half_type(instr->cat5.type);
		break;
	}
}
/* some instructions need fix-up if src register is half precision: */
static void fixup_half_instr_src(struct ir3_instruction *instr)
{
	switch (instr->category) {
	case 1: /* move instructions */
		instr->cat1.src_type = half_type(instr->cat1.src_type);
		break;
	}
}

static void ra_assign_reg(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_assign_visitor *a = ra_assign_visitor(v);

	if (is_flow(instr) && (instr->opc == OPC_KILL))
		return;

	reg->flags &= ~IR3_REG_SSA;
	reg->num = a->num & ~REG_HALF;

	assert(reg->num >= 0);

	if (a->num & REG_HALF) {
		reg->flags |= IR3_REG_HALF;
		/* if dst reg being assigned, patch up the instr: */
		if (reg == instr->regs[0])
			fixup_half_instr_dst(instr);
		else
			fixup_half_instr_src(instr);
	}
}
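
/* Illustration only, not part of the driver: ra_assign_reg() receives a
 * register number with the half-precision marker folded into it, and
 * splits the two apart again. The sketch below shows that unpacking. The
 * value 0x8000 for SKETCH_REG_HALF is an assumption made for the example;
 * the real REG_HALF is defined outside this excerpt.
 */

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_REG_HALF 0x8000	/* assumed flag bit, see note above */

struct sketch_reg {
	int num;	/* register number with the flag bit stripped */
	bool half;	/* true if the half-precision bit was set */
};

static struct sketch_reg sketch_decode(int packed)
{
	struct sketch_reg r;

	assert(packed >= 0);

	r.num = packed & ~SKETCH_REG_HALF;
	r.half = (packed & SKETCH_REG_HALF) != 0;
	return r;
}
```

/* Packing goes the other way: OR the flag bit into the number before
 * handing it on, as long as register numbers stay below the flag bit.
 */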

static void ra_assign_dst_shader_input(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_assign_visitor *a = ra_assign_visitor(v);
	unsigned i, base = reg->num & ~0x3;
	int off = base - reg->num;

	ra_assign_reg(v, instr, reg);
	reg->flags |= IR3_REG_IA;

	/* trigger assignment of all our companion input components: */
	for (i = 0; i < 4; i++) {
		struct ir3_instruction *in = NULL;
		if ((base + i) < instr->block->ninputs)
			in = instr->block->inputs[base + i];
		if (in && is_meta(in) && (in->opc == OPC_META_INPUT))
			ra_assign(a->ctx, in, a->num + off + i);
	}
}

static void ra_assign_dst_fanout(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_assign_visitor *a = ra_assign_visitor(v);
	struct ir3_register *src = instr->regs[1];
	ra_assign_reg(v, instr, reg);
	if (src->flags & IR3_REG_SSA)
		ra_assign(a->ctx, src->instr, a->num - instr->fo.off);
}

static void ra_assign_src_fanout(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_assign_visitor *a = ra_assign_visitor(v);
	ra_assign_reg(v, instr, reg);
	ra_assign(a->ctx, instr, a->num + instr->fo.off);
}

static void ra_assign_src_fanin(struct ir3_visitor *v,
		struct ir3_instruction *instr, struct ir3_register *reg)
{
	struct ra_assign_visitor *a = ra_assign_visitor(v);
	unsigned j, srcn = ir3_instr_regno(instr, reg) - 1;
	ra_assign_reg(v, instr, reg);
	ra_assign(a->ctx, instr, a->num - srcn);
	for (j = 1; j < instr->regs_count; j++) {
		struct ir3_register *reg = instr->regs[j];
		if (reg->flags & IR3_REG_SSA)  /* could be renamed already */
			ra_assign(a->ctx, reg->instr, a->num - srcn + j - 1);
	}
}
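
/* Illustration only, not part of the driver: ra_assign_src_fanin() knows
 * the register chosen for one source of a fan-in and derives the rest,
 * since all sources must land in consecutive registers. The sketch below
 * reproduces just that numbering, using zero-based source indices; the
 * function name and the `out` array are hypothetical.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Given that zero-based source `srcn` was assigned register `num`,
 * write the register of each of the `nsrcs` sources into `out`.
 */
static void sketch_fanin_regs(int num, unsigned srcn, unsigned nsrcs, int *out)
{
	unsigned j;
	int base = num - (int)srcn;	/* register of source 0 */

	assert(out != NULL);
	assert(srcn < nsrcs);

	for (j = 0; j < nsrcs; j++)
		out[j] = base + (int)j;
}
```

/* If the third of four sources (index 2) was given register 10, the
 * group is based at register 8 and the sources occupy 8, 9, 10 and 11.
 */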

static const struct ir3_visitor_funcs assign_visitor_funcs = {
		.instr = ir3_visit_instr,
		.dst_shader_input = ra_assign_dst_shader_input,
		.dst_fanout = ra_assign_dst_fanout,
		.dst_fanin = ra_assign_reg,
		.dst = ra_assign_reg,
		.src_fanout = ra_assign_src_fanout,
		.src_fanin = ra_assign_src_fanin,
		.src = ra_assign_reg,
};

static void ra_assign(struct ir3_ra_ctx *ctx,
		struct ir3_instruction *assigner, int num)
{
	struct ra_assign_visitor v = {
			.base.funcs = &assign_visitor_funcs,
			.ctx = ctx,
			.num = num,
	};

	/* if we've already visited this instruction, bail now: */
	if (ir3_instr_check_mark(assigner)) {
		debug_assert(assigner->regs[0]->num == (num & ~REG_HALF));
		if (assigner->regs[0]->num != (num & ~REG_HALF)) {
			/* impossible situation, should have been resolved
			 * at an earlier stage by inserting extra mov's:
			 */
			ctx->error = true;
		}
		return;
	}

	ir3_visit_instr(&v.base, assigner);
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
|
|
|
static void ir3_instr_ra(struct ir3_ra_ctx *ctx,
|
|
|
|
struct ir3_instruction *instr)
|
|
|
|
{
|
2014-07-21 20:24:30 +01:00
|
|
|
struct ir3_register *dst;
|
freedreno/a3xx/compiler: new compiler
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigned a weight/depth to each node (based on sum of
instruction cycles of a given node and all it's dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
Which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.png
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in it's current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
https://github.com/freedreno/mesa/commit/163b6306b1660e05ece2f00d264a8393d99b6f12
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-01-29 22:18:49 +00:00
|
|
|
unsigned num;
|
|
|
|
|
|
|
|
/* skip over nop's */
|
|
|
|
if (instr->regs_count == 0)
|
|
|
|
return;
|
|
|
|
|
2014-07-21 20:24:30 +01:00
|
|
|
dst = instr->regs[0];
|
2014-02-16 12:35:20 +00:00
|
|
|
|
2014-01-29 22:18:49 +00:00
	/* if we've already visited this instruction, bail now: */
	if (instr->flags & IR3_INSTR_MARK)
		return;

	/* allocate register(s): */
2014-07-25 14:49:41 +01:00
	if (is_addr(instr)) {
2014-07-21 20:24:30 +01:00
		num = instr->regs[2]->num;
	} else if (reg_gpr(dst)) {
		struct ir3_ra_assignment a;
		a = ra_calc(instr);
		num = alloc_block(ctx, instr, a.num) + a.off;
	} else if (dst->flags & IR3_REG_ADDR) {
		dst->flags &= ~IR3_REG_ADDR;
		num = regid(REG_A0, 0) | REG_HALF;
	} else {
2014-07-23 20:08:40 +01:00
		/* predicate register (p0).. etc */
		return;
2014-07-21 20:24:30 +01:00
	}
2014-01-29 22:18:49 +00:00

	ra_assign(ctx, instr, num);
}
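The `alloc_block(ctx, instr, a.num) + a.off` step above reserves a run of consecutive registers and places this instruction at an offset inside that run. How alloc_block searches for the run is not shown in this excerpt. The sketch below assumes a simple first-fit scan over a used-register table. NUM_REGS, alloc_run and assign_component are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 16	/* hypothetical register file size for this sketch */

/* Hypothetical first-fit allocator: find `size` consecutive free
 * registers, mark them used, and return the index of the first one
 * (or -1 if no free run of that size exists).
 */
static int alloc_run(bool used[NUM_REGS], unsigned size)
{
	for (unsigned base = 0; base + size <= NUM_REGS; base++) {
		unsigned i;

		/* stop at the first occupied slot inside the candidate run */
		for (i = 0; i < size; i++)
			if (used[base + i])
				break;

		if (i == size) {
			for (i = 0; i < size; i++)
				used[base + i] = true;
			return (int)base;
		}
	}
	return -1;
}

/* Register number for one component: base of the run plus the
 * component's offset within it, mirroring `alloc_block(...) + a.off`.
 */
static int assign_component(bool used[NUM_REGS], unsigned size, unsigned off)
{
	int base = alloc_run(used, size);
	return (base < 0) ? -1 : base + (int)off;
}
```

With register 1 already taken, a request for four registers skips ahead to the first free run, and the component at offset 2 lands two slots past the start of that run.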

/* flatten into shader: */
// XXX this should probably be somewhere else:
static void legalize(struct ir3_ra_ctx *ctx, struct ir3_block *block)
{
	struct ir3_instruction *n;
2014-07-25 15:56:23 +01:00
	struct ir3 *shader = block->shader;
2014-01-29 22:18:49 +00:00
	struct ir3_instruction *end =
			ir3_instr_create(block, 0, OPC_END);
	struct ir3_instruction *last_input = NULL;
2014-07-21 20:24:30 +01:00
	struct ir3_instruction *last_rel = NULL;
2014-02-25 13:02:28 +00:00
	regmask_t needs_ss_war;       /* write after read */
2014-01-29 22:18:49 +00:00
	regmask_t needs_ss;
	regmask_t needs_sy;

2014-02-16 00:01:38 +00:00
	regmask_init(&needs_ss_war);
2014-01-29 22:18:49 +00:00
	regmask_init(&needs_ss);
	regmask_init(&needs_sy);

	shader->instrs_count = 0;

	for (n = block->head; n; n = n->next) {
2014-02-16 00:01:38 +00:00
		struct ir3_register *reg;
2014-01-29 22:18:49 +00:00
		unsigned i;

		if (is_meta(n))
			continue;

freedreno/ir3: fix lockups with lame FRAG shaders
Shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
1: MOV OUT[0], IMM[0].xyxx
2: END
cause unhappiness. They have an IN[], but once the shader is compiled the
useless TEX instruction goes away, leaving a varying that is never
fetched, which makes the hw unhappy.
In the process, fix a signed vs. unsigned compare. If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-03 15:02:31 +01:00
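The signed vs. unsigned pitfall mentioned in the commit message above can be reproduced in isolation. This is a minimal sketch, not the driver's code: MAX2 is written out here in the same shape as Mesa's macro so the example stands alone, and max_mixed and max_signed are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>

/* Same shape as Mesa's MAX2 macro, repeated so the sketch is standalone. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: one operand is signed, the other unsigned.
 * The cast spells out the conversion C applies implicitly when an int
 * meets an unsigned in a comparison: -1 becomes UINT_MAX and always
 * "wins" the max.
 */
static unsigned max_mixed(int max_reg, unsigned other)
{
	return MAX2((unsigned)max_reg, other);
}

/* Hypothetical helper: both operands signed, so -1 compares as
 * smaller than any non-negative value.
 */
static int max_signed(int max_reg, int other)
{
	return MAX2(max_reg, other);
}
```

Keeping both operands in the same signed type is one way to avoid the surprise; the exact fix used in the commit is not shown in this excerpt.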
		if (is_input(n)) {
			struct ir3_register *inloc = n->regs[1];
			assert(inloc->flags & IR3_REG_IMMED);
			ctx->max_bary = MAX2(ctx->max_bary, inloc->iim_val);
		}

2014-01-29 22:18:49 +00:00
		for (i = 1; i < n->regs_count; i++) {
2014-02-16 00:01:38 +00:00
			reg = n->regs[i];
2014-01-29 22:18:49 +00:00

2014-02-16 12:35:20 +00:00
			if (reg_gpr(reg)) {
2014-01-29 22:18:49 +00:00

				/* TODO: we probably only need (ss) for alu
				 * instr consuming sfu result.. need to make
				 * some tests for both this and (sy)..
				 */
				if (regmask_get(&needs_ss, reg)) {
					n->flags |= IR3_INSTR_SS;
					regmask_init(&needs_ss);
				}

				if (regmask_get(&needs_sy, reg)) {
					n->flags |= IR3_INSTR_SY;
					regmask_init(&needs_sy);
				}
			}
2014-07-21 20:24:30 +01:00

			/* TODO: is it valid to have address reg loaded from a
			 * relative src (ie. mova a0, c<a0.x+4>)? If so, the
			 * last_rel check below should be moved ahead of this:
			 */
			if (reg->flags & IR3_REG_RELATIV)
				last_rel = n;
2014-01-29 22:18:49 +00:00
		}

2014-02-16 00:01:38 +00:00
		if (n->regs_count > 0) {
			reg = n->regs[0];
			if (regmask_get(&needs_ss_war, reg)) {
				n->flags |= IR3_INSTR_SS;
				regmask_init(&needs_ss_war); // ??? I assume?
			}
2014-07-21 20:24:30 +01:00

			if (last_rel && (reg->num == regid(REG_A0, 0))) {
				last_rel->flags |= IR3_INSTR_UL;
				last_rel = NULL;
			}
2014-02-16 00:01:38 +00:00
		}

2014-02-19 16:55:25 +00:00
		/* cat5+ does not have an (ss) bit, if needed we need to
		 * insert a nop to carry the sync flag. Would be kinda
		 * clever if we were aware of this during scheduling, but
		 * this should be a pretty rare case:
		 */
		if ((n->flags & IR3_INSTR_SS) && (n->category >= 5)) {
			struct ir3_instruction *nop;
			nop = ir3_instr_create(block, 0, OPC_NOP);
			nop->flags |= IR3_INSTR_SS;
			n->flags &= ~IR3_INSTR_SS;
		}

		/* need to be able to set (ss) on first instruction: */
		if ((shader->instrs_count == 0) && (n->category >= 5))
			ir3_instr_create(block, 0, OPC_NOP);

2014-02-21 23:03:30 +00:00
		if (is_nop(n) && shader->instrs_count) {
			struct ir3_instruction *last =
					shader->instrs[shader->instrs_count-1];
			if (is_nop(last) && (last->repeat < 5)) {
				last->repeat++;
				last->flags |= n->flags;
				continue;
			}
		}

2014-01-29 22:18:49 +00:00
		shader->instrs[shader->instrs_count++] = n;

		if (is_sfu(n))
			regmask_set(&needs_ss, n->regs[0]);
2014-02-19 16:55:25 +00:00

2014-04-08 19:14:43 +01:00
		if (is_tex(n)) {
			/* this ends up being the # of samp instructions.. but that
			 * is ok, everything else only cares whether it is zero or
			 * not. We do this here, rather than when we encounter a
			 * SAMP decl, because (especially in binning pass shader)
			 * the samp instruction(s) could get eliminated if the
			 * result is not used.
			 */
			ctx->has_samp = true;
2014-01-29 22:18:49 +00:00
			regmask_set(&needs_sy, n->regs[0]);
2014-04-08 19:14:43 +01:00
		}
2014-02-19 16:55:25 +00:00

		/* both tex/sfu appear to not always immediately consume
		 * their src register(s):
		 */
		if (is_tex(n) || is_sfu(n)) {
2014-02-16 00:01:38 +00:00
			for (i = 1; i < n->regs_count; i++) {
				reg = n->regs[i];
				if (reg_gpr(reg))
					regmask_set(&needs_ss_war, reg);
			}
		}
2014-02-19 16:55:25 +00:00

2014-01-29 22:18:49 +00:00
		if (is_input(n))
			last_input = n;
	}

	if (last_input)
		last_input->regs[0]->flags |= IR3_REG_EI;

2014-07-21 20:24:30 +01:00
	if (last_rel)
		last_rel->flags |= IR3_INSTR_UL;

2014-01-29 22:18:49 +00:00
|
|
|
shader->instrs[shader->instrs_count++] = end;
|
|
|
|
|
|
|
|
shader->instrs[0]->flags |= IR3_INSTR_SS | IR3_INSTR_SY;
|
|
|
|
}
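The end of this function remembers the last input instruction seen during the walk and flags only that one after the walk finishes. A minimal sketch of that pattern follows. The `struct node` type, its `is_input` field, and the `FLAG_END_INPUT` bit are hypothetical stand-ins for the ir3 instruction type and `IR3_REG_EI`; they are not taken from the driver.

```c
#include <stddef.h>

/* Hypothetical stand-ins for ir3 instructions; not the driver's types. */
#define FLAG_END_INPUT 0x1u

struct node {
	struct node *next;
	int is_input;      /* nonzero if this node reads a shader input */
	unsigned flags;
};

/* Walk the list once, remember the last input node, then flag only it.
 * Returns the flagged node, or NULL if the list has no input nodes. */
static struct node *mark_last_input(struct node *head)
{
	struct node *last_input = NULL;
	struct node *n;

	for (n = head; n; n = n->next)
		if (n->is_input)
			last_input = n;

	if (last_input)
		last_input->flags |= FLAG_END_INPUT;

	return last_input;
}
```

Deferring the flag until after the loop avoids having to clear it again each time a later input shows up.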

static int block_ra(struct ir3_ra_ctx *ctx, struct ir3_block *block)
{
	struct ir3_instruction *n;

	if (!block->parent) {
		unsigned i, j;
		int base, off = output_base(ctx);

		base = alloc_block(ctx, NULL, block->noutputs + off);

		if (ctx->half_precision)
			base |= REG_HALF;

		for (i = 0; i < block->noutputs; i++)
			if (block->outputs[i] && !is_kill(block->outputs[i]))
				ra_assign(ctx, block->outputs[i], base + i + off);

		if (ctx->type == SHADER_FRAGMENT) {
			i = 0;
			if (ctx->frag_face) {
				/* if we have frag_face, it gets hr0.x */
				ra_assign(ctx, block->inputs[i], REG_HALF | 0);
				i += 4;
			}
			for (j = 0; i < block->ninputs; i++, j++)
				if (block->inputs[i])
					ra_assign(ctx, block->inputs[i], (base & ~REG_HALF) + j);
		} else {
			for (i = 0; i < block->ninputs; i++)
				if (block->inputs[i])
					ir3_instr_ra(ctx, block->inputs[i]);
		}
	}

	/* then loop over instruction list and assign registers:
	 */
	n = block->head;
	while (n) {
		ir3_instr_ra(ctx, n);
		if (ctx->error)
			return -1;
		n = n->next;
	}

	legalize(ctx, block);

	return 0;
}
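In `block_ra()` the half-precision marker is carried as a flag bit OR'd into the register number (`base |= REG_HALF`) and masked off again for the full-precision inputs (`(base & ~REG_HALF) + j`). A minimal sketch of that encoding is below. The numeric value of `REG_HALF` is an assumption made for illustration, and the helper names `tag_half`, `is_half` and `input_reg` are hypothetical; none of them come from the driver.

```c
#include <stdbool.h>

/* Assumed value for illustration only; the driver's definition may differ. */
#define REG_HALF 0x8000

/* Tag a register number as half precision when requested. */
static int tag_half(int base, bool half_precision)
{
	return half_precision ? (base | REG_HALF) : base;
}

/* True if the register number carries the half-precision flag. */
static bool is_half(int num)
{
	return (num & REG_HALF) != 0;
}

/* Full-precision input j relative to base: strip the flag, then offset. */
static int input_reg(int base, int j)
{
	return (base & ~REG_HALF) + j;
}
```

Keeping the flag in a high bit lets the same integer carry both the register index and its precision, provided real register numbers never reach that bit.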

int ir3_block_ra(struct ir3_block *block, enum shader_t type,
		bool half_precision, bool frag_coord, bool frag_face,
freedreno/ir3: fix lockups with lame FRAG shaders
Shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
1: MOV OUT[0], IMM[0].xyxx
2: END
cause unhappiness. They have an IN[], but once this is compiled the
useless TEX instruction goes away, leaving a varying that is never
fetched, which makes the hw unhappy.
In the process fix a signed vs unsigned compare. If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-10-03 15:02:31 +01:00
		bool *has_samp, int *max_bary)
{
	struct ir3_ra_ctx ctx = {
			.block = block,
			.type = type,
			.half_precision = half_precision,
			.frag_coord = frag_coord,
			.frag_face = frag_face,
			.max_bary = -1,
	};
	int ret;

	ir3_clear_mark(block->shader);
	ret = block_ra(&ctx, block);
	*has_samp = ctx.has_samp;
	*max_bary = ctx.max_bary;

	return ret;
}
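The "fix lockups with lame FRAG shaders" commit message mentions a signed vs unsigned compare: with `max_reg = -1`, `MAX2()` against an unsigned value does not give the desired result. A minimal sketch of why follows, assuming a `MAX2` macro of the usual ternary shape. The helper names `max_reg_mixed` and `max_reg_signed` are hypothetical and exist only for this illustration.

```c
#include <limits.h>

/* Assumed to have the usual ternary shape; defined here so the sketch
 * is self-contained. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Mixed operands: C converts the int to unsigned before comparing,
 * so -1 becomes UINT_MAX and wins the comparison. */
static unsigned max_reg_mixed(int max_reg, unsigned other)
{
	return MAX2(max_reg, other);
}

/* Both operands signed: -1 compares as less than any real register. */
static int max_reg_signed(int max_reg, int other)
{
	return MAX2(max_reg, other);
}
```

Here `max_reg_mixed(-1, 3u)` yields `UINT_MAX` rather than 3, while `max_reg_signed(-1, 3)` yields 3, which is why `max_bary` is kept as an `int` that starts at -1.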