There are more useful nir passes added since initial conversion to nir.
But ir3 was never updated to use them.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Agressively lowering all if/else to selects in some extreme cases
results in much higher register pressure. Using peephole select instead
with a modest threshold speeds up alu2 4x!
16 seems like a good limit, low enough to help alu2 but not too low that
it penalizes everything else. With a bit better scheduling of the
instruction that moves a value into a predicate register, we might be
able to lower this limit a bit more in the future, but since we need 6
cycles from the move to predicate register to predicated branch, that
puts some sort of lower bound on how far we can lower this threshold.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cmdstream traces from blob make it clear that the blob driver dev's
*think* a5xx has a real (non-zero-based) vtxid. But reality claims
differently.
Fixes ./bin/gl-3.2-basevertex-vertexid and probably others.
This means draw-indirect is going to need some gymnastics to copy
base-vertex into uniform. (a4xx probably needs that too.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Give algebraic-opt pass a chance to catch udiv by const power-of-two,
before running lower-idiv pass.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Not sure how we didn't hit this already, but since we want fdiv
converted into mul + rcp, we should set this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lower lrp when operating with double operands because float version of
lrp is also lowered.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that there is a pass to do this in NIR, lets just use that and
manage the variants ourself, rather than letting state-tracker do it.
This way, mesa/st will precompile shaders without requiring
ST_DEBUG=precompile (which requires a debug build).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A later patch will add lower_flrp64 option to NIR.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We seem to need range reduction to get sane results. Fixes glmark2
jellyfish bench, and a whole bunch of
dEQP-GLES3.functional.shaders.builtin_functions.precision.{sin,cos,tan}.*
v2: squashed in android build fixes from Rob Herring
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions. But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures. Work around this
by doing the srgb->linear conversion in the shader instead.
This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*
(The remaining fails seem to be a bug w/ ASTC + linear filtering, also
possibly a420.0 specific.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The old version of the pass only worked on globals and locals and always
left inputs, outputs, uniforms, etc. alone.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:
commit 905ff86198
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u16.
commit 76289fbfa8
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u8.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass. This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.
Signed-off-by: Rob Clark <robclark@freedesktop.org>