When 'x' is the result of a scmp op:
x != 0.0 or x == 1.0: passthrough
x == 0.0 or x != 1.0: invert
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal
for vec4 backends that doesn't have any special instructions for it, as
long as they support saturate.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add simple fdot2 optimizations that are missing.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For backends that don't have a 'fdph' instructions
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This version has less ops for the same precision.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This is to allow optimizations in nir_opt_algebraic not otherwise possible
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Previously we'd hit the unreachable() for uploading RGTC.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Structure itself wasn't freed during context tear-down, causing a
memory leak on iris.
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This will effectively enable the optimization in anv.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This allows us to drop legacy userclip plane handling in both the vec4
and FS backends, and simplifies a few interfaces.
v2 (Jason Ekstrand):
- Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because
it's i965-specific.
- Handle adding the params in brw_nir_lower_legacy_clipping
- Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of building a constant mask (which depends on knowing the
subgroup size), we build an expression. Because the pass uses the
nir_shader_lower_instructions helper, subgroup lowering will be run on
any newly emitted instructions as well as the previously existing
instructions. In particular, if the subgroup size is known, the newly
emitted subgroup_size intrinsic will get turned into a constant and a
later constant folding pass will clean it up.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API. However, all GL requires is that
it's a uniform. Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of lowering the subgroup size so early, wait until we have more
information. In particular, we're going to want different subgroup
sizes from different stages depending on the API. We also defer
lowering of subgroup masks because the ge/gt masks require the subgroup
size to generate a subgroup mask.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested on Gen > 9.
v2: 1) Fix lowering
2) Keep a consistent i/u order (Matt Turner)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add freeing of SubroutineIndexes to the _mesa_free_shader_state.
Fixes: 4566aaaa5b ("mesa/subroutines: start adding per-context
subroutine index support (v1.1)")
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
warning: struct 'fs_reg' was previously declared as a class
Fixes: e64be391 ("intel/compiler: generalize the combine constants pass")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Fixes: ed23335a31 ("gallium: use enums in p_shader_tokens.h (v2)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When a pipeline uses transform feedback, the driver fallbacks to
the legacy path because NGG support for streamout is a non-trivial
amount of work.
AMDVLK also uses the legacy path for streamout, while RadeonSI
uses the new NGG path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
NGG GS for streamout requires a bunch of work, so enable it with
the legacy path only for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For some reasons, InstanceID is VGPR3 although StepRate0 is set to 1.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The parameters were getting messy and I have to add a few more
for compute shaders, so clean it up before proceeding.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I can't find a single place where nir_lower_io is called after going out
of SSA which is the only real reason why you wouldn't do this. Returning
SSA defs is more idiomatic and is required for the next commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Numbers for Talos:
gfx10 without binning: 77.0 77.7 77.2 77.6
gfx10 with binning: 82.3 82.0 82.7 82.4
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>