Commit Graph

868 Commits

Author SHA1 Message Date
Timur Kristóf 5aa39253cb nir: Rename nir_get_io_vertex_index_src and include per-primitive I/O.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13466>
2021-11-16 07:46:55 +00:00
Marek Olšák 33b4eb149e nir: add new SSA instruction scheduler grouping loads into indirection groups
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13604>
2021-11-08 21:20:11 +00:00
Jason Ekstrand 3ace6b968b compiler/types: Add a texture type
This is separate from images and samplers.  It's a texture (not a
storage image) without a sampler.  We also add C-visible helpers to
convert between sampler and image types.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13389>
2021-10-16 05:49:34 +00:00
Jason Ekstrand d343aef942 nir/serialize: Pack deref modes better
With nir_var_image, we've now run out of bits in our packed blob for
deref instructions.  We could revert to an unpacked blob or we could be
a bit more clever about how we encode deref modes and pack them into 5
bits.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
2021-10-16 03:47:10 +00:00
Jason Ekstrand 9272a952c9 nir: Re-arrange the variable modes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
2021-10-16 03:47:10 +00:00
Jason Ekstrand 956199e870 nir: s/nir_var_mem_image/nir_var_image/g
We typically use nir_var_mem_* for stuff that has an explicit byte-based
memory layout.  Images are opaque.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13386>
2021-10-16 03:47:10 +00:00
Jason Ekstrand 4c5a88d735 nir: Validate image variable modes
We can also significantly simplify the foreach_image_variable helper.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:56 +00:00
Jason Ekstrand 2a53c33fbe nir: Add a nir_foreach_image_variable() iterator
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:55 +00:00
Caio Marcelo de Oliveira Filho de3705edb0 nir: Add nir_var_mem_image
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4743>
2021-10-15 14:58:55 +00:00
Boris Brezillon 56251f924d nir: Add a nir_sysvals_to_varyings() helper
Allow backends to turn some sysvals into input varyings so the frontend
(in our case spirv_to_nir()) doesn't have to bother selecting which
one is expected.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13017>
2021-10-07 19:45:35 +00:00
Rhys Perry f3723822a4 nir/lower_tex: add lower_to_fragment_fetch_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>
2021-10-07 15:36:39 +00:00
Rhys Perry 225fe37c14 nir: add _amd suffix to fragment_mask_fetch and fragment_fetch texops
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12214>
2021-10-07 15:36:39 +00:00
Emma Anholt 7dde279db5 nir-to-tgsi: Avoid emitting TXL just for lod 0 on non-vertex shaders.
Prompted by comparing virgl fails and finding that it has issues with
immediate args to TXL/TXB, at least.

Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12800>
2021-10-06 03:44:17 +00:00
Rhys Perry 69f9a96af1 nir: add nir_src_components_read()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12472>
2021-09-24 18:41:18 +00:00
Emma Anholt aed4c0b5a9 nir: Drop the unused instr arg for src/dest copy functions.
Now that we don't use ralloc, we don't need this arg to get at the right
ralloc ctx.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Emma Anholt d1a2870f78 nir: Add all allocated instructions to a GC list.
Right now we're using ralloc to GC our NIR instructions, but ralloc has
significant overhead for its recursive nature so it would be nice to use a
simpler mechanism for GCing instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:06 +00:00
Emma Anholt b99efb8af0 nir: Pull the instr list free function out to a helper.
With the de-rallocing, we're going to have some more places that free a
list of instrs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:05 +00:00
Emma Anholt 36d9bdca0b nir: Add a nir_instr_free() to replace ralloc_free(instr).
This will gain another step shortly.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
2021-09-14 17:53:05 +00:00
Qiang Yu 7054c1b7fd nir/linker: pack varyings with different interpolation qualifier
Driver like radeonsi load varying in a scalar manner, so prefer to pack
varying with different interpolation qualifier into same slot to save
space.

But driver like panfrost/bifrost can load varying in vector manner,
so prefer to pack varying with same interpolation qualifier.

Driver can add interpolation qualifiers which are able to be
packed into same varying slot to pack_varying_options nir option.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>
2021-09-09 06:00:58 +00:00
Rhys Perry 41ecef7855 nir: add sdot_2x16 and udot_2x16 opcodes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:27 +00:00
Rhys Perry ae00f5af61 nir: separate lower_add_sat
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>
2021-09-03 13:21:27 +00:00
Emma Anholt 01759d3fb2 nir: Set .driver_location for GLSL UBO/SSBOs when we lower to block indices.
Without this, there's no way to match the UBO nir_variable declarations to
the load_ubo intrinsics referencing their data.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12175>
2021-08-31 20:12:16 +00:00
Caio Marcelo de Oliveira Filho f95daad3a2 nir: Add a way to identify per-primitive variables
Per-primitive is similar to per-vertex attributes, but applies to all
fragments of the primitive without any interpolation involved.

Because they are regular input and outputs, keep track in shader_info
of which I/O is per-primitive so we can distinguish them after deref
lowering.  These fields can be used combined with the regular
`inputs_read`, `outputs_written` and `outputs_read`.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:42 +00:00
Caio Marcelo de Oliveira Filho 927584fa67 nir: Update documentation for location to mention Task/Mesh
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10600>
2021-08-28 03:56:42 +00:00
Lionel Landwerlin a13e79843e nir: prevent peephole from generating invalid NIR
We can't append instructions following a return/halt instruction
because the control flow helpers will modify the successor of the
block containing the return/halt. And the NIR validator enforces that
the return/halt must have the end of the function as successor.

This tends to happen following lower_shader_calls lowering which
inserts halts. This probably doesn't prevent the optimization, it'll
just happen in one of the return shaders after the halt has been
removed.

v2: Move prev block ending check earlier in the function (Daniel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12506>
2021-08-25 11:38:21 +00:00
Ian Romanick 6c18a3b497 nir/opcodes: Add integer dot-product opcodes
Six opcodes are added: sdot_4x8_iadd, udot_4x8_uadd, sudot_4x8_iadd,
sdot_4x8_iadd_sat, udot_4x8_uadd_sate, and sudot_4x8_iadd_sat.  These
represent the combinations of integer dot-product and add that operate
on packed source vectors.  That is, the four 8-bit values for each
vector is stored in a single 32-bit integer.

Some hardware may prefer to operate on unpacked byte vectors.  When such
hardware comes to Mesa, we'll have to figure out how to name things.

v2: Add nir_op_iudp4a and nir_op_iudp4a_sat instructions.  These opcodes
are not 2-source commutative.

v3: Rename all opcodes to be more like some existing 4x8 opcodes.
Suggested by Timur.  Change type of packed vector sources to uint32,
change types of constant folding variables to have explicit size, and
delete some extra casts.  All suggested by Jason.

v4: Fix typo previously noticed by Alyssa but missed in v2.

v5: Add has_sudot_4x8 flag.  Requested by Rhys.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>
2021-08-24 19:58:57 +00:00
Qiang Yu 0b9639c35d nir/loop_analyze: record induction variables for each loop
For being used by uniform inline lowering pass.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11950>
2021-08-19 02:17:35 +00:00
Ian Romanick f0a8a9816a nir: intel/compiler: Add and use nir_op_pack_32_4x8_split
A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO.  This results
in a lot of shifts and MOVs.  When that pattern can be recognized, the
individual 8-bit components can be packed much more efficiently.

v2: Rebase on b4369de27f ("nir/lower_packing: use
shader_instructions_pass")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-08-18 22:03:37 +00:00
Rhys Perry ed70b256ce nir: add ffma creation helpers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>
2021-08-16 17:19:45 +00:00
Emma Anholt 673cc9323a nir: Move phi src setup to a helper.
Cleans up the ralloc/list push code all over the tree.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11772>
2021-08-13 16:11:57 +00:00
Pierre-Eric Pelloux-Prayer 7684d57a05 nir: add a pass to optimize "gl_FragDepth = gl_FragCoord.z" away
gl_FragDepth default value is gl_FragCoord.z so if a shader does:

   gl_FragDepth = gl_FragCoord.z

we can drop this assignment.

v2: use nir_ssa_scalar_resolved and don't do this is gl_FragDepth
    is wrote multiple times (Jason)
v3: - move to its own pass (Jason)
    - handle var = NULL (Rhys)
v4: refactoring (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10697>
2021-08-11 11:00:11 +02:00
Dave Airlie ad92c2b253 nir: add fisnormal lowering
just lower the 32-bit version for now.

Thanks to alyssa for this suggested lowering.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>
2021-08-06 14:27:48 +10:00
Jason Ekstrand 0ddac113f8 nir: Removing uses of SSA defs destroys SSA liveness
The liveness information will be a superset of real liveness so it's
unlikely something will explode if it tries to use it.  However, it is
out-of-date and should be re-run if someone really wants it.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12186>
2021-08-03 21:36:53 +00:00
Timothy Arceri a7f2e683de nir: move nir_block_ends_in_break() to nir.h
Will be used in a following commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
2021-08-03 10:54:50 +00:00
Timothy Arceri a9ed4538ab nir: add indirect loop unrolling to compiler options
This is where it should be rather than having to pass it into the
optimisation pass every time.

It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
2021-08-03 10:54:50 +00:00
Emma Anholt 9ffd00bcf1 nir_to_tgsi: Pack our tex coords into vec4 nir_tex_src_backend[12].
For TGSI, we need the coordinate, comparator, bias, and LOD all together
in the first two vec4 args, and by doing it in the backend we were
generating extra MOVs.

softpipe shader-db results:
total instructions in shared programs: 2985416 -> 2953625 (-1.06%)
instructions in affected programs: 499937 -> 468146 (-6.36%)
total temps in shared programs: 544769 -> 565869 (3.87%)
temps in affected programs: 105469 -> 126569 (20.01%)

i915g shader-db:
total instructions in shared programs: 371625 -> 369594 (-0.55%)
instructions in affected programs: 24903 -> 22872 (-8.16%)
total tex_indirect in shared programs: 11381 -> 11365 (-0.14%)
tex_indirect in affected programs: 43 -> 27 (-37.21%)
LOST:   7
GAINED: 16

The temps increase is the pre-existing issue that we never release temps
for NIR regs, which doesn't matter much for softpipe (just memory/cache
footprint) but does for i915g as seen by shaders that no longer compile
(though overall we seem to win).

Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11912>
2021-07-29 09:05:05 -07:00
Jason Ekstrand 74ec2b12be nir/lower_tex: Rework invalid implicit LOD lowering
Only fragment and some compute shaders support implicit derivatives.
They're totally meaningless without helper invocations and some
understanding of the dispatch pattern.  We've got code to lower
nir_texop_tex in these shader stages to use an explicit derivative of 0
but it was pretty badly broken:

 1. It only handled nir_texop_tex, not nir_texop_txb or nir_texop_lod.

 2. It didn't take min_lod into account

 3. It was conflated with adding a missing LOD parameter to opcodes
    which expect one such as nir_texop_txf.  While not really a bug,
    this does make it way harder to reason about the code.

 4. Unless you set a flag (which most drivers don't), it left the
    opcode nir_texop_tex instead of nir_texop_txl which it should have
    been.

This reworks it to go through roughly the same path as other LOD
lowering only with a constant lod of 0 instead of calling out to
nir_texop_lod.  We also get rid of the lower_tex_without_implicit_lod
flag because most drivers set it and those that don't are probably
subtly broken.  If someone really wants to get nir_texop_tex in their
vertex shaders, they can write a new patch to add the flag back in.

Fixes: e382890e25 "nir: set default lod to texture opcodes that..."
Fixes: d5ac5d6e83 "nir: Add option to lower tex to txl when..."
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Jason Ekstrand fa717a202c docs,nir: Document NIR texture instructions
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Jason Ekstrand 4465ca296d nir: Suffix all the MCS texture stuff _intel
It's intel-specific, used to get at MSAA compression information.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11775>
2021-07-23 15:53:57 +00:00
Jason Ekstrand 60b5faf572 nir/lower_tex: Add a lower_txs_cube_array option
Several bits of hardware require the division by 6 to happen in the
shader.  May as well have common lowering for it.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12005>
2021-07-22 14:22:35 -05:00
Jordan Justen 6898549d56 nir: Add nir_lower_image() to lower cube image sizes
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9466>
2021-07-21 11:02:15 -07:00
Sagar Ghuge 06ab737686 nir: Add optimizations for iadd3
This patch also adds has_iadd3 bit to give more control if backend
supports ternary add instruction or not.

v2:
- Add patterns in late optimization (Connor Abbott)

Suggested-by: Alyssa/Jason

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>
2021-07-16 15:59:56 +00:00
Jason Ekstrand 624e799cc3 nir: Drop nir_ssa_def::name and nir_register::name
We say that they're for debug only but we don't really have a good
policy around when to set them and when not to.  In particular,
nir_lower_system_values and nir_lower_vars_to_ssa which are the chief
producers of SSA values which might reasonably have a name do not bother
to set one.  We have some names set from things like BLORP and RADV's
meta shaders but AFAICT, they're setting a name more because it's there
than because they actually care.

Also, most things other than nir_clone and nir_serialize don't bother to
try and preserve them.  You can see in the diffstat of this commit
exactly what passes attempt to preserve names.  Notably missing from the
list is opt_algebraic which is the single largest source of SSA def
churn and it happily throws names away.

These observations lead me to question whether or not names are actually
useful at all or if they're just taking up space (8B per instruction)
and wasting CPU cycles (to ralloc_strdup on the off chance we do have
one).  I don't think I can think of a single time in recent history
where I've been debugging a shader issue and a SSA value name has been
there and been useful.  If anything, the few times they are there, they
just throw me off because they mess up the indentation in nir_print.

iris shader-db on my system gets runtime -2.07734% +/- 1.26933% (n=5)

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5439>
2021-07-08 17:34:41 +00:00
Connor Abbott cc514bfa0e nir: Add read_invocation_cond_ir3 intrinsic
On qualcomm, we have shared registers similar to SGPR's on AMD. However,
there is no readlane or readfirstlane primitive. shared registers can
only be written to when just one lane is active. This means that we have
to lower readInvocation(val, id) to something like:

if (gl_SubgroupInvocation == id) {
    scalar_reg = val;
}

return scalar_reg;

However it's a bit difficult to actually get the value of
gl_SubgroupInvocation in the backend, because for compute it requires
some calculations and we don't have any CSE support in the backend. This
intrinsic lets us turn it into
"readInvocationCond(val, id == gl_SubgroupInvocation)" in NIR at which
point the backend code generation is a lot easier.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Connor Abbott e4e79de2a4 nir/subgroups: Support > 1 ballot components
Qualcomm has a mode with a subgroup size of 128, so just emitting larger
integer operations and then lowering them later isn't an option. This
makes the pass able to handle the lowering itself, so that we don't have
to go down to 64-thread wavefronts when ballots are used.

(The GLSL and legacy SPIR-V extensions only support a maximum of 64
threads, but I guess we'll cross that bridge when we come to it...)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Connor Abbott 90819b9b0e nir/subgroups: Replace lower_vote_eq_to_ballot with lower_vote_eq
Lower it to a vote instead of a ballot. This was only used for AMD, and
in that case they're pretty much the same. However Qualcomm has a vote
builtin, which we want to use instead of ballots.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6752>
2021-07-08 16:02:41 +00:00
Emma Anholt 4118264643 nir: Free the instructions in a DCE instr removal.
No significant change in shader-db time (n=11), but should be a little win
for memory usage by the compiler.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
2021-07-06 11:24:48 -07:00
Emma Anholt 5618445d45 nir: Use remove_and_dce for nir_shader_lower_instructions().
Reduces the work that other shader passes have to do to look at dead code,
and possibly extra rounds around the optimization loop if dce wasn't the
last pass in it.

shader-db runtime -1.12919% +/- 0.264337% (n=49) on SKL.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
2021-07-06 11:24:45 -07:00
Emma Anholt 5251548572 nir: Add a nir_instr_remove that recursively removes dead code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11628>
2021-07-06 11:24:43 -07:00
Rob Clark c7b935962b nir: Add pass to lower phi precision
In addition to register pressure benefits from getting more fp16/int16,
this avoids i2imp's from standing in the way of loop unrolling.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11545>
2021-06-29 23:27:28 +00:00