Commit Graph

338 Commits

Author SHA1 Message Date
Matt Turner 62d55f1281 nir: Wire up int64 lowering functions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-09 16:42:40 -08:00
Matt Turner dde73e646f nir: Tag entrypoint for easy recognition by nir_shader_get_entrypoint()
We're going to have multiple functions, so nir_shader_get_entrypoint()
needs to do something a little smarter.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-09 16:42:40 -08:00
Eric Anholt 6051c11d17 nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results.
V3D returns the texels in a different order in the resulting vec4 from
what GLSL wants, so we need to put in a swizzle.  Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-08 13:03:41 -08:00
Karol Herbst d0c6ef2793 nir: rename global/local to private/function memory
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.

v2: rename local to function as well
v3: rename vtn_variable_mode_local as well

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-08 18:51:46 +01:00
Jason Ekstrand e90b738f20 nir/vulkan: Add a descriptor type to vulkan resource intrinsics
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Jason Ekstrand f393b10b3f nir/lower_io: Add "explicit" IO lowering
This new pass is for lowering explicitly laid out memory coming in from
SPIR-V or a similar source.  It's quite a bit more complicated than the
normal lower_io because we have to be able to handle matrices.  The
way the stride information is stored for matrices is awkward and dealing
with row-major matrices is especially painful.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Jason Ekstrand 013ee5732b nir/intrinsics: Add access flags to load/store_deref
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Jason Ekstrand 7755171e4c nir/intrinsics: Allow deref sources to consume anything
This commit adds a new num_components value for intrinsic sources of -1
which means that it consumes everything and the number of components
effectively isn't validated.  This is useful for deref sources which
just take the result of the deref and we leave it up to the driver to
decide what that size should be.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Jason Ekstrand e94a027af8 nir: Add a ptr_as_array deref type
These correspond directly to SPIR-V's OpPtrAccessChain.  As such, they
treat whatever their parent gives them as if it's the first element in
some array and dereferences that array.  If the parent is, itself, an
array deref, then the two indices can just be added together to get the
final array deref.  However, it can also be used in cases where what you
have is a dereference to some random vec2 value somewhere.  In this
case, we require a cast before the ptr_as_array and use the ptr_stride
field in the cast to provide a stride for the ptr_as_array derefs.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Jason Ekstrand fc9c4f89b8 nir: Move propagation of cast derefs to a new nir_opt_deref pass
We're going to want to do more deref optimizations going forward and
this gives us a central place to do them.  Also, cast propagation will
get a bit more complicated with the addition of ptr_as_array derefs.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:30 +00:00
Jason Ekstrand a700a82bda nir: Distinguish between normal uniforms and UBOs
Previously, NIR had a single nir_var_uniform mode used for atomic
counters, UBOs, samplers, images, and normal uniforms.  This commit
splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform
is still a bit of a catch-all but the nir_var_ubo is specific to UBOs.
While we're at it, we also rename shader_storage to ssbo to follow the
convention.

We need this so that we can distinguish between normal uniforms and UBO
access at the deref level without going all the way back variable and
seeing if it has an interface type.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-01-08 00:38:29 +00:00
Eric Anholt f217a94542 nir: Add nir_lower_tex options to lower sampler return formats.
I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and
vc4, but nir could potentially do some useful stuff for us (like avoiding
unpack/repacks) if we give it the information.

v2: Skip lowering for txs/query_levels
v3: Fix a crash on old-style shadow
v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack
    the enum.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-04 15:59:57 -08:00
Timothy Arceri 0016166d19 nir: make nir_opt_remove_phis_impl() static
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-01-03 11:47:56 +11:00
Caio Marcelo de Oliveira Filho 7d6babf995 nir: add a way to print the deref chain
Makes debugging easier when we care about the deref chain and not the
deref instruction itself.  To make it take a const pointer, constify
some of the static functions in nir_print.c.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 10:09:04 -08:00
Timothy Arceri 50de3f80a8 nir: rename nir_link_constant_varyings() nir_link_opt_varyings()
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Sagar Ghuge 933c44bcc4 nir: Add a new lowering option to lower 3D surfaces from txd to txl.
Tested on gen9.

v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-18 13:44:09 -08:00
Ian Romanick 378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 7adafd6e1c nir: Fix holes in nir_instr
Found using pahole.

Changes in peak memory usage according to Valgrind massif:

mean soft fp64 using uint64:   1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11:    63,619,971 =>    63,555,571
deus ex mankind divided 148:      62,887,728 =>    62,845,304
deus ex mankind divided 2890:     72,399,750 =>    71,922,686
dirt showdown 676:                69,464,023 =>    69,238,607
dolphin ubershaders 210:          78,359,728 =>    77,822,072

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 14:39:56 -08:00
Jason Ekstrand 44227453ec nir: Switch to using 1-bit Booleans for almost everything
This is a squash of a few distinct changes:

    glsl,spirv: Generate 1-bit Booleans

    Revert "Use 32-bit opcodes in the NIR producers and optimizations"

    Revert "nir/builder: Generate 32-bit bool opcodes transparently"

    nir/builder: Generate 1-bit Booleans in nir_build_imm_bool

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand 11dc130779 nir: Add a bool to int32 lowering pass
We also enable it in all of the NIR drivers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand 3191a82372 nir: Add support for 1-bit data types
This commit adds support for 1-bit Booleans and integers.  Booleans
obviously take a value of true or false.  Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1.  1-bit
arithmetic is then well-defined in the usual way, just with fewer bits.
The definition of int1_t and uint1_t doesn't usually matter but we do
need something for purposes of constant folding.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand 80e8dfe9de nir: Rename Boolean-related opcodes to include 32 in the name
This is a squash of a bunch of individual changes:

    nir/builder: Generate 32-bit bool opcodes transparently

    nir/algebraic: Remap Boolean opcodes to the 32-bit variant

    Use 32-bit opcodes in the NIR producers and optimizations

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

     Use 32-bit opcodes in the NIR back-ends

        Generated with a little hand-editing and the following sed commands:

        sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
        sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
        sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
        sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
        sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
        sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
        sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand 517099809a nir: Drop support for lower_b2f
This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Rhys Perry ed4020fabe nir: fix constness in nir_intrinsic_align()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-12-16 14:56:10 +00:00
Jason Ekstrand 74492ebad9 nir: Add a pass for lowering integer division by constants
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts.  This commit adds such an optimization to NIR for easiest case
of udiv.  Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.

Reviewed-by: Ian Romanick ian.d.romanick@intel.com
2018-12-13 17:49:48 +00:00
Jason Ekstrand 39198a1238 nir/lower_int64: Add support for [iu]mul_high
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
2018-12-13 17:49:48 +00:00
Jason Ekstrand 4ef8f46fd1 nir/lower_tex: Add lowering for some min_lod cases
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-12-11 21:26:23 -06:00
Jason Ekstrand caeffe7549 spirv: Add support for MinLod
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-12-11 21:26:23 -06:00
Timothy Arceri 03d7c65ad8 nir: clarify some nit_loop_info member names
Following commits will introduce additional fields such as
guessed_trip_count. Renaming these will help avoid confusion
as our unrolling feature set grows.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-10 13:59:50 +11:00
Jason Ekstrand dca6cd9ce6 nir: Make boolean conversions sized just like the others
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64.  This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2018-12-05 15:03:07 -06:00
Jonathan Marek 3e7186d472 nir: add fceil lowering
lowers ceil(x) as -floor(-x)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-11-27 15:44:02 -05:00
Kenneth Graunke 5b682143da nir: Make nir_lower_clip_vs optionally work with variables.
The way nir_lower_clip_vs() works with store_output intrinsics makes a
ton of assumptions about the driver_location field.

In i965 and iris, I'd rather do this lowering early and work with
variables.  v3d may want to switch to that as well, and ir3 could too,
but I'm not sure exactly what would need updating.  For now, handle
both methods.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-11-19 14:33:16 -08:00
Jason Ekstrand 060817b2fa intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values
It's not at all intel-specific; the formula is dictated by OpenGL and
Vulkan.  The only intel-specific thing is that we need the lowering.  As
a nice side-effect, the new version is variable-group-size ready.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2018-11-19 09:57:41 -06:00
Jason Ekstrand d34fd81e76 nir: Add alignment parameters to SSBO, UBO, and shared access
This also changes spirv_to_nir and glsl_to_nir to set them.  The one
place that doesn't set them is shared memory access lowering in
nir_lower_io.  That will have to be updated before any consumers of it
can effectively use these new alignments.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
2018-11-15 19:59:42 -06:00
Gert Wollny 4bba280937 nir: Allow to skip integer ops in nir_lower_to_source_mods
Some hardware supports source mods only for float operations. Make it
possible to skip lowering to source mods in these cases.

v2: use option flags instead of a boolean (Jason Ekstrand)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-11-14 08:59:26 +01:00
Christian Gmeiner c6aaafa3a1 nir: add lowering for ffloor
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-11-12 21:57:25 +01:00
Lionel Landwerlin 8a15f06d19 nir/lower_tex: Add AYUV lowering support
Byte ordering is :

0: V
1: U
2: Y
3: A

v2: Split refactoring of alpha channel (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
2018-11-12 13:22:54 +00:00
Timothy Arceri d40dd05553 nir: add new linking opt nir_link_constant_varyings()
This pass moves constant outputs to the consuming shader stage
where possible.

Reviewed-by: Eric Anholt <eric@anholt.net>
2018-11-10 11:41:00 +11:00
Jason Ekstrand 61e15348c4 nir: Add a read_mask helper for ALU instructions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-11-08 10:09:22 -06:00
Jason Ekstrand 28bb6abd1d nir/validate: Print when the validation failed
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-10-26 11:45:29 -05:00
Samuel Pitoiset 7c694cbfa4 nir: add linking helper nir_link_xfb_varyings()
The linking opts shouldn't try removing or compacting XFB varyings
in the consumer. To avoid this we copy the always_active_io flag
from the producer.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-10-24 08:21:29 +11:00
Jason Ekstrand bca5c2c688 nir: Add some new helpers for working with const sources
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-22 14:24:15 -05:00
Caio Marcelo de Oliveira Filho cb126cf67a nir: Separate dead write removal into its own pass
Instead of doing this as part of the existing copy_prop_vars pass.

Separation makes easier to expand the scope of both passes to be more
than per-block.  For copy propagation, the information about valid
copies comes from previous instructions; while the dead write removal
depends on information from later instructions ("have any instruction
used this deref before overwrite it?").

Also change the tests to use this pass (instead of copy prop vars).
Note that the disabled tests continue to fail, since the standalone
pass is still per-block.

v2: Remove entries from dynarray instead of marking items as
    deleted.  Use foreach_reverse. (Caio)

    (all from Jason)
    Do not cache nir_deref_path.  Not worthy for this patch.
    Clear unused writes when hitting a call instruction.
    Clean up enumeration of modes for barriers.
    Move metadata calls to the inner function.

v3: For copies, use the vector length to calculate the mask.

    (all from Jason)
    Use nir_component_mask_t when applicable.
    Rename functions for clarity.
    Consider local vars used by a call to be conservative (SPIR-V has
    such cases).
    Comment and assert the assumption that stores and copies are
    always to a deref that ends with a vector or scalar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-10-15 17:29:46 -07:00
Eric Anholt 7d77fe1bcc nir: Expose nir_remove_unused_io_vars().
For gallium drivers where you want to do some linking at variant compile
time, you don't have the other producer/consumer shader on hand to modify.
By exposing the inner function, the driver can have the used varyings in
the compiled shader cache key and still do linking.

This is also useful for V3D, where the binning shader wants to only output
position and TF varyings.  We've been removing those after nir_lower_io,
but this will be less driver-specific code and let more of the shader get
DCEed early in NIR.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-10-15 17:16:44 -07:00
Ian Romanick 10f4a8871e nir: Add helper functions to get the instruction that generated a nir_src
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
2018-10-09 13:56:42 -07:00
Jason Ekstrand 7d1d1208c2 nir: Add a small pass to rematerialize derefs per-block
This pass re-materializes deref instructions on a per-block basis to
ensure that every use of a deref occurs in the same block as the
instruction which uses it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-09-19 01:59:40 -05:00
Timothy Arceri ef4ad7baf1 nir: evaluate if condition uses inside the if branches
Since we know what side of the branch we ended up on we can just
replace the use with a constant.

All the spill changes in shader-db are from Dolphin uber shaders,
despite some small regressions the change is clearly positive.

V2: insert new constant after any phis in the
    use->parent_instr->type == nir_instr_type_phi path.

v3:
 - use nir_after_block_before_jump() for inserting const
 - check dominance of phi uses correctly

v4:
 - create some helpers as suggested by Jason.

v5 (Jason Ekstrand):
 - Use LIST_ENTRY to get the phi src

shader-db results IVB:

total instructions in shared programs: 9999201 -> 9993483 (-0.06%)
instructions in affected programs: 163235 -> 157517 (-3.50%)
helped: 132
HURT: 2

total cycles in shared programs: 231670754 -> 219476091 (-5.26%)
cycles in affected programs: 143424120 -> 131229457 (-8.50%)
helped: 115
HURT: 24

total spills in shared programs: 4383 -> 4370 (-0.30%)
spills in affected programs: 1656 -> 1643 (-0.79%)
helped: 9
HURT: 18

total fills in shared programs: 4610 -> 4581 (-0.63%)
fills in affected programs: 374 -> 345 (-7.75%)
helped: 6
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-09-14 16:07:36 +10:00
Jason Ekstrand 44ec31cd75 nir: Drop the vs_inputs_dual_locations option
It was very inconsistently handled; the only things that made use of it
were glsl_to_nir, glspirv, and nir_gather_info.  In particular,
nir_lower_io completely ignored it so anyone using nir_lower_io on
64-bit vertex attributes was going to be in for a shock.  Also, as of
the previous commit, it's set by every driver that supports 64-bit
vertex attributes.  There's no longer any reason to have it be an option
so let's just delete it.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-09-06 16:07:50 -05:00
Jason Ekstrand 25efd787cf compiler: Move double_inputs to gl_program::DualSlotInputs
Previously, we had two field in shader_info: double_inputs_read and
double_inputs.  Presumably, the one was for all double inputs that are
read and the other is all that exist.  However, because nir_gather_info
regenerates these two values, there is a possibility, if a variable gets
deleted, that the value of double_inputs could change over time.  This
is a problem because double_inputs is used to remap the input locations
to a two-slot-per-dvec3/4 scheme for i965.  If that mapping were to
change between glsl_to_nir and back-end state setup, we would fall over
when trying to map the NIR outputs back onto the GL location space.

This commit changes the way slot re-mapping works.  Instead of the
double_inputs field in shader_info, it adds a DualSlotInputs bitfield to
gl_program.  By having it in gl_program, we more easily guarantee that
NIR passes won't touch it after it's been set.  It also makes more sense
to put it in a GL data structure since it's really a mapping from GL
slots to back-end and/or NIR slots and not really a NIR shader thing.

Tested-by: Alejandro Piñeiro <apinheiro@igalia.com> (ARB_gl_spirv tests)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-09-06 16:07:50 -05:00