Commit Graph

102 Commits

Author SHA1 Message Date
Dylan Baker a8e2d79e02 meson: use gnu_symbol_visibility argument
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used, if C++ is
suddenly added meson just does the right thing.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
2020-06-01 18:59:18 +00:00
Rob Clark 42d38ad028 nir: add pass to lower disjoint wrmask's
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2020-05-13 20:24:49 -07:00
Alyssa Rosenzweig 42c9bbaeed nir: Move nir_lower_mediump_outputs from ir3
(Original code from ir3)

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4716>
2020-04-27 16:32:24 +00:00
Jonathan Marek f8558fb1ce nir: add common convert_ycbcr for vulkan csc
Copied from anv, replaced state with passing model/range directly.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
2020-04-20 22:01:43 +00:00
Eric Engestrom 79af30768d meson: inline `inc_common`
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>
2020-03-28 21:36:54 +01:00
Iago Toral Quiroga 467c9a0faa nir: add a bool bitsize lowering pass
The pass lowers 1-bit booleans produced by NIR to the native bitsize
of the operations that produce them.

v2: change on lower_load_const_instr after upstream changes. Added
    TODO2 to explain it, as it was not properly tested yet (see
    already existing TODO) (Neil)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Caio Marcelo de Oliveira Filho bf432cd831 nir: Add pass to combine adjacent scoped memory barriers
SPIR-V generates very granular barriers, however HW and backends might
not necessarily take advantage of those.  This pass provides a general
mechanism to combine such barriers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224>
2020-03-12 19:21:36 +00:00
Daniel Schürmann ce87da71e9 nir: add pass to lower discard() to demote()
This pass is intended to work around game bugs, only!
It also lowers nir_intrinsic_load_helper_invocation to
nir_intrinsic_is_helper_invocation for consistency.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
2020-03-09 12:29:32 +00:00
Alyssa Rosenzweig 7ab4e4dd96 nir: Add SSBO->global lowering pass
To facilitate lowering SSBOs to globals, we need a load_ssbo_address
intrinsic. This intrinsic takes an SSBO index and loads the address in
global memory of the SSBO (likely implemented via a uniform in the
driver). In the future, we'll support bounds checking, but at the moment
this is not supported (this pass should only be used for trusted
contexts at the moment, i.e. contexts without robustness extensions).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
2020-02-21 13:06:22 +00:00
Arcady Goldmints-Orlov e9f83185a2 Rename nir_lower_constant_initializers to nir_lower_variable_initalizers
This is naming is more clear as nir_variables can be initializes not
just with a nir_constant but with a pointer to another nir_variable.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>
2020-02-12 15:41:49 +00:00
Michel Dänzer 65610ec774 gitlab-ci: Add ppc64el and s390x cross-build jobs
Using LLVM 8 for ppc64el and 7 for s390x (which hits some coroutine
related issues with LLVM 8).

There are some test failures we need to ignore for now. Also, the
timeout needs to be bumped from the default 30s for some tests, because
they can take longer under emulation.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3643>
2020-02-05 10:52:31 +00:00
Erik Faye-Lund d9ff5f0414 nir/zink: move clip_halfz-lowering to common code
Etnaviv also does the same thing, so let's try to avoid repetition here,
and use the same for it code as well.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Paul Cercueil <paul@crapouillou.net>
2020-01-03 22:48:19 +00:00
Karol Herbst 11f736a6f9 nir/tests: add serializer tests
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-12-11 13:00:44 +01:00
Eric Anholt 8afab607ac nir: Add a scheduler pass to reduce maximum register pressure.
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable.  A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.

Results for v3d:

total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)

v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
    patch, include SSA defs in live values so we can switch to bottom-up
    if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
    successfully register allocate on GLES3.1 dEQP.  Make sure that
    discards don't move after store_output.  Comment spelling fix.
2019-11-25 21:12:21 +00:00
Rhys Perry 0a759c3be6 nir: add load/store vectorizer tests
v7: run nir_opt_algebraic
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v10: add tests for 64-bit offsets
v10: add tests for signed offsets

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
2019-11-25 13:59:11 +00:00
Rhys Perry ce9205c03b nir: add a load/store vectorization pass
This pass combines intersecting, adjacent and identical loads/stores into
potentially larger ones and will be used by ACO to greatly reduce the
number of memory operations.

v2: handle nir_deref_type_ptr_as_array
v3: assume explicitly laid out types for derefs
v4: create less deref casts
v4: fix shared boolean vectorization
v4: fix copy+paste error in resources_different
v4: fix extract_subvector() to pass
    nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
v4: rebase
v5: subtract from deref/offset instead of scheduling offset calculations
v5: various non-functional changes/cleanups
v5: require less metadata and preserve more
v5: rebase
v6: cleanup and improve dependency handling
v6: emit less deref casts
v6: pass undef to components not set in the write_mask for new stores
v7: fix 8-bit extract_vector() with 64-bit input
v7: cleanup creation of store write data
v7: update align correctly for when the bit size of load/store increases
v7: rename extract_vector to extract_component and update comment
v8: prevent combining of row-major matrix column acceses
v9: rework process_block() to be able to vectorize more
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v9: remove entry::store_value, since it will not be updated if it's was
    from a vectorized load
v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
v9: handle nir_intrinsic_scoped_memory_barrier
v10: use nir_ssa_scalar
v10: handle non-32-bit offsets
v10: use signed offsets for comparison
v10: improve create_entry_key_from_offset()
v10: support load_shared/store_shared
v10: remove strip_deref_casts()
v10: don't ever pass NULL to memcmp
v10: remove recursion in gcd()
v10: fix outdated comment
v11: use the new nir_extract_bits()
v12: remove use of nir_src_as_const_value in resources_different
v13: make entry key hash function deterministic
v13: simplify mask_sign_extend()
v14: add comment in hash_entry_key() about hashing pointers

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
2019-11-25 13:59:11 +00:00
Marek Olšák ff71fae440 nir: strip as we serialize to remove the nir_shader_clone call
Serializing stripped NIR is faster now.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-21 18:49:57 -05:00
Jason Ekstrand b8d45d9307 nir: Add tests for nir_extract_bits 2019-11-11 17:17:02 +00:00
Rob Clark 5e08f070f0 nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Erik Faye-Lund 878c94288a nir: add lowering-pass for point-size mov
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Dave Airlie dc91a02a72 nir: add a pass to lower flat shading.
This takes any color or backcolor that has unspecified
shading and converts it to flat shading.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Marek Olšák 3340c066a1 nir: move gl_nir_opt_access from glsl directory
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-10 15:49:18 -04:00
Eric Engestrom 7a1dc6ab44 meson: rename libnir to _libnir to make it clear it's not meant to be used anywhere else
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-10-07 21:49:40 +01:00
Timur Kristóf 610cc3089c nir: Carve out nir_lower_samplers from GLSL code.
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
to keep it only in the GLSL code.

This commit introduces nir_lower_samplers to compiler/nir,
while maintains the GL-specific function too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-06 12:20:20 +03:00
Daniel Schürmann df86c5ffb3 nir: add divergence analysis pass.
This pass expects the shader to be in LCSSA form.
The algorithm is based on 'The Simple Divergence Analysis' from
Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-20 17:40:13 +02:00
Eric Engestrom a3d6024199 meson: add nir tests to the compiler/nir test suite
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-14 22:17:06 +01:00
Iago Toral Quiroga 48f5c34301 nir: add a pass to clamp gl_PointSize to a range
The OpenGL and OpenGL ES specs require that implementations clamp the
value of gl_PointSize to an implementation-depedent range. This pass
is useful for any GPU hardware that doesn't do this automatically
for either one or both sides of the range, such as V3D.

v2:
 - Turn into a generic NIR pass (Eric).
 - Make the pass work before lower I/O so we can use the deref variable
   to inspect if we are writing to gl_PointSize (Eric).
 - Make the pass take the range to clamp as parameter and allow it
   to clamp to both sides of the range or just one side.
 - Make the pass report progress.

v3:
 - Fix copyright header (Eric)
 - use fmin/fmax instead of bcsel to clamp (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 09:44:12 +02:00
Rhys Perry 7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Rhys Perry da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Ian Romanick 405de7ccb6 nir/range-analysis: Rudimentary value range analysis pass
Most integer operations are omitted because dealing with integer
overflow is hard.  There are a few things that could be smarter if there
was a small amount more tracking of ranges of integer types (i.e.,
operands are Boolean, operand values fit in 16 bits, etc.).

The changes to nir_search_helpers.h are included in this patch to
simplify reordering the changes to nir_opt_algebraic.py.

v2: Memoize range analysis results.  Without this, some shaders appear
to get stuck in infinite loops.

v3: Rebase on many months of Mesa changes, including 1-bit Boolean
changes.

v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction".

v5: Use nir_alu_srcs_equal for detecting (a*a).  Previously just the SSA
value was compared, and this incorrectly matched (a.x*a.y).

v6: Many code improvements including (but not limited to) better names,
more comments, and better use of helper functions.  All suggested by
Caio.  Rework the handling of several opcodes to use a table for mapping
source ranges to a result range.  This change fixed a bug that caused
fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero.
Slightly tighten the range of fmul by recognizing that x*x is gt_zero if
x is gt_zero.  Add similar handling for -x*x.

v7: Use _______ in the tables as an alias for unknown.  Suggested by
Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-05 20:14:13 -07:00
Eric Engestrom d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Jonathan Marek bc3b6168ba nir: replace lower_sincos with algebraic opt
This version has less ops for the same precision.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Ian Romanick b08d704051 nir: Add unit tests for nir_opt_comparison_pre
Each tests has a comment with the expected before and after NIR.  The
tests don't actually check this.  The tests only check whether or not
the optimization pass reported progress.  I couldn't think of a robust,
future-proof way to check the before and after code.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:10 -07:00
Daniel Schürmann c31f470066 anv,nir: Move lower_input_attachments pass from ANV to NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-08 14:02:50 +02:00
Rob Clark 5787a2dfe3 nir: add pass to lower load_interpolated_input
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Connor Abbott 47e7c6961a nir: add a vectorization pass
This effectively does the opposite of nir_lower_alus_to_scalar, trying
to combine per-component ALU operations with the same sources but
different swizzles into one larger ALU operation. It uses a similar
model as CSE, where we do a depth-first approach and keep around a hash
set of instructions to be combined, but there are a few major
differences:

1. For now, we only support entirely per-component ALU operations.
2. Since it's not always guaranteed that we'll be able to combine
equivalent instructions, we keep a stack of equivalent instructions
around, trying to combine new instructions with instructions on the
stack.

The pass isn't comprehensive by far; it can't handle operations where
some of the sources are per-component and others aren't, and it can't
handle phi nodes. But it should handle the more common cases, and it
should be reasonably efficient.

[Alyssa: Rebase on latest master, updating with respect to typeless
moves]

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-18 06:43:30 -07:00
Ian Romanick 3ee2e84c60 nir: Rematerialize compare instructions
On some architectures, Boolean values used to control conditional
branches or condtional selection must be propagated into a flag.  This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction.  This can save register pressure by
not needing to store the Boolean value.

There are several possible ares for future improvement to this pass:

1. Be more conservative.  If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare.  The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.

2. Be less conservative.  Currently the pass requires that all users of
the comparison match the pattern.  The idea is that after the pass is
complete, no instruction will use the resulting Boolean value.  The only
uses will be of the flag value.  It may be beneficial to relax this
requirement in some cases.

3. Be less conservative.  Also try to rematerialize comparisons used for
discard_if intrinsics.  After changing the way the Intel compiler
generates cod e for discard_if (see MR!935), I tried implementing this
already.  The changes were pretty small.  Instructions were helped in 19
shaders, but, overall, cycles were hurt.  A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

4. Copy the preceeding ALU instruction.  If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvment to also
copy the preceeding ALU instruction.  On Intel GPUs, this may enable
cmod propagation to make additional progress.

v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Vasily Khoruzhick e67e4e90b2 nir: implement lowering for fsin and fcos
Lower sin and cos using Nick's fast sin/cos approximation from
https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648

It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 15:25:21 +00:00
Ian Romanick 158370ed2a nir/flrp: Add new lowering pass for flrp instructions
This pass will soon grow to include some optimizations that are
difficult or impossible to implement correctly within nir_opt_algebraic.
It also include the ability to generate strictly correct code which the
current nir_opt_algebraic lowering lacks (though that could be changed).

v2: Document the parameters to nir_lower_flrp.  Rebase on top of
3766334923 ("compiler/nir: add lowering for 16-bit flrp")

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:28 -07:00
Vasily Khoruzhick 443c5a3cd6 nir: add int_to_float lowering pass
This new pass lowers ints and bools to floats. It allows hardware
that doesn't have native integers (e.g. Mali4x0) use the same
code paths as modern hardware.

It uses newly introduced pass to gather SSA types and should be
used as late as possible.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-07 01:07:27 +00:00
Jason Ekstrand 91899495a1 nir: Add a SSA type gathering pass
This new pass (which isn't even compile-tested) attempts to determine
the ALU type of all the SSA values in a function impl.  It takes a
greedy approach and assigns intness or floatness to everything it thinks
can possibly contain an int or a float.  Some values will be labled as
both int and float and some will be labled as neither and it is up to
the caller to decide what to do with this information.  However, for a
"nice" shader where the original source contained no bit-casts and no
implicit bit-casts were introduced by optimizations, there shouldn't be
any overlap in the two sets save for the odd CSEd zero constant.

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-05-04 03:52:05 +00:00
Rob Clark a99c360a46 nir: add pass to lower fb reads
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Andreas Baierl b82de2b4d7 nir: add rcp(w) lowering for gl_FragCoord
On some hardware (e.g. Mali400) the shader needs to apply some
transformations for correct gl_FragCoord handling. The lowering
actions look like the following in pseudocode:
   gl_FragCoord.xyz = gl_FragCoord_orig.xyz
   gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w

Add this lowering as a nir pass in preparation for using it in the driver.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-29 02:46:44 +00:00
Alyssa Rosenzweig 2ce4adefa5 nir: Add nir_lower_viewport_transform
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.

v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!

v3: Bikeshed continues.

v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.

Ian and Qiang's reviews are from v3, but no real functional changes from
v4. Rob's review is from v4.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-04-14 19:15:13 +00:00
Jason Ekstrand 18ed82b084 nir: Add a pass for selectively lowering variables to scratch space
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space.  It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.

v2: Drop const_index comments (by anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-12 15:59:31 -07:00
Jason Ekstrand 6279074de1 nir: Get rid of global registers
We have a pass to lower global registers to locals and many drivers
dutifully call it.  However, no one ever creates a global register ever
so it's all dead code.  It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Ian Romanick 2cf59861a8 nir: Add partial redundancy elimination for compares
This pass attempts to dectect code sequences like

    if (x < y) {
        z = y - x;
	...
    }

and replace them with sequences like

    t = x - y;
    if (t < 0) {
        z = -t;
	...
    }

On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction.  It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.

Currently only floating point compares and adds are supported.  Adding
support for integer will be a challenge due to integer overflow.  There
are a couple possible solutions, but they may not apply to all
architectures.

v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block().  Add
missing (-A + B) case.  Suggested by Caio.

v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32.  Suggested by Ken.

v4: Add some comments explaining how this works.  Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:53 -07:00
Ian Romanick be1cc3552b nir: Add nir_const_value_negative_equal
v2: Rebase on 1-bit Boolean changes.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:52 -07:00
Jason Ekstrand 3bd5457641 nir: Add a lowering pass for non-uniform resource access
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-25 15:00:36 -05:00
Samuel Pitoiset 23d30f4099 spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.

While we are at it, factorize a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:46 +01:00