Commit Graph

79 Commits

Author SHA1 Message Date
Jason Ekstrand 36a9046848 freedreno: Only call nir_lower_io on shader_in/out
Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO.  No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>
2020-07-06 19:54:30 +00:00
Rob Clark d3ae559378 freedreno/ir3: add ir3_finalize_nir()
The next step is to hook this into pscreen->finalize_nir() so it can
come before the state tracker's shader-caching.

Unfortunately we still need to do lower_io after mesa/st, so that is
split out into a post-finalize pass.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
2020-06-26 08:43:22 -07:00
Connor Abbott 9edff0cfd4 ir3: Support variants with different constlen's
This provides the mechanism for compiling variants with a reduced
constlen. The next patch provides the policy for choosing which to
reduce.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5607>
2020-06-26 09:34:33 +00:00
Jonathan Marek 16a9e233da freedreno/ir3: add support for load_draw_id
This is part of adding VK_KHR_shader_draw_parameters for turnip.

IR3_DP_VTXID_BASE/IR3_DP_VTXCNT_MAX offsets are changed to match what
CP_DRAW_INDIRECT_MULTI requires.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5635>
2020-06-25 15:57:45 +00:00
Rob Clark 3065c4bf92 freedreno/ir3: switch PIPE_CAP_TGSI_TEXCOORD
We don't really need the varying remapping, and it seems to somehow
happen twice when shader-cache comes into the picture.  But we can
just choose not to have this problem.

Now that everything is using the ir3_point_sprite() helper, we can
flip this pipe cap without it being a massive flag-day.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5595>
2020-06-24 22:29:28 +00:00
Rob Clark 82815bc980 freedreno/ir3: split ubo analysis/lowering passes
Since binning pass variants share the same const_state with their
draw-pass counterpart, we should re-use the draw-pass variant's ubo
range analysis.  So split the two functions of the existing pass
into two parts.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>
2020-06-21 00:52:02 +00:00
Rob Clark 8f11cc4cad freedreno/ir3: move output_loc to variant
This moves the last bit of important state to be serialized from
ir3_shader to ir3_shader_variant.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 640ff0e847 freedreno/ir3: move const_state back to variant
For shader-cache, we want to not have anything important in `ir3_shader`.
And to have shader variants with lower const size limits (to properly
handle cross-stage limits), we also want variants to be able to have
their own const_state.

But we still need binning pass shaders to align with their draw pass
counterpart so that the same const emit can be used for both passes.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 00926954c3 freedreno/ir3: un-embed const_state
Make it an rzalloc'd ptr instead of embedded struct, so it can serve as
the mem ctx for immediates.  This gets rid of needing to explicitly free
the immediates, so one less thing to deal with when moving const_state.
(Also, after we move const_state to the shader variant, we won't need
one for binning pass variants)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Connor Abbott 65660622a1 ir3: Split out variant-specific lowering and optimizations
It seems a lot of the lowerings being run the second time were
unnecessary. In addition, when const_state is moved to the variant,
then it will become impossible to know ahead of time whether a variant
needs additional optimizing, which means that ir3_key_lowers_nir() needs
to go away. The new approach should have the same effect, since it skips
running lowerings that are unnecessary and then skips the opt loop if no
optimizations made progress, but it will work better when we move
ir3_nir_analyze_ubo_ranges() to be after variant creation.

The one maybe controversial thing I did is to make
nir_opt_algebraic_late() always happen during variant lowering. I wanted
to avoid code duplication, and it seems to me that we should push the
_late variants as far back as possible so that later opt_algebraic runs
don't miss out on optimization opportunities.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 91ed8b7fe3 freedreno/ir3: drop shader->num_ubos
The only difference between this and `const_state->num_ubos` was that
the latter is counting # of ubos loaded via `ldg` (based on UBO addrs
in push-consts).  But turns out there isn't really any reason to care.
Instead just add an early return in the one code-path that cares about
the number of `ldg` UBOs.

This gets rid of one more thing we need to move from `ir3_shader` to
`ir3_shader_variant`.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 70fbd48b3a freedreno/ir3: move ubo_state into const_state
As with const_state, this will also need to move into the variant.  To
simplify that, just move it into the const_state itself, since after all
it is related.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Eric Anholt 486b894307 freedreno/ir3: Account for driver params in UBO max const upload.
The const state setup needs to be able to push its driver params, so
account for them in the analyze_ubo_ranges.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
2020-06-05 13:36:29 -07:00
Eric Anholt 9e58ab09ff freedreno/ir3: Drop unnecessary alignment of pushed UBO size.
The analysis pass gives us vec4-aligned size, and all of our other
constbuf allocations here are in vec4 units, so we can just divide by 16.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
2020-06-05 13:36:29 -07:00
Timothy Arceri 04dbf709ed nir: add callback to nir_remove_dead_variables()
This allows us to do API specific checks before removing variable
without filling nir_remove_dead_variables() with API specific code.

In the following patches we will use this to support the removal
of dead uniforms in GLSL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
2020-06-03 02:22:23 +00:00
Rob Clark cf21b76383 freedreno/ir3: use lower_wrmasks pass
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2020-05-13 20:24:53 -07:00
Eric Anholt 112c65825f freedreno/a6xx: Use LDC for UBO loads.
It saves addressing math, but may cause multiple loads to be done and
bcseled due to NIR not giving us good address alignment information
currently.  I don't have any workloads I know of using non-const-uploaded
UBOs, so I don't have perf numbers for it

This makes us match the GLES blob's behavior, and turnip (other than being
bindful).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
2020-05-14 00:10:43 +00:00
Eric Anholt ab93a631b4 freedreno: Trim num_ubos to just the ones we haven't lowered to constbuf.
With the upcoming LDC usage in the GL driver, we don't want to be
uploading descriptors for every UBO when they aren't actually in use.
Trimming NIR's num_ubos will avoid that, and cleans up num_ubo handling
elsewhere right now.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
2020-05-14 00:10:43 +00:00
Eric Anholt d5176c453e freedreno/ir3: Move i/o offset lowering after analyze_ubo_ranges.
I found that when moving more UBOs to load_ubo_ir3, analyze_ubo_ranges
would move things back in a broken way.  We can just run this pass later
and drop the _ir3 path.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
2020-05-14 00:10:43 +00:00
Kristian H. Kristensen dd8d257a30 freedreno/ir3: Lower GS builtins before lowering IO
We mostly got away with replacing a store_output with a store_var, but
for complex types like structs, that doesn't work. Once the IO has
been lowered from vars to intrinsic, we've lost the deref chains and
can't properly shadow the outputs.

This commits moves the GS lowering up so we do it before the output
variables get lowered to store_output.  This way the pass works much
like nir_lower_io_to_temporaries() and cleanly shadows the outputs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
2020-05-01 16:26:31 +00:00
Kristian H. Kristensen 79355fd901 freedreno/ir3: Add ir3_nir_lower_to_explicit_input() pass
This pass lowers per-vertex input intrinsics to load_shared_ir3. This
was open coded in the TCS and GS lowering passes before - this way we
can share it. Furthermore, we'll need to run the rest of the GS
lowering earlier (before lowering IO) so we need to split off this
part that operates on the IO intrinsics first.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
2020-05-01 16:26:31 +00:00
Kristian H. Kristensen b7bfccf085 freedreno/ir3: Rename ir3_nir_lower_to_explicit_io
We rename it to ir3_nir_lower_to_explicit_output, since it only
handles output and we'll add a lowering pass for input next.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
2020-05-01 16:26:31 +00:00
Jonathan Marek 065068c66a freedreno/ir3: run nir_lower_pack
This lowers pack_32_2x16/unpack_32_2x16 into the scalar versions of those
instructions.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>
2020-04-27 18:40:03 +00:00
Jonathan Marek 42093bb694 nir: add pack_32_2x16_split/unpack_32_2x16_split lowering
The new option replaces the two other _split lowering options, since
there's no need for separate options.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>
2020-04-27 18:40:03 +00:00
Rob Clark a0de0db0e4 freedreno/ir3: small cleanup and comments
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
2020-03-27 22:41:36 +00:00
Hyunjun Ko 1ee2ad584c freedreno/ir3: enable nir_opt_loop_unroll on a6xx
If precision lowering happens at GLSL IR, loop_analysis at IR doesn't
work as expected since it can't handle things like:

"(expression bool < (expression float16_t f2fmp (var_ref ndx) ) (constant float16_t (1.000000)) )"

So we'd rather do this optimization at the NIR stage.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Eric Anholt e4baff9081 freedreno: Switch to using lowered image intrinsics.
This cuts out a bunch of deref chain walking that the compiler can do for
us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
2020-02-24 18:25:02 +00:00
Jonathan Marek a3a70588c0 freedreno/ir3: support load_base_instance
Not supported by hardware, uses same mechanism as base vertex.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3162>
2019-12-19 15:13:40 -05:00
Jonathan Marek c7c5a84cf3 freedreno/ir3: lower pack/unpack ops
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
2019-12-16 19:20:07 -05:00
Jonathan Marek b936143327 freedreno/ir3: lower mul_2x32_64
lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by
nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-16 13:37:09 -05:00
Eric Anholt f58ef5d481 turnip: Lower usub_borrow.
Fixes dEQP-VK.glsl.builtin.function.integer.usubborrow.uvec2_mediump_fragment.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986>
2019-12-16 04:52:09 +00:00
Neil Roberts 37f5395783 freedreno/ir3: Enabling lowering 16-bit flrp
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-20 14:09:43 +01:00
Kristian H. Kristensen 7272e8a709 freedreno/ir3: Allocate const space for tessellation parameters
The tessellation stages need size and stride or the patch layout as
well as locations of attributes in the patch.  The tesselation stages
also use two system memory BOs and need the iovas of those.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:40:27 -08:00
Kristian H. Kristensen 56ed835bff freedreno/ir3: Extend geometry lowering pass to handle tessellation
VS and TCS pass varyings the same way as VS and GS does. TCS then
writes entire patch to a system memory BO and TES eventually reads
back from the BO once the TE starts generating vertices.  TES outputs
vertices the same way as VS and GS, except when there's a GS as well,
in which case TES passes varyings to GS same way the VS would.

In addition, the TCS needs a little bit of control flow massaging so
that it only runs for valid invocations needs a couple of unknown
instructions to synchronize with the TE.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:36:59 -08:00
Kristian H. Kristensen 8621fbc37b freedreno/ir3: Add tessellation field to shader key
Whether we're tessellating and which primitives the TES outputs
affects the entire pipeline so let's add a field to the key to track
that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:36:56 -08:00
Rhys Perry 8b98d0954e nir/lower_idiv: add new llvm-based path
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 18:49:46 +00:00
Rob Clark 5e08f070f0 nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Kristian H. Kristensen 0293d14719 freedreno/ir3: Implement primitive layout intrinsics
This implements the load_vs_primitive_stride_ir3,
load_vs_vertex_stride_ir3 and load_primitive_location_ir3 intrinsics,
used for getting the primitive layout strides and locations.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Kristian H. Kristensen 8e16fb1528 freedreno/ir3: Implement lowering passes for VS and GS
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Erik Faye-Lund 71c0dcf266 nir: support feeding state to nir_lower_clip_[vg]s
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Erik Faye-Lund eb3047c094 nir: support lowering clipdist to arrays
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Marek Olšák cebc38ff60 nir: add nir_shader_compiler_options::lower_to_scalar
This will replace PIPE_SHADER_CAP_SCALAR_ISA.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-10 15:49:18 -04:00
Daniel Schürmann 10e508c815 freedreno: Enable the nir_opt_algebraic_late() pass.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Kristian H. Kristensen cc4fe81145 freedreno/a6xx: Turn on vectorize_io
We want this for tessellation eventually, but we can turn it on now.

Shader-db results:

total instructions in shared programs: 8612905 -> 8611387 (-0.02%)
instructions in affected programs: 164952 -> 163434 (-0.92%)

total dwords in shared programs: 11952000 -> 11950560 (-0.01%)
dwords in affected programs: 68096 -> 66656 (-2.11%)

total full in shared programs: 315019 -> 315009 (<.01%)
full in affected programs: 1642 -> 1632 (-0.61%)

total constlen in shared programs: 2463654 -> 2463654 (0.00%)
constlen in affected programs: 0 -> 0

total (ss) in shared programs: 152379 -> 152409 (0.02%)
(ss) in affected programs: 1503 -> 1533 (2.00%)

total (sy) in shared programs: 96473 -> 96525 (0.05%)
(sy) in affected programs: 654 -> 706 (7.95%)

total max_sun in shared programs: 1172454 -> 1172472 (<.01%)
max_sun in affected programs: 104 -> 122 (17.31%)

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-18 16:59:10 -07:00
Vasily Khoruzhick 9367d2ca37 nir: allow specifying filter callback in lower_alu_to_scalar
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-06 01:51:28 +00:00
Jason Ekstrand 951cf94521 nir: Add explicit signs to image min/max intrinsics
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode.  Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle.  In SPIR-V, signed min/max are separate
opcodes from unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Rob Clark 4a188e4215 freedreno/ir3: track # of driver params
To avoid emitting unneeded const state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rhys Perry da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Sagar Ghuge 456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Eric Anholt 852704976a freedreno: Stop treating UBO 0 specially in UBO uploading.
ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we
need to upload (all of it, since it will lower indirect UBO 0 accesses
from load_ubo back to indirection on the constant buffer).

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-24 14:23:07 -07:00