Commit Graph

193 Commits

Author SHA1 Message Date
Jason Ekstrand 9750164c09 nir: Rename get_buffer_size to get_ssbo_size
This makes it explicit that this intrinsic is only for SSBOs.  For the
v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be
able to distinguish between the two.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>
2020-09-22 13:34:12 +00:00
Gert Wollny 7ab804dbb4 freedreno/ir3: set lower_uniforms_to_ubo compiler flag
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6316>
2020-09-16 10:07:42 +00:00
Jonathan Marek 52534c3a86 freedreno/ir3: add view_zero to shader key
Does the same thing as layer_zero, but for VARYING_SLOT_VIEWPORT.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5832>
2020-09-15 16:18:45 +00:00
Marek Olšák ac55b1a9a6 nir: get ffma support from NIR options for nir_lower_flrp
This also fixes the inverted last parameter of nir_lower_flrp in most drivers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6599>
2020-09-04 17:06:22 +00:00
Eric Anholt 2b25240993 freedreno/ir3: Replace our custom vec4 UBO intrinsic with the shared lowering.
This gets us fewer comparisons in the shaders that we need to optimize
back out, and reduces backend code.

total instructions in shared programs: 11547270 -> 7219930 (-37.48%)
total full in shared programs: 334268 -> 319602 (-4.39%)

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6378>
2020-08-24 09:53:36 -07:00
Rob Clark 4f060549be freedreno/ir3: lower local_index using local_id
Somehow this works ok with the full compiler stack, but not in
ir3_cmdline.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6189>
2020-08-05 21:00:44 +00:00
Boris Brezillon 9e23925991 freedreno: Initialize lower_int64_options to a proper value
We're trying to get rid of the options argument passed to
nir_lower_int64() and use the nir_options.lower_int64_options instead.
But before we can do that we must patch nir_lower_int64() callers
that don't have this field properly set.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5588>
2020-07-30 16:54:24 +00:00
Jason Ekstrand 2956d53400 nir: Add nir_foreach_shader_in/out_variable helpers
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
2020-07-29 17:38:57 +00:00
Connor Abbott b559d26c74 freedreno/ir3: Fix SSBO size for bindless SSBO's
We theoretically could push these sizes to the const file
opportunistically, which appears to be what the blob does, but the
maximum number of SSBO's is way too big to do that unconditionally. Just
use resinfo to get the size for now.

Fixes on turnip: dEQP-VK.ssbo.unsized_array_length.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6012>
2020-07-21 19:53:32 +00:00
Icecream95 314ba5e174 nir: Add a face_sysval argument to nir_lower_two_sided_color
This is needed for handling drivers that use an input for loading the
face, for example Panfrost with Midgard GPUs.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Tested-by: Urja Rannikko <urjaman@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5915>
2020-07-17 14:50:26 +00:00
Jonathan Marek ffb6eb6d5d freedreno/ir3: run nir_opt_loop_unroll in optimization loop
GL driver was relying on this being done by gallium, but there might be
new loops to unroll during optimizations and turnip needs it.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5818>
2020-07-09 23:30:33 +00:00
Jonathan Marek f472c98443 freedreno/ir3: add support for a650 tess shared storage
A650 uses LDL/STL, and the "local_primitive_id" in tess ctrl shader comes
from bits 16-21 in the header instead of 0-5.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5764>
2020-07-08 02:30:23 +00:00
Connor Abbott 4f91345f49 ir3: Add layer_zero variant bit
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5732>
2020-07-07 08:10:47 +00:00
Ilia Mirkin 836d41d772 ir3: use empirical size for params as used by the shader
For example only some UCPs may be used by the shader, triggering asserts
that too many consts are being uploaded.

While we're at it, also fix the const size when loading UCPs, since
otherwise it doesn't correspond to what the shader is actually using.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5752>
2020-07-06 23:57:51 +00:00
Jason Ekstrand 36a9046848 freedreno: Only call nir_lower_io on shader_in/out
Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO.  No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>
2020-07-06 19:54:30 +00:00
Rob Clark d3ae559378 freedreno/ir3: add ir3_finalize_nir()
The next step is to hook this into pscreen->finalize_nir() so it can
come before the state tracker's shader-caching.

Unfortunately we still need to do lower_io after mesa/st, so that is
split out into a post-finalize pass.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
2020-06-26 08:43:22 -07:00
Connor Abbott 9edff0cfd4 ir3: Support variants with different constlen's
This provides the mechanism for compiling variants with a reduced
constlen. The next patch provides the policy for choosing which to
reduce.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5607>
2020-06-26 09:34:33 +00:00
Jonathan Marek 16a9e233da freedreno/ir3: add support for load_draw_id
This is part of adding VK_KHR_shader_draw_parameters for turnip.

IR3_DP_VTXID_BASE/IR3_DP_VTXCNT_MAX offsets are changed to match what
CP_DRAW_INDIRECT_MULTI requires.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5635>
2020-06-25 15:57:45 +00:00
Rob Clark 3065c4bf92 freedreno/ir3: switch PIPE_CAP_TGSI_TEXCOORD
We don't really need the varying remapping, and it seems to somehow
happen twice when shader-cache comes into the picture.  But we can
just choose not to have this problem.

Now that everything is using the ir3_point_sprite() helper, we can
flip this pipe cap without it being a massive flag-day.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5595>
2020-06-24 22:29:28 +00:00
Rob Clark 82815bc980 freedreno/ir3: split ubo analysis/lowering passes
Since binning pass variants share the same const_state with their
draw-pass counterpart, we should re-use the draw-pass variant's ubo
range analysis.  So split the two functions of the existing pass
into two parts.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>
2020-06-21 00:52:02 +00:00
Rob Clark 8f11cc4cad freedreno/ir3: move output_loc to variant
This moves the last bit of important state to be serialized from
ir3_shader to ir3_shader_variant.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 640ff0e847 freedreno/ir3: move const_state back to variant
For shader-cache, we want to not have anything important in `ir3_shader`.
And to have shader variants with lower const size limits (to properly
handle cross-stage limits), we also want variants to be able to have
their own const_state.

But we still need binning pass shaders to align with their draw pass
counterpart so that the same const emit can be used for both passes.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 00926954c3 freedreno/ir3: un-embed const_state
Make it an rzalloc'd ptr instead of embedded struct, so it can serve as
the mem ctx for immediates.  This gets rid of needing to explicitly free
the immediates, so one less thing to deal with when moving const_state.
(Also, after we move const_state to the shader variant, we won't need
one for binning pass variants)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Connor Abbott 65660622a1 ir3: Split out variant-specific lowering and optimizations
It seems a lot of the lowerings being run the second time were
unnecessary. In addition, when const_state is moved to the variant,
then it will become impossible to know ahead of time whether a variant
needs additional optimizing, which means that ir3_key_lowers_nir() needs
to go away. The new approach should have the same effect, since it skips
running lowerings that are unnecessary and then skips the opt loop if no
optimizations made progress, but it will work better when we move
ir3_nir_analyze_ubo_ranges() to be after variant creation.

The one maybe controversial thing I did is to make
nir_opt_algebraic_late() always happen during variant lowering. I wanted
to avoid code duplication, and it seems to me that we should push the
_late variants as far back as possible so that later opt_algebraic runs
don't miss out on optimization opportunities.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 91ed8b7fe3 freedreno/ir3: drop shader->num_ubos
The only difference between this and `const_state->num_ubos` was that
the latter is counting # of ubos loaded via `ldg` (based on UBO addrs
in push-consts).  But turns out there isn't really any reason to care.
Instead just add an early return in the one code-path that cares about
the number of `ldg` UBOs.

This gets rid of one more thing we need to move from `ir3_shader` to
`ir3_shader_variant`.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Rob Clark 70fbd48b3a freedreno/ir3: move ubo_state into const_state
As with const_state, this will also need to move into the variant.  To
simplify that, just move it into the const_state itself, since after all
it is related.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
2020-06-19 13:16:57 +00:00
Eric Anholt 486b894307 freedreno/ir3: Account for driver params in UBO max const upload.
The const state setup needs to be able to push its driver params, so
account for them in the analyze_ubo_ranges.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
2020-06-05 13:36:29 -07:00
Eric Anholt 9e58ab09ff freedreno/ir3: Drop unnecessary alignment of pushed UBO size.
The analysis pass gives us vec4-aligned size, and all of our other
constbuf allocations here are in vec4 units, so we can just divide by 16.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
2020-06-05 13:36:29 -07:00
Timothy Arceri 04dbf709ed nir: add callback to nir_remove_dead_variables()
This allows us to do API specific checks before removing variable
without filling nir_remove_dead_variables() with API specific code.

In the following patches we will use this to support the removal
of dead uniforms in GLSL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
2020-06-03 02:22:23 +00:00
Rob Clark cf21b76383 freedreno/ir3: use lower_wrmasks pass
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2020-05-13 20:24:53 -07:00
Eric Anholt 112c65825f freedreno/a6xx: Use LDC for UBO loads.
It saves addressing math, but may cause multiple loads to be done and
bcseled due to NIR not giving us good address alignment information
currently.  I don't have any workloads I know of using non-const-uploaded
UBOs, so I don't have perf numbers for it

This makes us match the GLES blob's behavior, and turnip (other than being
bindful).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
2020-05-14 00:10:43 +00:00
Eric Anholt ab93a631b4 freedreno: Trim num_ubos to just the ones we haven't lowered to constbuf.
With the upcoming LDC usage in the GL driver, we don't want to be
uploading descriptors for every UBO when they aren't actually in use.
Trimming NIR's num_ubos will avoid that, and cleans up num_ubo handling
elsewhere right now.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
2020-05-14 00:10:43 +00:00
Eric Anholt d5176c453e freedreno/ir3: Move i/o offset lowering after analyze_ubo_ranges.
I found that when moving more UBOs to load_ubo_ir3, analyze_ubo_ranges
would move things back in a broken way.  We can just run this pass later
and drop the _ir3 path.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
2020-05-14 00:10:43 +00:00
Kristian H. Kristensen dd8d257a30 freedreno/ir3: Lower GS builtins before lowering IO
We mostly got away with replacing a store_output with a store_var, but
for complex types like structs, that doesn't work. Once the IO has
been lowered from vars to intrinsic, we've lost the deref chains and
can't properly shadow the outputs.

This commits moves the GS lowering up so we do it before the output
variables get lowered to store_output.  This way the pass works much
like nir_lower_io_to_temporaries() and cleanly shadows the outputs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
2020-05-01 16:26:31 +00:00
Kristian H. Kristensen 79355fd901 freedreno/ir3: Add ir3_nir_lower_to_explicit_input() pass
This pass lowers per-vertex input intrinsics to load_shared_ir3. This
was open coded in the TCS and GS lowering passes before - this way we
can share it. Furthermore, we'll need to run the rest of the GS
lowering earlier (before lowering IO) so we need to split off this
part that operates on the IO intrinsics first.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
2020-05-01 16:26:31 +00:00
Kristian H. Kristensen b7bfccf085 freedreno/ir3: Rename ir3_nir_lower_to_explicit_io
We rename it to ir3_nir_lower_to_explicit_output, since it only
handles output and we'll add a lowering pass for input next.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
2020-05-01 16:26:31 +00:00
Jonathan Marek 065068c66a freedreno/ir3: run nir_lower_pack
This lowers pack_32_2x16/unpack_32_2x16 into the scalar versions of those
instructions.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>
2020-04-27 18:40:03 +00:00
Jonathan Marek 42093bb694 nir: add pack_32_2x16_split/unpack_32_2x16_split lowering
The new option replaces the two other _split lowering options, since
there's no need for separate options.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>
2020-04-27 18:40:03 +00:00
Rob Clark a0de0db0e4 freedreno/ir3: small cleanup and comments
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
2020-03-27 22:41:36 +00:00
Hyunjun Ko 1ee2ad584c freedreno/ir3: enable nir_opt_loop_unroll on a6xx
If precision lowering happens at GLSL IR, loop_analysis at IR doesn't
work as expected since it can't handle things like:

"(expression bool < (expression float16_t f2fmp (var_ref ndx) ) (constant float16_t (1.000000)) )"

So we'd rather do this optimization at the NIR stage.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Eric Anholt e4baff9081 freedreno: Switch to using lowered image intrinsics.
This cuts out a bunch of deref chain walking that the compiler can do for
us.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
2020-02-24 18:25:02 +00:00
Jonathan Marek a3a70588c0 freedreno/ir3: support load_base_instance
Not supported by hardware, uses same mechanism as base vertex.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3162>
2019-12-19 15:13:40 -05:00
Jonathan Marek c7c5a84cf3 freedreno/ir3: lower pack/unpack ops
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
2019-12-16 19:20:07 -05:00
Jonathan Marek b936143327 freedreno/ir3: lower mul_2x32_64
lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by
nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-16 13:37:09 -05:00
Eric Anholt f58ef5d481 turnip: Lower usub_borrow.
Fixes dEQP-VK.glsl.builtin.function.integer.usubborrow.uvec2_mediump_fragment.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986>
2019-12-16 04:52:09 +00:00
Neil Roberts 37f5395783 freedreno/ir3: Enabling lowering 16-bit flrp
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-20 14:09:43 +01:00
Kristian H. Kristensen 7272e8a709 freedreno/ir3: Allocate const space for tessellation parameters
The tessellation stages need size and stride or the patch layout as
well as locations of attributes in the patch.  The tesselation stages
also use two system memory BOs and need the iovas of those.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:40:27 -08:00
Kristian H. Kristensen 56ed835bff freedreno/ir3: Extend geometry lowering pass to handle tessellation
VS and TCS pass varyings the same way as VS and GS does. TCS then
writes entire patch to a system memory BO and TES eventually reads
back from the BO once the TE starts generating vertices.  TES outputs
vertices the same way as VS and GS, except when there's a GS as well,
in which case TES passes varyings to GS same way the VS would.

In addition, the TCS needs a little bit of control flow massaging so
that it only runs for valid invocations needs a couple of unknown
instructions to synchronize with the TE.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:36:59 -08:00
Kristian H. Kristensen 8621fbc37b freedreno/ir3: Add tessellation field to shader key
Whether we're tessellating and which primitives the TES outputs
affects the entire pipeline so let's add a field to the key to track
that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:36:56 -08:00
Rhys Perry 8b98d0954e nir/lower_idiv: add new llvm-based path
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 18:49:46 +00:00
Rob Clark 5e08f070f0 nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Kristian H. Kristensen 0293d14719 freedreno/ir3: Implement primitive layout intrinsics
This implements the load_vs_primitive_stride_ir3,
load_vs_vertex_stride_ir3 and load_primitive_location_ir3 intrinsics,
used for getting the primitive layout strides and locations.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Kristian H. Kristensen 8e16fb1528 freedreno/ir3: Implement lowering passes for VS and GS
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Erik Faye-Lund 71c0dcf266 nir: support feeding state to nir_lower_clip_[vg]s
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Erik Faye-Lund eb3047c094 nir: support lowering clipdist to arrays
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Marek Olšák cebc38ff60 nir: add nir_shader_compiler_options::lower_to_scalar
This will replace PIPE_SHADER_CAP_SCALAR_ISA.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-10 15:49:18 -04:00
Daniel Schürmann 10e508c815 freedreno: Enable the nir_opt_algebraic_late() pass.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Kristian H. Kristensen cc4fe81145 freedreno/a6xx: Turn on vectorize_io
We want this for tessellation eventually, but we can turn it on now.

Shader-db results:

total instructions in shared programs: 8612905 -> 8611387 (-0.02%)
instructions in affected programs: 164952 -> 163434 (-0.92%)

total dwords in shared programs: 11952000 -> 11950560 (-0.01%)
dwords in affected programs: 68096 -> 66656 (-2.11%)

total full in shared programs: 315019 -> 315009 (<.01%)
full in affected programs: 1642 -> 1632 (-0.61%)

total constlen in shared programs: 2463654 -> 2463654 (0.00%)
constlen in affected programs: 0 -> 0

total (ss) in shared programs: 152379 -> 152409 (0.02%)
(ss) in affected programs: 1503 -> 1533 (2.00%)

total (sy) in shared programs: 96473 -> 96525 (0.05%)
(sy) in affected programs: 654 -> 706 (7.95%)

total max_sun in shared programs: 1172454 -> 1172472 (<.01%)
max_sun in affected programs: 104 -> 122 (17.31%)

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-18 16:59:10 -07:00
Vasily Khoruzhick 9367d2ca37 nir: allow specifying filter callback in lower_alu_to_scalar
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-06 01:51:28 +00:00
Jason Ekstrand 951cf94521 nir: Add explicit signs to image min/max intrinsics
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode.  Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle.  In SPIR-V, signed min/max are separate
opcodes from unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Rob Clark 4a188e4215 freedreno/ir3: track # of driver params
To avoid emitting unneeded const state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rhys Perry da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Sagar Ghuge 456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Eric Anholt 852704976a freedreno: Stop treating UBO 0 specially in UBO uploading.
ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we
need to upload (all of it, since it will lower indirect UBO 0 accesses
from load_ubo back to indirection on the constant buffer).

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-24 14:23:07 -07:00
Daniel Schürmann 165b7f3a44 nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.

Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00
Eric Anholt 4449572c47 freedreno: Only upload UBO pointers for UBOs that haven't been lowered.
total constlen in shared programs: 2485933 -> 2462236 (-0.95%)

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-21 17:14:43 -07:00
Eric Anholt 01d0bad9ef freedreno: Remove silly return from ir3_optimize_nir().
We only ever return the shader we were passed in (but internally
modified).

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-21 17:14:43 -07:00
Kenneth Graunke c7d1b52a2c nir: Combine lower_fmod16/32 back into a single lower_fmod.
We originally had a single lower_fmod option.  In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there.  This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again.  I'm not aware of any hardware which
need lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke fa56a3795f gallium: Drop lower_fmod64 from drivers that don't support doubles.
Neither freedreno nor nv50 expose PIPE_CAP_DOUBLES, so there's no
fmod64 to be lowered.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Jonathan Marek d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Rob Clark b15c46e6bf freedreno/ir3: move const_state to ir3_shader
For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass.  So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess is added
into the equation.

Since really everything about the const state is not specific to the
variant, move this.  The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 5690f83bb5 freedreno/ir3: split out const_state setup
Next patch moves const_state to ir3_shader, before the compile context
is created.  So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 23e7a34466 freedreno/ir3: consolidate const state
Combine the offsets of differenet parts of the constant space with (what
was formerly known as) ir3_driver_const_layout.  Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Ian Romanick d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Christian Gmeiner 4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Driver which do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
Rob Clark 650246523b freedreno/ir3: fb read support
Lower load_output to txf_ms_fb and add support for the new texture fetch
instruction.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark 32925f4072 freedreno/ir3: fix shader variants vs UBO analysis
Otherwise we zero out the state again, but all the UBO loads that we
could lower are already lowered.  End result is that we didn't emit the
uniforms for lowered UBO access in any case where multiple shader
variants are used.

Fixes: 893425a607 freedreno/ir3: Push UBOs to constant file
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Karol Herbst fe8c57e859 freedreno/ir3: use nir_src_as_uint in a few places
v2 (Jason Ekstrand):
 - Add even more places

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Jason Ekstrand 6279074de1 nir: Get rid of global registers
We have a pass to lower global registers to locals and many drivers
dutifully call it.  However, no one ever creates a global register ever
so it's all dead code.  It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Timothy Arceri e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Rob Clark 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Add support for load_barycentric_pixel, load_interpolated_input, and
friends.  For now, this retains support for old-style inputs, which can
probably be dropped with some ttn work.

Prep work for sample-shading support.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Kristian H. Kristensen 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
This commit turns on the gallium cap and adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics and
adjust the backend where the cap switches units from vec4s to dwords.

As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:

  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-27 13:26:02 -07:00
Rob Clark 3d8349048b freedreno/ir3: additional lowering
For some things that show up when we expose higher glsl

TODO check blob traces to see if we have instructions for some of this?
I guess we don't but worth a check..

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Jason Ekstrand 08f804ec0c anv,radv,turnip: Lower TG4 offsets with nir_lower_tex
v2: turn on for turnip as well (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Eduardo Lima Mitev 2e4525883f ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics
These intrinsics have the offset in dwords already computed in the last
source, so the change here is basically using that instead of emitting
the ir3_SHR to divide the byte-offset by 4.

The improvement in shader stats is significant, of up to ~15% in
instruction count in some cases. Tested only on a5xx.

shader-db is unfortunately not very useful here because shaders that use
SSBO require GLSL versions that are not supported by freedreno yet.

For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*'
are helped.

A random case:

dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2

with current master:

; CL prog 14/1: 1252 instructions, 0 half, 48 full
; 8 const, 8 constlen
; 61 (ss), 43 (sy)

with the SSBO dword-offset moved to NIR:

; CL prog 14/1: 1053 instructions, 0 half, 45 full
; 7 const, 7 constlen
; 34 (ss), 73 (sy)

The SHR previously emitted for every single SSBO instruction disappears
in most cases, and the dword-offset ends up embedded in the STGB
instruction as immediate in many cases as well.

There are also a few of those tests that are currently failing on register
allocation, that start to pass as a result of reducing the pressure. At least
these, probably more:

dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7

No regressions observed with relevant CTS and piglit tests.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Rob Clark ad25948261 freedreno/ir3: turn on [iu]mul_high
Which also requires uadd_carry lowering

Until recently this was lowered in glsl ir so it went unnoticed that we
weren't lowering it.

Fixes: 1d8994a63b glsl: [u/i]mulExtended optimization for GLSL
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-08 18:44:57 -05:00
Rob Clark db1fa21374 freedreno/a6xx: vertex_id is not _zero_based
Fixes dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.builtin_variable.vertex_id

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Eric Anholt 338d399fd0 freedreno: Use the NIR lowering for isign.
I think this will save an instruction and hopefully not increase any other
costs (possibly the immediate -1 and 1?), but I haven't actually tested.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 00:32:30 +00:00
Karol Herbst 9b24028426 nir: rename nir_var_function to nir_var_function_temp
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-19 20:01:41 +01:00
Karol Herbst d0c6ef2793 nir: rename global/local to private/function memory
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.

v2: rename local to function as well
v3: rename vtn_variable_mode_local as well

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-08 18:51:46 +01:00
Ian Romanick 378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick 09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Rob Clark aa0fed10d3 freedreno: move ir3 to common location
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver.  The parts that are gallium
specific have been refactored out and remain in the gallium driver.

Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.

NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build.  Waiting
patiently for the day that we can remove *everything* from the autotools
build.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-11-27 15:44:02 -05:00
Renamed from src/gallium/drivers/freedreno/ir3/ir3_nir.c (Browse further)