Commit Graph

25 Commits

Author SHA1 Message Date
Dylan Baker a8e2d79e02 meson: use gnu_symbol_visibility argument
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used; if C++ is
suddenly added, meson just does the right thing.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
2020-06-01 18:59:18 +00:00
Rob Clark 3561d34fff freedreno/ir3: add simple validate pass
We can add to this as we notice other things that are worth validating
between ir3 passes.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
2020-05-19 16:06:17 +00:00
Rob Clark 947aa23eff freedreno/ir3: remove Sethi-Ullman numbering pass
We haven't used this for a while.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
2020-05-19 16:06:17 +00:00
Eric Anholt 1462b00391 freedreno/ir3: Add a unit test for our disassembler.
Makes sure that we can maintain consistent output from our disassembly as
we refactor.  I've only included stuff that matches qcom's disasm so far.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4736>
2020-04-27 19:35:00 +00:00
Kristian H. Kristensen da467817e3 freedreno/ir3: Move ir3 assembler to backend compiler
For easier reuse.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4741>
2020-04-25 00:03:43 +00:00
Rob Clark 751c11a8c7 freedreno/ir3: rename depth->dce
Since DCE is the only remaining function of this pass, after the pre-RA
scheduler rewrite.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4440>
2020-04-13 20:47:28 +00:00
Eric Engestrom 79af30768d meson: inline `inc_common`
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>
2020-03-28 21:36:54 +01:00
Rob Clark 29992a039e freedreno/ir3/ra: split-up
Split out regset and shared header, since the RA pass is already getting
large-ish.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
2020-03-27 22:41:36 +00:00
Hyunjun Ko 6ee375f68d freedreno/ir3: Add new ir3 pass to fold out fp16 conversions
This pass tries to fold f2f16 conversion into alu instructions.
This will be useful to help reduce the number of instructions once
mesa starts supporting precision lowering.  For example:

  add.f r0.w, r0.w, c0.x
  cov.f32f16 hr2.x, r0.w

to

  add.f hr2.x, r0.w, c0.x

Additionally this pass also tries to fold f2f16 conversion into load_input
instruction:

  bary.f r0.x, 3, r0.w
  cov.f32f16 hr0.x, r0.x

to

  bary.f hr1.x, 3, r0.x

v2: Edit to not fold OPC_MAX_F and OPC_MIN_F, since that's not valid.

v3: Add OPC_ABSNEG_F to the blacklist as well.

v4: Don't remove dead cov instructions, DCE will do that later; don't
iterate through sources when a cov only has one; remove special
handling of IR3_REG_ARRAY and IR3_REG_RELATIV.

v5: Handle folding into u32.u32 movs of floats correctly, don't bail
out on IR3_REG_RELATIV or IR3_REG_ARRAY movs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
2020-02-24 17:24:13 +00:00
Rob Clark 0f78c32492 freedreno/ir3: post-RA sched pass
After RA, we can schedule to increase parallelism (reduce nop's) without
worrying about increasing register pressure.  This pass lets us cut down
the instruction count ~10%, and prioritize bary.f, kill, etc, which
would tend to increase register pressure if we tried to do that before
RA.

It should be more useful if RA round-robin'd register choices.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
2020-02-01 02:40:22 +00:00
Rob Clark c803c662f9 freedreno/ir3: split out delay helpers
We're going to want these also for a post-RA sched pass.  And also to
split nop stuffing out into its own pass.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
2020-02-01 02:40:22 +00:00
Eduardo Lima Mitev 2a0d45ae6c freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch
The pass should run once at the end of shader compilation, for a4xx
onwards. It iterates over texture sampling instructions and marks those
eligible for pre-dispatch by changing the tex op from 'tex' to
'tex_prefetch'. An instruction is eligible if:

* The coordinate is a vector where all its components come from a
  shader input.
* The order of the components matches exactly that of the input (no
  swizzles).
* The instruction is in the 'main' function, and in the
  outermost block.

The first two restrictions were arrived at empirically, so more
testing could tighten or loosen them.

The 3rd restriction is there to allow moving the instructions
eligible for pre-dispatch to the beginning of the shader, so
that we don't block the registers holding the result for too
long.
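
The three eligibility rules above can be sketched as a single predicate.
This is only an illustration on a toy data model, not the NIR pass itself:
`CoordComponent`, `TexInstr` and `eligible_for_prefetch` are hypothetical
names, and the sketch assumes that "comes from a shader input" means every
coordinate component reads the same input, with component i reading
channel i.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoordComponent:
    # Hypothetical model: which shader input feeds this component
    # (None if it does not come directly from an input) and which channel.
    input_id: Optional[int]
    input_channel: int


@dataclass
class TexInstr:
    coord: List[CoordComponent]
    in_main: bool      # instruction lives in the 'main' function
    block_depth: int   # 0 means the outermost block


def eligible_for_prefetch(tex: TexInstr) -> bool:
    # Rule 3: must be in 'main' and in the outermost block.
    if not (tex.in_main and tex.block_depth == 0):
        return False
    if not tex.coord:
        return False
    # Rule 1: every component comes from a shader input
    # (assumed here to be one and the same input).
    first = tex.coord[0].input_id
    if first is None:
        return False
    for i, comp in enumerate(tex.coord):
        if comp.input_id != first:
            return False
        # Rule 2: no swizzle, component i reads channel i (assumption).
        if comp.input_channel != i:
            return False
    return True
```

A straight read of input 3 in the outermost block of main passes the
check; the same read with swapped channels, or inside a nested block,
does not.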

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Kristian H. Kristensen 8e16fb1528 freedreno/ir3: Implement lowering passes for VS and GS
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Eduardo Lima Mitev 340277ad71 ir3/nir: Add new NIR AlgebraicPass for lowering imul
Currently, the ir3 backend compiler lowers integer multiplication from:

dst = a * b

to:

dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)

by emitting this code:

mull.u tmp0, a, b           ; mul low, i.e. al * bl
madsh.m16 tmp1, a, b, tmp0  ; mul-add shift high mix, i.e. ah * bl << 16
madsh.m16 dst, b, a, tmp1   ; i.e. al * bh << 16

which at that point has very low chances of being optimized.

This patch adds a new nir_algebraic.AlgebraicPass to perform this
lowering during NIR algebraic optimization passes, giving it a better
chance for optimizing the resulting code.
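
The decomposition above can be checked numerically with a short sketch.
It assumes 32-bit unsigned wraparound and splits each operand into 16-bit
halves (al/ah, bl/bh); `imul_lowered` is a hypothetical helper for
illustration, not a function from Mesa or nir_algebraic.

```python
MASK32 = 0xFFFFFFFF


def imul_lowered(a: int, b: int) -> int:
    """32-bit multiply built from 16-bit halves, as in the formula above."""
    al, ah = a & 0xFFFF, (a >> 16) & 0xFFFF
    bl, bh = b & 0xFFFF, (b >> 16) & 0xFFFF
    low = al * bl                 # al * bl
    mid1 = (ah * bl) << 16        # ah * bl << 16
    mid2 = (al * bh) << 16        # al * bh << 16
    # The ah * bh term would be shifted by 32 bits, so it never
    # contributes to the low 32 bits and is dropped.
    return (low + mid1 + mid2) & MASK32


# The lowered form agrees with a plain multiply truncated to 32 bits.
for a, b in [(3, 5), (0x12345678, 0x9ABCDEF0), (0xFFFFFFFF, 0xFFFFFFFF)]:
    assert imul_lowered(a, b) == (a * b) & MASK32
```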

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Rob Clark 2f0b9d2249 freedreno/ir3: lower load_barycentric_at_offset
Calculates i,j at specified offset within a pixel.  A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark c4f423aa36 freedreno/ir3: lower load_barycentric_at_sample
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark fc865de777 freedreno/ir3: add pass to move varying loads
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Kristian H. Kristensen 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
This commit turns on the gallium cap and adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics and
adjusts the backend where the cap switches units from vec4s to dwords.

As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:

  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-27 13:26:02 -07:00
Eduardo Lima Mitev 9dd0cfafc9 ir3/nir: Add a new pass 'ir3_nir_lower_io_offsets'
This NIR->NIR pass implements offset computations that are currently
done on the IR3 backend compiler, to give NIR a better chance of
optimizing them.

For now, it supports lowering the dword-offset computation for SSBO
instructions. It will take an SSBO intrinsic and replace it with the
new ir3-specific version that adds an extra source. That source will
hold the SSA value resulting from inserting a division by 4 (an SHR op)
of the original byte-offset source already provided by NIR in one of
the intrinsic sources.

Note that on a6xx the original byte-offset is not needed, so we could
potentially replace that source instead of adding a new one. But to
keep things simple and consistent we always add the new source and
a6xx will just ignore the original one.
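
A minimal sketch of the source rewrite described above, using plain
integers in place of SSA values. `lower_ssbo_sources` and its list-based
model of intrinsic sources are hypothetical; the only facts taken from
the text are that the dword offset is the byte offset shifted right by 2
and that it is appended as an extra source while the original is kept.

```python
from typing import List


def lower_ssbo_sources(srcs: List[int], offset_idx: int) -> List[int]:
    """Return a new source list with the dword offset appended.

    srcs       -- toy stand-in for the intrinsic's sources
    offset_idx -- index of the byte-offset source within srcs
    """
    byte_offset = srcs[offset_idx] & 0xFFFFFFFF
    dword_offset = byte_offset >> 2   # division by 4 as an SHR
    # The original byte offset stays in place; the new value is an
    # extra trailing source.
    return srcs + [dword_offset]
```

For example, sources `[0, 64]` with the byte offset at index 1 become
`[0, 64, 16]`.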

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Rob Clark 8a5f2d9444 freedreno/ir3: add Sethi–Ullman numbering pass
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-03 13:27:50 -05:00
Rob Clark 947848524d freedreno/ir3: add a6xx+ SSBO/image support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:28:00 -05:00
Rob Clark feee3050d3 freedreno/ir3: split out a4xx+ instructions
Note that image/ssbo support is currently only implemented for a5xx.
But the instruction encoding is the same for a4xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark 42af0640f6 freedreno/ir3: split out image helpers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark 9517037bdc freedreno/ir3: code-motion
Split up ir3_compiler_nir.c a bit before starting to add new stuff for
a6xx SSBO/image instructions.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-12-07 13:49:21 -05:00
Rob Clark aa0fed10d3 freedreno: move ir3 to common location
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver.  The parts that are gallium
specific have been refactored out and remain in the gallium driver.

Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.

NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build.  Waiting
patiently for the day that we can remove *everything* from the autotools
build.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-11-27 15:44:02 -05:00