13 Commits

Author SHA1 Message Date
Eduardo Lima Mitev 2a0d45ae6c freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch
The pass should run once at the end of shader compilation, for a4xx
onwards. It iterates over texture sampling instructions and marks those
eligible for pre-dispatch by changing the tex op from 'tex' to
'tex_prefetch'. An instruction is eligible if:

* The coordinate is a vector where all its components come from a
  shader input.
* The order of the components matches exactly that of the input (no
  swizzles).
* The instruction is in the 'main' function, and in the outermost
  block.

The first two restrictions were arrived at empirically, so more
testing could tighten or loosen them.

The 3rd restriction is there to allow moving the instructions
eligible for pre-dispatch to the beginning of the shader, so
that we don't block the registers holding the result for too
long.
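
The three eligibility rules above can be sketched on a toy model. This is only an illustration, not the pass itself: every `toy_*` name below is hypothetical and none of it is the NIR API. It also assumes that "a shader input" means all coordinate components read the same input, which the message does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NIR structures; not the real NIR API. */
enum toy_tex_op { TOY_TEX, TOY_TEX_PREFETCH };

struct toy_coord_comp {
    bool from_shader_input;  /* component is read directly from a shader input */
    int input_id;            /* which input it reads */
    unsigned input_comp;     /* which component of that input it reads */
};

struct toy_tex_instr {
    enum toy_tex_op op;
    unsigned num_coord_comps;
    struct toy_coord_comp coord[4];
    bool in_main;            /* instruction lives in the 'main' function */
    bool in_outermost_block; /* instruction is not nested in control flow */
};

/* Returns true if the instruction satisfies the three rules listed above. */
static bool toy_is_prefetch_eligible(const struct toy_tex_instr *tex)
{
    assert(tex != NULL && tex->num_coord_comps <= 4);

    if (tex->op != TOY_TEX || tex->num_coord_comps == 0)
        return false;
    if (!tex->in_main || !tex->in_outermost_block)
        return false;

    for (unsigned i = 0; i < tex->num_coord_comps; i++) {
        const struct toy_coord_comp *c = &tex->coord[i];
        if (!c->from_shader_input)
            return false;
        /* Assumption: every component must come from the same input. */
        if (c->input_id != tex->coord[0].input_id)
            return false;
        /* No swizzles: coordinate component i reads input component i. */
        if (c->input_comp != i)
            return false;
    }
    return true;
}

/* Marks eligible instructions by switching the op; returns how many changed. */
static unsigned toy_mark_prefetch(struct toy_tex_instr *instrs, size_t count)
{
    unsigned marked = 0;
    for (size_t i = 0; i < count; i++) {
        if (toy_is_prefetch_eligible(&instrs[i])) {
            instrs[i].op = TOY_TEX_PREFETCH;
            marked++;
        }
    }
    return marked;
}
```

Keeping the eligibility test separate from the marking step makes it easy to tighten or loosen the first two rules without touching the loop that rewrites the op.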

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Kristian H. Kristensen 8e16fb1528 freedreno/ir3: Implement lowering passes for VS and GS
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Eric Anholt 01d0bad9ef freedreno: Remove silly return from ir3_optimize_nir().
We only ever return the shader we were passed in (but internally
modified).

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-21 17:14:43 -07:00
Eduardo Lima Mitev 340277ad71 ir3/nir: Add new NIR AlgebraicPass for lowering imul
Currently, the ir3 backend compiler lowers integer multiplication from:

dst = a * b

to:

dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)

by emitting this code:

mull.u tmp0, a, b           ; mul low, i.e. al * bl
madsh.m16 tmp1, a, b, tmp0  ; mul-add shift high mix, i.e. ah * bl << 16
madsh.m16 dst, b, a, tmp1   ; i.e. al * bh << 16

which at that point has very low chances of being optimized.

This patch adds a new nir_algebraic.AlgebraicPass to perform this
lowering during NIR algebraic optimization passes, giving NIR a better
chance to optimize the resulting code.
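
The decomposition can be checked with a small C model. This is a sketch only: `mull_u` and `madsh_m16` below are plain C functions written from the annotations in this message (al * bl, and ah * bl << 16 plus an addend), not a specification of the hardware instructions. The dropped ah * bh term is shifted left by 32 bits, so it does not affect a 32-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mul low": product of the low 16 bits of each operand. */
static uint32_t mull_u(uint32_t a, uint32_t b)
{
    return (a & 0xffffu) * (b & 0xffffu);
}

/* Model of "mul-add shift high mix": (high half of a * low half of b) << 16,
 * plus the addend c. Arithmetic wraps modulo 2^32. */
static uint32_t madsh_m16(uint32_t a, uint32_t b, uint32_t c)
{
    return (((a >> 16) * (b & 0xffffu)) << 16) + c;
}

/* The three-instruction sequence from the message. */
static uint32_t lowered_imul(uint32_t a, uint32_t b)
{
    uint32_t tmp0 = mull_u(a, b);          /* al * bl */
    uint32_t tmp1 = madsh_m16(a, b, tmp0); /* + (ah * bl << 16) */
    return madsh_m16(b, a, tmp1);          /* + (al * bh << 16) */
}

/* Compares the lowered sequence against a plain 32-bit multiply. */
static void check_lowered_imul(uint32_t a, uint32_t b)
{
    assert(lowered_imul(a, b) == a * b);
}
```

Calling `check_lowered_imul` with a few edge values (0, 0xffff, 0x10000, 0xffffffff) exercises both halves of each operand.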

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Rob Clark b15c46e6bf freedreno/ir3: move const_state to ir3_shader
For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass.  So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess are added
into the equation.

Since really everything about the const state is not specific to the
variant, move this.  The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 5690f83bb5 freedreno/ir3: split out const_state setup
Next patch moves const_state to ir3_shader, before the compile context
is created.  So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 23e7a34466 freedreno/ir3: consolidate const state
Combine the offsets of different parts of the constant space with (what
was formerly known as) ir3_driver_const_layout.  Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 2f0b9d2249 freedreno/ir3: lower load_barycentric_at_offset
Calculates i,j at specified offset within a pixel.  A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark c4f423aa36 freedreno/ir3: lower load_barycentric_at_sample
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark fc865de777 freedreno/ir3: add pass to move varying loads
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Kristian H. Kristensen 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
This commit turns on the gallium cap, adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics, and
adjusts the backend where the cap switches units from vec4s to dwords.

As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:

  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-27 13:26:02 -07:00
Eduardo Lima Mitev 9dd0cfafc9 ir3/nir: Add a new pass 'ir3_nir_lower_io_offsets'
This NIR->NIR pass implements offset computations that are currently
done on the IR3 backend compiler, to give NIR a better chance of
optimizing them.

For now, it supports lowering the dword-offset computation for SSBO
instructions. It will take an SSBO intrinsic and replace it with the
new ir3-specific version that adds an extra source. That source will
hold the SSA value resulting from inserting a division by 4 (an SHR op)
of the original byte-offset source already provided by NIR in one of
the intrinsic sources.

Note that on a6xx the original byte-offset is not needed, so we could
potentially replace that source instead of adding a new one. But to
keep things simple and consistent we always add the new source and
a6xx will just ignore the original one.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Rob Clark aa0fed10d3 freedreno: move ir3 to common location
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver.  The parts that are gallium
specific have been refactored out and remain in the gallium driver.

Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.

NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build.  Waiting
patiently for the day that we can remove *everything* from the autotools
build.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-11-27 15:44:02 -05:00
Renamed from src/gallium/drivers/freedreno/ir3/ir3_nir.h