Commit Graph

337 Commits

Author SHA1 Message Date
Rob Clark 7bbf21e898 freedreno/ir3: immediately schedule meta instructions
The aren't real instructions, and don't change # of live values, so no
point in them competing with real instructions.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-03 12:44:03 -07:00
Rob Clark 771d04c82d freedreno/ir3: scheduler improvements
For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early.  And factor the net change of # of
live values that would result from scheduling an instruction, to
prioritize instructions that reduce number of live values as the number
of live values increases.

For manhattan:

  total instructions in shared programs: 27869 -> 28413 (1.95%)
  instructions in affected programs: 26756 -> 27300 (2.03%)
  helped: 102
  HURT: 87

  total full in shared programs: 1903 -> 1719 (-9.67%)
  full in affected programs: 1390 -> 1206 (-13.24%)
  helped: 124
  HURT: 9

The reduction in register usage nets ~20% gain in manhattan.  (So
getting mediump support should be a huge win for gles gfxbench.)

Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).

The effect is less pronounced on smaller shaders.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-03 12:44:03 -07:00
Rob Clark bb3aa44ade freedreno/ir3: sched should mark outputs used
Account for shader outputs and values live in any direct/indirect
successor block.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-03 12:44:03 -07:00
Rob Clark 8b7bf5e07a freedreno/ir3: fix constlen versus indirect UBO
If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to uniform)
the assembler won't figure out the correct constlen.

Fixes:
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-31 12:58:33 -07:00
Rob Clark f9fa456e1d freedreno/ir3: set more barrier bits
Blob is also setting the .l bit, and it seems to solve some intermittent
failures with a couple of deqp's:

dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark 5d43b806ba freedreno/ir3: set (ss) on last_input if ldlv
It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes.  Keep track if we've done an (ss) since the
last ldlv, and if not add (ss) flag to last_input which gets (ei).

Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Rob Clark 73fb02c5d6 freedreno/ir3: add assert
The special handling for last_input assumes that all the varying loads
are in the first block.  Add an assert to catch if anyone breaks that
assumption.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-31 12:58:33 -07:00
Jonathan Marek 1db86d8b62 freedreno/ir3: fix input ncomp for vertex shaders
ncomp is never set for vertex shaders, but a3xx and a4xx still use it.

Fixes: 831f1a05c0 freedreno/ir3: rework varying packing

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-05-31 12:21:23 -04:00
Caio Marcelo de Oliveira Filho e45bf01940 spirv: Change spirv_to_nir() to return a nir_shader
spirv_to_nir() returned the nir_function corresponding to the
entrypoint, as a way to identify it.  There's now a bool is_entrypoint
in nir_function and also a helper function to get the entry_point from
a nir_shader.

The return type reflects better what the function name suggests.  It
also helps drivers avoid the mistake of reusing internal shader
references after running NIR_PASS on it.  When using NIR_TEST_CLONE or
NIR_TEST_SERIALIZE, those would be invalidated right in the first pass
executed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho c92d002982 turnip: Don't re-use entry_point pointer from spirv_to_nir
Replace its uses with nir_shader_get_entrypoint(), and change the
helper function to return nir_shader *.

This is a preparation to change spirv_to_nir() return type.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-29 10:26:22 -07:00
Jason Ekstrand f2dc0f2872 nir: Drop imov/fmov in favor of one mov instruction
The difference between imov and fmov has been a constant source of
confusion in NIR for years.  No one really knows why we have two or when
to use one vs. the other.  The real reason is that they do different
things in the presence of source and destination modifiers.  However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-05-24 08:38:11 -05:00
Lionel Landwerlin cb7c9b2a93 vulkan: fix build dependency issue with generated files
On machines with many cores, you can run into that issue :

../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory

v2: Move declare_dependency around (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jan Ziak
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-22 14:07:14 +00:00
Eric Anholt ef88e23d03 freedreno: Log the number of loops in the shader for shader-db.
shader-db's report.py will use this to see when we've changed loop
unrolling behavior on a shader and skip including other stats like
instruction count from being considered for that shader, since they won't
be useful as a proxy for real world performance in that case.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eduardo Lima Mitev <elima@igalia.com>
2019-05-16 10:25:22 -07:00
Eric Anholt 18d11cb4dc freedreno: Move msm_drm.h to the same spot as other DRM uapi.
The new location matches other drivers, and has a README about the rules
for updating it.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-05-14 11:51:55 -07:00
Eric Anholt c49f0159bd freedreno: Quiet compiler warnings on 64-bit.
__u64 is a ulonglong on x86_64, not uint64_t, so my gcc was complaining
about the wrong type being passed in.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Eric Anholt 0734905d9a freedreno: Make emacs indent the way robclark's eclipse does.
The .editorconfig helps with the tabs, but we've got this
two-tabs-from-previous-indentation line continuation style that requires
whacking the c-file-offsets.  This will throw emacs warnings when first
opening a file in the directory, press '!' to shut it up for the future.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Eric Anholt 257999d9a8 freedreno: Make .editorconfig match .dir-locals.el.
The editorconfig takes precedence over dir-locals in emacs26 with
editorconfig enabled, so the /.editorconfig was affecting these
directories.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-13 15:37:01 -07:00
Jason Ekstrand 072227da0a tu/entrypoints: Import copy
It's used without being imported
2019-05-13 17:20:33 -05:00
Jonathan Marek d0bff89159 nir: allow specifying a set of opcodes in lower_alu_to_scalar
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:10:41 +00:00
Rob Clark 9faf218b8c freedreno/ir3: fix rasterflat/glxgears
Ofc legacy gl features that are broken don't trigger fails in deqp.  I
should remember to test glxgears more often.

Fixes: 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-09 16:21:05 -07:00
Rob Clark b15c46e6bf freedreno/ir3: move const_state to ir3_shader
For a6xx, we construct/emit a single VS const state used for both
binning pass and draw pass.  So far we were mostly getting lucky that
there were not (obvious) mismatches between the const_state (like
different lowered immediates) between the binning and draw pass
VS ir3_shader_variant.

And I guess this situation will come up more as GS and tess is added
into the equation.

Since really everything about the const state is not specific to the
variant, move this.  The main exception is lowered immediates, but these
are the last to appear in the layout, and it doesn't hurt for each new
shader variant to just append any immed's it lowers to the end of the
immediate state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 5690f83bb5 freedreno/ir3: split out const_state setup
Next patch moves const_state to ir3_shader, before the compile context
is created.  So move the code around in prep to call it earlier.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 9403184ddd freedreno/ir3: move immediates to const_state
They are really part of the constant state, and it will moving things
from ir3_shader_variant to ir3_shader if we combine them.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark 23e7a34466 freedreno/ir3: consolidate const state
Combine the offsets of differenet parts of the constant space with (what
was formerly known as) ir3_driver_const_layout.  Bunch of churn, but no
functional change.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Rob Clark ef3eecd66b freedreno/ir3: move ir3_pointer_size()
Move to ir3_compiler so it doesn't depend on the compile context.  Prep
work for moving constant state from variant (where we have compile
context) to shader (where we do not).

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-07 07:26:00 -07:00
Ian Romanick d41cdef2a5 nir: Use the flrp lowering pass instead of nir_opt_algebraic
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(

i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders.  Gen4 and Gen5 both set lower_flrp32
only for vertex shaders.  For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc.  On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.

No changes on any other Intel platforms.

v2: Add panfrost changes.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-06 22:52:29 -07:00
Christian Gmeiner 4e110eca42 nir: nir_shader_compiler_options: drop native_integers
Driver which do not support native integers should use a lowering
pass to go from integers to floats.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 07:35:52 +02:00
John Stultz d04f44a459 mesa: Makefile.sources: Add ir3_nir_lower_load_barycentric_at_sample/offset to Makefile.sources
In commit 2f0b9d2249 ("freedreno/ir3: lower
load_barycentric_at_offset") a new file was added that needs to
also be added to the Makefile.sources list used by Android and
SCons build system.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 2f0b9d2249 ("freedreno/ir3: lower load_barycentric_at_offset")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Amit Pundir 88105375c9 mesa: android: freedreno: build libfreedreno_{drm,ir3} static libs
Add libfreedreno_drm/ir3 to the build

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: b4476138d5 ("freedreno: move drm to common location")
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[jstultz: Tweaked to add extra ir3 files from master]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-05-06 11:29:26 +00:00
Rob Clark 6ffb58726b freedreno: update generated headers
Corrects tex state ubwc pitch/size

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-04 11:50:44 -07:00
Rob Clark 8c97b3c546 freedreno/ir3: remove assert
Fixes dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 and .20

ca3eb5db66 went from silently truncating
the constant state, which was also the wrong thing to do, to an assert.
Which then showed up in a couple of dEQPs.  Actually there is nothing
wrong with larger constant file so just drop the assert.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-04 11:50:44 -07:00
Rob Clark 650246523b freedreno/ir3: fb read support
Lower load_output to txf_ms_fb and add support for the new texture fetch
instruction.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark 0704ddb2e5 freedreno/drm: expose GMEM_BASE address
Needed for sampling from tile buffer (GMEM).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark ca3eb5db66 freedreno/ir3: add some ubo range related asserts
And a comment..  since we are mixing units of bytes/dwords/vec4,
hopefully this will avoid some unit confusion.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Rob Clark e941faf3e8 freedreno/ir3: add IR3_SHADER_DEBUG flag to disable ubo lowering
It isn't quite as simple as not running the pass, since with packed
varyings we get load_ubo for block==0 (ie. the "real" uniforms).  So
instead run the pass normally but decline to lower anything in
block > 0

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Rob Clark f697f61590 freedreno/ir3: fix lowered ubo region alignment
Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but
emit by EXT_SRC_ADDR) we need to keep them 4*vec4 aligned.  Which the
code already mostly did, except for aligning the first UBO region itself
(ie. the one after block==0 which is the "real" uniforms).

Fixes: 893425a607 freedreno/ir3: Push UBOs to constant file
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Rob Clark 32925f4072 freedreno/ir3: fix shader variants vs UBO analysis
Otherwise we zero out the state again, but all the UBO loads that we
could lower are already lowered.  End result is that we didn't emit the
uniforms for lowered UBO access in any case where multiple shader
variants are used.

Fixes: 893425a607 freedreno/ir3: Push UBOs to constant file
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Rob Clark ec6c229763 freedreno/ir3: fixes for half reg in/out
Needs to update max_half_reg, or be remapped to full reg and update
max_reg accordingly, depending on generation..

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-30 10:39:24 -07:00
Eric Engestrom 0cff98c8a0 turnip: update to use the new features struct names
These were updated in version 1.1.106 of vulkan.h to make more sense
with the extension names.  We may as well keep with the times.

See also: 90108deb27 "anv: Update to use the new features struct names"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-04-30 16:55:38 +01:00
Rob Clark da327afb2a freedreno/ir3: switch fragcoord to sysval
Because who are we kidding... it is a sysval.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-29 17:01:01 -07:00
Kristian H. Kristensen a7c70bb2a1 freedreno/drm: Quiet pointer to u64 conversion warning 2019-04-26 11:58:44 -07:00
Emil Velikov 1a9367c134 turnip: drop dead close(master_fd)
The fd is -1, thus the block of if (fd != -1) close(fd) is dead code.

Cc: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-04-26 11:26:33 +01:00
Rob Clark ee2e3a07bb freedreno/ir3: sample-shading support
The compiler support for:

  OES_sample_shading
  OES_sample_variables
  OES_shader_multisample_interpolation

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark c8e825aaac freedreno/ir3: fix load_interpolated_input slot
The so->inputs[] table is in units of vec4

Fixes: 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark f4b4d6cf23 freedreno/ir3: rename frag_vcoord -> ij_pixel
Since this is what the value actually is.  Cleanup the name before
adding more different i,j related values for sample-shading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 5be415fc2b freedreno/ir3: remove bogus assert
tex instruction can actually return 16b values.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 2f0b9d2249 freedreno/ir3: lower load_barycentric_at_offset
Calculates i,j at specified offset within a pixel.  A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark c4f423aa36 freedreno/ir3: lower load_barycentric_at_sample
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 4e3ce224a7 freedreno: update generated headers
Pull in updates for sample shading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 6d6ec2d4d2 freedreno/ir3: cleanup instruction builder macros
De-duplicate the "normal" and "flags" versions of the macros, and while
at it go ahead and add "flags" versions for all the remaining macros,
since we'll at least need INSTR1F in a following commit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 77b3b96a3b freedreno/ir3: more emit-cat5 fixes
Couple more opcodes which don't take a sampler id as first arg.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 9032f0690c freedreno/ir3: fix rgetpos decoding
It takes an argument.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 4d08c1b595 compiler: rename SYSTEM_VALUE_VARYING_COORD
And add corresponding enums for different sorts of varying
interpolation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark 6503918689 freedreno/drm: update for robustness
Update UABI header and add FD_PP_PGTABLE and FD_NR_FAULTS params.

Robustness can be supported by a kernel which provides the new ABI if it
also indicates that per-process pagetables are in use.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:07 -07:00
Bas Nieuwenhuizen f2e0f5c3c4 vulkan/wsi: Add X11 adaptive sync support based on dri options.
The dri options are optional. When the dri options are not provided
the WSI will not  use adaptive sync.

FWIW I think for xf86-video-amdgpu this still requires an X11 config
option, so only people who opt in can get possible regressions from this.

So then the remaining question is: why do this in the WSI?

It has been suggested in another MR that the application sets this.
However, I disagree with that as I don't think we'll ever get a
reasonable set of applications setting it.

The next questions is whether this can be a layer. It definitely
can be as implemented now. However, I think this generally fits
well with the function of the WSI. Furthemore, for e.g. the DISPLAY
WSI this is much harder to do in a layer.

Of course, most of the WSI could almost be a layer, but I think
this still fits best in the WSI.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-23 23:49:39 +00:00
Rob Clark a9241edfa3 freedreno/ir3: fix const assert
Fixes: fe8c57e859 freedreno/ir3: use nir_src_as_uint in a few places
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-19 12:36:06 -07:00
Kristian H. Kristensen 18ce6ac632 freedreno/ir3: Mark ir3_context_error() as NORETURN
Fixes a few warnings.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-04-18 11:46:13 -07:00
Dylan Baker 95aefc94a9 Delete autotools
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-04-15 13:44:29 -07:00
Karol Herbst 14531d676b nir: make nir_const_value scalar
v2: remove & operator in a couple of memsets
    add some memsets
v3: fixup lima

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-14 22:25:56 +02:00
Karol Herbst fe8c57e859 freedreno/ir3: use nir_src_as_uint in a few places
v2 (Jason Ekstrand):
 - Add even more places

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Timothy Arceri 035759b61b nir/i965/freedreno/vc4: add a bindless bool to type size functions
This required to calculate sizes correctly when we have bindless
samplers/images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Jason Ekstrand 6279074de1 nir: Get rid of global registers
We have a pass to lower global registers to locals and many drivers
dutifully call it.  However, no one ever creates a global register ever
so it's all dead code.  It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Timothy Arceri e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Rob Clark 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Add support for load_barycentric_pixel, load_interpolated_input, and
friends.  For now, this retains support for old-style inputs, which can
probably be dropped with some ttn work.

Prep work for sample-shading support.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark fc865de777 freedreno/ir3: add pass to move varying loads
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark 831f1a05c0 freedreno/ir3: rework varying packing
Originally we kept track of a table of inputs.  But with new-style frag
inputs this becomes awkward.  Re-work it so that initially we assigned
un-packed varying locations, and then after the shader is compiled scan
to find actual used inputs, and re-pack.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark 91a1354cd6 freedreno/ir3: re-indent comment
Make it more clear that it applies to the following 'case' statements,
rather than the previous one.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark 26e2906382 freedreno/ir3: reads/writes to unrelated arrays are not dependent
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-28 14:36:24 -04:00
Rob Clark d71ce69d9c freedreno/ir3: sched fix
Not sure why new-style frag inputs start triggering this.  But we
probably shouldn't consider src's from other blocks.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-28 14:36:24 -04:00
Kristian H. Kristensen 107a8ec3b3 freedreno/ir3: Add workaround for VS samgq
This instruction needs a workaround when used from vertex shaders.

Fixes:

  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2dshadow_vertex
  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_fixed_vertex
  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_float_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler2dshadow_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_fixed_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_float_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-28 10:26:32 -07:00
Kristian H. Kristensen f30d4a1cca freedreno/ir3: Don't access beyond available regs
emit_cat5() needs to check if the last optional reg is there before it
accesses it.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-28 10:26:32 -07:00
Kristian H. Kristensen 893425a607 freedreno/ir3: Push UBOs to constant file
We have a rather big constant file and it seems that the best way to
use it is to upload all UBOs and lower UBO access the load_uniform.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-27 13:26:02 -07:00
Kristian H. Kristensen 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
This commit turns on the gallium cap and adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics and
adjust the backend where the cap switches units from vec4s to dwords.

As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:

  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-27 13:26:02 -07:00
Kristian H. Kristensen c7c432738a freedreno/ir3: Fix operand order for DSX/DSY
Most cat5 instructions are constructed using ir3_SAM, which uses
regs[1] for the (sampler, tex) src. Not DSX/DSY though, so we look up
src1 and src2 differently for those two.

Fixes: 1dffb089 ("freedreno/ir3: fix sam.s2en encoding")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-25 18:36:48 -07:00
Kristian H. Kristensen a752422bd4 freedreno/ir3: Track whether shader needs derivatives
In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we
started counting number of samplers based on the uniform vars instead
of number of cat5 instructions.  We used the number of samplers to
determine whether to enable derivatives, but when we only use
derivatives and no samplers, that now breaks.  Track whether we need
derivatives explicitly and use that to enable the state.

Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-25 18:36:48 -07:00
Samuel Pitoiset 23d30f4099 spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.

While we are at it, factorize a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:46 +01:00
Rob Clark bf5a92811d freedreno/ir3: disable early-z for SSBO/image writes
Fixes:

dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil_fbo

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-22 08:53:28 -04:00
Rob Clark dbac1a80d1 freedreno/ir3: rename has_kill to no_earlyz
There are other cases where we need to disable early-z, like image
writes.  So rename to something more generic.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-22 08:53:28 -04:00
Rob Clark 6e781a01b9 freedreno/ir3: dynamic UBO indexing vs 64b pointers
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
and similar things with multiple UBOs

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 2e01c534f4 freedreno/ir3: fix bit_count
Seems like it can only work 16b at a time.  Fixes
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitcount.*

TODO need to check if this limitation applies to a3xx as well.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 3d8349048b freedreno/ir3: additional lowering
For some things that show up when we expose higher glsl

TODO check blob traces to see if we have instructions for some of this?
I guess we don't but worth a check..

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark bcd81d2387 freedreno/ir3: optimize sam.s2en to sam
Detect when sampler/texture idx are immediate and switch to non s2en
encoding.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 1443694ee5 freedreno/ir3: enable indirect tex/samp (sam.s2en)
For now it uses indirect for everything.  The next step is for the
ir3_cp pass to detect the case that tex and samp idx are immediate
and convert the sam instruction back to the non .s2en variant.  But
doing that in a following patch so we can shake out the bugs with
.s2en more easily.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 1088b788d8 freedreno/ir3: find # of samplers from uniform vars
When we have indirect samplers, we cannot tell the max sampler
referenced.  Instead just refer to the number of sampler uniforms.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 8eb16ae8bf freedreno/ir3: fix regmask for merged regs
On a6xx+ with half-regs conflicting with full-regs, the legalize pass
needs to set appropriate sync bits, such as (sy), on writes to full regs
that conflict with half regs, and visa-versa.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 1dffb089f9 freedreno/ir3: fix sam.s2en encoding
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 45b7a581b4 freedreno/ir3: fix sam.s2en decoding
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark 2d31cf9d3b freedreno/ir3/ra: fix half-class conflicts
On a6xx, half-regs conflict with full-regs.  But we were only setting up
conflicts for the first class (ie. scalar, but not hvec2/hvec3/hvec4),
resulting in higher half-reg classes getting assigned to regs that
overwrite full-regs.

Noticed while trying to enable indirect-sampler (sam.s2en) which uses an
hvec2 argument to pass the sampler/tex index.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark cc5ca9391c freedreno/ir3 better cat6 encoding detection
These two bits seem to be a better way to detect which encoding we are
looking at.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Jason Ekstrand 08f804ec0c anv,radv,turnip: Lower TG4 offsets with nir_lower_tex
v2: turn on for turnip as well (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Rob Clark 70904eb99a freedreno/ir3/a6xx: fix ssbo comp_swap
One line left out of the conversion to ir3 ssbo intrinsics on a6xx.

Fixes: 2e4525883f ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-20 11:48:13 -04:00
Bas Nieuwenhuizen 42ed6d9789 turnip: Deconflict vk_format_table regeneration
Avoids

src/freedreno/vulkan/meson.build:42:0: ERROR:  Tried to create target "vk_format_table.c", but a target of that name already exists.

when building both radv and turnip.

Fixes: 26380b3a9f "turnip: Add driver skeleton (v2)"
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-16 14:38:51 +00:00
Bas Nieuwenhuizen e1161d2ea7 turnip: Fix GCC compiles.
Apparently GCC does not consider static const variables to be
integer constants, and hence the array size and the static assert
result in compile failures.

Fixes: 4b9f967cd1 "turnip: add a more complete format table"
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-16 14:38:51 +00:00
Rob Clark ca11f9263e freedreno/ir3/cp: fix ldib bug
Something that we didn't hit earlier because of the extra shr.b

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-15 10:52:11 -07:00
Eduardo Lima Mitev 759ceda07e ir3/lower_io_offsets: Try propagate SSBO's SHR into a previous shift instruction
While we lack value range tracking, this patch tries to 'manually' propogate
the division by 4 to calculate SSBO element-offset, into a possible previous
shift operation (shift left or right); checking that it is safe to do so.

This should help in cases like ie. when accessing a field in an array of
structs, where the offset is likely defined as base plus a multiplication
by a struct or array element size.

See dEQP test 'dEQP-GLES31.functional.ssbo.atomic.xor.highp_uint'
for an example of a shader that benefits from this.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Eduardo Lima Mitev 2e4525883f ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics
These intrinsics have the offset in dwords already computed in the last
source, so the change here is basically using that instead of emitting
the ir3_SHR to divide the byte-offset by 4.

The improvement in shader stats is significant, of up to ~15% in
instruction count in some cases. Tested only on a5xx.

shader-db is unfortunately not very useful here because shaders that use
SSBO require GLSL versions that are not supported by freedreno yet.

For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*'
are helped.

A random case:

dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2

with current master:

; CL prog 14/1: 1252 instructions, 0 half, 48 full
; 8 const, 8 constlen
; 61 (ss), 43 (sy)

with the SSBO dword-offset moved to NIR:

; CL prog 14/1: 1053 instructions, 0 half, 45 full
; 7 const, 7 constlen
; 34 (ss), 73 (sy)

The SHR previously emitted for every single SSBO instruction disappears
in most cases, and the dword-offset ends up embedded in the STGB
instruction as immediate in many cases as well.

There are also a few of those tests that are currently failing on register
allocation, that start to pass as a result of reducing the pressure. At least
these, probably more:

dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7

No regressions observed with relevant CTS and piglit tests.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Eduardo Lima Mitev 9dd0cfafc9 ir3/nir: Add a new pass 'ir3_nir_lower_io_offsets'
This NIR->NIR pass implements offset computations that are currently
done on the IR3 backend compiler, to give NIR a better chance of
optimizing them.

For now, it supports lowering the dword-offset computation for SSBO
instructions. It will take an SSBO intrinsic and replace it with the
new ir3-specific version that adds an extra source. That source will
hold the SSA value resulting from inserting a division by 4 (an SHR op)
of the original byte-offset source already provided by NIR in one of
the intrinsic sources.

Note that on a6xx the original byte-offset is not needed, so we could
potentially replace that source instead of adding a new one. But to
keep things simple and consistent we always add the new source and
a6xx will just ignore the original one.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Chia-I Wu 24af64baa5 turnip: preliminary support for Wayland WSI 2019-03-11 10:02:13 -07:00
Chia-I Wu ae82b5df88 turnip: preliminary support for tu_GetImageSubresourceLayout 2019-03-11 10:02:13 -07:00
Chad Versace 6cb5fd0d71 turnip: Use Vulkan 1.1 names instead of KHR
That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.
2019-03-11 10:02:13 -07:00