Commit Graph

106558 Commits

Author SHA1 Message Date
Rob Clark f1c88336e6 freedreno: skip depth resolve if not written
For multi-pass rendering, it is common to keep the same depth buffer
from previous pass, to discard geometry that would be hidden by later
draws.  In the later passes with depth-test enabled, but depth-write
disabled, there is no reason to do gmem2mem resolve.

TODO probably do something similar for stencil.. although stencil
buffer isn't used as commonly these days

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-01-03 08:09:24 -05:00
Timothy Arceri 4d3f6cb973 nir: merge some basic consecutive ifs
After trying multiple times to merge if-statements with phis
between them I've come to the conclusion that it cannot be done
without regressions. The problem is for some shaders we end up
with a whole bunch of phis for the merged ifs resulting in
increased register pressure.

So this patch just merges ifs that have no phis between them.
This seems to be consistent with what LLVM does so for radeonsi
we only see a change (although its a large change) in a single
shader.

Shader-db results i965 (SKL):

total instructions in shared programs: 13098176 -> 13098152 (<.01%)
instructions in affected programs: 1326 -> 1302 (-1.81%)
helped: 4
HURT: 0

total cycles in shared programs: 332032989 -> 332037583 (<.01%)
cycles in affected programs: 60665 -> 65259 (7.57%)
helped: 0
HURT: 4

The cycles estimates reported by shader-db for i965 seem inaccurate
as the only difference in the final code is the removal of the
redundent condition evaluations and jumps.

Also the biggest code reduction (~7%) for radeonsi was in a tomb
raider tressfx shader but for some reason this does not get merged
for i965.

Shader-db results radeonsi (VEGA):

Totals from affected shaders:
SGPRS: 232 -> 232 (0.00 %)
VGPRS: 164 -> 164 (0.00 %)
Spilled SGPRs: 59 -> 59 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14584 -> 13520 (-7.30 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 13 -> 13 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-01-03 15:17:16 +11:00
Timothy Arceri 19cafe8084 nir: add rewrite_phi_predecessor_blocks() helper
This will also be used by the if merge pass in the following commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-01-03 15:17:16 +11:00
Timothy Arceri 5122fbc4ba nir: simplify does_varying_match()
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-01-03 11:47:56 +11:00
Timothy Arceri 8d05ee2005 nir: make use of does_varying_match() helper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-01-03 11:47:56 +11:00
Timothy Arceri 0016166d19 nir: make nir_opt_remove_phis_impl() static
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-01-03 11:47:56 +11:00
Eric Anholt d2b899c0ec v3d: Refactor compiler entrypoints.
Before, I had per-stage entryoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
2019-01-02 14:12:29 -08:00
Eric Anholt 0805060573 v3d: Handle dynamically uniform IF statements with uniform control flow.
Loops will be trickier, since we need some analysis to figure out if the
breaks/continues inside are uniform.  Until we get that in NIR, this gets
us some quick wins.

total instructions in shared programs: 6192844 -> 6174162 (-0.30%)
instructions in affected programs: 487781 -> 469099 (-3.83%)
2019-01-02 14:12:29 -08:00
Eric Anholt 5e9ee6e841 v3d: Fold comparisons for IF conditions into the flags for the IF.
total instructions in shared programs: 6193810 -> 6192844 (-0.02%)
instructions in affected programs: 800373 -> 799407 (-0.12%)
2019-01-02 14:12:29 -08:00
Eric Anholt 078dc176bc v3d: Don't try to fold non-SSA-src comparisons into bcsels.
There could have been a write of a src in between the comparison and the
bcsel that would invalidate the comparison.
2019-01-02 14:12:29 -08:00
Eric Anholt 2e0433b687 v3d: Move the "Find the ALU instruction generating our bool" out of bcsel.
This will be reused for if statements.
2019-01-02 14:12:29 -08:00
Eric Anholt c3ae0aa264 v3d: Simplify the emission of comparisons for the bcsel optimization.
I wanted to reuse the comparison stuff for nir_ifs, but for that I just
want the flags and no destination value.  Splitting the conditions from
the destinations ended up cleaning the existing code up, anyway.
2019-01-02 14:12:29 -08:00
Eric Anholt 49d8e2aff1 v3d: Don't forget to include RT writes in precompiles.
Looking at some assembly dumps for an optimization, we were clearly
missing important parts of the shader!
2019-01-02 14:12:29 -08:00
Eric Anholt 3a81c753a3 v3d: Fix segfault when failing to compile a program.
We'll still fail at draw time, but this avoids a regression in shader-db
execution once I enable TLB writes in precompiles.

Fixes: b38e4d313f ("v3d: Create a state uploader for packing our shaders together.")
2019-01-02 14:12:29 -08:00
Marek Olšák 3ae57957be radeonsi: always unmap texture CPU mappings on 32-bit CPU architectures
Team Fortress 2 32-bit version runs out of the CPU address space.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:59 -05:00
Marek Olšák edfca1f8dc radeonsi: remove unused variables in si_insert_input_ptr
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:58 -05:00
Marek Olšák cba475b3e7 radeonsi: use u_decomposed_prims_for_vertices instead of u_prims_for_vertices
It seems to be the same, but this doesn't use integer division with
a variable divisor.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:56 -05:00
Marek Olšák 54bc87469a radeonsi: make si_cp_wait_mem more configurable
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:54 -05:00
Marek Olšák 9d2c3a1fe0 radeonsi: call si_fix_resource_usage for the GS copy shader as well
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:53 -05:00
Marek Olšák d28e208213 radeonsi: don't emit redundant PKT3_NUM_INSTANCES packets
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-01-02 15:01:50 -05:00
Caio Marcelo de Oliveira Filho 7d6babf995 nir: add a way to print the deref chain
Makes debugging easier when we care about the deref chain and not the
deref instruction itself.  To make it take a const pointer, constify
some of the static functions in nir_print.c.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 10:09:04 -08:00
Dylan Baker a2596450ac meson: Error out if building nouveau and using LLVM without rtti
Nouveau requires rtti. Often LLVM is configured without rtti, and code
with and without cannot be linked safely. Lets just error out if nouveau
is requested and llvm is built without rtti.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109202
Fixes: c5a97d658e
       ("meson: fix builds against LLVM built without rtti")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-02 09:30:12 -08:00
Alexander von Gluck IV 1b97a72328 egl/haiku: Fix reference to disp vs dpy
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 00992700c9 "egl: set the EGLDevice when creating a display"
2019-01-02 13:45:09 +00:00
Iago Toral Quiroga ec79069856 compiler/spirv: use 32-bit polynomial approximation for 16-bit asin()
The 16-bit polynomial execution doesn't meet Khronos precision requirements.
Also, the half-float denorm range starts at 2^(-14) and with asin taking input
values in the range [0, 1], polynomial approximations can lead to flushing
relatively easy.

An alternative is to use the atan2 formula to compute asin, which is the
reference taken by Khronos to determine precision requirements, but that
ends up generating too many additional instructions when compared to the
polynomial approximation. Specifically, for the Intel case, doing this
adds +41 instructions to the program for each asin/acos call, which looks
like an undesirable trade off.

So for now we take the easy way out and fallback to using the 32-bit
polynomial approximation, which is better (faster) than the 16-bit atan2
implementation and gives us better precision that matches Khronos
requirements.

v2:
 - Fallback to 32-bit using recursion (Jason).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:39 +01:00
Iago Toral Quiroga fda3f6d424 compiler/spirv: implement 16-bit frexp
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:35 +01:00
Iago Toral Quiroga 7d3c34197a compiler/spirv: implement 16-bit hyperbolic trigonometric functions
v2:
 - use nir_fadd_imm and nir_fmul_imm helpers (Jason)

v3:
 - since we need to define one for fsub use it for fdiv too (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga 88663ba67c compiler/spirv: implement 16-bit exp and log
v2
 - use nir_fmul_imm helper (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga f18554e2ce compiler/spirv: implement 16-bit atan2
v2:
 - fix huge_val for 16-bit, it was mean't to be 2^14 not 10^14.

v3:
 - rebase on top of new bool sized opcodes
 - use nir_b2f helper
 - use nir_fmul_imm helper

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga 1c8de08ec9 compiler/spirv: implement 16-bit atan
v2:
 - use nir_fadd_imm and nir_fmul_imm helpers (Jason)
 - rebased on top of new sized boolean opcodes
 - use nir_b2f helper

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga df118535ca compiler/spirv: implement 16-bit acos
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga dbbbe24d76 compiler/spirv: implement 16-bit asin
v2:
  - use nir_fmul_imm and nir_fadd_imm helpers (Jason)

v3:
 - missed one case where we need to replace nir_imm_float
   with nir_imm_floatN_t (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga 95b7c29c2c compiler/spirv: handle 16-bit float in radians() and degrees()
v2:
 - use nir_imm_fmul helper (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga aeee683780 compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Iago Toral Quiroga 5fc9ad1cb0 compiler/nir: add a nir_b2f() helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-02 07:54:05 +01:00
Timothy Arceri 70be9afccb nir: link time opt duplicate varyings
If we are outputting the same value to more than one output
component rewrite the inputs to read from a single component.

This will allow the duplicate varying components to be optimised
away by the existing opts.

shader-db results i965 (SKL):

total instructions in shared programs: 12869230 -> 12860886 (-0.06%)
instructions in affected programs: 322601 -> 314257 (-2.59%)
helped: 3080
HURT: 8

total cycles in shared programs: 317792574 -> 317730593 (-0.02%)
cycles in affected programs: 2584925 -> 2522944 (-2.40%)
helped: 2975
HURT: 477

shader-db results radeonsi (VEGA):

SGPRS: 31576 -> 31664 (0.28 %)
VGPRS: 17484 -> 17064 (-2.40 %)
Spilled SGPRs: 184 -> 167 (-9.24 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 583340 -> 569368 (-2.40 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 6162 -> 6270 (1.75 %)
Wait states: 0 -> 0 (0.00 %)

vkpipeline-db results RADV (VEGA):

Totals from affected shaders:
SGPRS: 14880 -> 15080 (1.34 %)
VGPRS: 10872 -> 10888 (0.15 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 674016 -> 668396 (-0.83 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 2708 -> 2704 (-0.15 %)
Wait states: 0 -> 0 (0.00 %

V2: bunch of tidy ups suggested by Jason

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri d828694b80 nir: rework nir_link_opt_varyings()
This just cleans things up a little and make things more safe for
derefs.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri c0aba8b0dc nir: add can_replace_varying() helper
This will be reused by the following patch.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri 50de3f80a8 nir: rename nir_link_constant_varyings() nir_link_opt_varyings()
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri 0a4378ce56 st/glsl_to_nir: call nir_lower_load_const_to_scalar() in the st
This will help the new opt introduced in the following patches
allowing us to remove extra duplicate varyings.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Timothy Arceri 2ef0f944f5 radeonsi: make use of ac_are_tessfactors_def_in_all_invocs()
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-02 10:01:31 +11:00
Timothy Arceri 2832bc972b ac/nir_to_llvm: add ac_are_tessfactors_def_in_all_invocs()
The following patch will use this with the radeonsi NIR backend
but I've added it to ac so we can use it with RADV in future.

This is a NIR implementation of the tgsi function
tgsi_scan_tess_ctrl().

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-02 10:01:24 +11:00
Timothy Arceri 2817a4ec0b radeonsi: remove unrequired param in si_nir_scan_tess_ctrl()
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-02 10:01:15 +11:00
Timothy Arceri 4dda445750 tgsi/scan: correctly walk instructions in tgsi_scan_tess_ctrl()
The previous code used a do while loop and continues after walking
a nested loop/if-statement. This means we end up evaluating the
last instruction from the nested block against the while condition
and potentially exit early if it matches the exit condition of the
outer block.

Fixes: 386d165d8d ("tgsi/scan: add a new pass that analyzes tess factor writes")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-02 09:53:01 +11:00
Timothy Arceri dd061eb044 tgsi/scan: fix loop exit point in tgsi_scan_tess_ctrl()
This just happened not to crash/assert because all loops have at
least 1 if-statement and due to a second bug we end up matching
the same ENDIF to exit both the iteration over the if-statment
and the loop.

The second bug is fixed in the following patch.

Fixes: 386d165d8d ("tgsi/scan: add a new pass that analyzes tess factor writes")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-02 09:53:01 +11:00
Ilia Mirkin 8f98ff362c nv30: disable rendering to 3D textures
There's no way to tell the 3D engine about swizzling on such textures.
While rendering to NPOT ones may be possible, there's no great way to
expose that in gallium, nor would there be any practical benefit.

Fixes the non-compressed-format "copyteximage 3D" failures. Something
odd going on with the compressed formats.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-01-01 15:11:14 -05:00
Bas Nieuwenhuizen 8c93ef5de9 radv: Do a cache flush if needed before reading predicates.
This caused random failures for two conditional rendering tests:

dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard

These wrote the predicate with the vertex shader, did a barrier and then
started the conditional rendering. However the cache flushes for the barrier
only happen on first draw, so after the predicate has been read.

Fixes: e45ba51ea4 "radv: add support for VK_EXT_conditional_rendering"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-12-31 20:52:08 +01:00
Erik Faye-Lund 86089a7316 anv/autotools: make sure tests link with -msse2
Without this, I get the following error when building the tests with
autotools on i686:

---8<---
src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration]
       __builtin_ia32_clflush(p);
       ^~~~~~~~~~~~~~~~~~~~~~
       __builtin_ia32_pause
src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration]
    __builtin_ia32_mfence();
    ^~~~~~~~~~~~~~~~~~~~~
    __builtin_ia32_fnclex
---8<---

The erros are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c

This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Eric Engeström <eric@engestrom.ch>
2018-12-31 17:28:21 +01:00
Erik Faye-Lund 89679e18a9 anv/meson: make sure tests link with -msse2
Without this, I get the following error when building the tests using
meson on i686:

---8<---
In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46,
                 from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26:
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration]
       __builtin_ia32_clflush(p);
       ^~~~~~~~~~~~~~~~~~~~~~
       __builtin_ia32_pause
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration]
    __builtin_ia32_mfence();
    ^~~~~~~~~~~~~~~~~~~~~
    __builtin_ia32_fnclex
---8<---

The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c

This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
2018-12-31 17:27:33 +01:00
Ilia Mirkin 207fb558e4 nv30: fix some s3tc layout issues
s3tc layouts are a bit finicky - they're packed, but not swizzled.
Adjust logic to allow for that case:

  - Don't set a uniform pitch for POT-sized compressed textures
  - Adjust define_rect API to be less confused about block sizes
  - Only mark a texture as linear if it has a uniform pitch set

This has been tested to fix xonotic (as well as the s3tc-* piglits)
on nv3x and keeps it working on nv4x.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-30 23:32:21 -05:00
Ilia Mirkin ad251330e8 nv30: use correct helper to get blocks in y direction
This doesn't matter since all compressed formats supported by this
hardware use square blocks, but best to use the correct helper.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-12-30 23:32:21 -05:00