Commit Graph

115979 Commits

Author SHA1 Message Date
Alyssa Rosenzweig 8462e82467 pan/midgard: Implement load/store pairing
We can bundle two load/store together. This eliminates the need for
explicit load/store pairing in a prepass, as well.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 7cf4932410 pan/midgard: Extend csel_swizzle to branches
Conditions for branches don't have a swizzle explicitly in the emitted
binary, but they do implicitly get swizzled in whatever instruction
wrote r31, so we need to handle that.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig c9ce5a92a0 pan/midgard: Add helpers for scheduling conditionals
Conditional instructions (csel and conditional branches) require their
condition to be written to a special condition pipeline register (r31.w
for scalar, r31.xyzw for vector). However, pipeline registers are live
only for the duration of a single bundle. As such, the logic to schedule
conditionals correct is surprisingly complex. Essentially, we see if we
could stuff the conditional within the same bundle as the csel/branch
without breaking anything; if we can, we do that. If we can't, we add a
dummy move to make room.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 6f92288e85 pan/midgard: Implement predicate->unit
This allows ALUs to select for each unit of the bundle separately.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 5a9a48b81a pan/midgard: Add predicate->exclude
A bit of a kludge but allows setting an implicit dependency of synthetic
conditional moves on the actual condition, fixing code generated like:

   vmul.feq r0, ..
   sadd.imov r31, .., r0
   vadd.fcsel [...]

The imov runs simultaneous with feq so it gets garbage results, but it's
too late to add an actual dependency practically speaking, since the new
synthetic imov doesn't have a node associated.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 6284f3ec25 pan/midgard: Add constant intersection filters
In the future, we will want to keep track of which components of
constants of various sizes correspond to which parts of the bundle
constants, like in the old scheduler. For now, let's just stub it out
for a simple rule of one instruction with embedded constants per bundle.
We can eventually do better, of course.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 941bdd2088 pan/midgard: Remove csel constant unit force
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig da18525b6f pan/midgard: Add mir_schedule_texture/ldst/alu helpers
We don't actually do any scheduling here yet, but add per-tag helpers to
consume an instruction, print it, pop it off the worklist.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 72a03bcafa pan/midgard: Add mir_choose_bundle helper
It's not always obvious what the optimal bundle type should be. Let's
break out the logic to decide.

Currently set for purely in-order operation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig b5396369d2 pan/midgard: Add mir_update_worklist helper
After we've chosen an instruction, popped it off, and processed it, it's
time to update the worklist, removing that instruction from the
dependency graph to allow its dependents to be put onto the worklist.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 826fd7308b pan/midgard: Add mir_choose_instruction stub
In the future, this routine will implement the core scheduling logic to
decide which instruction out of the worklist will be scheduled next, in
a way that minimizes cycle count and register pressure.

In the present, we are more interested in replicating in-order
scheduling with the much-more-powerful out-of-order model. So rather
than discriminating by a register pressure estimate, we simply choose
the latest possible instruction in the worklist.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig f48038b588 pan/midgard: Initialize worklist
This flows naturally from the dependency graph

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig a3b46c0db6 pan/midgard: Calculate dependency graph
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig adda411263 pan/midgard: Add flatten_mir helper
We would like to flatten a linked list of midgard_instructions into an
array of midgard_instruction pointers on the heap.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig 0ecfcbf462 pan/midgard: Squeeze indices before scheduling
This allows node_count to be correct while scheduling.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig ad05e8a52c pan/midgard: Fix component count handling for ldst
It's not based on the writemask and it can't be inferred; it's just
intrinsic to the op itself.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig cc0544a0f5 pan/midgard: Add missing parans in SWIZZLE definition
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-09-30 08:40:11 -04:00
Daniel Schürmann b3c1f601aa nouveau: set lower_sub = true
Subtractions are already implemented as additions anyway.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Eric Anholt ca1aa5d225 v3d: Enable the late algebraic optimizations to get real subs.
This worked better than my original v3d-local pass for just subs, and is a
huge win over not producing subs.

total instructions in shared programs: 6408469 -> 6167932 (-3.75%)
total threads in shared programs: 153784 -> 154104 (0.21%)
total uniforms in shared programs: 2157078 -> 1905823 (-11.65%)
total max-temps in shared programs: 904546 -> 895796 (-0.97%)
total spills in shared programs: 4959 -> 4993 (0.69%)
total fills in shared programs: 6558 -> 6670 (1.71%)
total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%)
total inst-and-stalls in shared programs: 6434314 -> 6193107 (-3.75%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann 1d29895e5b aco: call nir_opt_algebraic_late() exhaustively
57559 shaders in 28980 tests
Totals:
SGPRS: 2963407 -> 2959935 (-0.12 %)
VGPRS: 2014812 -> 2016328 (0.08 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 114545436 -> 114498084 (-0.04 %) bytes
LDS: 933 -> 933 (0.00 %) blocks
Max Waves: 375997 -> 375866 (-0.03 %)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann 0fb27f1e5a radv/aco: Don't lower subtractions
40228 shaders in 20236 tests
Totals:
SGPRS: 2045512 -> 2046496 (0.05 %)
VGPRS: 1430856 -> 1430464 (-0.03 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 77202840 -> 77151832 (-0.07 %) bytes
LDS: 863 -> 863 (0.00 %) blocks
Max Waves: 260729 -> 260754 (0.01 %)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann 239423d234 nir: Remove unnecessary subtraction optimizations
These optimizations are already covered after lowering.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann 99848a57b7 nir: recombine nir_op_*sub when lower_sub = false
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombine for backends
which prefer this option, we don't have to implement them twice.

This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Daniel Schürmann 10e508c815 freedreno: Enable the nir_opt_algebraic_late() pass.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Eric Anholt d54ae70ee7 vc4: Enable the nir_opt_algebraic_late() pass.
Upcoming changes to sub optimization will make this pass required.  Over
the course of that series, we see uniforms +.46%, instructions -.24%
(seems like a fine tradeoff -- uniforms are 1/2 the size of instructions
as far as cache occupancy)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-30 09:44:10 +00:00
Michel Dänzer f2b8051d69 gitlab-ci: Add test-container:arm64 to needs: for arm64 test jobs
Without this, it was theoretically possible for the jobs to run before
the docker image was ready.

v2:
* Use - list syntax instead of [] (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-30 09:17:44 +02:00
Michel Dänzer 42a18280e4 gitlab-ci: Add needs: for x86 buster docker image
This allows most build jobs to run before the stretch or arm64 docker
images are ready.

v2:
* Use - list syntax instead of [] (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-30 09:17:38 +02:00
Michel Dänzer 88319f2678 gitlab-ci: Declare needs: for stretch docker image
This allows the *-old-llvm jobs to run before the buster docker images
are ready.

v2:
* Use - list syntax instead of [] (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-30 09:17:00 +02:00
pal1000 ffb0d3a25c scons: Fix MSYS2 Mingw-w64 build.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

This patch is based on 28e3f85e09/mingw-w64-mesa/link-ole32.patch but with tweaks to avoid MSVC build break when applied.

v2: Create Mingw platform alias pointing to windows host platform define to avoid spurious crosscompilation;

v3: Fix obviously wrong compiler flags for swr driver;

v4: Update original patch URL because it has been relocated;

v5: Don't bother patching autools stuff as it's not used by MSYS2 Mingw-w64 build and it's days are numbered anyway;

v6: After Mingw posix flag fix in 295851eb things are far simpler as we don't need more linking of uuid, ole32, version and shell32 than what is already in place.
2019-09-29 10:57:16 +01:00
pal1000 bcb4dfb14b scons/windows: Support build with LLVM 9.
As X86AsmPrinter component is gone, LLVMX86AsmPrinter got replaced
with LLVMRemarks, LLVMBitstreamReader and LLVMDebugInfoDWARF.

Tests done with llvm-config on both LLVM 8 and 9 indicate that
mcjit, bitwriter and x86asmprinter fully fit inside engine component.

On other platforms and with meson build mcdisassembler was used to replace
X86AsmPrinter but mcdisassembler also fully fits inside engine component
for LLVM>=8 according to same tests.

v2: Avoid duplicating code related to Mingw pthreads.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>

On 19.1 this patch does not apply cleanly without 88eb2a1f
2019-09-29 10:51:34 +01:00
Vasily Khoruzhick 336b021d36 lima: set uniforms_address lower bits properly
Looks like blob uses following values for uniforms buffer:

0 for 8 bytes
1 for 16 bytes
2 for 24 bytes
2 for 32 bytes
3 for 40 bytes
3 for 48 bytes
3 for 56 bytes
3 for 64 bytes
4 for 72 bytes

It all looks like log2(size / 8) rounded up, so let's do the same.

Fixes: 931fc2a7b3f9("lima: do not set the PP uniforms address lowest bits")
Reviewed-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-28 10:34:19 -07:00
Michel Zou 3f92d17894 scons: add py3 support
SCons 3.1 has moved to python 3, requiring this fix
to continue supporting scons builds.

Closes: #944
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Eric Engestrom <eric@engestrom.ch>
2019-09-28 16:53:08 +00:00
Mauro Rossi 411e50a8fd android: aco: add support for libmesa_aco
Android building rules are added in src/amd/Android.compiler.mk
libmesa_aco static library is built conditionally to radeonsi
as done for vulkan.radv module

This will prevent Android build errors for non x86 systems

filter-out compiler/aco_instruction_selection_setup.cpp source,
as already included by compiler/aco_instruction_selection.cpp
and would cause several multiple definition linker errors

NOTE: libLLVM requires AMDGPU Disassembler to build radv with aco

Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Fixes: a70a998 ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2019-09-28 15:56:34 +02:00
Mauro Rossi 268fb10e9c android: compiler/nir: build nir_divergence_analysis.c
Prerequisite to avoid following radv linking error happening with aco

FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so
...
external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:178:
error: undefined reference to 'nir_divergence_analysis'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: df86c5f ("nir: add divergence analysis pass.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2019-09-28 15:56:28 +02:00
Mauro Rossi c24ad565ae android: aco: fix undefined template 'std::__1::array' build errors
Fixes a few building errors similar to the following:

In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26:
In file included from external/libcxx/include/algorithm:639:
external/libcxx/include/utility:321:9:
error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>'
    _T2 second;
        ^

Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
2019-09-28 15:56:23 +02:00
Jonathan Marek b38fcaa221 etnaviv: nir: fix gl_FragDepth
Fixes the following piglit test: fragdepth_gles2 (for ETNA_MESA_DEBUG=nir)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:34:44 -04:00
Jonathan Marek d4e35e62d2 etnaviv: disable earlyZ when shader writes fragment depth
Fixes the following piglit test: fragdepth_gles2

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:34:43 -04:00
Jonathan Marek dc3656c9c4 etnaviv: nir: make lower_alu easier to follow
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:34:43 -04:00
Jonathan Marek c4f63be5a6 etnaviv: remove extra allocation for shader code
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:34:43 -04:00
Jonathan Marek 0b3957331d etnaviv: nir: remove "options" struct
It just makes thing more complicated for no reason.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:34:43 -04:00
Jonathan Marek 8f1b2ea7a9 etnaviv: nir: use store_deref instead of store_output
Allows some simplification.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:34:43 -04:00
Jonathan Marek d92689c46f etnaviv: nir: add native integers (HALTI2+)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:34:35 -04:00
Jonathan Marek d446134d2a qetnaviv: nir: use new immediates when possible
Note it can still be improved a bit:
* Use alu swizzle to determine if src is scalar
* Take into account new immediates in the multiple uniform src lowering

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:33:42 -04:00
Jonathan Marek 95fa799c86 etnaviv: nir: set num_components for inputs/outputs
This can improve performance by allowing the LAST_VARYING_2X bit to be
set when possible (and possibility more benefits on HALTI5 where the
number of components is set for each varying).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:33:42 -04:00
Jonathan Marek 0036e078e3 etnaviv: nir: allocate contiguous components for LOAD destination
LOAD starts reading into the first enabled destination component, and
doesn't skip disabled components, so we need to allocate a destination with
contiguous components.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:33:42 -04:00
Jonathan Marek 7da15bdd2d etnaviv: nir: fix gl_FrontFacing
Only invert front facing when glFrontFace is GL_CW.

Fixes following deqp test:
dEQP-GLES2.functional.shaders.builtin_variable.frontfacing

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-09-28 00:33:33 -04:00
Icenowy Zheng 931fc2a7b3 lima: do not set the PP uniforms address lowest bits
The PP uniforms address register in render state is not a direct pointer
to the uniforms storage -- instead, it points to an one-item array, and
the array item is the real pointer to the uniforms storage.

This register reuses some of its LSBs as a size field. Currently the
size is set according to the length of the real uniforms storage.
However, as the register itself contains only a pointer to the one-item
array, the size field should be set to the length of the one-item array
and subtract it by 1, which means a fixed value of 0. That means we can
just omit it now.

Test shows this should be the correct approach to set this register.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-28 08:49:20 +08:00
Andrii Simiklit b32bb888c7 glsl: disallow incompatible matrices multiplication
glsl 4.4 spec section '5.9 expressions':
"The operator is multiply (*), where both operands are matrices or one operand is a vector and the
 other a matrix. A right vector operand is treated as a column vector and a left vector operand as a
 row vector. In all these cases, it is required that the number of columns of the left operand is equal
 to the number of rows of the right operand. Then, the multiply (*) operation does a linear
 algebraic multiply, yielding an object that has the same number of rows as the left operand and the
 same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations”
 explains in more detail how vectors and matrices are operated on."

This fix disallows a multiplication of incompatible matrices like:
mat4x3(..) * mat4x3(..)
mat4x2(..) * mat4x2(..)
mat3x2(..) * mat3x2(..)
....

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
2019-09-27 21:42:09 +00:00
Eric Anholt 67e8977290 turnip: Fix failure behavior of vkCreateGraphicsPipelines.
According to the 1.1.123 spec:

    "The implementation will attempt to create all pipelines, and only
     return VK_NULL_HANDLE values for those that actually failed."

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-09-27 13:34:28 -07:00
Eric Anholt ab3cf128a6 turnip: Silence compiler warning about uninit pipeline.
The code was fine as far as I see, but the warning was irritating.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-09-27 13:34:28 -07:00