KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Alyssa Rosenzweig	8462e82467	pan/midgard: Implement load/store pairing We can bundle two load/store together. This eliminates the need for explicit load/store pairing in a prepass, as well. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	7cf4932410	pan/midgard: Extend csel_swizzle to branches Conditions for branches don't have a swizzle explicitly in the emitted binary, but they do implicitly get swizzled in whatever instruction wrote r31, so we need to handle that. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	c9ce5a92a0	pan/midgard: Add helpers for scheduling conditionals Conditional instructions (csel and conditional branches) require their condition to be written to a special condition pipeline register (r31.w for scalar, r31.xyzw for vector). However, pipeline registers are live only for the duration of a single bundle. As such, the logic to schedule conditionals correct is surprisingly complex. Essentially, we see if we could stuff the conditional within the same bundle as the csel/branch without breaking anything; if we can, we do that. If we can't, we add a dummy move to make room. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	6f92288e85	pan/midgard: Implement predicate->unit This allows ALUs to select for each unit of the bundle separately. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	5a9a48b81a	pan/midgard: Add predicate->exclude A bit of a kludge but allows setting an implicit dependency of synthetic conditional moves on the actual condition, fixing code generated like: vmul.feq r0, .. sadd.imov r31, .., r0 vadd.fcsel [...] The imov runs simultaneous with feq so it gets garbage results, but it's too late to add an actual dependency practically speaking, since the new synthetic imov doesn't have a node associated. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	6284f3ec25	pan/midgard: Add constant intersection filters In the future, we will want to keep track of which components of constants of various sizes correspond to which parts of the bundle constants, like in the old scheduler. For now, let's just stub it out for a simple rule of one instruction with embedded constants per bundle. We can eventually do better, of course. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	941bdd2088	pan/midgard: Remove csel constant unit force Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	da18525b6f	pan/midgard: Add mir_schedule_texture/ldst/alu helpers We don't actually do any scheduling here yet, but add per-tag helpers to consume an instruction, print it, pop it off the worklist. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	72a03bcafa	pan/midgard: Add mir_choose_bundle helper It's not always obvious what the optimal bundle type should be. Let's break out the logic to decide. Currently set for purely in-order operation. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	b5396369d2	pan/midgard: Add mir_update_worklist helper After we've chosen an instruction, popped it off, and processed it, it's time to update the worklist, removing that instruction from the dependency graph to allow its dependents to be put onto the worklist. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	826fd7308b	pan/midgard: Add mir_choose_instruction stub In the future, this routine will implement the core scheduling logic to decide which instruction out of the worklist will be scheduled next, in a way that minimizes cycle count and register pressure. In the present, we are more interested in replicating in-order scheduling with the much-more-powerful out-of-order model. So rather than discriminating by a register pressure estimate, we simply choose the latest possible instruction in the worklist. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	f48038b588	pan/midgard: Initialize worklist This flows naturally from the dependency graph Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	a3b46c0db6	pan/midgard: Calculate dependency graph Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	adda411263	pan/midgard: Add flatten_mir helper We would like to flatten a linked list of midgard_instructions into an array of midgard_instruction pointers on the heap. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	0ecfcbf462	pan/midgard: Squeeze indices before scheduling This allows node_count to be correct while scheduling. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	ad05e8a52c	pan/midgard: Fix component count handling for ldst It's not based on the writemask and it can't be inferred; it's just intrinsic to the op itself. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:13 -04:00
Alyssa Rosenzweig	cc0544a0f5	pan/midgard: Add missing parans in SWIZZLE definition Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-09-30 08:40:11 -04:00
Daniel Schürmann	b3c1f601aa	nouveau: set lower_sub = true Subtractions are already implemented as additions anyway. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Eric Anholt	ca1aa5d225	v3d: Enable the late algebraic optimizations to get real subs. This worked better than my original v3d-local pass for just subs, and is a huge win over not producing subs. total instructions in shared programs: 6408469 -> 6167932 (-3.75%) total threads in shared programs: 153784 -> 154104 (0.21%) total uniforms in shared programs: 2157078 -> 1905823 (-11.65%) total max-temps in shared programs: 904546 -> 895796 (-0.97%) total spills in shared programs: 4959 -> 4993 (0.69%) total fills in shared programs: 6558 -> 6670 (1.71%) total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%) total inst-and-stalls in shared programs: 6434314 -> 6193107 (-3.75%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Daniel Schürmann	1d29895e5b	aco: call nir_opt_algebraic_late() exhaustively 57559 shaders in 28980 tests Totals: SGPRS: 2963407 -> 2959935 (-0.12 %) VGPRS: 2014812 -> 2016328 (0.08 %) Spilled SGPRs: 1077 -> 1077 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 10348 -> 10348 (0.00 %) dwords per thread Code Size: 114545436 -> 114498084 (-0.04 %) bytes LDS: 933 -> 933 (0.00 %) blocks Max Waves: 375997 -> 375866 (-0.03 %) Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Daniel Schürmann	0fb27f1e5a	radv/aco: Don't lower subtractions 40228 shaders in 20236 tests Totals: SGPRS: 2045512 -> 2046496 (0.05 %) VGPRS: 1430856 -> 1430464 (-0.03 %) Spilled SGPRs: 1077 -> 1077 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 10348 -> 10348 (0.00 %) dwords per thread Code Size: 77202840 -> 77151832 (-0.07 %) bytes LDS: 863 -> 863 (0.00 %) blocks Max Waves: 260729 -> 260754 (0.01 %) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Daniel Schürmann	239423d234	nir: Remove unnecessary subtraction optimizations These optimizations are already covered after lowering. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Daniel Schürmann	99848a57b7	nir: recombine nir_op_*sub when lower_sub = false There are some optimizations which are only implemented for additions and some optimizations which assume that subtractions have been lowered. By lowering all subtractions first and later recombine for backends which prefer this option, we don't have to implement them twice. This patch also moves lower_negate to nir_opt_algebraic_late() to enable these optimizations for backends which make use of it. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Daniel Schürmann	10e508c815	freedreno: Enable the nir_opt_algebraic_late() pass. Reviewed-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Eric Anholt	d54ae70ee7	vc4: Enable the nir_opt_algebraic_late() pass. Upcoming changes to sub optimization will make this pass required. Over the course of that series, we see uniforms +.46%, instructions -.24% (seems like a fine tradeoff -- uniforms are 1/2 the size of instructions as far as cache occupancy) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Michel Dänzer	f2b8051d69	gitlab-ci: Add test-container:arm64 to needs: for arm64 test jobs Without this, it was theoretically possible for the jobs to run before the docker image was ready. v2: * Use - list syntax instead of [] (Eric Engestrom) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-30 09:17:44 +02:00
Michel Dänzer	42a18280e4	gitlab-ci: Add needs: for x86 buster docker image This allows most build jobs to run before the stretch or arm64 docker images are ready. v2: * Use - list syntax instead of [] (Eric Engestrom) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-30 09:17:38 +02:00
Michel Dänzer	88319f2678	gitlab-ci: Declare needs: for stretch docker image This allows the -old-llvm jobs to run before the buster docker images are ready. v2: Use - list syntax instead of [] (Eric Engestrom) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-30 09:17:00 +02:00
pal1000	ffb0d3a25c	scons: Fix MSYS2 Mingw-w64 build. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> This patch is based on `28e3f85e09/mingw-w64-mesa/link-ole32.patch` but with tweaks to avoid MSVC build break when applied. v2: Create Mingw platform alias pointing to windows host platform define to avoid spurious crosscompilation; v3: Fix obviously wrong compiler flags for swr driver; v4: Update original patch URL because it has been relocated; v5: Don't bother patching autools stuff as it's not used by MSYS2 Mingw-w64 build and it's days are numbered anyway; v6: After Mingw posix flag fix in 295851eb things are far simpler as we don't need more linking of uuid, ole32, version and shell32 than what is already in place.	2019-09-29 10:57:16 +01:00
pal1000	bcb4dfb14b	scons/windows: Support build with LLVM 9. As X86AsmPrinter component is gone, LLVMX86AsmPrinter got replaced with LLVMRemarks, LLVMBitstreamReader and LLVMDebugInfoDWARF. Tests done with llvm-config on both LLVM 8 and 9 indicate that mcjit, bitwriter and x86asmprinter fully fit inside engine component. On other platforms and with meson build mcdisassembler was used to replace X86AsmPrinter but mcdisassembler also fully fits inside engine component for LLVM>=8 according to same tests. v2: Avoid duplicating code related to Mingw pthreads. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org> On 19.1 this patch does not apply cleanly without `88eb2a1f`	2019-09-29 10:51:34 +01:00
Vasily Khoruzhick	336b021d36	lima: set uniforms_address lower bits properly Looks like blob uses following values for uniforms buffer: 0 for 8 bytes 1 for 16 bytes 2 for 24 bytes 2 for 32 bytes 3 for 40 bytes 3 for 48 bytes 3 for 56 bytes 3 for 64 bytes 4 for 72 bytes It all looks like log2(size / 8) rounded up, so let's do the same. Fixes: 931fc2a7b3f9("lima: do not set the PP uniforms address lowest bits") Reviewed-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-28 10:34:19 -07:00
Michel Zou	3f92d17894	scons: add py3 support SCons 3.1 has moved to python 3, requiring this fix to continue supporting scons builds. Closes: #944 Cc: mesa-stable@lists.freedesktop.org Acked-by: Eric Engestrom <eric@engestrom.ch> Tested-by: Eric Engestrom <eric@engestrom.ch>	2019-09-28 16:53:08 +00:00
Mauro Rossi	411e50a8fd	android: aco: add support for libmesa_aco Android building rules are added in src/amd/Android.compiler.mk libmesa_aco static library is built conditionally to radeonsi as done for vulkan.radv module This will prevent Android build errors for non x86 systems filter-out compiler/aco_instruction_selection_setup.cpp source, as already included by compiler/aco_instruction_selection.cpp and would cause several multiple definition linker errors NOTE: libLLVM requires AMDGPU Disassembler to build radv with aco Fixes: `93c8ebf` ("aco: Initial commit of independent AMD compiler") Fixes: `a70a998` ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2019-09-28 15:56:34 +02:00
Mauro Rossi	268fb10e9c	android: compiler/nir: build nir_divergence_analysis.c Prerequisite to avoid following radv linking error happening with aco FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so ... external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:178: error: undefined reference to 'nir_divergence_analysis' clang.real: error: linker command failed with exit code 1 (use -v to see invocation) Fixes: `df86c5f` ("nir: add divergence analysis pass.") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2019-09-28 15:56:28 +02:00
Mauro Rossi	c24ad565ae	android: aco: fix undefined template 'std::__1::array' build errors Fixes a few building errors similar to the following: In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26: In file included from external/libcxx/include/algorithm:639: external/libcxx/include/utility:321:9: error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>' _T2 second; ^ Fixes: `93c8ebf` ("aco: Initial commit of independent AMD compiler") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2019-09-28 15:56:23 +02:00
Jonathan Marek	b38fcaa221	etnaviv: nir: fix gl_FragDepth Fixes the following piglit test: fragdepth_gles2 (for ETNA_MESA_DEBUG=nir) Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:34:44 -04:00
Jonathan Marek	d4e35e62d2	etnaviv: disable earlyZ when shader writes fragment depth Fixes the following piglit test: fragdepth_gles2 Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:34:43 -04:00
Jonathan Marek	dc3656c9c4	etnaviv: nir: make lower_alu easier to follow Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:34:43 -04:00
Jonathan Marek	c4f63be5a6	etnaviv: remove extra allocation for shader code Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:34:43 -04:00
Jonathan Marek	0b3957331d	etnaviv: nir: remove "options" struct It just makes thing more complicated for no reason. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:34:43 -04:00
Jonathan Marek	8f1b2ea7a9	etnaviv: nir: use store_deref instead of store_output Allows some simplification. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:34:43 -04:00
Jonathan Marek	d92689c46f	etnaviv: nir: add native integers (HALTI2+) Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:34:35 -04:00
Jonathan Marek	d446134d2a	qetnaviv: nir: use new immediates when possible Note it can still be improved a bit: * Use alu swizzle to determine if src is scalar * Take into account new immediates in the multiple uniform src lowering Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:33:42 -04:00
Jonathan Marek	95fa799c86	etnaviv: nir: set num_components for inputs/outputs This can improve performance by allowing the LAST_VARYING_2X bit to be set when possible (and possibility more benefits on HALTI5 where the number of components is set for each varying). Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:33:42 -04:00
Jonathan Marek	0036e078e3	etnaviv: nir: allocate contiguous components for LOAD destination LOAD starts reading into the first enabled destination component, and doesn't skip disabled components, so we need to allocate a destination with contiguous components. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:33:42 -04:00
Jonathan Marek	7da15bdd2d	etnaviv: nir: fix gl_FrontFacing Only invert front facing when glFrontFace is GL_CW. Fixes following deqp test: dEQP-GLES2.functional.shaders.builtin_variable.frontfacing Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-09-28 00:33:33 -04:00
Icenowy Zheng	931fc2a7b3	lima: do not set the PP uniforms address lowest bits The PP uniforms address register in render state is not a direct pointer to the uniforms storage -- instead, it points to an one-item array, and the array item is the real pointer to the uniforms storage. This register reuses some of its LSBs as a size field. Currently the size is set according to the length of the real uniforms storage. However, as the register itself contains only a pointer to the one-item array, the size field should be set to the length of the one-item array and subtract it by 1, which means a fixed value of 0. That means we can just omit it now. Test shows this should be the correct approach to set this register. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-28 08:49:20 +08:00
Andrii Simiklit	b32bb888c7	glsl: disallow incompatible matrices multiplication glsl 4.4 spec section '5.9 expressions': "The operator is multiply (), where both operands are matrices or one operand is a vector and the other a matrix. A right vector operand is treated as a column vector and a left vector operand as a row vector. In all these cases, it is required that the number of columns of the left operand is equal to the number of rows of the right operand. Then, the multiply () operation does a linear algebraic multiply, yielding an object that has the same number of rows as the left operand and the same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations” explains in more detail how vectors and matrices are operated on." This fix disallows a multiplication of incompatible matrices like: mat4x3(..) * mat4x3(..) mat4x2(..) * mat4x2(..) mat3x2(..) * mat3x2(..) .... CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-09-27 21:42:09 +00:00
Eric Anholt	67e8977290	turnip: Fix failure behavior of vkCreateGraphicsPipelines. According to the 1.1.123 spec: "The implementation will attempt to create all pipelines, and only return VK_NULL_HANDLE values for those that actually failed." Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-09-27 13:34:28 -07:00
Eric Anholt	ab3cf128a6	turnip: Silence compiler warning about uninit pipeline. The code was fine as far as I see, but the warning was irritating. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-09-27 13:34:28 -07:00
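
The lima commit 336b021d36 above lists the size values the blob writes into the low bits of the uniforms address and concludes they look like log2(size / 8) rounded up. Here is a minimal sketch of that arithmetic, checked against the values quoted in the commit message. The function name `uniforms_size_field` is hypothetical, and treating sizes that are not multiples of 8 by rounding up to whole 8-byte units is an assumption. This is not the driver's code.

```python
def uniforms_size_field(size_bytes: int) -> int:
    """Return ceil(log2(size_bytes / 8)) using integer arithmetic only.

    Hypothetical helper illustrating the rule from the commit message;
    sizes are rounded up to whole 8-byte units first (an assumption).
    """
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    units = (size_bytes + 7) // 8      # number of 8-byte units
    return (units - 1).bit_length()    # ceil(log2(units)) for units >= 1


# Values observed from the blob, as quoted in the commit message.
OBSERVED = {8: 0, 16: 1, 24: 2, 32: 2, 40: 3, 48: 3, 56: 3, 64: 3, 72: 4}

if __name__ == "__main__":
    for size, expected in OBSERVED.items():
        got = uniforms_size_field(size)
        print(f"{size:3d} bytes -> {got} (blob: {expected})")
```

`(units - 1).bit_length()` avoids floating-point `log2`, so exact powers of two are not subject to rounding error.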

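The glsl commit b32bb888c7 above quotes the rule that a matrix multiply requires the number of columns of the left operand to equal the number of rows of the right operand, and that the result has the rows of the left operand and the columns of the right. The sketch below applies that rule to GLSL type names, where `matCxR` means C columns and R rows. The helpers `parse_mat` and `mul_result_type` are hypothetical names for illustration, not the compiler's implementation, and only matrix-by-matrix products are covered.

```python
import re
from typing import Optional, Tuple

# matN is shorthand for matNxN; matCxR has C columns and R rows.
_MAT_RE = re.compile(r"^mat([234])(?:x([234]))?$")


def parse_mat(name: str) -> Tuple[int, int]:
    """Return (columns, rows) for a GLSL float matrix type name."""
    m = _MAT_RE.match(name)
    if m is None:
        raise ValueError(f"not a GLSL matrix type: {name!r}")
    cols = int(m.group(1))
    rows = int(m.group(2)) if m.group(2) else cols
    return cols, rows


def mul_result_type(left: str, right: str) -> Optional[str]:
    """Return the result type of left * right, or None if incompatible."""
    left_cols, left_rows = parse_mat(left)
    right_cols, right_rows = parse_mat(right)
    if left_cols != right_rows:
        return None  # the case the commit makes the compiler reject
    # Result: columns of the right operand, rows of the left operand.
    if right_cols == left_rows:
        return f"mat{right_cols}"
    return f"mat{right_cols}x{left_rows}"


if __name__ == "__main__":
    for lhs, rhs in [("mat4x3", "mat4x3"), ("mat4x3", "mat3x4"), ("mat2x3", "mat4x2")]:
        print(f"{lhs} * {rhs} -> {mul_result_type(lhs, rhs)}")
```

Under this rule the three products named in the commit message (`mat4x3 * mat4x3`, `mat4x2 * mat4x2`, `mat3x2 * mat3x2`) all come back as incompatible, because in each case the left operand's column count differs from the right operand's row count.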

115979 Commits