KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Dave Airlie	ccbf700d6c	nir: remove gl.h include from nir headers. This saves a lot of pointless gl.h includes across the board, it moves the one place that needs GLenum into a separate file only used in those passes that require it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>	2022-01-19 21:54:58 +00:00
Iago Toral Quiroga	a65c605365	broadcom/compiler: track passthrough Z writes In some cases we need to make the shaders write the Z value produced from rasterization (FEP). Track these instances because they are relevant to early EZ setup. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>	2021-12-03 10:39:08 +00:00
Iago Toral Quiroga	6d4a645c90	broadcom/compiler: emit passthrough Z write if shader reads Z Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>	2021-12-03 10:39:08 +00:00
Iago Toral Quiroga	6923dd687c	broadcom/compiler: allow color TLB writes in last instruction Only Z writes are disallowed. total instructions in shared programs: 11578449 -> 11577369 (<.01%) instructions in affected programs: 38132 -> 37052 (-2.83%) helped: 1080 HURT: 0 Instructions are helped. total max-temps in shared programs: 2334416 -> 2334395 (<.01%) max-temps in affected programs: 218 -> 197 (-9.63%) helped: 21 HURT: 0 Max-temps are helped. total inst-and-stalls in shared programs: 11607890 -> 11606810 (<.01%) inst-and-stalls in affected programs: 38265 -> 37185 (-2.82%) helped: 1080 HURT: 0 Inst-and-stalls are helped. total nops in shared programs: 338316 -> 337236 (-0.32%) nops in affected programs: 2625 -> 1545 (-41.14%) helped: 1080 HURT: 0 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13964>	2021-11-29 06:44:07 +00:00
Iago Toral Quiroga	0cb58f80d2	v3d: use V3D_MAX_DRAW_BUFFERS instead of hardcoded constant Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13775>	2021-11-12 11:04:07 +00:00
Iago Toral Quiroga	3a95e25e84	v3dv,v3d: don't store swizzle pointer in shader/pipeline keys We had been storing pointers to a driver owned swizzle table rather than storing the actual swizzle value in various shader and pipeline keys on both GL and Vulkan drivers. This doesn't look very robust, particularly since we also compute sha1 hashes from these values and we may store these hashes to disk (for the disk cache). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13738>	2021-11-10 11:24:26 +00:00
Alejandro Piñeiro	d50be41f8f	broadcom/compiler: remove unused macro and function definition Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13444>	2021-10-20 10:08:27 +00:00
Alejandro Piñeiro	193898c8b0	broadcom/compiler: remove commented out vir_LOAD_IMM methods It has been commented several years now. Let's remove it to reduce the noise. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13008>	2021-09-24 08:46:06 +00:00
Juan A. Suarez Romero	53c8b4c093	broadcom: make vir_emit_last_thrsw() private This function is only used in v3d_nir_to_vir(), so make it private. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12322>	2021-09-10 09:18:05 +00:00
Iago Toral Quiroga	b727eaac3c	broadcom/compiler: add a vir_get_cond helper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12278>	2021-08-10 08:47:40 +00:00
Iago Toral Quiroga	d5acae3206	broadcom/compiler: implement nir_intrinsic_load_view_index This is used for multiview's gl_ViewIndex built-in. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>	2021-07-27 07:31:31 +00:00
Iago Toral Quiroga	940725a7d9	broadcom/compiler: implement gl_PrimitiveID in FS without a GS OpenGL ES 3.1 specifies that a geometry shader can write to gl_PrimitiveID, which can then be read by a fragment shader. OpenGL ES 3.2 additionally adds the capacity for the fragment shader to read gl_PrimitiveID even if there is no geometry shader. This commit adds support for this feature, which is also implicitly expected by the geometry shader feature in Vulkan 1.0. Fixes: dEQP-VK.pipeline.framebuffer_attachment.no_attachments dEQP-VK.pipeline.framebuffer_attachment.no_attachments_ms Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11874>	2021-07-14 12:05:56 +00:00
Iago Toral Quiroga	353f0a180f	broadcom/compiler: create a helper for computing VPM config This code is the same across drivers. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>	2021-07-12 08:35:55 +02:00
Iago Toral Quiroga	2733a17b14	broadcom/compiler: track if geometry shaders write gl_PointSize Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>	2021-07-12 08:35:55 +02:00
Iago Toral Quiroga	10313b03b5	broadcom/compiler: track if a compute shader uses subgroup functionality Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Iago Toral Quiroga	87fa5908b3	broadcom/compiler: add FLAFIRST and FLNAFIRST opcodes We will at least need the former to implement subgroupElect() Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11620>	2021-06-29 08:43:06 +02:00
Eric Anholt	95d41a3525	ra: Use struct ra_class in the public API. All these unsigned ints are awful to keep track of. Use pointers so we get some type checking. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>	2021-06-04 19:08:57 +00:00
Iago Toral Quiroga	f07c797e93	v3dv: implement vkCmdDispatchBase This was added with VK_KHR_device_group and allows users to specify a base offset that will be automatically added to gl_WorkGroupID. Unfortunately, V3D doesn't support this natively, so we need to add the base to the workgroup id generated by hardware manually. For this, we inject add instructions that source from a QUNIFORM that will retrieve the actual dispatch base from the compute job when it is dispatched. Since a compute shader can be dispatched with CmdDispatch and/or CmdDispatchBase, we always need to add these additional add instructions and use a base of (0,0,0) for regular dispatches. Since we don't support any version of OpenGL with this dispatch base functionality we can avoid the extra instructions there. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>	2021-05-31 09:06:18 +00:00
Iago Toral Quiroga	d19ce36ff2	broadcom/compiler: refactor compile strategies Until now, if we can't compile at 4 threads we would lower thread count with optimizations disabled, however, lowering thread count doubles the amount of registers available per thread, so that alone is already a big relief for register pressure so it makes sense to enable optimizations when we do that, and progressively disable them until we enable spilling as a last resort. This can slightly improve performance for some applications. Sponza, for example, gets a ~1.5% boost. I see several UE4 shaders that also get compiled to better code at 2 threads with this, but it is more difficult to assess how much this improves performance in practice due to the large variance in frame times that we observe with UE4 demos. Also, if a compiler strategy disables an optimization that did not make any progress in the previous compile attempt, we would end up re-compiling the exact same shader code and failing again. This, patch keeps track of which strategies won't make progress and skips them in that case to save some CPU time during shader compiles. Care should be taken to ensure that we try to compile with the default NIR scheduler at minimum thread count at least once though, so a specific strategy for this is added, to prevent the scenario where no optimizations are used and we skip directly to the fallback scheduler if the default strategy fails at 4 threads. Similarly, we now also explicitly specify which strategies are allowed to do TMU spills and make sure we take this into account when deciding to skip strategies. This prevents the case where no optimizations are used in a shader and we skip directly to the fallback scheduler after failing compilation at 2 threads with the default NIR scheduler but without trying to spill first. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	296fe4daa6	broadcom/compiler: add a compiler strategy to disable loop unrolling Loop unrolling can increase register pressure significantly, leading to lower thread counts and spilling. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	4742300e6b	v3d: move NIR compiler options to GL driver The Vulkan driver was already creating and using its own set of options, so the ones defined in the compiler are only used with GL, which is confusing. Move them to the GL driver. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00
Iago Toral Quiroga	f514280524	broadcom/compiler: track if a shader has control barriers in prog_data Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>	2021-05-04 15:53:23 +00:00
Iago Toral Quiroga	0a3bfacabb	broadcom/compiler: rename unifa tracking fields The term 'last' may be misleading because the offset represents the current unifa offset, which is the offset used by the last load plus 4 bytes, so rename these to use the term 'current' instead. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	8998666de7	broadcom/compiler: sort constant UBO loads by index and offset This implements a NIR pass that groups together constant UBO loads for the same UBO index in order of increasing offset when the distance between them is small enough that it enables the "skip unifa write" optimization. This may increase register pressure because it can move UBO loads earlier, so we also add a compiler strategy fallback to disable the optimization if we need to drop thread count to compile the shader with this optimization enabled. total instructions in shared programs: 13557555 -> 13550300 (-0.05%) instructions in affected programs: 814684 -> 807429 (-0.89%) helped: 4485 HURT: 2377 Instructions are helped. total uniforms in shared programs: 3777243 -> 3760990 (-0.43%) uniforms in affected programs: 112554 -> 96301 (-14.44%) helped: 7226 HURT: 36 Uniforms are helped. total max-temps in shared programs: 2318133 -> 2333761 (0.67%) max-temps in affected programs: 63230 -> 78858 (24.72%) helped: 23 HURT: 3044 Max-temps are HURT. total sfu-stalls in shared programs: 32245 -> 32567 (1.00%) sfu-stalls in affected programs: 389 -> 711 (82.78%) helped: 139 HURT: 451 Inconclusive result. total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%) inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%) helped: 4478 HURT: 2395 Inst-and-stalls are helped. total nops in shared programs: 354365 -> 342202 (-3.43%) nops in affected programs: 31000 -> 18837 (-39.24%) helped: 4405 HURT: 265 Nops are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	fb2214a441	broadcom/compiler: allow compilation strategies to limit minimum thread count This adds a minimum thread count parameter to each compilation strategy with the intention to limit the minimum allowed thread count that can be used to register allocate with that strategy. For now all strategies allow the minimum thread count supported by the hardware, but we will be using this infrastructure to impose a more strict limit in an upcoming optimization. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	4b244dc64f	broadcom/compiler: add a definition for the unifa skip distance We will be using this distance to setup another optimization in a follow-up patch. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> x# Please enter the commit message for your changes. Lines starting Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>	2021-04-09 10:31:40 +00:00
Iago Toral Quiroga	f33ca092da	broadcom/compiler: add a NOP count stat to shader-db Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>	2021-03-31 05:51:22 +00:00
Alejandro Piñeiro	b71fd5587e	broadcom/compiler: add driver_location_map at vs prog data This maps the nir shader data.location to its final data.driver_location. In general we are using the driver location as index (like vattr_sizes on the same struct), so having this map is useful if what we have is the data.location, and we don't have available the original nir shader. v2: use memset instead of for loop, and nir_foreach_shader_in_variable instead of nir_foreach_variable_with_modes (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>	2021-03-22 17:10:47 +00:00
Alejandro Piñeiro	2be0c36775	broadcom/compiler: add local_size in v3d_compute_prog_data As we plan to try to get directly the compiled variant from the cache, it would be possible to not have available the nir shaders, so we add this info on prog data. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>	2021-03-22 17:10:47 +00:00
Iago Toral Quiroga	947e9e42cc	broadcom/compiler: simplify ldvary pipelining We get optimal ldvary pipelining by doing the following: 1) Carefully merge a paired ldvary into the previous instruction when possible. 2) When the above succeeds, flag the ldvary as scheduled immediately so we can merge one of its children into the current instruction. 3) When scheduling ldvary sequences, only pick up instructions that are part of the sequence to avoid picking up something that prevents successful pipelining. This patch skips 3) assuming some hurt shaders in exchange for better scheduling flexibility during ldvary sequences. Besides eliminating most of the code dedicated to special handling ldvary sequences, this also usually allows us to produce better code by merging instructions that are unrelated to ldvary sequences into the ldvary sequences, which is particularly effective to fill up the gaps produced when scheduling the first and last ldvary sequences as well as the gaps produced by flat and noperspective varyings sequences that don't have both mul and add instructions. Notice that there are some hurt shaders, because some times the extra scheduler flexibility can lead to picking up instructions that will break a sequence without compensating for that, typically an ldunif that prevents us from doing the fixup for a follow-up ldvary. We will try to correct some of these cases with the next patch. total instructions in shared programs: 13786037 -> 13760415 (-0.19%) instructions in affected programs: 3201387 -> 3175765 (-0.80%) helped: 16155 HURT: 4146 Instructions are helped. total max-temps in shared programs: 2324834 -> 2322991 (-0.08%) max-temps in affected programs: 22160 -> 20317 (-8.32%) helped: 1340 HURT: 103 Max-temps are helped. total sfu-stalls in shared programs: 30685 -> 31827 (3.72%) sfu-stalls in affected programs: 782 -> 1924 (146.04%) helped: 253 HURT: 1416 Inconclusive result. total inst-and-stalls in shared programs: 13816722 -> 13792242 (-0.18%) inst-and-stalls in affected programs: 3171642 -> 3147162 (-0.77%) helped: 15331 HURT: 4179 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>	2021-03-10 07:52:22 +00:00
Iago Toral Quiroga	839007e490	broadcom/compiler: always restart ldvary pipelining when scheduling ldvary When we were only able to pipeline smooth varyings, if we had to disable ldvary pipelining in the middle of a sequence it would stay disabled for the rest of the program, to prevent us from prioritizing scheduling of ldvary instructions that we would not be able to pipeline effectively. Now that we can pipeline all ldvary sequences we can change this. This change re-enables ldvary pipelining upon finding the next ldvary in the program in the hopes that we can continue pipelining succesfully. To do this, we track the number of ldvary instructions we emitted so far and compare that to the number of inputs in the fragment shader we are scheduling. This also allows us to simplify our ldvary tracking at nir to vir time, since that is all now handled in the QPU scheduler. total instructions in shared programs: 13817048 -> 13810783 (-0.05%) instructions in affected programs: 810114 -> 803849 (-0.77%) helped: 4843 HURT: 591 Instructions are helped. total max-temps in shared programs: 2326612 -> 2326300 (-0.01%) max-temps in affected programs: 4689 -> 4377 (-6.65%) helped: 285 HURT: 7 Max-temps are helped. total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%) sfu-stalls in affected programs: 207 -> 130 (-37.20%) helped: 120 HURT: 42 Sfu-stalls are helped. total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%) inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%) helped: 4899 HURT: 590 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>	2021-03-05 10:32:19 +01:00
Iago Toral Quiroga	acbd4881c2	broadcom/compiler: ldvary pipelining tracking and documentation clean-ups Now that we can pipeline all varyings we should not be referring specifically to smooth varyings anywhere. Also, rename the instruction field 'ldvary_pipelining' to 'is_ldvary_sequence', which is more appropriate, since we always set this for any instruction involved with varying setups, independently of whether they end up being pipelined or not. This also does some other minor edits which intend to slightly simplify the code and make it a bit more compact. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9363>	2021-03-02 13:54:14 +01:00
Iago Toral Quiroga	1d021539a2	broadcom/compiler: track pipelineable ldvary sequences If we have two (or more) smooth varyings like this: nop t3; ldvary.rf0 fmul t5, t3, t0 fadd t6, t5, r5 nop t7; ldvary.rf0 fmul t9, t7, t0 fadd t10, t9, r5 nop t11; ldvary.rf0 fmul t13, t11, t0 fadd t14, t13, r5 We may be able to pipeline them like this: nop ; nop ; ldvary.r4 nop ; fmul r0, r4, rf0 ; ldvary.r1 fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3 fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0 But in order to do this, we will need to manually tweak the QPU scheduling. This patch tracks information about ldvary sequences that are good candidates for pipelining, and a follow-up patch will use this information to pipeline them when we emit the QPU code. v2 (apinheiro): - Rename the v3d_compile fields to avoid confusion with the qinst fields. - Assert that a sequence's start instruction is not the same as the end. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>	2021-03-02 07:56:00 +01:00
Eric Anholt	60573b443b	v3d: Replace driver lowering of GL_CLAMP with mesa/st's. Mesa core can do this logic for us now. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9228>	2021-02-24 18:03:46 +00:00
Iago Toral Quiroga	54c17e45ae	broadcom/compiler: skip unnecessary unifa writes If a new UBO load happens to read exactly at the offset right after the previous UBO load (something that is fairly common, for example when reading a matrix), we can skip the unifa write (with its 3 delay slots) and just continue to call ldunifa to continue reading consecutive addresses. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	e1cf2406da	broadcom/compiler: add a constant alu optimization pass Currently this is useful to clean up after DCEing leading ldunifa instructions, but it can be expanded to handle more cases which may allow to simplify the compiler code in places where we have been trying to optimize manually for similar cases. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>	2021-02-23 08:08:01 +00:00
Iago Toral Quiroga	14af7b3085	broadcom/compiler: don't emit redundant ldunif If we emit a new uniform and that uniform has already been emitted in the same block we can just reuse that. There is a balancing game here between reducing ldunif instructions and not increasing register pressure too much though, so we put a limit to how far back we are willing to look for a previous definition of the uniform. Based on shader-db results, 20 instructions produces best results. total instructions in shared programs: 14928266 -> 14907432 (-0.14%) instructions in affected programs: 6431841 -> 6411007 (-0.32%) helped: 15270 HURT: 10772 Instructions are helped. total uniforms in shared programs: 3944672 -> 3840276 (-2.65%) uniforms in affected programs: 1827184 -> 1722788 (-5.71%) helped: 30423 HURT: 845 Uniforms are helped. total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%) inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%) helped: 15287 HURT: 10852 Inst-and-stalls are helped. v2 (Eric): - consider ldunifrf too - check that no other instruction writes to the register Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>	2021-02-17 09:01:01 +01:00
Iago Toral Quiroga	f85fcaa494	broadcom/compiler: pass a devinfo to check if an instruction writes to TMU V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA (which isn't a TMU write address). This change passes a devinfo to any functions that need to do these checks so we can account for the target V3D version correctly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>	2021-02-12 08:24:21 +00:00
Arcady Goldmints-Orlov	9909fe6bac	broadcom/compiler: Skip bool_to_cond where possible This change keeps track of when a boolean temp is loaded into the flags by a comparison instruction and uses that information to skip emitting instructions to set the flags in ntq_emit_bool_to_cond when the flags already have the right contents. total instructions in shared programs: 11116502 -> 11112225 (-0.04%) instructions in affected programs: 631691 -> 627414 (-0.68%) helped: 1591 HURT: 754 helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58% HURT stats (abs) min: 1 max: 19 x̄: 3.07 x̃: 2 HURT stats (rel) min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15% 95% mean confidence interval for instructions value: -2.02 -1.63 95% mean confidence interval for instructions %-change: -0.94% -0.71% Instructions are helped. total uniforms in shared programs: 3281555 -> 3281513 (<.01%) uniforms in affected programs: 1754 -> 1712 (-2.39%) helped: 10 HURT: 5 helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5 helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05% HURT stats (abs) min: 1 max: 15 x̄: 7.40 x̃: 3 HURT stats (rel) min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41% 95% mean confidence interval for uniforms value: -8.57 2.97 95% mean confidence interval for uniforms %-change: -7.35% 1.07% Inconclusive result (value mean confidence interval includes 0). total max-temps in shared programs: 1758419 -> 1758174 (-0.01%) max-temps in affected programs: 7006 -> 6761 (-3.50%) helped: 290 HURT: 14 helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1 helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88% HURT stats (abs) min: 1 max: 13 x̄: 6.00 x̃: 3 HURT stats (rel) min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12% 95% mean confidence interval for max-temps value: -1.03 -0.58 95% mean confidence interval for max-temps %-change: -6.24% -4.16% Max-temps are helped. total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%) sfu-stalls in affected programs: 1578 -> 1512 (-4.18%) helped: 257 HURT: 252 helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1 helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00% HURT stats (abs) min: 1 max: 2 x̄: 1.14 x̃: 1 HURT stats (rel) min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00% 95% mean confidence interval for sfu-stalls value: -0.25 -0.01 95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33% Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%) inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%) helped: 1581 HURT: 755 helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3 helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59% HURT stats (abs) min: 1 max: 17 x̄: 3.17 x̃: 2 HURT stats (rel) min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20% 95% mean confidence interval for inst-and-stalls value: -2.06 -1.66 95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70% Inst-and-stalls are helped. Reviewed-by: Iago Toral Quioroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov	8762f29e9c	broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f Reviewed-by: Iago Toral Quioroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Eric Anholt	bcb5f9f94a	v3d: Stop advertising support for flat shading. The GL frontend can lower this weird GL feature away for us. This should fix redeclaration of the gl_Color/SecondaryColor as centroid, since that case had been missed in the !flat special case here. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	ff805f8ac7	v3d: Stop advertising support for PIPE_CAP_*_COLOR_CLAMPED. The GL frontend can lower away this deprecated GL feature for us. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	2992dc7386	v3d: Stop advertising support for PIPE_CAP_TWO_SIDED_COLOR. The GL frontend can lower away this deprecated GL feature for us. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Eric Anholt	5ddc2f916f	v3d: Clean up vestiges of alpha test lowering. We had an unnecessary case in our uniforms upload switch statement, since we no longer advertise the cap. Fixes: `8ad931808e` ("v3d: do not report alpha-test as supported") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>	2021-02-09 20:06:48 -08:00
Iago Toral Quiroga	6630825dcf	broadcom/compiler: let QPUs stall on TMU input/config overflows We have been trying to avoid this by tracking fifo usages in the driver and flushing all outstanding TMU sequences if we overflowed any of these, however, this is actually not the most efficient strategy. Instead, we would like to flush only enough operations to get things going again, which is better for pipelining. Doing that in the driver would require some additional work, but thankfully, it is not required, since this seems to be what the hardware does automatically, so we can just remove overflow tracking for these two fifos and enjoy the benefits. This also further improves shader-db stats: total instructions in shared programs: 8975062 -> 8955145 (-0.22%) instructions in affected programs: 1637624 -> 1617707 (-1.22%) helped: 4050 HURT: 2241 Instructions are helped. total threads in shared programs: 236802 -> 237042 (0.10%) threads in affected programs: 252 -> 492 (95.24%) helped: 122 HURT: 2 Threads are helped. total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%) sfu-stalls in affected programs: 4744 -> 4435 (-6.51%) helped: 1248 HURT: 1051 Sfu-stalls are helped. total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%) inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%) helped: 4050 HURT: 2239 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	e18d6bbf2f	broadcom/compiler: disable TMU pipelining if we fail to register allocate TMU pipelining can severely reduce our capacity to emit TMU spills, causing us to fail to compile a shader we may otherwise be able to compile. This is because pipelining extends the liveness of TMU sequences by posponing the thread switch and LDTMU until a result is needed, and we can't emit TMU spills while in the middle of a TMU sequence. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	be45960d3e	broadcom/compiler: support pipelining of tex instructions This follows the same idea as for TMU general instructions of reusing the existing infrastructure to first count required register writes and flush outstanding TMU dependencies, and then emit the actual writes, which requires that we split the code that decides about register writes to a helper. We also need to start using a component mask instead of the number of components that we need to read with a particular TMU operation. v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	197090a3fc	broadcom/compiler: implement pipelining for general TMU operations This creates the basic infrastructure to implement TMU pipelining and applies it to general TMU. Follow-up patches will expand this to texture and image/load store operations. TMU pipelining means that we don't immediately end TMU sequences, and instead, we postpone the thread switch and LDTMU (for loads) or TMUWT (for stores) until we really need to do them. For loads, we may need to flush them if another instruction reads the result of a load operation. We can detect this because in that case ntq_get_src() will not find the definition for that ssa/reg (since we have not emitted the LDTMU instructions for it yet), so when that happens, we flush all pending TMU operations and then try again to find the definition for the source. We also need to flush pending TMU operations when we reach the end of a control flow block, to prevent the case where we emit a TMU operation in a block, but then we read the result in another block possibly under control flow. It is also required to flush across barriers and discards to honor their semantics. Since this change doesn't implement pipelining for texture and image load/store, we also need to flush outstanding TMU operations if we ever have to emit one of these. This will be corrected with follow-up patches. Finally, the TMU has 3 fifos where it can queue TMU operations. These fifos have limited capacity, depending on the number of threads used to compile the shader, so we also need to ensure that we don't have too many outstanding TMU requests and flush pending TMU operations if a new TMU operation would overflow any of these fifos. While overflowing the Input and Config fifos only leads to stalls (which we want to avoid anyway), overflowing the Output fifo is incorrect and would end up with a broken shader. This means that we need to know how many TMU register writes are required to emit a TMU operation and use that information to decide if we need to flush pending TMU operations before we emit any register writes for the new TMU operation. v2: fix TMU flushing for NIR registers reads (jasuarez) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Arcady Goldmints-Orlov	79bde75131	broadcom/compiler: Emit uniform loops using uniform control flow Similarly to if statements, uniform loops are now emitted without predication, using simple branches for breaks and continues. The uniformity of the loop is determined by running the nir_divergence_analysis pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>	2021-02-01 08:11:48 +00:00
Alejandro Piñeiro	429c336412	broadcom/compiler: separate texture/sampler info from v3d_key So far the v3d compiler has them combined, as for OpenGL both are the same. This change is intended to fit the v3d compiler better with Vulkan, where they are separate concepts. Note that NIR has them separate for a long time, both on nir_variable and on some NIR lowerings. v2: (from Iago feedback) * Use key->num_tex/sampler_used to iterate through the array * Fill up num_samplers_used on v3d, assert that is the same that num_tex_used if possible. v3: (Iago) * Assert num_tex/samplers_used is smaller that tex/sampler array size. v4: Update assert mentioned on v3 to use <= instead of < (detected by CI) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> squash! broadcom/compiler: separate texture/sampler info from v3d_key Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>	2020-11-14 15:59:02 +00:00

1 2 3 4

168 Commits