Commit Graph

646 Commits

Author SHA1 Message Date
Rhys Perry a2619b97f5 nir/lower_idiv: add options to use fp32 for 8-bit division lowering
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>
2021-04-12 16:19:46 +00:00
Juan A. Suarez Romero e7f4f1b582 broadcom/compiler: use signed pointers for packed condition
`qpu.raddr_b` is an unsigned int, so it is always positive, even after
casting to signed int.

Fixes CID#1438117 "Operands don't affect result
(CONSTANT_EXPRESSION_RESULT)":

   "result_independent_of_operands: (int)inst->qpu.raddr_b >= -16 is
    always true regardless of the values of its operands. This occurs as
    the logical first operand of "&&".

v2:
 - Use signed pointers (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10131>
2021-04-12 15:22:05 +00:00
Iago Toral Quiroga 0a3bfacabb broadcom/compiler: rename unifa tracking fields
The term 'last' may be misleading because the offset represents
the current unifa offset, which is the offset used by the last
load plus 4 bytes, so rename these to use the term 'current'
instead.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
2021-04-09 10:31:40 +00:00
Iago Toral Quiroga 8998666de7 broadcom/compiler: sort constant UBO loads by index and offset
This implements a NIR pass that groups together constant UBO loads
for the same UBO index in order of increasing offset when the distance
between them is small enough that it enables the "skip unifa write"
optimization.

This may increase register pressure because it can move UBO loads
earlier, so we also add a compiler strategy fallback to disable the
optimization if we need to drop thread count to compile the shader
with this optimization enabled.

total instructions in shared programs: 13557555 -> 13550300 (-0.05%)
instructions in affected programs: 814684 -> 807429 (-0.89%)
helped: 4485
HURT: 2377
Instructions are helped.

total uniforms in shared programs: 3777243 -> 3760990 (-0.43%)
uniforms in affected programs: 112554 -> 96301 (-14.44%)
helped: 7226
HURT: 36
Uniforms are helped.

total max-temps in shared programs: 2318133 -> 2333761 (0.67%)
max-temps in affected programs: 63230 -> 78858 (24.72%)
helped: 23
HURT: 3044
Max-temps are HURT.

total sfu-stalls in shared programs: 32245 -> 32567 (1.00%)
sfu-stalls in affected programs: 389 -> 711 (82.78%)
helped: 139
HURT: 451
Inconclusive result.

total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%)
inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%)
helped: 4478
HURT: 2395
Inst-and-stalls are helped.

total nops in shared programs: 354365 -> 342202 (-3.43%)
nops in affected programs: 31000 -> 18837 (-39.24%)
helped: 4405
HURT: 265
Nops are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
2021-04-09 10:31:40 +00:00
Iago Toral Quiroga fb2214a441 broadcom/compiler: allow compilation strategies to limit minimum thread count
This adds a minimum thread count parameter to each compilation strategy with
the intention to limit the minimum allowed thread count that can be used to
register allocate with that strategy.

For now all strategies allow the minimum thread count supported by the
hardware, but we will be using this infrastructure to impose a more
strict limit in an upcoming optimization.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
2021-04-09 10:31:40 +00:00
Iago Toral Quiroga 4b244dc64f broadcom/compiler: add a definition for the unifa skip distance
We will be using this distance to setup another optimization in a
follow-up patch.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

x# Please enter the commit message for your changes. Lines starting

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
2021-04-09 10:31:40 +00:00
Juan A. Suarez Romero cc8d4cd1ae broadcom/compiler: fix first_component assertion
first_component is an uint, and thus if it takes value 0 we can't know
if it is because writemask has its first bit to 1, or all bits to 0.

As we want to ensure that at least one bit is set, apply the assertion
in writemask.

Fixes CID#1472829 "Macro compares unsigned to 0 (NO_EFFECT)".

v2:
 - Restore "first_component <= last_component" assertion (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10103>
2021-04-09 07:55:41 +00:00
Bas Nieuwenhuizen 580f1ac473 nir: Extract shader_info->cs.shared_size out of union.
It is valid for all stages, just 0 for most of them. In particular
mesh/task shaders might be using it.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10094>
2021-04-08 14:39:28 +00:00
Iago Toral Quiroga 9ca0e070e7 broadcom/compiler: optimize branch emission for uniform break/continue
A break/continue in a loop is typically emitted like this:

if (cond) {
    break/continue;
} else {
}

If cond is uniform, we'll emit code for a uniform if statement and
that will emit a branch right before the if to jump directly to the
else (or the block after the else in this case, since the else is
empty) in case cond evaluates to false. This means we end up emitting
two consecutive branch instructions, one before the if and one for the
THEN block right after:

branch(!cond) -> jump to else (or after else) if cond is false
nop
nop
nop
branch -> unconditional jump to break/continue
nop
nop
nop

Instead, if we are in this scenario, we can do better by emitting the
conditional jump directly and avoiding the "jump to else" case:

branch(cond) -> jump to break/continue if cond is true
nop
nop
nop

We need to be careful when emitting the break/continue for the case
where all lanes are disabled to avoid infinite loops: if we have a
break we always want to take the jump, but we don't want to take it
if it is a continue.

total instructions in shared programs: 13563672 -> 13557348 (-0.05%)
instructions in affected programs: 348034 -> 341710 (-1.82%)
helped: 1158
HURT: 10
Instructions are helped.

total uniforms in shared programs: 3779137 -> 3777535 (-0.04%)
uniforms in affected programs: 90583 -> 88981 (-1.77%)
helped: 1169
HURT: 0
Uniforms are helped.

total max-temps in shared programs: 2317670 -> 2317575 (<.01%)
max-temps in affected programs: 1943 -> 1848 (-4.89%)
helped: 85
HURT: 4
Max-temps are helped.

total sfu-stalls in shared programs: 32247 -> 32247 (0.00%)
sfu-stalls in affected programs: 69 -> 69 (0.00%)
helped: 7
HURT: 9
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13595919 -> 13589595 (-0.05%)
inst-and-stalls in affected programs: 350674 -> 344350 (-1.80%)
helped: 1154
HURT: 11
Inst-and-stalls are helped.

total nops in shared programs: 358202 -> 354325 (-1.08%)
nops in affected programs: 17367 -> 13490 (-22.32%)
helped: 1168
HURT: 1
Nops are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9948>
2021-04-05 06:38:19 +00:00
Iago Toral Quiroga 14843ccc33 broadcom/compiler: implement restriction for branch after setmsf
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9948>
2021-04-05 06:38:19 +00:00
Iago Toral Quiroga 8f7640293d broadcom/compiler: try to fill up delay slots after unconditional branch
If we have an unconditional branch then we can try to fill up its
delay slots with the initial instructions of its successor block by
copying them into the delay slots and adjusting the branch offset to
skip the copied instructions.

total nops in shared programs: 365640 -> 364471 (-0.32%)
nops in affected programs: 15416 -> 14247 (-7.58%)
helped: 462
HURT: 0

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
2021-03-31 05:51:22 +00:00
Iago Toral Quiroga e266e6c634 broadcom/compiler: try to fill up delay slots after a branch instruction
For this we do something similar to what we do with thrsw where we try to
move the branch instruction earlier so the previous instructions execute
in the delay slots of the branch.

Generally, we can do this with any instruction except:
 - If the instruction reads a uniform: since our branches do as well and
   uniforms come from an ordered FIFO stream.
 - If the instruction writes flags, since our branch instruction will
   probably read them.
 - If the instruction is in the delay slots of another thread switch,
   branch, or unifa write, which is disallowed.

total instructions in shared programs: 13648140 -> 13613972 (-0.25%)
instructions in affected programs: 2209552 -> 2175384 (-1.55%)
helped: 6765
HURT: 0
Instructions are helped.

total max-temps in shared programs: 2318687 -> 2318436 (-0.01%)
max-temps in affected programs: 5046 -> 4795 (-4.97%)
helped: 152
HURT: 0
Max-temps are helped.

total inst-and-stalls in shared programs: 13680494 -> 13646326 (-0.25%)
inst-and-stalls in affected programs: 2220394 -> 2186226 (-1.54%)
helped: 6765
HURT: 0
Inst-and-stalls are helped.

total nops in shared programs: 399818 -> 365640 (-8.55%)
nops in affected programs: 127311 -> 93133 (-26.85%)
helped: 6765
HURT: 0
Nops are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
2021-03-31 05:51:22 +00:00
Iago Toral Quiroga f33ca092da broadcom/compiler: add a NOP count stat to shader-db
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
2021-03-31 05:51:22 +00:00
Iago Toral Quiroga 062eee7d33 broadcom/compiler: dump instruction index when failing to pack instructions
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9918>
2021-03-31 05:51:22 +00:00
Juan A. Suarez Romero cc1f070a27 broadcom/compiler: fix unused value
Do not assign to a variable that won't be used.

Fixes CID#1451708 and CID#1451710 "Unused value (UNUSED_VALUE)".

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9910>
2021-03-30 14:15:43 +00:00
Iago Toral Quiroga c3c251d98f broadcom/compiler: flag TMU reads with a read dependency on last TMU config
We were using a write dependency to ensure ordering since LDTMUs sequences
are ordered, but by using a write dependency with TMU config we were also
preserving ordering with TMU config writes that are not a sequence
terminator, which is not required and reduces scheduling flexibility.
Instead, use a write dependency to ensure strict ordering of TMU reads,
but only a read depdency with TMU config.

With this change we also need to update CS barriers to also have a write
dependency with TMU reads to ensure that we don't move TMU reads around
CS barriers.

total instructions in shared programs: 13602500 -> 13597851 (-0.03%)
instructions in affected programs: 2681428 -> 2676779 (-0.17%)
helped: 6567
HURT: 4960
Instructions are helped.

total max-temps in shared programs: 2317927 -> 2317914 (<.01%)
max-temps in affected programs: 13861 -> 13848 (-0.09%)
helped: 355
HURT: 300
Inconclusive result (value mean confidence interval includes 0).

total sfu-stalls in shared programs: 32074 -> 32247 (0.54%)
sfu-stalls in affected programs: 848 -> 1021 (20.40%)
helped: 160
HURT: 327
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13634574 -> 13630098 (-0.03%)
inst-and-stalls in affected programs: 2703041 -> 2698565 (-0.17%)
helped: 6558
HURT: 5020
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>
2021-03-29 06:21:22 +00:00
Iago Toral Quiroga cbf9a7c3c1 broadcom/compiler: flag TMU read dependencies against last TMU config
Instead of last TMU write. According to the documentation, the entries
in the output FIFO are pushed with the *final* input write for the
lookup, which is the one terminating the sequence. We flag these
with last_tmu_config.

This will allow us to move all TMU register writes for a lookup except
the last one ahead of the LDTMUs for the previous lookup, possibly
allowing us to pair up these writes the wrtmuc instructions for the
same lookup, turning code like this:

nop                  ; nop               ; wrtmuc (tex[0].p0 | 0x3)
nop                  ; nop               ; wrtmuc (tex[2].p1 | 0x1)
nop                  ; nop               ; ldunif (ubo[2]+0xe0)
fadd  r4, rf33, rf51 ; mov  unifa, r5    ; ldunif (ubo[2]+0x110)
fmax  rf34, 0, r4    ; nop
nop                  ; mov  tmut, rf11
nop                  ; mov  tmus, rf0

into:

nop                  ; mov  tmut, rf11   ; wrtmuc (tex[0].p0 | 0x3)
nop                  ; nop               ; wrtmuc (tex[2].p1 | 0x1)
nop                  ; nop               ; ldunif (ubo[2]+0xe0)
fadd  r4, rf33, rf51 ; mov  unifa, r5    ; ldunif (ubo[2]+0x110)
fmax  rf34, 0, r4    ; nop
nop                  ; mov  tmus, rf0

total instructions in shared programs: 13648140 -> 13602500 (-0.33%)
instructions in affected programs: 3497402 -> 3451762 (-1.30%)
helped: 12044
HURT: 3484
Instructions are helped.

total max-temps in shared programs: 2318687 -> 2317927 (-0.03%)
max-temps in affected programs: 17234 -> 16474 (-4.41%)
helped: 615
HURT: 198
Max-temps are helped.

total sfu-stalls in shared programs: 32354 -> 32074 (-0.87%)
sfu-stalls in affected programs: 1462 -> 1182 (-19.15%)
helped: 461
HURT: 188
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13680494 -> 13634574 (-0.34%)
inst-and-stalls in affected programs: 3514405 -> 3468485 (-1.31%)
helped: 12062
HURT: 3486
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9856>
2021-03-29 06:21:22 +00:00
Iago Toral Quiroga e790c20403 broadcom/compiler: try to fill up delay slots after a thrsw
The way we handle thrsw instructions is that we try to merge them
back into previously scheduled instructions to fill up its delay
slots. This is generally safe, because the thrsw won't happen until
after the delay slots, so we are not really changing the execution
order of the instructions and we just need to make sure we don't
violate a few specific restrictions.

If we have not managed to fill up all delay slots after doing this,
then we emit as many NOPs as needed to fill them. This is to ensure
that we don't schedule an instruction that needs to execute after the
thread switch before the thread switch happens. However, doing this
can lead to inefficient code, since some times the instructions we
schedule after a thrsw are indepdent of the thrsw and could be safely
executed in its delay slots.

This change removes the fixed NOP emission after a thrsw to fill
delay slots and instead adds code to ensure that our instruction
scheduling is aware of when it is scheduling instructions in the
delay slots of a previous thrsw to avoid selecting conflicting
instructions.

The only case were we still emit fixed NOPs is for the thread end that
we emit to terminate the program after scheduling all instructions
because we can't end the instruction stream before the thread end
is properly executed.

total instructions in shared programs: 13691004 -> 13648140 (-0.31%)
instructions in affected programs: 4345951 -> 4303087 (-0.99%)
helped: 19645
HURT: 652
Instructions are helped.

total max-temps in shared programs: 2319317 -> 2318687 (-0.03%)
max-temps in affected programs: 10510 -> 9880 (-5.99%)
helped: 532
HURT: 9
Max-temps are helped.

total sfu-stalls in shared programs: 31752 -> 32354 (1.90%)
sfu-stalls in affected programs: 840 -> 1442 (71.67%)
helped: 7
HURT: 467
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 13722756 -> 13680494 (-0.31%)
inst-and-stalls in affected programs: 4335590 -> 4293328 (-0.97%)
helped: 19453
HURT: 758
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9825>
2021-03-26 07:13:07 +00:00
Iago Toral Quiroga 22a979be65 broadcom/compiler: convert add to mul when possible to allow merge
Integer add/sub can be implemented as either an add or a mul instruction
but we always emit them as add instructions at VIR level. We can use this
flexibility to improve our QPU scheduling so we can be more effective
at instruction merging by converting these to mul instructions when we
are attempting to merge them with another add instruction.

total instructions in shared programs: 13721549 -> 13691004 (-0.22%)
instructions in affected programs: 3340493 -> 3309948 (-0.91%)
helped: 12805
HURT: 1656
Instructions are helped.

total max-temps in shared programs: 2319528 -> 2319317 (<.01%)
max-temps in affected programs: 5285 -> 5074 (-3.99%)
helped: 195
HURT: 3
Max-temps are helped.

total sfu-stalls in shared programs: 31616 -> 31752 (0.43%)
sfu-stalls in affected programs: 469 -> 605 (29.00%)
helped: 52
HURT: 161
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 13753165 -> 13722756 (-0.22%)
inst-and-stalls in affected programs: 3340383 -> 3309974 (-0.91%)
helped: 12782
HURT: 1666
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9769>
2021-03-25 09:51:42 +00:00
Alejandro Piñeiro b71fd5587e broadcom/compiler: add driver_location_map at vs prog data
This maps the nir shader data.location to its final
data.driver_location. In general we are using the driver location as
index (like vattr_sizes on the same struct), so having this map is
useful if what we have is the data.location, and we don't have
available the original nir shader.

v2: use memset instead of for loop, and nir_foreach_shader_in_variable
    instead of nir_foreach_variable_with_modes (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Alejandro Piñeiro 2be0c36775 broadcom/compiler: add local_size in v3d_compute_prog_data
As we plan to try to get directly the compiled variant from the cache,
it would be possible to not have available the nir shaders, so we add
this info on prog data.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9403>
2021-03-22 17:10:47 +00:00
Iago Toral Quiroga cbe24a0e9c broadcom/compiler: use nir_lower_undef_to_zero
total instructions in shared programs: 13731663 -> 13721549 (-0.07%)
instructions in affected programs: 98242 -> 88128 (-10.29%)
helped: 191
HURT: 131
Instructions are helped.

total threads in shared programs: 412272 -> 412296 (<.01%)
threads in affected programs: 24 -> 48 (100.00%)
helped: 12
HURT: 0
Threads are helped.

total uniforms in shared programs: 3780693 -> 3779137 (-0.04%)
uniforms in affected programs: 10564 -> 9008 (-14.73%)
helped: 114
HURT: 7
Uniforms are helped.

total max-temps in shared programs: 2319942 -> 2319528 (-0.02%)
max-temps in affected programs: 4191 -> 3777 (-9.88%)
helped: 113
HURT: 22
Max-temps are helped.

total sfu-stalls in shared programs: 31584 -> 31616 (0.10%)
sfu-stalls in affected programs: 217 -> 249 (14.75%)
helped: 51
HURT: 54
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13763247 -> 13753165 (-0.07%)
inst-and-stalls in affected programs: 98719 -> 88637 (-10.21%)
helped: 187
HURT: 134
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>
2021-03-22 12:17:13 +00:00
Iago Toral Quiroga 1c987f5db3 broadcom/compiler: optimize constant vfpack
total instructions in shared programs: 13733627 -> 13731663 (-0.01%)
instructions in affected programs: 174140 -> 172176 (-1.13%)
helped: 1597
HURT: 310
Instructions are helped.

total uniforms in shared programs: 3784601 -> 3780693 (-0.10%)
uniforms in affected programs: 58678 -> 54770 (-6.66%)
helped: 2886
HURT: 3
Uniforms are helped.

total max-temps in shared programs: 2322714 -> 2319942 (-0.12%)
max-temps in affected programs: 15729 -> 12957 (-17.62%)
helped: 2189
HURT: 1
Max-temps are helped.

total spills in shared programs: 6010 -> 6012 (0.03%)
spills in affected programs: 61 -> 63 (3.28%)
helped: 0
HURT: 1

total fills in shared programs: 13494 -> 13497 (0.02%)
fills in affected programs: 89 -> 92 (3.37%)
helped: 0
HURT: 1

total sfu-stalls in shared programs: 31521 -> 31584 (0.20%)
sfu-stalls in affected programs: 328 -> 391 (19.21%)
helped: 30
HURT: 94
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13765148 -> 13763247 (-0.01%)
inst-and-stalls in affected programs: 174237 -> 172336 (-1.09%)
helped: 1551
HURT: 316
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>
2021-03-22 12:17:13 +00:00
Iago Toral Quiroga b189409a46 broadcom/compiler: handle implicit uniform loads when optimizing constant alu
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9681>
2021-03-22 12:17:13 +00:00
Iago Toral Quiroga aefac60741 broadcom/compiler: use nir_lower_wrmasks to simplify TMU general stores
This pass splits writemaks with non-consecutive bits into multiple
store operations ensuring that each store only has consecutive
writemask bits set.

We can use this to simplify writemask handling in our backend removing
a loop solely intended to handle this case.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9619>
2021-03-17 09:35:19 +00:00
Iago Toral Quiroga 51a263530f broadcom/compiler: use nir_opt_load_store_vectorize
This will make it so we pack consecutive scalar operations into a vector
operation, reducing the amount of load/store operations in the NIR program.
Our backend can handle vector load/stores, and doing so may be more efficient
since we don't need to setup individual load/stores all the time.

A pathological case is:
dEQP-VK.spirv_assembly.instruction.compute.opcopymemory.array
which goes from 862 instructions to only 573 by converting
all scalar SSBO load/store operations to vec4 operations.

total instructions in shared programs: 13752607 -> 13733627 (-0.14%)
instructions in affected programs: 367117 -> 348137 (-5.17%)
helped: 1168
HURT: 371
Instructions are helped.

total threads in shared programs: 412230 -> 412272 (0.01%)
threads in affected programs: 54 -> 96 (77.78%)
helped: 23
HURT: 2
Threads are helped.

total uniforms in shared programs: 3790248 -> 3784601 (-0.15%)
uniforms in affected programs: 57417 -> 51770 (-9.84%)
helped: 1420
HURT: 19
Uniforms are helped.

total max-temps in shared programs: 2322170 -> 2322714 (0.02%)
max-temps in affected programs: 14353 -> 14897 (3.79%)
helped: 185
HURT: 306
Max-temps are HURT.

total spills in shared programs: 5940 -> 6010 (1.18%)
spills in affected programs: 65 -> 135 (107.69%)
helped: 0
HURT: 11

total fills in shared programs: 13372 -> 13494 (0.91%)
fills in affected programs: 75 -> 197 (162.67%)
helped: 0
HURT: 11

total sfu-stalls in shared programs: 31505 -> 31521 (0.05%)
sfu-stalls in affected programs: 751 -> 767 (2.13%)
helped: 210
HURT: 246
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13784112 -> 13765148 (-0.14%)
inst-and-stalls in affected programs: 360283 -> 341319 (-5.26%)
helped: 1125
HURT: 366
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9619>
2021-03-17 09:35:19 +00:00
Iago Toral Quiroga 3db322f305 broadcom/compiler: fix end of tmu sequence detection
TMUWT always terminates a TMU sequence.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9619>
2021-03-17 09:35:19 +00:00
Iago Toral Quiroga 177dcd4b68 broadcom/compiler: be more flexible scheduling TMU writes
V3D 4.x allows more flexibility, so take advantage of that. Generally,
we can reorder any writes in the same sequence, so long as they are
not the sequence terminator (which must always be last, since it is
the one triggering the operation), and TMUD writes, since these must
be ordered with respect to each other.

total instructions in shared programs: 13735183 -> 13731927 (-0.02%)
instructions in affected programs: 903057 -> 899801 (-0.36%)
helped: 2358
HURT: 746
Instructions are helped.

total max-temps in shared programs: 2322020 -> 2322009 (<.01%)
max-temps in affected programs: 619 -> 608 (-1.78%)
helped: 19
HURT: 11
Inconclusive result (value mean confidence interval includes 0).

total sfu-stalls in shared programs: 31494 -> 31489 (-0.02%)
sfu-stalls in affected programs: 182 -> 177 (-2.75%)
helped: 40
HURT: 40
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13766677 -> 13763416 (-0.02%)
inst-and-stalls in affected programs: 901343 -> 898082 (-0.36%)
helped: 2349
HURT: 746
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9555>
2021-03-15 08:03:28 +01:00
Iago Toral Quiroga 87ed614c47 broadcom/compiler: flag wrtmuc with a read dependency on last_tmu_config
Instead of using a write depdency. We use last_tmu_config to ensure ordering
of instructions participating in different TMU sequences. To this end,
all sequence terminators flag a write dependency on last_tmu_config, but
wrtmuc is not a sequence terminator, so we can be more flexible by flagging
it as a read depedency. This would prevent it to be moved into a previous
sequence (since it cannot be moved past the previous sequence terminator due
to the read depedency), but it allows it to be reordered with instructions in
the same sequence, which allows us to pair it up more effectively. Particularly,
it allows to pair up a wrtmuc with the sequence terminator of the same sequence,
turning code like this:

nop                  ; mov  tmut, r0     ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop                  ; nop               ; wrtmuc (tex[0].p1 | 0x0)
nop                  ; mov  tmus, r1

Into this:

nop                  ; mov  tmut, r0     ; thrsw; wrtmuc (tex[0].p0 | 0x3)
nop                  ; mov  tmus, r1     ; wrtmuc (tex[0].p1 | 0x0)

total instructions in shared programs: 13755738 -> 13735183 (-0.15%)
instructions in affected programs: 2510921 -> 2490366 (-0.82%)
helped: 10963
HURT: 485
Instructions are helped.

total max-temps in shared programs: 2322828 -> 2322020 (-0.03%)
max-temps in affected programs: 11303 -> 10495 (-7.15%)
helped: 608
HURT: 19
Max-temps are helped.

total sfu-stalls in shared programs: 31545 -> 31494 (-0.16%)
sfu-stalls in affected programs: 235 -> 184 (-21.70%)
helped: 62
HURT: 11
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13787283 -> 13766677 (-0.15%)
inst-and-stalls in affected programs: 2525187 -> 2504581 (-0.82%)
helped: 10989
HURT: 477
Inst-and-stalls are helped.

v2: add a comment explaining the read depdency (Piñeiro).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9555>
2021-03-15 08:03:28 +01:00
Iago Toral Quiroga c057a1211b broadcom/compiler: disallow ldunif during ldvary sequences if possible
This restores many of the hurt shaders from the previous patch at the
expense of re-adding ldvary tracking in the scheduler.

total instructions in shared programs: 13760415 -> 13755738 (-0.03%)
instructions in affected programs: 1207560 -> 1202883 (-0.39%)
helped: 5080
HURT: 1731
Instructions are helped.

total max-temps in shared programs: 2322991 -> 2322828 (<.01%)
max-temps in affected programs: 5063 -> 4900 (-3.22%)
helped: 229
HURT: 108
Max-temps are helped.

total sfu-stalls in shared programs: 31827 -> 31545 (-0.89%)
sfu-stalls in affected programs: 478 -> 196 (-59.00%)
helped: 304
HURT: 21
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13792242 -> 13787283 (-0.04%)
inst-and-stalls in affected programs: 1220856 -> 1215897 (-0.41%)
helped: 5162
HURT: 1697
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Iago Toral Quiroga 947e9e42cc broadcom/compiler: simplify ldvary pipelining
We get optimal ldvary pipelining by doing the following:

1) Carefully merge a paired ldvary into the previous instruction when
   possible.
2) When the above succeeds, flag the ldvary as scheduled immediately so
   we can merge one of its children into the current instruction.
3) When scheduling ldvary sequences, only pick up instructions that are
   part of the sequence to avoid picking up something that prevents
   successful pipelining.

This patch skips 3) assuming some hurt shaders in exchange for better
scheduling flexibility during ldvary sequences. Besides eliminating most
of the code dedicated to special handling ldvary sequences, this also
usually allows us to produce better code by merging instructions that are
unrelated to ldvary sequences into the ldvary sequences, which is
particularly effective to fill up the gaps produced when scheduling the
first and last ldvary sequences as well as the gaps produced by flat
and noperspective varyings sequences that don't have both mul and add
instructions.

Notice that there are some hurt shaders, because some times the extra
scheduler flexibility can lead to picking up instructions that will
break a sequence without compensating for that, typically an ldunif
that prevents us from doing the fixup for a follow-up ldvary. We will
try to correct some of these cases with the next patch.

total instructions in shared programs: 13786037 -> 13760415 (-0.19%)
instructions in affected programs: 3201387 -> 3175765 (-0.80%)
helped: 16155
HURT: 4146
Instructions are helped.

total max-temps in shared programs: 2324834 -> 2322991 (-0.08%)
max-temps in affected programs: 22160 -> 20317 (-8.32%)
helped: 1340
HURT: 103
Max-temps are helped.

total sfu-stalls in shared programs: 30685 -> 31827 (3.72%)
sfu-stalls in affected programs: 782 -> 1924 (146.04%)
helped: 253
HURT: 1416
Inconclusive result.

total inst-and-stalls in shared programs: 13816722 -> 13792242 (-0.18%)
inst-and-stalls in affected programs: 3171642 -> 3147162 (-0.77%)
helped: 15331
HURT: 4179
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Iago Toral Quiroga d37241bdc4 broadcom/compiler: move code block around
These checks depend on prev_inst being set, so move them down below
with all the other checks with the same requirement.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Iago Toral Quiroga 8bcda472a0 broadcom/compiler: add an additional sanity check assert to the ldvary fixup
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
2021-03-10 07:52:22 +00:00
Jason Ekstrand e20e85f01e nir: Make nir_ssa_def_rewrite_uses_after take an SSA value
This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.

Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
2021-03-08 16:59:55 +00:00
Jason Ekstrand 117668b811 nir: Make nir_ssa_def_rewrite_uses take an SSA value
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
2021-03-08 16:59:55 +00:00
Iago Toral Quiroga 6e6e71ddf9 broadcom/compiler: fix flags check for ldvary merge
We were checking that the previous instruction doesn't write flags,
but we also need to check it doesn't read them.

Fixes: 1784dd22a3 ('broadcom/compiler: pipeline smooth ldvary sequences')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9431>
2021-03-05 12:55:47 +00:00
Iago Toral Quiroga 839007e490 broadcom/compiler: always restart ldvary pipelining when scheduling ldvary
When we were only able to pipeline smooth varyings, if we had to disable
ldvary pipelining in the middle of a sequence it would stay disabled for
the rest of the program, to prevent us from prioritizing scheduling of
ldvary instructions that we would not be able to pipeline effectively.
Now that we can pipeline all ldvary sequences we can change this.

This change re-enables ldvary pipelining upon finding the next
ldvary in the program in the hopes that we can continue pipelining
succesfully. To do this, we track the number of ldvary instructions we
emitted so far and compare that to the number of inputs in the fragment
shader we are scheduling. This also allows us to simplify our ldvary
tracking at nir to vir time, since that is all now handled in the QPU
scheduler.

total instructions in shared programs: 13817048 -> 13810783 (-0.05%)
instructions in affected programs: 810114 -> 803849 (-0.77%)
helped: 4843
HURT: 591
Instructions are helped.

total max-temps in shared programs: 2326612 -> 2326300 (-0.01%)
max-temps in affected programs: 4689 -> 4377 (-6.65%)
helped: 285
HURT: 7
Max-temps are helped.

total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%)
sfu-stalls in affected programs: 207 -> 130 (-37.20%)
helped: 120
HURT: 42
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%)
inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%)
helped: 4899
HURT: 590
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>
2021-03-05 10:32:19 +01:00
Iago Toral Quiroga c3732ac0d0 broadcom/compiler: be more aggressive skipping unifa writes
We had an optimization in place to skip a unifa write if the address
happens to be right after the last ldunifa read address, but we can
take this further and update the unifa address by emitting ldunifa
instructions if needed to skip a unifa write that is close enough.
This is because a unifa write involves 4 cycles: 1 for the write
and 3 delay slots before we can emit the first ldunifa.

So if we have code like this:

unifa addr + 0
ldunifa.r0
unifa addr + 12
ldunifa.r1

In practice we end up with QPU like this:

unifa addr + 0
nop
nop
nop
ldunifa.r0
unifa addr + 12
nop
nop
nop
ldunifa.r1

And with this patch we get:

unifa addr + 0
nop
nop
nop
ldunifa.r0  <--- reads offset 0
ldunifa.-   <--- reads offset 4
ldunifa.-   <--- reads offset 8
ldunifa.r1  <--- reads offset 12

Of course, QPU scheduling might find ways to fill the NOPs to some
extent and remove some of the gains, but generally speaking, this is
still usually a win.

Going by shader-db results, allowing the next unifa address to be up
to 12 bytes after the address resulting from the last ldunifa read
shows the best results:

total instructions in shared programs: 13817048 -> 13812202 (-0.04%)
instructions in affected programs: 602701 -> 597855 (-0.80%)
helped: 1750
HURT: 760
Instructions are helped.

total uniforms in shared programs: 3795485 -> 3793200 (-0.06%)
uniforms in affected programs: 43930 -> 41645 (-5.20%)
helped: 898
HURT: 0
Uniforms are helped.

total max-temps in shared programs: 2326612 -> 2326621 (<.01%)
max-temps in affected programs: 651 -> 660 (1.38%)
helped: 10
HURT: 21
Inconclusive result (value mean confidence interval includes 0).

total sfu-stalls in shared programs: 30942 -> 30906 (-0.12%)
sfu-stalls in affected programs: 627 -> 591 (-5.74%)
helped: 186
HURT: 158
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 13847990 -> 13843108 (-0.04%)
inst-and-stalls in affected programs: 601404 -> 596522 (-0.81%)
helped: 1747
HURT: 757
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
2021-03-04 09:00:15 +01:00
Iago Toral Quiroga 2897a83ff8 broadcom/compiler: drop the destination for unused ldunifa
We can't remove unused ldunifa that are not the first or last
in a sequence, but we can still ignore their destination
to reduce register pressure.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
2021-03-04 09:00:15 +01:00
Iago Toral Quiroga acbd4881c2 broadcom/compiler: ldvary pipelining tracking and documentation clean-ups
Now that we can pipeline all varyings we should not be referring
specifically to smooth varyings anywhere.

Also, rename the instruction field 'ldvary_pipelining' to
'is_ldvary_sequence', which is more appropriate, since we always
set this for any instruction involved with varying setups,
independently of whether they end up being pipelined or not.

This also does some other minor edits which intend to slightly
simplify the code and make it a bit more compact.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9363>
2021-03-02 13:54:14 +01:00
Iago Toral Quiroga 05f8efbc2c broadcom/compiler: allow pipelining of flat and noperspective varyings
These end up having a NOP between the ldvary and the next instruction
in the sequence (a MOV for flat and an FADD for noperspetive):

nop                  ; nop               ; ldvary.r0
nop                  ; nop
fadd  rf6, r0, r5    ; nop               ; ldvary.r1
nop                  ; nop
fadd  rf5, r1, r5    ; nop               ; ldvary.r2
nop                  ; nop
fadd  rf4, r2, r5    ; nop               ; ldvary.r3

To pipeline these, we can reuse the same infrastructure we have in
place for smooth varyings but we need to avoid breaking the sequence
due to the NOP instruction. We do that by testing if dropping the
sequence when we failed to pick up the next instruction also fails
to choose an instruction.

This is not perfect, because we may be able to choose an instruction
outside the sequence such as an ldunif, and use that to break a
sequence that we could otherwise continue after scheduling the NOP
instruction, but it is still better than nothing.

total instructions in shared programs: 13820690 -> 13819774 (<.01%)
instructions in affected programs: 64026 -> 63110 (-1.43%)
helped: 479
HURT: 62
Instructions are helped.

total max-temps in shared programs: 2326435 -> 2326423 (<.01%)
max-temps in affected programs: 102 -> 90 (-11.76%)
helped: 7
HURT: 0
Max-temps are helped.

total sfu-stalls in shared programs: 30683 -> 30710 (0.09%)
sfu-stalls in affected programs: 13 -> 40 (207.69%)
helped: 2
HURT: 24
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 13851373 -> 13850484 (<.01%)
inst-and-stalls in affected programs: 62818 -> 61929 (-1.42%)
helped: 466
HURT: 65
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
Iago Toral Quiroga 1784dd22a3 broadcom/compiler: pipeline smooth ldvary sequences
Typically, we would schedule smooth varyings like this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0
fadd  rf13, r0, r5   ; nop               ; ldvary.r1
nop                  ; fmul  r2, r1, rf0
fadd  rf12, r2, r5   ; nop               ; ldvary.r3
nop                  ; fmul  r4, r3, rf0
fadd  rf11, r4, r5   ; nop               ; ldvary.r0

where we pair up an ldvary with the fadd of the previous sequence
instead of the previous fmul. This is because ldvary has an implicit
write to r5 which is read by the fadd of the previous sequence, so
our dependency tracking doesn't allow us to move the ldvary before the
fadd, however, the r5 write of the ldvary instruction happens in the
instruction after it is emitted so we can actually move it to the fmul
and the r5 write would still happen in the same instruction as the fadd,
which is fine.

This patch allows us to pipeline these sequences optimally. For that,
after merging an ldvary into a previous instruction in the middle of
a pipelineable ldvary sequence, we check if we can manually move it
to the last scheduled instruction instead (the one before the
instruction we are currently scheduling).

If we are successful at moving the ldvary to the previous instruction,
then we flag the ldvary as scheduled immediately, which may promote
its children (the follow-up fmul instruction for that ldvary) to DAG
heads and continue the merge loop so that fmul can be picked and
merged into the final fadd of the previous sequence (where we had
originally merged the ldvary). This leads to a result that looks like
this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0 ; ldvary.r1
fadd  rf13, r0, r5   ; fmul  r2, r1, rf0 ; ldvary.r3
fadd  rf12, r2, r5   ; fmul  r4, r3, rf0 ; ldvary.r0

Shader-db results:

total instructions in shared programs: 14071591 -> 13820690 (-1.78%)
instructions in affected programs: 7809692 -> 7558791 (-3.21%)
helped: 41209
HURT: 4528
Instructions are helped.

total max-temps in shared programs: 2335784 -> 2326435 (-0.40%)
max-temps in affected programs: 84302 -> 74953 (-11.09%)
helped: 4561
HURT: 293
Max-temps are helped.

total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%)
sfu-stalls in affected programs: 3551 -> 2697 (-24.05%)
helped: 1713
HURT: 750
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%)
inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%)
helped: 41411
HURT: 4535
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
Iago Toral Quiroga 1d021539a2 broadcom/compiler: track pipelineable ldvary sequences
If we have two (or more) smooth varyings like this:

nop t3; ldvary.rf0
fmul t5, t3, t0
fadd t6, t5, r5
nop t7; ldvary.rf0
fmul t9, t7, t0
fadd t10, t9, r5
nop t11; ldvary.rf0
fmul t13, t11, t0
fadd t14, t13, r5

We may be able to pipeline them like this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0 ; ldvary.r1
fadd  rf13, r0, r5   ; fmul  r2, r1, rf0 ; ldvary.r3
fadd  rf12, r2, r5   ; fmul  r4, r3, rf0 ; ldvary.r0

But in order to do this, we will need to manually tweak the
QPU scheduling.

This patch tracks information about ldvary sequences that are
good candidates for pipelining, and a follow-up patch will
use this information to pipeline them when we emit the QPU
code.

v2 (apinheiro):
  - Rename the v3d_compile fields to avoid confusion with the qinst fields.
  - Assert that a sequence's start instruction is not the same as the end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
Iago Toral Quiroga c2c2cdc3d3 broadcom/compiler: fix indentation style
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
Iago Toral Quiroga b41edee879 broadcom/compiler: fix DAG pre-remove for merged instructions
When selecting an instruction to merge, we want to pre-remove that
instruction from the DAG, not the one we are merging it in, which
we had already pre-removed right before.

The reason this was not causing problems before is that the
consequence of this bug is we will choose the same instruction
again in the merge loop and trying to merge that instruction twice
will fail and we would break out of the merge loop and move on.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
Eric Anholt 60573b443b v3d: Replace driver lowering of GL_CLAMP with mesa/st's.
Mesa core can do this logic for us now.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9228>
2021-02-24 18:03:46 +00:00
Iago Toral Quiroga b17ec53c81 broadcom/compiler: use nir_opt_sink
total instructions in shared programs: 14072341 -> 14062334 (-0.07%)
instructions in affected programs: 1996685 -> 1986678 (-0.50%)
helped: 3038
HURT: 2432
Instructions are helped.

total uniforms in shared programs: 3797720 -> 3794523 (-0.08%)
uniforms in affected programs: 191711 -> 188514 (-1.67%)
helped: 831
HURT: 449
Uniforms are helped.

total max-temps in shared programs: 2340632 -> 2335124 (-0.24%)
max-temps in affected programs: 113632 -> 108124 (-4.85%)
helped: 2728
HURT: 436
Max-temps are helped.

total spills in shared programs: 6050 -> 5931 (-1.97%)
spills in affected programs: 2869 -> 2750 (-4.15%)
helped: 14
HURT: 4

total fills in shared programs: 13970 -> 13371 (-4.29%)
fills in affected programs: 8831 -> 8232 (-6.78%)
helped: 14
HURT: 4

total inst-and-stalls in shared programs: 14103668 -> 14093712 (-0.07%)
inst-and-stalls in affected programs: 2004035 -> 1994079 (-0.50%)
helped: 3009
HURT: 2426
Inst-and-stalls are helped.

LOST:   0
GAINED: 10

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9209>
2021-02-24 08:02:00 +01:00
Iago Toral Quiroga 54c17e45ae broadcom/compiler: skip unnecessary unifa writes
If a new UBO load happens to read exactly at the offset right after the
previous UBO load (something that is fairly common, for example when
reading a matrix), we can skip the unifa write (with its 3 delay slots)
and just continue to call ldunifa to continue reading consecutive addresses.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga e1cf2406da broadcom/compiler: add a constant alu optimization pass
Currently this is useful to clean up after DCEing leading ldunifa
instructions, but it can be expanded to handle more cases which
may allow to simplify the compiler code in places where we have
been trying to optimize manually for similar cases.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga 89de085055 broadcom/compiler: remove unused leading ldunifa
This requires that we go back to the unifa write and update the address
to jump over the unused leading component.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga 9d16d2d0be broadcom/compiler: allow dead code elimination of unused trailing ldunifa
If a ldunifa is the last in a sequence and is not used, we can safely
eliminate it.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga e20ae14978 broadcom/compiler: fix ldunif optimization
When we look back for a previous uniform definition we want to
start looking from the current position of the cursor, not the
end of the current block. The latter only works when translating
from NIR, since in that case both always match, but any optimization
pass may rewrite code and emit uniforms at any place in the middle of
the program.

Also, ntq_store_dest expects result to be written by the last instruction
to handle the case where it is stored to a NIR register. That won't be
the case if the result comes from an optimized uniform, so in that case
we need to insert a MOV, like we do in non-uniform control flow.

v2: fix ntq_store_dest for optimized uniforms.

Fixes: 14af7b3085 ('broadcom/compiler: don't emit redundant ldunif')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
2021-02-23 08:08:01 +00:00
Iago Toral Quiroga 064b846949 broadcom/compiler: don't dump shader-db stats for failed shaders
Shaders that fail register allocation were dumped with an instruction
count of 0, so getting them to compile would show up as an instruction
count regression. Also, the LOST/GAINED stats depend on us not dumping
data for failed shaders, which is why we were always seeing 0/0 there.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:02 +01:00
Iago Toral Quiroga df6c19c1fd broadcom/compiler: use a helper function to decide on TMU spilling
As we add more compiler optimizations that can increase register pressure
we may decide to disallow TMU spilling in more cases so it is probably
better to move this to its own helper function.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:02 +01:00
Iago Toral Quiroga 14af7b3085 broadcom/compiler: don't emit redundant ldunif
If we emit a new uniform and that uniform has already been emitted
in the same block we can just reuse that.

There is a balancing game here between reducing ldunif instructions
and not increasing register pressure too much though, so we put
a limit to how far back we are willing to look for a previous
definition of the uniform. Based on shader-db results, 20 instructions
produces best results.

total instructions in shared programs: 14928266 -> 14907432 (-0.14%)
instructions in affected programs: 6431841 -> 6411007 (-0.32%)
helped: 15270
HURT: 10772
Instructions are helped.

total uniforms in shared programs: 3944672 -> 3840276 (-2.65%)
uniforms in affected programs: 1827184 -> 1722788 (-5.71%)
helped: 30423
HURT: 845
Uniforms are helped.

total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%)
inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%)
helped: 15287
HURT: 10852
Inst-and-stalls are helped.

v2 (Eric):
 - consider ldunifrf too
 - check that no other instruction writes to the register

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
2021-02-17 09:01:01 +01:00
Arcady Goldmints-Orlov 7f61ff7b4d broadcom/compiler: Merge instructions more efficiently
Instructions are allowed to access up to two rf registers, or one rf
register and a small immediate. This change allows qpu_merge_inst to
take full advantage of this by allowint the merging of two instructions
if they have no more than two different rf registers between them,
or one rf register and one small immediate. qpu_merge_inst rewrites
the instructions as needed to pack everything into raddr_a and raddr_b
in the merged instruction.

shader-db stats:
total instructions in shared programs: 19938769 -> 18929664 (-5.06%)
instructions in affected programs: 17929438 -> 16920333 (-5.63%)
helped: 95008
HURT: 242
helped stats (abs) min: 1 max: 785 x̄: 10.62 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.37% x̃: 4.98%
HURT stats (abs)   min: 1 max: 2 x̄: 1.10 x̃: 1
HURT stats (rel)   min: 0.30% max: 3.12% x̄: 1.62% x̃: 1.54%
95% mean confidence interval for instructions value: -10.67 -10.52
95% mean confidence interval for instructions %-change: -5.37% -5.33%
Instructions are helped.

total max-temps in shared programs: 3122664 -> 3112446 (-0.33%)
max-temps in affected programs: 124881 -> 114663 (-8.18%)
helped: 5445
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.88 x̃: 1
helped stats (rel) min: 1.49% max: 40.54% x̄: 8.97% x̃: 6.67%
95% mean confidence interval for max-temps value: -1.91 -1.84
95% mean confidence interval for max-temps %-change: -9.12% -8.81%
Max-temps are helped.

total sfu-stalls in shared programs: 38028 -> 41231 (8.42%)
sfu-stalls in affected programs: 6053 -> 9256 (52.92%)
helped: 664
HURT: 3380
helped stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1
helped stats (rel) min: 9.09% max: 100.00% x̄: 70.81% x̃: 100.00%
HURT stats (abs)   min: 1 max: 4 x̄: 1.15 x̃: 1
HURT stats (rel)   min: 0.00% max: 300.00% x̄: 46.39% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: 0.76 0.82
95% mean confidence interval for sfu-stalls %-change: 25.03% 29.26%
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 19976797 -> 18970895 (-5.04%)
inst-and-stalls in affected programs: 17963129 -> 16957227 (-5.60%)
helped: 95017
HURT: 245
helped stats (abs) min: 1 max: 785 x̄: 10.59 x̃: 7
helped stats (rel) min: 0.30% max: 21.25% x̄: 5.35% x̃: 4.95%
HURT stats (abs)   min: 1 max: 2 x̄: 1.09 x̃: 1
HURT stats (rel)   min: 0.30% max: 3.12% x̄: 1.61% x̃: 1.54%
95% mean confidence interval for inst-and-stalls value: -10.64 -10.48
95% mean confidence interval for inst-and-stalls %-change: -5.35% -5.31%
Inst-and-stalls are helped.

v2 (Iago):
 - moved early return for naddrs > 2 even earlier.
 - only update {add,mul}.b mux if instruction has more than one operand.
 - don't OR b->raddr_{a,b} if we are not merging add/mul instructions.
 - don't initialize packed to 0.
 - minor style fixes.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9026>
2021-02-16 11:46:31 +00:00
Iago Toral Quiroga 82981ccbb1 broadcom/compiler: use unifa for UBO loads from uniform addresses
This basically processes UBO loads as uniform loads by writing
the load address to the unifa register and reading sequential
values with ldunifa.

This process is faster than going through the TMU, but we can only
use it when the address we are reading from is uniform across all
channels, since we are basically reading from the UBO address
as if it was a uniform stream.

This leads to better performance in the UE4 Shooter demo.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:22 +00:00
Iago Toral Quiroga 878555976e broadcom/compiler: emit ldunifarf when needed
Just like ldunif and ldunifrf, ldunifa writes to the r5 accumulator
and ldunifarf writes to the register file.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga c2a04aca48 broadcom/compiler: do not DCE ldunifa
ldunifa reads a uniform from the unifa address and updates the unifa
address implicitly, so if we dead-code-eliminate one a follow-up
ldunifa will not read from the appropriate address.

We could avoid this if the compiler ensures that every ldunifa is
paired with an explicit unifa, so for example if we are reading a
vec4, we could emit:

unifa (addrr)
ldunifa
unifa (addr+4)
ldunifa
unifa (addr+8)
ldunifa
unifa (addr+12)
ldunifa

instead of:

unifa (addr)
ldunifa
ldunifa
ldunifa
ldunifa

But since each unifa has a 3 delay slot before we can do ldunifa,
that would end up being quite expensive.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga efc75e13ea broadcom/compiler: disallow reading two uniforms in the same instruction
The simulator asserts on this, which can happen if we merge a ldunif
(or any other instruction that reads a uniform implicitly) and
ldunifa in the same instruction.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga e8e4bdae8d broadcom/compiler: ensure 3-slot delay between unifa and ldunifa
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga 42880fdf5d broadcom/compiler: preserve ordering of unifa/ldunifa sequences
unifa writes the addresss from which follow-up ldunifa loads,
and each ldunifa increments the unifa addeess by 32-bit so the
loads need to be ordered too.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga 97c078488f broadcom/compiler: disallow unifa overlap with thread switch/end
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga 4b929ae9f0 broadcom/compiler: don't check for GFXH-1633 on V3D 4.2.x
This has been fixed since V3D 4.2.14 (Rpi4), which is the hardware
we are targetting. Our version resolution doesn't allow us to check
for 4.2 versions lower than .14, but that is okay because the
simulator would still validate this in any case.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga 457ed5aa01 broadcom/compiler: name registers correctly based on V3D version
So we can differentiate between TMU for V3D 4.x and UNIFA for V3D 4.x,
which are aliased.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Iago Toral Quiroga f85fcaa494 broadcom/compiler: pass a devinfo to check if an instruction writes to TMU
V3D 3.x has V3D_QPU_WADDR_TMU which in V3D 4.x is V3D_QPU_WADDR_UNIFA
(which isn't a TMU write address). This change passes a devinfo to
any functions that need to do these checks so we can account for the
target V3D version correctly.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
2021-02-12 08:24:21 +00:00
Arcady Goldmints-Orlov 9909fe6bac broadcom/compiler: Skip bool_to_cond where possible
This change keeps track of when a boolean temp is loaded into the flags
by a comparison instruction and uses that information to skip emitting
instructions to set the flags in ntq_emit_bool_to_cond when the flags
already have the right contents.

total instructions in shared programs: 11116502 -> 11112225 (-0.04%)
instructions in affected programs: 631691 -> 627414 (-0.68%)
helped: 1591
HURT: 754
helped stats (abs) min: 1 max: 94 x̄: 4.14 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.10% x̃: 1.58%
HURT stats (abs)   min: 1 max: 19 x̄: 3.07 x̃: 2
HURT stats (rel)   min: 0.13% max: 19.67% x̄: 1.88% x̃: 1.15%
95% mean confidence interval for instructions value: -2.02 -1.63
95% mean confidence interval for instructions %-change: -0.94% -0.71%
Instructions are helped.

total uniforms in shared programs: 3281555 -> 3281513 (<.01%)
uniforms in affected programs: 1754 -> 1712 (-2.39%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 19 x̄: 7.90 x̃: 5
helped stats (rel) min: 0.56% max: 11.11% x̄: 7.37% x̃: 11.05%
HURT stats (abs)   min: 1 max: 15 x̄: 7.40 x̃: 3
HURT stats (rel)   min: 0.64% max: 9.55% x̄: 5.31% x̃: 3.41%
95% mean confidence interval for uniforms value: -8.57 2.97
95% mean confidence interval for uniforms %-change: -7.35% 1.07%
Inconclusive result (value mean confidence interval includes 0).

total max-temps in shared programs: 1758419 -> 1758174 (-0.01%)
max-temps in affected programs: 7006 -> 6761 (-3.50%)
helped: 290
HURT: 14
helped stats (abs) min: 1 max: 8 x̄: 1.13 x̃: 1
helped stats (rel) min: 0.79% max: 22.86% x̄: 6.61% x̃: 4.88%
HURT stats (abs)   min: 1 max: 13 x̄: 6.00 x̃: 3
HURT stats (rel)   min: 1.54% max: 54.17% x̄: 23.99% x̃: 9.12%
95% mean confidence interval for max-temps value: -1.03 -0.58
95% mean confidence interval for max-temps %-change: -6.24% -4.16%
Max-temps are helped.

total sfu-stalls in shared programs: 23676 -> 23610 (-0.28%)
sfu-stalls in affected programs: 1578 -> 1512 (-4.18%)
helped: 257
HURT: 252
helped stats (abs) min: 1 max: 3 x̄: 1.37 x̃: 1
helped stats (rel) min: 11.11% max: 100.00% x̄: 46.70% x̃: 40.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.00% max: 200.00% x̄: 41.65% x̃: 25.00%
95% mean confidence interval for sfu-stalls value: -0.25 -0.01
95% mean confidence interval for sfu-stalls %-change: -8.24% 2.33%
Inconclusive result (%-change mean confidence interval includes 0).

total inst-and-stalls in shared programs: 11140178 -> 11135835 (-0.04%)
inst-and-stalls in affected programs: 633972 -> 629629 (-0.69%)
helped: 1581
HURT: 755
helped stats (abs) min: 1 max: 94 x̄: 4.26 x̃: 3
helped stats (rel) min: 0.11% max: 13.46% x̄: 2.12% x̃: 1.59%
HURT stats (abs)   min: 1 max: 17 x̄: 3.17 x̃: 2
HURT stats (rel)   min: 0.05% max: 19.67% x̄: 1.93% x̃: 1.20%
95% mean confidence interval for inst-and-stalls value: -2.06 -1.66
95% mean confidence interval for inst-and-stalls %-change: -0.93% -0.70%
Inst-and-stalls are helped.

Reviewed-by: Iago Toral Quioroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>
2021-02-12 07:05:33 +00:00
Arcady Goldmints-Orlov 8762f29e9c broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f
Reviewed-by: Iago Toral Quioroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>
2021-02-12 07:05:33 +00:00
Iago Toral Quiroga bd0ef080d0 v3d/compiler: fix QPU scheduler TMU sequence shuffling
The QPU scheduler allows to move certain TMU instructions around and
since we enabled pipelining, we need to protect against the case where
doing this might break a TMU sequence. For example, this test:

dEQP-VK.rasterization.line_continuity.line-strip

Was generating this VIR:

mov tmud, t187
mov.pushz null, t176
mov.ifa tmua, t9
nop null; wrtmuc (img[0].p0 | 0x0)
mov tmut, t185
mov tmud, t180
mov.ifa tmusf, t183
nop null; thrsw

where we have a general TMU access (tmud,tmua) followed by an image
access (wrtmuc, tmut, tmud, tmusf), which the QPU scheduler was turning
into:

nop            ; nop               ; ldunifrf.rf22 (0xffffff00 / -nan)
nop            ; nop               ; wrtmuc (img[0].p0 | 0x0)
nop            ; nop               ; ldtmu.r2
add  r0, r2, 1 ; nop               ; ldtmu.r3
nop            ; nop               ; ldtmu.r4
nop            ; mov  tmud, r0
nop            ; mov.ifa  tmua, rf15
nop            ; mov  tmut, r4     ; thrsw
nop            ; mov  tmud, rf22
nop            ; mov.ifa  tmusf, r3

where it allowed the wrtmuc to move up and before the general TMU access,
leading to an incorrect TMU sequence.

Fix this by flagging TMUA writes (which are the sequence terminators for
general TMU accessess) as writing new TMU configuration, like we do for all
other TMU sequence terminators for textures and images.

Fixes: 197090a3fc ('broadcom/compiler: implement pipelining for general TMU operations')

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8954>
2021-02-10 13:18:25 +00:00
Eric Anholt bcb5f9f94a v3d: Stop advertising support for flat shading.
The GL frontend can lower this weird GL feature away for us.  This should
fix redeclaration of the gl_Color/SecondaryColor as centroid, since that
case had been missed in the !flat special case here.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt ff805f8ac7 v3d: Stop advertising support for PIPE_CAP_*_COLOR_CLAMPED.
The GL frontend can lower away this deprecated GL feature for us.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt 2992dc7386 v3d: Stop advertising support for PIPE_CAP_TWO_SIDED_COLOR.
The GL frontend can lower away this deprecated GL feature for us.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Eric Anholt 5ddc2f916f v3d: Clean up vestiges of alpha test lowering.
We had an unnecessary case in our uniforms upload switch statement, since
we no longer advertise the cap.

Fixes: 8ad931808e ("v3d: do not report alpha-test as supported")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
2021-02-09 20:06:48 -08:00
Arcady Goldmints-Orlov 0b29a8a206 Revert "broadcom/compiler: improve generation of if conditions"
This reverts commit 93f8f83a95.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8903>
2021-02-08 06:52:59 +00:00
Iago Toral Quiroga 6630825dcf broadcom/compiler: let QPUs stall on TMU input/config overflows
We have been trying to avoid this by tracking fifo usages in the driver and
flushing all outstanding TMU sequences if we overflowed any of these, however,
this is actually not the most efficient strategy. Instead, we would like to
flush only enough operations to get things going again, which is better for
pipelining. Doing that in the driver would require some additional work, but
thankfully, it is not required, since this seems to be what the hardware does
automatically, so we can just remove overflow tracking for these two fifos
and enjoy the benefits.

This also further improves shader-db stats:

total instructions in shared programs: 8975062 -> 8955145 (-0.22%)
instructions in affected programs: 1637624 -> 1617707 (-1.22%)
helped: 4050
HURT: 2241
Instructions are helped.

total threads in shared programs: 236802 -> 237042 (0.10%)
threads in affected programs: 252 -> 492 (95.24%)
helped: 122
HURT: 2
Threads are helped.

total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%)
sfu-stalls in affected programs: 4744 -> 4435 (-6.51%)
helped: 1248
HURT: 1051
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%)
inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%)
helped: 4050
HURT: 2239
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga d57a358128 broadcom/compiler: log spilling shaders to perf output
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga 0f90b729fb broadcom/compiler: disallow spilling if TMU pipelining was enabled
TMU pipelining makes TMU spilling difficult and can easily lead to
doing large amounts of spills to compile a shader. It is best to
only use pipelining if we can compile without spilling.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga e18d6bbf2f broadcom/compiler: disable TMU pipelining if we fail to register allocate
TMU pipelining can severely reduce our capacity to emit TMU spills,
causing us to fail to compile a shader we may otherwise be able to
compile. This is because pipelining extends the liveness of TMU
sequences by posponing the thread switch and LDTMU until a result
is needed, and we can't emit TMU spills while in the middle of a
TMU sequence.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga ecd654bf00 broadcom/compiler: support pipelining of image load/store instructions
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga 0bdc6dca6c broadcom/compiler: refactor image load/store TMU emission code
This mostly moves code around to group together the code involved with
actually emitting a TMU sequence. This will make it a bit easier to
then implement pipelining while reusing this code, similar to how we
handled other cases of TMU pipelining.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga be45960d3e broadcom/compiler: support pipelining of tex instructions
This follows the same idea as for TMU general instructions of reusing
the existing infrastructure to first count required register writes and
flush outstanding TMU dependencies, and then emit the actual writes, which
requires that we split the code that decides about register writes to
a helper.

We also need to start using a component mask instead of the number
of components that we need to read with a particular TMU operation.

v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga 197090a3fc broadcom/compiler: implement pipelining for general TMU operations
This creates the basic infrastructure to implement TMU pipelining and
applies it to general TMU. Follow-up patches will expand this
to texture and image/load store operations.

TMU pipelining means that we don't immediately end TMU sequences,
and instead, we postpone the thread switch and LDTMU (for loads)
or TMUWT (for stores) until we really need to do them.

For loads, we may need to flush them if another instruction reads
the result of a load operation. We can detect this because in that
case ntq_get_src() will not find the definition for that ssa/reg
(since we have not emitted the LDTMU instructions for it yet), so
when that happens, we flush all pending TMU operations and then
try again to find the definition for the source.

We also need to flush pending TMU operations when we reach the end
of a control flow block, to prevent the case where we emit a TMU
operation in a block, but then we read the result in another block
possibly under control flow.

It is also required to flush across barriers and discards to honor
their semantics.

Since this change doesn't implement pipelining for texture and
image load/store, we also need to flush outstanding TMU operations
if we ever have to emit one of these. This will be corrected with
follow-up patches.

Finally, the TMU has 3 fifos where it can queue TMU operations.
These fifos have limited capacity, depending on the number of threads
used to compile the shader, so we also need to ensure that we
don't have too many outstanding TMU requests and flush pending
TMU operations if a new TMU operation would overflow any of these
fifos. While overflowing the Input and Config fifos only leads
to stalls (which we want to avoid anyway), overflowing the Output
fifo is incorrect and would end up with a broken shader. This means
that we need to know how many TMU register writes are required
to emit a TMU operation and use that information to decide if we need
to flush pending TMU operations before we emit any register
writes for the new TMU operation.

v2: fix TMU flushing for NIR registers reads (jasuarez)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga 0e96f0f8cd broadcom/compiler: prepare TMU spilling code to account for TMU pipelining
Follow-up patches will implement support for TMU pipelining in the
compiler, which basically means that we will be able to have more
than one outstanding TMU operation.

Our spilling code currently relies on properly identifying the end
of a TMU sequence (since we can't emit a new TMU sequence for a spill
in the middle of an existing TMU sequence), however, that code expects
that only one TMU sequence may be outstanding, which won't be true
once we implement pipelining.

This change fixes the 'end of TMU sequence' checks to account for this
in preparation for upcoming patches.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Iago Toral Quiroga 3926030183 broadcom/compiler: fix indentation with TABs
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
2021-02-04 10:33:10 +00:00
Arcady Goldmints-Orlov 93f8f83a95 broadcom/compiler: improve generation of if conditions
Where it is safe to do so, avoid the generation of code to convert a
condition code into a boolean which is then tested to generate a
condition code. This is only done in uniform ifs, and only for condition
values that are SSA and only used once (in that if statement).

shader-db relative to MR 7726:

total instructions in shared programs: 8985667 -> 8974151 (-0.13%)
instructions in affected programs: 390140 -> 378624 (-2.95%)
helped: 810
HURT: 276
helped stats (abs) min: 1 max: 49 x̄: 17.77 x̃: 16
helped stats (rel) min: 0.10% max: 33.63% x̄: 7.97% x̃: 6.45%
HURT stats (abs)   min: 1 max: 46 x̄: 10.42 x̃: 10
HURT stats (rel)   min: 0.16% max: 21.54% x̄: 2.26% x̃: 2.03%
95% mean confidence interval for instructions value: -11.46 -9.75
95% mean confidence interval for instructions %-change: -5.76% -4.97%
Instructions are helped.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8709>
2021-02-02 06:55:49 +00:00
Arcady Goldmints-Orlov 8f583df7b6 broadcom/compiler: Enable PER_QUAD TMU access only in uniform control flow
PER_QUAD TMU lookups will partially override the predication mask on TMU
writes. If some but not all lanes in a quad are predicated out, setting
PER_QUAD will force them all to be enabled. This can result in TMU
access to bogus addresses when in nonuniform control flow. Also, since
PER_QUAD is needed to make sure derivatives work with helper
invocations, and derivatives are undefined in nonuniform control flow,
there is no reason to leave it enabled in this case.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>
2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov 79bde75131 broadcom/compiler: Emit uniform loops using uniform control flow
Similarly to if statements, uniform loops are now emitted without
predication, using simple branches for breaks and continues. The
uniformity of the loop is determined by running the
nir_divergence_analysis pass.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>
2021-02-01 08:11:48 +00:00
Arcady Goldmints-Orlov 6643bdbd53 broadcom/compiler: Use ANYA for branches in uniform ifs
Using ANYAP instead of ALLAP makes things work correctly in cases
where all lanes are masked out.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>
2021-02-01 08:11:48 +00:00
Caio Marcelo de Oliveira Filho 9f3d5e99ea compiler: Use util/bitset.h for system_values_read
It is currently a bitset on top of a uint64_t but there are already
more than 64 values.  Change to use BITSET to cover all the
SYSTEM_VALUE_MAX bits.

Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8585>
2021-01-26 20:20:47 +00:00
Alejandro Piñeiro 212b1516df v3d/compiler: enable lower_add_sat NIR option
We are enabling this option for the Vulkan driver, so it makes sense
to enable it for the OpenGL one.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8582>
2021-01-20 12:41:52 +00:00
Christian Gmeiner 36e1c902b9 v3d: mark some variables static const
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8438>
2021-01-13 07:24:32 +00:00
Christian Gmeiner 9151dab967 v3d: update fallthrough comments
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8438>
2021-01-13 07:24:32 +00:00
Christian Gmeiner 4ec956a2b0 v3d: drop not use function parameter
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8438>
2021-01-13 07:24:32 +00:00
Daniel Schürmann bd8e84eb8d nir: replace .lower_sub with .has_fsub and .has_isub
This allows a more fine-grained control about whether
a backend supports one of these instructions.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6597>
2021-01-11 19:13:51 +00:00
Christian Gmeiner 66d51965af v3d: use intrinsic builders
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8295>
2021-01-06 14:34:41 +00:00
Rob Clark 790144e65a util+treewide: container_of() cleanup
Replace mesa's slightly different container_of() with one more aligned
to the linux kernel's version which takes a type as the 2nd param.  This
avoids warnings like:

  freedreno_context.c:396:44: warning: variable 'batch' is uninitialized when used within its own initialization [-Wuninitialized]

At the same time, we can add additional build-time type-checking asserts

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7941>
2020-12-10 16:48:36 +00:00
Jason Ekstrand 630e54a08b nir: Add a halt instruction type
Halt is like a return for the entire shader or exit() if you prefer to
think of it that way.  Once an invocation hits a halt, it's 100% dead.
Any writes to output variables which happened before the halt do,
however, still apply.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:09 +00:00
Alejandro Piñeiro 429c336412 broadcom/compiler: separate texture/sampler info from v3d_key
So far the v3d compiler has them combined, as for OpenGL both are the
same. This change is intended to fit the v3d compiler better with
Vulkan, where they are separate concepts.

Note that NIR has them separate for a long time, both on nir_variable
and on some NIR lowerings.

v2: (from Iago feedback)
    * Use key->num_tex/sampler_used to iterate through the array
    * Fill up num_samplers_used on v3d, assert that is the same that
      num_tex_used if possible.

v3: (Iago)
    * Assert num_tex/samplers_used is smaller that tex/sampler array size.

v4: Update assert mentioned on v3 to use <= instead of < (detected by CI)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

squash! broadcom/compiler: separate texture/sampler info from v3d_key

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
2020-11-14 15:59:02 +00:00
Arcady Goldmints-Orlov a1a365e818 broadcom/compiler: Allow spills of temporaries from TMU reads
Since spills and fills use the TMU, special care has to be taken to
avoid putting one between a TMU setup instruction and the corresponding
reads or writes. This change adds logic to move fills up and move spills
down to avoid interrupting such sequences.

This allows compiling 6 more programs from shader-db. Other stats:

total spills in shared programs: 446 -> 446 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 606 -> 610 (0.66%)
fills in affected programs: 38 -> 42 (10.53%)
helped: 0
HURT: 2

total instructions in shared programs: 19330 -> 19363 (0.17%)
instructions in affected programs: 3299 -> 3332 (1.00%)
helped: 0
HURT: 5

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6606>
2020-11-09 20:45:58 +00:00
Juan A. Suarez Romero 1e723745dd v3d/compiler: extend swapping R/B support to all vertex attributes
So far the support for R/B swapping in vertex attributes were for the
generic attributes.

But there are cases like glSecondaryColorPointer() supporting BGRA
formats that require the R/B swapping to be also allowed in the
non-generic vertex attributes (in this case, in the COLOR1 attribute).

v2:
 - Don't split line (Iago)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7196>
2020-11-05 12:15:28 +00:00
Arcady Goldmints-Orlov 0b30336906 broadcom/compiler: Handle non-SSA destinations for tex instructions
The NIR that is given to the VIR compiler is not in SSA form, and so
the v3d*_vir_emit_tex() functions must be able to handle both SSA and
register destinations.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7318>
2020-11-05 09:03:46 +00:00
Alejandro Piñeiro 09b2bd1df9 broadcom/compiler: remove v3d_fs_key depth_enabled field.
It is not used right now, so keeping it adds some noise/confusion.

So far configuring Z test are done through the CFG_BITS. See
v3dX(emit_state) at v3dx_emit.c for v3d, and pack_cfg_bits at
v3dv_pipeline.c for v3dv. There flags like z_updates_enable and others
are filled up.

That key field seems like a leftover coming from using vc4 as
reference, as that driver defines and uses a field with name name.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7421>
2020-11-03 10:55:08 +00:00
Iago Toral Quiroga 40788be134 v3d/compiler: fix BGRA vertex attributes for vec2/float size.
We don't natively support BGRA format, instead we handle these
as RGBA and we lower the loads to swap components R and B.

However, the driver emits VPM loads based on the size of the
input variables so when we have a vec2 or float BGRA input,
it would only emit VPM loads for components 0 and 1, which is
not correct since we emit a load of component 2 to swap with
component 0.

v2: handle GL legacy vertex inputs gracefully.

Fixes:
dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-highp-output-vec2
dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-mediump-output-vec2

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7271>
2020-10-23 09:19:02 +02:00
Erik Faye-Lund 8ad931808e v3d: do not report alpha-test as supported
This triggers lowering in the state-tracker, which makes things a bit
simpler.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7251>
2020-10-21 16:33:43 +00:00
Iago Toral Quiroga 442f48f27b v3d/compiler: implement load interpolated input intrinsics
We will lower GLSL interpolateAt functions to these.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7155>
2020-10-15 02:04:04 +02:00
Iago Toral Quiroga 3ec165bce9 broadcom/compiler: track partially interpolated fragment inputs
We will need these to implement GLSL's interpolateAt*() functions where
we are required to perform interpolation in the shader at arbitrary
offsets.

Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7155>
2020-10-15 02:04:04 +02:00
Arcady Goldmints-Orlov e881290979 broadcom/compiler: use nir io semantics
This allows to clean up some code.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6721>
2020-10-14 22:54:58 +00:00
Arcady Goldmints-Orlov ac5f0ee19c broadcom/compiler: support varyings with struct types
This adds support for using structs as outputs from vertex shaders and
inputs to fragment shaders.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6721>
2020-10-14 22:54:58 +00:00
Iago Toral Quiroga 7eb8eb10f6 v3d/compiler: allow to batch spills
Some shaders that need to spill hundreds of registers can take very long times
to compile as each allocation attempt spills a single register and restarts
the allocation process. We can significantly cut down these times if we allow
the compiler to spill in batches, which should be possible if we are spilling
uniforms, which is in fact the kind of spills that we do first because they
have lower cost than TMU spills.

Doing this could cause us to slightly over spill in some cases (depending on
the chosen batch size) leading to slightly worse performance, so we only
enable this behavior after we have started to spill over a certain threshold,
at which point we assume that performance won't be good and we want to
favor compilation speed instead.

v2:
  - Keep it simple and just try to spill a fixed amount of registers in a
    batch instead of trying to compute this dynamically based on accumulated
    spills and current register pressure. (Eric).

v3:
  - Check if the node is valid before doing anything with it.
  - Drop the environment variable to select batch size and just fix it to 20.

With this we can take this CTS test from 35 minutes down to about 3 minutes:
dEQP-VK.ssbo.layout.random.all_shared_buffer.5

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:33 +00:00
Iago Toral Quiroga 23c727dd67 v3d/compiler: add a lowering pass for robust buffer access
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:33 +00:00
Iago Toral Quiroga 4401dde0e9 broadcom/compiler: rename QUNIFORM_GET_BUFFER_SIZE to QUNIFORM_GET_SSBO_SIZE
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:33 +00:00
Iago Toral Quiroga d93d903a37 v3d/compiler: implement nir_intrinsic_get_ubo_size
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:33 +00:00
Alejandro Piñeiro 02b9670611 broadcom/compiler: allow GLSL_SAMPLER_DIM_BUF on txs emission
Although we don't support texture buffers on the OpenGL driver, we are
already doing that for the Vulkan driver. This would be needed for the
OpenGL driver in any case.

Fixes following tests on v3dv:
 dEQP-VK.memory.pipeline_barrier.host_write_uniform_texel_buffer.*
 dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.*

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:33 +00:00
Iago Toral Quiroga 644a15e69e v3dv: implement nir_texop_texture_samples
Fixes:
dEQP-VK.glsl.texture_functions.query.texturesamples.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:32 +00:00
Iago Toral Quiroga 1c4c7d95f7 broadcom/compiler: track if the fragment shader forces per-sample MSAA
For example, regarding gl_SampleID, the GLSL spec states:

   "Any static use of this variable in a fragment shader causes the
    entire shader to be evaluated per-sample."

So we need to track if the fragment shader does anything that implicitly
enables per-sample shading in the compiler for the driver to
auto-enable sample rate shading if needed.

v2:
 - Instead of tracking reads of gl_SampleID, check SYSTEM_BIT_SAMPLE_ID
   and SYSTEM_BIT_SAMPLE_POS as well as the sample layout qualifier like
   other drivers are doing to activate this behavior (Eric).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:32 +00:00
Iago Toral Quiroga 531ea3596d broadcom/compiler: implement nir_intrinsic_load_sample_pos
This is intended to return the sample location within the pixel.

Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_position.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:32 +00:00
Iago Toral Quiroga 14d74c07aa broadcom/compiler: handle gl_SampleMask writes in fragment shaders
We didn't need this until now, since this was included with GLES 3.2,
but we need it for Vulkan.

Eric had already done the plumbing for it though, we just need to
actually emit the mask.

Fixes some tests in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:32 +00:00
Iago Toral Quiroga 5a2ef59963 v3d/compiler: support swapping R/B channels in vertex attributes.
We will need this in Vulkan to support vertex format
VK_FORMAT_B8G8R8A8_UNORM.  The hardware doesn't allow to swizzle
vertex attribute components, so we need to do it in the shader.

v2:
 - Use nir_intrinsic_io_semantics() to retrieve the location instead
   of looping through the shader input variables (Eric).
 - Assert that we only have one component (Eric).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:31 +00:00
Alejandro Piñeiro b9dd7e30a6 v3d/tex: avoid to ask back for a sampler state if not needed
So far we were not asking the driver for the sampler state if we could
just use the default P1 values. But even if we need to fill P1 (for
example to fill up the output type of the format), if the texture
operation doesn't need a sampler, we can let that field as NULL (so
default values) and avoid calling back the driver for a sampler.

This is not mandatory for OpenGL (as we always have a sampler object),
although still a good to have. For Vulkan this is needed, as we don't
have a sampler object in that case.

v2: reword comment (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:31 +00:00
Iago Toral Quiroga f41857eb48 v3d/compiler: implement nir_intrinsic_load_base_instance
Vulkan lowers gl_InstanceIndex to load_base_instance +
load_instance_id, so we need to implement loading the base instance in
the compiler.

The base instance is set by the BASE_VERTEX_BASE_INSTANCE command
right before the instanced draw call and it is included in the VPM
payload together with the InstanceID and VertexID if this is requested
by the shader record.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:29 +00:00
Iago Toral Quiroga 1f41a128e0 v3d/compiler: implement nir_op_fquantize2f16
Reviewd-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:28 +00:00
Alejandro Piñeiro c8212731e7 v3d/compiler: handle GL/Vulkan differences in uniform handling
This also adds a v3d_execution_environment, so compiler could know if
it is generating code for OpenGL or Vulkan needs.

Reviewed-by: Iago Toral <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:27 +00:00
Alejandro Piñeiro 1c8226c682 v3d/compiler: update uses_vid/uses_iid check
In order to take into account the vulkan specific system values
SYSTEM_VALUE_INSTANCE_INDEX and SYSTEM_VALUE_VERTEX_ID_ZERO_BASE.

Reviewed-by: Iago Toral <itoral@igalia.com>
Reviewed-by: Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:26 +00:00
Alejandro Piñeiro 62ca997476 v3d/compiler: num_tex_used on v3d_key
We would need on OpenGL to update values for all the textures used. On
OpenGL that value can be always took from the context or the nir
shader, but there are cases on Vulkan that it is not the case, or
would force up to recompute it.

Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
2020-10-13 21:21:25 +00:00
Alejandro Piñeiro 8de380d26a broadcom/compiler: add V3D_DEBUG_RA option
To ask to debug a registr allocation failure
(V3D_DEBUG_REGISTER_ALLOCATION seemed too long to me).

When a fallback register allocation algorithm was added, if the
register allocation fails, it only dumpg the current vir with the
register pressure info with the failed fallback. But if we want do
debug the problem, we would be interested on both.

Additionally, it was strange that we got the full vir dump with the
failure even if no debug option was set.

Additionally we add shaderdb like stats for those failures, to make
easier to compare one and the other.

v2: keep a small warning message in case both register allocation
    algorithms fails (Neil)

Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>
2020-10-07 20:21:17 +00:00
Jason Ekstrand 0aa08ae2f6 nir: Split NIR_INTRINSIC_TYPE into separate src/dest indices
We're about to introduce conversion ops which are going to want two
different types.  We may as well just split the one we have rather than
end up with three.  There are a couple places where this is mildly
inconvenient but most of the time I find it to actually be nicer.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
2020-10-01 18:36:53 +00:00
Kenneth Graunke 140f53e646 Revert "nir: replace lower_ffma and fuse_ffma with has_ffma"
This reverts commit 939ddf3f67.

Intel has a separate pass for fusing FFMAs selectively.  We split
these flags in commit 1b72c31e1f and
the reasoning still stands.  The patch being reverted was just a
cleanup, so there should be no issue with reverting it.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6849>
2020-09-24 13:11:50 -07:00
Marek Olšák 939ddf3f67 nir: replace lower_ffma and fuse_ffma with has_ffma
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>
2020-09-24 12:29:11 +00:00
Marek Olšák 771aad3027 nir: split lower_ffma into lower_ffma16/32/64
AMD wants different behavior for each bit size

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>
2020-09-24 12:29:11 +00:00
Jason Ekstrand 9750164c09 nir: Rename get_buffer_size to get_ssbo_size
This makes it explicit that this intrinsic is only for SSBOs.  For the
v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be
able to distinguish between the two.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>
2020-09-22 13:34:12 +00:00
Iago Toral Quiroga 3182209673 v3d/compiler: fix V3D double-rounding of .8 fixed-point XY coordinates
Pre-V3D 4.3 hardware has a quirk where it expects XY coordinates in
.8 fixed-point format, but then it will internally round it to .6 fixed-point,
introducing a double rounding. The double rounding can cause very slight
differences in triangle raterization coverage that can actually be noticed by
some CTS tests.

The correct fix for this as recommended by Broadcom is to convert to
.8 fixed-point with ffloor().

Fixes:
dEQP-VK.renderpass.suballocation.subpass_dependencies.late_fragment_tests.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6677>
2020-09-11 09:51:33 +02:00
Marek Olšák ac55b1a9a6 nir: get ffma support from NIR options for nir_lower_flrp
This also fixes the inverted last parameter of nir_lower_flrp in most drivers.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6599>
2020-09-04 17:06:22 +00:00
Alejandro Piñeiro 7059708dcd broadcom/qpu_instr: wait is not a read or write vpm instruction
For several schedule restrictions, we are checking if the instruction
is using the vpm. So far it was implemented as being a read or a write
of the vpm. But VPM wait (vpmwt) is not a read or a write (it is a
wait until all pending writes finishes). This is relevant to implement
peripheral accesses restrictions, as for some cases where vpm
read|writes are allowed, vpmwt is not.

Fixes:
  dEQP-VK.binding_model.descriptorset_random.sets8.constant.ubolimitlow.sbolimitlow.sampledimglow.outimgtexlow.noiub.nouab.vert.noia.0

On the sim, as it was raising an assert for wrong peripheral access.

v2: simplify v3d_qpu_waits_vpm (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6498>
2020-08-31 15:02:42 +02:00
Jesse Natalie d3faac7a15 nir: Add options to nir_lower_compute_system_values to control compute ID base lowering
If no options are provided, existing intrinsics are used.
If the lowering pass indicates there should be offsets used for global
invocation ID or work group ID, then those instructions are lowered to
include the offset.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891>
2020-08-21 22:07:05 +00:00
Jesse Natalie 2e1df6a17f nir: Move compute system value lowering to a separate pass
The actual variable -> intrinsic lowering stays where it is, but
ops which convert one intrinsic to be implemented in terms of
another have moved.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891>
2020-08-21 22:07:05 +00:00
Karol Herbst e5899c1e88 nir: rename nir_op_fne to nir_op_fneu
It was always fneu but naming it fne causes confusion from time to time. So
lets rename it. Later we also want to add other unordered and fne, this is
a smaller preparation for that.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6377>
2020-08-21 17:26:21 +00:00
Jason Ekstrand 1ccd681109 nir: Add an LOD parameter to image_*_size
The OpenCL image_width/height/depth functions have variants which can
take an LOD parameter.  More importantly, LLVM-SPIRV-Translator always
generates OpImageQuerySizeLod even if the LOD is guaranteed to be zero.
Given that over half the hardware out there has an LOD field for image
size queries (based on a rudimentary scan through their NIR -> whatever
code), we may as well just add the source to the NIR intrinsic.  If this
is ever a problem for anyone, the lowering is pretty trivial.

I've also added asserts to everyone's drivers that should alert them if
they ever see an LOD other than zero.  This will never happen with GL or
Vulkan so there's no need for panic.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6396>
2020-08-20 20:48:10 +00:00
Arcady Goldmints-Orlov a104902590 broadcom/compiler: Enable PER_QUAD for UBO and SSBO loads.
Helper invocations need to be able to read from UBOs since those values
can be used for flow control, but writes from helper invocations need to
be dropped.

Fixes CTS tests:
  dEQP-VK.glsl.derivate.*.uniform_loop.*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6356>
2020-08-20 20:14:14 +00:00
Arcady Goldmints-Orlov c3258f927c broadcom/compiler: Add a constant folding pass after nir_lower_io
The nir_lower_io pass produces a bunch of constant arithmetic, and
assumes that constant folding will simplify it away.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6356>
2020-08-20 20:14:14 +00:00
Arcady Goldmints-Orlov bd87cdad18 broadcom/compiler: support nir_intrinsic_load_sample_id
This adds support for the intrinsic as well as the vir_SAMPID
instruction that corresponds to it in vir.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6356>
2020-08-20 20:14:14 +00:00
Alejandro Piñeiro bd38ea77e8 v3d/compiler: add v3dv_prog_data_size helper
Main use case is to help to implement Vulkan PipelineCache, as we are
serializing/deserializing the prog_data too.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6078>
2020-08-19 22:50:21 +02:00
Karol Herbst 025bdbac3e nir: Add goto_if jump instruction
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401>
2020-08-14 20:35:36 +00:00
Jason Ekstrand feb32f898c nir: Add a nir_foreach_uniform_variable helper
This one's a bit more complex because it filters off only those
variables with mode == nir_var_uniform.  As such, it's not exactly a
drop-in replacement for nir_foreach_variable(var, &nir->uniforms).

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
2020-07-29 17:38:58 +00:00
Jason Ekstrand 2956d53400 nir: Add nir_foreach_shader_in/out_variable helpers
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
2020-07-29 17:38:57 +00:00
Jason Ekstrand 5746af4446 nir: Take a mode in remove_unused_io_vars
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
2020-07-29 17:38:57 +00:00
Iago Toral Quiroga d1677c8f8c v3d/compiler: request fragment shader clip lowering to be vulkan compatible.
Vulkan allows fragment shaders to read gl_ClipDistance[], in which case
the SPIR-V compiler will inject a compact array variable at
VARYING_SLOT_CLIP_DIST0. Request the lowering to always work in terms
of a compact array variable so we don't have to care about the API
in use.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6022>
2020-07-27 08:25:57 +02:00
Iago Toral Quiroga 71d5c19241 v3d/compiler: handle compact varyings
We are going to need this in Vulkan because the SPIR-V compiler
defines clip distances as a single compact array of scalars, so
our compiler needs to know what to do with them.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6022>
2020-07-27 08:25:57 +02:00
Neil Roberts de5130fea0 v3d: Retry with the fallback scheduler when RA fails
v3d_compile is now split out into a helper function that gets called a
second time if compilation fails the first time with the result
reporting the register allocation failed. The second time it is run with
the fallback scheduler to try and increase the chances of successfully
allocating the registers.

v2: Add a performance debug message when using the fallback scheduler.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
2020-07-24 12:27:07 +02:00
Neil Roberts 1c8167da61 v3d: Changed v3d_compile:failed to an enum
Instead of just having a bool status for the failure, there is now an
enum so that the compilation can report a more detailed status.
Currently this is only used to report whether the failure was due to
failed register allocation. The “failed” bool doesn’t seem to actually
have been used anywhere so this doesn’t really change a lot.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
2020-07-24 12:27:07 +02:00
Neil Roberts 08f1746fad v3d: Mark scheduling dependency for prim id and first output
The input primitive ID is read from the VPM in the same memory segment
as the outputs. This means that writing the GS header to VPM location 0
needs to be done after reading the primitive ID. This patch adds a
dependency between the load_primitive_id intrinsic and the store_output
intrinsic for location 0 to stop the scheduler from reordering them.

v2: Use an enum for the dependency class number.
v3: Add "GS" to the class enum name.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
2020-07-24 09:21:11 +02:00
Neil Roberts 7665398e6c nir/scheduler: Move nir_scheduler to its own header
nir_schedule already has a struct for options which makes it more than
just a function declaration. Later patches intend to add more structs to
complement these options. In order to make the code easier to manage,
this moves the nir_scheduler-related parts out of nir.h to their own
header.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
2020-07-24 09:21:11 +02:00
Icecream95 314ba5e174 nir: Add a face_sysval argument to nir_lower_two_sided_color
This is needed for handling drivers that use an input for loading the
face, for example Panfrost with Midgard GPUs.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Tested-by: Urja Rannikko <urjaman@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5915>
2020-07-17 14:50:26 +00:00
Neil Roberts 97f8ec321b v3d/compiler: Lower geometry output store base into offset src
When generating the VPM write instruction for geometry shader outputs,
emit_store_output_gs ends up adding the base and offset arguments
together with an ADD instruction. The addition was done at the VIR level
after scheduling so it always ends up right next to the corresponding
stvpm instruction. Most of the time the offset is constant but nothing
does any constant folding at the VIR level.

This patch makes it instead fold the addition into the offset at the NIR
level in v3d_nir_lower_io so that the NIR-level constant folding can get
rid of the addition most of the time.

v2: Use nir_iadd_imm to simplify the code. (Eric Anholt)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5825>
2020-07-16 08:48:06 +02:00
Neil Roberts deefebc55b v3d/compiler: Fix sorting the gs and fs inputs
ntq_setup_fs_inputs and ntq_setup_gs_inputs sort the inputs according to
the driver location. This input array is then used to calculate the VPM
offset for the outputs in the previous stage. However, it wasn’t taking
into account variables that are packed into a single varying slot. In
that case they would have the same driver_location and are
distinguished by location_frac.

This patch makes it additionally sort by location_frac when the driver
locations are equal. This can happen when the compiler packs varyings
that are sized less than vec4. Without this fix, when the VPM is used to
transmit data free-form between the stages (such as VS->GS) then it
would end up writing to inconsistent locations.

Fixes dEQP tests such as:
dEQP-GLES31.functional.primitive_bounding_box.lines.global_state.
vertex_geometry_fragment.default_framebuffer_bbox_equal

Fixes: 5d578c27ce ("v3d: add initial compiler plumbing for geometry shaders")

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5787>
2020-07-08 07:39:47 +00:00
Neil Roberts ee4d51f8b2 v3d: Add a lowering pass for line smoothing
When line smoothing is enabled, the driver now increases the width of
the line so that it can add some semi-transparent pixels to either side
of the line. A lowering pass is added which modifies the alpha component
of every write to fragment output 0 so that if the fragment is outside
the width of the line then the alpha is reduced. It additionally
discards fragments that are completely invisible. It might seem bad to
use discard on a tiled renderer but the assumption is that any bad
effects from using discard will also happen anyway because of enabling
alpha blending.

v2: Disable the line smoothing pass entirely when the framebuffer
    contains an integer colour output or one with no alpha channel.
    Calculate the coverage once upfront and store in a global variable
    instead of calculating each time an output write is modified. Also
    do the conditional discard once upfront.
v3: Don’t check whether the output buffer has an alpha channel. Only
    look at output 0. Use aa_line_width intrinsic instead of calculating
    the real line width in the shader. Clamp the coverage as part of the
    global variable, not per output write.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
2020-07-06 21:59:16 +00:00
Neil Roberts 207da33a86 v3d: Handle the line width intrinsics
Adds new QUNIFORMs to store the line widths.

v2: Also handle the aa_line_width intrinsic

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
2020-07-06 21:59:16 +00:00
Neil Roberts 2c4616368b v3d: Implement the line coord intrinsic
The line coord intrinsic is loaded from the implicit varying stored in
the same slot as the point coord when drawing lines.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
2020-07-06 21:59:16 +00:00
Alejandro Piñeiro f8946bd705 v3d/tex: handle correctly coordinates for cube/cubearrays images
When fetching for cube maps, we need to interpret them as 2d texture
arrays, being the third coordinate the index for the face.

Fixes Vulkan CTS tests like the following using v3dv:

dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.fragment.single_descriptor.cube_base_mip
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.compute.multiple_descriptor_sets.multiple_contiguous_descriptors.cube_array_base_mip

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5675>
2020-07-03 08:14:57 +00:00
Iago Toral Quiroga 8456ff75b3 v3d/compiler: fix image size for 1D arrays
Reviewed by: Alejandro Piñeiro <apinheiro@igalia.com>

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5692>
2020-07-01 10:01:46 +00:00
Eric Anholt f55a308c75 v3d: Enable PIPE_CAP_TGSI_TEXCOORD.
Dave wants to drop the !TEXCOORD path from NIR, and it's easy enough to
do.  Untested.

Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2952>
2020-06-29 09:07:21 -07:00
Iago Toral Quiroga 653dff949e v3d/compiler: fix spill offset
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Fixes: 97566efe5c ("v3d: Rematerialize MOVs of uniforms instead of spilling them.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5664>
2020-06-29 14:21:25 +02:00
Iago Toral Quiroga 4845f184d7 v3d/compiler: don't rewrite unused temporaries to point to NOP register
This was assuming that unused temporaries are written but never read,
since the NOP register can only be used as a destination register,
but we can end up here also for temporaries that are read once but
never written.

This was found with a graphicsfuzz test that has a switch with
cases that have unreachable discards. In that test, NIR genrates
code like this:

decl_reg vec3 32 r19
...
r20 = mov r19.z
r21 = mov r19.y
r22 = mov r19.x

Where r19.xyz would generate 3 temporary registers that are read but
never written, so we would rewrite them to point to the NOP register
as QPU instruction sources, which is not allowed and would hit an
assert that expect magic reads to be from [r0,r5] only.

Fixes:
dEQP-VK.graphicsfuzz.unreachable-switch-case-with-discards

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5645>
2020-06-26 08:57:32 +00:00
Neil Roberts 3b1c511b09 v3d: Use stvpmd for non-uniform offsets in GS
The offset for the VPM write for storing outputs from the geometry
shader isn’t necessarily uniform across all the lanes. This can happen
if some of the lanes don’t emit some of the vertices. In that case the
offset for the subsequent vertices will be different in each lane. In
that case we need to use the stvpmd instruction instead of stvpmv
because it will scatter the values out.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3150

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5621>
2020-06-26 09:36:15 +02:00
Neil Roberts dab8a9169c v3d: Add missing macro for stvpmd instruction
stvpmd is like stvpmv but it scatters the output. It can be used with
non-dynamically uniform offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5621>
2020-06-26 09:36:15 +02:00
Neil Roberts 053df9bd8f v3d: Let scheduler know GS doesn’t have shared I/O memory
Unlike the vertex shaders, the memory for inputs and outputs is stored
in separate segments so the scheduler doesn’t need to serialise them.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
2020-06-22 08:23:06 +02:00
Neil Roberts ed29b576cb nir/scheduler: Add an option to specify what stages share memory for I/O
The scheduler has code to handle hardware that shares the same memory
for inputs and outputs. Seeing as the specific stages that need this is
probably hardware-dependent, this patch makes it a configurable option
instead of hard-coding it to everything but fragment shaders.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
2020-06-22 08:23:06 +02:00
Neil Roberts 0a18c935e1 v3d: Remove unused member of v3d_compile
It looks like gs_input_sizes was added when GS shaders were implemented
but it was never used anywhere.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
2020-06-22 08:23:06 +02:00
Rob Clark c148dbe07e v3d: don't use intr->num_components for non-vectorized intrinsics
Squashed-in-fix-from: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
2020-06-16 02:48:18 +00:00
Timothy Arceri 04dbf709ed nir: add callback to nir_remove_dead_variables()
This allows us to do API specific checks before removing variable
without filling nir_remove_dead_variables() with API specific code.

In the following patches we will use this to support the removal
of dead uniforms in GLSL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
2020-06-03 02:22:23 +00:00
Dylan Baker a8e2d79e02 meson: use gnu_symbol_visibility argument
This uses a meson builtin to handle -fvisibility=hidden. This is nice
because we don't need to track which languages are used, if C++ is
suddenly added meson just does the right thing.

Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
2020-06-01 18:59:18 +00:00
Alejandro Piñeiro f7fcbe9830 v3d/tex: use TMUSLOD register if possible
TMUSLOD register is the same that TMUS but having the same effect that
setting disable_autolod on the TMU configuration parameter 2.

So using that register is potentially more efficient, as in several
cases we would be able to skip writing P2.

One case where we can't use it is for texture cube maps, as we need to
use TMUSCM.

v2: don't put a comment in the middle of the conditions (Iago)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>
2020-05-11 23:52:46 +00:00
Alejandro Piñeiro c3af695bb0 v3d/tex: set up default values for Configuration Parameter 1 if possible
Texture access has three configuration parameters, P0 (texture), P1
(sampler) and P2(lookup). P1 and P2 are optional, but if P2 is needed
(like for example to set the offset for texelFetchOffset), then you
need to set P1.

But until now when setting up P1 we were asking the driver to fill up
the address with the shader state. But in that case we can just fill
that address with the default value NULL.

So let's avoid asking the driver to fill that default values, and do
it directly on the compiler. This is a good-to-have on OpenGL, and
likely would be needed on Vulkan.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>
2020-05-11 23:52:46 +00:00
Alejandro Piñeiro 50c2c76ea3 v3d/tex: only look up the 2nd texture gather offset for 1d non-arrays
Commit 1bc71e8b65 already did that for
the 3rd offset, but it also needs to do it for the 2nd (to handle 1d
array).

Fixes assertion failures with Vulkan CTS tests using 1darray
targets. Seems that there isn't too many 1darray tests on OpenGL CTS,
and OpenGL-ES don't support 1d arrays, but the same problem could
arise eventually on OpenGL.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>
2020-05-11 23:52:46 +00:00
Alejandro Piñeiro ad460c5dd6 v3d: support for textureQueryLOD
Fixes all the ARB_texture_query_lod piglit tests, and needed to get
the Vulkan CTS textureQueryLOD passing with the ongoing Vulkan driver.

Note that LOD Query bit flag became only available on V42 of the hw,
but the v3d40_tex is using V41 as reference. In order to avoid setting
up the infrastructure to support both v41 and v42, we manually set the
bit if the device version is the correct one.

We also fix how the ARB_texture_query_lod (so EXT_texture_query_lod)
is exposed. Before this commit it was always exposed (wrongly as it
was not really supported). Now it is exposed for devinfo.ver >= 42.

v2: move _need_sampler helper to nir.h (Eric Anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
2020-04-22 23:43:23 +02:00
Alejandro Piñeiro 9967c26ae6 v3d/tex: Configuration Parameter 1 can be only skipped if P2 can be skipped too
Configuration Parameter packets 1 and 2 are pointed as optional, but
it is not clearly stated if you can skip only P1 when P2 is needed.

In the practice, it seems that the situation P0 - non-P1 - P2 can
causes problems, and at least on the simulator, it seems that sampler
info are attempted to be accessed. So let's just be conservative, and
only skip P1 configuration if we can skip P2 configuration too.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
2020-04-22 23:39:34 +02:00
Alejandro Piñeiro d0b644d9f9 v3d/tex: don't configure tmu config 1 if not needed
TMU configuration parameter 1 configures the sampler for the texture
operation. But there are some texture operations that doesn't need a
sampler. Skipping the configuration could provide a small perf
improvement on OpenGL. On the incoming Vulkan driver, would allow us
to avoid to set up an unneeded sampler.

Note that we still need to add the sampler configuration parameter if
the output is a 32bit, as it is on the sampler where we configure that
info.

Also, note that for images this is done comparing against a unpacked
p1 default. But in order to do that it is needed to go through the
code that fills up the unpacked p1. We can skip that too.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
2020-04-22 23:38:18 +02:00
Eric Engestrom 79af30768d meson: inline `inc_common`
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>
2020-03-28 21:36:54 +01:00
Rob Clark 36aed70b59 util/ra: spiff out select_reg_callback
Add a parameter so the callback can know which node it is selecting a
register for.  And remove the graph parameter, as it is unused by
existing users, and somewhat unnecessary (ie. the callback data could
be used instead).

And add a comment so $future_me remembers how this works.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
2020-03-10 16:01:39 +00:00
Eric Anholt 12cf484d02 v3d: Ask the state tracker to lower image accesses off of derefs.
This saves a bunch of hassle in handling derefs in the backend, and would
be needed for reasonable handling of dynamic indexing of image arrays.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>
2020-02-24 18:25:02 +00:00
Eric Anholt 8d07d66180 glsl,nir: Switch the enum representing shader image formats to PIPE_FORMAT.
This means you can directly use format utils on it without having to have
your own GL enum to number-of-components switch statement (or whatever) in
your vulkan backend.

Thanks to imirkin for fixing up the nouveau driver (and a couple of core
details).

This fixes the computed qualifiers for EXT_shader_image_load_store's
non-integer sizeNxM qualifiers, which we don't have tests for.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
2020-02-05 10:31:14 -08:00
Anthony Pesch f77369086c util/hash_table: update users to use new optimal integer hash functions
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
2020-01-23 17:06:57 +00:00
Jason Ekstrand d3737002ee nir/lower_atomics_to_ssbo: Also lower barriers
This is more correct for a pass which is supposed to completely lower
away atomic counters.  It also lets us stop supporting atomic counter
barriers in most of the drivers.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand e40b11bbcb nir: Rename nir_intrinsic_barrier to control_barrier
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand 60097cc840 nir: Add a new memory_barrier_tcs_patch intrinsic
Right now, it's implemented as a no-op for everyone.  For most drivers,
it's a switch case in the NIR -> whatever which just breaks.  For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Iago Toral Quiroga 6c7a2b69f8 v3d: handle writes to gl_Layer from geometry shaders
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.

For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga a6b318ef52 v3d: predicate geometry shader outputs inside non-uniform control flow
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga a07d70c54b v3d: we always have at least one output segment
If we program an output size of 0 the simulator asserts. This was
not a problem until now because our VS would always have to
emit fixed function outputs, however, now that it can be paired
with a GS we can end up with a VS shader that no longer emits
any outputs.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga 76fc8c8bb1 v3d: compute appropriate VPM memory configuration for geometry shader workloads
Geometry shaders can output many vertices and thus have higher VPM memory
pressure as a result. It is possible that too wide geometry shader dispatches
exceed the maximum available VPM output allocated, in which case we need
to reduce the dispatch width until we can fit the VPM memory requirements.
Supported dispatch widths for geometry shaders are 16, 8, 4, 1.

There is a limit in the number of VPM output sectors that can be used by a
geometry shader that we can meet by lowering the dispatch width at compile
time, however, at draw time we need to revisit this number and, together with
other elements that can contribute to total VPM memory requirements, decide
on a configuration that can fit the program into the available VPM memory.
Ideally, we also want to aim for not using more than half of the available
memory so we that we can run a pair of bin and render programs in parallel.

v2: fixed language in comment and typo in commit log. (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga 4f5fbd6490 v3d: implement geometry shader instancing
v2:
 - Remove unused field uses_iid from v3d_gs_prog_data (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga 5d578c27ce v3d: add initial compiler plumbing for geometry shaders
Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.

The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.

When vertex shaders are paired with geometry shaders we also need
to consider the following:
  - Only geometry shaders emit fixed function outputs.
  - The coordinate shader used for the vertex stage during binning must
    not drop varyings other than those used by transform feedback, since
    these may be read by the binning GS.

v2:
 - Use MAX3 instead of a chain of MAX2 (Alejandro).
 - Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
 - Update comment in IO owering so it includes the GS stage (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga f63750accf v3d: remove unused variable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga d6b0786a38 v3d: add debug assert
While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga 6e68f74395 v3d: add missing plumbing for VPM load instructions
We will need to use LDVPMG_IN specifically to read VPM inputs
in geometry shaders.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Eric Anholt 8afab607ac nir: Add a scheduler pass to reduce maximum register pressure.
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable.  A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.

Results for v3d:

total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)

v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
    patch, include SSA defs in live values so we can switch to bottom-up
    if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
    successfully register allocate on GLES3.1 dEQP.  Make sure that
    discards don't move after store_output.  Comment spelling fix.
2019-11-25 21:12:21 +00:00
Alejandro Piñeiro b4bc59e37e v3d: adds an extra MOV for any sig.ld*
Specifically when we are in non-uniform control flow, as we would need
to set the condition for the last instruction. If (for example) a
image atomic load stores directly their value on a NIR register,
last_inst would be a nop, and would fail when set the condition.

Fixes piglit test:
spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test

Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")

v2: (Changes suggested by Eric Anholt)
   * Cover all sig.ld* signals, not just ldunif and ldtmu, as all of
     them have the same restriction.
   * Update comment explaining why we add a MOV in that case
   * Tweak commit message.

v3:
   * Drop extra set of parens (Eric)
   * Add missing ld signal to is_ld_signal to fix shader-db regression.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-20 11:21:16 +01:00
Jose Maria Casanova Crespo d983055184 v3d: Fix predication with atomic image operations
Fixes dEQP test:
dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.image_atomic_multiple_interleaved_write_read

Fixes piglit test:
spec/glsl-es-3.10/execution/cs-image-atomic-if-else.shader_test

Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-11-20 11:20:55 +01:00
Eric Anholt 882ca6dfb0 util: Move gallium's PIPE_FORMAT utils to /util/format/
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium.  Since u_format used
util_copy_rect(), I moved that in there, too.

I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.

Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-14 10:47:20 -08:00
Iago Toral Quiroga e7e501efce v3d: rename vertex shader key (num)_fs_inputs fields
Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.

v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-10-31 08:46:35 +00:00
Timothy Arceri 7f106a2b5d util: rename list_empty() to list_is_empty()
This makes it clear that it's a boolean test and not an action
(eg. "empty the list").

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-10-28 11:24:38 +00:00
Erik Faye-Lund 65328bd32d Revert "v3d: do not report alpha-test as supported"
This reverts commit 9d0523b569.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
2019-10-23 13:03:55 +02:00