Commit Graph

2471 Commits

Author SHA1 Message Date
Connor Abbott fa17295ebd ir3: Add simple CSE pass
RA currently can't handle a live value that's part of a vector and
introduces extra copies. This was espeically a problem for bary.f, where
the bary coords were being split and repeatedly re-collected. But this
could be a problem in other situations as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott b1a1de76e8 ir3/sched: Consider unused destinations when computing live effect
If an instruction's destination is unused, then we shouldn't penalize
it. For example, this helps us schedule atomic operations whose results
aren't read. This works around RA failures when CSE is enabled in some
robustness2 tests.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott ba8efeb7fa ir3/sched: Make collects count against tex/sfu limits
In a scenario where there are a lot of texture fetches with constant
coordinates, this prevents the scheduler from scheduling all the setup
instructions after the first group of textures has been scheduled
because they are the only non-syncing thing and scheduling them didn't
decrease tex_delay. Collects with immed/const sources will turn into
moves of those sources, so we should treat them the same.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott 8b15c2f30c ir3/sched: Don't schedule collect early
I don't think there was ever a good reason to do this, but when we start
folding constants/immediates into collect, this can become actively
harmful.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott 27593cb241 ir3: Remove right and left copy prop restrictions
This is leftover from the old RA, and inhibits copy propagation
unnecessarily with the new RA.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott 2f51379d03 ir3/ra: Add a validation pass
This helps catch tricky-to-debug bugs in RA, or helps rule them out.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott 0ffcb19b9d ir3: Rewrite register allocation
Switch to the new SSA-based register allocator.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott df9f41cc02 ir3: Expose occupancy calculation functions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:06 -07:00
Connor Abbott 3ac743c333 ir3: Add pass to lower arrays to SSA
This will be run right after nir->ir3. Even though we have SSA coming
out of NIR, we still need it for NIR registers, even though we keep the
original array around to insert false dependencies.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:24:04 -07:00
Connor Abbott d4b5a550ed ir3: Add dominance infrastructure
Mostly lifted from nir.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott 1f3546c9e2 ir3: Remove unused check_src_cond()
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott c0789395e0 ir3/postsched: Don't use SSA source information
This was only used for calculating if a source is a tex or SFU
instruction, which is easily replacable. It's going away with the new
RA.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott c947475533 ir3/delay: Delete pre-RA repeat handling
It looks likely that any implementation of (rptN) in ir3 will have to
actually create (rptN) instructions after RA, which means that this can
be dropped.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott 58d82add87 ir3: Rewrite delay calculation
The old delay calculation relied on the SSA information staying around,
and wouldn't work once we start introducing phi nodes and making
"normal" values defined in multiple blocks not array regs anymore.
What's worse is that properly inserting phi nodes when splitting live
ranges would make that code even more complicated, and this was the last
place post-RA that actually needed that information.

The new version only compares the physical registers of sources and
destinations. It works by going backwards up to a maximum number of
cycles, so it might be slightly slower when the definition is closer but
should be faster when it is farther away.

To avoid complicating the new method, the old method is kept around, but
only for pre-RA scheduling and it can therefore be drastically
simplified as the array case can be dropped.

ir3_delay_calc() is split into a few variants to avoid an explosion of
boolean arguments in users, especially now that merged_regs now has to
be passed to it.

The new method is a little more complicated when it comes to handling
(rptN), because both the assigner and consumer may be (rptN). This adds
some unit tests for those cases, in addition to dropping the to-SSA code
in the test harness since it's no longer needed.

Finally, ir3_legalize has to be switched to using physical registers for
the branch condition. This was the one place where IR3_REG_SSA remained
after RA.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott c0823a2d31 ir3: Make branch conditions non-SSA
In particular, make sure they have a physreg assigned. This was the last
place after RA where SSA registers were created, which won't work with
the new post-RA delay calculation that relies on the physreg.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott fc7402b4cf ir3: Add reg_elems(), reg_elem_size(), and reg_size()
For working with registers in units of half-regs in the new RA.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott 890de1a436 ir3/delay: Fix full->half and half->full delay
The current compiler never does this, but the new compiler will start to
in mergeregs mode. There is an extra penalty for this.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott 9ad83f51eb ir3: Add ir3_register::array.base
There were two different approaches I saw in the post-RA code for
figuring out what regiser range a relative access touched:

1. Use reg->array.offset and reg->array.size. This is wrong in case
   reg->array.offset was non-zero before RA, because array.size is
   the size of the whole array and array.offset has the const offset
   within the array baked in.
2. Lookup the array from the array ID and use the base + range there.
   This is correct, but won't work with the new RA, where an array might
   not always be assigned to the same register.

This replaces both methods with a new ir3_register::array.base field,
and switches all the users I could find to it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott 939ee6966f ir3: Improve register printing for SSA
Print the ssa name for array destinations, and handle printing undef SSA
sources.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott edf23e15eb ir3: Prepare for instructions with multiple destinations
To simplify the pre-RA merge set code and express the result live-range
splitting in RA, we need to add support for parallel copy instructions,
and for the merge set code these parallel copies need to be in SSA form.
Parallel copies have multiple destinations by necessity, but there was
no way to express this in the existing IR. In particular there was no
support for marking a register as being a destination, and no support
for indicating which destination register out of several an SSA source
refers to. This replaces ir3_register::instr with ir3_register::def and
re-purposes ir3_register::instr. I haven't propagated this into common
helpers, like ssa(), because that would vastly increase the amount of
churn and the number of places that produce such instructions should be
limited -- only RA will create parallel copies and they will be
destroyed right after RA. In the future swz will have multiple
destinations too, but it will only be created after RA via parallel copy
lowering.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott e1d7240576 ir3: Readd support for translating NIR phi nodes
This is roughly based on the support removed a while ago, but it handles
sources better by associating each source with a predecessor block.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott 0ef021be4a ir3: Add ir3_start_block()
Name based on nir_start_block(). A number of places were already
open-coding this, convert them.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Connor Abbott ef4e07a1a2 ir3: Introduce phi and parallelcopy instructions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9842>
2021-06-10 12:20:38 -07:00
Rob Clark 3f758afe6a freedreno: Fix fdperf flush
We created and initialized the fence, but forgot to pass it to
fd_submit_flush().

Fixes: aafcd8aacb ("freedreno: Re-work fd_submit fence interface")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11200>
2021-06-09 19:08:53 -07:00
Rob Clark 09f64f74db freedreno/ir3: Fix use after free
If the tex/sfu ssa src is from a different block than the one currently
being scheduled, we do not have a valid sched-node.  So fallback to
previous behavior rather than dereference an invalid ptr.

Fixes: 7821e5a3f8 ("ir3/sched: Don't penalize uses of already-waited tex/SFU")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10306>
2021-06-09 00:37:15 +00:00
Caio Marcelo de Oliveira Filho 8af6766062 nir: Move workgroup_size and workgroup_variable_size into common shader_info
Move it out the "cs" sub-struct, since these will be used for other
shader stages in the future.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11225>
2021-06-08 09:23:55 -07:00
Rhys Perry 1cbcfb8b38 nir, nir/algebraic: add byte/word insertion instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>
2021-06-08 08:57:42 +00:00
Caio Marcelo de Oliveira Filho c8a7bd0dc8 nir: Rename WORK_GROUP (and similar) to WORKGROUP
Be consistent with other usages in Vulkan and SPIR-V, and the recently
added workgroup_size field.

Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho a71a780598 nir: Rename nir_intrinsic_load_local_group_size to nir_intrinsic_load_workgroup_size
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Caio Marcelo de Oliveira Filho 430d2206da compiler: Rename local_size to workgroup_size
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11190>
2021-06-07 22:34:42 +00:00
Dmitry Baryshkov cac88b5f06 freedreno/regs: split old/not used phy registers to separate DB
In order to simplify main DSI host database, split away phy register
definitions used on DSI v2 hosts to the separate database file.

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11075>
2021-06-05 19:20:50 +00:00
Eric Anholt 95d41a3525 ra: Use struct ra_class in the public API.
All these unsigned ints are awful to keep track of.  Use pointers so we
get some type checking.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Danylo Piliaiev 20d8324a1b turnip: implement VK_EXT_provoking_vertex
Passes: dEQP-VK.rasterization.provoking_vertex.*

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11112>
2021-06-04 14:37:01 +00:00
Hyunjun Ko 41eaa07823 turnip/kgsl: Fix to build on android.
Fixes: 3f229e34 ("turnip: Implement VK_KHR_timeline_semaphore.")

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11153>
2021-06-03 08:55:06 +00:00
Chia-I Wu 3ba3681b58 tu: use vk_default_allocator
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11117>
2021-06-03 08:13:26 +00:00
Emma Anholt d3e419f9d8 ci/freedreno: Add some more known flakes from recent marge runs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11144>
2021-06-03 03:07:35 +00:00
Danylo Piliaiev b71e27ea84 turnip: fix register_index calculations of xfb outputs
nir_assign_io_var_locations() does not use outputs_written when
assigning driver locations. Use driver_location to avoid incorrectly
guessing what locations it assigned.

Copied from lavapipe 8731a1beb7

Will fix provoking vertex tf tests when VK_EXT_provoking_vertex
would be enabled:
 dEQP-VK.rasterization.provoking_vertex.transform_feedback.*

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11111>
2021-06-02 23:55:00 +00:00
Danylo Piliaiev 551d7fddfb turnip: emit vb stride dynamic state when it is dirty
Due to incorrect condition we never emitted vb stride
if state was dynamically set.

Fixes vertex explosion with Zink.

See https://gitlab.freedesktop.org/mesa/mesa/-/issues/4738

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11133>
2021-06-02 21:38:19 +00:00
Danylo Piliaiev 74aa09b22c turnip: reset push descriptor set on command buffer reset
Otherwise it will store a pointer to already unmapped memory which
could lead to a crash in tu_CmdPushDescriptorSetWithTemplateKHR since
it tries to copy data from the old memory.

Fixes a crash with Zink's new lazy descriptor manager instroduced
in bfdd1d8d

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11137>
2021-06-02 16:01:40 +00:00
Matt Turner 09935c0dde freedreno/afuc: Print uintptr_t with PRIxPTR
Fixes a compilation error on 32-bit.

Fixes: bba61cef38 ("freedreno/afuc: Add emulator mode to afuc-disasm")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11118>
2021-06-02 03:57:20 +00:00
Tomeu Vizoso bc50a16103 Revert "ci/freedreno: Skip Portal 2 trace on a630, due to flakiness"
This reverts commit e381bc0e67.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11058>
2021-06-01 08:50:45 +02:00
Rob Clark 3dff0c30cf freedreno/headergen2: Fix compile warnings with CP_DRAW_INDIRECT_MULTI
Using stripes to deal with the different packet layout variants resulted
in redefining "register" offsets with different values, so use "prefix"
to add a suffix to disambiguate.

  drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1066: warning: "REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT" redefined
   1066 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT  0x00000006
        |
  drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1057: note: this is the location of the previous definition
   1057 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT  0x00000003
        |

(Admittedly it isn't really a "prefix" but that was the field in the
schema available to use, and REG_INDEXED_CP_DRAW_INDIRECT_MULTI_STRIDE
sounds somewhat more funny.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark ff5e17f1f8 freedreno/afuc: Use emulator to extract jmptbl
This runs through the SQE bootstrap code to extract the packet-table,
rather than relying on heuristics.  As a bonus, it can detect the start
of the LPAC fw in a660+ fw so that we can properly decode the LPAC fw
and packet-table.

Note that this decodes the jmptable as normal instructions, which is a
change in behavior from the previous heuristic based jmptbl extraction.
Not sure if that is a good or bad thing.

For a5xx, for now the legacy heuristic based jmptable decoding is
preserved, at least until enough control regs are figured out.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark 2beb5b015a freedreno/ci: Add real packet-table loading for afuc test
When we start running the bootstrap code thru the emulator we will need
the packet-table loading to actually happen.  So add this.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark df14af6480 freedreno/afuc: Add emulator support to run bootstrap
Run until the packet-table is populated, so the disassembler can use
this to know the offsets of various pm4 packet handlers without having
to rely on heuristics.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark ea2e244198 freedreno/afuc: Split out helpers to parse labels and packet-table
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark 9a4ca194e8 freedreno/afuc: Extract full gpu-id
Some of the a6xx gens will require some control reg initialization, and
go into an infinite loop if they don't see the values they expect, so
we'll need to extract the compute gpu-id.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark c2f8c98d56 freedreno/registers: Add a few a6xx regs and notes
A few things I noticed while playing with the emulator.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark bba61cef38 freedreno/afuc: Add emulator mode to afuc-disasm
This is an (at least somewhat complete) logical emulator of the a6xx SQE
that lets us step through firmware execution (bootstrap, cmdstream pkt
handling, etc).  It lets us poke at various fw visible state and run
through pm4 packet(s) to better understand what the fw is doing when it
handles various packets.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark 745dad0446 freedreno/afuc: Add pipe reg name decoding
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark 184f474574 freedreno/afuc: Clean up special regs
Allow for different mnemonics depending on whether they are used as
source or destination register, to better reflect what they do.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark 2876253f28 freedreno/afuc: Split out utils
With disasm emulator mode, we'll start wanting some things that are
duplicationg what the assembler does, so just split out all the rnndb
bits into shared utils.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark d367d84d87 freedreno/afuc: Split out instruction decode helper
Split the giant switch/decode out into a helper function so that we can
re-use it for emulator mode.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark 33b9445a68 freedreno: Move pkt parsing helpers to common
I'll be needing these in afuc as well.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Rob Clark 62c53d4361 freedreno/tu+drm: Extract out pm4 pkt header helpers
I'm going to need these in a 3rd place, so let's deduplicate first.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
2021-05-31 23:34:43 +00:00
Danylo Piliaiev 8d0c76b143 freedreno: reduce the upper bound of IB size by one
Going beyond 0x100000 results in hangs, however I found that the
last 0x100000 packet just doesn't get executed. Thus the real limit is
0x0FFFFF. At least this is true for a6xx.

This could be tested by appending nops to the cmdstream and placing
e.g. CP_INTERRUPT at the end, at any position other than being
0x100000 packet it results in a hang.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10786>
2021-05-31 17:38:26 +00:00
Danylo Piliaiev f38fd3c577 turnip: place a limit on the growth of BOs
There is a limit on IB size, which on freedreno is set to 0x100000.
Going beyond it results in hangs, however I found that the last
0x100000 packet just doesn't get executed. Thus the real limit is
0x0FFFFF.

This could be tested by appending nops to the cmdstream and placing
e.g. CP_INTERRUPT at the end, at any position other than being
0x100000 packet it results in a hang.

Fixes:
  dEQP-VK.api.command_buffers.record_many_draws_secondary_2
  dEQP-VK.api.command_buffers.record_many_draws_primary_2

However these tests could trigger hangcheck timeouts.

Also this fixes hangs when opening captures of games in RenderDoc.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10786>
2021-05-31 17:38:26 +00:00
Emma Anholt 9d28bac9d0 turnip: Make sure that SNORM blits don't clamp ambiguous -1.0 values.
The CTS expects that some paths transfer SNORM data exactly, but the HW
will clamp 0x80 values to 0x81 in the process.  We can treat snorm as
unorm, though, and get working compression without the clamping happening.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10735>
2021-05-27 19:27:40 +00:00
Emma Anholt 69df1e8650 turnip: Reorganize copy_format()'s switch statement.
Now that we need FALLTHROUGH macros we weren't saving much by falling
through, and things were weirdly ordered anyway with depth intermixed with
color formats and the default case tucked in the middle of the switch
statement.  Replace with pretty obvious ordering of normal color, planar
color, depth, then default.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10735>
2021-05-27 19:27:40 +00:00
Tomeu Vizoso e381bc0e67 ci/freedreno: Skip Portal 2 trace on a630, due to flakiness
Sometimes the reticle is missing. Skip it for now to keep the number of
pipeline failures due to flakes low.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11033>
2021-05-27 14:07:44 +02:00
Tomeu Vizoso 6cd37e4bf0 Partial revert of "ci: Add a manual job for tracking the performance of Freedreno"
This reverts commit 8e470457de.

Drop that jobs, as it's sometimes causing a gitlab-ci.yml parse error
that isn't readily reproducible:

Found errors in your .gitlab-ci.yml:

'a630-profile-traces' job needs 'arm_test' job but it was not added to the pipeline
'a630-profile-traces' job needs 'meson-arm64' job but it was not added to the pipeline

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11030>
2021-05-27 10:41:53 +02:00
Emma Anholt 26677008b9 ci/freedreno: Turn off default a530 quick_gl testing, do full quick_shader.
The quick_gl set is too unstable -- even when I switched to a consistent
set of tests, and added lots of flakes, I keep getting new ones as some
test (unclear which, but it's like 7-8 minutes into the run) kills other
innocent ones.  Until we get per process pagetables, disable this testing
by default.

However, now that we've freed up a board that was doing quick_gl, we have
time to do all of quick_shader so that piglit uprevs don't reshuffle the
test list and expose new failures.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11018>
2021-05-26 20:49:47 +00:00
Tomeu Vizoso 26079868a7 ci/freedreno: Add depth32f_stencil8 flakes
Started happening after disabling cpufreq, devfreq and runtime PM.

At least one of these fail in each run, so it's blocking MRs.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7987>
2021-05-26 18:20:19 +00:00
Antonio Caggiano 8e470457de ci: Add a manual job for tracking the performance of Freedreno
Use Piglit's replay profile to measure and store the time that frames
take to render in the GPU.

This job won't run automatically in regular pipelines, but will be
triggered automatically by a script for every successful pre-merge
pipeline.

This is because we want to generate performance data for every relevant
commit merged in main, but we don't want to keep a device busy during
the pre-merge run.

Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-By: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7987>
2021-05-26 18:20:19 +00:00
Emma Anholt ee408df29c ci/freedreno: Consolidate ssbo.fragment_binding_array flake annotation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10933>
2021-05-24 16:42:33 +00:00
Emma Anholt fec60d5bee ci/freedreno: Drop VK flake annotations not seen in the last ~year.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10933>
2021-05-24 16:42:33 +00:00
Emma Anholt 4caf9b430d ci/freedreno: Add a link explaining get_display_plane_capabilities
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10933>
2021-05-24 16:42:33 +00:00
Emma Anholt a8c3783982 ci/freedreno: Drop a630 flake annotation from the go-fast changes.
The async fix seems to have fixed it, haven't seen this one since May 3rd.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10933>
2021-05-24 16:42:33 +00:00
Emma Anholt 1dbaaa22f9 ci/freedreno: Clear stale validation failure flake annotation.
Haven't seen it in my current set of IRC logs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10933>
2021-05-24 16:42:33 +00:00
Emma Anholt a40479b6bc ci/freedreno: Clear compswap flake annotation.
These flakes disappeared around 2020-08 and the only sign since then has
been some flakes on versions of the ir3 RA rewrite.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10933>
2021-05-24 16:42:33 +00:00
Connor Abbott 9350900fcd ir3: Only use per-wave pvtmem layout for compute
The blob seems to do this since a630, and it fixes
spec@glsl-1.30@execution@fs-large-local-array on a650.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10922>
2021-05-21 20:45:07 +00:00
Connor Abbott 0ab01f4215 ir3: Call nir_lower_wrmask() again after lowering scratch
I forgot that after rebasing on large_consts support that this is now
called after the first time nir_lower_wrmask is called and can generate
partial writemasks that need to be lowered. While we're here, also call
the main optimization loop if things are lowered to scratch because it
generates address arithmetic that may need to be cleaned up.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10922>
2021-05-21 20:45:07 +00:00
Emma Anholt 307139c7f9 vulkan: Avoid stomping array padding in the MemoryProperties wrapper.
The deqp test for it expects that the unused array elements are untouched,
so make sure they don't get replaced with random stack data.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10737>
2021-05-20 21:41:06 +00:00
Connor Abbott de9f2170cc ir3: Use round-to-nearest-even for fquantize2f16
We're supposed to map a floating-point value too large to be represented
as fp16 to infinity, however round-to-zero naturally rounds it down to
the largest representable fp16 number instead. The blob emits a bunch of
fixup code to work around this, but instead we can just do what all the
other drivers seem to do and use round-to-nearest-even instead.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10897>
2021-05-20 18:45:59 +00:00
Ian Romanick 7d85dc4f35 nir/algebraic: Equality comparison inversions require sources be numbers
v2: Update A630 expected image checksum for minetest.trace.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21036690 -> 21049485 (0.06%)
instructions in affected programs: 852085 -> 864880 (1.50%)
helped: 240
HURT: 2514
helped stats (abs) min: 1 max: 46 x̄: 2.45 x̃: 2
helped stats (rel) min: 0.15% max: 4.30% x̄: 0.79% x̃: 0.55%
HURT stats (abs)   min: 1 max: 198 x̄: 5.32 x̃: 2
HURT stats (rel)   min: 0.06% max: 10.71% x̄: 1.48% x̃: 1.04%
95% mean confidence interval for instructions value: 4.14 5.15
95% mean confidence interval for instructions %-change: 1.23% 1.34%
Instructions are HURT.

total cycles in shared programs: 856045255 -> 855816220 (-0.03%)
cycles in affected programs: 16743786 -> 16514751 (-1.37%)
helped: 790
HURT: 1973
helped stats (abs) min: 1 max: 10766 x̄: 627.97 x̃: 18
helped stats (rel) min: <.01% max: 32.59% x̄: 3.01% x̃: 0.64%
HURT stats (abs)   min: 1 max: 4078 x̄: 135.36 x̃: 18
HURT stats (rel)   min: <.01% max: 54.56% x̄: 2.80% x̃: 0.82%
95% mean confidence interval for cycles value: -131.36 -34.42
95% mean confidence interval for cycles %-change: 0.88% 1.40%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total spills in shared programs: 9771 -> 9766 (-0.05%)
spills in affected programs: 47 -> 42 (-10.64%)
helped: 1
HURT: 0

total fills in shared programs: 9451 -> 9430 (-0.22%)
fills in affected programs: 91 -> 70 (-23.08%)
helped: 1
HURT: 0

LOST:   16
GAINED: 51

All Intel GPUs from Sandybridge through Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 20024781 -> 20025568 (<.01%)
instructions in affected programs: 103309 -> 104096 (0.76%)
helped: 12
HURT: 389
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.20% max: 2.70% x̄: 1.36% x̃: 1.37%
HURT stats (abs)   min: 1 max: 8 x̄: 2.06 x̃: 1
HURT stats (rel)   min: 0.05% max: 7.14% x̄: 1.25% x̃: 0.95%
95% mean confidence interval for instructions value: 1.78 2.15
95% mean confidence interval for instructions %-change: 1.06% 1.28%
Instructions are HURT.

total cycles in shared programs: 979419070 -> 979439180 (<.01%)
cycles in affected programs: 4968711 -> 4988821 (0.40%)
helped: 60
HURT: 381
helped stats (abs) min: 1 max: 1296 x̄: 96.92 x̃: 26
helped stats (rel) min: <.01% max: 27.10% x̄: 1.64% x̃: 0.65%
HURT stats (abs)   min: 1 max: 7320 x̄: 68.04 x̃: 30
HURT stats (rel)   min: <.01% max: 19.77% x̄: 1.32% x̃: 0.87%
95% mean confidence interval for cycles value: 10.25 80.95
95% mean confidence interval for cycles %-change: 0.69% 1.15%
Cycles are HURT.

LOST:   1
GAINED: 2

GM45 and Iron Lake had similar results. (Iron Lake shown)
total instructions in shared programs: 8128474 -> 8132527 (0.05%)
instructions in affected programs: 642323 -> 646376 (0.63%)
helped: 12
HURT: 1972
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
helped stats (rel) min: 0.72% max: 1.72% x̄: 1.09% x̃: 0.83%
HURT stats (abs)   min: 1 max: 16 x̄: 2.07 x̃: 3
HURT stats (rel)   min: 0.12% max: 7.14% x̄: 0.77% x̃: 0.70%
95% mean confidence interval for instructions value: 1.99 2.10
95% mean confidence interval for instructions %-change: 0.74% 0.79%
Instructions are HURT.

total cycles in shared programs: 238280994 -> 238294376 (<.01%)
cycles in affected programs: 8841250 -> 8854632 (0.15%)
helped: 84
HURT: 1192
helped stats (abs) min: 4 max: 64 x̄: 12.50 x̃: 8
helped stats (rel) min: 0.02% max: 1.61% x̄: 0.28% x̃: 0.17%
HURT stats (abs)   min: 2 max: 198 x̄: 12.11 x̃: 12
HURT stats (rel)   min: 0.02% max: 8.03% x̄: 0.28% x̃: 0.14%
95% mean confidence interval for cycles value: 9.65 11.32
95% mean confidence interval for cycles %-change: 0.22% 0.27%
Cycles are HURT.

No fossil-db changes on any Intel platform.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Ian Romanick 4246c2869c nir/algebraic: Invert comparisons less often
This fixes the piglit test range_analysis_fsat_of_nan.shader_test.  That
test contains some code like

    o = saturate(X) > 0 ? vec4(1.0, 0.0, 0.0, 1.0)
                        : vec4(0.0, 1.0, 0.0, 1.0);

A clever optimizer will convert this to

    o = vec4(float(saturate(X) > 0),
             float(!(saturate(X) > 0)),
             0, 1);

Due to the ordering of optimizations in the compiler, the `saturate`
operations are removed.  This is safe even in the presense of NaN.

    o = vec4(float(X > 0), float(!(X > 0)), 0, 1);

Since the calculations are not marked precise, an overzealous
optimizer may reduce this to

    o = vec4(float(X > 0), float(X <= 0), 0, 1);

This will result in black being output.  The GLSL spec gives quite a bit
of leeway with respect to NaN, but that seems too far.  The shader
author asked for a result of red or green.  A result of black is still
"undefined behavior," but it's also a little mean.

This also enables CSE to do its job better.

v2: Update A530 expected image checksum for minetest.trace.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4531
Fixes: 0dbda153aa ("nir/algebraic: Flag inexact optimizations")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake
total instructions in shared programs: 21041563 -> 21041789 (<.01%)
instructions in affected programs: 992066 -> 992292 (0.02%)
helped: 526
HURT: 548
helped stats (abs) min: 1 max: 16 x̄: 2.48 x̃: 2
helped stats (rel) min: 0.04% max: 5.56% x̄: 0.74% x̃: 0.49%
HURT stats (abs)   min: 1 max: 27 x̄: 2.80 x̃: 2
HURT stats (rel)   min: 0.04% max: 4.55% x̄: 0.59% x̃: 0.38%
95% mean confidence interval for instructions value: -0.00 0.42
95% mean confidence interval for instructions %-change: -0.12% <.01%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 855885569 -> 856118189 (0.03%)
cycles in affected programs: 343637248 -> 343869868 (0.07%)
helped: 907
HURT: 541
helped stats (abs) min: 1 max: 7724 x̄: 206.45 x̃: 36
helped stats (rel) min: <.01% max: 29.97% x̄: 1.01% x̃: 0.37%
HURT stats (abs)   min: 1 max: 14177 x̄: 776.09 x̃: 31
HURT stats (rel)   min: <.01% max: 29.94% x̄: 1.24% x̃: 0.35%
95% mean confidence interval for cycles value: 84.30 237.00
95% mean confidence interval for cycles %-change: -0.32% -0.01%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

LOST:   3
GAINED: 5

Ice Lake
total instructions in shared programs: 20027107 -> 20025352 (<.01%)
instructions in affected programs: 1068856 -> 1067101 (-0.16%)
helped: 1153
HURT: 273
helped stats (abs) min: 1 max: 14 x̄: 1.83 x̃: 1
helped stats (rel) min: 0.03% max: 5.66% x̄: 0.61% x̃: 0.35%
HURT stats (abs)   min: 1 max: 15 x̄: 1.29 x̃: 1
HURT stats (rel)   min: 0.16% max: 1.30% x̄: 0.58% x̃: 0.60%
95% mean confidence interval for instructions value: -1.33 -1.13
95% mean confidence interval for instructions %-change: -0.43% -0.34%
Instructions are helped.

total cycles in shared programs: 979499227 -> 979448725 (<.01%)
cycles in affected programs: 344261539 -> 344211037 (-0.01%)
helped: 1079
HURT: 441
helped stats (abs) min: 1 max: 9384 x̄: 147.78 x̃: 48
helped stats (rel) min: <.01% max: 31.83% x̄: 0.90% x̃: 0.33%
HURT stats (abs)   min: 1 max: 7220 x̄: 247.07 x̃: 32
HURT stats (rel)   min: <.01% max: 31.30% x̄: 1.52% x̃: 0.53%
95% mean confidence interval for cycles value: -70.01 3.56
95% mean confidence interval for cycles %-change: -0.35% -0.05%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 10564 -> 10568 (0.04%)
spills in affected programs: 143 -> 147 (2.80%)
helped: 0
HURT: 1

total fills in shared programs: 11343 -> 11347 (0.04%)
fills in affected programs: 287 -> 291 (1.39%)
helped: 0
HURT: 1

LOST:   3
GAINED: 2

Skylake
total instructions in shared programs: 18192274 -> 18190128 (-0.01%)
instructions in affected programs: 1000188 -> 998042 (-0.21%)
helped: 1149
HURT: 55
helped stats (abs) min: 1 max: 14 x̄: 1.92 x̃: 1
helped stats (rel) min: 0.04% max: 6.67% x̄: 0.67% x̃: 0.42%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.55% x̄: 0.27% x̃: 0.26%
95% mean confidence interval for instructions value: -1.87 -1.69
95% mean confidence interval for instructions %-change: -0.67% -0.58%
Instructions are helped.

total cycles in shared programs: 960856054 -> 960728040 (-0.01%)
cycles in affected programs: 340840968 -> 340712954 (-0.04%)
helped: 1079
HURT: 233
helped stats (abs) min: 1 max: 7640 x̄: 170.95 x̃: 46
helped stats (rel) min: <.01% max: 30.20% x̄: 0.96% x̃: 0.28%
HURT stats (abs)   min: 1 max: 6864 x̄: 242.23 x̃: 26
HURT stats (rel)   min: <.01% max: 34.64% x̄: 2.10% x̃: 0.22%
95% mean confidence interval for cycles value: -135.62 -59.53
95% mean confidence interval for cycles %-change: -0.59% -0.25%
Cycles are helped.

LOST:   15
GAINED: 1

Broadwell
total instructions in shared programs: 17855624 -> 17853580 (-0.01%)
instructions in affected programs: 1012209 -> 1010165 (-0.20%)
helped: 1105
HURT: 52
helped stats (abs) min: 1 max: 13 x̄: 1.90 x̃: 1
helped stats (rel) min: 0.03% max: 6.67% x̄: 0.67% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.13% max: 0.52% x̄: 0.26% x̃: 0.25%
95% mean confidence interval for instructions value: -1.86 -1.67
95% mean confidence interval for instructions %-change: -0.68% -0.58%
Instructions are helped.

total cycles in shared programs: 1029905447 -> 1029840699 (<.01%)
cycles in affected programs: 347102680 -> 347037932 (-0.02%)
helped: 1007
HURT: 211
helped stats (abs) min: 1 max: 1360 x̄: 89.76 x̃: 48
helped stats (rel) min: <.01% max: 16.26% x̄: 0.69% x̃: 0.25%
HURT stats (abs)   min: 1 max: 1297 x̄: 121.51 x̃: 20
HURT stats (rel)   min: <.01% max: 31.31% x̄: 1.21% x̃: 0.20%
95% mean confidence interval for cycles value: -62.39 -43.92
95% mean confidence interval for cycles %-change: -0.47% -0.25%
Cycles are helped.

total spills in shared programs: 20335 -> 20333 (<.01%)
spills in affected programs: 19 -> 17 (-10.53%)
helped: 2
HURT: 0

total fills in shared programs: 25905 -> 25899 (-0.02%)
fills in affected programs: 23 -> 17 (-26.09%)
helped: 2
HURT: 0

LOST:   9
GAINED: 0

Haswell
total instructions in shared programs: 16418516 -> 16417293 (<.01%)
instructions in affected programs: 223785 -> 222562 (-0.55%)
helped: 590
HURT: 67
helped stats (abs) min: 1 max: 15 x̄: 2.19 x̃: 1
helped stats (rel) min: 0.03% max: 6.52% x̄: 0.87% x̃: 0.60%
HURT stats (abs)   min: 1 max: 2 x̄: 1.04 x̃: 1
HURT stats (rel)   min: 0.04% max: 1.85% x̄: 0.44% x̃: 0.25%
95% mean confidence interval for instructions value: -2.01 -1.71
95% mean confidence interval for instructions %-change: -0.80% -0.67%
Instructions are helped.

total cycles in shared programs: 1037179754 -> 1037084874 (<.01%)
cycles in affected programs: 352541071 -> 352446191 (-0.03%)
helped: 1093
HURT: 182
helped stats (abs) min: 1 max: 888 x̄: 111.03 x̃: 64
helped stats (rel) min: <.01% max: 27.30% x̄: 0.84% x̃: 0.20%
HURT stats (abs)   min: 1 max: 6777 x̄: 145.49 x̃: 21
HURT stats (rel)   min: <.01% max: 24.10% x̄: 1.99% x̃: 0.29%
95% mean confidence interval for cycles value: -88.10 -60.73
95% mean confidence interval for cycles %-change: -0.58% -0.29%
Cycles are helped.

total spills in shared programs: 17457 -> 17456 (<.01%)
spills in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0

total fills in shared programs: 20387 -> 20385 (<.01%)
fills in affected programs: 15 -> 13 (-13.33%)
helped: 1
HURT: 0

LOST:   6
GAINED: 1

Ivy Bridge and earlier platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15515482 -> 15513998 (<.01%)
instructions in affected programs: 239739 -> 238255 (-0.62%)
helped: 573
HURT: 57
helped stats (abs) min: 1 max: 20 x̄: 2.73 x̃: 2
helped stats (rel) min: 0.03% max: 9.84% x̄: 0.94% x̃: 0.55%
HURT stats (abs)   min: 1 max: 2 x̄: 1.39 x̃: 1
HURT stats (rel)   min: 0.09% max: 1.85% x̄: 0.52% x̃: 0.35%
95% mean confidence interval for instructions value: -2.57 -2.14
95% mean confidence interval for instructions %-change: -0.89% -0.73%
Instructions are helped.

total cycles in shared programs: 584509880 -> 584463152 (<.01%)
cycles in affected programs: 11765280 -> 11718552 (-0.40%)
helped: 661
HURT: 152
helped stats (abs) min: 1 max: 3073 x̄: 101.99 x̃: 32
helped stats (rel) min: <.01% max: 34.38% x̄: 1.46% x̃: 0.50%
HURT stats (abs)   min: 1 max: 6637 x̄: 136.10 x̃: 15
HURT stats (rel)   min: <.01% max: 24.19% x̄: 1.75% x̃: 0.25%
95% mean confidence interval for cycles value: -82.79 -32.16
95% mean confidence interval for cycles %-change: -1.11% -0.61%
Cycles are helped.

LOST:   9
GAINED: 0

Tiger Lake
Instructions in all programs: 160905127 -> 160900949 (-0.0%)
SENDs in all programs: 6812418 -> 6812085 (-0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7431911114 -> 7433914697 (+0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304537 (-0.0%)

Ice Lake
Instructions in all programs: 145296733 -> 145292370 (-0.0%)
SENDs in all programs: 6863818 -> 6863485 (-0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798257570 -> 8800204360 (+0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334248 (-0.0%)

Skylake
Instructions in all programs: 135891485 -> 135887357 (-0.0%)
SENDs in all programs: 6803031 -> 6802698 (-0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442221881 -> 8444201959 (+0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301114 (-0.0%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>
2021-05-20 01:39:35 +00:00
Connor Abbott e894e83e47 ir3/cf: Rewrite pass
The old pass had a few bugs:
- It tried to avoid folding f2f32 into f2f16, but didn't consider
  conversions that were already folded in.
- It didn't prevent folding an f2f16 or f2f32 into a non-floating-point
  op.

In addition it wasn't written in a manner which made handling integer
conversions practical. This rewrites the pass to instead calculate the
"type" of the conversion source and then check whether folding the
conversion is allowed. This allows us to cleanly separate the
declarative part where we describe how the HW works from the policy part
where we decide whether the transform is allowed, and makes it simple to
add support for folding integer conversions.

Closes: #3208
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10859>
2021-05-19 20:03:19 +00:00
Danylo Piliaiev 931ad19a18 turnip: make cmdstream bo's read-only to GPU
Would allow earlier faults instead of having corrupted cmdstream.

This was already done to Freedreno long ago in:
    04aff7e4 "freedreno: make cmdstream bo's read-only to GPU"

Since private memory should be GPU writable it is now allocated
separately, instead of suballocation from now read-only cmdstream.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10807>
2021-05-17 18:29:09 +00:00
Danylo Piliaiev 413e7c6dc8 turnip: make possible to create read-only bo with tu_bo_init_new
GPU won't be able to write to such BOs, which would to useful for
cmdstream BOs.

Move "bool dump" to the new flags along the way.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10807>
2021-05-17 18:29:09 +00:00
Connor Abbott a40714abf7 nir/lower_phis_to_scalar: Add "lower_all" option
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
2021-05-17 09:59:45 +00:00
Danylo Piliaiev 3a29e45a90 turnip: do not ignore early_fragment_tests
Specifying "early_fragment_tests" in fragment shader takes precedence
over our internal conditions.

Fixes test:
 dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil

Fixes: b2a60c157e "turnip: add LRZ early-z support"

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10803>
2021-05-17 11:56:32 +03:00
Rob Clark a5a86adc23 freedreno/a6xx: Add a few registers
Based-on: https://patchwork.freedesktop.org/patch/429745/?series=89269&rev=2
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10829>
2021-05-16 18:47:55 +00:00
Dmitry Baryshkov 0c30ad402d freedreno/regs: split DSI PHY registers to separate xml files.
In-kernel DSI PHY driver is being more and more split from the main DSI
driver. Split PHY registers from main dsi.xml file to ease further split
of the drivers.

Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10817>
2021-05-16 15:53:14 +00:00
Juan A. Suarez Romero 629e8347ad ci: Update VK-GL-CTS to 1.2.6.1
Reviewed-by: Emma Anholt <emma@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10800>
2021-05-14 20:35:24 +00:00
Emma Anholt 341ecb2dfc ci/freedreno: Skip refract on a306 now that it hangchecks sometimes.
Not every MR, but several per day since I landed the apitrace switch which
increased trace replay resolution to 1080p.  Hopefully some day we can
tune the hangcheck to be less aggressive.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10793>
2021-05-14 16:59:13 +00:00
Danylo Piliaiev 5a133ef1f2 ci/turnip: drop fail annotation for image.extend_operands_spirv1p4.*
They were fixed in
ed20e69b "vtn: Handle ZeroExtend/SignExtend image operands"

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10783>
2021-05-13 17:18:48 +00:00
Danylo Piliaiev 9a477ccbea ci/turnip: drop fail annotation for float_control tests
These tests are NotSupported and therefore cannot fail.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10783>
2021-05-13 17:18:48 +00:00
Rob Clark 6c530ebf40 freedreno/ir3: Don't force RTNE if rounding mode is undefined
Forcing round-to-nearest-even results in loss of opportunities for
conversion folding, causing a regression in gfxbench gl_alu2.

Fixes: de195671bd ("ir3: nir_op_f2f16 should round to even")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10773>
2021-05-12 19:05:27 +00:00
Hyunjun Ko 3f229e34c9 turnip: Implement VK_KHR_timeline_semaphore.
Implements non-shareable timelines using legacy syncobjs,
inspired by anv/radv implementation.

v1. Avoid memcpy in/out_syncobjs and fix some mistakes.
v2.
 - Handle vkQueueWaitIdle.
 - Add enum tu_semaphore_type.
 - Fix to handle VK_SEMAPHORE_WAIT_ANY_BIT_KHR correctly.
 - Fix a crash of dEQP-VK.synchronization.timeline_semaphore.device_host.misc.max_difference_value.

v3. Avoid indefinite waiting in vkQueueWaitIdle by calling
tu_device_submit_deferred_locked itself.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10126>
2021-05-12 05:07:44 +00:00
Hyunjun Ko daefc6e2a4 turnip: prep work for timeline semaphore support
Small refactor to classify semphore types, currently only binary
syncobj is being used though.

v1. Fix a crash of dEQP-VK.api.null_handle.destroy_semaphore

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10126>
2021-05-12 05:07:44 +00:00
Emma Anholt 7520ac54dd ci: Switch to apitraces for glmark2
This brings in upstream mediump fixes, and should also replay faster than
.rdc files.

Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10295>
2021-05-11 20:07:29 +00:00
Emma Anholt 9a5c9ff342 turnip: Drop fail annotation for driver_properties.
These subtests weren't run in CI, and the whole set is skipped since
dropping to 1.1.

Fixes: 7bcda21441 ("turnip: Demote API version to 1.1.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10734>
2021-05-11 14:16:25 +00:00
Emma Anholt 63a3d18ae1 ci/turnip: Add some links to issues and MRs for some test failures.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10734>
2021-05-11 14:16:25 +00:00
Emma Anholt 90b08175b7 ci/turnip: Clean up some stale fail annotations.
This test group was fixed in the deqp 1.2.6.0 uprev, but we do a
fractional run that didn't include these tests.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10734>
2021-05-11 14:16:25 +00:00
Danylo Piliaiev 811f289c56 turnip: copy all layers specified in vkCmdCopyImage
When copying layered images we ignored .layerCount parameter.

Fixes mis-rendering of walls in D3D11 game "Company Of Heroes 2".

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10736>
2021-05-11 11:42:12 +00:00
Emma Anholt d93acf1001 freedreno: Update editorconfig and emacs settings for freedreno reformat.
Fixes: 2d439343ea ("freedreno: Re-indent")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10742>
2021-05-10 23:16:00 +00:00
Eric Anholt 3eee475e39 turnip: Claim 2 discrete queue priorities.
The spec requires at least 2, but says "No specific guarantees are made
about higher priority queues receiving more processing time or better
quality of service than lower priority queues."  So, we can just leave the
priorities as a stub.

Fixes dEQP-VK.info.device_properties

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10470>
2021-05-10 17:21:02 +00:00
Eric Anholt d8099df65a turnip: Drop wideLines properties since we don't support wide lines.
The blob doesn't expose wideLines either, and
dEQP-VK.info.device_properties fails if you claim wide line properties
without it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10470>
2021-05-10 17:21:02 +00:00
Rob Clark 3a772be026 freedreno: Add perfetto renderpass support
Add a custom DataSource to provide trace events for render stages.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9901>
2021-05-10 15:34:07 +00:00
Rob Clark 133a3e4dd3 freedreno/pps: Detect GPU suspend on newer kernels
We can avoid re-sending the configuration cmdstream constantly if we
know the device has not suspended since the last sampling period.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9901>
2021-05-10 15:34:07 +00:00
Rob Clark e63ef520fe freedreno/drm: Add support to query device suspend count
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9901>
2021-05-10 15:34:07 +00:00
Rob Clark 3e13e45467 freedreno: Add freedreno pps driver
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9901>
2021-05-10 15:34:07 +00:00
Danylo Piliaiev daad8f2245 freedreno/a5xx: SP_BLEND_CNTL has per-mrt blend enable bit
Blending in SP_BLEND_CNTL is not a binary flag but the same
mask as in RB_BLEND_CNTL. It is a per-mrt enable bit for blending.

Copied form a6xx, on a5xx it should be have the same since it seems
to have the same structure layout.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10682>
2021-05-07 19:12:17 +00:00
Danylo Piliaiev 14da2444a9 turnip,freedreno/a6xx: SP_BLEND_CNTL has per-mrt blend enable bit
Blending in SP_BLEND_CNTL is not a binary flag but the same
mask as in RB_BLEND_CNTL. It is a per-mrt enable bit for blending.

Example SP_BLEND_CNTL produced by blob on a630 and different MRT
blendings:
  SP_BLEND_CNTL: { UNK8 | 0x6 }
  SP_BLEND_CNTL: { ENABLED | UNK8 | 0xe }
(Decoded before this commit)

Fixes mis-rendering with D3D11 game "Spelunky 2".

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10682>
2021-05-07 19:12:17 +00:00
Emma Anholt 0cd63e891d turnip: Move the extension tables to tu_device.c
Following intel's lead in 27d49670.  In the dEQP-VK.info.* tests, this
bumps apiVersion from 1.1.128 to 1.1.177.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10635>
2021-05-06 00:14:12 +00:00
Emma Anholt c5438450ad turnip: Switch to the shared vulkan ICD generator.
One less python script to maintain.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10635>
2021-05-06 00:14:12 +00:00
Eric Anholt cc5df4398a freedreno/a5xx: Fix up border color pointers.
We were forgetting to increment in the loop, but also it looks from blob
dumps on Pixel 2 like all the pointers it emitted were shifted up by 3
compared to our xml, and that's the same shift that a6xx uses for its
pointers.  None of the tests seem to use more than one
border-color-requiring texture, so it's hard to tell.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9904>
2021-05-05 21:28:03 +00:00
Rob Clark b447db41fc freedreno/tools: Fix async flush vs fdperf/computerator
They need to wait on the ready fence to ensure the submit has been
flushed to the kernel.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10626>
2021-05-05 20:32:31 +00:00
Eric Anholt 7bcda21441 turnip: Demote API version to 1.1.
We don't support major 1.2 required extensions like timeline semaphores.
Fixes many complaints in the dEQP-VK.info.vulkan1p2.* group.

We were originally bumped to 1.2 in 75755e0eba ("turnip: Pretend to
support Vulkan 1.2") but hopefully that build issue has been fixed in the
entrypoint reworks since then.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10471>
2021-05-05 17:09:09 +00:00
Danylo Piliaiev d8ab0ec8e4 turnip: implement VK_KHR_vulkan_memory_model
No handling of Acquire/Release because at the moment scheduler
works as if any barrier is Acq+Rel.

Instead of removing scoped_barrier with scope/mode that for TCS
corresponds to a control_barrier or a memory_barrier_tcs_patch in
ir3_nir_lower_tess_ctrl - remove them in emit_intrinsic_barrier.
And do the same for memory_barrier_tcs_patch and control_barrier.
While in any case hw fence/barrier shouldn't be emitted for them,
they still affect ordering of stores, and in feature ir3 backend
may want to have that information.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9054>
2021-05-05 10:05:38 +00:00
Danylo Piliaiev a898828a63 ir3: update bar/fence bits in accordance to blob
On a6xx blob uses .l rather differently from a5xx.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9054>
2021-05-05 10:05:38 +00:00
Danylo Piliaiev cb8a00791c ir3: memory_barrier also controls shared memory access order
nir_intrinsic_memory_barrier has the same semantic as memoryBarrier()
in GLSL, which is:

GLSL 4.60, 4.10. "Memory Qualifiers":
 "The built-in function memoryBarrier() can be used if needed to
 guarantee the completion and relative ordering of memory accesses
 performed by a single shader invocation."

GLSL 4.60, 8.17. "Shader Memory Control Functions":
 "The built-in functions memoryBarrier() and groupMemoryBarrier() wait
 for the completion of accesses to all of the above variable types."

Fixes tests:
 dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
 dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.image.comp

Fixes: 819a613a ("freedreno/ir3: moar better scheduler")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9054>
2021-05-05 10:05:38 +00:00
Eric Anholt 89114225b5 tunrip: Add support for VK_EXT_separate_stencil_usage.
We were implictly including it in exposing VK 1.2, but we weren't making
use of the supplied struct.  Actually enabling it gives us a chance to do
slightly better at Z/S UBWC, and means we won't lose the separate usage
test coverage when switching back to exposing VK 1.1.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10594>
2021-05-04 20:30:50 +00:00
Eric Anholt 7c52a79057 ci/freedreno: Add another db820c flake that's appeared in the last few months.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10597>
2021-05-04 01:03:50 +00:00
Connor Abbott 9fa587ae96 ir3: Don't assume regs[1] exists in ir3_fixup_src_type()
It won't exist for phi nodes because they are only partially constructed
beforehand. Move it into the switch arguments where we know it's needed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott 3c8a5d7e17 ir3: Rework outputs
Instead of using a separate outputs array, make the "end" instruction
(or chmask) take the outputs as sources. This works better for the new
RA, because it better models the fact that outputs are consumed all at
the same time. With the old model, each output collect would be assumed
dead after it was processed and subsequent collects could use it when
inserting shuffle code, which wouldn't work, and the new RA also deletes
collect instructions after lowering them to moves so the information
would be gone after RA.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott dd55bd8f68 ir3: Make predecessors an array
We need a stable order in order to create phi instructions. In the
future we can make this more sophisticated in order to make manipulating
the CFG easier, but for now that only happens after RA, so we won't have
to worry about it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott 0bd68b8386 ir3: Refactor nir->ir3 block handling
Originally I wrote this to support multiple ir3 blocks per NIR block,
but this turned out to be more useful for creating a stable ordering to
the predecessors. We compute the predecessors ourselves, rather than
relying on NIR, so that the array of predecessors we create in the next
commit has a stable order we can rely on when creating phi nodes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott d28b22374c ir3/cp_postsched: Fixup SSA use pointer for direct reads
There's an optimization here to sink direct (i.e. not relative) reads of
an array past unrelated direct writes. However, since each write
actually reads, modifies, and then writes again to the array, this means
that we need to read the latest updated array. The old RA used the array
id instead of the SSA information, so it didn't care, but the new RA
uses ->instr instead and ignores the array id because arrays are now SSA
so it needs to be correct.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott 40a1c4ba2d ir3/postsched: Fix ir3_postsched_node::delay calculation
This wasn't using the same calculation that add_reg_dep() was using to
get the index into state->regs, so it was using the wrong register. Fix
this by folding it into add_reg_dep().

This shouldn't fix anything, because it's just used for scheduler
priorities, but it should reduce nop's and syncs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott 4b41ffc231 ir3/delay: Remove special case for array deps
The case it was trying to handle (array read-after-write depedendencies)
is already handled by the normal SSA source handling, so this is just
useless.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott 873e21f4e9 ir3/postsched: Use correct src index
Match what ir3_delay_calc() does. Caught by an assert later.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott af7f29a78e ir3/sched: Use correct src index
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott 7df7bab03b ir3/cp: Clone registers for compare-folding optimization
Sharing the same register between instructions happened to work with the
old RA, but not with the new RA because they may get different register
assignments.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Connor Abbott e597f8b122 ir3/postsched: Fix dependencies for a0.x/p0.x
a0.x is written as a half-reg, but just interpreting it as "hr61.x" will
result in it overlapping with r30.z in merged mode, which is not what
the hardware does at all. This introduced a spurious dependency on
a write to r30.z which resulted in an assert tripping. Just pretend it's
a full reg instead.

This fixes
spec@arb_tessellation_shader@execution@variable-indexing@vs-output-array-vec3-index-wr-before-tcs
with the new RA.

Fixes: 0f78c32 ("freedreno/ir3: post-RA sched pass")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10591>
2021-05-03 19:52:31 +00:00
Danylo Piliaiev ea72be8b7c ir3: do not fold cmps from different blocks with non-null address
Scheduling don't like address being in the different block from
the instruction.

Fixes a crash in the trace of "War Thunder" (DX11)

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10355>
2021-05-03 17:25:05 +00:00
Connor Abbott 3d5c1c4989 tu: Fix SP_GS_PRIM_SIZE for large sizes
Based on the previous commit.

Fixes: 012773b ("turnip: Configure VPC for geometry shaders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10551>
2021-05-03 14:06:24 +00:00
Connor Abbott 0157076982 freedreno/a6xx: Better document SP_GS_PRIM_SIZE
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10551>
2021-05-03 14:06:24 +00:00
Rob Clark c554757bc2 freedreno/drm: Initialize control->fence
Don't rely on getting a zero'd out buffer, we could hit the bo-cache.

Fixes: 7dabd62464 ("freedreno/drm: Userspace fences")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10567>
2021-05-02 15:17:25 +00:00
Rob Clark 928453ccb2 freedreno/ci: Mark client_wait_sync_finish as flake
This one has shown up a couple times since fd/go-fast, I'm still trying
to reproduce/debug.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10530>
2021-05-01 08:47:50 -07:00
Rob Clark 5181f40670 freedreno/drm: Allow FD_BO_PREP_FLUSH without _NOSYNC
This provides the upper layer (gallium, etc) a way to ensure that
rendering involving the bo has been flushed all the way to the kernel.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10530>
2021-05-01 08:46:27 -07:00
Rob Clark 9fa3312773 freedreno/ci: Isolate dEQP-EGL reset_context tests
To reduce flakes, separate out the dEQP-EGL tests that are intentionally
triggering GPU hangs.  This avoids some kernel side issues with bad
handling of ringbuffer-full scenarios, causing innocent tests to flake.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10560>
2021-05-01 02:37:05 +00:00
Eric Anholt 0987df6a3e ci/freedreno: Mark dEQP-EGL flakes reported on IRC since its introduction.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10552>
2021-04-30 16:44:20 -07:00
Danylo Piliaiev 1201aa9332 ir3: do not move varying inputs that depend on unmovable instrs
Not all varying fetches could be pulled into the start block.
If there are fetches we couldn't pull, like load_interpolated_input
with offset which depends on a non-reorderable ssbo load or on a
phi node, this pass is skipped since it would be hard to find a place
to set (ei) flag (beside at the very end).

We also don't have to manually set (ei) in such cases since a5xx and
a6xx do automatically release varying storage at the end.
Earlier gens need further testing, however they do not support
interpolateAt* functions at moment, so unless we would like to support
sample shading on them - they are fine.

Fixes crash in GTA V.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10483>
2021-04-30 14:49:18 +00:00
Juan A. Suarez Romero e532a47f76 util/hash_table: do not leak u64 struct key
For non 64bit devices the key stored in hash_table_u64 is wrapped in
hash_key_u64 structure, which is never free.

This commit fixes this issue by just removing the user-defined
`delete_function` parameter in hash_table_u64_{destroy,clear} (which
nobody is using) and using instead a delete function to free this
structure.

Fixes: 608257cf82 ("i965: Fix INTEL_DEBUG=bat")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10480>
2021-04-29 12:58:23 +02:00
Rob Clark 1d19325483 freedreno/ci: Disable counterstrike trace on a306 for now
The combination of removing bottlenecks in userspace (userspace fences,
etc) and slow GPU results in hitting full ringbuffer on a306.  Haven't
figured out a reasonable way to work around that in userspace until a
kernel fix is in place, so disable this one for now.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark f92f31455a freedreno/drm: Assume explicit fences if in_fence_fd
If we ever see explicit fencing used, then we can disable implicit
fencing, even for internal or unfenced batches.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark e9a9ac6f77 freedreno/drm: Async submit support
Move the submit ioctl to it's own thread to unblock the driver thread
and let it move on to the next frame.

Note that I did experiment with doing the append_bo() parts
synchronously on the theory that we should be more likely to hit the
fast path if we did that part of submit merging before the bo was
potentially re-used in the next batch/submit.  It helped some things
by a couple percent, but hurt more things.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 2c9e8db28d freedreno/drm: pipe should hold reference to device
A more direct solution would be for bo's to have a reference to the
device.  But bo's are ref/unrefd more frequently.

This avoids async submits unrefing a bo after the device handle-
table is freed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark ad9654a4c1 freedreno/drm: fd_submit should hold ref to fd_pipe
Also, move this into the base class, no reason for it to be in backend.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark cccdc513e3 freedreno/drm/sp: Implement deferred submit merging
For submits flushed with (a) no required fence, and (b) no externally
visible effects (ie. imported/exported bo), we can defer flushing the
submit and merge it into a later submit.

This is a bit more work in userspace, but it cuts down the number of
submit ioctls.  And a common case is that later submits overlap in the
bo's used (for example, blit upload to a buffer, which is then used in
the following draw pass), so it reduces the net amount of work needed
to be done in the kernel to handle the submit ioctl.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/19
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark c7dc5cf3cb freedreno/drm/sp: Split submit prep and finish
For deferred submits, we still need to do the prep steps immediately,
but the ioctl construction can be deferred.. prepare for that by
splitting the prep out.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 62a6773d80 freedreno/drm: Add pipe tracking for deferred submits
Now that we have some bo state tracking for userspace fences, we can
build on this to add a way for the pipe implementation to defer a submit
flush in order to merge submits into a single ioctl.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark aafcd8aacb freedreno: Re-work fd_submit fence interface
Move everything into a struct assocated with the pipe_fence_handle, so
that the drm layer can fill in the seqn/fd fences directly.

This will give us a comvenient place to insert a util_queue_fence in the
next commit.

While we're at it, extract the uint32_t fence (previously called
'timestamp' in place, a kgsl legacy) into a struct that encapsulates
both the kernel fence and the userspace fence.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark d531c8d22a freedreno/drm: Reference count submits
To merge submits, we'll need drm to internally hold an extra reference
to the submit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 49852ace2a freedreno/drm: Inline the fence-table
In the common case, a bo will have no more than a single fence attached.
We can inline the storage to avoid a separate allocation in this case.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 7dabd62464 freedreno/drm: Userspace fences
Add a per-fd_pipe fence "timeline" so we can detect cases where we don't
need to call into the kernel to determine if a fd_bo is still busy.

This reuses table_lock, rather than introducing a per-bo lock to protect
fence state updates because (a) the common / hotpath pattern is to
update fences on a lot of objects, but checking the fence state of a
single object is less common, and (b) because we already hold the table
lock in common spots where we need to check the bo's fence state (ie.
allocations from the bo-cache).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark df78934cdf freedreno/drm: Add locked version fd_{bo,pipe}_del()
This will be needed in the next patch, so we can reuse the bo table_lock
for fence related locking.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 8f5c89350f freedreno/drm: Move the growable array helper
We'll need this to track userspace fences attached to a fd_bo.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark c98ada6ad1 freedreno/drm: Add FD_BO_PREP_FLUSH
There are a couple cases where we want to use _NOSYNC, but at the same
time we want to ensure that rendering related to a bo is actually
flushed.

This doesn't do anything yet, but when we start deferring/merging
submits we'll need a way to trigger anything deferred to flush.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 8ab227c373 freedreno/drm: Cleanup bo cpu_prep flags
Also add some STATIC_ASSERT()

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 7f0abd9048 freedreno/drm: Cleanup bo allocation flags
Most of them were actually unused.  The memory type (KMEM vs SMI) only
applied to very old a2xx era devices that had a small/fast stacked
memory (SMI) vs normal memory (KMEM).  And the cache flags are ignored
(ie. everything is writecombine), but we can add new cache flags later
when they actually do something.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark ef0c5007f2 freedreno/drm: Move submit->primary to base class
Gets rid of a bit of duplication between the two current
implementations, and will be needed in next patch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Rob Clark 224dbd77d5 freedreno: Small indent fix
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
2021-04-28 15:36:42 +00:00
Danylo Piliaiev addab037f0 tu: do not corrupt unwritten render targets
There is no point in having a write to an attachment enabled when there
is no corresponding output in the shader. Per VK spec it is an UB,
however a few apps depend on attachment not being changed if FS doesn't
have corresponding output.

Fixes water in Genshin Impact.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10489>
2021-04-28 08:51:49 +00:00
Eric Anholt 7ae0719117 turnip: Only write the tu_RegisterDeviceEXT() out fence on success.
Fixes a double-free in dEQP-VK.wsi.display_control.register_device_event
where the fence that we destroyed got destroyed again.  Leaving it unset
in the error path leaves the test's NULL in place.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10473>
2021-04-27 16:19:26 +00:00
Italo Nicola 8074a040e7 util: add util_sign_extend
This code is taken from src/freedreno/isa/decode.c.
Since we need a similar function in panfrost, it's probably good to move
it to utils.

Signed-off-by: Italo Nicola <italonicola@collabora.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9461>
2021-04-27 07:04:07 +00:00
Connor Abbott 643f2cb8a3 ir3, tu: Cleanup indirect i/o lowering
Do all the necessary lowering in one place, during finalization, and
stop uselessly calling nir_lower_indirect_derefs in turnip. Splitting
i/o to elements should no longer be necessary since we use the i/o
semantics instead of variables now.

This has the side effect that we no longer generate enormous if-ladders
for tess/GS shaders with turnip.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7274>
2021-04-26 17:07:02 +00:00
Connor Abbott decfea2f4e ir3: Prevent oob writes to inputs/outputs array
Don't setup inputs and outputs if we aren't using
load_input/store_output intrinsics. While it's mostly harmless, there
may be more outputs than expected which would lead to an oob write of
the outputs array when setting the register id to INVALID_REG.

Also be more paranoid with asserts to catch this.

Fixes: a6291b1 ("freedreno/ir3: rework setup_{input,output} to make struct varyings work")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7274>
2021-04-26 17:07:02 +00:00
Robert Foss 9967dabe91 freedreno/regs: add 5nm DSI PHY/PLL regs
This is for the kernel driver.

Signed-off-by: Robert Foss <robert.foss@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10368>
2021-04-21 15:42:03 +00:00
Danylo Piliaiev 9402d5a6b5 ir3: make possible to specify branchstack up to 64
On a6xx/a5xx there is such dependency between branchstack bitfield
and the amount of nested ifs, which could be seen with blob:

IFs   BRANCHSTACK
0	0
1	1
2	2
3	2
4	3
5	3
6	4
...
59	30
60	31
61	31
62	32
63	32
64	32

Remove open-coded branchstack for a5xx compute along the way.

Fixes tests:
 dEQP-VK.spirv_assembly.instruction.compute.float16.opvectorshuffle.344
 dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.344_vert
 dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.444_geom
 dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.244_tessc
 dEQP-VK.spirv_assembly.instruction.graphics.float16.opvectorshuffle.344_frag

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9859>
2021-04-21 11:57:07 +00:00
Danylo Piliaiev e7eed45869 ir3: do not double threadsize when exceeding branchstack limit
We can't support more than compiler->branchstack_size diverging threads
in a wave. Thus, doubling the threadsize is only possible if we don't
exceed the branchstack size limit.

As of blob version 512.490.0 - it doesn't have this heuristics.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9859>
2021-04-21 11:57:07 +00:00
Danylo Piliaiev 1e33b6a32b turnip: enable shaderInt16
We should have everything to enable it.
16b integer division is lowered by nir_lower_idiv.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10054>
2021-04-20 20:32:20 +00:00
Danylo Piliaiev d918bbfa1c ir3: treat 16b imul as mul.s24
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10054>
2021-04-20 20:32:20 +00:00
Rob Clark 5bf7475460 ir3: handle 16b op_i2b1
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10054>
2021-04-20 20:32:20 +00:00
Samuel Iglesias Gonsálvez b2a60c157e turnip: add LRZ early-z support
Imported the logic from Freedreno driver.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7186>
2021-04-20 10:01:58 +00:00
Samuel Iglesias Gonsálvez af049b6668 turnip: fix setting dynamic state mask for VK_DYNAMIC_STATE_STENCIL_OP_EXT case
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7186>
2021-04-20 10:01:58 +00:00
Samuel Iglesias Gonsálvez 88c7aa0b3e turnip: group all geometry constant draw states in one
Thus, we can free some draw state slots for future use.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7186>
2021-04-20 10:01:58 +00:00
Samuel Iglesias Gonsálvez 2c0c696f16 turnip: update LRZ state based on stencil test state
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7186>
2021-04-20 10:01:58 +00:00
Samuel Iglesias Gonsálvez ff8e3547b3 turnip: implement LRZ direction
There are some LRZ compare op switches that are not supported by
the HW, like GREATER* <-> LESS* ones.

This patch tracks the direction of the switch and disables LRZ
if needed.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7186>
2021-04-20 10:01:58 +00:00
Eric Anholt 8a8e55d6a8 ci/freedreno: Test dEQP-EGL against Xorg.
This should help us be able to refactor core EGL code with more
confidence, and increase our confidence uprevving Mesa in ChromeOS.

Part of #1884

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10240>
2021-04-19 20:53:27 +00:00
Danylo Piliaiev 64367f2359 turnip: implement VK_KHR_shader_terminate_invocation
OpTerminateInvocation provides the behavior required by the GLSL
discard statement, which we already implement.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
2021-04-19 17:11:36 +00:00
Danylo Piliaiev 9dd9424a85 turnip: implement VK_EXT_shader_demote_to_helper_invocation
The "demote" intrinsic has the semantics of D3D discard, which means
it doesn't change the control flow, allowing derivatives to work.

On A6xx there is no known way to check whether invocation was demoted,
thus we use nir_lower_is_helper_invocation.

Add "logical" OPC_DEMOTE which is later translated to "kill".
Such separation is necessary to run "kill" specific optimizations
which are invalid for "demote".

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
2021-04-19 17:11:36 +00:00
Connor Abbott 08499369d0 ir3: Assemble and disassemble swz/gat/sct
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10291>
2021-04-19 16:10:44 +00:00
Connor Abbott d48d43039a ir3: Improve cat1 modifier disassembly
Remove bit that shouldn't be part of (rptN), and rewrite the handling of
(even) and (pos_infinity) to uncover a missing (neg_infinity) modifier.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10291>
2021-04-19 16:10:44 +00:00
Connor Abbott 4c5b696cc3 ir3/parser: Fix oob write with immediates array
immediates_count and immediates_size are supposed to have the same
units, but it was only incrementing immediates_count by 1. While we're
here, also fix the case where constants are specified out-of-order.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10291>
2021-04-19 16:10:44 +00:00
Rob Clark c74d93cf01 freedreno/fdl: Re-indent
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark 6050976232 freedreno/perfcntrs: Re-indent
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark d26a224ca9 freedreno/ir2: Re-indent
clang-format -fallback-style=none --style=file -i src/freedreno/ir2/*.[ch]

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark 2dbf09c2b4 freedreno/drm-shim: Re-indent
clang-format -fallback-style=none --style=file -i src/freedreno/drm-shim/*.[ch]

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark 45856c5fbc freedreno/decode: Re-indent
clang-format -fallback-style=none --style=file -i src/freedreno/decode/*.[ch]

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark 3894bc9664 freedreno/computerator: Re-indent
clang-format -fallback-style=none --style=file -i src/freedreno/computerator/*.[ch]

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark ccd68b672a freedreno/common: Re-indent
clang-format -fallback-style=none --style=file -i src/freedreno/common/*.[ch]

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark f5918f750f freedreno/afuc: Re-indent
clang-format -fallback-style=none --style=file -i src/freedreno/afuc/*.[ch]

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Rob Clark b94db11708 freedreno/drm: Re-indent
clang-format -fallback-style=none --style=file -i src/freedreno/drm/*.[ch]

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10293>
2021-04-17 15:38:56 +00:00
Eric Anholt 23159f1a7a ci/freedreno: Skip some precision tests on a530.
These have flaked as Timeouts in CI in the last month.  .precision.* is
generally very slow (some in the 15s-30s range), but it's unclear to me
why they sometimes spike up to 60 seconds (thermal throttling?).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10274>
2021-04-16 04:34:14 +00:00
Eric Anholt 7d234da6ee freedreno: Fix YUV sampler regression.
We have to keep sampler uniforms around for later YUV lowering, and we
only need to remove uniforms that take up storage space.  Code comes from
radeonsi.

Closes: #4644.
Fixes: de17b4aab5 ("freedreno: Remove uniform variables after finalizing NIR.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10246>
2021-04-15 16:20:15 +00:00
Michel Dänzer d200f45875 Use explicit break instead of fall-through to break-only case
clang generates a warning if there's no explicit break or fall-through
annotation. The latter would be kind of silly in this case, and not
robust against any future changes turning the fall-through invalid.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
2021-04-15 16:01:22 +00:00
Michel Dänzer 2928c21eb7 Convert most remaining free-form fall-through comments to FALLTHROUGH
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
2021-04-15 16:01:22 +00:00
Connor Abbott cf727e6ba4 tu: Expose VK_EXT_robustness2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
2021-04-15 16:05:13 +02:00
Connor Abbott 0fb14420da tu: Handle null descriptors
Writing all 0's, including for the format, seems to work. Actually
setting the format seems to break textureSize() (getsize returns 1 for
some reason).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
2021-04-15 16:05:13 +02:00
Connor Abbott f58ece08da tu: Handle robust UBO behavior for pushed UBO ranges
If we push a UBO range but then find out at draw-time that part of the
pushed range is out of range of the UBO descriptor, then we have to fill
in the rest of the range with 0's to mimic the bounds-checking that ldc
would've done.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
2021-04-15 16:05:13 +02:00
Connor Abbott cb02a48f83 tu: Correctly preserve old push descriptor contents
We were never setting set->size, so we were always copying 0 bytes. But
as we only copy the contents when the layout and therefore the size is
the same, we don't have to take the old size into account anyway.

This fixes some VK_EXT_robustness2 tests that use push descriptors.

Fixes: 6d4f33e ("turnip: initial implementation of VK_KHR_push_descriptor")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
2021-04-15 16:05:13 +02:00
Connor Abbott c68ea960a7 ir3, tu: Add compiler flag for robust UBO behavior
This needs to be part of the compiler because it's the only piece that
we always have access to in all the places ir3_optimize_loop() is
called, and it's only enabled for the whole Vulkan device. Right now
it's just used for constraining vectorization, but the next commit adds
another use.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
2021-04-15 16:05:11 +02:00
Connor Abbott 8f54028479 ir3: Reduce max const file indirect offset base to 9 bits
This fixes
dEQP-VK.robustness.robustness2.bind.notemplate.r32i.dontunroll.nonvolatile.uniform_buffer.no_fmt_qual.len_260.samples_1.1d.frag,
which accesses the shader UBO with c<a0.x + 512> due to the constant
data UBO coming before it in the const file. The len_256 variant has a
smaller constant data UBO, so it uses c<a0.x + 256> instead, and that
works, so 512 seems to be the real limit.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
2021-04-15 16:03:54 +02:00
Connor Abbott 8e11f0560e ir3: Fix list corruption in legalize_block()
We forgot to remove the instruction under consideration from instr_list
before inserting it into the block's list, which caused instr_list to
become corrupted. This happened to work but caused further corruption in
some rare scenarios.

Fixes: adf1659 ("freedreno/ir3: use standard list implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7573>
2021-04-15 16:03:54 +02:00
Eric Anholt 6d510fd473 ci/freedreno: Merge a630 piglit to a single job.
piglit_gl clocked in at 6:12 end-to-end runtime, and piglit_shader spent
2:53 in deqp-runner, so merging them together should be about 9 minutes.
Removing a boot should save us a minute or two of runner time per
pipeline.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10243>
2021-04-15 10:06:14 +00:00
Samuel Iglesias Gonsálvez 029bc53be6 turnip: fix typo in tu_CmdBeginRenderPass2()
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
2021-04-15 09:51:25 +02:00
Samuel Iglesias Gonsálvez d52917f858 turnip/lrz: added support for depth bounds test enable
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
2021-04-15 09:51:25 +02:00
Samuel Iglesias Gonsálvez 2161aebf8d turnip: document GRAS_LRZ_CNTL's UNK5 bitfield
It is used by the blob to enable depth bounds test for LRZ.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
2021-04-15 09:51:25 +02:00
Samuel Iglesias Gonsálvez 54cf12774a turnip/lrz: add support for VK_EXT_extended_dynamic_state
When the depth or stencil state changes dynamically, that might affect
LRZ state and we need to recalculate it and emit it again.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
2021-04-15 09:51:20 +02:00
Samuel Iglesias Gonsálvez 6d6cbb7361 turnip: refactor how LRZ state is calculated
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
2021-04-15 09:50:51 +02:00
Samuel Iglesias Gonsálvez 43ebba4e88 turnip: initialize pipeline->rb_{stencil,depth}_cntl always
This change will simplify further changes on LRZ state management.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
2021-04-15 09:50:51 +02:00
Samuel Iglesias Gonsálvez 1f9fb7677b turnip: move pipeline gras_su and rb{stencil,depth}_cntl_mask initialization
Move them up, so they are initialized even when the dynamic state is
not used.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8615>
2021-04-15 09:50:51 +02:00
Rob Clark 31782330da freedreno: Add missing foreach macros and update indentation
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10247>
2021-04-14 16:53:26 -07:00
Rob Clark 2fb3984805 freedreno: Add .clang-format
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8883>
2021-04-14 19:52:21 +00:00
Connor Abbott 2deead184c ir3/sched: Don't schedule too many tex/SFU instructions
Consider a simple loop that does a series of texture instructions and
then reduces the results:

vec4 sum = vec4(0);
for (int i = 0; i < N; i++) {
   sum += texture(...);
}

Assume that the loop is unrolled and we schedule the resulting basic
block. Right now, after we schedule the first texture instruction, the
only instructions available to schedule that don't incur a sync are the
instructions to setup the second texture instruction. So we keep picking
the texture instructions, no matter how large N is, resulting in a
pathological schedule for register pressure when N is very large:

sum1 = texture(...);
sum2 = texture(...);
sum3 = texture(...);
...
sum = sum1 + sum2 + sum3 + ...;

In particular this happens with some CTS tests for VK_EXT_robustness2,
where a loop like that with many iterations is marked as [[unroll]],
forcing NIR to unroll it.

This solution is a balance between the current approach and always
scheduling for register pressure (and ignoring sync's). We only allow a
certain number of texture fetches to be in flight before considering
textures to "sync", even though they don't really, both because they
likely *will* sync in reality (overflowing the internal queue of waiting
texture instructions) and because at some point we need the normal
algorithm to kick in and start lowering register pressure.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7571>
2021-04-14 17:33:58 +00:00
Connor Abbott 7821e5a3f8 ir3/sched: Don't penalize uses of already-waited tex/SFU
Once we insert a use of a given tex or SFU instruction, then we must
wait for that tex/SFU instruction (as well as all earlier ones) to
complete, so we shouldn't penalize further uses, even if a subsequent
tex/SFU instruction gets scheduled after the first use. This especially
matters after the next commit when we start forcibly breaking up long
sequences of texture instructions, since if we schedule a group of 8
texture instructions then we want to schedule the uses of those
instructions in parallel with the next 8 texture instructions to reduce
register pressure.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7571>
2021-04-14 17:33:58 +00:00
Michel Dänzer af0fde955c ci: Move docker images from Debian buster to bullseye
Among other things, this gets us GCC 10 (was 6).

Requires some changes to third party components we use:

* Install apitrace (& waffle) from Debian; was hitting issues with the
  local build, and it's the same version 9.0 anyway.
* Update Fossilize to a newer commit which builds with GCC 10.
* apt.llvm.org repositories are no longer needed.
* Use an SPIRV-LLVM-Translator commit which builds with LLVM 11.0.1.
* Install XCB packages from Debian, 1.13 fails to build with Python 3.9.
* Install wayland-protocols from Debian, 1.12 is too old for
  libgtk-3-dev in bullseye.

LLVM 7/8 packages are no longer available.

Also adapt expected test results to Xvfb now exposing multi-samle
GLXFBConfigs.

v2:
* Install clang instead of clang-11.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3124
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9833>
2021-04-14 13:05:08 +00:00
Connor Abbott 271c18f48e tu: Expose VK_KHR_relaxed_block_layout
This was absorbed into Vulkan 1.1, but we forgot to expose it
separately. It's a subset of what's allowed by
VK_EXT_scalar_block_layout.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8695>
2021-04-14 11:48:38 +00:00
Connor Abbott 765c3b85a5 tu: Expose VK_KHR_spirv_1_4 and VK_EXT_scalar_block_layout
VK_KHR_spirv_1_4 is trivial because vtn already supports all the added
SPIR-V features that aren't gated behind Vulkan extensions. I've
observed some robustness2 CTS tests requiring this. However there are
a few tests currently failing due to lacking spilling.

VK_EXT_scalar_block_layout should also be trivial, since support for
"straddling" UBO loads was added recently for other reasons. This is
used by every robustness2 CTS test.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8695>
2021-04-14 11:48:38 +00:00
Juan A. Suarez Romero 9e5762c387 ci: Update VK-GL-CTS to 1.2.6.0
v2:
 - Bump up MESA_ROOTFS_TAG instead of arm_build (Michel)

Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10136>
2021-04-14 08:06:55 +00:00
Marek Olšák fb29cef8dd nir: add many passes that lower and optimize 16-bit input/outputs and samplers
Added:
* a pass that renumbers bases of IO intrinsics
* a pass that converts mediump IO to 16 bits, optionally using the new
  packed varying slots
* a pass that sets (forces) mediump in IO intrinsics (for testing)
* a pass that remaps VARYING_SLOT_VAR[0..15]_16BIT to VARYING_SLOT_VAR[0..31]
  (if some shader stages don't want packed varyings)
* a pass that folds type conversions around texture opcodes into those
  opcodes (e.g. tex(f2f32(coord), ..) is changed into tex accepting f16)
* a pass that changes (legalizes) sampler src and dst types based on specified
  hw constraints (e.g. derivatives must be the same type as coordinates)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
2021-04-13 05:07:42 +00:00
Rhys Perry a2619b97f5 nir/lower_idiv: add options to use fp32 for 8-bit division lowering
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>
2021-04-12 16:19:46 +00:00
Danylo Piliaiev 16fd5bd996 turnip: support copying both aspects of D32_SFLOAT_S8_UINT
We cannot copy both aspects at the same time, so copy them one by one.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10140>
2021-04-12 14:36:30 +00:00
Connor Abbott ba796d5115 ir3/postsched: Make sure to schedule inputs before kill
Before, we would prefer to schedule inputs before kills, which works
assuming that the live range of the bary_ij system value don't get
split and therefore all bary.f are ready at the start of the block.
However live range splitting can mess up that assumption and cause a
kill to get scheduled before a move that leads to a bary.f.

This fixes even e.g. dEQP-GLES2.functional.shaders.discard.basic_always
on a3xx before introducing CSE of collect instructions, but even after
that it could be a problem theoretically as the register allocator
doesn't guarantee that any live ranges aren't split.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10143>
2021-04-09 16:31:29 +00:00
Matt Turner 4251e9cddf ir3: Don't count (nopX) towards the wrong category
Prior to this commit

   (nop3) mad.f32 r0.y, c0.x, r1.w, c0.y

was counted as 4 cat3 instructions (and still 3 cat0/nops) in shader-db
results. With this change, it is counted as only 1 cat3 instruction.

Probably never going to have better shader-db results than this in my
career:

total cat2 in shared programs: 1214667 -> 732058 (-39.73%)
cat2 in affected programs: 1194729 -> 712120 (-40.39%)
helped: 8551
HURT: 0

total cat3 in shared programs: 376448 -> 274745 (-27.02%)
cat3 in affected programs: 344918 -> 243215 (-29.49%)
helped: 7222
HURT: 0

Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10116>
2021-04-09 14:26:35 +00:00
Bas Nieuwenhuizen 4ca4de50f7 nir: Remove nir_shader->shared_size.
The same info is in shader_info. Dedupe.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10094>
2021-04-08 14:39:28 +00:00
Chad Versace 5e6db19168 anv: Remove vkCreateDmaBufINTEL (v4)
Superceded by VK_EXT_image_drm_format_modifier.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1466>
2021-04-08 14:15:55 +00:00
Chad Versace 0845cabc72 vulkan: Track dependencies of Python imports
The meson.build was unaware of transitive dependencies introduced by
Python imports.

Android still needs fixing. But I did not update the Android files lest
I break the build.

Ideally, we would fix this by using a Python runner that generates
a depfile, similar to how meson creates depfiles for C files by passing
flags -MD -MQ -MF to gcc. But this patch gets the job done, without
stalling on the ideal general solution, by manually tracking the Python
imports in new 'foo_depend_files' variables.

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1466>
2021-04-08 14:15:54 +00:00
Connor Abbott 5a70c4d4a0 ir3: Don't copy propagate arrays in ir3_cp
We don't check whether there's an intervening write in this pass, which
makes it incorrect. ir3_cp_postsched does check correctly, but we were
accidentally doing it here anyway for some sources.

While we're here, delete some code that was only used in the array case.

Fixes: f370e954 ("freedreno/ir3: handle const/immed/abs/neg in cp")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
2021-04-07 14:35:13 +00:00
Connor Abbott 1ad5ee5a04 ir3/cp_postsched: Set address of uses for relative mov's
Fixes: 680ca5b ("freedreno/ir3: add post-scheduler cp pass")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
2021-04-07 14:35:13 +00:00
Connor Abbott dcc26a3945 ir3: Fix valid flags for STIB
Disallow immediates for the source. This was hidden by the fact that we
didn't copy-propagate trivial collect instructions.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
2021-04-07 14:35:13 +00:00
Connor Abbott 94beaa1d92 ir3/legalize: Fix last input (ss) insertion
If there was a mix of ldlv and bary.f and we inserted an (ss) *after*
the last input which was a bary.f, then last_input_needs_ss would get
unset, even though it shouldn't. For figuring out whether we need the
(ss), we need to know whether there are any pending ldlv's when
last_input gets executed, not at the end of the block, which means that
the existing code's strategy of inserting it after the whole block has
been processed won't work. Rework it to do the last_input processing in
the main loop instead.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
2021-04-07 14:35:13 +00:00
Connor Abbott 35ffe4fec1 freedreno/a3xx: Fix SP_FS_CTRL_REG1_INITIALOUTSTANDING
Unfortunately this didn't fix anything, but I thought I might as well
include it.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10076>
2021-04-07 14:35:13 +00:00
Danylo Piliaiev 519eb735a3 turnip: implement variableMultisampleRate
If subpass doesn't have depth/color attachments - samples count is
devised from VkPipelineMultisampleStateCreateInfo::rasterizationSamples.
Without variableMultisampleRate enabled all pipelines in such subpass
should have the same samples count; variableMultisampleRate allows
to have pipelines with different number of samples in one subpass,
given that it doesn't have depth/color attachments.

Blob doesn't have it enabled but there is no known reason for this.

Passes:
 dEQP-VK.pipeline.multisample.variable_rate.*

Fixes test:
 dEQP-VK.pipeline.framebuffer_attachment.no_attachments_ms

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9556>
2021-04-07 12:04:45 +00:00
Alejandro Piñeiro 1e0a69afa7 vulkan: track number of bindings instead of max binding for CreateDescriptorSetLayout
As that handles better, and more clear, the case of bindingCount being
zero. For the case of Anvil and Turnip, this avoids allocating a
non-needed binding when bindingCount is zero.

Inspired on radv, that was what it was doing so far.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4526

Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9905>
2021-04-05 20:17:53 +00:00
Danylo Piliaiev 0709a6b363 turnip: fix alignment of non-32b types in workgroup memory
Fixes tests:
 dEQP-VK.spirv_assembly.instruction.compute.workgroup_memory.float16

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10039>
2021-04-05 17:31:11 +00:00
Alyssa Rosenzweig 06ebbde630 vulkan: Deduplicate mesa stage conversion
Across every driver...

v2: Add casts to appease -fpermissive used on CI.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9477>
2021-04-03 17:34:39 +00:00
Danylo Piliaiev 0ec495e3c9 turnip: handle format list for compressed formats
Compressed formats may have compatible formats, however they could
only be sampled, so we should not call tu6_format_color with them.

tu6_format_texture should have the same behaviour for checking swap
so use it for all cases.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10009>
2021-04-02 21:52:05 +00:00
Eric Anholt f67b6f9c47 ci/freedreno: Fix up the a5xx border color flake annotation.
Looks like I put it in the wrong file back when I first caught it.  It's a
one-or-twice-a-week back flake that seems to happen.  The upcoming
deqp-runner uprev would have caught this mistake.

Fixes: 957132294f ("ci/a5xx: Increase the gles3/31 coverage.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9806>
2021-04-02 18:42:04 +00:00
Eric Anholt adf04d1af4 ci/freedreno: Switch to the trimmed glxgears trace.
The old one had a ton of frames and took ~5 minutes on a306.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9957>
2021-04-01 21:04:11 +00:00
Eric Anholt fe5349f70c freedreno/a6xx: Fix alpha tests.
Apparently I inverted the sense of this flag back when we didn't have
piglit testing.  Fixes terrible rendering in minetest, HL2, CS:Source, and
CS.

Fixes: 0369dd9077 ("freedreno/a6xx: Add ARB_depth_clamp and separate clamp support.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9957>
2021-04-01 21:04:11 +00:00
Eric Anholt 3043940183 freedreno/a5xx: Fix alpha test vs early Z bugs.
Just like with discards, we have to disable early Z writes when alpha test
is enabled.

Fixes rendering on HL2, CS: Source, counter-strike, and minetest.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9957>
2021-04-01 21:04:11 +00:00
Eric Anholt c9fd8c2570 ci/freedreno: Add trace testing on a3xx, a5xx.
Having compared rendering between a6xx and these, I found several bugs.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9957>
2021-04-01 21:04:11 +00:00
Eric Anholt 8e3a1d0dd2 ci/freedreno: Rename a306-test and a530-test to drop "arm64" from the name.
We don't have an armhf variant, and probably won't.  Now matches a630.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9957>
2021-04-01 21:04:11 +00:00
Eric Anholt ec54546b2a ci/freedreno: Add more new traces for a630 (minetest, TDM, pioneer, glyphy).
These are all recent traces that have been added.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9957>
2021-04-01 21:04:11 +00:00
Danylo Piliaiev ce1a381e57 turnip: enable VK_KHR_16bit_storage on A650
A650 can use the same SSBO descriptor for both 32-bit and 16-bit access,
which makes it easy to enable this extension.

Passes tests that run under:

dEQP-VK.spirv_assembly.instruction.*.16bit_storage.*

Rebased and modified commit from Jonathan Marek.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
2021-04-01 17:51:07 +00:00
Jonathan Marek 14acc64c3b turnip: enable VK_KHR_shader_float16_int8
ir3 supports 16-bit floats, so we can enable this.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
2021-04-01 17:51:07 +00:00
Danylo Piliaiev 64aaa4afc3 turnip: enable infinities for f16 math and document the register
When float16 is enabled this will allow to pass a number of
float16 tests.

When A6XX_SP_FLOAT_CNTL_F16_NO_INF is set - all operations which
generate +-infinity generate +-MAX_HALF_FLOAT.

Fixes some tests from:
 dEQP-VK.spirv_assembly.instruction.*.float16.*
 dEQP-VK.spirv_assembly.instruction.*.float_controls.fp16.*

E.g.:
 dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.sinh_vert
 dEQP-VK.spirv_assembly.instruction.compute.float16.arithmetic_4.length
 dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log_denorm_flush_to_zero_nostorage
 dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.log2_denorm_flush_to_zero_nostorage
 dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.inv_sqrt_denorm_flush_to_zero_nostorage

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
2021-04-01 17:51:07 +00:00
Danylo Piliaiev 14460faa64 ir3: convert shift amount to 16b for 16b shifts
NIR has shifts defined as:
 opcode("*shr", 0, tuint, [0, 0], [tuint, tuint32], False, ...

However, in ir3 we have to ensure that both operators of shift
instruction have the same bitness.

Let's hope that in future the additional COV for constants would
be optimized away.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
2021-04-01 17:51:07 +00:00
Jonathan Marek 3777ecdf11 turnip: implement VK_KHR_shader_float_controls
This matches the blob and doesn't require actually implementing controls
since the supported modes are just what the HW does.

Passes tests under:

dEQP-VK.spirv_assembly.*.float_controls.*

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
2021-04-01 17:51:07 +00:00
Danylo Piliaiev de195671bd ir3: nir_op_f2f16 should round to even
cat1 instructions round to zero by default.

When fp16 is enabled this will fix:
 dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage_frag
 dEQP-VK.spirv_assembly.instruction.graphics.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage_vert
 dEQP-VK.spirv_assembly.instruction.compute.float_controls.fp16.input_args.rounding_rte_conv_from_fp32_nostorage

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9840>
2021-04-01 17:51:07 +00:00
Michel Dänzer 6652c5018c ci: Merge ARM testing docker images to a single arm_test one
The merged image contains kernels & rootfs for both arm64 & armhf
baremetal test jobs, and is smaller than either arm{64,hf}_test image
before.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
2021-04-01 16:35:26 +00:00
Michel Dänzer 4b20bd7425 ci: Build ARM baremetal rootfs in native container
Doing so in an x86 container via qemu was slow, and started failing
recently after updating to a newer qemu version.

This also results in smaller arm*_test* docker images, since we need to
install fewer Debian packages in them.

As a bonus, this turns some piglit tests from fail to pass (Or maybe
they'll turn out to be flakes? They've passed at least 3 times in a
row).

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9955>
2021-04-01 16:35:26 +00:00
Eric Anholt 0be9a40225 ci/freedreno: Demote a630-asan to a manual test for now.
It's flaky in producing Missing results. I've got an uprev that should
avoid the issue (and possibly a followon actual fix), but it's blocked on
being able to rebuild the arm containers.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9932>
2021-03-31 17:15:27 +00:00
Danylo Piliaiev 00d6ccebf9 ir3/isa: account for randomly set by blob lowest bit of ibo atomics
As far as I could see - blob randomly sets the lowest bit of atomic.b.*
instructions.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9841>
2021-03-31 15:03:35 +00:00
Bas Nieuwenhuizen 83c92a48b7 vulkan: Fix descriptor set creation with zero bindings.
MAX2(count * struct size, 1) results in 1 for count=0, not the size of a struct.

Since this MAX only seems to exist so we can keep using NULL for error reporting,
just refactor to return a VkResult.

Fixes: ad241b15a9 ("vk: consolidate dynamic descriptor binding sorting")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4522
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9880>
2021-03-29 23:32:50 +00:00
Matt Turner 0b35987895 tu: Skip tu_tiling_config_update_tile_layout() if not using gmem
Otherwise pass->tile_align_w will be 0, leading to a divide by zero and
undefined behavior. In practice, I saw this lead to an infinite loop in
tests like

dEQP-VK.draw.instanced.draw_indexed_indirect_vk_primitive_topology_line_list_attrib_divisor_0_multiview

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9606>
2021-03-29 21:58:24 +00:00
Eric Anholt 99838513ae freedreno/a5xx: Add support for clip distances and use them for userclip.
A little low-stakes RE effort as I unwind from fighting CI all day.  Comes
from diffing dEQP-VK.clipping.user_defined.clip_distance.vert.* on the
blob and comparing to a6xx behavior.  (My blob doesn't do tess, so if
there are equivalent tess fields for some of these, I didn't find them)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9870>
2021-03-29 21:24:16 +00:00