Commit Graph

136212 Commits

Author SHA1 Message Date
Mike Blumenkrantz bc5dcf1527 zink: ci updates
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9291>
2021-03-03 01:37:02 +00:00
Mike Blumenkrantz 587d15ca6c zink: use staging resource for write transfer_map in order to not stall
we can just give the user a staging resource and then flush the data back
later

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9291>
2021-03-03 01:37:02 +00:00
Marek Olšák db67d9c0d1 radeonsi: don't crash on NULL images in si_check_needs_implicit_sync
This fixes CTS test: KHR-GL46.arrays_of_arrays_gl.AtomicUsage

Fixes: bddc0e023c "radeonsi: fix read from compute / write from draw sync"

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9361>
2021-03-03 01:19:24 +00:00
Marek Olšák f9e6c7a220 ac/llvm: fix ac_build_atomic_rmw with LLVM 13
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4383

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9361>
2021-03-03 01:19:24 +00:00
Eric Anholt 8bd0cc1a5a nir/vec_to_movs: Don't generate MOVs for undef channels.
This appeared in softpipe's image operations, since NIR always uses
4-component values for the coords, while the GLSL IR only has 2 components
for a 2D image (for example).
arb_shader_image_load_store-shader-mem-barrier (which times out in CI and
spends its time inside of tgsi_exec) was spending 4/51 of its instructions
on moving these undefs around.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9345>
2021-03-03 00:51:44 +00:00
Eric Anholt 1e5ef4c60c nir: Add a nir_src_is_undef() helper, like nir_src_is_const().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9345>
2021-03-03 00:51:44 +00:00
Mike Blumenkrantz c77df59c9e zink: export PIPE_CAP_TGSI_VS_LAYER_VIEWPORT
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9283>
2021-03-02 17:42:00 -05:00
Mike Blumenkrantz ffd046cf32 zink: enable PIPE_CAP_CLEAR_SCISSORED
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9283>
2021-03-02 17:42:00 -05:00
Dave Airlie abc724e440 lavapipe: sort bindings before creating descriptor set
This ensures the dynamic offsets are correct

Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9359>
2021-03-03 08:06:02 +10:00
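A minimal sketch of why the ordering in abc724e440 matters, assuming the Vulkan rule that dynamic offsets are consumed in order of binding number. The `Binding` type and `assign_dynamic_indices` helper are hypothetical and not taken from lavapipe; descriptor arrays are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    binding: int       # binding number from the set layout
    is_dynamic: bool   # True for dynamic uniform/storage buffers


def assign_dynamic_indices(bindings):
    """Map binding number -> slot in the dynamic-offset array.

    Bindings may arrive in any order, so sort by binding number first;
    otherwise the slots would follow creation order instead.
    """
    indices = {}
    next_index = 0
    for b in sorted(bindings, key=lambda b: b.binding):
        if b.is_dynamic:
            indices[b.binding] = next_index
            next_index += 1
    return indices
```

With bindings declared as 2 (dynamic), 0 (dynamic), 1 (static), binding 0 takes slot 0 and binding 2 takes slot 1, regardless of declaration order.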
Dave Airlie 0a939e788f lavapipe: reorder descriptor set stages to get correct binding
The fragment stage was in the wrong place here.

Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9359>
2021-03-03 08:02:16 +10:00
Ian Romanick 7ca3e90c18 gallium/dri: Remove dri2_format_mapping::cpp
I was suspicious that some entries in dri2_format_table (in
dri_helpers.c) had this field set incorrectly.  It seemed like
DRM_FORMAT_ABGR16161616F and DRM_FORMAT_XBGR16161616F should have been 8
instead of 4.  Upon digging I found that nothing uses the field.  Fix
code by removing it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9354>
2021-03-02 19:42:04 +00:00
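The suspicion in 7ca3e90c18 comes down to simple arithmetic: four 16-bit channels occupy 8 bytes per pixel, not 4. A small sketch of that calculation, where `bytes_per_pixel` is a hypothetical helper and not part of Mesa:

```python
def bytes_per_pixel(channel_bits):
    """Return bytes per pixel for a packed format given per-channel bit widths."""
    total_bits = sum(channel_bits)
    assert total_bits % 8 == 0, "format is not byte-aligned"
    return total_bits // 8


# DRM_FORMAT_ABGR16161616F / DRM_FORMAT_XBGR16161616F: four 16-bit channels.
print(bytes_per_pixel([16, 16, 16, 16]))  # 8
# A format with four 8-bit channels, for comparison.
print(bytes_per_pixel([8, 8, 8, 8]))      # 4
```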
Karol Herbst f0dccd9578 clover: Add missing include for llvm-12 build fix
Fixes: d1eab2b1eb ("clover: Fix build with llvm-12.")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9372>
2021-03-02 19:35:40 +00:00
Mike Blumenkrantz 1294aec650 zink: apply only the pending zs clear bits during deferred clears
both bits will have been flagged at this point in order to indicate
that the aspects will be cleared "at some point" during the loop, but
when actually iterating through the pending clears, only the bits set
in the clear call should be applied

Fixes: 5c629e9ff2 ("zink: defer pipe_context::clear calls when not currently in a renderpass")

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9366>
2021-03-02 19:24:52 +00:00
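The distinction drawn in 1294aec650 can be sketched as follows. This is an illustration only: the bit values and the `apply_pending_clears` helper are assumptions, not zink code.

```python
CLEAR_DEPTH = 1 << 0    # illustrative bit values
CLEAR_STENCIL = 1 << 1
CLEAR_ZS = CLEAR_DEPTH | CLEAR_STENCIL


def apply_pending_clears(pending):
    """Return (flagged, applied) for a list of deferred clear calls.

    `pending` holds the zs bits requested by each deferred clear call.
    `flagged` is the union of all bits, i.e. what will be cleared at some
    point during the loop. `applied` lists the bits actually applied per
    iteration: only those of the individual call, never the union.
    """
    flagged = 0
    for bits in pending:
        flagged |= bits

    applied = []
    for bits in pending:
        applied.append(bits & CLEAR_ZS)  # not `flagged & CLEAR_ZS`
    return flagged, applied
```

For a depth-only clear followed by a stencil-only clear, both bits end up flagged, but each iteration applies just one aspect.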
Axel Davy e891f039da st/nine: Simplify checks for driconf options
Remove the useless driCheckOption calls. They always
succeed.

As a result the intended behaviour for thread_submit
was not working (different default depending on the gpu
used). Add a comment to fix that in the future.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:08 +01:00
Axel Davy 642e19dc44 driconf: Rename csmt_int back to csmt_force
Fixes regression introduced by
<https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6916>

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy 7a1a1fc5d9 st/nine: Fix leak at device destruction
At the release of the last object holding a reference
on the device, the device dtor was executed and
the object dtor was ignored.

The proper way is to execute the object dtor, then
the device dtor.

The previous code was likely for a workaround against
something that was fixed since.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy d730f8d7a9 st/nine: Protect *PrivateData also for Volumes
*PrivateData functions were not protected by
a mutex for Volumes, whereas they definitely
should be.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy b383b1e01a st/nine: Refactor ht_guid_delete
Have ht_guid_delete take a hash_entry.

As a result, we can use _mesa_hash_table_remove instead of
_mesa_hash_table_remove_key.
The previous code using the latter was incorrect as the key
of the entry was read after it was freed.

Fixes: https://github.com/iXit/wine-nine-standalone/issues/40

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy 501ad0e134 st/nine: Add new debug and error checks
Add new debug messages and error checks

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy 1a53099909 st/nine: Enable DF24 support
We can enable it, now that FETCH4 is
implemented.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy 1357d2a60a st/nine: Implement experimental FETCH4
FETCH4 is a feature that needs to be implemented
to advertise D3DFMT_DF24.
It's basically a variant of Gather4.

This first implementation will need to be completed
to implement the feature fully, but the feature
doesn't seem to be much used (other equivalent
features are preferred by games).

Note until DF24 is advertised, apps are not supposed to use
FETCH4.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy d097bdcc78 st/nine: Track formats compatible with FETCH4
FETCH4 is a d3d9 extension not much used, as newer
ones were preferred. However, its support is required
to advertise the DF24 format.

Prepares support by tracking compatible formats.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy 6a3451e170 st/nine: Unmap buffers after full unlock
Do not unmap anything until all buffer unlocks
were received.

A buffer can be filled in several threads, and thus
in the case of double locks, it's not possible to know
which unlock is received first.
Thus only unmap the buffers when the last unlock is
received.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
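The rule in 6a3451e170 (unmap only once the last unlock arrives) amounts to a lock counter. A minimal sketch, with a hypothetical `Buffer` class that does not mirror the actual st/nine structures and omits the locking needed for real multi-threaded use:

```python
class Buffer:
    """Tracks nested locks and keeps the mapping until the last unlock."""

    def __init__(self):
        self.lock_count = 0
        self.mapped = False

    def lock(self):
        self.lock_count += 1
        self.mapped = True  # map on first lock, stay mapped on later ones

    def unlock(self):
        assert self.lock_count > 0, "unlock without matching lock"
        self.lock_count -= 1
        if self.lock_count == 0:
            self.mapped = False  # unmap only after the final unlock
```

With a double lock, the first unlock leaves the buffer mapped. Only the second unlock releases the mapping, so it does not matter which thread's unlock arrives first.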
Axel Davy 3dd6b79215 st/nine: Clamp GetAvailableTextureMem
Previously we used to clamp "available_texture_limit",
which was incorrect. "available_texture_mem" should
have been clamped instead.
The resulting code was a no-op.

The idea behind that code was that a 32-bit executable
would see at most 4GB of video memory.

However, according to users, 32-bit apps
should be able to allocate more than 4GB, thus the
clamping is inappropriate.

Instead clamp the return of GetAvailableTextureMem, to
correctly report a high value when there is more than
4GB available.

I do not know what should exactly be the clamp value,
for now have a 64MB margin below UINT_MAX.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
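The clamp described in 3dd6b79215 can be sketched as below. The function name is borrowed from the commit message for readability; the body is an illustration under the stated 64MB margin, not the st/nine implementation.

```python
UINT_MAX = 0xFFFFFFFF
MARGIN = 64 * 1024 * 1024  # 64MB margin below UINT_MAX, as in the commit message


def get_available_texture_mem(available_bytes):
    """Clamp the reported value so it fits in a 32-bit unsigned return."""
    return min(available_bytes, UINT_MAX - MARGIN)
```

A device with 8GB available reports the clamped maximum, while a device with 1GB reports its real value unchanged.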
Axel Davy f85f025a05 st/nine: Do not allow depth buffer render targets
Without the proposed check, some apps will decide to use depth buffers
as render targets.

Bug found investigating:
https://github.com/iXit/wine-nine-standalone/issues/82

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Axel Davy 3dbc542f97 st/nine: Reduce system memory allocated by D3DUSAGE_AUTOGENMIPMAP
For D3DUSAGE_AUTOGENMIPMAP basically, everything behaves
for the application as if the texture had one level.
However the pipe_resource has more levels, and those
get generated automatically.

Previously we allocated all the Surfaces as if
the texture had all the levels, instead of just one.
The app could still just access the first level.

This patch completely removes the useless inaccessible
Surfaces.
In addition removes redundant handling of D3DUSAGE_AUTOGENMIPMAP.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
2021-03-02 20:07:07 +01:00
Gert Wollny ec74a13618 r600/sfn: Update status
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 43816d20dd r600: Enable GLSL 450 for nir shaders.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 4d91812d3c r600: Don't optimize using source modifiers on literals
The code improvement is limited and it interferes with using literals
directly in LDS index ops, since here source modifiers are not
supported, but the current assembler code might inject the modifiers.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 49b0e8657e r600/sfn: Fix loading TES gl_PatchVerticesIn
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny bd57bf6d82 r600/sfn: handle querying the number of layers in cube arrays
This has to be loaded from a constant buffer instead of the actual
texture.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 935d9e6863 nir: disallow reordering for r600 shared load and remove component field
The original shared load op can't be reordered, so it might be better to
also not allow this for the lowered variant.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny d1ccf4a0ee r600/sfn: encode component in address for local IO
The backend code was actually assuming this, but the lowering still set
the components and write masks like it would be honoured.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny c0c025c870 r600/sfn: remove some old debug output
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny b07992c4dc r600/sfn: remove unused emit_alu_op2_split_src_mods
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny ddc5c99402 r600/sfn: remove code for nir_op_fsign since it is lowered
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 4fe0339941 r600: unify nir shader options evaluation
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 04d8d455b7 r600/sfn: Allow any channel for the helper invocation evaluation
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 911c6af2fd r600/sfn: lower isign and iabs in nir
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny 7d94d759fa r600/sfn: set info about using helper_invocation to skip sb
sb can't handle helper invocations, so skip sb when it is used.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Gert Wollny c427ed7ffe r600/sfn: Lower FS inputs to temps late and, and lower interpolate at
This fixes FS shaders where a var is loaded with two different
interpolators.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
2021-03-02 18:46:17 +01:00
Jose Fonseca 3ba7784b1e util: Always use timespec_get on Windows.
include/c11/threads_win32.h provides a fallback implementation of
timespec_get when necessary.

Fixes https://gitlab.freedesktop.org/mesa/mesa/-/issues/4109

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9280>
2021-03-02 14:37:46 +00:00
Rhys Perry 3a72044ece aco: add missing usable_read2 check
A Hitman 2 shader does: read64(local_invocation_index() * 4 - 4). This was
likely emitting a ds_read2_b32 on GFX6. For local_invocation_index()=0,
because the first dword was out-of-bounds, the second was likely also
considered out-of-bounds (even though it's not, at offset 0).

Likely fixes https://gitlab.freedesktop.org/mesa/mesa/-/issues/3882

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 57e6886f98 ("aco: refactor load_lds to use new helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
2021-03-02 13:13:59 +00:00
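The address arithmetic behind 3a72044ece can be worked through in a few lines. This sketch assumes 32-bit wraparound addressing and a hypothetical `lds_size` parameter; it models only the offsets, not the hardware's actual bounds-checking logic, which the commit message itself describes as "likely".

```python
MASK32 = 0xFFFFFFFF


def dword_addresses(local_invocation_index):
    """Byte addresses of the two dwords read by read64(index * 4 - 4)."""
    base = (local_invocation_index * 4 - 4) & MASK32
    return base, (base + 4) & MASK32


def in_bounds(addr, lds_size):
    """True if a 4-byte read at `addr` fits inside `lds_size` bytes."""
    return addr + 4 <= lds_size


first, second = dword_addresses(0)
print(hex(first), hex(second))  # 0xfffffffc 0x0
```

For index 0 the first dword wraps to 0xFFFFFFFC and is out of bounds, while the second sits at offset 0 and is valid. If a combined read rejects both dwords because of the first, the valid one is lost.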
Rhys Perry 941739619e Revert "radv,aco: allow unaligned LDS access on GFX9+"
This reverts commit 1a0b0e8460.

The bounds checking behaviour of ds_read_b64, ds_read_b96 and ds_read_b128
makes this feature very difficult to use safely.

This fixes a blocking artifact in Hitman 2. Previously, it contained:
ds_read_b64(local_invocation_index() * 4 - 4)
For local_invocation_index()=0, the second dword would be considered
out-of-bounds, even though it's at offset 0.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
2021-03-02 13:13:59 +00:00
Iago Toral Quiroga acbd4881c2 broadcom/compiler: ldvary pipelining tracking and documentation clean-ups
Now that we can pipeline all varyings we should not be referring
specifically to smooth varyings anywhere.

Also, rename the instruction field 'ldvary_pipelining' to
'is_ldvary_sequence', which is more appropriate, since we always
set this for any instruction involved with varying setups,
independently of whether they end up being pipelined or not.

This also does some other minor edits which intend to slightly
simplify the code and make it a bit more compact.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9363>
2021-03-02 13:54:14 +01:00
Kenneth Graunke a48151ffad glsl/float64: Bump #version to 400
An earlier commit tried to make this shader compatible with GLSL 3.30,
but it requires GL_ARB_gpu_shader_int64, which requires GLSL 4.00 and
GL 4.0 according to the extension spec.  So we were failing to enable
the required extension, breaking compilation of this shader.

The original intention of that patch was to get this working on zink,
which at the time only supported GL 3.3.  But now it supports later
OpenGL versions, so we don't need to do this any longer.  Rather than
revert the patch and raise the version all the way back to 430, just
bump it to the required 400 at Ian Romanick's suggestion.

Fixes: 4d47b22bf0 ("glsl/float64: make this compatible with glsl 330")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3991
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9351>
2021-03-02 09:30:24 +00:00
Karol Herbst d1eab2b1eb clover: Fix build with llvm-12.
Fix build error after LLVM commit c495dfe0268b ("[clang][cli] NFC:
Decrease the scope of ParseLangArgs parameters").

../src/gallium/frontends/clover/llvm/invocation.cpp: In function ‘std::unique_ptr<clang::CompilerInstance> {anonymous}::create_compiler_instance(const clover::device&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, std::string&)’:
../src/gallium/frontends/clover/llvm/invocation.cpp:252:55: error: cannot convert ‘clang::PreprocessorOptions’ to ‘std::vector<std::__cxx11::basic_string<char> >&’
  252 |                                 c->getPreprocessorOpts(),
      |                                 ~~~~~~~~~~~~~~~~~~~~~~^~
      |                                                       |
      |                                                       clang::PreprocessorOptions

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4114
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8543>
2021-03-02 09:16:53 +00:00
Iago Toral Quiroga 05f8efbc2c broadcom/compiler: allow pipelining of flat and noperspective varyings
These end up having a NOP between the ldvary and the next instruction
in the sequence (a MOV for flat and an FADD for noperspective):

nop                  ; nop               ; ldvary.r0
nop                  ; nop
fadd  rf6, r0, r5    ; nop               ; ldvary.r1
nop                  ; nop
fadd  rf5, r1, r5    ; nop               ; ldvary.r2
nop                  ; nop
fadd  rf4, r2, r5    ; nop               ; ldvary.r3

To pipeline these, we can reuse the same infrastructure we have in
place for smooth varyings but we need to avoid breaking the sequence
due to the NOP instruction. We do that by testing if dropping the
sequence when we failed to pick up the next instruction also fails
to choose an instruction.

This is not perfect, because we may be able to choose an instruction
outside the sequence such as an ldunif, and use that to break a
sequence that we could otherwise continue after scheduling the NOP
instruction, but it is still better than nothing.

total instructions in shared programs: 13820690 -> 13819774 (<.01%)
instructions in affected programs: 64026 -> 63110 (-1.43%)
helped: 479
HURT: 62
Instructions are helped.

total max-temps in shared programs: 2326435 -> 2326423 (<.01%)
max-temps in affected programs: 102 -> 90 (-11.76%)
helped: 7
HURT: 0
Max-temps are helped.

total sfu-stalls in shared programs: 30683 -> 30710 (0.09%)
sfu-stalls in affected programs: 13 -> 40 (207.69%)
helped: 2
HURT: 24
Sfu-stalls are HURT.

total inst-and-stalls in shared programs: 13851373 -> 13850484 (<.01%)
inst-and-stalls in affected programs: 62818 -> 61929 (-1.42%)
helped: 466
HURT: 65
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
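The percentages in the shader-db results of 05f8efbc2c are relative changes from the "before" value. A short sketch for re-deriving them, where `percent_change` is a hypothetical helper and not part of shader-db:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures for "affected programs" taken from the commit message above.
for name, before, after in [
    ("instructions", 64026, 63110),
    ("max-temps", 102, 90),
    ("sfu-stalls", 13, 40),
]:
    print(f"{name}: {percent_change(before, after):+.2f}%")
```

This reproduces the -1.43%, -11.76% and 207.69% figures quoted in the message.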
Iago Toral Quiroga 1784dd22a3 broadcom/compiler: pipeline smooth ldvary sequences
Typically, we would schedule smooth varyings like this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0
fadd  rf13, r0, r5   ; nop               ; ldvary.r1
nop                  ; fmul  r2, r1, rf0
fadd  rf12, r2, r5   ; nop               ; ldvary.r3
nop                  ; fmul  r4, r3, rf0
fadd  rf11, r4, r5   ; nop               ; ldvary.r0

where we pair up an ldvary with the fadd of the previous sequence
instead of the previous fmul. This is because ldvary has an implicit
write to r5 which is read by the fadd of the previous sequence, so
our dependency tracking doesn't allow us to move the ldvary before the
fadd, however, the r5 write of the ldvary instruction happens in the
instruction after it is emitted so we can actually move it to the fmul
and the r5 write would still happen in the same instruction as the fadd,
which is fine.

This patch allows us to pipeline these sequences optimally. For that,
after merging an ldvary into a previous instruction in the middle of
a pipelineable ldvary sequence, we check if we can manually move it
to the last scheduled instruction instead (the one before the
instruction we are currently scheduling).

If we are successful at moving the ldvary to the previous instruction,
then we flag the ldvary as scheduled immediately, which may promote
its children (the follow-up fmul instruction for that ldvary) to DAG
heads and continue the merge loop so that fmul can be picked and
merged into the final fadd of the previous sequence (where we had
originally merged the ldvary). This leads to a result that looks like
this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0 ; ldvary.r1
fadd  rf13, r0, r5   ; fmul  r2, r1, rf0 ; ldvary.r3
fadd  rf12, r2, r5   ; fmul  r4, r3, rf0 ; ldvary.r0

Shader-db results:

total instructions in shared programs: 14071591 -> 13820690 (-1.78%)
instructions in affected programs: 7809692 -> 7558791 (-3.21%)
helped: 41209
HURT: 4528
Instructions are helped.

total max-temps in shared programs: 2335784 -> 2326435 (-0.40%)
max-temps in affected programs: 84302 -> 74953 (-11.09%)
helped: 4561
HURT: 293
Max-temps are helped.

total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%)
sfu-stalls in affected programs: 3551 -> 2697 (-24.05%)
helped: 1713
HURT: 750
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%)
inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%)
helped: 41411
HURT: 4535
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00
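The two schedules shown in 1784dd22a3 can be modelled in a few lines. This is an idealized sketch that considers only the ldvary/fmul/fadd chain and ignores registers and any other instructions; the function names and the `v0`, `v1`, ... labels are hypothetical.

```python
def unpipelined_length(n):
    """One leading ldvary, then an fmul and an fadd per varying."""
    return 1 + 2 * n


def pipelined_length(n):
    """One instruction per varying in steady state, plus two to drain."""
    return n + 2


def pipelined_schedule(n):
    """Rows of (add slot, mul slot, signal) for n smooth varyings.

    Row i carries the ldvary for varying i, the fmul for varying i-1
    and the fadd for varying i-2, matching the pipelined listing above.
    """
    rows = []
    for i in range(n + 2):
        add = f"fadd v{i - 2}" if 0 <= i - 2 < n else "nop"
        mul = f"fmul v{i - 1}" if 0 <= i - 1 < n else "nop"
        sig = f"ldvary v{i}" if i < n else ""
        rows.append((add, mul, sig))
    return rows


for row in pipelined_schedule(4):
    print("; ".join(part for part in row if part))
```

Under this model, three varyings take 7 instructions unpipelined and 5 pipelined. The saving approaches half the sequence length as the number of varyings grows.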
Iago Toral Quiroga 1d021539a2 broadcom/compiler: track pipelineable ldvary sequences
If we have two (or more) smooth varyings like this:

nop t3; ldvary.rf0
fmul t5, t3, t0
fadd t6, t5, r5
nop t7; ldvary.rf0
fmul t9, t7, t0
fadd t10, t9, r5
nop t11; ldvary.rf0
fmul t13, t11, t0
fadd t14, t13, r5

We may be able to pipeline them like this:

nop                  ; nop               ; ldvary.r4
nop                  ; fmul  r0, r4, rf0 ; ldvary.r1
fadd  rf13, r0, r5   ; fmul  r2, r1, rf0 ; ldvary.r3
fadd  rf12, r2, r5   ; fmul  r4, r3, rf0 ; ldvary.r0

But in order to do this, we will need to manually tweak the
QPU scheduling.

This patch tracks information about ldvary sequences that are
good candidates for pipelining, and a follow-up patch will
use this information to pipeline them when we emit the QPU
code.

v2 (apinheiro):
  - Rename the v3d_compile fields to avoid confusion with the qinst fields.
  - Assert that a sequence's start instruction is not the same as the end.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
2021-03-02 07:56:00 +01:00