This appeared in softpipe's image operations, since NIR always uses
4-component values for the coords, while the GLSL IR only has 2 components
for a 2D image (for example).
arb_shader_image_load_store-shader-mem-barrier (which times out in CI and
spends its time inside of tgsi_exec) was spending 4/51 of its instructions
on moving these undefs around.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9345>
I was suspicious that some entries in dri2_format_table (in
dri_helpers.c) had this field set incorrectly. It seemed like
DRM_FORMAT_ABGR16161616F and DRM_FORMAT_XBGR16161616F should have been 8
instead of 4. Upon digging I found that nothing uses the field. Fix
code by removing it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9354>
both bits will have been flagged at this point in order to indicate
that the aspects will be cleared "at some point" during the loop, but
when actually iterating through the pending clears, only the bits set
in the clear call should be applied
Fixes: 5c629e9ff2 ("zink: defer pipe_context::clear calls when not currently in a renderpass")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9366>
Remove the useless driCheckOption calls. They always
succeed.
As a result the intended behaviour for thread_submit
was not working (different default depending on the gpu
used). Add a comment to fix that in the future.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
At the release of the last object holding a reference
on the device, the device dtor was executed and
the objector dtor was ignored.
The proper way is to execute the object dtor, then
the device dtor.
The previous code was likely for a workaround against
something that was fixed since.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
*PrivateData functions were not protected by
a mutex for Volumes whereas they definitely
should.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
FETCH4 is a feature that needs to be implemented
to advertise D3DFMT_DF24.
It's basically a variant of Gather4.
This first implementation will need to be completed
to implement the feature fully, but the feature
doesn't seem to be much used (other equivalent
features are preferred by games).
Note until DF24 is advertised, apps are not supposed to use
FETCH4.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
FETCH4 is a d3d9 extension not much used, as newer
ones were prefered. However it's support is required
to advertise the DF24 format.
Prepares support by tracking compatible formats.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
Do not unmap anything until all buffer unlocks
were received.
A buffer can be filled in several threads, and thus
in the case of double locks, it's not possible to know
which unlock is received first.
Thus only unmap the buffers when the last unlock is
received.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
Previously we used to clamp "available_texture_limit",
which was incorrect. "available_texture_mem" should
have been clamped instead.
The resulting code was noop.
The idea behind that code was that 32 bits executable
would see maximum 4GB video memory.
However it seems according to users that 32 bits apps
should be able to allocate more than 4GB, thus the
clamping is inappropriate.
Instead clamp the return of GetAvailableTextureMem, to
correctly report a high value when there is more than
4GB available.
I do not know what should exactly be the clamp value,
for now have a 64MB margin below UINT_MAX.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
For D3DUSAGE_AUTOGENMIPMAP basically, everything behaves
for the application as if the texture had one level.
However the pipe_resource has more levels, and those
get generated automatically.
Previously we did allocate all the Surfaces as if
the texture had all the levels, except of just one.
The app could still just access the first level.
This patch completly removes the useless unaccessible
Surfaces.
In addition removes redundant handling of D3DUSAGE_AUTOGENMIPMAP.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
The code improvement is limited and it interferes with using literals
directly in LDS index ops, since here source modifiers are not
supported, but the current assembler code might inject the modifiers.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
The original shared load op can't be reordered, so it might be better to
also not allow this for the lowered variant.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
The backend code was actually assuming this, but the lowering still set
the components and write masks like it would be honoured.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
A Hitman 2 shader does: read64(local_invocation_index() * 4 - 4). This was
likely emitting a ds_read2_b32 on GFX6. For local_invocation_index()=0,
because the first dword was out-of-bounds, the second was likely also
considered out-of-bounds (even though it's not, at offset 0).
Likely fixes https://gitlab.freedesktop.org/mesa/mesa/-/issues/3882
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 57e6886f98 ("aco: refactor load_lds to use new helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
This reverts commit 1a0b0e8460.
The bounds checking behaviour of ds_read_b64, ds_read_b96 and ds_read_b128
make this feature very difficult to use safely.
This fixes a blocking artifact in Hitman 2. Previously, it contained:
ds_read_b64(local_invocation_index() * 4 - 4)
For local_invocation_index()=0, the second dword would be considered
out-of-bounds, even though it's at offset 0.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
Now that we can pipeline all varyings we should not be referring
specifically to smooth varyings anywhere.
Also, rename the instruction field 'ldvary_pipelining' to
'is_ldvary_sequence', which is more appropriate, since we always
set this for any instruction involved with varying setups,
independently of whether they end up being pipelined or not.
This also does some other minor edits which intend to slightly
simplify the code and make it a bit more compact.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9363>
An earlier commit tried to make this shader compatible with GLSL 3.30,
but it requires, GL_ARB_gpu_shader_int64, which requires GLSL 4.00 and
GL 4.0 according to the extension spec. So we were failing to enable
the required extension, breaking compilation of this shader.
The original intention of that patch was to get this working on zink,
which at the time only supported GL 3.3. But now it supports later
OpenGL versions, so we don't need to do this any longer. Rather than
revert the patch and raise the version all the way back to 430, just
bump it to the require 400 at Ian Romanick's suggestion.
Fixes: 4d47b22bf0 ("glsl/float64: make this compatible with glsl 330")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3991
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9351>
These end up having a NOP between the ldvary and the next instruction
in the sequence (a MOV for flat and an FADD for noperspetive):
nop ; nop ; ldvary.r0
nop ; nop
fadd rf6, r0, r5 ; nop ; ldvary.r1
nop ; nop
fadd rf5, r1, r5 ; nop ; ldvary.r2
nop ; nop
fadd rf4, r2, r5 ; nop ; ldvary.r3
To pipeline these, we can reuse the same infrastructure we have in
place for smooth varyings but we need to avoid breaking the sequence
due to the NOP instruction. We do that by testing if dropping the
sequence when we failed to pick up the next instruction also fails
to choose an instruction.
This is not perfect, because we may be able to choose an instruction
outside the sequence such as an ldunif, and use that to break a
sequence that we could otherwise continue after scheduling the NOP
instruction, but it is still better than nothing.
total instructions in shared programs: 13820690 -> 13819774 (<.01%)
instructions in affected programs: 64026 -> 63110 (-1.43%)
helped: 479
HURT: 62
Instructions are helped.
total max-temps in shared programs: 2326435 -> 2326423 (<.01%)
max-temps in affected programs: 102 -> 90 (-11.76%)
helped: 7
HURT: 0
Max-temps are helped.
total sfu-stalls in shared programs: 30683 -> 30710 (0.09%)
sfu-stalls in affected programs: 13 -> 40 (207.69%)
helped: 2
HURT: 24
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13851373 -> 13850484 (<.01%)
inst-and-stalls in affected programs: 62818 -> 61929 (-1.42%)
helped: 466
HURT: 65
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
Typically, we would schedule smooth varyings like this:
nop ; nop ; ldvary.r4
nop ; fmul r0, r4, rf0
fadd rf13, r0, r5 ; nop ; ldvary.r1
nop ; fmul r2, r1, rf0
fadd rf12, r2, r5 ; nop ; ldvary.r3
nop ; fmul r4, r3, rf0
fadd rf11, r4, r5 ; nop ; ldvary.r0
where we pair up an ldvary with the fadd of the previous sequence
instead of the previous fmul. This is because ldvary has an implicit
write to r5 which is read by the fadd of the previous sequence, so
our dependency tracking doesn't allow us to move the ldvary before the
fadd, however, the r5 write of the ldvary instruction happens in the
instruction after it is emitted so we can actually move it to the fmul
and the r5 write would still happen in the same instruction as the fadd,
which is fine.
This patch allows us to pipeline these sequences optimally. For that,
after merging an ldvary into a previous instruction in the middle of
a pipelineable ldvary sequence, we check if we can manually move it
to the last scheduled instruction instead (the one before the
instruction we are currently scheduling).
If we are successful at moving the ldvary to the previous instruction,
then we flag the ldvary as scheduled immediately, which may promote
its children (the follow-up fmul instruction for that ldvary) to DAG
heads and continue the merge loop so that fmul can be picked and
merged into the final fadd of the previous sequence (where we had
originally merged the ldvary). This leads to a result that looks like
this:
nop ; nop ; ldvary.r4
nop ; fmul r0, r4, rf0 ; ldvary.r1
fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3
fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0
Shader-db results:
total instructions in shared programs: 14071591 -> 13820690 (-1.78%)
instructions in affected programs: 7809692 -> 7558791 (-3.21%)
helped: 41209
HURT: 4528
Instructions are helped.
total max-temps in shared programs: 2335784 -> 2326435 (-0.40%)
max-temps in affected programs: 84302 -> 74953 (-11.09%)
helped: 4561
HURT: 293
Max-temps are helped.
total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%)
sfu-stalls in affected programs: 3551 -> 2697 (-24.05%)
helped: 1713
HURT: 750
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%)
inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%)
helped: 41411
HURT: 4535
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
If we have two (or more) smooth varyings like this:
nop t3; ldvary.rf0
fmul t5, t3, t0
fadd t6, t5, r5
nop t7; ldvary.rf0
fmul t9, t7, t0
fadd t10, t9, r5
nop t11; ldvary.rf0
fmul t13, t11, t0
fadd t14, t13, r5
We may be able to pipeline them like this:
nop ; nop ; ldvary.r4
nop ; fmul r0, r4, rf0 ; ldvary.r1
fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3
fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0
But in order to do this, we will need to manually tweak the
QPU scheduling.
This patch tracks information about ldvary sequences that are
good candidates for pipelining, and a follow-up patch will
use this information to pipeline them when we emit the QPU
code.
v2 (apinheiro):
- Rename the v3d_compile fields to avoid confusion with the qinst fields.
- Assert that a sequence's start instruction is not the same as the end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>