anv_pipeline_get_last_vue_prog_data (used by emit_3dstate_primitive_replication)
doesn't work for the mesh stage.
Fixes: ae57628dd5 ("anv: Drop anv_pipeline::use_primitive_replication")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18495>
this will enable direct calling of the right function without the overhead
of having conditionals in the barrier functions themselves
eventually, the '2' variants will be widely enough deployed that
this can be deleted
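A minimal sketch of the idea, assuming a per-context function pointer that is
selected once at init time. All names here (struct ctx, has_barrier2,
ctx_init_barriers) are hypothetical and only illustrate the dispatch pattern,
not the driver's actual code:

```c
#include <stdbool.h>

/* Hypothetical context type; the field names are illustrative only. */
struct ctx;
typedef int (*barrier_fn)(struct ctx *ctx);

struct ctx {
    bool has_barrier2;        /* assumed capability flag */
    barrier_fn image_barrier; /* chosen once, called directly afterwards */
};

/* Two stand-in implementations; each returns a tag so callers can tell
 * which one ran. */
static int image_barrier_legacy(struct ctx *ctx) { (void)ctx; return 1; }
static int image_barrier_2(struct ctx *ctx)      { (void)ctx; return 2; }

/* Pick the implementation up front so the barrier functions themselves
 * contain no per-call conditionals. */
static void ctx_init_barriers(struct ctx *ctx)
{
    ctx->image_barrier = ctx->has_barrier2 ? image_barrier_2
                                           : image_barrier_legacy;
}
```

Callers then invoke ctx->image_barrier(ctx) without checking the capability
again; once only one variant is needed, the pointer and the init helper can
be removed.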
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18628>
in some apps (hl2), there's a weird sequence like:
* bind attachment with srgb view
* clear
* bind attachment with base format
* draw
rewriting the clear color like this avoids unnecessarily triggering
a renderpass
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18627>
void clears are intended to be the first clear applied to a surface,
so ensure that these don't clobber any scissored clears
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18627>
If memory allocation fails, we look for a suitably sized BO in the BO cache and
wait until we can use its memory. That usually works, but there's a case when it
can fail despite sufficient memory in the system: BOs in the BO cache
contributing to memory pressure but none of them being of sufficient size. This
case is not just theoretical: it's seen in the OpenCL
test_non_uniform_work_group, which puts the system under considerable memory
pressure with an unusual allocation pattern.
To handle this case, try evicting *everything* from the BO cache and stalling
in order to allocate, if the above attempts failed. Fixes the following error:
DRM_IOCTL_PANFROST_CREATE_BO failed: No space left on device
on the aforementioned OpenCL test.
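The fallback order can be sketched with a toy model. This is an illustration
under stated assumptions, not the driver code: struct toy_device, its byte
budget, and all helper names are hypothetical, and "waiting" on a cached BO
is reduced to simply taking it.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 4

/* Toy device: a byte budget plus a cache of idle buffers. */
struct toy_device {
    size_t free_bytes;          /* memory the "kernel" can still hand out */
    size_t cached[CACHE_SLOTS]; /* sizes of cached BOs, 0 = empty slot */
};

/* Stand-in for the kernel allocation; fails when the budget is exhausted. */
static bool kernel_alloc(struct toy_device *dev, size_t size)
{
    if (size > dev->free_bytes)
        return false;
    dev->free_bytes -= size;
    return true;
}

/* Reuse a cached BO that is large enough, if there is one. */
static bool cache_fetch(struct toy_device *dev, size_t size)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (dev->cached[i] >= size) {
            dev->cached[i] = 0;
            return true;
        }
    }
    return false;
}

/* Return every cached BO to the budget, regardless of its size. */
static void cache_evict_all(struct toy_device *dev)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        dev->free_bytes += dev->cached[i];
        dev->cached[i] = 0;
    }
}

static bool toy_bo_create(struct toy_device *dev, size_t size)
{
    if (kernel_alloc(dev, size))
        return true;
    if (cache_fetch(dev, size))
        return true;
    /* Last resort: many small cached BOs may add up to enough memory. */
    cache_evict_all(dev);
    return kernel_alloc(dev, size);
}
```

With four 16-byte BOs cached and no free memory, a 48-byte request fails the
first two steps but succeeds after the full eviction.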
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18579>
Example from dEQP-GLES2.functional.shaders.indexing.tmp_array.float_dynamic_write_dynamic_loop_read_fragment
Fragment Program: after 'pair translate'
0: src0.xyz = input[0], src1.xyz = const[5]
MAD temp[0].xyz, src0.xxx, src1.Hyz, src0.000
1: src0.xyz = const[1], src1.xyz = const[6]
MAD temp[1].xyz, src0.xxx, src0.111, -src1.x1z
2: src0.xyz = temp[1]
CMP temp[1].xyz, src0.000, src0.111, src0.xyz
3: src0.xyz = temp[0], src1.xyz = input[0], src2.xyz = temp[1]
CMP temp[2].x, src0.x__, src1.x__, -src2.y__
4: src0.xyz = input[0], src1.xyz = temp[0], src2.xyz = temp[1]
CMP temp[3].x, src0.x__, src1.x__, -src2.z__
5: src0.xyz = temp[1]
MAX temp[4].x, src0.x__, src0.z__
6: src0.xyz = temp[0], src1.xyz = input[0], src2.xyz = temp[4]
CMP temp[4].x, src0.x__, src1.x__, -src2.x__
7: src0.xyz = temp[3], src1.xyz = input[0], src2.xyz = temp[1]
CMP temp[3].x, src0.x__, src1.x__, -src2.x__
8: src0.xyz = input[0], src1.xyz = temp[2], src2.xyz = temp[1]
CMP temp[2].x, src0.x__, src1.x__, -src2.x__
9: src0.xyz = temp[1]
MAD temp[1].x, src0.x__, src0.y__, src0.000
10: src0.xyz = input[0], src1.xyz = temp[0], src2.xyz = temp[1]
CMP temp[1].x, src0.x__, src1.x__, -src2.x__
11: src0.xyz = const[2], src1.xyz = const[6]
MAD temp[5].xyz, src0.xxx, src0.111, -src1.x1z
12: src0.xyz = temp[5]
CMP temp[5].xyz, src0.000, src0.111, src0.xyz
13: src0.xyz = temp[0], src1.xyz = temp[2], src2.xyz = temp[5]
CMP temp[6].x, src0.y__, src1.x__, -src2.y__
14: src0.xyz = temp[3], src1.xyz = temp[0], src2.xyz = temp[5]
CMP temp[7].x, src0.x__, src1.y__, -src2.z__
15: src0.xyz = temp[5]
MAX temp[8].x, src0.x__, src0.z__
16: src0.xyz = temp[0], src1.xyz = temp[4], src2.xyz = temp[8]
CMP temp[4].x, src0.y__, src1.x__, -src2.x__
17: src0.xyz = temp[7], src1.xyz = temp[3], src2.xyz = temp[5]
CMP temp[3].x, src0.x__, src1.x__, -src2.x__
18: src0.xyz = temp[2], src1.xyz = temp[6], src2.xyz = temp[5]
CMP temp[2].x, src0.x__, src1.x__, -src2.x__
....
This will be pair scheduled to:
Fragment Program: after 'pair scheduling'
0: src0.xyz = input[0], src1.xyz = const[5] // original inst 0
MAD temp[0].xyz, src0.xxx, src1.Hyz, src0.000
1: src0.xyz = const[1], src1.xyz = const[6] // original inst 1
MAD temp[1].xyz, src0.xxx, src0.111, -src1.x1z
2: src0.xyz = const[2], src1.xyz = const[6] // original inst 11
MAD temp[5].xyz, src0.xxx, src0.111, -src1.x1z
3: src0.xyz = temp[1] // original inst 2
CMP temp[1].xyz, src0.000, src0.111, src0.xyz
4: src0.xyz = temp[1], src1.xyz = temp[0], src2.xyz = input[0]
MAX temp[4].x, src0.x__, src0.z__ // original inst 5
CMP temp[2].w, src1.x, src2.x, -src0.y // original inst 3
5: src0.xyz = input[0], src1.xyz = temp[0], src2.xyz = temp[1]
CMP temp[3].w, src0.x, src1.x, -src2.z // original inst 4
6: src0.xyz = temp[5], src0.w = temp[2], src1.xyz = input[0], src2.xyz = temp[1]
CMP temp[5].xyz, src0.000, src0.111, src0.xyz // original inst 12
CMP temp[5].w, src1.x, src0.w, -src2.x // original inst 8
7: src0.xyz = temp[0], src0.w = temp[5], src1.xyz = temp[2], src2.xyz = temp[5]
CMP temp[6].x, src0.y__, src0.w__, -src2.y__ // original inst 13
8: src0.xyz = temp[5], src0.w = temp[3], src1.xyz = input[0], src2.xyz = temp[1]
MAX temp[8].x, src0.x__, src0.z__ // original inst 15
CMP temp[5].w, src0.w, src1.x, -src2.x // original inst 7
9: src0.xyz = temp[3], src0.w = temp[5], src1.xyz = temp[0], src2.xyz = temp[5]
CMP temp[7].x, src0.w__, src1.y__, -src2.z__ // original inst 14
10: src0.xyz = temp[2], src0.w = temp[5], src1.xyz = temp[6], src2.xyz = temp[5]
CMP temp[2].x, src0.w__, src1.x__, -src2.x__ // original inst 18
11: src0.xyz = temp[7], src0.w = temp[5], src1.xyz = temp[3], src2.xyz = temp[5]
CMP temp[3].x, src0.x__, src0.w__, -src2.x__ // original inst 17
....
The problem is that instruction 11 (which was instruction 17 before the scheduling) now reads
the wrong source for src0. It initially used the result of instruction 8 (now scheduled as 6),
but now it reads from instruction 8 (corresponding to instruction 7 before the scheduling).
The bug is quite subtle and needs a few conditions to reproduce:
- there is a loop, therefore we skip the register rename
pass and hence don't have the ssa-like form,
- there are at least two rgb instructions writing the same register
and both are convertible to alpha instructions,
- there is an excess of rgb instructions, so that the conversion actually
happens.
What happens is that, while scheduling instructions, the scheduler
recognizes that there are no alpha instructions to pair the rgb ones with
and converts some of them to alpha. It primarily tries to use the same
register and just reuse the alpha channel.
Why does it happen? We track the usage of registers in the block
being scheduled, and when we rewrite something we move the users tracked
by the reg_value structures to the new register. The problem is that the
current code expects the program to be in the ssa-like form when we do
this. Here it is not (because of the loop), and when we convert the
original instruction 2, we move the dependency information about
temp[2].x to temp[2].w. When we later convert instruction 8, which also
writes temp[2].x, the original dependency info is gone, so when we copy
it to the new reg (temp[5].w), we just set it to NULL. This effectively
means we don't mark temp[5].w as used, and we later wrongly use it again
when we look for the next empty register.
Fix this by not deleting the original dependency info. We can't reuse the
reg now, but it doesn't matter, because the regalloc later can sort it out.
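A heavily simplified model of the tracking problem, assuming one pointer per
register channel. The real scheduler state is richer; struct tracker,
transfer_value and looks_free are hypothetical names used only to show why
clearing the original slot loses information when a register has more than
one writer:

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_TEMPS 8
#define NUM_CHANS 4 /* x, y, z, w */

/* Stand-in for the per-value dependency record. */
struct reg_value { int num_readers; };

/* One slot per register channel; NULL means "nothing tracked here". */
struct tracker { struct reg_value *vals[NUM_TEMPS][NUM_CHANS]; };

/* Re-home the tracking data when a write is moved to another channel.
 * keep_original = false models the old behaviour (the source slot is
 * cleared, which is only safe if each register has a single writer);
 * keep_original = true models the fix described above. */
static void transfer_value(struct tracker *t, int from_reg, int from_chan,
                           int to_reg, int to_chan, bool keep_original)
{
    t->vals[to_reg][to_chan] = t->vals[from_reg][from_chan];
    if (!keep_original)
        t->vals[from_reg][from_chan] = NULL;
}

/* A channel with no tracked value is considered free for reuse. */
static bool looks_free(const struct tracker *t, int reg, int chan)
{
    return t->vals[reg][chan] == NULL;
}
```

In this model, moving temp[2].x to temp[2].w and then moving temp[2].x again
to temp[5].w leaves temp[5].w looking free when the original slot was
cleared, and correctly occupied when it was kept.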
There are no changes in the shader-db.
Fixes: dEQP-GLES2.functional.shaders.indexing.tmp_array.float_dynamic_write_dynamic_loop_read_fragment
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6508
Reviewed-by: Filip Gawin <filip@gawin.net>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18621>
Fix defects reported by Coverity Scan.
Resource leak (RESOURCE_LEAK)
leaked_storage: Variable used going out of scope leaks the storage it points to.
leaked_storage: Variable multiple_uses going out of scope leaks the storage it points to.
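For illustration, a common way to avoid this class of leak is a single exit
path that frees both allocations. The function below is a hypothetical
stand-in, not the pass that was fixed; only the variable names used and
multiple_uses come from the report above:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical pass with two temporary arrays. Every return goes through
 * the "out" label, so neither array can go out of scope unfreed. */
static bool run_pass(size_t num_values, bool bail_early)
{
    bool ok = false;
    unsigned *used = calloc(num_values, sizeof(*used));
    unsigned *multiple_uses = calloc(num_values, sizeof(*multiple_uses));

    if (!used || !multiple_uses)
        goto out;

    if (bail_early)
        goto out; /* an early exit like this is where leaks usually hide */

    /* Placeholder for the real work: record a single use of value 0. */
    used[0] = 1;
    multiple_uses[0] = 0;
    ok = true;

out:
    free(multiple_uses); /* free(NULL) is a no-op, so this is always safe */
    free(used);
    return ok;
}
```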
Fixes: 8fb415fee2 ("pan/bi: Reduce some moves when going out-of-SSA")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18653>
Break up the monolithic SET_SHADER_EXTENDED packet into the separate
underlying commands (some only 2-byte sized and aligned), and add a
builder for USC control streams like we did for PPP updates to make that
change manageable.
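A minimal sketch of such a builder, assuming commands are appended
back-to-back into a byte buffer. struct stream_builder and builder_push are
hypothetical names; memcpy is used so that packets that are only 2-byte
aligned can be written at any offset:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append-only cursor over a caller-provided buffer. */
struct stream_builder {
    uint8_t *head; /* next byte to write */
    uint8_t *end;  /* one past the last usable byte */
};

/* Copy one packed command into the stream and advance the cursor.
 * Returns false, writing nothing, if the command does not fit. */
static bool builder_push(struct stream_builder *b, const void *packet,
                         size_t size)
{
    if ((size_t)(b->end - b->head) < size)
        return false;
    memcpy(b->head, packet, size);
    b->head += size;
    return true;
}
```

Each underlying command then gets its own pack step followed by one
builder_push call, instead of one large struct covering all of them.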
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
For compute kernels, this encodes how much workgroup-local memory is
used ("shared memory" or "threadgroup memory" or "local memory"). This
memory is partitioned by the hardware.
For fragment shaders, this... encodes exactly the same thing. There is
no traditional tilebuffer in AGX, instead local memory is interpreted as
an imageblock, where each workgroup is a tile. This is a nifty design.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Histogram of sizes of the spill buffer, with logarithmic bucket sizes
(relative to the amount spilled from the perspective of a single thread).
Pretty funny.
Also mark a few unknowns that are nonzero when spilling is used.
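As a generic illustration of logarithmic bucketing (the exact bucket
boundaries and units of the hardware field are not stated above, so
power-of-two buckets over a per-thread byte count are an assumption, and the
names are hypothetical):

```c
#include <stdint.h>

#define NUM_BUCKETS 32

/* floor(log2(size)) for size > 0: sizes 1, 2-3, 4-7, 8-15, ... map to
 * buckets 0, 1, 2, 3, ... */
static unsigned log2_bucket(uint32_t size)
{
    unsigned bucket = 0;
    while (size >>= 1)
        bucket++;
    return bucket;
}

/* Count one spill buffer of the given per-thread size; zero-sized
 * entries (no spilling) are not recorded. */
static void histogram_add(unsigned hist[NUM_BUCKETS], uint32_t spill_bytes)
{
    if (spill_bytes == 0)
        return;
    hist[log2_bucket(spill_bytes)]++;
}
```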
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Confusingly, after creation rsrc->base.format will contain the external
format due to u_transfer_helper quirks. For our internal use, we need to
look at the internal format, rsrc->layout.format. With the new layout
code, the rsrc->internal_format property is redundant, so we delete
that to reduce confusion.
Fixes dEQP-GLES3.functional.texture.format.sized.2d.depth32f_stencil8_*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
We need the header to be common between gfx and compute, but everything
else seems to be different. Shuffle so we can decode compute without any
terrible hacks.
I don't know the exact layout and don't care: the layout of the fields
here is all software defined in macOS, even though the *values* are
defined by hardware (or firmware in a few cases).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Same enum as PowerVR CDM, annoyingly different from the VDM block types.
Split out the stream link / terminate structs (both observed with Metal
for copious amounts of compute), in preparation for decoding "properly".
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
This logic doesn't really do what it pretends to; we don't expose the
RGTC features unless we actually have LATC support. This is about to
change, but for that logic to work, we need to be able to tell if we're
using a fallback-format or not, and we can't do that unless we keep the
format as LATC.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18564>
RGTC and LATC unpack in the same way, just to different formats. So
let's add support for unpacking that in this helper.
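To illustrate "same decode, different formats": the block decode yields one
or two channel values per texel, and only their meaning differs (red/green
for RGTC, luminance/alpha for LATC). The sketch below expands unsigned
values to RGBA8 as they would be sampled; the enum, struct and function are
hypothetical and are not the helper touched here:

```c
#include <stdint.h>

enum toy_format { TOY_RGTC1, TOY_RGTC2, TOY_LATC1, TOY_LATC2 };

struct rgba8 { uint8_t r, g, b, a; };

/* c0 and c1 are the decoded channel values for one texel; c1 is ignored
 * for the single-channel formats. Unsigned variants only. */
static struct rgba8 expand_texel(enum toy_format fmt, uint8_t c0, uint8_t c1)
{
    switch (fmt) {
    case TOY_RGTC1: return (struct rgba8){ c0, 0, 0, 255 };   /* R    */
    case TOY_RGTC2: return (struct rgba8){ c0, c1, 0, 255 };  /* RG   */
    case TOY_LATC1: return (struct rgba8){ c0, c0, c0, 255 }; /* L    */
    case TOY_LATC2: return (struct rgba8){ c0, c0, c0, c1 };  /* L, A */
    }
    return (struct rgba8){ 0, 0, 0, 255 };
}
```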
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18564>
Define more HDMI PHY/PLL registers used on msm8x74/apq8084 platforms.
Register names are defined in clock-mdss-8974.c (msm-3.10).
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18629>
v2:
- added more requirements for LLVM (thanks Mike Lothian (@FireBurn)).
v3:
- note the optional cases for rustfmt (thanks @LingMan)
- remove the part about the SPIR-V target for LLVM (thanks Karol Herbst
(@karolherbst))
v4:
- added minimum version requirements (thanks Karol Herbst
(@karolherbst))
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18640>
This is now done for all drivers that support half-float and sRGB
textures. Update features.txt to reflect this.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18574>
Get rid of any zero-sized entries so drivers never even have to think
about this case when using templates.
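A minimal sketch of the filtering step, assuming a flat array of template
entries with a per-entry element count. struct toy_entry and
compact_entries are hypothetical and do not mirror the actual template
structures:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical template entry: which binding to update and how many
 * array elements it covers. */
struct toy_entry {
    uint32_t binding;
    uint32_t array_count;
};

/* Remove zero-sized entries in place, preserving the order of the rest.
 * Returns the new entry count. */
static size_t compact_entries(struct toy_entry *entries, size_t count)
{
    size_t out = 0;
    for (size_t i = 0; i < count; i++) {
        if (entries[i].array_count == 0)
            continue;
        entries[out++] = entries[i];
    }
    return out;
}
```

Doing this once when the template is created means later consumers can
iterate the entries without a special case for empty ones.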
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14780>