Otherwise the list 'next' changing will cause the assertion in
list_for_each_entry to be hit.
This was not hit before because list_assert is defined for debug
builds but not debugoptimized.
Fixes: 5067a26f44 ("pan/bi: Use flow control lowering on Valhall")
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
I wrote fiction about dreaming that these were supported but after
waking finding that they were not. Somehow I came to consider that
fiction as fact and I never thought to test if they did work.
While reverse engineering the polygon list format, I found that these
were supported after all.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17371>
This maps to CL_DEVICE_MAX_COMPUTE_UNITS, which is defined as:
The number of parallel compute cores on the OpenCL device.
Since all supported Malis are unified shader cores, the number of parallel
compute cores is simply the number of shader cores.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
Whereas the compiler needs to know the warp size for lowering divergent
indirects, the driver needs to know it to report the subgroup size. Move the
Bifrost-specific helper to common and add the trivial implementation for
Midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
To query the core count, the hardware has a SHADERS_PRESENT register containing
a mask of shader cores connected. The core count equals the number of 1-bits,
regardless of placement. This value is useful for public consumption (like
in clinfo).
However, internally we are interested in the range of core IDs.
We usually query core count to determine how many cores to allocate various
per-core buffers for (performance counters, occlusion queries, and the stack).
In each case, the hardware writes at the index of its core ID, so we have to
allocate enough for entire range of core IDs. If the core mask is
discontiguous, this necessarily overallocates.
Rename the existing core_count to core_id_range, better reflecting its
definition and purpose, and repurpose core_count for the actual core count.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17265>
Float conversions with explicit rounding modes are required for OpenCL,
as well as for Vulkan with the VK_KHR_16bit_storage extension (mandatory
in Vulkan 1.1). Since the hardware conversion instructions allow
configuring the round mode, this is easy to support :-)
Fixes test_half.vstore_half_rtz.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17262>
With stages dispatching with a mask, we can run into situations where
we don't have the global address in all lanes. The existing code
always assumed we had the addres in at least lane0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bb40e999d1 ("intel/nir: use a single intel intrinsic to deal with ray traversal")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17330>
This got messed up when scalarizing the IR. Fix the definition of the opcode to
return (instead of break, asserting out) and to respect the swizzle (instead of
failing validation). Noticed when bringing up OpenCL on Valhall.
Fixes: 5febeae58e ("pan/bi: Emit collect and split")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17222>
this is for drivers (like freedreno) which need the format in the sampler
state in order to accurately handle border colors
when set, drivers MAY receive a format in the sampler state if the frontend
supports it (e.g., nine does not), and the cso sampler cache will include
the format member of the struct
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17189>
EOT messages need to use g112-g127 for their sources. With the new
opt_split_sends pass, we may be constructing an EOT message from two
different registers, and be able to copy propagate the original values
into those SENDs.
This can cause problems if we copy propagate from a large register
(say an RGBA value which is 4 GRFs in SIMD8 or 8 GRFs in SIMD16), in a
situation where the SEND only read a subset of that (say the alpha value
out of an RGBA texturing result). g112-127 can only hold 16 registers
worth of data, and sometimes we can only use g112-126. So, we can't
propagate if the GRFs in question are larger than 15 GRFs.
Fixes a shader validation failure in Alan Wake. Thanks to Ian Romanick
for catching this!
shader-db on Icelake shows that only SIMD32 programs are affected, and
the results are pretty negligable:
total instructions in shared programs: 19615228 -> 19615269 (<.01%)
instructions in affected programs: 10702 -> 10743 (0.38%)
helped: 1 / HURT: 43 / largest change: +/- 2 instructions
total cycles in shared programs: 852001706 -> 852001566 (<.01%)
cycles in affected programs: 767098 -> 766958 (-0.02%)
helped: 68 / HURT: 64 / largest change: +/- 774 cycles
GAINED: 2 / LOST: 0
Fixes: 589b03d02f ("intel/fs: Opportunistically split SEND message payloads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6803
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17390>
A general cleanop of the nir compiler options including separating
the handling for FS and all the other shaders.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17076>
Currently, the NIR code path doesn't use clause local registers,
but these seem to help a lot with some work loads.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17076>
This is a rewite of the NIR backend. it adds some optimization
and a scheduler.
v2: - replace some magic numbers by constants
- make sure constructor is always used with new
- use default initialization in more places
(changes suggested by Filip Gawin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17076>
Not sure whether this is really needed. With the TGSI code path no
other bank swizzle is emitted for LDS ops, so make sure it's the same here.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17076>
The number of ALU groups is important for good sccheduling, so
let's add it to the stats.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17076>
When utrace/perfetto is active, we allocate/free utrace buffers at the
same rate as command buffers. It's useful to have a pool that avoids
GEM_CREATE/GEM_CLOSE ioctls.
v2: Use the pool more
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16613>
Dozen is currently a SW driver as far as WSI is concerned so it's going
to wait on a fence anyway. Also, I highly doubt it's actually hooked
these up properly. It's probably just a copy+paste from ANV.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>
D3D12 requires the offset to be 512B-aligned and row pitch to be
256B-aligned for copy commands. There will need to be a fallback
written eventually because Vulkan has no such requirements but these
will remain the optimal limits as they allow using the D3D12 copy
commands directly.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>
Some drivers such as lavapipe are 100% fine with using linear for WSI
images. Most HW drivers, however, would rather render tiled and eat a
blit.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>
This isn't a big deal for the current buffer paths because the required
alignment for PRIME is already higher than any driver advertises.
However, the SW path we're about to add won't have the PRIME requirement.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>
Instead of taking a single boolean for device-local, take a set of
required properties and denied properties. This will let us require
additional things like being CPU mappable in the future.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17388>