We support VK_KHR_dedicated_allocation so we must fill
VkMemoryDedicatedRequirements.
Vulkan spec states:
"[...] requiresDedicatedAllocation may be VK_TRUE under one of the
following conditions:
The pNext chain of VkImageCreateInfo for the call to vkCreateImage used
to create the image being queried included a VkExternalMemoryImageCreateInfo
structure, and any of the handle types specified in
VkExternalMemoryImageCreateInfo::handleTypes requires dedicated allocation,
as reported by vkGetPhysicalDeviceImageFormatProperties2 in
VkExternalImageFormatProperties::externalMemoryProperties.externalMemoryFeatures,
the requiresDedicatedAllocation field will be set to VK_TRUE."
All handle types require dedicated allocation at the moment.
Fixes:
dEQP-VK.api.external.memory.opaque_fd.dedicated.image.info
dEQP-VK.memory.requirements.dedicated_allocation.buffer.regular
dEQP-VK.memory.requirements.dedicated_allocation.image.transient_tiling_optimal
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9086>
This allows the caller to specify the view mask for this draw
in a multiview draw environment
This has been packed into the upper nibble and 2 bits of the
index size to retain the struct size as small as possible
for tc.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9399>
gl_MaxVaryingFloats was not removed from core until 4.20 and is still
available in compat shaders. Found while writing some new CTS to test
the correct declarations of this constant.
Fixes: 0ebf4257a385i ("glsl: define some GLES3 constants in GLSL 4.1")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9514>
Technically, this is only one field on IVB but it's two on BYT and so it
makes things easier if we split it for all Gen7.
While we're here, make some of the other fields in L3SQCREG1 Booleans.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9537>
This also trivially adds ARB_texture_filter_minmax, since the EXT
variant is a strict superset.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9487>
This is used to expose ARB/EXT_texture_filter_minmax. Note that only the
EXT_* enable is provided since the ARB one would require proper handling
of some formats not being supported. For now this is force-enabled for
everything.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9487>
This is to allow for
VK_EXT_sampler_filter_minmax
GL_EXT_texture_filter_minmax
support
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9487>
Calling code to pack/unpack with overlap would be already be undefined.
Cuts 50k of text on x86_64 release builds from the compiler having more
freedom in the src/dst loads knowing that they don't interfere with each
other.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9500>
This change is in line with naming convention used in isl.
We want to keep intel_ prefix reserved for common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9532>
There are only a couple patterns that use is_finite, so the changes
aren't huge. Mostly shaders from Batman Arkham City and a few shaders
from Shadow of the Tomb Raider were affected.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tiger Lake
Instructions in all programs: 160902591 -> 160902489 (-0.0%)
SENDs in all programs: 6812270 -> 6812270 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7429003266 -> 7428992369 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
Ice Lake
Instructions in all programs: 145301634 -> 145301460 (-0.0%)
SENDs in all programs: 6863890 -> 6863890 (+0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798589772 -> 8798575869 (-0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334250 (+0.0%)
Skylake
Instructions in all programs: 135892010 -> 135891836 (-0.0%)
SENDs in all programs: 6802916 -> 6802916 (+0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442597324 -> 8442583202 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301116 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
Recall that when either value is NaN, fmax will pick the other value.
This means the result range of the fmax will either be the "ideal"
result range (calculated above) or the range of the non-NaN value.
Previously, something like fmax({gt_zero}, {lt_zero, is_a_number}) would
return a range of gt_zero. However, if the "gt_zero" parameter is NaN,
the actual result will be the "lt_zero" parameter.
This analysis depends on the is_a_number analysis also added in this MR.
Assuming this doesn't cause any unforeseen problems, I believe we should
wait a bit, then nominate a subset of the series for the stable
branches.
This fixes the piglit tests
tests/spec/glsl-1.30/execution/range_analysis_fmax_of_nan.shader_test
tests/spec/glsl-1.30/execution/range_analysis_fmin_of_nan.shader_test
from https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/463.
Even with the added fsat fixes, range_analysis_fsat_of_nan.shader_test
still fails. There are some other issues there that will be addressed
in later commits (in another MR).
v2: Add fsat fixes. Suggested by Rhys.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Shader-db results:
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049290 -> 21049314 (<.01%)
instructions in affected programs: 3175 -> 3199 (0.76%)
helped: 0
HURT: 17
HURT stats (abs) min: 1 max: 3 x̄: 1.41 x̃: 1
HURT stats (rel) min: 0.20% max: 1.89% x̄: 0.97% x̃: 0.92%
95% mean confidence interval for instructions value: 1.09 1.73
95% mean confidence interval for instructions %-change: 0.75% 1.19%
Instructions are HURT.
total cycles in shared programs: 855136176 -> 855136406 (<.01%)
cycles in affected programs: 37579 -> 37809 (0.61%)
helped: 0
HURT: 17
HURT stats (abs) min: 12 max: 20 x̄: 13.53 x̃: 14
HURT stats (rel) min: 0.17% max: 1.13% x̄: 0.79% x̃: 0.91%
95% mean confidence interval for cycles value: 12.53 14.53
95% mean confidence interval for cycles %-change: 0.63% 0.94%
Cycles are HURT.
Fossil-db results:
Tiger Lake
Instructions in all programs: 160901033 -> 160902591 (+0.0%)
SENDs in all programs: 6812270 -> 6812270 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7430016795 -> 7429003266 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
Ice Lake
Instructions in all programs: 145299102 -> 145301634 (+0.0%)
SENDs in all programs: 6863890 -> 6863890 (+0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798390846 -> 8798589772 (+0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334250 (+0.0%)
Skylake
Instructions in all programs: 135889478 -> 135892010 (+0.0%)
SENDs in all programs: 6802916 -> 6802916 (+0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442624166 -> 8442597324 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301116 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
This commit is necessary to support "nir/range_analysis: Fix analysis of
fmin and fmax with NaN".
No shader-db or fossil-db changes on any Intel platform.
v2: Pack and unpack is_a_number.
v3: Don't set is_a_number of integer constants. The bit pattern might
be NaN.
v4: Update handling of b2i32. intBitsToFloat(int(true)) is
1.401298464324817e-45. Return a value consistent with that.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
The obvious changes to nir_search_helpers.h are in a separate commit to
limit the scope of this change. These additions are really only needed
to support the next commit "nir/range_analysis: Add "is a number" range
analysis tracking". This reduction in scope is intended to increase the
suitability for stable branches.
No shader-db or fossil-db changes on any Intel platform.
v2: Pack and unpack is_finite.
v3: Split nir_search_helpers.h changes into a separate commit.
v4: Remove assertion intended for the next commit. Update is_finite
comment for fsign. Both noticed by Rhys. Fix is_finite handling for
load_const vectors. If any element is not finite, set the flag to
false. This is the same way is_integral is already handled.
v5: Update handling of b2i32. intBitsToFloat(int(true)) is
1.401298464324817e-45. Return a value consistent with that.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
This will greatly simplify a later commit. The assert(r.is_integral) in
the eq_zero case is dropped because I don't think it's useful anymore.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
Vulkan 1.1 has VK_KHR_device_group and VK_KHR_device_group_creation
promoted to core, thus we should handle DeviceIndex built-in.
While we are here, also add these extensions to the extensions list,
even though they are not doing anything useful.
Fixes test:
dEQP-VK.compute.device_group.device_index
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9516>
The registers were actually different per-stage even though we used the
same type, which resulted in a bunch of incorrectly programmed fields
and confusion. Move the stage-specific values to the registers
themselves, which makes things much less confusing and makes it possible
to set "mergedregs" correctly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9493>
When a bitset is "inline" it should act as-if the its fields were
inserted into the register itself. However when initializing the
register's bitfield we weren't doing a deep copy of the inline bitfield,
so if the register defined additional fields then they would get added
to the original inline bitfield and any further registers with the same
type would get them. Fix this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9493>
This prints the program with each instruction's contribution to it's
latency and various factors for the calculation of the Inverse Throughput
statistic.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
Latency is estimanted duration of a single wave, ignoring others in the
CU. It is similar to the old cycles statistic except it it's more accurate
and considers memory operations.
The InvThroughput statistic is a combination of MaxWaves, Latency and the
portion of the wave's execution which does not use various resources.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
This allows them to be scheduled properly and simplifies the assembler a
little.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
Keep track of the current loop depth in Program and set the depth inside
Program::insert_block() instead of repeating it every time we insert one.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8994>
The texture and sampler descriptors are well separated in Vulkan,
let's add a new field to allow mixing sampler and texture descs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9517>
On Bifrost the vertex/instance ID are preloaded in special registers,
no need to add special attribute entries.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9517>
There's a minus(1) modifier on the entries field. Take it into account.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9517>
UBO index assignment is a bit special in Vulkan, it's based on the
descriptor set layout, which doesn't know about shaders' internal UBOs
(our sysval UBOs). Extend the backend compilers so we can place sysval
UBOs where we want: after all explicit UBOs.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9517>
I'm just too lazy to implement the logic to prepare push constant
buffers in the Vulkan driver. Besides, Vulkan has explicit push
constants, which AFAIK is not handled in the compiler backends yet,
and that will probably conflict with the UBO -> push constant
promotion.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9517>
Currently the screen destruction closes the dup'ed fd, but not the
original renderonly gpu fd, which is kept around for the lifetime of
the renderonly.
Squashed revert of "vc4: Don't leak the GPU fd for renderonly usage."
(commit 99ef66c325) as requested by Eric.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6983>
The renderonly object is something the winsys creates, so the pipe
driver has no business in memcpying or freeing it. Move those bits
to the winsys.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6983>
Some opcodes come with both type and size variants. Right now, only the
size is taken into account. Extend the builder to provide wrappers that
take a nir_type in addition to the bitsize.
While at it, fix wrappers taking a compare operator to use the proper
.{i,s,u} variant based on the comparison (equal and non-equal should
use .i, other comparisons should use .{u,s}).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9520>
We will have 2 compute jobs per indexed indirect draw, one doing the
min-max index search and one patching the cmdstream. The second compute
job needs to depend on the first one, as well as the previous indirect
draw job to avoid corrupting the indirect draw context which is shared
at the batch level (global dependency).
Instead of handling that case in panfrost_add_job(), extend
panfrost_add_job() to accept an explicit global dependency.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9520>
This is needed for indirect draws so the compute job can patch the
vertex/tiler jobs which are following in the chain.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9520>
A block that has all its successors empty is not necessarily a leaf
block in the CFG, and removing the JUMP in that causes the shader
to continue executing code from another block instead of exiting.
This reverts commit a496b41d50.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9520>
The old values seemed to work fine, but the ISA docs recommend 0x0,0x3,0xc
and 0xf:
COMPR==1: export half-dword enable. Valid values are: 0x0,3,c,f
[0] enables VSRC0 : R,G from one VGPR (R in low bits, G high)
[2] enables VSRC1 : B,A from one VGPR (B in low bits, A high)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9459>
Obviously this didn't affect correctness. Not sure about performance.
It also changes enabled_channels to match radeonsi.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: f29c81f863 ("aco: use VOP2 for v_cvt_pkrtz_f16_f32 if possible")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9459>
This allows skipping mutex locking. Don't take the aux context into account.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9356>
Currently only initialized for a6xx, mostly because that is the easiest
setup for me to test and debug at the moment. But the couple a6xx changes
should not require counterparts in older gens.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
Previously we were expecting cb0 to be user_buffer. (We did in some
cases upload it to a gpu buffer, but this was an internally allocated
buffer and not something subject to rebind.) But with TC it becomes
a gpu buffer.
(Technically, with pctx->const_uploader, we shouldn't hit the rebind
path for cb0, but better to not try to be overly clever.. sooner or
later that would bite us.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
With threaded_context, CSO creation happens in the frontend thread,
which means it is no longer safe to do blits (if needed, for sampler
views with format that cannot be UBWC). So move this to the first
time that the sampler view is bound.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
With threaded_context, in the TC_TRANSFER_MAP_UNSYNC case, we are
getting called from the frontend thread, rather than driver thread.
So we need a different slab_child_pool for that.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
This will be used by threaded_context to avoid stalls in the
DISCARD_WHOLE_RESOURCE case (and DISCARD_RANGE cases that can
be promoted to DISCARD_WHOLE_RESOURCE).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
Split out the usage simplification from main part of transfer_map and
handle the threaded-context specific TC_TRANSFER_x flags.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
For threaded_context, to properly handle replace_buffer_storage, we'll
need to handle multiple "iterations" of a resource using the same
tracking in order to implement transfer_map() correctly.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
We didn't see this before without threaded_context because we (normally)
wouldn't upload cb0 (the slot u_blitter uses). But with cb0 getting
uploaded we could hit a leak due to constant state only being restored
in the fd_blitter_clear() path. Move cb0 save to the one path that uses
it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
On android, this will show up in logcat, rather than being lost into the
ether.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
Useful for drivers to add some sanity checks to avoid/detect threading
issues caused by things that might be called (indirectly) from frontend
thread.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9323>
This reverts commit 6c8cc9be12.
A spec bug was resolved confirming the original behaviour. Also it
seems the game Foundation no longer depends on the incorrect
behaviour.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9486>
In a perf trace translate_quads_uint2uint_last2last_prdisable() was
showing up as a huge hot spot. Digging through the assembly on arm64
found that the compiler wasn't doing any read caching. Specifically,
the generated code looked roughly like this:
out[j+0] = in[i+0];
out[j+1] = in[i+1];
out[j+2] = in[i+3];
out[j+3] = in[i+1];
out[j+4] = in[i+2];
out[j+5] = in[i+3];
...and the compiler was loading "i+1" and "i+3" from memory twice for
no reason (instead of caching it).
If we sprinkle generous amounts of the `__restrict` keyword then the
compiler is able to be much smarter. Not only does it avoid
double-loading but it also generates better instructions. It uses two
LDRD instructions instead of 6 LDR instructions and uses some STRD
too.
In one example test this increased FPS from ~25.7 to ~34.5.
Change-Id: I88bf8bd9ac421fe48a7d6961e224425c3ae7beee
Reported-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9485>
We also switch from using __alignof__ to alignof() in util/macros.h
which works on MSVC with the one unfortunate downside of requiring an
actual type and not a value.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9511>
This way it properly compiles on Visual Studio.
Fixes: 145444d265 "anv: Move multialloc to common code"
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9506>
This patch renames all macros with "GEN_" prefix defined in
common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
This patch renames functions, structures, enums etc. with "gen_"
prefix defined in common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
Changes in this patch include:
- Rename all files in src/intel/common path
- Update the filenames used in source and build files
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
Starting with d0d039a4d3, we emit writes to the push constant chunk
of the payload to stomp out-of-bounds data to zero for Vulkan. Then, in
369eab9420, we started emitting shader preamble code for emulated
push constants on Gen12.5 parts. In either of these cases, we can run
into issues if we don't have a proper live range for some of the payload
registers where they get used for something and then smashed by our push
handling code. We've not seen many issues with this yet because it only
happens when you have dead push constants.
Fixes: d0d039a4d3 "anv: Emit pushed UBO bounds checking code..."
Fixes: 369eab9420 "intel/fs: Emit code for Gen12-HP indirect..."
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
It's easier to compare with the HW docs than a pile of hex.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
In cases where the alpha coverage is enabled but the color attachment is
either unused or absent there should be a dummy mrt to make the draw behave
correctly.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Yannik Marek <yannik@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8952>
If a VkRenderPassInputAttachmentAspectCreateInfo is provided, we use the
aspects specified there. Otherwise, we default to every aspect in the
format. For attachments which are not input attachments, aspectMask is
left zero.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
The Android ones we put in anv_android.c. Maybe one day we'll want a
vk_android.h to put some common Android stuff but, for now, let's keep
it contained to ANV's android code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
The variable-length stack allocations are causing issues with ubsan when
the array size is zero. Also, a heap allocation is probably safer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
- no 3D and cube textures
- no mipmapping
- no border color
- image_sample is the only supported opcode with a sampler (behaves like _lz)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
The maximum value which OPC_GETSIZE could return for one dimension
is 0x007ff0, however sampler buffer could be much bigger.
Blob uses OPC_GETBUF for them.
Fixes tests:
dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.1048576
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9391>
Otherwise, we won't be able to use OPC_GETBUF to get their size.
After this change we also could get rid of the hack for OPC_GETSIZE
which scaled the size for texture buffers.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9391>
It appears that storage for varyings in a wave has an upper
limit of wavesize * max_a831 where max_a831 is 64.
Exceeding the limit seam to force gpu to reduce primitives
processed per wave, at least calculations make sense with
such interpretation.
With blob SP_HS_WAVE_INPUT_SIZE never exceeds 64 and setting
it to 65 in freedreno leads to a hang.
Copied from the commit to freedreno e5499ca2
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8187>
This fixes the assembly for many scenarios where you want to use shader
replacement.
Note: unfortunately this leaks the identifier string created while
lexing, but I couldn't find a way to avoid leaking it except for
bringing in ralloc or something (which would be way more complicated).
The only other place doing something similar in mesa is the glsl parser,
which is using ralloc (actually a linear context).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
Trying to specify a floating-point value in a @const line would result
in it getting interpreted as a FLUT value and failing parsing. Fix this
by making the various FLUT tokens include the surrounding parentheses.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
Mesa fixed pipeline texture loading on programmable pipeline hardware emits
a generic fragment shader program which contains gl_TexCoord.xyzw as a vec4
and then expects to configure the varying assignments to the shader in the
pipeline command stream, to select what is wired to the XYZW fragment shader
inputs.
This gl_TexCoord.xyzw is turned into texture load with projection (TGSI TXP
opcode, similar for NIR). Texture load with projection does not exist in the
Vivante GPU as a dedicated opcode and is emulated. The shader program first
divides texture coordinates XYZ by projector W and then applies regular TEX
opcode to load the texture (i.e. TEX(gl_TexCoord.xyzw/gl_TexCoord.wwww)).
For point sprites, XY are the point coordinates from VS, Z=0 and W=1, always.
The Vivante GPU can only configure varying to be either of -- point coord X,
point coord Y, used, unused -- which covers XYZ, but not W. Z is fine because
unused means 0.
W used to be 0 too before this patch and that led to division by 0 in shader.
The only known way to solve this is to set Z=0, W=1 in the shader program
itself if the point sprites are enabled. This means we have to generate a
special shader variant which does extra SET to set the W=1 in case the point
sprites are enabled.
In case of TGSI, emitting the SET.TRUE opcode permits setting W=1 without
allocating additional constants. With NIR, use nir_lower_texcoord_replace()
to lower TEXn to PNTC, which sets Z=0, W=1, and let NIR optimize the shader.
Note that nir_lower_texcoord_replace() must be called before input linking
is set up, as it might add new FS input.
Also note that it should be possible to simply drop PIPE_CAP_POINT_SPRITE
in the long run, ST would then apply the same optimization pass, but that
option is so far misbehaving. And for etnaviv TGSI this is not applicable
yet.
This fixes neverball point sprites (exit cylinder stars) and eglretrace of
gl4es pointsprite test:
https://github.com/ptitSeb/gl4es/blob/master/traces/pointsprite.tgz
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8618>
Fixes this assert in debug builds:
in __GI___assert_fail (assertion=0x7ffff731f66b "util_cpu_caps.nr_cpus >= 1", file=0x7ffff731f650 "../src/util/u_cpu_detect.h", line=116,
function=0x7ffff7323280 <__PRETTY_FUNCTION__.11654> "util_get_cpu_caps") at assert.c:101
in util_get_cpu_caps () at ../src/util/u_cpu_detect.h:116
in _mesa_float_to_float16_rtz (val=0) at ../src/util/half_float.h:93
in util_format_r16g16b16a16_float_pack_rgba_float (dst_row=0x7fffffffbdc0 "", dst_stride=0, src_row=0x7fffffffbf90, src_stride=0, width=1, height=1)
at src/util/format/u_format_table.c:13459
in util_format_pack_rgba (format=PIPE_FORMAT_R16G16B16A16_FLOAT, dst=0x7fffffffbdc0, src=0x7fffffffbf90, w=1) at ../src/util/format/u_format.h:1525
in util_pack_color (rgba=0x7fffffffbf90, format=PIPE_FORMAT_R16G16B16A16_FLOAT, uc=0x7fffffffbdc0) at ../src/gallium/auxiliary/util/u_pack_color.h:432
in v3dv_get_hw_clear_color (color=0x7fffffffbf90, internal_type=6, internal_size=8, hw_color=0x7fffffffbf10) at ../src/broadcom/vulkan/v3dv_cmd_buffer.c:1241
v2: move call from physical device to instance init.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9408>
This restores many of the hurt shaders from the previous patch at the
expense of re-adding ldvary tracking in the scheduler.
total instructions in shared programs: 13760415 -> 13755738 (-0.03%)
instructions in affected programs: 1207560 -> 1202883 (-0.39%)
helped: 5080
HURT: 1731
Instructions are helped.
total max-temps in shared programs: 2322991 -> 2322828 (<.01%)
max-temps in affected programs: 5063 -> 4900 (-3.22%)
helped: 229
HURT: 108
Max-temps are helped.
total sfu-stalls in shared programs: 31827 -> 31545 (-0.89%)
sfu-stalls in affected programs: 478 -> 196 (-59.00%)
helped: 304
HURT: 21
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13792242 -> 13787283 (-0.04%)
inst-and-stalls in affected programs: 1220856 -> 1215897 (-0.41%)
helped: 5162
HURT: 1697
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
We get optimal ldvary pipelining by doing the following:
1) Carefully merge a paired ldvary into the previous instruction when
possible.
2) When the above succeeds, flag the ldvary as scheduled immediately so
we can merge one of its children into the current instruction.
3) When scheduling ldvary sequences, only pick up instructions that are
part of the sequence to avoid picking up something that prevents
successful pipelining.
This patch skips 3) assuming some hurt shaders in exchange for better
scheduling flexibility during ldvary sequences. Besides eliminating most
of the code dedicated to special handling ldvary sequences, this also
usually allows us to produce better code by merging instructions that are
unrelated to ldvary sequences into the ldvary sequences, which is
particularly effective to fill up the gaps produced when scheduling the
first and last ldvary sequences as well as the gaps produced by flat
and noperspective varyings sequences that don't have both mul and add
instructions.
Notice that there are some hurt shaders, because some times the extra
scheduler flexibility can lead to picking up instructions that will
break a sequence without compensating for that, typically an ldunif
that prevents us from doing the fixup for a follow-up ldvary. We will
try to correct some of these cases with the next patch.
total instructions in shared programs: 13786037 -> 13760415 (-0.19%)
instructions in affected programs: 3201387 -> 3175765 (-0.80%)
helped: 16155
HURT: 4146
Instructions are helped.
total max-temps in shared programs: 2324834 -> 2322991 (-0.08%)
max-temps in affected programs: 22160 -> 20317 (-8.32%)
helped: 1340
HURT: 103
Max-temps are helped.
total sfu-stalls in shared programs: 30685 -> 31827 (3.72%)
sfu-stalls in affected programs: 782 -> 1924 (146.04%)
helped: 253
HURT: 1416
Inconclusive result.
total inst-and-stalls in shared programs: 13816722 -> 13792242 (-0.18%)
inst-and-stalls in affected programs: 3171642 -> 3147162 (-0.77%)
helped: 15331
HURT: 4179
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
These checks depend on prev_inst being set, so move them down below
with all the other checks with the same requirement.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
In case a new gl_PointCoord shader input is created, increment shader
input count and set valid driver_location to the new input variable,
otherwise the input gets aliased to input 0 and shows up in NIR_PRINT
output as whatever shader input 0 is instead of gl_PointCoord. Also
set the input as used, otherwise it might get removed.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9214>
Future patches for VK_EXT_image_drm_format_modifier will, in some cases,
place the aux surface and fast clear state into a driver-private bo.
This increases the complexity of image memory layout to such a degree
that, to maintain sanity, we must improve how we track the layout.
Define new types:
- anv_image_memory_range
- anv_image_memory_binding
- anv_image_binding
Delete many fields in anv_image (and its children), and replace them
with the new types.
This patch does not change how anv_image tracks (or, rather, does not
track) the memory of gen12 implicit ccs. We should probably do that, but
that's left as a future exercise.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
It calculates the address to a surface or to metadata in the image.
Refactor only. No intended change in behavior.
This patch prepares for, and reduces much noise in, the upcoming patch
that rewrites image memory tracking.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
If the image is disjoint, there is no reason to calculate image-global
memory requirements. Instead, only per-plane memory requirements are
needed.
Also, delete a large duplicate comment.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
Current code checks for surface validity with `surface.isl.size_B > 0`.
Replace the checks with anv_surface_is_valid().
This prepares for adding new members to anv_surface that may
be accidentally used as a validity-indicator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
1. Don't compare bo->size to image->size. An upcoming patch replaces
anv_image::size with complicated stuff. Instead, properly query the
required size with anv_GetImageMemoryRequirements.
2. Require the bo to fit the *aligned* image size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
The calculation of the subsurfaces' memory requirements assumed that the
image was disjoint if the image was created with
VK_IMAGE_CREATE_DISJOINT_BIT. But the Vulkan spec also requires that the
VkFormat be multi-planar.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
The name anv_image_plane::bo_is_owned will be made ambiguous by the
implementation of VK_EXT_image_drm_format_modifier, which may bind the
plane to multiple bo's.
Also, bo_is_owned was set if and only if the image was imported from
gralloc, and it was set only on the first plane. Therefore, let's rename
the field to from_gralloc, and move it to the toplevel of anv_image.
v2: Fix build in anv_android.c.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
the maximum allowable runtime version of vk can be computed by MIN(instance_version, device_version)
despite this, instances and devices can be created using the maximum version available
for each respective type. the restriction is applied only at the point of
enabling/applying features and extensions, meaning that to correctly handle this,
zink must:
1. create an instance using the maximum allowable version
2. select a physical device using the instance
3. compute MIN(instance_version, device_version)
4. only now begin to enable/use features requiring vk 1.1+
ref #4392
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9479>
vkGetPhysicalDeviceProperties2 is not allowed to be used with a 1.0 device
because it's a vulkan 1.1 function.
Closes: #4396
Fixes: 38ce8d4d ("vulkan/device_select: Stop using device properties 2.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9462>
With this command implemented messages emitted by
applications via glDebugMessageInsert will be forwarded
to the host.
v2: - remove check for feature in encode function, this
is covered in the state tracker (Rohan)
- reorder parameters in the encode function to the
order of the emit callback
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9433>
nir_lower_idiv(..) creates during its lowering isub instructions.
Move nir_lower_idiv(..) before the opt loop to have a chance to
optimize/lower isub away. Also drop the drop the halti dependency to
make it easier to follow.
This fixes the following assert on GC3000:
Unhandled ALU op: isub
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9447>
the full-variable outputs can be skipped, leaving only the varyings which
actually need explicit emission due to packed layouts or whatever
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
if the number of explicit xfb outputs or new varyings added to the existing size
of the slot map would cause an overflow, we have to force a new slot map to
ensure that everything fits
this means iterating all the stages which can produce new varyings and calculating
all the slots required in order to compare against the max size available
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
if an entire variable is being dumped into an xfb buffer, there's no need
to create an explicit xfb variable to copy the value into, and instead
the xfb attributes can just be set normally on the variable
this doesn't work for geometry shaders because outputs are per-vertex
fixes all KHR-GL46.enhanced_layouts xfb tests
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
running nir_lower_io_arrays_to_elements_no_indirects for only some stages
breaks location-setting for the stages which don't run it when
e.g., dmat2x3 variables are sometimes split across locations and
sometimes jammed into a single location (TCS I'm looking at you)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
if we get some crazy matrix types in here then we need to ensure that
we accurately unwrap them and copy the components
fixes KHR-GL46.enhanced_layouts.xfb_stride
Fixes: 1b130c42b8 ("zink: implement streamout and xfb handling in ntv")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
this was added during review, but it was never correct and just crashes
valid cases like streamout from a mat3x4 type
Fixes: b6f8f3a3ba ("zink: fix streamout for clipdistance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
This will take effect in future patches when we are able to query the
kernel to set device->vram.size to a non-zero size.
Builds on Sagar's ("anv: Query memory region info") patch, and
re-organizes things as recommended by Lionel (and Jason).
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9324>
Just treat the llc and non-llc paths as separate cases. This will also
help when adding the local memory setup.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9324>
When tyding up this section in 1e5b09f42f ("spirv: Tidy some repeated
if checks by using a switch statement.") the break got lost. It is
not a real problem because the next case just break, but better to
have it explicitly here instead of a FALLTHROUGH.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9440>
In nouveau's PBO path with GS support and no VS layer export, we got:
intrinsic store_output (ssa_1, ssa_0) (0, 15, 0, 160, 128) /* base=0 */ /* wrmask=xyzw */ /* component=0 */ /* src_type=float32 */ /* location=0 slots=1 */ /* out_pos */
[...]
vec3 32 ssa_4 = mov ssa_3.xxx
intrinsic store_output (ssa_4, ssa_0) (0, 4, 0, 160, 128) /* base=0 */ /* wrmask=z */ /* component=0 */ /* src_type=float32 */ /* location=0 slots=1 *//* out_pos */
The mov's SSA value we would decide we could store directly to the output,
since nothing else used it. However, the store has a writemask, and the
ALU op was stomping over it instead of ANDing with the output decl's
existing writemask.
Fixes: f79f382c81 ("nir_to_tgsi: Store directly to TGSI outputs when possible.")
Closes: #4380
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9376>
This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
This is currently an alias for nir_ssa_def_rewrite_uses but we move all
the instances which used it to write a non-SSA source to the newly named
helper.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
While we're here, add __gen_get_batch_address declarations to more files
because we're about to start requiring it on all GFX 12.5+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9445>
Add logical shift left and right operations support to mi_builder.
v1:
- Add GEN_GEN > 12 check (Jordan Justen)
- Add gen_mi_has_shift function (Jordan Justen)
- Fix commit title (Jordan Justen)
v2 (Jason Ekstrand):
- Add _imm versions of all of them
- Better handle corner-cases in _imm helpers
- Handle the power-of-two limitation for _imm versions
- Add tests
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9445>
The container is moved from before and hence returns size 0. To get the
correct value, the new instruction container must be used instead.
This was flagged by clang-tidy. The fixed call still triggers the
corresponding diagnostic, hence this change silences it by adding a
redundant clear() after move.
Fixes: 7f1b537304 ("aco: add new NOP insertion pass for GFX6-9")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9432>
This pass needs to run on the last shader in a pipeline writing
gl_Position. In GLES2, that's always the vertex shader, but in ES3.2, it
can be a geometry or tessellation shader. The shared code works the same
in this case, just make the assert more generous.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9444>
Now the texture virtual memory usage is less of a problem,
we can use this workaround permanently.
In the spirit of the API it's certainly not the proper way
of implementing DYNAMIC textures (it seems they are ok
to have hidden copies in driver managed memory, but not have
virtual addressing space reduced), but it makes sense for us,
both performance wise, and to avoid bugs.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9377>
On 32 bits, virtual memory is sometimes too short for apps.
Textures can hold virtual memory 3 ways:
1) MANAGED textures have a RAM copy of any texture
2) SYSTEMMEM is used to have RAM copy of DEFAULT textures
(to upload them for example)
3) Textures being mapped.
Nine cannot do much for 3). It's up to driver to really unmap textures
when possible on 32 bits to reduce virtual memory usage.
It's not clear whether on Windows anything special is done for
1) and 2). However there is clear indication some efforts have
been done on 3) to really unmap when it makes sense.
My understanding is that other implementations reduce the usage
of 1) by deleting the RAM copy once the texture is uploaded
(Dxvk's behaviour is controlled by evictManagedOnUnlock).
The obvious issue with that approach is whether the texture is
read by the application after some time. In that case,
we have to recreate the RAM backing from the GPU buffer.
And apps DO that. Indeed I found that for example Mass Effect 2
with High Texture mods (one of the crash case fixed by this patch serie),
When the character gets close to an object, a high res texture and replaces
the low res one. The high res one simply has more levels, and the game seems
to optimize reading the high res texture by retrieving the small-resolution
levels from the original low res texture.
In other words during gameplay, the game will randomly read MANAGED textures.
This is expected to be fast as the data is supposed to be in RAM...
Instead of taking that RAM copy eviction approach, this patchset
proposes a different approach: storing in memfd and release the
virtual memory until needed.
Basically instead of using malloc(), we create a memfd file
and map it. When the data doesn't seem to be accessed anymore,
we can unmap the memfd file.
If the data is needed, the memfd file is mapped again.
This trick enables to allocate more than 4GB on 32 bits apps.
The advantage of this approach over the RAM eviction one,
is that the load is much faster and doesn't block the GPU.
Of course we have problems if there's not enough memory to map the
memfd file. But the problem is the same for the RAM eviction approach.
Naturally on 64 bits, we do not use memfd.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9377>
We should only pass in a new indirect_info object if we actually set valid
values in it.
Fixes: abe8ef862f "gallium: make pipe_draw_indirect_info * a draw_vbo parameter"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9425>
In order for shader viewport index to be calculated correctly,
the cliptest code needs proper primitive lengths to work out
the provoking vertex. I half fixed this before for GL4 but looks
like I didn't make it all the way.
This fixes:
dEQP-VK.draw.shader_viewport*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9401>
Commit e99e7aa4 began passing start > 0 to indexed draw calls rather
than keeping start at 0 and manually advancing ib->ptr. This should
work fine, however, there have been instances of software fallbacks
not handling things right.
vbo_sw_primitive_restart had a bug where it was ignoring "start" and
always calling find_sub_primitives with start = 0 and end = ib->count.
This meant that when start > 0, it was analyzing the wrong part of the
index buffer when finding subprimitives.
In theory, each _mesa_prim can have a different "start" value. But
the code only calls find_sub_primitives once, because it wants to
map, analyze, and unmap the index buffer before calling ctx->Draw,
as some drivers don't support drawing with the index buffer mapped.
To handle this, we break vbo_sw_primitive_restart calls into sections
where "start" matches across all the primitives, similar to how I
handled the issue in tnl in commit bd6120f562.
In the common case, start matches and we handle it in one pass anyway.
Fixes Piglit's primitive-restart VBO_COMBINED_VERTEX_AND_INDEX test
and KHR-GL33.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
on Intel Ivybridge and older (which don't do arbitrary cut indices).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4052
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9417>
Use debug_printf more consistently, normalize formatting a bit, and
trace a few more places you're likely to care about.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9436>
Applications rarely require them, but this improves fossil-db replay time.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9411>
This register is only 32bits.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1952fd8d2c ("anv: Implement VK_EXT_conditional_rendering for gen 7.5+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9428>
Instead of lowering it in NIR, use the lookup tables as inputs to a
second-order Taylor expansion. shader-db results aren't amazing but keep
in mind this is without backend CSE yet.
total instructions in shared programs: 115913 -> 115707 (-0.18%)
instructions in affected programs: 3151 -> 2945 (-6.54%)
helped: 12
HURT: 0
Instructions are helped.
total nops in shared programs: 84045 -> 84041 (<.01%)
nops in affected programs: 1571 -> 1567 (-0.25%)
helped: 1
HURT: 7
Inconclusive result (value mean confidence interval includes 0).
total clauses in shared programs: 20498 -> 20489 (-0.04%)
clauses in affected programs: 188 -> 179 (-4.79%)
helped: 6
HURT: 0
Clauses are helped.
total quadwords in shared programs: 90395 -> 90291 (-0.12%)
quadwords in affected programs: 2287 -> 2183 (-4.55%)
helped: 12
HURT: 0
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
Useful for representing -0 in transcendental sequences matching the
blob.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
With these changes the shader code is visible in RGP.
Vk pipeline feature is emulated using si_update_shaders: when shaders are
updated we compute a sha1 of their code and use it as a pipeline hash.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
For radeonsi the shaders don't live in the same BOs, so they're
unlikely to be less that 0x1000 bytes apart.
So this commit bumps the threshold to 0x10000 and warns once
when hitting it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
We were checking that the previous instruction doesn't write flags,
but we also need to check it doesn't read them.
Fixes: 1784dd22a3 ('broadcom/compiler: pipeline smooth ldvary sequences')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9431>
When we were only able to pipeline smooth varyings, if we had to disable
ldvary pipelining in the middle of a sequence it would stay disabled for
the rest of the program, to prevent us from prioritizing scheduling of
ldvary instructions that we would not be able to pipeline effectively.
Now that we can pipeline all ldvary sequences we can change this.
This change re-enables ldvary pipelining upon finding the next
ldvary in the program in the hopes that we can continue pipelining
succesfully. To do this, we track the number of ldvary instructions we
emitted so far and compare that to the number of inputs in the fragment
shader we are scheduling. This also allows us to simplify our ldvary
tracking at nir to vir time, since that is all now handled in the QPU
scheduler.
total instructions in shared programs: 13817048 -> 13810783 (-0.05%)
instructions in affected programs: 810114 -> 803849 (-0.77%)
helped: 4843
HURT: 591
Instructions are helped.
total max-temps in shared programs: 2326612 -> 2326300 (-0.01%)
max-temps in affected programs: 4689 -> 4377 (-6.65%)
helped: 285
HURT: 7
Max-temps are helped.
total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%)
sfu-stalls in affected programs: 207 -> 130 (-37.20%)
helped: 120
HURT: 42
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%)
inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%)
helped: 4899
HURT: 590
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>
Use atomic operations to avoid competition. In addition,
since sub_ctx_id 0 has been used by default, sub_ctx_id
should start from 1.
Signed-off-by: Xin He <hexin.op@bytedance.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9406>
The clear code is 0xCC which means CMASK isn't fast-cleared.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9392>
It makes more sense to try to enable TC-compat if the image has HTILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9405>
Not copying all the scissors caused
dEQP-VK.pipeline.extended_dynamic_state.two_draws_dynamic.2_viewports
to fail but thah test pointlessly relies on KHR_multiview (cts issue
filed).
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9422>
Prior to SKL, the mipmaps for 3D surfaces are laid out in a way
that make it impossible to represent in the way that
VkSubresourceLayout expects. Since we can't tell users how to make
sense of them, don't report them as available.
"Fixes" dEQP-VK.image.subresource_layout.3d.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9419>
This implements most of the remaining u_threaded_context support. Most
of the heavy lifting was done in the previous patches which fixed things
up for the new thread safety requirements. Only a few things remain.
u_threaded_context support can be disabled via an environment variable:
GALLIUM_THREAD=0
On Felix's Tigerlake with the GPU at fixed frequency, enabling
u_threaded_context improves performance of several games:
- Civilization VI: +17%
- Shadow of Mordor: +6%
- Bioshock Infinite +6%
- Xonotic: +6%
Various microbenchmarks improve substantially as well:
- GfxBench5 gl_driver2: +58%
- SynMark2 OglBatch6: +54%
- Piglit drawoverhead: +25%
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
pipe->transfer_map can be called from u_threaded_context's thread
rather than the driver thread. We need to use two different slab
allocators, one for each thread. transfer_unmap, on the other hand,
is only ever called from the driver thread.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
u_threaded_context requires various objects to inherit from a new
threaded_foo base class rather than directly from pipe_foo. This
patch does most of the mechanical changes required for that.
It also initializes the new threaded_resource fields.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
When we enable u_threaded_context, the pipe->create_*_state hooks
(precompile variants) are going to be called from one thread, while
iris_update_compiled_shaders (on-the-fly variants) are going to be
called from a driver thread. BLORP shaders also happen from
clear, blit, and so on in the driver thread.
u_upload_mgr isn't thread-safe, so use an uploader for each purpose.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
This enables us to replace the backing storage of resources that have
been used as stream output targets, in case we're invalidating their
entire contents. This can avoid stalls. We simply hadn't supported it
because it was going to be tricky to re-emit 3DSTATE_SO_BUFFER without
screwing up "reset offset to zero" vs. "keep appending". But that
should be working fine with the previous patch's refactor.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
The previous mechanism was a bit fragile. We stored the zero offset
in the pre-baked packet, and used an flag to override 0xFFFFFFFF
(append) offsets until our first emit - then prohibited anyone from
trying to re-emit the packet by flagging IRIS_DIRTY_SO_BUFFERS,
because that would re-emit the version with the zeroing of the offset.
Now, we always store 0xFFFFFFFF in the pre-baked packet, and use a
flag to override it to zero on the first emit. That way, we can
re-emit that packet at any time, and it'll just keep appending.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
In the future, Marek is planning to make u_threaded_context call
create_stream_output_target() from a different thread than the main
driver thread, which means that we can't safely use uploaders there.
To prepare for this eventual future, just defer the allocation of
the offset BO 'til later. It's a very small amount of overhead.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
With u_threaded_context, create_surface and create_sampler_view will
be called from a different thread than the driver thread. They aren't
allowed to access the context, which means that they can't use the
uploaders there to upload our SURFACE_STATE entries.
Thanks to backing-storage replacement and iris_rebind_buffer, we already
reworked things to maintain CPU-side copies of the SURFACE_STATE entries
and added the ability to upload or re-upload them later. So we can skip
the upload at object creation time, and add a simple resource-is-NULL
check at binding table upload time to ensure that they get uploaded by
the time we need them. (They might get uploaded earlier due to rebinds
or clear color updates, but this is the last moment to do so.)
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
It shouldn't affect bound program state, and the current context state
shouldn't be relevant for shader creation precompiles anyway (level load
isn't going to have the eventual set of sampler views bound when you go to
draw with that shader).
Closes: #4306
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
You always have to populate the key with the right texture swizzles, even
if textures haven't changed since binding a new shader.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
We can compose the swizzles at sampler view creation time, saving
recompiles on texture format changes.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
This allows for virgl guests to expose GL_NVX_gpu_memory_info and
GL_ATI_meminfo when the extensions are supported on the host.
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9337>
mi_ is already a unique prefix in Mesa so the gen_ isn't really gaining
us anything except extra characters. It's possible that MI_ may
conflict a tiny bit with GenXML but it doesn't seem to be a problem
today and we can deal with that in the future if it's ever an issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9393>
Writes to global memory should not be moved over discard,
otherwise we could have unintended side-effects or lack of
side-effects where they should be observed.
Fixes tests:
dEQP-VK.rasterization.frag_side_effects.color_at_beginning.kill
dEQP-VK.rasterization.frag_side_effects.color_at_end.kill
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9365>
Disabling VM_ALWAYS_VALID actually hurts more than it helps
after doing more testing. Managing the global BO list in userspace
is really costly and make a bunch of games CPU bound.
I think re-enabling VM_ALWAYS_VALID is a step in the right direction.
This reverts commit 6ac6e2fbfb.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9341>
We had an optimization in place to skip a unifa write if the address
happens to be right after the last ldunifa read address, but we can
take this further and update the unifa address by emitting ldunifa
instructions if needed to skip a unifa write that is close enough.
This is because a unifa write involves 4 cycles: 1 for the write
and 3 delay slots before we can emit the first ldunifa.
So if we have code like this:
unifa addr + 0
ldunifa.r0
unifa addr + 12
ldunifa.r1
In practice we end up with QPU like this:
unifa addr + 0
nop
nop
nop
ldunifa.r0
unifa addr + 12
nop
nop
nop
ldunifa.r1
And with this patch we get:
unifa addr + 0
nop
nop
nop
ldunifa.r0 <--- reads offset 0
ldunifa.- <--- reads offset 4
ldunifa.- <--- reads offset 8
ldunifa.r1 <--- reads offset 12
Of course, QPU scheduling might find ways to fill the NOPs to some
extent and remove some of the gains, but generally speaking, this is
still usually a win.
Going by shader-db results, allowing the next unifa address to be up
to 12 bytes after the address resulting from the last ldunifa read
shows the best results:
total instructions in shared programs: 13817048 -> 13812202 (-0.04%)
instructions in affected programs: 602701 -> 597855 (-0.80%)
helped: 1750
HURT: 760
Instructions are helped.
total uniforms in shared programs: 3795485 -> 3793200 (-0.06%)
uniforms in affected programs: 43930 -> 41645 (-5.20%)
helped: 898
HURT: 0
Uniforms are helped.
total max-temps in shared programs: 2326612 -> 2326621 (<.01%)
max-temps in affected programs: 651 -> 660 (1.38%)
helped: 10
HURT: 21
Inconclusive result (value mean confidence interval includes 0).
total sfu-stalls in shared programs: 30942 -> 30906 (-0.12%)
sfu-stalls in affected programs: 627 -> 591 (-5.74%)
helped: 186
HURT: 158
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13847990 -> 13843108 (-0.04%)
inst-and-stalls in affected programs: 601404 -> 596522 (-0.81%)
helped: 1747
HURT: 757
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
We can't remove unused ldunifa that are not the first or last
in a sequence, but we can still ignore their destination
to reduce register pressure.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
With threaded-context we won't have a chance to apply the workaround in
the backend driver. But the previous commit moves it to a driconf
configured workaround in mesa/st, so we can drop this now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9316>
All the ir3 gens had the same thing, time to move it out into a shared
helper.
The keeping the storage in fdN_context is to avoid namespace clashes
between ir3 and ir2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
I had forgotton on which gens these where used on (which is important if
you need to know which shader stages use these).. expand the comments a
bit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
Found this on top of Karol's patches but it seems like it can just be
applied to master.
Helps with some cases of
kernel_image_methods/test_kernel_image_methods 2Darray
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9381>
TGSI has these nice labels for us for where to jump in this case, let's
use them. Improves piglit arb_shader_image_load_store-shader-mem-barrier
runtime massively, though not enough to make the test really reasonable to
run.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9347>
FP16 rendering is supported on all gen4 hardware.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9379>
This has 3 advantages:
- It's SMEM.
- Multiple single component loads are merged into 1 multi-dword load by LLVM.
- The result is always packed for packed instructions.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9395>
The search helps must *never* modify the instruction passed in, so let
the compiler enforce this.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9378>
Available since Vulkan 1.0, and in fact already wired up, just not
advertised. It looks like we could make this dynamic state but this
works for now.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9371>
Some games declare the wrong format, so we might want to disable this
optimization in that case.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: e4d75c22 ("nir/opt_shrink_vectors: shrink image stores using the format")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9229>
tu_shader should be freed after pipeline is successfully created.
Fixes tests:
dEQP-VK.api.object_management.alloc_callback_fail.compute_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9364>
Computing the expected buffer size isn't reliable on GFX10+ because
DROPPED_CNTR returns weird results.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9367>
DROPPED_CNTR isn't reliable and might still report non-zero if the
SQTT buffer isn't full. Checking if the number of written bytes by
the hw is equal to the SQTT buffer size seems reliable.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9367>
This fixes a GPU hang on my Sienna because the number of SE is
less than the maximum, and SE #1 is disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9370>
This appeared in softpipe's image operations, since NIR always uses
4-component values for the coords, while the GLSL IR only has 2 components
for a 2D image (for example).
arb_shader_image_load_store-shader-mem-barrier (which times out in CI and
spends its time inside of tgsi_exec) was spending 4/51 of its instructions
on moving these undefs around.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9345>
I was suspicious that some entries in dri2_format_table (in
dri_helpers.c) had this field set incorrectly. It seemed like
DRM_FORMAT_ABGR16161616F and DRM_FORMAT_XBGR16161616F should have been 8
instead of 4. Upon digging I found that nothing uses the field. Fix
code by removing it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9354>
both bits will have been flagged at this point in order to indicate
that the aspects will be cleared "at some point" during the loop, but
when actually iterating through the pending clears, only the bits set
in the clear call should be applied
Fixes: 5c629e9ff2 ("zink: defer pipe_context::clear calls when not currently in a renderpass")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9366>
Remove the useless driCheckOption calls. They always
succeed.
As a result the intended behaviour for thread_submit
was not working (different default depending on the gpu
used). Add a comment to fix that in the future.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>