Each shader stage has its own "early preamble" flag.
Early preamble is likely an optimization to hide some of latency
when loading UBOs into consts in the preamble.
Early preamble has the following limitations:
- Only shared, a1, and consts regs could be used
(accessing other regs would result in GPU fault);
- No cat5/cat6, only stc/ldc variants are working;
- Values writen to shared regs are not accessible by the rest
of the shader;
- Instructions before shps are also considered to be a part of
early preamble.
Note, for all shaders from d3d11 games blob produced preambles
compatible with early preamble mode.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15901>
This reverts commit 68ef895674.
When trying out transcode_astc=true with BPTC on Asphalt 9, we observed
very poor image quality - to the point that basic UI icons were blocky,
and buttons with a black border had smeared pixels on the edges. Using
DXT5 had no such issues.
I originally suspected there was a bug in the BPTC encoder, but I now
believe the issue is deeper than that. The commit that introduced the
encoder, 17cde55c53, says:
"The compressor is written from scratch and takes a very simple
approach. It always uses a single mode of the BPTC format (4 for
unorm and 3 for half-floats) and picks the two endpoints by dividing
the texels into those which have more or less than the average
luminance of the block and then calculating an average color of the
texels within each division.
It's probably not really sensible to try to use BPTC compression at
runtime because for example with the Nvidia offline compression tool
it can take in the order of an hour to compress a full-screen image.
With that in mind I don't think it's worth having a proper compressor
in Mesa and this approach gives reasonable results for a usage that
is basically a corner case."
In other words, the reason our BPTC compressor was so fast is that it
only implements one of the modes and does a low quality approximation.
This honestly should probably be improved somewhat, but the original
use case was for online-compression, the uncommon but mandatory OpenGL
feature where you can supply uncompressed data and trust the driver to
compress it for you (at unknown and uncontrolled quality and speed).
Unfortunately, the compressor as it stands is simply not usable for
transcoding ASTC data where we want to preserve the underlying image
quality as much as possible.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16566>
The spec says the number of texels must be clamped to the value of
GL_MAX_TEXTURE_BUFFER_SIZE.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Mihai Preda <mhpreda@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16480>
The experimental support has been merged for months already, but has
been left unexercised in CI even though dEQP has a mesh_shader.nv
category.
This commit enables the experimental support, which should bring
additional testing \o/.
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16550>
When caching NIR, it's cached as a raw object because we cache the
serialized NIR. When it's then loaded from the disk cache later, we
fail to deserialize it because raw objects are a special case. There
are two callers of vk_pipeline_cache_object_deserialize(), one of which
has a special case for raw objects and the other is called only when
we've checked that it isn't a raw object. The special cases are
pointless; raw objects should deserialize themselves.
Fixes: 591da98779 ("vulkan: Add a common VkPipelineCache implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16281>
without dynamic vertex input, pipeline vertex state must be recalculated
if buffer strides change or the enabled buffer mask changes in order
to accurately handle dynamic state stride VUs
cc: mesa-stable
fixes:
spec@!opengl 1.1@array-stride
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
these might hit u_blitter later, which will require rt usage
to handle texture views, so add rt usage now to avoid failing later
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
these caches ignore pNext from the create info since the pNext cannot
affect the eventual object that is created, so comparing it will break
the hash table
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
this is one of those cases where some bizarro format is being created
for sampling only, but gallium blasts out all the bind flags at once
trust that we're not going to do anything too crazy and let surface
usage pruning handle the rest
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
the idea here is that if a resource is intended to be used solely as a rendertarget
and can't be blitted to or drawn to, then resource creation should fail
but if the resource might be e.g., a texture, then it can probably hit the subdata
path and be fine
Fixes: 37ac8647fc ("zink: reject resource creation if format features don't match attachment")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16563>
this otherwise makes it impossible to accurately do hardware driver
fallback, as zink will create a sw context instead of failing, superceding
the actual sw driver priorities
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16572>
gallium can't directly support vertex attribute instance rate zero, since
the instance rate is also used to determine if the data is per-vertex or
per-instance in the first place (hence divisor zero meaning the data is
per vertex).
While it's an optional feature for VK_EXT_vertex_attribute_divisor, some
apps require it to work (it's a standard d3d10 feature and widely
supported), hence translate it away as MAX_UINT32 divisor instead (which
at this point probably makes more sense than to change the gallium
interface), which should work all the same.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16526>
If this == NULL, we'll get a crash which is pretty much the same thing
when it comes to ease of debugging. This fixes a giant pile of warnings
with GCC 12 of the form:
../src/compiler/glsl/ir.h: In member function ‘ir_dereference* ir_instruction::as_dereference()’:
../src/util/macros.h:149:30: warning: ‘nonnull’ argument ‘this’ compared to NULL [-Wnonnull-compare]
149 | #define assume(expr) ((expr) ? ((void) 0) \
| ~~~~~~~~^~~~~~~~~~~~~~
150 | : (assert(!"assumption failed"), \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
151 | __builtin_unreachable()))
| ~~~~~~~~~~~~~~~~~~~~~~~~~
../src/compiler/glsl/ir.h:150:7: note: in expansion of macro ‘assume’
150 | assume(this != NULL); \
| ^~~~~~
../src/compiler/glsl/ir.h:160:4: note: in expansion of macro ‘AS_BASE’
160 | AS_BASE(dereference)
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16558>