This way it properly compiles on Visual Studio.
Fixes: 145444d265 "anv: Move multialloc to common code"
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9506>
This patch renames all macros with "GEN_" prefix defined in
common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
This patch renames functions, structures, enums etc. with "gen_"
prefix defined in common code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
Changes in this patch include:
- Rename all files in src/intel/common path
- Update the filenames used in source and build files
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
Starting with d0d039a4d3, we emit writes to the push constant chunk
of the payload to stomp out-of-bounds data to zero for Vulkan. Then, in
369eab9420, we started emitting shader preamble code for emulated
push constants on Gen12.5 parts. In either of these cases, we can run
into issues if we don't have a proper live range for some of the payload
registers where they get used for something and then smashed by our push
handling code. We've not seen many issues with this yet because it only
happens when you have dead push constants.
Fixes: d0d039a4d3 "anv: Emit pushed UBO bounds checking code..."
Fixes: 369eab9420 "intel/fs: Emit code for Gen12-HP indirect..."
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
It's easier to compare with the HW docs than a pile of hex.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
In cases where the alpha coverage is enabled but the color attachment is
either unused or absent there should be a dummy mrt to make the draw behave
correctly.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Yannik Marek <yannik@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8952>
If a VkRenderPassInputAttachmentAspectCreateInfo is provided, we use the
aspects specified there. Otherwise, we default to every aspect in the
format. For attachments which are not input attachments, aspectMask is
left zero.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
The Android ones we put in anv_android.c. Maybe one day we'll want a
vk_android.h to put some common Android stuff but, for now, let's keep
it contained to ANV's android code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
The variable-length stack allocations are causing issues with ubsan when
the array size is zero. Also, a heap allocation is probably safer.
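The pattern can be sketched roughly as follows. This is a minimal illustration, not the driver code: `sum_doubled` and its scratch array are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: doubles each input into a scratch array and sums it.
 * A VLA version would declare `uint32_t tmp[count];`, which is undefined
 * behaviour for count == 0 and is the kind of thing ubsan reports. */
static int
sum_doubled(const uint32_t *in, uint32_t count, uint64_t *out_sum)
{
   assert(out_sum != NULL);
   *out_sum = 0;

   /* Nothing to allocate for an empty input. */
   if (count == 0)
      return 0;

   uint32_t *tmp = malloc((size_t)count * sizeof(*tmp));
   if (tmp == NULL)
      return -1;   /* unlike a stack overflow, this failure can be reported */

   for (uint32_t i = 0; i < count; i++)
      tmp[i] = in[i] * 2;
   for (uint32_t i = 0; i < count; i++)
      *out_sum += tmp[i];

   free(tmp);
   return 0;
}
```

The zero-size case is handled before any allocation, so no zero-length array is ever declared.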
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8857>
- no 3D and cube textures
- no mipmapping
- no border color
- image_sample is the only supported opcode with a sampler (behaves like _lz)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389>
The maximum value which OPC_GETSIZE could return for one dimension
is 0x007ff0, however, a sampler buffer can be much bigger.
The blob uses OPC_GETBUF for them.
Fixes tests:
dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.1048576
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9391>
Otherwise, we won't be able to use OPC_GETBUF to get their size.
After this change we could also get rid of the hack for OPC_GETSIZE
which scaled the size for texture buffers.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9391>
It appears that storage for varyings in a wave has an upper
limit of wavesize * max_a831 where max_a831 is 64.
Exceeding the limit seems to force the GPU to reduce primitives
processed per wave, at least calculations make sense with
such interpretation.
With blob SP_HS_WAVE_INPUT_SIZE never exceeds 64 and setting
it to 65 in freedreno leads to a hang.
Copied from freedreno commit e5499ca2.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8187>
This fixes the assembly for many scenarios where you want to use shader
replacement.
Note: unfortunately this leaks the identifier string created while
lexing, but I couldn't find a way to avoid leaking it except for
bringing in ralloc or something (which would be way more complicated).
The only other place doing something similar in mesa is the glsl parser,
which is using ralloc (actually a linear context).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
Trying to specify a floating-point value in a @const line would result
in it getting interpreted as a FLUT value and failing parsing. Fix this
by making the various FLUT tokens include the surrounding parentheses.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9463>
Mesa fixed pipeline texture loading on programmable pipeline hardware emits
a generic fragment shader program which contains gl_TexCoord.xyzw as a vec4
and then expects to configure the varying assignments to the shader in the
pipeline command stream, to select what is wired to the XYZW fragment shader
inputs.
This gl_TexCoord.xyzw is turned into texture load with projection (TGSI TXP
opcode, similar for NIR). Texture load with projection does not exist in the
Vivante GPU as a dedicated opcode and is emulated. The shader program first
divides texture coordinates XYZ by projector W and then applies regular TEX
opcode to load the texture (i.e. TEX(gl_TexCoord.xyzw/gl_TexCoord.wwww)).
For point sprites, XY are the point coordinates from VS, Z=0 and W=1, always.
The Vivante GPU can only configure varying to be either of -- point coord X,
point coord Y, used, unused -- which covers XYZ, but not W. Z is fine because
unused means 0.
W used to be 0 too before this patch and that led to division by 0 in shader.
The only known way to solve this is to set Z=0, W=1 in the shader program
itself if the point sprites are enabled. This means we have to generate a
special shader variant which does extra SET to set the W=1 in case the point
sprites are enabled.
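The effect of the extra SET can be sketched on the CPU side as follows. This is only an illustration of the arithmetic; `struct vec4`, `project_texcoord` and `texcoord_for_variant` are hypothetical names and do not correspond to driver or compiler functions.

```c
#include <assert.h>
#include <stdbool.h>

struct vec4 { float x, y, z, w; };

/* Emulated texture projection: divide XYZ by W before the plain TEX lookup. */
static struct vec4
project_texcoord(struct vec4 tc)
{
   assert(tc.w != 0.0f);   /* W == 0 is the division-by-zero case */
   struct vec4 r = { tc.x / tc.w, tc.y / tc.w, tc.z / tc.w, 1.0f };
   return r;
}

/* Hypothetical point-sprite variant: keep the point coordinate in XY and
 * force Z = 0, W = 1 in the shader itself, because the varying setup can
 * only provide X, Y or "unused" (which reads as 0). */
static struct vec4
texcoord_for_variant(struct vec4 tc, bool point_sprite)
{
   if (point_sprite) {
      tc.z = 0.0f;
      tc.w = 1.0f;
   }
   return project_texcoord(tc);
}
```

With W forced to 1, the projection divide leaves the point coordinates unchanged.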
In case of TGSI, emitting the SET.TRUE opcode permits setting W=1 without
allocating additional constants. With NIR, use nir_lower_texcoord_replace()
to lower TEXn to PNTC, which sets Z=0, W=1, and let NIR optimize the shader.
Note that nir_lower_texcoord_replace() must be called before input linking
is set up, as it might add new FS input.
Also note that it should be possible to simply drop PIPE_CAP_POINT_SPRITE
in the long run, ST would then apply the same optimization pass, but that
option is so far misbehaving. And for etnaviv TGSI this is not applicable
yet.
This fixes neverball point sprites (exit cylinder stars) and eglretrace of
gl4es pointsprite test:
https://github.com/ptitSeb/gl4es/blob/master/traces/pointsprite.tgz
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8618>
Fixes this assert in debug builds:
in __GI___assert_fail (assertion=0x7ffff731f66b "util_cpu_caps.nr_cpus >= 1", file=0x7ffff731f650 "../src/util/u_cpu_detect.h", line=116,
function=0x7ffff7323280 <__PRETTY_FUNCTION__.11654> "util_get_cpu_caps") at assert.c:101
in util_get_cpu_caps () at ../src/util/u_cpu_detect.h:116
in _mesa_float_to_float16_rtz (val=0) at ../src/util/half_float.h:93
in util_format_r16g16b16a16_float_pack_rgba_float (dst_row=0x7fffffffbdc0 "", dst_stride=0, src_row=0x7fffffffbf90, src_stride=0, width=1, height=1)
at src/util/format/u_format_table.c:13459
in util_format_pack_rgba (format=PIPE_FORMAT_R16G16B16A16_FLOAT, dst=0x7fffffffbdc0, src=0x7fffffffbf90, w=1) at ../src/util/format/u_format.h:1525
in util_pack_color (rgba=0x7fffffffbf90, format=PIPE_FORMAT_R16G16B16A16_FLOAT, uc=0x7fffffffbdc0) at ../src/gallium/auxiliary/util/u_pack_color.h:432
in v3dv_get_hw_clear_color (color=0x7fffffffbf90, internal_type=6, internal_size=8, hw_color=0x7fffffffbf10) at ../src/broadcom/vulkan/v3dv_cmd_buffer.c:1241
v2: move call from physical device to instance init.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9408>
This restores many of the hurt shaders from the previous patch at the
expense of re-adding ldvary tracking in the scheduler.
total instructions in shared programs: 13760415 -> 13755738 (-0.03%)
instructions in affected programs: 1207560 -> 1202883 (-0.39%)
helped: 5080
HURT: 1731
Instructions are helped.
total max-temps in shared programs: 2322991 -> 2322828 (<.01%)
max-temps in affected programs: 5063 -> 4900 (-3.22%)
helped: 229
HURT: 108
Max-temps are helped.
total sfu-stalls in shared programs: 31827 -> 31545 (-0.89%)
sfu-stalls in affected programs: 478 -> 196 (-59.00%)
helped: 304
HURT: 21
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13792242 -> 13787283 (-0.04%)
inst-and-stalls in affected programs: 1220856 -> 1215897 (-0.41%)
helped: 5162
HURT: 1697
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
We get optimal ldvary pipelining by doing the following:
1) Carefully merge a paired ldvary into the previous instruction when
possible.
2) When the above succeeds, flag the ldvary as scheduled immediately so
we can merge one of its children into the current instruction.
3) When scheduling ldvary sequences, only pick up instructions that are
part of the sequence to avoid picking up something that prevents
successful pipelining.
This patch skips 3), accepting some hurt shaders in exchange for better
scheduling flexibility during ldvary sequences. Besides eliminating most
of the code dedicated to special handling ldvary sequences, this also
usually allows us to produce better code by merging instructions that are
unrelated to ldvary sequences into the ldvary sequences, which is
particularly effective to fill up the gaps produced when scheduling the
first and last ldvary sequences as well as the gaps produced by flat
and noperspective varyings sequences that don't have both mul and add
instructions.
Notice that there are some hurt shaders, because sometimes the extra
scheduler flexibility can lead to picking up instructions that will
break a sequence without compensating for that, typically an ldunif
that prevents us from doing the fixup for a follow-up ldvary. We will
try to correct some of these cases with the next patch.
total instructions in shared programs: 13786037 -> 13760415 (-0.19%)
instructions in affected programs: 3201387 -> 3175765 (-0.80%)
helped: 16155
HURT: 4146
Instructions are helped.
total max-temps in shared programs: 2324834 -> 2322991 (-0.08%)
max-temps in affected programs: 22160 -> 20317 (-8.32%)
helped: 1340
HURT: 103
Max-temps are helped.
total sfu-stalls in shared programs: 30685 -> 31827 (3.72%)
sfu-stalls in affected programs: 782 -> 1924 (146.04%)
helped: 253
HURT: 1416
Inconclusive result.
total inst-and-stalls in shared programs: 13816722 -> 13792242 (-0.18%)
inst-and-stalls in affected programs: 3171642 -> 3147162 (-0.77%)
helped: 15331
HURT: 4179
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
These checks depend on prev_inst being set, so move them down below
with all the other checks with the same requirement.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9471>
In case a new gl_PointCoord shader input is created, increment shader
input count and set valid driver_location to the new input variable,
otherwise the input gets aliased to input 0 and shows up in NIR_PRINT
output as whatever shader input 0 is instead of gl_PointCoord. Also
set the input as used, otherwise it might get removed.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9214>
Future patches for VK_EXT_image_drm_format_modifier will, in some cases,
place the aux surface and fast clear state into a driver-private bo.
This increases the complexity of image memory layout to such a degree
that, to maintain sanity, we must improve how we track the layout.
Define new types:
- anv_image_memory_range
- anv_image_memory_binding
- anv_image_binding
Delete many fields in anv_image (and its children), and replace them
with the new types.
This patch does not change how anv_image tracks (or, rather, does not
track) the memory of gen12 implicit ccs. We should probably do that, but
that's left as a future exercise.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
It calculates the address to a surface or to metadata in the image.
Refactor only. No intended change in behavior.
This patch prepares for, and reduces much noise in, the upcoming patch
that rewrites image memory tracking.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
If the image is disjoint, there is no reason to calculate image-global
memory requirements. Instead, only per-plane memory requirements are
needed.
Also, delete a large duplicate comment.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
Current code checks for surface validity with `surface.isl.size_B > 0`.
Replace the checks with anv_surface_is_valid().
This prepares for adding new members to anv_surface that may
be accidentally used as a validity-indicator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
1. Don't compare bo->size to image->size. An upcoming patch replaces
anv_image::size with complicated stuff. Instead, properly query the
required size with anv_GetImageMemoryRequirements.
2. Require the bo to fit the *aligned* image size.
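The second point can be sketched as below. The names are hypothetical; `req_size` and `req_alignment` stand in for whatever anv_GetImageMemoryRequirements reports, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round value up to a power-of-two alignment. */
static uint64_t
align_u64_sketch(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* The bo must hold the image size rounded up to the required alignment,
 * not just the raw size. */
static bool
bo_fits_image(uint64_t bo_size, uint64_t req_size, uint64_t req_alignment)
{
   return bo_size >= align_u64_sketch(req_size, req_alignment);
}
```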
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
The calculation of the subsurfaces' memory requirements assumed that the
image was disjoint if the image was created with
VK_IMAGE_CREATE_DISJOINT_BIT. But the Vulkan spec also requires that the
VkFormat be multi-planar.
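A rough sketch of the corrected condition, using a hypothetical flag value and a plane count in place of the real Vulkan enums and format queries:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CREATE_DISJOINT_BIT 0x1u   /* hypothetical flag value */

/* Treat the image as disjoint only when the create flag is set AND the
 * format has more than one plane. */
static bool
image_is_disjoint_sketch(uint32_t create_flags, uint32_t format_plane_count)
{
   assert(format_plane_count >= 1);
   return (create_flags & SKETCH_CREATE_DISJOINT_BIT) != 0 &&
          format_plane_count > 1;
}
```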
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
The name anv_image_plane::bo_is_owned will be made ambiguous by the
implementation of VK_EXT_image_drm_format_modifier, which may bind the
plane to multiple bo's.
Also, bo_is_owned was set if and only if the image was imported from
gralloc, and it was set only on the first plane. Therefore, let's rename
the field to from_gralloc, and move it to the toplevel of anv_image.
v2: Fix build in anv_android.c.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8097>
the maximum allowable runtime version of vk can be computed by MIN(instance_version, device_version)
despite this, instances and devices can be created using the maximum version available
for each respective type. the restriction is applied only at the point of
enabling/applying features and extensions, meaning that to correctly handle this,
zink must:
1. create an instance using the maximum allowable version
2. select a physical device using the instance
3. compute MIN(instance_version, device_version)
4. only now begin to enable/use features requiring vk 1.1+
ref #4392
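Step 3 and the gate in step 4 can be sketched as follows. The helper names are hypothetical; only the version packing mirrors Vulkan's VK_MAKE_VERSION layout (major in bits 22 and up, minor in bits 12 to 21, patch in the low 12 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as VK_MAKE_VERSION. */
#define SKETCH_MAKE_VERSION(major, minor, patch) \
   ((((uint32_t)(major)) << 22) | (((uint32_t)(minor)) << 12) | ((uint32_t)(patch)))

/* The version that features and extensions may rely on at runtime. */
static uint32_t
usable_api_version(uint32_t instance_version, uint32_t device_version)
{
   assert(instance_version != 0 && device_version != 0);
   return instance_version < device_version ? instance_version : device_version;
}

/* Gate for anything that needs Vulkan 1.1 or newer. */
static int
can_use_vk11_features(uint32_t instance_version, uint32_t device_version)
{
   return usable_api_version(instance_version, device_version) >=
          SKETCH_MAKE_VERSION(1, 1, 0);
}
```

Because the packed value orders versions numerically, a plain integer minimum is enough.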
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9479>
vkGetPhysicalDeviceProperties2 is not allowed to be used with a 1.0 device
because it's a vulkan 1.1 function.
Closes: #4396
Fixes: 38ce8d4d ("vulkan/device_select: Stop using device properties 2.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9462>
With this command implemented messages emitted by
applications via glDebugMessageInsert will be forwarded
to the host.
v2: - remove check for feature in encode function, this
is covered in the state tracker (Rohan)
- reorder parameters in the encode function to the
order of the emit callback
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9433>
nir_lower_idiv(..) creates during its lowering isub instructions.
Move nir_lower_idiv(..) before the opt loop to have a chance to
optimize/lower isub away. Also drop the halti dependency to
make it easier to follow.
This fixes the following assert on GC3000:
Unhandled ALU op: isub
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9447>
the full-variable outputs can be skipped, leaving only the varyings which
actually need explicit emission due to packed layouts or whatever
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
if the number of explicit xfb outputs or new varyings added to the existing size
of the slot map would cause an overflow, we have to force a new slot map to
ensure that everything fits
this means iterating all the stages which can produce new varyings and calculating
all the slots required in order to compare against the max size available
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
if an entire variable is being dumped into an xfb buffer, there's no need
to create an explicit xfb variable to copy the value into, and instead
the xfb attributes can just be set normally on the variable
this doesn't work for geometry shaders because outputs are per-vertex
fixes all KHR-GL46.enhanced_layouts xfb tests
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
running nir_lower_io_arrays_to_elements_no_indirects for only some stages
breaks location-setting for the stages which don't run it when
e.g., dmat2x3 variables are sometimes split across locations and
sometimes jammed into a single location (TCS I'm looking at you)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
if we get some crazy matrix types in here then we need to ensure that
we accurately unwrap them and copy the components
fixes KHR-GL46.enhanced_layouts.xfb_stride
Fixes: 1b130c42b8 ("zink: implement streamout and xfb handling in ntv")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
this was added during review, but it was never correct and just crashes
valid cases like streamout from a mat3x4 type
Fixes: b6f8f3a3ba ("zink: fix streamout for clipdistance")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9271>
This will take effect in future patches when we are able to query the
kernel to set device->vram.size to a non-zero size.
Builds on Sagar's ("anv: Query memory region info") patch, and
re-organizes things as recommended by Lionel (and Jason).
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9324>
Just treat the llc and non-llc paths as separate cases. This will also
help when adding the local memory setup.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9324>
When tidying up this section in 1e5b09f42f ("spirv: Tidy some repeated
if checks by using a switch statement.") the break got lost. It is
not a real problem because the next case just breaks, but it is better to
have it explicitly here instead of a FALLTHROUGH.
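The shape of the fix, sketched with a hypothetical enum and handler rather than the actual SPIR-V code:

```c
#include <assert.h>

enum sketch_kind { KIND_A, KIND_B, KIND_C };

static int
handle_kind(enum sketch_kind kind)
{
   int handled = 0;

   switch (kind) {
   case KIND_A:
      handled = 1;
      break;
   case KIND_B:
      handled = 2;
      break;   /* explicit, even though KIND_C below only breaks */
   case KIND_C:
      break;
   default:
      assert(0 && "unknown kind");
      break;
   }

   return handled;
}
```

Without the explicit break the behaviour is the same today, but it would silently change as soon as the following case gained a body.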
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9440>
In nouveau's PBO path with GS support and no VS layer export, we got:
intrinsic store_output (ssa_1, ssa_0) (0, 15, 0, 160, 128) /* base=0 */ /* wrmask=xyzw */ /* component=0 */ /* src_type=float32 */ /* location=0 slots=1 */ /* out_pos */
[...]
vec3 32 ssa_4 = mov ssa_3.xxx
intrinsic store_output (ssa_4, ssa_0) (0, 4, 0, 160, 128) /* base=0 */ /* wrmask=z */ /* component=0 */ /* src_type=float32 */ /* location=0 slots=1 *//* out_pos */
We would decide that the mov's SSA value could be stored directly to the output,
since nothing else used it. However, the store has a writemask, and the
ALU op was stomping over it instead of ANDing with the output decl's
existing writemask.
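The masking step can be illustrated with a small sketch. The function name is hypothetical; the masks use one bit per component (x = 0x1, y = 0x2, z = 0x4, w = 0x8).

```c
#include <assert.h>
#include <stdint.h>

#define WRITEMASK_X   0x1u
#define WRITEMASK_Y   0x2u
#define WRITEMASK_Z   0x4u
#define WRITEMASK_W   0x8u
#define WRITEMASK_XYZ (WRITEMASK_X | WRITEMASK_Y | WRITEMASK_Z)

/* When an ALU result is written straight to an output, the components it
 * may touch are the intersection of its own writemask and the store's
 * writemask, not the ALU writemask alone. */
static uint32_t
alu_dest_writemask(uint32_t alu_writemask, uint32_t store_writemask)
{
   assert((alu_writemask & ~0xfu) == 0 && (store_writemask & ~0xfu) == 0);
   return alu_writemask & store_writemask;
}
```

For the dump above, a vec3 mov combined with a store of wrmask=z may only write z, leaving the xyzw result of the earlier store intact in the other components.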
Fixes: f79f382c81 ("nir_to_tgsi: Store directly to TGSI outputs when possible.")
Closes: #4380
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9376>
This replaces the new_src parameter of nir_ssa_def_rewrite_uses_after()
with an SSA def, and rewrites all the users as needed.
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
This commit replaces the new_src parameter of nir_ssa_def_rewrite_uses()
with an SSA def, removes nir_ssa_def_rewrite_uses_ssa(), and rewrites
all the users as needed.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
This is currently an alias for nir_ssa_def_rewrite_uses but we move all
the instances which used it to write a non-SSA source to the newly named
helper.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9383>
While we're here, add __gen_get_batch_address declarations to more files
because we're about to start requiring it on all GFX 12.5+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9445>
Add logical shift left and right operations support to mi_builder.
v1:
- Add GEN_GEN > 12 check (Jordan Justen)
- Add gen_mi_has_shift function (Jordan Justen)
- Fix commit title (Jordan Justen)
v2 (Jason Ekstrand):
- Add _imm versions of all of them
- Better handle corner-cases in _imm helpers
- Handle the power-of-two limitation for _imm versions
- Add tests
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9445>
The container has already been moved from and hence returns size 0. To get the
correct value, the new instruction container must be used instead.
This was flagged by clang-tidy. The fixed call still triggers the
corresponding diagnostic, hence this change silences it by adding a
redundant clear() after move.
Fixes: 7f1b537304 ("aco: add new NOP insertion pass for GFX6-9")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9432>
This pass needs to run on the last shader in a pipeline writing
gl_Position. In GLES2, that's always the vertex shader, but in ES3.2, it
can be a geometry or tessellation shader. The shared code works the same
in this case, just make the assert more generous.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9444>
Now that texture virtual memory usage is less of a problem,
we can use this workaround permanently.
In the spirit of the API it's certainly not the proper way
of implementing DYNAMIC textures (it seems they are ok
to have hidden copies in driver managed memory, but not have
virtual addressing space reduced), but it makes sense for us,
both performance wise, and to avoid bugs.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9377>
On 32 bits, the virtual address space is sometimes too small for apps.
Textures can hold virtual memory 3 ways:
1) MANAGED textures have a RAM copy of any texture
2) SYSTEMMEM is used to have RAM copy of DEFAULT textures
(to upload them for example)
3) Textures being mapped.
Nine cannot do much for 3). It's up to the driver to really unmap textures
when possible on 32 bits to reduce virtual memory usage.
It's not clear whether on Windows anything special is done for
1) and 2). However, there are clear indications that some effort has
been made on 3) to really unmap when it makes sense.
My understanding is that other implementations reduce the usage
of 1) by deleting the RAM copy once the texture is uploaded
(Dxvk's behaviour is controlled by evictManagedOnUnlock).
The obvious issue with that approach is whether the texture is
read by the application after some time. In that case,
we have to recreate the RAM backing from the GPU buffer.
And apps DO that. Indeed, I found that, for example, in Mass Effect 2
with High Texture mods (one of the crash cases fixed by this patch series),
when the character gets close to an object, a high res texture replaces
the low res one. The high res one simply has more levels, and the game seems
to optimize reading the high res texture by retrieving the small-resolution
levels from the original low res texture.
In other words during gameplay, the game will randomly read MANAGED textures.
This is expected to be fast as the data is supposed to be in RAM...
Instead of taking that RAM copy eviction approach, this patchset
proposes a different approach: storing the data in a memfd and releasing
the virtual memory until it is needed.
Basically instead of using malloc(), we create a memfd file
and map it. When the data doesn't seem to be accessed anymore,
we can unmap the memfd file.
If the data is needed, the memfd file is mapped again.
This trick makes it possible to allocate more than 4GB in 32-bit apps.
The advantage of this approach over the RAM eviction one
is that the load is much faster and doesn't block the GPU.
Of course we have problems if there's not enough memory to map the
memfd file. But the problem is the same for the RAM eviction approach.
Naturally on 64 bits, we do not use memfd.
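The map/unmap cycle can be sketched on Linux as below. This is a minimal illustration under stated assumptions, not the code from this series: the `memfd_backing` structure and the `backing_*` helpers are hypothetical, and the decision of when data "doesn't seem to be accessed anymore" is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct memfd_backing {
   int fd;        /* keeps the pages alive while they are unmapped */
   size_t size;
   void *map;     /* NULL while the data is not mapped */
};

static int
backing_create(struct memfd_backing *b, size_t size)
{
   assert(b != NULL && size > 0);
   /* Raw syscall so the sketch does not rely on _GNU_SOURCE being defined
    * before the first include; glibc also has a memfd_create() wrapper. */
   b->fd = (int)syscall(SYS_memfd_create, "texture-backing", 0u);
   if (b->fd < 0)
      return -1;
   if (ftruncate(b->fd, (off_t)size) != 0) {
      close(b->fd);
      return -1;
   }
   b->size = size;
   b->map = NULL;
   return 0;
}

/* Map on demand; the contents written earlier are still in the memfd. */
static void *
backing_map(struct memfd_backing *b)
{
   if (b->map == NULL) {
      void *p = mmap(NULL, b->size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     b->fd, 0);
      if (p == MAP_FAILED)
         return NULL;
      b->map = p;
   }
   return b->map;
}

/* Give the virtual address range back without losing the data. */
static void
backing_unmap(struct memfd_backing *b)
{
   if (b->map != NULL) {
      munmap(b->map, b->size);
      b->map = NULL;
   }
}

static void
backing_destroy(struct memfd_backing *b)
{
   backing_unmap(b);
   close(b->fd);
   b->fd = -1;
}
```

While unmapped, the data consumes RAM (or swap) but no address space, which is the property that matters in a 32-bit process.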
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9377>
We should only pass in a new indirect_info object if we actually set valid
values in it.
Fixes: abe8ef862f "gallium: make pipe_draw_indirect_info * a draw_vbo parameter"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9425>
In order for shader viewport index to be calculated correctly,
the cliptest code needs proper primitive lengths to work out
the provoking vertex. I half fixed this before for GL4 but looks
like I didn't make it all the way.
This fixes:
dEQP-VK.draw.shader_viewport*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9401>
Commit e99e7aa4 began passing start > 0 to indexed draw calls rather
than keeping start at 0 and manually advancing ib->ptr. This should
work fine, however, there have been instances of software fallbacks
not handling things right.
vbo_sw_primitive_restart had a bug where it was ignoring "start" and
always calling find_sub_primitives with start = 0 and end = ib->count.
This meant that when start > 0, it was analyzing the wrong part of the
index buffer when finding subprimitives.
In theory, each _mesa_prim can have a different "start" value. But
the code only calls find_sub_primitives once, because it wants to
map, analyze, and unmap the index buffer before calling ctx->Draw,
as some drivers don't support drawing with the index buffer mapped.
To handle this, we break vbo_sw_primitive_restart calls into sections
where "start" matches across all the primitives, similar to how I
handled the issue in tnl in commit bd6120f562.
In the common case, start matches and we handle it in one pass anyway.
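The splitting can be sketched as follows. The structure and function names are hypothetical; the real code works on _mesa_prim and does the map, analyze, unmap and draw steps once per section.

```c
#include <assert.h>
#include <stddef.h>

struct prim_sketch {
   unsigned start;
   unsigned count;
};

/* Length of the run of primitives beginning at `first` that all share
 * the same start value. */
static size_t
same_start_run(const struct prim_sketch *prims, size_t num_prims, size_t first)
{
   assert(first < num_prims);
   size_t n = 1;
   while (first + n < num_prims &&
          prims[first + n].start == prims[first].start)
      n++;
   return n;
}

typedef void (*section_cb)(const struct prim_sketch *prims, size_t n,
                           void *data);

/* Call cb once per section of consecutive primitives with a matching
 * start; returns the number of sections processed. */
static size_t
for_each_section(const struct prim_sketch *prims, size_t num_prims,
                 section_cb cb, void *data)
{
   size_t sections = 0;
   for (size_t i = 0; i < num_prims;) {
      size_t n = same_start_run(prims, num_prims, i);
      if (cb != NULL)
         cb(&prims[i], n, data);
      i += n;
      sections++;
   }
   return sections;
}
```

When every primitive has the same start, the loop body runs exactly once, matching the common single-pass case.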
Fixes Piglit's primitive-restart VBO_COMBINED_VERTEX_AND_INDEX test
and KHR-GL33.pipeline_statistics_query_tests_ARB.functional_primitives_vertices_submitted_and_clipping_input_output_primitives
on Intel Ivybridge and older (which don't do arbitrary cut indices).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4052
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9417>
Use debug_printf more consistently, normalize formatting a bit, and
trace a few more places you're likely to care about.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9436>
Applications rarely require them, but this improves fossil-db replay time.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9411>
This register is only 32bits.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1952fd8d2c ("anv: Implement VK_EXT_conditional_rendering for gen 7.5+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9428>
Instead of lowering it in NIR, use the lookup tables as inputs to a
second-order Taylor expansion. shader-db results aren't amazing but keep
in mind this is without backend CSE yet.
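As a generic illustration of the technique (not the hardware sequence, and the message does not say which operation is involved), here is a second-order Taylor expansion of 1/x around table points. The 8-entry table, the input range [1, 2) and the function name are all assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical lookup table: 1/a for a = 1 + i/8, i = 0..7. */
static const float rcp_table[8] = {
   1.0f, 0.8888889f, 0.8f, 0.7272727f,
   0.6666667f, 0.6153846f, 0.5714286f, 0.5333333f,
};

/* Approximate 1/x for x in [1, 2) from the nearest table point below x. */
static float
approx_rcp(float x)
{
   assert(x >= 1.0f && x < 2.0f);

   int i = (int)((x - 1.0f) * 8.0f);
   float a = 1.0f + (float)i / 8.0f;   /* expansion point */
   float r = rcp_table[i];             /* f(a) = 1/a */
   float d = x - a;

   /* f(a) + f'(a) d + f''(a)/2 d^2 with f(x) = 1/x,
    * i.e. r - r^2 d + r^3 d^2. */
   return r - r * r * d + r * r * r * d * d;
}
```

The error of such an expansion shrinks with the cube of the distance to the table point, so a denser table trades memory for accuracy.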
total instructions in shared programs: 115913 -> 115707 (-0.18%)
instructions in affected programs: 3151 -> 2945 (-6.54%)
helped: 12
HURT: 0
Instructions are helped.
total nops in shared programs: 84045 -> 84041 (<.01%)
nops in affected programs: 1571 -> 1567 (-0.25%)
helped: 1
HURT: 7
Inconclusive result (value mean confidence interval includes 0).
total clauses in shared programs: 20498 -> 20489 (-0.04%)
clauses in affected programs: 188 -> 179 (-4.79%)
helped: 6
HURT: 0
Clauses are helped.
total quadwords in shared programs: 90395 -> 90291 (-0.12%)
quadwords in affected programs: 2287 -> 2183 (-4.55%)
helped: 12
HURT: 0
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
Useful for representing -0 in transcendental sequences matching the
blob.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
With these changes the shader code is visible in RGP.
The Vk pipeline feature is emulated using si_update_shaders: when shaders are
updated we compute a sha1 of their code and use it as a pipeline hash.
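A minimal sketch of the hashing idea, assuming the shader code is available as a list of byte strings. The function name and the choice to hash the binaries in order are assumptions for illustration; the driver's actual hashing of shader code may differ in detail.

```python
import hashlib


def pipeline_hash(shader_binaries):
    """Return a SHA-1 hex digest over the code of all bound shaders.
    'shader_binaries' is a hypothetical list of bytes objects, one per
    shader stage, hashed in order."""
    h = hashlib.sha1()
    for code in shader_binaries:
        h.update(code)
    return h.hexdigest()


# Any change in the shader code yields a different emulated pipeline hash.
before = pipeline_hash([b"vs-code", b"ps-code"])
after = pipeline_hash([b"vs-code", b"ps-code-v2"])
```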
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
For radeonsi the shaders don't live in the same BOs, so they're
unlikely to be less than 0x1000 bytes apart.
So this commit bumps the threshold to 0x10000 and warns once
when hitting it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
We were checking that the previous instruction doesn't write flags,
but we also need to check it doesn't read them.
Fixes: 1784dd22a3 ('broadcom/compiler: pipeline smooth ldvary sequences')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9431>
When we were only able to pipeline smooth varyings, if we had to disable
ldvary pipelining in the middle of a sequence it would stay disabled for
the rest of the program, to prevent us from prioritizing scheduling of
ldvary instructions that we would not be able to pipeline effectively.
Now that we can pipeline all ldvary sequences we can change this.
This change re-enables ldvary pipelining upon finding the next
ldvary in the program in the hopes that we can continue pipelining
successfully. To do this, we track the number of ldvary instructions we
emitted so far and compare that to the number of inputs in the fragment
shader we are scheduling. This also allows us to simplify our ldvary
tracking at nir to vir time, since that is all now handled in the QPU
scheduler.
total instructions in shared programs: 13817048 -> 13810783 (-0.05%)
instructions in affected programs: 810114 -> 803849 (-0.77%)
helped: 4843
HURT: 591
Instructions are helped.
total max-temps in shared programs: 2326612 -> 2326300 (-0.01%)
max-temps in affected programs: 4689 -> 4377 (-6.65%)
helped: 285
HURT: 7
Max-temps are helped.
total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%)
sfu-stalls in affected programs: 207 -> 130 (-37.20%)
helped: 120
HURT: 42
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%)
inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%)
helped: 4899
HURT: 590
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>
Use atomic operations to avoid competition. In addition,
since sub_ctx_id 0 has been used by default, sub_ctx_id
should start from 1.
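The sketch below models the allocation scheme with a lock-protected counter standing in for an atomic increment. The class and method names are hypothetical; the actual change uses atomic operations in C rather than a lock.

```python
import threading


class SubCtxIdAllocator:
    """Hands out unique sub-context ids starting from 1, because id 0 is
    already used by default. A lock emulates an atomic increment here."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 1  # 0 is reserved for the default sub-context

    def alloc(self):
        # Read and bump the counter as one indivisible step so that two
        # threads can never receive the same id.
        with self._lock:
            value = self._next
            self._next += 1
            return value
```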
Signed-off-by: Xin He <hexin.op@bytedance.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9406>
The clear code is 0xCC which means CMASK isn't fast-cleared.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9392>
It makes more sense to try to enable TC-compat if the image has HTILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9405>
Not copying all the scissors caused
dEQP-VK.pipeline.extended_dynamic_state.two_draws_dynamic.2_viewports
to fail, but that test pointlessly relies on KHR_multiview (CTS issue
filed).
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9422>
Prior to SKL, the mipmaps for 3D surfaces are laid out in a way
that makes it impossible to represent them in the way that
VkSubresourceLayout expects. Since we can't tell users how to make
sense of them, don't report them as available.
"Fixes" dEQP-VK.image.subresource_layout.3d.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9419>
This implements most of the remaining u_threaded_context support. Most
of the heavy lifting was done in the previous patches which fixed things
up for the new thread safety requirements. Only a few things remain.
u_threaded_context support can be disabled via an environment variable:
GALLIUM_THREAD=0
On Felix's Tigerlake with the GPU at fixed frequency, enabling
u_threaded_context improves performance of several games:
- Civilization VI: +17%
- Shadow of Mordor: +6%
- Bioshock Infinite: +6%
- Xonotic: +6%
Various microbenchmarks improve substantially as well:
- GfxBench5 gl_driver2: +58%
- SynMark2 OglBatch6: +54%
- Piglit drawoverhead: +25%
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
pipe->transfer_map can be called from u_threaded_context's thread
rather than the driver thread. We need to use two different slab
allocators, one for each thread. transfer_unmap, on the other hand,
is only ever called from the driver thread.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
u_threaded_context requires various objects to inherit from a new
threaded_foo base class rather than directly from pipe_foo. This
patch does most of the mechanical changes required for that.
It also initializes the new threaded_resource fields.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
When we enable u_threaded_context, the pipe->create_*_state hooks
(precompile variants) are going to be called from one thread, while
iris_update_compiled_shaders (on-the-fly variants) are going to be
called from a driver thread. BLORP shaders also happen from
clear, blit, and so on in the driver thread.
u_upload_mgr isn't thread-safe, so use an uploader for each purpose.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
This enables us to replace the backing storage of resources that have
been used as stream output targets, in case we're invalidating their
entire contents. This can avoid stalls. We simply hadn't supported it
because it was going to be tricky to re-emit 3DSTATE_SO_BUFFER without
screwing up "reset offset to zero" vs. "keep appending". But that
should be working fine with the previous patch's refactor.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
The previous mechanism was a bit fragile. We stored the zero offset
in the pre-baked packet, and used a flag to override 0xFFFFFFFF
(append) offsets until our first emit - then prohibited anyone from
trying to re-emit the packet by flagging IRIS_DIRTY_SO_BUFFERS,
because that would re-emit the version with the zeroing of the offset.
Now, we always store 0xFFFFFFFF in the pre-baked packet, and use a
flag to override it to zero on the first emit. That way, we can
re-emit that packet at any time, and it'll just keep appending.
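The new scheme can be modelled roughly as follows. The class and field names are made up for this sketch; only the 0xFFFFFFFF append value and the "zero on first emit" flag come from the description above.

```python
APPEND = 0xFFFFFFFF  # offset value meaning "keep appending"


class StreamOutputTarget:
    """Hypothetical model of a stream output target's offset handling."""

    def __init__(self):
        self.baked_offset = APPEND  # always stored in the pre-baked packet
        self.zero_offset = True     # override to zero on the first emit only

    def emit(self):
        """Return the offset that would be written into the packet."""
        if self.zero_offset:
            self.zero_offset = False
            return 0
        return self.baked_offset


target = StreamOutputTarget()
offsets = [target.emit() for _ in range(3)]  # first emit resets, later ones append
```

Because the pre-baked value is always the append value, re-emitting never resets the offset by accident.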
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
In the future, Marek is planning to make u_threaded_context call
create_stream_output_target() from a different thread than the main
driver thread, which means that we can't safely use uploaders there.
To prepare for this eventual future, just defer the allocation of
the offset BO 'til later. It's a very small amount of overhead.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
With u_threaded_context, create_surface and create_sampler_view will
be called from a different thread than the driver thread. They aren't
allowed to access the context, which means that they can't use the
uploaders there to upload our SURFACE_STATE entries.
Thanks to backing-storage replacement and iris_rebind_buffer, we already
reworked things to maintain CPU-side copies of the SURFACE_STATE entries
and added the ability to upload or re-upload them later. So we can skip
the upload at object creation time, and add a simple resource-is-NULL
check at binding table upload time to ensure that they get uploaded by
the time we need them. (They might get uploaded earlier due to rebinds
or clear color updates, but this is the last moment to do so.)
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
It shouldn't affect bound program state, and the current context state
shouldn't be relevant for shader creation precompiles anyway (level load
isn't going to have the eventual set of sampler views bound when you go to
draw with that shader).
Closes: #4306
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
You always have to populate the key with the right texture swizzles, even
if textures haven't changed since binding a new shader.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
We can compose the swizzles at sampler view creation time, saving
recompiles on texture format changes.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
This allows for virgl guests to expose GL_NVX_gpu_memory_info and
GL_ATI_meminfo when the extensions are supported on the host.
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9337>
mi_ is already a unique prefix in Mesa so the gen_ isn't really gaining
us anything except extra characters. It's possible that MI_ may
conflict a tiny bit with GenXML but it doesn't seem to be a problem
today and we can deal with that in the future if it's ever an issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9393>
Writes to global memory should not be moved over discard,
otherwise we could have unintended side-effects or lack of
side-effects where they should be observed.
Fixes tests:
dEQP-VK.rasterization.frag_side_effects.color_at_beginning.kill
dEQP-VK.rasterization.frag_side_effects.color_at_end.kill
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9365>
Disabling VM_ALWAYS_VALID actually hurts more than it helps
after doing more testing. Managing the global BO list in userspace
is really costly and makes a bunch of games CPU bound.
I think re-enabling VM_ALWAYS_VALID is a step in the right direction.
This reverts commit 6ac6e2fbfb.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9341>
We had an optimization in place to skip a unifa write if the address
happens to be right after the last ldunifa read address, but we can
take this further and update the unifa address by emitting ldunifa
instructions if needed to skip a unifa write that is close enough.
This is because a unifa write involves 4 cycles: 1 for the write
and 3 delay slots before we can emit the first ldunifa.
So if we have code like this:
unifa addr + 0
ldunifa.r0
unifa addr + 12
ldunifa.r1
In practice we end up with QPU like this:
unifa addr + 0
nop
nop
nop
ldunifa.r0
unifa addr + 12
nop
nop
nop
ldunifa.r1
And with this patch we get:
unifa addr + 0
nop
nop
nop
ldunifa.r0 <--- reads offset 0
ldunifa.- <--- reads offset 4
ldunifa.- <--- reads offset 8
ldunifa.r1 <--- reads offset 12
Of course, QPU scheduling might find ways to fill the NOPs to some
extent and remove some of the gains, but generally speaking, this is
still usually a win.
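The decision can be sketched as below. This assumes each ldunifa reads 4 bytes and advances the unifa address by 4, as in the example above, and uses the 12-byte limit that this commit message reports as giving the best results. The function name and return convention are hypothetical.

```python
LDUNIFA_BYTES = 4    # each ldunifa reads one dword and advances the address
MAX_SKIP_BYTES = 12  # limit taken from the commit message


def filler_ldunifa_count(current_addr, wanted_addr):
    """Return how many throwaway ldunifa instructions to emit in order to
    reach wanted_addr without a new unifa write, or None if a unifa write
    is required. current_addr is the address the next ldunifa would read."""
    distance = wanted_addr - current_addr
    if distance < 0 or distance > MAX_SKIP_BYTES or distance % LDUNIFA_BYTES:
        return None
    return distance // LDUNIFA_BYTES


# Example from the text: after reading offset 0 the address is 4, and the
# next wanted offset is 12, so two filler ldunifa reads replace the write.
fillers = filler_ldunifa_count(4, 12)
```

With at most three fillers this never costs more than the 4 cycles of a unifa write plus its delay slots.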
Going by shader-db results, allowing the next unifa address to be up
to 12 bytes after the address resulting from the last ldunifa read
shows the best results:
total instructions in shared programs: 13817048 -> 13812202 (-0.04%)
instructions in affected programs: 602701 -> 597855 (-0.80%)
helped: 1750
HURT: 760
Instructions are helped.
total uniforms in shared programs: 3795485 -> 3793200 (-0.06%)
uniforms in affected programs: 43930 -> 41645 (-5.20%)
helped: 898
HURT: 0
Uniforms are helped.
total max-temps in shared programs: 2326612 -> 2326621 (<.01%)
max-temps in affected programs: 651 -> 660 (1.38%)
helped: 10
HURT: 21
Inconclusive result (value mean confidence interval includes 0).
total sfu-stalls in shared programs: 30942 -> 30906 (-0.12%)
sfu-stalls in affected programs: 627 -> 591 (-5.74%)
helped: 186
HURT: 158
Inconclusive result (value mean confidence interval includes 0).
total inst-and-stalls in shared programs: 13847990 -> 13843108 (-0.04%)
inst-and-stalls in affected programs: 601404 -> 596522 (-0.81%)
helped: 1747
HURT: 757
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
We can't remove unused ldunifa that are not the first or last
in a sequence, but we can still ignore their destination
to reduce register pressure.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9384>
With threaded-context we won't have a chance to apply the workaround in
the backend driver. But the previous commit moves it to a driconf
configured workaround in mesa/st, so we can drop this now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9316>
All the ir3 gens had the same thing, time to move it out into a shared
helper.
Keeping the storage in fdN_context avoids namespace clashes
between ir3 and ir2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
I had forgotten which gens these were used on (which is important if
you need to know which shader stages use these), so expand the comments a
bit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9394>
Found this on top of Karol's patches but it seems like it can just be
applied to master.
Helps with some cases of
kernel_image_methods/test_kernel_image_methods 2Darray
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9381>
TGSI has these nice labels for us for where to jump in this case, let's
use them. Improves piglit arb_shader_image_load_store-shader-mem-barrier
runtime massively, though not enough to make the test really reasonable to
run.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9347>
FP16 rendering is supported on all gen4 hardware.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9379>
This has 3 advantages:
- It's SMEM.
- Multiple single component loads are merged into 1 multi-dword load by LLVM.
- The result is always packed for packed instructions.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9395>
The search helpers must *never* modify the instruction passed in, so let
the compiler enforce this.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9378>
Available since Vulkan 1.0, and in fact already wired up, just not
advertised. It looks like we could make this dynamic state but this
works for now.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9371>
Some games declare the wrong format, so we might want to disable this
optimization in that case.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: e4d75c22 ("nir/opt_shrink_vectors: shrink image stores using the format")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9229>
tu_shader should be freed after pipeline is successfully created.
Fixes tests:
dEQP-VK.api.object_management.alloc_callback_fail.compute_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9364>
Computing the expected buffer size isn't reliable on GFX10+ because
DROPPED_CNTR returns weird results.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9367>
DROPPED_CNTR isn't reliable and might still report non-zero if the
SQTT buffer isn't full. Checking if the number of written bytes by
the hw is equal to the SQTT buffer size seems reliable.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9367>
This fixes a GPU hang on my Sienna because the number of SE is
less than the maximum, and SE #1 is disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9370>
This appeared in softpipe's image operations, since NIR always uses
4-component values for the coords, while the GLSL IR only has 2 components
for a 2D image (for example).
arb_shader_image_load_store-shader-mem-barrier (which times out in CI and
spends its time inside of tgsi_exec) was spending 4/51 of its instructions
on moving these undefs around.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9345>
I was suspicious that some entries in dri2_format_table (in
dri_helpers.c) had this field set incorrectly. It seemed like
DRM_FORMAT_ABGR16161616F and DRM_FORMAT_XBGR16161616F should have been 8
instead of 4. Upon digging I found that nothing uses the field. Fix
code by removing it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9354>
both bits will have been flagged at this point in order to indicate
that the aspects will be cleared "at some point" during the loop, but
when actually iterating through the pending clears, only the bits set
in the clear call should be applied
Fixes: 5c629e9ff2 ("zink: defer pipe_context::clear calls when not currently in a renderpass")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9366>
Remove the useless driCheckOption calls. They always
succeed.
As a result the intended behaviour for thread_submit
was not working (different default depending on the gpu
used). Add a comment to fix that in the future.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
At the release of the last object holding a reference
on the device, the device dtor was executed and
the object dtor was ignored.
The proper way is to execute the object dtor, then
the device dtor.
The previous code was likely for a workaround against
something that was fixed since.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
*PrivateData functions were not protected by
a mutex for Volumes, whereas they definitely
should be.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
FETCH4 is a feature that needs to be implemented
to advertise D3DFMT_DF24.
It's basically a variant of Gather4.
This first implementation will need to be completed
to implement the feature fully, but the feature
doesn't seem to be much used (other equivalent
features are preferred by games).
Note that until DF24 is advertised, apps are not supposed to use
FETCH4.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
FETCH4 is a d3d9 extension that is not much used, as newer
ones were preferred. However, its support is required
to advertise the DF24 format.
Prepares support by tracking compatible formats.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
Do not unmap anything until all buffer unlocks
were received.
A buffer can be filled in several threads, and thus
in the case of double locks, it's not possible to know
which unlock is received first.
Thus only unmap the buffers when the last unlock is
received.
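A small model of the counting scheme, under the assumption that every lock is eventually matched by one unlock. The class and its fields are hypothetical and only illustrate deferring the unmap until the last unlock.

```python
class LockedBuffer:
    """Hypothetical buffer that defers unmapping until every outstanding
    lock has been released, regardless of the order unlocks arrive in."""

    def __init__(self):
        self.lock_count = 0
        self.unmap_calls = 0  # counts how often the buffer was unmapped

    def lock(self):
        self.lock_count += 1

    def unlock(self):
        if self.lock_count == 0:
            raise RuntimeError("unlock without matching lock")
        self.lock_count -= 1
        # Only the last unlock actually unmaps the buffer.
        if self.lock_count == 0:
            self.unmap_calls += 1


buf = LockedBuffer()
buf.lock()
buf.lock()    # double lock
buf.unlock()  # nothing is unmapped yet
buf.unlock()  # last unlock: unmap happens here
```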
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
Previously we used to clamp "available_texture_limit",
which was incorrect. "available_texture_mem" should
have been clamped instead.
The resulting code was a no-op.
The idea behind that code was that 32-bit executables
would see a maximum of 4GB of video memory.
However, according to users, 32-bit apps
should be able to allocate more than 4GB, thus the
clamping is inappropriate.
Instead clamp the return of GetAvailableTextureMem, to
correctly report a high value when there is more than
4GB available.
I do not know exactly what the clamp value should be;
for now have a 64MB margin below UINT_MAX.
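The clamp amounts to the following arithmetic. This sketch assumes a 32-bit UINT_MAX and byte units; the function name is made up, and any rounding the real GetAvailableTextureMem applies is left out.

```python
UINT_MAX = 0xFFFFFFFF       # assuming a 32-bit unsigned return value
MARGIN = 64 * 1024 * 1024   # 64MB margin below UINT_MAX


def reported_texture_mem(available_bytes):
    """Clamp the reported available texture memory so that it fits in the
    return value while still reporting a high figure."""
    return min(available_bytes, UINT_MAX - MARGIN)


# With 8GB available the reported value saturates just below 4GB.
clamped = reported_texture_mem(8 * 1024 ** 3)
# With 1GB available the value is reported unchanged.
unclamped = reported_texture_mem(1 * 1024 ** 3)
```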
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
For D3DUSAGE_AUTOGENMIPMAP basically, everything behaves
for the application as if the texture had one level.
However the pipe_resource has more levels, and those
get generated automatically.
Previously we allocated all the Surfaces as if
the texture had all the levels, instead of just one.
The app could still just access the first level.
This patch completely removes the useless inaccessible
Surfaces.
In addition, remove redundant handling of D3DUSAGE_AUTOGENMIPMAP.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9177>
The code improvement is limited and it interferes with using literals
directly in LDS index ops, since here source modifiers are not
supported, but the current assembler code might inject the modifiers.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
The original shared load op can't be reordered, so it might be better to
also not allow this for the lowered variant.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
The backend code was actually assuming this, but the lowering still set
the components and write masks like it would be honoured.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9330>
A Hitman 2 shader does: read64(local_invocation_index() * 4 - 4). This was
likely emitting a ds_read2_b32 on GFX6. For local_invocation_index()=0,
because the first dword was out-of-bounds, the second was likely also
considered out-of-bounds (even though it's not, at offset 0).
Likely fixes https://gitlab.freedesktop.org/mesa/mesa/-/issues/3882
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 57e6886f98 ("aco: refactor load_lds to use new helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
This reverts commit 1a0b0e8460.
The bounds checking behaviour of ds_read_b64, ds_read_b96 and ds_read_b128
make this feature very difficult to use safely.
This fixes a blocking artifact in Hitman 2. Previously, it contained:
ds_read_b64(local_invocation_index() * 4 - 4)
For local_invocation_index()=0, the second dword would be considered
out-of-bounds, even though it's at offset 0.
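The address arithmetic behind this can be worked through as follows. The sketch assumes 32-bit unsigned wraparound and a hypothetical LDS size; it only shows which of the two dwords is really out of bounds, not how the hardware performs its check.

```python
MASK32 = 0xFFFFFFFF


def dword_addresses(local_invocation_index):
    """Byte addresses of the two dwords read by the 64-bit load at
    local_invocation_index * 4 - 4, assuming 32-bit wraparound."""
    base = (local_invocation_index * 4 - 4) & MASK32
    return [base, (base + 4) & MASK32]


def per_dword_in_bounds(local_invocation_index, lds_size):
    """Check each dword separately against a hypothetical LDS size."""
    return [addr < lds_size for addr in dword_addresses(local_invocation_index)]


# For invocation 0 the first dword wraps to 0xFFFFFFFC (out of bounds) while
# the second lands at offset 0 (in bounds), so only the first should be dropped.
addrs = dword_addresses(0)
bounds = per_dword_in_bounds(0, lds_size=1024)
```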
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9332>
Now that we can pipeline all varyings we should not be referring
specifically to smooth varyings anywhere.
Also, rename the instruction field 'ldvary_pipelining' to
'is_ldvary_sequence', which is more appropriate, since we always
set this for any instruction involved with varying setups,
independently of whether they end up being pipelined or not.
This also does some other minor edits which intend to slightly
simplify the code and make it a bit more compact.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9363>
An earlier commit tried to make this shader compatible with GLSL 3.30,
but it requires GL_ARB_gpu_shader_int64, which requires GLSL 4.00 and
GL 4.0 according to the extension spec. So we were failing to enable
the required extension, breaking compilation of this shader.
The original intention of that patch was to get this working on zink,
which at the time only supported GL 3.3. But now it supports later
OpenGL versions, so we don't need to do this any longer. Rather than
revert the patch and raise the version all the way back to 430, just
bump it to the required 400 at Ian Romanick's suggestion.
Fixes: 4d47b22bf0 ("glsl/float64: make this compatible with glsl 330")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3991
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9351>
These end up having a NOP between the ldvary and the next instruction
in the sequence (a MOV for flat and an FADD for noperspective):
nop ; nop ; ldvary.r0
nop ; nop
fadd rf6, r0, r5 ; nop ; ldvary.r1
nop ; nop
fadd rf5, r1, r5 ; nop ; ldvary.r2
nop ; nop
fadd rf4, r2, r5 ; nop ; ldvary.r3
To pipeline these, we can reuse the same infrastructure we have in
place for smooth varyings but we need to avoid breaking the sequence
due to the NOP instruction. We do that by testing if dropping the
sequence when we failed to pick up the next instruction also fails
to choose an instruction.
This is not perfect, because we may be able to choose an instruction
outside the sequence such as an ldunif, and use that to break a
sequence that we could otherwise continue after scheduling the NOP
instruction, but it is still better than nothing.
total instructions in shared programs: 13820690 -> 13819774 (<.01%)
instructions in affected programs: 64026 -> 63110 (-1.43%)
helped: 479
HURT: 62
Instructions are helped.
total max-temps in shared programs: 2326435 -> 2326423 (<.01%)
max-temps in affected programs: 102 -> 90 (-11.76%)
helped: 7
HURT: 0
Max-temps are helped.
total sfu-stalls in shared programs: 30683 -> 30710 (0.09%)
sfu-stalls in affected programs: 13 -> 40 (207.69%)
helped: 2
HURT: 24
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13851373 -> 13850484 (<.01%)
inst-and-stalls in affected programs: 62818 -> 61929 (-1.42%)
helped: 466
HURT: 65
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
Typically, we would schedule smooth varyings like this:
nop ; nop ; ldvary.r4
nop ; fmul r0, r4, rf0
fadd rf13, r0, r5 ; nop ; ldvary.r1
nop ; fmul r2, r1, rf0
fadd rf12, r2, r5 ; nop ; ldvary.r3
nop ; fmul r4, r3, rf0
fadd rf11, r4, r5 ; nop ; ldvary.r0
where we pair up an ldvary with the fadd of the previous sequence
instead of the previous fmul. This is because ldvary has an implicit
write to r5 which is read by the fadd of the previous sequence, so
our dependency tracking doesn't allow us to move the ldvary before the
fadd. However, the r5 write of the ldvary instruction happens in the
instruction after it is emitted so we can actually move it to the fmul
and the r5 write would still happen in the same instruction as the fadd,
which is fine.
This patch allows us to pipeline these sequences optimally. For that,
after merging an ldvary into a previous instruction in the middle of
a pipelineable ldvary sequence, we check if we can manually move it
to the last scheduled instruction instead (the one before the
instruction we are currently scheduling).
If we are successful at moving the ldvary to the previous instruction,
then we flag the ldvary as scheduled immediately, which may promote
its children (the follow-up fmul instruction for that ldvary) to DAG
heads and continue the merge loop so that fmul can be picked and
merged into the final fadd of the previous sequence (where we had
originally merged the ldvary). This leads to a result that looks like
this:
nop ; nop ; ldvary.r4
nop ; fmul r0, r4, rf0 ; ldvary.r1
fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3
fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0
Shader-db results:
total instructions in shared programs: 14071591 -> 13820690 (-1.78%)
instructions in affected programs: 7809692 -> 7558791 (-3.21%)
helped: 41209
HURT: 4528
Instructions are helped.
total max-temps in shared programs: 2335784 -> 2326435 (-0.40%)
max-temps in affected programs: 84302 -> 74953 (-11.09%)
helped: 4561
HURT: 293
Max-temps are helped.
total sfu-stalls in shared programs: 31537 -> 30683 (-2.71%)
sfu-stalls in affected programs: 3551 -> 2697 (-24.05%)
helped: 1713
HURT: 750
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 14103128 -> 13851373 (-1.79%)
inst-and-stalls in affected programs: 7820726 -> 7568971 (-3.22%)
helped: 41411
HURT: 4535
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
If we have two (or more) smooth varyings like this:
nop t3; ldvary.rf0
fmul t5, t3, t0
fadd t6, t5, r5
nop t7; ldvary.rf0
fmul t9, t7, t0
fadd t10, t9, r5
nop t11; ldvary.rf0
fmul t13, t11, t0
fadd t14, t13, r5
We may be able to pipeline them like this:
nop ; nop ; ldvary.r4
nop ; fmul r0, r4, rf0 ; ldvary.r1
fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3
fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0
But in order to do this, we will need to manually tweak the
QPU scheduling.
This patch tracks information about ldvary sequences that are
good candidates for pipelining, and a follow-up patch will
use this information to pipeline them when we emit the QPU
code.
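As a rough illustration of why this pipelining matters, the sketch below
models each QPU instruction as an (add, mul, signal) triple and compares the
naive and pipelined layouts described above. The slot model and the names
naive_schedule and pipelined_schedule are assumptions made for this
illustration only; they are not the compiler's data structures.

```python
def naive_schedule(n):
    """One ldvary/fmul/fadd triple per varying, nothing overlapped."""
    rows = []
    for i in range(n):
        rows.append(("nop", "nop", f"ldvary v{i}"))
        rows.append(("nop", f"fmul v{i}", "-"))
        rows.append((f"fadd v{i}", "nop", "-"))
    return rows


def pipelined_schedule(n):
    """Overlap the fadd of varying i, the fmul of i+1 and the ldvary of i+2."""
    rows = []
    for t in range(n + 2):
        add = f"fadd v{t - 2}" if 0 <= t - 2 < n else "nop"
        mul = f"fmul v{t - 1}" if 0 <= t - 1 < n else "nop"
        sig = f"ldvary v{t}" if t < n else "-"
        rows.append((add, mul, sig))
    return rows
```

Under this simplified model, n varyings take 3n instructions unpipelined and
n + 2 instructions when fully pipelined.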
v2 (apinheiro):
- Rename the v3d_compile fields to avoid confusion with the qinst fields.
- Assert that a sequence's start instruction is not the same as the end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
When selecting an instruction to merge, we want to pre-remove that
instruction from the DAG, not the one we are merging it in, which
we had already pre-removed right before.
The reason this was not causing problems before is that the
consequence of this bug is we will choose the same instruction
again in the merge loop and trying to merge that instruction twice
will fail and we would break out of the merge loop and move on.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
EXT_external_objects requires that we call glSignalSemaphoreEXT followed by
a glFlush. If the rendering workload is small when the signal and flush
take place, the relevant batch buffers with the actual rendering might
have been submitted already. In that case the following condition is met:
(iris_batch_bytes_used(batch) == 0). This causes:
glFlush() -> iris_fence_flush() -> iris_batch_flush() ->
_iris_batch_flush() to no-op, and so the fence doesn't get submitted to the
kernel. Then when anv tries to submit an execbuf2 that must wait on the
shared VkSemaphore / drm_syncobj fence, there isn't one and the kernel
rejects the batchbuffer, causing an -EINVAL return from the execbuf2 ioctl
and a VK_ERROR_DEVICE_LOST error. Empty batch buffers typically have one
fence attached, but the ones carrying the extra fence from a
glSignalSemaphoreEXT() call have at least two.
See also: the discussion in MR!4337.
v2: Changed the batch struct to have a contains_fence_signal variable
that is set to true when an I915_EXEC_FENCE_SIGNAL fence is added to the
batch and to false when the batch is reset (Tapani Pälli)
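The flush decision described above can be sketched with a toy model. The
Batch class, its fields other than contains_fence_signal, and the kernel_log
list are hypothetical names for illustration; the real driver code is C and
is structured differently.

```python
class Batch:
    """Toy model of a batch buffer; names are illustrative only."""

    def __init__(self):
        self.bytes_used = 0
        self.fences = []
        self.contains_fence_signal = False

    def add_fence(self, name, signal=False):
        self.fences.append(name)
        if signal:
            # Remember that this batch must reach the kernel even if empty.
            self.contains_fence_signal = True

    def reset(self):
        self.bytes_used = 0
        self.fences = []
        self.contains_fence_signal = False

    def flush(self, kernel_log):
        # An empty batch is skipped unless it carries a fence to signal.
        if self.bytes_used == 0 and not self.contains_fence_signal:
            return False
        kernel_log.append(list(self.fences))
        self.reset()
        return True
```

With this check an empty batch that only carries a signal fence is still
submitted, so a waiter on the shared semaphore finds the fence it expects.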
Authored-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reported-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Eleni Maria Stea <elene.mst@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8861>
The cache has been detangled from glsl and used outside it (with Vulkan drivers)
for years now.
This also cleans up the dependencies in the build file. The test doesn't
depend on the glsl lib but rather the util lib.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9327>
If zink is meant to work against the Vulkan 1.0 API then it
should expose the 1.0 API at create time as well as always
ask for all the Vulkan 1.0 extensions.
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9075>
A recent commit stopped updating the inverse MVP matrix, because usually
only GLSL built-ins need it. However, TNL also needs it. So make sure
it's correct when needed.
Fixes: 10371c520c ("mesa: don't compute the ModelView * Projection matrix if not used")
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9346>
Fixes the following building error:
FAILED: out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_util_intermediates/xmlconfig.o
...
external/mesa/src/util/xmlconfig.c:1030:12: fatal error: 'driconf_static.h' file not found
^~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: a6b0ceb ("driconf: Generate a static table when no xmlconfig")
Acked-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Marijn Suijten <marijn.suijten@somainline.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9294>
MR to move to python3 in Android build gen rules is still pending
The change is to avoid following building error:
FAILED: out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_util_intermediates/driconf_static.h
/bin/bash -c "/usr/bin/python external/mesa/src/util/driconf_static.py external/mesa/src/util/00-mesa-defaults.conf > out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_util_intermediates/driconf_static.h"
File "external/mesa/src/util/driconf_static.py", line 2
SyntaxError: Non-ASCII character '\xc2' in file external/mesa/src/util/driconf_static.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Fixes: a6b0ceb ("driconf: Generate a static table when no xmlconfig")
Acked-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Marijn Suijten <marijn.suijten@somainline.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9294>
RuneScape ends up hitting this path, and it's easy enough to get
some well-defined behavior instead of a crash.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-By: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9342>
Wx0 and 0xH should result in no-ops in the driver, so they can just
become no-ops before they reach the driver to save some validation later
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9344>
these should never be larger than the fb and drivers shouldn't have to
care about it
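A minimal sketch of the clamping, assuming a scissor expressed as
(minx, miny, maxx, maxy) with exclusive maxima; the function names are
hypothetical. An empty rectangle after clamping corresponds to the Wx0 / 0xH
case, which can be dropped before it reaches the driver.

```python
def clamp_scissor(minx, miny, maxx, maxy, fb_width, fb_height):
    """Clamp a scissor rectangle so it never extends past the framebuffer."""
    minx = max(0, min(minx, fb_width))
    miny = max(0, min(miny, fb_height))
    maxx = max(0, min(maxx, fb_width))
    maxy = max(0, min(maxy, fb_height))
    return (minx, miny, maxx, maxy)


def is_noop_clear(rect):
    """A clear covering zero width or zero height does nothing."""
    minx, miny, maxx, maxy = rect
    return maxx <= minx or maxy <= miny
```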
Fixes: 1c8bcad81a ("gallium: add pipe cap for scissored clears and pass scissor state to clear() hook")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9344>
A sequence like:
1) create sampler view CSO with UBWC resource
2) later create another sampler view or image view with the same
resource, but a format that triggers demoting the resource to
uncompressed
3) bind CSO created in step #1
would not work correctly, because the CSO created in step #1 is still
setup for UBWC, despite the fact that the resource had been demoted to
uncompressed.
Fortunately this is easy enough to detect, as the resource's seqno would
have been updated.
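The seqno check can be sketched as follows. Resource and SamplerView are
simplified stand-ins with hypothetical fields; only the idea of comparing a
cached seqno against the resource's current one comes from the description
above.

```python
class Resource:
    def __init__(self):
        self.ubwc = True
        self.seqno = 1

    def demote_to_uncompressed(self):
        if self.ubwc:
            self.ubwc = False
            self.seqno += 1  # any layout change bumps the seqno


class SamplerView:
    def __init__(self, rsc):
        self.rsc = rsc
        self.update()

    def update(self):
        # Snapshot the state the view was built against.
        self.uses_ubwc = self.rsc.ubwc
        self.rsc_seqno = self.rsc.seqno

    def bind(self):
        # Rebuild the view if the resource changed since it was created.
        if self.rsc_seqno != self.rsc.seqno:
            self.update()
        return self.uses_ubwc
```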
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9321>
this is only tracking memory used by resources referenced in the batch, but it
can be adjusted a bit if we see that we're flushing too often
fixes spec@!opengl 1.1@streaming-texture-leak hogging all system memory and ooming
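A small sketch of the idea, assuming a fixed byte threshold; the class name,
the threshold, and the choice to count each resource once per batch are
assumptions for illustration, not the driver's actual policy.

```python
class TrackedBatch:
    """Toy batch that flushes once its referenced resources grow too large."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.resource_bytes = 0
        self.seen = set()
        self.flush_count = 0

    def reference(self, res_id, size):
        if res_id in self.seen:
            return  # already accounted for in this batch
        self.seen.add(res_id)
        self.resource_bytes += size
        if self.resource_bytes > self.limit_bytes:
            self.flush()

    def flush(self):
        # Flushing lets resources that are pending free actually be released.
        self.flush_count += 1
        self.resource_bytes = 0
        self.seen.clear()
```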
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9274>
we want to be able to track this so we can check whether a given batch is
going wild with memory usage for resources that might be pending free once
the batch finishes
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9274>
The DS resolve attachment is the destination attachment, it doesn't
need to be decompressed before resolving the depth/stencil attachment.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9256>
There are fewer copy instructions than sources, so instead of visiting each
source and rewriting it if it uses a copy instruction, visit each copy
instruction and rewrite its users.
Besides improving compile time, this also has a side effect of fixing a
rare situation where copy-propagation does not happen:
loop {
a = phi ..., b
c = vec ...
b = mov c.y
}
It might have been the case that a phi source could not be rewritten until
the copy was visited later.
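The change in iteration order can be sketched on a toy SSA form. This is not
NIR: instructions are plain dicts, swizzles are ignored, and the function
name is made up for the illustration. It only shows that visiting copies and
rewriting their users also reaches a phi that appears before the copy.

```python
def copy_prop_by_copies(instrs):
    """Rewrite the users of every mov in place (toy SSA, no swizzles)."""
    # Build a use list: value name -> [(using instruction, source index)].
    users = {}
    for instr in instrs:
        for idx, src in enumerate(instr["srcs"]):
            users.setdefault(src, []).append((instr, idx))

    for instr in instrs:
        if instr["op"] != "mov":
            continue
        src = instr["srcs"][0]
        # Point every user of the copy at the copy's source instead.
        for user, idx in users.pop(instr["dest"], []):
            user["srcs"][idx] = src
            users.setdefault(src, []).append((user, idx))
    return instrs
```

Because the rewrite is driven by the copy's use list, the position of the
user relative to the copy does not matter.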
Compile-time (nir_copy_prop):
Difference at 95.0% confidence
-2613.13 +/- 15.2094
-27.4333% +/- 0.150247%
(Student's t, pooled s = 17.963)
Compile-time (overall):
Difference at 95.0% confidence
-2627.89 +/- 201.557
-2.05404% +/- 0.156221%
(Student's t, pooled s = 238.048)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8784>
These were hurting performance of other passes.
Compile-time (overall):
Difference at 95.0% confidence
-5496.3 +/- 219.752
-4.11912% +/- 0.160285%
(Student's t, pooled s = 259.538)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8784>
Midgard writeout arguments need to be written in the same bundle in which
the writeout happens. csel, csel_v and their float variants also
require their conditional to be performed in the same bundle.
This patch prevents scheduling csel in the same bundle as a writeout,
fixing the scheduling issue.
But... there's still room for optimizations since in some cases it might
be possible to fit all these instructions in the same bundle.
No shader-db changes.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9340>
We need to iterate the whole row, we can't be clever and only look at
one side, the symmetry doesn't work like that. See the original paper.
total instructions in shared programs: 69392 -> 69322 (-0.10%)
instructions in affected programs: 9002 -> 8932 (-0.78%)
helped: 82
HURT: 28
Instructions are helped.
total bundles in shared programs: 32225 -> 32155 (-0.22%)
bundles in affected programs: 4286 -> 4216 (-1.63%)
helped: 82
HURT: 28
Bundles are helped.
total quadwords in shared programs: 56102 -> 56102 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 4552 -> 4572 (0.44%)
registers in affected programs: 298 -> 318 (6.71%)
helped: 18
HURT: 38
Registers are HURT.
total threads in shared programs: 3772 -> 3775 (0.08%)
threads in affected programs: 84 -> 87 (3.57%)
helped: 15
HURT: 14
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 0 -> 0
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 0 -> 0
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
Fixes: 66ad64d73d ("pan/midgard: Implement linearly-constrained register allocation")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9338>
The number of shader engines isn't always 4.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9307>
In the caller, this error simply gets mapped to VK_ERROR_INIT[...].
Especially for users it is very valuable to know what the driver
tried and what kind of failure occurred. Thus just straight out log
to stderr.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9317>
This patch adds a check for mipmap completeness of framebuffer object
texture attachments. Since a glTexImage call might have updated the
miplevels in the meantime, we test the completeness before marking the
framebuffer object incomplete.
Fixes some upcoming framebuffer completeness CTS tests that explicitly
test the case where a mipmap-incomplete texture is attached at a non-base
level, which should also make the framebuffer object incomplete. After
the texture is updated, the framebuffer object should become complete
again.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8520>
When format modifiers are supported, use
resource_create_with_modifiers instead of resource_create. This
allows radeonsi to set the modifier field, and allows VA-API
clients to have a proper modifier instead of
DRM_FORMAT_MOD_INVALID.
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9308>
We have to reserve at least 16 program parameters in storage to
avoid its reallocation.
v2: move allocation to `st_deserialise_ir_program` and add helper for that
( Eric Anholt <eric@anholt.net> )
v3 amend comments a bit
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4352
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9282>
When semaphores are not involved, try to batch things up as much as
possible across VkSubmitInfo and also batch command buffers within a
VkSubmitInfo.
v2: Reuse anv_cmd_buffer_is_chainable()
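One possible grouping policy is sketched below. The dict layout and the rule
that any submit with semaphores is sent on its own are assumptions made for
the illustration; the driver's actual rules may be finer grained.

```python
def plan_submissions(submit_infos):
    """Group command buffers into as few kernel submissions as possible."""
    batches = []
    pending = []
    for info in submit_infos:
        if info["wait"] or info["signal"]:
            # Semaphores act as a barrier: emit what we have, then this one.
            if pending:
                batches.append(pending)
                pending = []
            batches.append(list(info["cmd_buffers"]))
        else:
            pending.extend(info["cmd_buffers"])
    if pending:
        batches.append(pending)
    return batches
```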
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2371>
v2: Fixup crash spotted by Mark about missing alloc vfuncs
v3: Fixup double iteration over device->memory_objects (that ought to
be expensive...) (Ken)
v4: Add more asserts for non-softpin cases (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2371>
We would like to chain multiple primary command buffers to be submitted
together to i915. To prepare for this, end the command buffers with a
MI_BATCH_BUFFER_START and, at submit time, replace it with
MI_BATCH_BUFFER_END if needed.
v2: Don't even consider non softpin platforms
v3: Fix inverted condition
v4: Limit is_chainable() to checking device->use_softpin (Jason)
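The chaining scheme can be modelled roughly as follows. Commands are plain
tuples and CmdBuffer is a hypothetical stand-in; real batch buffers hold
encoded GPU commands and patching them involves addresses and relocation
details not shown here.

```python
MI_BATCH_BUFFER_START = "MI_BATCH_BUFFER_START"
MI_BATCH_BUFFER_END = "MI_BATCH_BUFFER_END"


class CmdBuffer:
    def __init__(self, address, commands):
        self.address = address
        # Every buffer is recorded with a trailing jump slot to patch later.
        self.commands = list(commands) + [(MI_BATCH_BUFFER_START, None)]


def chain_for_submit(cmd_buffers):
    """Make each buffer jump to the next one; terminate only the last."""
    for cur, nxt in zip(cmd_buffers, cmd_buffers[1:]):
        cur.commands[-1] = (MI_BATCH_BUFFER_START, nxt.address)
    if cmd_buffers:
        cmd_buffers[-1].commands[-1] = (MI_BATCH_BUFFER_END, None)
```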
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2371>
The fixed-func vertex program uses it too, which was ignored. This commit
fixes it.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
This should reduce fixed-func program key recomputations.
I also update the fixed-func fragment program, which was incorrectly
ignored because it's clearly used there.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
This only has to be called in a few places and not in normal draw calls.
egl_image_target_texture doesn't call _mesa_update_pixel because it only
assigns an EGL image to a texture object.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
This commit fixes _mesa_update_color_material, which allows cleaning up
the unnecessary state flagging.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
- execute vbo_set_vertex_format in a separate skipable conditional block
- replace dmul with dmul_shift
- don't check <= VBO_ATTRIB_MAT_BACK_INDEXES because there is no attrib
above that
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
This decreases the CPU time spent in fetch_state.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
This decreases the CPU time spent in fetch_state.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
Just flag _NEW_FF_VERT_PROGRAM where needed. There are only a few places
that must do it.
Also do the same with _NEW_FF_FRAG_PROGRAM, but this is not sufficient
for the ff frag prog to ignore texture states.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
They are always recomputed by _mesa_update_state, which will need the old
values, so that it can update other dependent states if needed.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
This adds _NEW_FF_FRAG_PROGRAM.
_mesa_set_varying_vp_inputs flags both fixed-func programs because both use
the state.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
To my great surprise, many drivers don't use these values at all.
Move the update to the places where they are used.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
Only clip planes and GLSL built-in uniforms use it.
update_projection (called by _mesa_update_state) removes
the _math_matrix_analyse call, reducing the time spent
in _mesa_update_state.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
This eliminates a lot of the remaining no-op fixed-func program key
recomputations in _mesa_update_state.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
_NEW_LIGHT_CONSTANTS: state parameters
_NEW_LIGHT_FF_PROGRAM: keys for fixed-func programs
_NEW_LIGHT_STATE: gallium rasterizer state
This reduces:
- the number of no-op fixed-func program key recomputations
in _mesa_update_state
- the number of no-op rasterizer state updates in st/mesa
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
pop_enable_group calls _mesa_set_enable for every state it changes,
so we don't need to do anything else.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8850>
At -O1 with GCC 10.2.1, _nir_visit_dest_indirect (declared ALWAYS_INLINE)
will fail to inline if its caller (nir_foreach_dest) is not inlined,
because _nir_visit_dest_indirect is passed as a function pointer. This
results in a compilation error.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
Fixes: 336bcbacd0 ("nir: inline nir_foreach_{src,dest}")
Tested-by: Witold Baryluk <witold.baryluk@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4353
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9301>
Test a full transformation path (load_uniform -> load_ubo -> load_uniform)
and validate the load_uniform offset.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9305>
The restoring of the actual uniform offset was wrong.
Fixes: 1837135f7c ("etnaviv: nir: add ubo lowering pass")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9305>
This is a little bit contorted because the Z storage for the tile is
either float or int depending on the Z format, so we have to be careful
about types when comparing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9287>
... instead of truncating to GLfloat. This seems somewhat silly since
the "clamp" part means only values [0.0, 1.0] are defined, but if the
depth buffer is Z32_UNORM then storing as GLfloat means you lose 8 bits
of depth bounds precision. This happens not to matter, yet, since swrast
classic doesn't support Z32_UNORM for depth, and the software gallium
drivers don't support EXT_depth_bounds_test. But the latter part is
about to change.
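The precision loss is easy to reproduce outside the driver. The sketch below
round-trips a Z32_UNORM value through a single-precision and a
double-precision float; the helper names are made up for the illustration.

```python
import struct

Z32_MAX = 0xFFFFFFFF


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def roundtrip_double(z):
    """UNORM -> double in [0.0, 1.0] -> UNORM."""
    return round((z / Z32_MAX) * Z32_MAX)


def roundtrip_float(z):
    """UNORM -> single-precision float in [0.0, 1.0] -> UNORM."""
    return round(to_float32(z / Z32_MAX) * Z32_MAX)
```

A single-precision float has a 24-bit significand, so a 32-bit UNORM bound
such as 0xFFFFFF01 does not survive the trip, while the double-precision
path returns the original value.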
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9287>
In release builds, there should be no change, but in debug builds the
assert will help us catch undefined behavior resulting from using
util_cpu_caps before it is initialized.
With fix for u_half_test for MSVC from Jesse Natalie squashed in.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9266>
I noticed that we were hitting this before st_create_context() called
util_cpu_detect() and so num_cpu_mask_bits was zero. But there is no
harm in calling util_cpu_detect(), so let's just call it here to be safe.
Fixes: d877451b48 ("util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9266>
So that RGP reports the memory type and the memory throughput.
Based on AMDVLK.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9303>
Since value-numbering no longer works across loops, we no longer need to
use v_readfirstlane_b32.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9288>
face
The opcode evaluates the unnormalized coordinates, the length of the
major axis, and the cube face.
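For reference, the three quantities can be computed as in the sketch below,
which follows the face selection table of the OpenGL specification (faces
numbered +X, -X, +Y, -Y, +Z, -Z as 0..5). Whether the hardware instruction
and the new opcode use exactly this output layout, sign convention, or
tie-breaking is an assumption here.

```python
def cube_face(rx, ry, rz):
    """Return (sc, tc, ma, face): unnormalized coords, major axis length, face."""
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    if ax >= ay and ax >= az:
        return (-rz, -ry, ax, 0) if rx >= 0 else (rz, -ry, ax, 1)
    if ay >= az:
        return (rx, rz, ay, 2) if ry >= 0 else (rx, -rz, ay, 3)
    return (rx, -ry, az, 4) if rz >= 0 else (-rx, -ry, az, 5)
```

The normalized face coordinates then follow as s = (sc / ma + 1) / 2 and
t = (tc / ma + 1) / 2, and the face index selects the layer when the lookup
is lowered to an array texture.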
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9200>
E.g. on r600 a cube texture lookup uses a specific cube instruction
to evaluate the sample coordinates and the face ID, so that the cube
texture lookup can be lowered to an array texture lookup, thereby sharing
the code with the 2D array texture lookup.
However, for TXD the given gradients still need to be three-component
vectors, so add a flag so that the NIR validation knows that we are dealing
with a cube texture that was lowered to an array and can validate accordingly.
v2: Handle new flag in serialization (Marek)
v3: Rebase so that the change does not require the patch to deduce the
number of offset and grad components from sampler type
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9200>
this ends up being a tradeoff where we waste a little startup time and
an extra ~4k memory for the overall screen object in exchange for never having
to fetch format properties again, which is a surprisingly expensive call
to be making as much as we have to make it
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9293>
For a while we were doing 3-space indent with 8-space tabs, largely
due to the emacs settings of a couple of contributors. We stopped
using tabs a long time ago, and they're just a nuisance at this point.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9207>
These had the function name baked into the perf_debug message, which
after a bunch of refactoring, was out of sync with the actual code.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9207>
Allow ctx to be NULL in perf_debug_ctx() and make perf_debug() a
shortcut for perf_debug_ctx(NULL, ...) to simplify things slightly
in the next patch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9264>
This was meant to be an || rather than &&, although it didn't matter for
shaderdb because both conditions would be true. But it did matter if
you were trying to force synchronous compile to avoid having nir/ir3
prints interleaved from multiple threads.
While at it, add a more specific debug flag to force initial variant
compile to be synchronous, because at some point the 'shaderdb' flag
itself will not force this.
Fixes: 75b0c4b5e1 ("freedreno/ir3: Async shader compile")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9264>
libpanfrost_lib depends on libpanfrost_bifrost for 'bifrost_compile_shader_nir' symbol
libpanfrost_lib depends on libpanfrost_bifrost_disasm for 'disassemble_bifrost' symbol
LOCAL_STATIC_LIBRARIES requires proper ordering to make the symbols available
Fixes the following building error happening with Android P:
FAILED: out/target/product/x86_64/obj/SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so
external/mesa/src/panfrost/lib/decode.c:534: error: undefined reference to 'disassemble_bifrost'
external/mesa/src/panfrost/lib/pan_shader.c:145: error: undefined reference to 'bifrost_compile_shader_nir'
Cc: 20.3 21.0 <mesa-stable@lists.freedesktop.org>
Fixes: 166630f ("android: pan/bi: Separate disasm/compiler targets")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9265>
I don't think link_whole works right for VS project generation, but MSVC
doesn't support GCC weak functions anyway, so work around it.
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9121>
We'd like to use one Mesa build environment which builds our CL compiler
stack (which needs Clang/LLVM) and which builds our GL driver. The GL
driver doesn't really need LLVM support, and since we're statically
linking LLVM, removing it from the driver drastically reduces our DLL
size on disk.
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9259>
Need to import client buffer to display drm device, otherwise
get following xserver error log:
[ 190.982] (WW) modeset(0): Page flip failed: No such file or directory
[ 190.982] (EE) modeset(0): present flip failed
With this fix, a full screen x11 client can display its window
buffer directly without a copy. Tested on Allwinner H3, 1080p
full screen glxgears goes from 163 FPS to 173 FPS.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9038>
This becomes possible as long as we do
val = s_and_b32/64 exec, val
before any subgroup operations.
This precautionary instruction can be removed by the
optimizer if 'val' was computed by a VOPC instruction
using the same exec mask.
Totals from 59 (0.04% of 146267) affected shaders (Navi10):
VGPRs: 2808 -> 2816 (+0.28%)
CodeSize: 340888 -> 340852 (-0.01%); split: -0.20%, +0.19%
Instrs: 61733 -> 61625 (-0.17%); split: -0.18%, +0.01%
Cycles: 470636 -> 469112 (-0.32%); split: -0.33%, +0.01%
VMEM: 8091 -> 7993 (-1.21%)
SMEM: 2736 -> 2719 (-0.62%); split: +0.29%, -0.91%
VClause: 1745 -> 1741 (-0.23%)
SClause: 2394 -> 2392 (-0.08%); split: -0.25%, +0.17%
Copies: 3249 -> 3253 (+0.12%); split: -0.62%, +0.74%
Branches: 1210 -> 1206 (-0.33%)
PreSGPRs: 3126 -> 3176 (+1.60%); split: -0.16%, +1.76%
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9195>
v2: Consider surface height as valid when unused by using 1.
Fixup width boundary checking.
v3 (Karol): Pull in changes from later commits
Fix validation
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9212>
The 3D one was always wrongly used, also the consumers always wanted the
size, not the levels. This should make it easier to use the interface and
also prevent future bugs like the 3D one.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9212>
v2 (Karol Herbst): extracted from other commit
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Serge Martin <edb@sigluy.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9212>
We don't care how many dimensions the image arg has, so drop it otherwise
we would have to add a lot of variants for arrays, msaa and depth
combinations. Yes, image2d_array_msaa_depth_t is a thing.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Serge Martin <edb@sigluy.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9212>
Scaling the depth bias doesn't seem correct with Vulkan. This is
probably the root cause of the shadow artifacts differences between
RADV and AMDVLK/AMDGPU-PRO.
Fix dEQP-VK.rasterization.depth_bias.d16_unorm.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2217
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9249>
we're going to want to make sure all other resources have been handled
at this point so that we can make some better decisions in this block
based on descriptor usage
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9273>