Fixes memory leak with dependencies array:
==5224== 104 (96 direct, 8 indirect) bytes in 3 blocks are definitely lost in loss record 1,954 of 2,035
==5224== at 0x484178A: malloc (vg_replace_malloc.c:380)
==5224== by 0x484670B: realloc (vg_replace_malloc.c:1437)
==5224== by 0x14DBAB9B: update_bo_syncobjs (iris_batch.c:819)
==5224== by 0x14DBADB8: update_batch_syncobjs (iris_batch.c:898)
==5224== by 0x14DBB3D5: _iris_batch_flush (iris_batch.c:1031)
==5224== by 0x14DB77D0: iris_transfer_map (iris_resource.c:2348)
==5224== by 0x157786FD: u_transfer_helper_transfer_map (u_transfer_helper.c:243)
==5224== by 0x14C479E7: tc_buffer_map (u_threaded_context.c:2252)
==5224== by 0x1434F3F8: pipe_buffer_map_range (u_inlines.h:393)
==5224== by 0x1435094A: _mesa_bufferobj_map_range (bufferobj.c:491)
==5224== by 0x143586D9: map_buffer_range (bufferobj.c:3737)
==5224== by 0x14358DA3: _mesa_MapBuffer (bufferobj.c:3947)
==5224== 240 (192 direct, 48 indirect) bytes in 6 blocks are definitely lost in loss record 1,984 of 2,035
==5224== at 0x484178A: malloc (vg_replace_malloc.c:380)
==5224== by 0x484670B: realloc (vg_replace_malloc.c:1437)
==5224== by 0x14DBAB9B: update_bo_syncobjs (iris_batch.c:819)
==5224== by 0x14DBADB8: update_batch_syncobjs (iris_batch.c:898)
==5224== by 0x14DBB3D5: _iris_batch_flush (iris_batch.c:1031)
==5224== by 0x14FF72CC: iris_get_query_result (iris_query.c:631)
==5224== by 0x14C4396A: tc_get_query_result (u_threaded_context.c:880)
==5224== by 0x1458F4F7: get_query_result (st_cb_queryobj.c:273)
==5224== by 0x1458F7EB: st_WaitQuery (st_cb_queryobj.c:352)
==5224== by 0x144EFF66: get_query_object (queryobj.c:742)
==5224== by 0x144F01AE: _mesa_GetQueryObjectuiv (queryobj.c:811)
And leak with syncobjs:
==13644== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1,846
==13644== at 0x484186F: malloc (vg_replace_malloc.c:381)
==13644== by 0x639789B: iris_create_syncobj (iris_fence.c:69)
==13644== by 0x63B213A: iris_batch_reset (iris_batch.c:512)
==13644== by 0x63B3637: _iris_batch_flush (iris_batch.c:1056)
==13644== by 0x65EF2BC: iris_get_query_result (iris_query.c:631)
==13644== by 0x623B970: tc_get_query_result (u_threaded_context.c:880)
==13644== by 0x5B874F7: get_query_result (st_cb_queryobj.c:273)
==13644== by 0x5B877EB: st_WaitQuery (st_cb_queryobj.c:352)
==13644== by 0x5AE7F66: get_query_object (queryobj.c:742)
==13644== by 0x5AE8150: _mesa_GetQueryObjectiv (queryobj.c:801)
Fixes: ce2e2296ab ("iris: Suballocate BO using the Gallium pb_slab mechanism")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14387>
Cutting the extra VK mustpass files is 315MB out of 1.5GB of the amd64
rootfs. pip was 10MB. The rustup toolchains were massive (over a GB
IIRC) on the x86 container images.
Hopefully helps with #5837
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14460>
Use header->header_size to offset cache data as well in case the header
struct extends on a newer driver but the cache data was appended with
an old header.
Fixes: 723f0bf74a ("venus: initial support for module and pipelines")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14463>
If we have a compute shader that has a big workgroup, a barrier, and
a branchstack which limits max_waves - this may result in a situation
when we cannot run concurrently all waves of the workgroup, which
would lead to a hang.
Blob just explodes in such case.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14110>
If barriers are used, it must be possible for all waves in the workgroup
to execute concurrently. Thus we may have to reduce the registers limit.
Fixes a hang in "Digital Combat Simulator".
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14110>
Because the extend_cb vfunc is not initialized, there is a risk that
the emission code calls into a random pointer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14418>
This is used by virglrenderer to create the correct shaders on the
host. Fixes:
dEQP-GLES31.functional.primitive_bounding_box.triangles.tessellation_set_per_primitive.vertex_tessellation_fragment.fbo
when using ntt with virgl.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
Ultimately this is consumed by nir-to-tgsi and needed by virglrenderer
to correctly declare output variables.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
By putting vertex store and indices all in one buffer the larger part
of the shared buffer might actually only be vertex data we are not
interested in. Hence only map the part of the buffer that contains the
index data for the currently active draw command.
This helps drivers where a mapping operation is expensive, like e.g. virgl.
v2: - add comment about ranged buffer mapping (Pierre-Eric)
- keep passing direct_draws[i].start to direct_draw_func, it looks
like the "start" parameter is properly set in
util_prim_restart_convert_to_direct
v3: Fix ws error (Mike)
Related: #5825
Fixes: f9d12bf50e
vbo/dlist: use a single buffer object
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14423>
This is the minimum required by the spec.
Fixes dEQP-VK.api.info.vulkan1p2_limits_validation.nv_mesh_shader
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14446>
The blob uses *both* nops and (ss). It turns out that in some rare cases
the hardware does take more than 6 cycles, at least for movmsk, but
adding nops is unnecessary. I believe the extra nops are only there due
to the immaturity of the blob's implementation of subgroup ops, so we
don't have to copy them - just handle shared reg producers the same as
SFU instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
This now covers e.g. cat6 instructions as well, and ss will cover
instructions writing shared regs as well. This is split out from the
previous change to avoid too much churn and shouldn't cause any
functional changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
Add new centralized functions which will replace the various places we
hardcode 10 for the number of (ss) nops, add numbers for soft (sy) nops
based on similar computerator experiments with ldc, sam, and ldib (the
most common (sy) producers), and add a "systall" metric which is
analogous to sstall. This also fixes some cases where we'd erroniously
count ldl* as (sy) producers instead of (ss) producers when calculating
sstall.
This only switches over the metric reporting to the new functions, so
there is no behavior change. The following commit will switch over
the rest of the compiler.
While we're at it, remove max_sun as it's never set.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
After some experimentation with computerator, it seems on a618 that
writing a full register and then reading half of it as a half register
requires a delay of 6, the same as the delay for cat5/cat6 sources. The
other direction only has a delay of 5, but just bump it unconditionally
out of an abundance of caution.
Fixes: 890de1a436 ("ir3/delay: Fix full->half and half->full delay")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
If we're allocating a source then we force is_killed to false, not to
true. Fixes a regression in
dEQP-GLES31.functional.synchronization.in_invocation.image_atomic_write_read
later.
Fixes: 0ffcb19b9d ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14246>
Was getting ASAN errors in CI when trying to add ANV to the
debian-testing job:
==10993==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4194304 byte(s) in 64 object(s) allocated from:
#0 0x7f763c1bda3c in __interceptor_posix_memalign ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:226
#1 0x55f43d28627f in os_malloc_aligned ../src/util/os_memory_aligned.h:58
#2 0x55f43d28627f in _util_sparse_array_node_alloc ../src/util/sparse_array.c:107
#3 0x55f43d28627f in util_sparse_array_get ../src/util/sparse_array.c:143
#4 0x55f43d1fdaba in anv_device_lookup_bo ../src/intel/vulkan/anv_private.h:1335
#5 0x55f43d1fdaba in anv_device_import_bo_from_host_ptr ../src/intel/vulkan/anv_allocator.c:1843
#6 0x55f43d1ff571 in anv_block_pool_expand_range ../src/intel/vulkan/anv_allocator.c:534
#7 0x55f43d1ffcb5 in anv_block_pool_init ../src/intel/vulkan/anv_allocator.c:417
#8 0x55f43d18f082 in run_test ../src/intel/vulkan/tests/block_pool_no_free.c:123
#9 0x55f43d1862b6 in main ../src/intel/vulkan/tests/block_pool_no_free.c:152
#10 0x7f763b942d09 in __libc_start_main ../csu/libc-start.c:308
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14121>
Fixes piglit test shaders@ssa@fs-if-def-else-break.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12892>
When we lower SPIR-V to NIR for textures in vtn_handle_texture, we only
bump the number of coordinate components when the op is not a lod query.
Update the assert to take this into account.
This fixes:
- dEQP-VK.robustness.robustness2.bind.template.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.bind.template.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.r32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rg32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32f.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32i.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.dontunroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
- dEQP-VK.robustness.robustness2.push.notemplate.rgba32ui.unroll.nonvolatile.sampled_image.no_fmt_qual.null_descriptor.samples_1.cube_array.frag
Fixes: 231337a1 ("intel/fs/xehp: Assert that the compiler is sending all 3 coords for cubemaps.")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13925>
That's how the TGSI math opcodes work.
This lets lower_vec_to_regs coalesce the DP output into the .yzw channels,
giving an impressive shader-db win on softpipe:
total instructions in shared programs: 2929840 -> 2794036 (-4.64%)
instructions in affected programs: 1651438 -> 1515634 (-8.22%)
total temps in shared programs: 372730 -> 332744 (-10.73%)
temps in affected programs: 118151 -> 78165 (-33.84%)
and a minor one on r300:
total instructions in shared programs: 51238 -> 51149 (-0.17%)
instructions in affected programs: 2621 -> 2532 (-3.40%)
total vinst in shared programs: 15655 -> 15618 (-0.24%)
vinst in affected programs: 468 -> 431 (-7.91%)
total temps in shared programs: 9838 -> 9828 (-0.10%)
temps in affected programs: 59 -> 49 (-16.95%)
and a bigger one on i915g:
total instructions in shared programs: 398064 -> 395901 (-0.54%)
instructions in affected programs: 29271 -> 27108 (-7.39%)
total tex_indirect in shared programs: 12261 -> 12233 (-0.23%)
tex_indirect in affected programs: 98 -> 70 (-28.57%)
LOST: 0
GAINED: 5
The r300 change is less impressive because it does some backend copy-prop,
but also because intermediate storage of DPs now takes a vec4 instead of a
scalar.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14200>
Note that this causes a geometry slice to be disabled if any DSS is
fused off within that slice, which may seem stricter than the BSpec
quotation implies, but testing shows that pixel pipes with any faulted
DSS don't work at all, and that using a slice with any faulted pixel
pipe leads to serious graphics corruption.
It would be better to query this geometry topology information from
the hardware instead of trying to reconstruct it here, but the kernel
interface for that is not available yet.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14436>
An if that looks like:
if (x) { } else { }
That has no phis following it is dead. Currently these are only
removed by peephole select, but that means that 'x' is considered
used until that pass is run, which can make it difficult to apply
sane lowering in the case where loading 'x' requires complex or
expensive transformations, but 'x' is *really* unused.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14400>
Note that currently there's no emulation if the D3D12 driver doesn't
support the "UAV typed load" feature for all of the GL required formats.
This is not a required D3D12 feature, so this support won't light up
on all D3D12 hardware.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14342>