I noticed it was using getenv directly when I tried to use 'setprop
mesa.tu.debug ..' on android. Use os_get_option() instead so we get
sysprop fallback on android.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15289>
Enables rasterization order attachment access for all pipelines,
see VK_ARM_rasterization_order_attachment_access for details.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15262>
Trivially implemented by using A6XX_GRAS_SC_CNTL_SINGLE_PRIM_MODE.
This extension is useful for emulators e.g. AetherSX2 PS2 emulator and
could drastically improve performance when blending is emulated.
Relevant tests:
dEQP-VK.rasterization.rasterization_order_attachment_access.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15106>
I chose to implement this as a global flag in the device, because
otherwise we would end up with extra draw overhead trying to avoid it in
the implicit-sync WSI case, and you're probably going to end up needing
implicit sync anyway because you used one of the BOs in any of the
submitted cmdbufs. To do better than this, we would probably want a
skip-implicit-sync flag on the BOs in the BO list, rather than global on
the submit.
Reports about venus on turnip say that this flag reduces worst-case
QueueSubmit time in a game workload from ~10ms to ~4ms.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14838>
Until now we have lived without a refcount mechanism in the driver
because in Vulkan the user is responsible for handling the life
span of memory allocations for all Vulkan objects, however,
imported BOs are tricky because the kernel doesn't refcount
so user-space needs to make sure that:
1. When importing a BO into the same device used to create it
(self-importing) it does not double free the same BO.
2. Frees imported BOs that were not allocated through the same
device.
Our initial implementation always freed BOs when requested,
so we handled 2) correctly but not 1) on drm and we would
double-free self-imported BOs because kernel doesn't return
a unique gem_handle on each import.
Beside this the submit ioctl checks for duplicates in the
BO list and returns an error if there is one.
This fixes the problem for good by adding refcounts to BOs
so that self-imported BOs have a refcnt > 1 and are only freed
when all references are freed.
KGSL on the other hand does not have the same problems,
at least not with ION buffers which are used for exportable
BOs on pre 5.10 android kernels.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5936
Fixes CTS tests: dEQP-VK.drm_format_modifiers.export_import.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15031>
Both Venus and Android AHB requires this extension.
Turnip ignores VK_SHARING_MODE_EXCLUSIVE so this is a no-op.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14836>
Graphics Flight Recorder is:
"The Graphics Flight Recorder (GFR) is a Vulkan layer to help
trackdown and identify the cause of GPU hangs and crashes.
It works by instrumenting command buffers with completion tags."
This is a nice little tool which could help quickly identify the call
which hanged. Or if command buffer is executed for too long.
The tiling nature of our GPU shouldn't be a big issue aside from
lower performance.
For non-segfault case, if:
- Hang happens at the same place in cmdbuf and draw/dispatch is not
finished at that point - it is likely that there is an infinite
loop in some of the shaders in this draw.
- Hang happens always in different place - likely there is nothing
wrong and command buffer just takes too long to execute and you
should try increasing hangcheck_period_ms. If it doesn't help
it is likely a synchronization issue.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13553>
This is entirely implemented in the SPIR-V frontend.
Relevant CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.non_semantic_info.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14829>
Moved nir_lower_compute_system_values to lower
load_local_invocation_index which could be emitted by
nir_zero_initialize_shared_memory.
Relevant CTS tests:
dEQP-VK.compute.zero_initialize_workgroup_memory.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14829>
VK_EXT_image_robustness is a strict subset of VK_EXT_robustness2
so we could just expose it.
Relevant CTS tests: dEQP-VK.robustness.image_robustness.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14829>
We recently had a bug of forgeting to add the buf->bo_offset. Just make
the easiest field to get be the bo->iova + buf->bo_offset already. Plus,
a little less work at emit time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14816>
The implementation is separate from Freedreno due to multithreading
support.
In Vulkan application may fill command buffer from many threads
and expect no locking to occur. We do introduce the possibility of
locking on renderpass end, however assuming that application
doesn't have a huge amount of slightly different renderpasses,
there would be minimal to none contention.
Other assumptions are:
- Application does submit command buffers soon after their creation.
Breaking the above may lead to some decrease in performance or
autotuner turning itself off.
The heuristic is too simplistic at the moment, to find a proper
one - we should run a bunch of traces with sysmem and gmem, and
build better heuristic from gathered data.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12128>
Fixes the case when last cmd buffer in submission doesn't have
tracepoints leading to flush data not being freed.
Added a few comments, renamed things, refactored allocations - now
the data flow should be a bit more clean.
Extracted submission data creation into tu_u_trace_submission_data_create
which would be later used in in tu_kgsl.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14391>
With the proper version checking in the common vulkan instance code
(commit 88b9b68) it is now possible to bring the reported interface
version up to v5.
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14563>
Define struct tu_timeline_sync for emulated timeline support in common
implementation that is on top of drm syncobj as a binary sync.
Also implement init/finish/reset/wait_many methods for the struct.
v1. Does not set MSM_SUBMIT_SYNCOBJ_RESET for waiting syncobjs since
it's being managed in the common implementation already.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14105>
This patch ports to common code for VkSemaphore, VkFence and relevant
APIs like vkCreate(Destroy)Semaphore/Fence, vkGetSemaphoreFdKHR, etc.
Accordingly, starts using common vkQueueSubmit with implementing
driver-specific hook.
Also remove all timeline semaphore codes so that we could use common
code in the following patches. This way we could easily see what's
modified in the following patch.
Note that kgsl is not ported in this patch.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14105>
- gen4 - has dp4acc and dp2acc, dp4acc is used to implement
4x8 dot product.
- gen3 - has dp2acc, in OpenCL blob uses dp2acc for dot product
on both get3 and gen4.
- gen2 - unknown, lower everything.
- gen1 - no dp2acc, lower everything. OpenCL blob doesn't advertise
cl_qcom_dot_product8 but still generates code for it.
The assembly is more verbose and uses yet to be documented
mad32.u16 instruction.
Passes:
dEQP-VK.spirv_assembly.instruction.compute.opsdotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsdotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opudotaccsatkhr.*
dEQP-VK.spirv_assembly.instruction.compute.opsudotaccsatkhr.*
Only packed 4x8 unsigned and mixed versions are accelerated.
However in theory we should be able to do better for signed version
than current NIR lowering.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>
In order to capture the timestamp when things actually end on Intel
GPU HW, we need to know whether the timestamp should be capture at the
top or end of pipeline.
v2: use one line python if/else (Danylo)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13911>
We don't advertise bufferDeviceAddressCaptureReplay capability and
neither does blob, because at the moment there is no way to allocate
bo with predefined iova.
There is no support of any arithmetic with addresses since shaderInt64
is not enabled. However, we could enable int64 support whenever we want.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
By default line mode is VK_LINE_RASTERIZATION_MODE_RECTANGULAR_EXT,
when lineRasterizationMode is VK_LINE_RASTERIZATION_MODE_BRESENHAM_EXT
and primtype is line - we enable bresenham line mode.
We have to disable MSAA when bresenham lines are used, this is
a hardware limitation and spec allows it:
"When Bresenham lines are being rasterized, sample locations may
all be treated as being at the pixel center (this may affect
attribute and depth interpolation)."
This forces us to re-emit msaa state when line mode is changed.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6020>
This is mostly a matter of auditing uses of
cmd->state.framebuffer and replacing every use of fb->attachments with
cmd->state.attachments. We already weren't using the attachments
anywhere outside of the render pass, so this is pretty straightforward.
We also don't have any use for anything in
VkFramebufferAttachmentImageInfo so we can just ignore it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13228>
Tiles on the edge may cross maximum viewport size, we have to clamp
per-tile scissor to maximum allowed viewport dimensions.
Add MAX_VIEWPORT_SIZE constant along the way.
Fixes vkd3d test "test_draw_uav_only"
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13197>
We add "Turnip" so that users (and vulkan.gpuinfo.org) can distinguish us
without requiring VK_KHR_driver_properties. This will be a lot more
user-friendly than "FD618", though.
I made some little vk_asprintf helpers, because I figure other drivers
setting up deviceName's will want them too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13041>
This is already handled by vk_device_init(); drivers no longer
need to do it themselves.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12867>
The common code fails dEQP-VK.wsi.display_control.register_device_event
due to having a stub NOT_IMPLEMENTED return, and thus fails the CTS. This
is one of our last failures, so disable the extension until it can get
finished off, so we can unblock passing the CTS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13010>
Switch to using common structure.
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13000>
The limitation is that the reusable command buffer should be created
when perfetto is already connected in order to write timestamps.
Otherwise such cmd buffer won't be traces.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10969>
This extension relaxes the alignment requirements to allow the GL std430
layout to be used. freedreno/ir3 already supports this (via
PIPE_CAP_LOAD_CONSTBUF).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12495>
Two related changes:
- in tu_device.c:tu_CreateDevice we need to free both pointers in the
teardown path after tu_bo_finish(global_bo), which uses the pointers.
They are allocated in the first call to tu_bo_init(), which happens
when global_bo is allocated.
- in tu_drm.c:tu_bo_init we need to free bo_list if the bo_idx
allocation fails. Convert to the goto teardown pattern as well.
Fixes the following dEQP-VK tests:
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail
dEQP-VK.api.object_management.alloc_callback_fail.device
dEQP-VK.api.object_management.alloc_callback_fail.device_group
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12481>
Move away from using gpu_id as the primary means to identify which
adreno we are running on, as future GPUs (starting with 7c3) stop
providing a gpu_id as a new naming scheme is introduced.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12159>
... and reduce maxDescriptorSetUpdateAfterBindStorageBuffersDynamic from 12 to
8.
MAX_DYNAMIC_BUFFERS is MAX_DYNAMIC_UNIFORM_BUFFERS +
MAX_DYNAMIC_STORAGE_BUFFERS. We set
maxDescriptorSetUniformBuffersDynamic = MAX_DYNAMIC_UNIFORM_BUFFERS
maxDescriptorSetStorageBuffersDynamic = MAX_DYNAMIC_STORAGE_BUFFERS
maxDescriptorSetUpdateAfterBindUniformBuffersDynamic = MAX_DYNAMIC_BUFFERS / 2
maxDescriptorSetUpdateAfterBindStorageBuffersDynamic = MAX_DYNAMIC_BUFFERS / 2
The CTS test checks that
maxDescriptorSetUpdateAfterBindUniformBuffersDynamic
- is at least 8; and
- is at least maxDescriptorSetUniformBuffersDynamic
maxDescriptorSetUpdateAfterBindStorageBuffersDynamic
- is at least 4; and
- and is at least maxDescriptorSetStorageBuffersDynamic
Prior to this patch maxDescriptorSetUpdateAfterBindUniformBuffersDynamic was 12
but maxDescriptorSetUniformBuffersDynamic was 16, thus causing the CTS failure
in
dEQP-VK.api.info.vulkan1p2_limits_validation.ext_descriptor_indexing
By raising maxDescriptorSetUpdateAfterBindUniformBuffersDynamic to the same
value as maxDescriptorSetUniformBuffersDynamic, we bring the limits into the
appropriate ranges. We do the same thing for
maxDescriptorSetUpdateAfterBindStorageBuffersDynamic by assigning it the same
value as maxDescriptorSetStorageBuffersDynamic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12193>
Blob advertises { 1024, 1024, 64 }, but from tests they all
could be 1024.
Fixes tests:
dEQP-VK.compute.basic.max_local_size_x
dEQP-VK.compute.basic.max_local_size_y
dEQP-VK.compute.basic.max_local_size_z
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9409>
This is much more maintainable, extensible, and easy to read than
hand-rolled structs approximating assembly. This also removes the last
use of the old hand-written packing structs. There are a few minor
differences:
- The shaders are larger because ir3 currently doesn't support (rpt),
which means that some shaders are larger than one instrlen and the
current logic has to be extended to allow for that. This seems a small
price to pay, ir3 will gain support for (rpt) eventually, and we
shouldn't have limitations like this baked in anyway. For example some
GL blob r8g8 <-> r16 copy shaders are apparently quite large.
- Due to the inability to switch inputs/outputs on the fly, we need to
split the VS into two variants. I made the layer-writing variant also
used for other clears, because the old method of overloading c0.z/c1.z
to mean both "src x coordinate" and "z clear value" in the same shader
seemed too clever and I didn't want to add yet another variant. This
means that non-layered clears will also write the layer (to 0), but
that shouldn't be a big deal performance-wise.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12079>
v1. Hyunjun Ko <zzoon@igalia.com>
- Add to hanlde VK_DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT_VALVE
- Don't support VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT and
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
v2. Hyunjun Ko <zzoon@igalia.com>
- Fix some indentations and nitpicks.
- Add the extension to features.txt
v3. Hyunjun Ko <zzoon@igalia.com>
- Remove unnecessary asserts.
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9322>
This way we can make the tables const. At the same time, for a6xx, this
introduces a "sub-generation template" to reduce the copy/paste for
parameters which are keyed to the sub-generation. It also explicitly
lists every supported GPU, to get rid of duplicate lists of supported
gpus between the device-info and drivers.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
Replace it with a calculation which works for all current GPUs.
Duplicated the calculation in both drivers because freedreno_dev_info isn't
meant for derived parameters (and drivers might want to just calculate on
the fly instead).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11790>
There are some cases using kgsl backend on linux that is still not usual
setup though, we need to consider too.
Regarding the timeline semaphore feature, we could implement it for
the kgsl backend in the future, and probalby it should be using the
existing code in tu_drm.
See #4738, #4907
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11488>
Move and rename warn_non_conformant_implementation() to common location
of src/vulkan/util/vk_util.c as vk_warn_non_conformant_implementation().
In freedreno/ci, move MESA_VK_IGNORE_CONFORMANCE_WARNING to common
location of .baremetal-deqp-test-freedreno-vk.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11563>
tu_kgsl.c: tu_enumerate_devices closed fd previously closed by
tu_physical_device_init function.
Move out the fd closing from tu_physical_device_init function because
they do not belong to it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11561>
GPU won't be able to write to such BOs, which would to useful for
cmdstream BOs.
Move "bool dump" to the new flags along the way.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10807>
The spec requires at least 2, but says "No specific guarantees are made
about higher priority queues receiving more processing time or better
quality of service than lower priority queues." So, we can just leave the
priorities as a stub.
Fixes dEQP-VK.info.device_properties
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10470>