My editor likes to enforce pep8, here's some low hanging fruit so I don't
have to do too much add -p.
Acked-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10505>
nir_assign_io_var_locations() does not use outputs_written when
assigning driver locations. Use driver_location to avoid incorrectly
guessing what locations it assigned.
Copied from lavapipe 8731a1beb7
Will fix provoking vertex tf tests when VK_EXT_provoking_vertex
would be enabled:
dEQP-VK.rasterization.provoking_vertex.transform_feedback.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11111>
Previously, iris_bufmgr had the ability to maintain multiple
simultaneous memory mappings for a BO, one in WB mode (with CPU caches),
and another in WC (streaming) mode. Depending on the flags passed to
iris_bo_map(), we would select one mode or the other.
The rules for deciding which to use were:
- Systems with LLC always use WB mode because it's basically free
- Non-LLC systems used...
- WB maps for all BOs where snooping is enabled (which translates to
when BO_ALLOC_COHERENT is set at allocation time)
- WB maps for reads unless persistent, coherent, async, or raw.
- WC maps for everything else.
This patch simplifies the system by selecting a single mmap mode at
BO allocation time, and always using that. Each BO now has at most one
map at a time, rather than up to two (or three before we deleted GTT
map support in recent patches).
In practical terms, this eliminates the capability to use WB maps for
reads of non-snooped BOs on non-LLC systems. Such reads would now be
slow, uncached reads. However, iris_transfer_map recently began using
staging blits for such reads - so the GPU copies the data to a snooped
buffer which will be mapped WB. So, rather than incurring slow UC
reads, we really just take the hit of a blit, and retain fast reads.
The rest of the rules remain the same.
There are a few reasons for this:
1. TTM doesn't support mapping an object as both WB and WC. The
cacheability is treated as a property of the object, not the map.
The kernel is moving to use TTM as part of adding discrete local
memory support. So it makes sense to centralize on that model.
2. Mapping the same BO as both WB and WC is impossible to support on
some CPUs. It works on x86/x86_64, which was fine for integrated
GPUs, but it may become an issue for discrete graphics paired with
other CPUs (should that ever be a thing we need to support).
3. It's overall much simpler. We only have one bo->map field, and
manage to drop a significant amount of boilerplate.
One issue that arises is the interaction with the BO cache: BOs with
WB maps and WC maps will be lumped together into the same cache. This
means that a cached BO may have the wrong mmap mode. We check that,
and if it doesn't match, we unmap it, waiting until iris_bo_map is
called to restore one with the desired mode. This may underutilize
cache mappings slightly on non-LLC systems, but I don't expect it to
have a large impact.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4747
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
In the bad old days, i965 used GTT mapping for detiling maps. iris
never has, however. The only reason it used GTT maps was in weird
fallback cases for dealing with BO imports from foreign memory. We
now do staging blits for those, and never mmap them.
There are no more users of GTT mapping, so we can delete it.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
XXX: This is actually wrong. The dmabuf imported case can be mapped via
GEM_MMAP_GTT if the iommu is working, according to Joonas, but GEM_MMAP
would fall over and fail. So we would need this fallback.
ALTERNATIVELY...we would need to flag such imported dmabufs as
unmappable, and then make iris_transfer_map/unmap always do blits
instead of direct mappings. That seems like the saner approach
We never want to use GEM_MMAP_GTT, as it does detiling maps, and iris
always wants direct maps. There were originally two cases that this
fallback path was attempting to handle:
1. The BO was allocated from stolen memory that we can't GEM_MMAP.
At one point, kernel patches were being proposed to use stolen
memory for userspace buffers, but these never landed. The kernel
has never given us stolen memory, so we cannot hit this case.
2. Imported objects may be from memory we can't GEM_MMAP.
For example, a DMABUF from a discrete AMD/NVIDIA GPU in a PRIME
setup would be backed by memory that we can't GEM_MMAP. We could
try and mmap these directly with GEM_MMAP_GTT, but that relies on
the IOMMU working. We could mmap the DMABUF fd directly (but have
never tried to do so), but there are complex rules there. Instead,
we now flag those imports, however, and rely on the iris_transfer_map
code to perform staging blits on the GPU, so we never even try to
map them directly. So this case won't reach us here any longer.
With both of those out of the way, there is no need for a fallback.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
iris has never relied on detiled maps using hardware fences.
This code is a remnant of i965, where that was actually used.
We can just assert that callers don't do such a thing.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
Direct mappings of imported DMABUFs can be tricky. If they're allocated
from our own device, then we can probably mmap them and it'd be fine.
But they may come from a different device (such as a discrete GPU), in
which case I915_GEM_MMAP wouldn't work, I915_GEM_MMAP_GTT would require
a working IOMMU, and directly mmap'ing the DMABUF fd would come with a
bunch of rules and restrictions which are hard to get right.
CPU mapping an imported DMABUF image for writes seems very uncommon,
solidly in the "what are you even doing?" realm. Mapping an imported
DMABUF for reading might be a thing, in case someone wanted to do
glReadPixels on it. But in that case, the cost of doing a staging
blit is probably acceptable.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
If we're doing CPU reads of a resource that doesn't have CPU caches
enabled for the mapping (say, in device local memory, or WC mapped),
then blit it to a temporary that does have those caches enabled.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
Not all external objects are the same. Imported buffers may be from
other devices (say a dmabuf from an AMD or NVIDIA discrete card) which
are backed by memory that we can't use with I915_GEM_MMAP. However,
exported buffers are ones that we know we allocated ourselves from our
own device. We may not know what other clients are doing with them,
but we can assume a bit more about where they came from.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
I'd like to start tracking "imported" vs. "exported" for objects,
rather than a blanket "external" flag. Instead of directly checking
bo->external, use a new helper that will eventually be "imported or
exported".
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
We basically tried this, and it performed worse, so delete the
suggestion in the comments that we may want to do it someday.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
In some cases, we have to map directly (e.g. coherent/persistent maps).
In other cases (e.g. tiled), we /cannot/ map directly. We should put
the code which adds the PIPE_MAP_DIRECTLY flag in mandatory cases before
the "bail and return NULL" check for cases where we can't do that.
We leave the "we would prefer to direct map this" cases after the error
check, since we -can- use blits for those, we'd just rather not. ASTC
also stays because even though it's tiled, our tiled memcpy paths work.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
Here, we're deciding when to map the buffer directly, rather than using
the GPU to blit to/from a temporary. There is already a flag for that,
PIPE_MAP_DIRECTLY, which has the added benefit of not being a negative
(such as "no_gpu").
Currently, we intend to map directly if:
1. Direct mappings were requested explicitly
2. Persistent or coherent mappings were requested (and so we must)
3. ASTC textures (we currently can't blit those correctly)
4. There is no need for a temporary (there's no image compression that
the CPU wouldn't understand, and we don't need to avoid stalls due
to the buffer being busy on the GPU)
Expressing "please memory map this directly" is easier to follow than
"please don't use the GPU as part of mapping this".
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10941>
hang-detection is a vulkan-based lightweight wrapper from
parallel-deqp-runner that periodically submits empty command buffers
and waits for their completions. If the completion never happens, the
GPU is considered hung, the wrapped script is killed, and the job
should get aborted.
This should have no negative impact on the runtime of dEQP/traces/...,
but will allow saving time when the GPU gets hung as we can abort the
job immediately rather than waiting for the timeout.
In the case of B2C, we are using this tool's error message as a way to
trigger the reboot of the test machine and start again.
v2:
- Use hang-detection already with some jobs (Martin).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Martin Peres <martin.peres@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11087>
Otherwise it will store a pointer to already unmapped memory which
could lead to a crash in tu_CmdPushDescriptorSetWithTemplateKHR since
it tries to copy data from the old memory.
Fixes a crash with Zink's new lazy descriptor manager instroduced
in bfdd1d8d
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11137>
accessing this variable repeatedly like this is a contended hotpath somehow,
so instead just create a const for reference
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11124>
this is an extreme hotpath, so having a single calculation in a const
variable is slightly better for compiler microoptimizing
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11124>
It seems something went wrong during conversion of this article (or
maybe even before in the HTML version), where every header after the
"Gallium environment variables" header was nested below it.
That's clearly not what's meant here, so let's fix that.
This makes the toctree make a bit more sense for this article.
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11106>
all color outputs need to be rewritten with coverage, not just FragColor
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11065>
This provides most of the implementation, but there are some
things we cannot enable until we improve of kernel submit
interface, namely:
We don't expose capacity to export SYNC_FD, although we do
have the implementation in place. This requires that we
improve our kernel interface and event wait implementation
first so we can cover the corner case where the application
submits a command buffer that includes a VkCmdWaitForEvents
and tries to export a SYNC_FD from its signal semaphores or
fence before it the event is signaled and the command buffer
is sent to the kernel for execution in full.
Likewise, we can't currently import semaphores. This is because
our current kernel submit interface can only take one syncobj.
We have been working around this so far by waiting on the last
syncobj produced from the device whenever we had to wait on any
semaphores (which is obviously suboptimal already), but this
won't work as soon as we allow importing external semaphores,
as those could (and would typically) be produced from a
different device.
Once we address the kernel bits, we should come back and enable
SYNC_FD exports as well as semaphore imports.
Relevant CTS tests:
dEQP-VK.api.external.fence.*
dEQP-VK.api.external.semaphore.*
dEQP-VK.synchronization.cross_instance.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
We can (and should) close the descriptor immediately after the import.
Gets the following CTS test to pass without requiring to increase limits
for open file descriptors:
dEQP-VK.synchronization.basic.binary_semaphore.chain
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
Fixes the following building error:
clang: error: no such file or directory: 'external/mesa/src/mesa/drivers/dri/i965/brw_ff_gs_emit.c'
clang: error: no input files
Fixes: 897bcc1e6b ("i965: drop old brw ff gs code.")
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10718>
Fixes the following building error:
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so
...
ld.lld: error: undefined symbol: brw_compile_ff_gs_prog
>>> referenced by brw_ff_gs.c:56 (external/mesa/src/mesa/drivers/dri/i965/brw_ff_gs.c:56)
Fixes: 52e426fd8b ("intel/compiler: add support for compiling fixed function gs")
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10718>
When we remove the contents of the results directory, we `cd` into it.
The script expects that $PWD is /piglit, and $OLDPWD is the Mesa build
directory, however the cd into the results directory will make $OLDPWD
be $BUILDDIR/results.
This means that Piglit emits into results/results/ which looks weird,
but more importantly also fails OpenCL Piglit execution, because we
can't find our baseline result expectations.
Fix it by using an explicit variable rather than relying on history.
Fixes: 683ddf19dc ("ci: remove results directory content only with piglit runners")
Ref: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10856
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Martin Peres <martin.peres@mupuf.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11126>
So we don't need to provision aarch64 servers, which are these days
rarer than x8_64.
In the switch to the new runner tags, switch to one which contains the
device type, so we can dimension the runner jobs taking into account the
number of DUTs available.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11108>
This will get us build coverage of a bunch of Vulkan features, plus the
ELF TLS support.
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10389>
Android 29 introduced general-dynamic TLS variable support ("quick
function call to look up the location of the dynamically allocated
storage"), while Mesa on normal Linux has used initial-exec ("use some of
the startup-time fixed slots, hope for the best!"). Both would be better
options than falling all the way back to pthread_getspecific(), which is
the alternative we have pre-29.
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10389>
I'm going to add another case for Android shortly, and then we can keep
the logic all in one spot.
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10389>