Currently this just has wait, but in order to get the right answer
for vulkan partial, lavapipe/llvmpipe need to pass a partial flag
through here in the future.
This just changes the API so that's possible.
v2: use an enum (zmike)
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15009>
The next patches will justify the new ownership. We want the BOs to
have references on the batches' syncobjs so we can implement implicit
tracking. In other words: BOs will be able to wait on syncobjs owned
by different screens. Since our syncobjs are actually just a Kernel
handle with a refcount, they can be used globally and it makes more
sense to map them to the bufmgr, just like the BOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12363>
This commit does several things:
* Unify code common to several drivers by evaluating INTEL_NO_HW within
intel_get_device_info_from_fd (suggested by Jordan).
* For drivers that keep a copy of the intel_device_info struct, a
separate copy of the no_hw field is now unnecessary. Remove them.
* Minimize kernel queries when INTEL_NO_HW is true. This is done for
code simplification, but we may find reason to undo this later on.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12007>
u_threaded_context requires various objects to inherit from a new
threaded_foo base class rather than directly from pipe_foo. This
patch does most of the mechanical changes required for that.
It also initializes the new threaded_resource fields.
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
mi_ is already a unique prefix in Mesa so the gen_ isn't really gaining
us anything except extra characters. It's possible that MI_ may
conflict a tiny bit with GenXML but it doesn't seem to be a problem
today and we can deal with that in the future if it's ever an issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9393>
In the Chrome WebGL Aquarium stress test, 20 instances of Chrome will run
Aquarium simultaneously over 20+ hours. That causes Chrome crash.
During the stress, glBeginQueryIndexed is called frequently.
1.Each query will only use 32 bytes from query_buffer_uploader. After the offset
exceed 4096, it will alloc new buffer for query_buffer_uploader->buffer
and release the old buffer.
2.But iris_begin_query will call u_upload_alloc when the offset changed, and it
will increase the query_buffer_uploader->buffer->reference.count every time
when it called u_upload_alloc.
3.So when u_upload_release_buffer try to release the resource of
query_buffer_uploader->buffer, its reference.count is
already equal to 129. pipe_reference_described will only decrease its reference
count to 128.So it never called old_dst->screen->resource_destroy.
4.The old resouce bo will never be freeed. And chrome will called mmap every time
when it alloc new resource bo.
5. Chrome process map too many vmas in its process. Its map count exceed the
sysctl_max_map_count which is 65530 defined in kernel.
6. When iris_begin_query want to alloc new resource bo, it will meet NULL pointer
because mmap return failed. Finally chrome crashed when it access this NULL resource
bo.
The fix is decrease the reference count in iris_destroy_query.
Patch is verified by chrome webgl Aquarium test case for more than 72 hours.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Yang Shi <yang.a.shi@intel.com>
Reviewed-by: Alex Zuo <alex.zuo@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7890>
The write flag is redundant since it can be inferred easily from the
iris_address::access domain. This allows the iris_address struct to
be laid out more efficiently in memory, leading to a measurable
improvement in several Piglit Drawoverhead test-cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Probably the most annoying patch to review from the whole series --
Mark every buffer object use as accessed through some caching domain
with the sequence number of the current synchronization section of the
batch. The additional argument of iris_use_pinned_bo() makes sure I'd
have gotten a compile error if I had missed any buffer added to the
batch validation list.
There are only a few exceptions where a buffer is left untracked while
adding it to the validation list, justified below:
- Batch buffers: These are strictly read-only for the moment.
- BLORP buffer objects: Their seqnos are bumped manually at the end
of iris_blorp_exec() instead, in order to avoid plumbing domain
information through BLORP address combining.
- Scratch buffers: The contents of these are strictly thread-local.
- Shader images and SSBOs: Accesses of these buffers are explicitly
synchronized at the API level.
v2: Opt out of tracking more aggressively (Ken): In addition to the
above, surface states, binding tables, instructions and most
dynamic states are now left untracked, which means a *lot* more BO
uses marked IRIS_DOMAIN_NONE which need to be reviewed extremely
carefully, since the cache tracker won't be able to provide any
coherency guarantees for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This delimits all batch operations which access memory between
iris_batch_sync_region_start() and iris_batch_sync_region_end() calls.
This makes sure that any buffer objects accessed within the region are
considered in use through the same caching domain until the end of the
region.
Adding any buffer to the batch validation list outside of a sync
region will lead to an assertion failure in a future commit, unless
the caller explicitly opted out of the cache tracking mechanism.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
We're nearly out of dirty bits, and some patches pending review on
GitLab no longer apply due to that. Make room for them by splitting
off shader stage-specific bits into a separate stage_dirty mask.
An alternative would be to split compute-related bits into a separate
mask, but that would prevent the '<< stage' indexing done in various
parts of the driver from working.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
This is just a refcounted wrapper around a drm_syncobj. There is
enough terminology going on in the area of synchronization (sync
objects, sync files, ...) that I'd rather not invent our own.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
This stuff is somewhat specific to the GL extension & drivers. On
Vulkan we won't use this, it also made a rather large file.
v2: Fix Android build (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4344>
Implementation is similar to radeonsi in 5f1cef76
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the first call that provides the iris context to the monitor
implementation. On the first call, use the iris context to initialize
the monitor context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The perf subsystem needs several macro definitions that were
duplicated in Iris and i965 headers. Place these macros within perf,
if the perf implementation contains the only references to the values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a few cases, we switch to MI_MATH instead of MI_PREDICATE,
just because we were already doing math and it's easier to chain
together.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
In the past, each query object had their own BO. Checking if the batch
referenced that BO was an easy way to check if commands were still
queued to compute the query value. If so, we needed to flush.
More recently (c24a574e6c), we started using an u_upload_mgr for query
objects, placing multiple queries in the same BO. One side-effect is
that iris_batch_references is a no longer a reasonable way to check if
commands are still queued for our query. Ours might be done, but a
later query that happens to be in the same BO might be queued. We don't
want to flush in that case.
Instead, check if the current batch's signalling syncpt is the one we
referenced when ending the query. We know the syncpt can't have been
reused because our query is holding a reference, so a simple pointer
comparison should suffice.
Removes all batch flushing caused by query objects in Shadow of Mordor.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
We don't execute any of the commands to record snapshots, so we can't
actually produce a real result. We do however need to avoid waiting
on a syncpt which will never be signalled. So, just return 0.
iris_draw_vbo is divided into two functions to remove unnecessary
operations from the loop. This implementation of ARB_indirect_parameters
takes into account NV_conditional_render by saving MI_PREDICATE_RESULT
at the start of a draw call and restoring it at the end also the result
of NV_conditional_render is taken into account when computing predicates
that limit draw calls for ARB_indirect_parameters in a similar way
to 1952fd8d in ANV.
v2: Optimize indirect draws (suggested by Kenneth Graunke)
v3: (by Kenneth Graunke)
- Fix an issue where indirect draws wouldn't set patch information
before updating the compiled TCS.
- Move some code back to iris_draw_vbo to avoid duplicating it.
- Fix minor indentation issues.
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to subtract the starting offset from the final offset before
dividing by the stride. See src/intel/vulkan/genX_cmd_buffer.c:3142.
Not known to fix anything.
MI_PREDICATE_DATA is an intermediate storage for the MI_PREDICATE
command's calculations - it holds the result of the subtraction when
the compare operation is SRCS_EQUAL or DELTAS_EQUAL. But the actual
result of the predication is MI_PREDICATE_RESULT, which is what we
want to copy from the render context to the compute context.
This function can be used to stall on the CPU and resolve the predicate
for the conditional render. It will convert ice->state.predicate from
IRIS_PREDICATE_STATE_USE_BIT to either IRIS_PREDICATE_STATE_RENDER or
IRIS_PREDICATE_STATE_DONT_RENDER, depending on the result of the query.
v2:
- return void (Ken)
- update the stored condition (Ken)
- simplify the code leading to resolve the predicate (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads. It may also be necessary for
the compiler in a few cases.
When switching from bo_wait to sync-points, I missed that we turned an
if (not landed) bo_wait into a while (not landed) check_syncpt(), which
has a timeout of 0. This meant, rather than sleeping until the batch
is complete, we'd busy-loop, continually asking the kernel "is the batch
done yet???". This is not what we want at all - if we wanted a busy
loop, we'd just loop on !snapshots_landed. We want to sleep.
Add an effectively infinite timeout so that we sleep.
Instead of allocating 4K BO per query object, we can create a large blob
of memory and split it into pieces as required.
Having one BO for multiple query objects, we don't want to wait on all
of them, instead when we write last snapshot, we create a sync point, and
check syncpoints while waiting on particular object.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
1. Set a render condition. We emit it immediately on the render
engine, and stash q->bo as ice->state.compute_predicate in case
the compute engine needs it.
2. Clear the render condition. We were incorrectly leaving a stale
compute_predicate kicking around...
3. Dispatch compute. We would then read the stale compute predicate,
and try to load it into MI_PREDICATE_DATA. But q->bo may have been
freed altogether, causing us to try and use garbage memory as a BO,
adding it to the validation list, failing asserts, and tripping
EINVALs in execbuf.
Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure
down to a list of 48 tests I could easily run to reproduce it. Huge
thanks to the Valgrind authors for the memcheck tool that immediately
pinpointed the problem.
I had a hack in place earlier to pass the query type as q->index
for the regular statistics query, but we ended up adjusting the
interface and adding a new query type. Use that instead, fixing
pipeline statistics queries since the rebase.
We were dividing by 4 in calculate_result_on_gpu(), and also in
iris_get_query_result(). We should stop doing the latter, and instead
divide by 4 in calculate_result_on_cpu() as well.
Otherwise, if snapshots were available, and you hit the
calculate_result_on_cpu() path, but requested it be written to a QBO,
you'd fail to get a divide.