Commit Graph

60 Commits

Author SHA1 Message Date
Samuel Pitoiset 5d6a560a29 radv: do not use the availability bit for timestamp queries
It's unnecessary because we can just check if the timestamp
is to different to the default value when a pool is created
or resetted. Instead of waiting for the availability bit to
be 1, we have to emit a not equal WAIT_REG_MEM for checking
if the timestamp is ready.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-09-28 09:08:03 +02:00
Bas Nieuwenhuizen fbcd167314 radv: Add on-demand compilation of built-in shaders.
In environments where we cannot cache, e.g. Android (no homedir),
ChromeOS (readonly rootfs) or sandboxes (cannot open cache), the
startup cost of creating a device in radv is rather high, due
to compiling all possible built-in pipelines up front. This meant
depending on the CPU a 1-4 sec cost of creating a Device.

For CTS this cost is unacceptable, and likely for starting random
apps too.

So if there is no cache, with this patch radv will compile shaders
on demand. Once there is a cache from the first run, even if
incomplete, the driver knows that it can likely write the cache
and precompiles everything.

Note that I did not switch the buffer and itob/btoi compute pipelines
to on-demand, since you cannot really do anything in Vulkan without
them and there are only a few.

This reduces the CTS runtime for the no caches scenario on my
threadripper from 32 minutes to 8 minutes.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-08-14 10:26:24 +02:00
Karol Herbst cb65246ed2 nir: cleanup oversized arrays in nir_swizzle calls
There are no fixed sized array arguments in C, those are simply pointers
to unsized arrays and as the size is passed in anyway, just rely on that.

where possible calls are replaced by nir_channel and nir_channels.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
2018-07-13 15:46:57 +02:00
Samuel Pitoiset 6248fbe5e4 radv: get rid of buffer object priorities
We mostly use the same priority for all buffer objects, so
I don't think that matter much. This should reduce CPU
overhead a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-12 11:08:40 +02:00
Samuel Pitoiset 1f616a840e radv: emit a dummy ZPASS_DONE to prevent GPU hangs on GFX9
A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
counters) must immediately precede every timestamp event to
prevent a GPU hang on GFX9.

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-07-12 10:22:36 +02:00
Samuel Pitoiset 9c09e7d66e radv: remove unused 'predicated' parameter from some functions
It's always false.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-06-27 09:48:15 +02:00
Samuel Pitoiset fa42fa1a60 radv: emit PIPELINESTAT_{START,STOP} events for pipeline stats queries
Ported from RadeonSI.
This appears to fix some random fails with:
dEQP-VK.query_pool.statistics_query.*

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-26 18:23:16 +02:00
Samuel Pitoiset 41f6096c26 radv: use EOP_DATA_SEL_* instead of magic numbers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-21 10:31:02 +02:00
Marek Olšák 6703fec58c amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-19 13:08:50 -04:00
Bas Nieuwenhuizen 38933c1151 radv: Add option to print errors even in optimized builds.
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.

In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-31 11:51:23 +02:00
Bas Nieuwenhuizen 62f50df7b7 radv: Fix multiview queries.
This moves the extra queries to after the main query ended, instead
of doing it after the begin and hence doing nesting.

We also emit only (view count - 1) extra queries, as the main query
is already there for the first view.

This fixes the CTS occasionally getting stuck in
dEQP-VK.multiview.queries* waiting on results.

Fixes: 32b4f3c38d "radv/query: handle multiview queries properly. (v3)"
CC: 18.1 <mesa-stable@lists.freedesktop.org>

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-05-14 16:49:06 +02:00
Samuel Pitoiset 1d766b0196 radv: only disable out-of-order rast for perfect occlusion queries
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-05-02 10:33:22 +02:00
Samuel Pitoiset 7fe586f6fb radv: only enable PERFECT_ZPASS_COUNTS for precision occlusion queries
This unnecessary when the precision bit flag is not set, and this
might hurt performance. The Vulkan explains that not setting
VK_QUERY_CONTROL_PRECISE_BIT might be more efficient on some
implementations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-04-06 09:07:34 +02:00
Dave Airlie 032014ac01 radv/query: handle multiview timestamp queries.
For each view bit we need to emit a timestamp query.

Fixes: dEQP-VK.multiview.queries*

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-19 19:29:14 +00:00
Dave Airlie 32b4f3c38d radv/query: handle multiview queries properly. (v3)
For multiview we need to emit a number of sequential queries
depending on the view mask.

This avoids dEQP-VK.multiview.queries.15 waiting forever
on the CPU for query results that are never coming.

We only really want to emit one query,
and the rest should be blank (amdvlk does the same),
so we emit begin/end pairs for all the others except
the first query.

v2: fix tests
v3: split out patch.

Fixes: dEQP-VK.multiview.queries*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-19 19:29:09 +00:00
Dave Airlie 4034dc5c72 radv/query: split out begin/end query emission
This just splits out the begin/end query hw emissions,
it makes it easier to add multiview support for queries.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-03-19 19:29:05 +00:00
Samuel Pitoiset c27f5419f6 radv: only emit cache flushes when the pool size is large enough
This is an optimization which reduces the number of flushes for
small pool buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-01 09:53:40 +01:00
Samuel Pitoiset 2fe07933bd radv: keep track of the query pool size
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-01 09:53:39 +01:00
Samuel Pitoiset c956d0f406 radv: make sure to emit cache flushes before starting a query
If the query pool has been previously resetted using the compute
shader path.

Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query pool")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105292
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-01 09:14:49 +01:00
Dave Airlie ec1edd0fd2 radv: fix pipeline statistics end query on compute queue
It's legal to a pipeline stat query on a compute queue,
but we'd emit the wrong packet here. This should fix it to emit
the correct packet.

Noticed while inspecting the mpv hang.

Fixes: ad61eac250 (radv: factor out eop event writing code. (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-12-28 19:31:01 +10:00
Samuel Pitoiset bc92ed04ac radv: do not add the query pool BO to the list in vkCmdEndQuery()
As per the spec, the query identified by queryPool and query
must currently be active. Applications have to call vkCmdBeginQuery()
before, and thus the query pool BO will already be in the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-20 11:18:20 +01:00
Samuel Pitoiset cd64a4f705 radv: use vk_error() everywhere an error is returned
For consistency and it might help for debugging purposes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-13 11:05:26 +01:00
Dave Airlie 25660499b6 radv: wrap cs_add_buffer in an inline. (v2)
The next patch will try and avoid calling the indirect function.

v2: add a missing conversion.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-06 21:45:59 +00:00
Samuel Pitoiset a41e2e9cf5 radv: allow to use a compute shader for resetting the query pool
Serious Sam Fusion 2017 uses a huge number of occlusion queries,
and the allocated query pool buffer is greater than 4096 bytes.

This slightly improves performance (tested in Ultra) from
117.2 FPS to 119.7 FPS (~+2%) on my RX480.

This also improves Talos, from 69 FPS to 72/73 FPS (~+5%).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-27 13:47:06 +02:00
Dave Airlie a639d40f13 radv: add support for local bos. (v3)
This uses the new kernel interfaces for reduced cs overhead,
We only set the local flag for memory allocations that don't have
 a dedicated allocation and ones that aren't imports.

v2: add to all the internal buffer creation paths.
v3: missed some command submission paths, handle 0/empty bo lists.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 23:59:28 +01:00
Marek Olšák 7b697c8b78 amd: move r600d_common.h into r600g
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-09 16:27:06 +02:00
Samuel Pitoiset c8ea55ddda radv: convert all COMPUTE operations to the RADV_META_SAVE_XXX flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-06 09:49:06 +02:00
Samuel Pitoiset 457306fa4c radv: do not need to double zero-init the meta state structures
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 8860b39d94 radv: store the amount of saved constants in the compute state
It's safer and more elegant.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-09-27 09:26:44 +02:00
Samuel Pitoiset 4b407a62c7 radv: fix saved compute state when doing statistics/occlusion queries
We are pushing 16-bytes of constants, so we have to save/restore
the same amount of data to avoid data corruption.

Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-09-26 23:14:48 +02:00
Bas Nieuwenhuizen d235ff6e8f radv: Don't use a virtual function for getting the buffer virtual address.
We are really not going to use a winsys which does not need to store
the va, so might as well store it in a standard field.

Not sure this helps perf much though, as most of the cost is in the
cache miss accessing the bo anyway, which we stil need to do.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-09-20 22:04:25 +02:00
Dave Airlie a6c2001ace radv: add support for cmd predication.
This doesn't get used yet, it just adds support to various PKT3
emissions to enable it later.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-07-06 02:06:49 +01:00
Bas Nieuwenhuizen 59c2e2a061 radv: Remove SI num RB override for occlusion queries.
radeonsi doesn't have it anymore either.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-06 23:23:43 +02:00
Nicolai Hähnle 388d36dfd1 radv: fewer than 8 RBs are possible
This fixes the subsequent assertion on Bonaire.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-06-05 10:43:47 +10:00
Dave Airlie ad61eac250 radv: factor out eop event writing code. (v2)
In prep for GFX9 refactor some of the eop event writing code
out.

This changes behaviour, but aligns with what radeonsi does,
it does double emits on CIK/VI, whereas previously it only
did this on CIK.

v2: bump the size checks.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-02 12:48:56 +10:00
Dave Airlie 7205431e73 radv: factor out si_emit_wait_fence code.
This code was in a few places, consolidate into one.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-02 12:48:20 +10:00
Grazvydas Ignotas 45ccb661d8 radv: always free nir shaders from modules on stack
valgrind reports them as leaked, and I could not find anything making a
copy of the nir pointer. Also, radv_device_init_meta_blit_color() is
already freeing them unconditionally like this.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-05-10 01:13:44 +03:00
Jason Ekstrand b86dba8a0e nir: Embed the shader_info in the nir_shader again
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer.  The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct.  This, however, has
caused a few problems:

 1) There are many things which generate NIR without GLSL.  This means
    we have to support both NIR shaders which come from GLSL and ones
    that don't and need to have an info elsewhere.

 2) The solution to (1) raises all sorts of ownership issues which have
    to be resolved with ralloc_parent checks.

 3) Ever since 00620782c9, we've been
    using nir_gather_info to fill out the final shader_info.  Thanks to
    cloning and the above ownership issues, the nir_shader::info may not
    point back to the gl_shader anymore and so we have to do a copy of
    the shader_info from NIR back to GLSL anyway.

All of these issues go away if we just embed the shader_info in the
nir_shader.  There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Dave Airlie eb2a833679 radv: set base/ranges for push constant loads.
This isn't necessary yet but I'd like to use the range in
some future patches.

[airlied: add new resolve pass]
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-05-08 08:56:36 +10:00
Bas Nieuwenhuizen 6681ab1f97 radv: Use correct stage for ready bit.
Set the bit in the same stage as the timestamp, instead always at top of pipe.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
2017-05-02 00:54:44 +02:00
Bas Nieuwenhuizen 568aec29d9 radv: Add top of pipe timestamp queries.
Does not fix brokenness with the ready bit.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-05-02 00:54:18 +02:00
Fredrik Höglund 5ab5d1bee4 radv: use push descriptors in meta
Use push descriptors instead of temp descriptor sets.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-04-14 23:21:24 +02:00
Bas Nieuwenhuizen b35b5951fc radv: Stop shadowing the result in radv_GetQueryPoolResults.
The outer result was referred to, which meant bugs.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-12 07:38:58 +02:00
Bas Nieuwenhuizen 0763453291 radv: Return VK_NOT_READY if the query results are not available.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 8475a14302 ("radv: Implement pipeline statistics queries.")
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2017-04-12 07:38:58 +02:00
Bas Nieuwenhuizen 2dacb727c2 radv: Set query availability bit even if we don't wait.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 8475a14302 ("radv: Implement pipeline statistics queries.")
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
2017-04-12 07:38:58 +02:00
Bas Nieuwenhuizen 8475a14302 radv: Implement pipeline statistics queries.
The devil is in the shader again, otherwise this is
fairly straightforward.

The CTS contains no pipeline statistics copy to buffer
testcases, so I did a basic smoketest.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen d2906bc72d radv: Let count be dynamic in radv_break_on_count.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen 8473193760 radv: Rename query pipeline/set layout.
For using them with both occlusion and pipeline statistics queries.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen 95743d5b88 radv: Use VK_WHOLE_SIZE for the query buffer bindings.
The buffer sizes are specified just a few lines earlier, so don't
repeat ourselves.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00
Bas Nieuwenhuizen 8911dd6d12 radv: Use a shader for occlusion CmdCopyQueryPoolResults.
Use the new occlusion query copy shader.

We don't use the shader for the waiting as a polling loop ineracts badly
with having caching enabled. I noticed on my GPU (Tonga) that the values
are written out in order, so I just use a WAIT_REG_MEM on the last value.

If it turns out other chips don't do that we may need to look a bit more
into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal.

This also restricts the availability word in the pool to timestamp queries
only, as occlusion queries don't use it, and pipeline statistic queries
likely won't either.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-04-11 09:33:17 +02:00