Commit Graph

32726 Commits

Author SHA1 Message Date
Nicolai Hähnle b921da3b74 radeonsi: use a threaded context even for debug contexts
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:04 +01:00
Nicolai Hähnle 1a6d9e087a radeonsi: record and dump time of flush
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:04 +01:00
Nicolai Hähnle b07569ad8b ddebug: optionally handle transfer commands like draws
Transfer commands can have associated GPU operations.

Enabled by passing GALLIUM_DDEBUG=transfers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:03 +01:00
Nicolai Hähnle 18fd2a859d ddebug: dump context and before/after times of draws
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:03 +01:00
Nicolai Hähnle ba2f2b6f2a ddebug: generalize print_named_xxx via a PRINT_NAMED macro
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:03 +01:00
Nicolai Hähnle c9fefa062b ddebug: rewrite to always use a threaded approach
This patch has multiple goals:

1. Off-load the writing of records in 'always' mode to another thread
   for performance.

2. Allow using ddebug with threaded contexts. This really forces us to
   move some of the "after_draw" handling into another thread.

3. Simplify the different modes of ddebug, both in the code and in
   the user interface, i.e. GALLIUM_DDEBUG. In particular, there's
   no 'pipelined' anymore, since we're always pipelined; and 'noflush'
   is replaced by 'flush', since we no longer flush by default.

4. Fix the fences in pipelining mode. They previously relied on writes
   via pipe_context::clear_buffer. However, on radeonsi, those could
   (quite reasonably) end up in the SDMA buffer. So we use the newly
   added PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE fences instead.

5. Improve pipelined mode overall, using the finer grained information
   provided by the new fences.

Overall, the result is that pipelined mode should be more useful, and
using ddebug in default mode is much less invasive, in the sense that
it changes the overall driver behavior less (which is kind of crucial
for a driver debugging tool).

An example of the new hang debug output:

  Gallium debugger active.
  Hang detection timeout is 1000ms.
  GPU hang detected, collecting information...

  Draw #   driver  prev BOP  TOP  BOP  dump file
  -------------------------------------------------------------
  2          YES      YES    YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000000
  3          YES      NO     YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000001
  4          YES      NO     YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000002
  5          YES      NO     YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000003

  Done.

We can see that there were almost certainly 4 draws in flight when
the hang happened: the top-of-pipe fence was signaled for all 4 draws,
the bottom-of-pipe fence for none of them. In virtually all cases,
we'd expect the first draw in the list to be at fault, but due to the
GPU parallelism, it's possible (though highly unlikely) that one of
the later draws causes a component to get stuck in a way that prevents
the earlier draws from making progress as well.

(In the above example, there were actually only 3 draws truly in flight:
the last draw is a blit that waits for the earlier draws; however, its
top-of-pipe fence is emitted before the cache flush and wait, and so
the fact that the draw hasn't truly started yet can only be seen from a
closer inspection of GPU state.)

Acked-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:03 +01:00
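The reading of the hang table described above can be sketched in C as follows. This is only an illustration of the reasoning, not ddebug code: the struct, its fields, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-draw record; the field names are illustrative and are
 * not taken from ddebug. */
struct draw_fences {
   bool top_signaled;    /* top-of-pipe fence has signaled */
   bool bottom_signaled; /* bottom-of-pipe fence has signaled */
};

/* A draw counts as "in flight" when its top-of-pipe fence has signaled
 * but its bottom-of-pipe fence has not. */
static bool draw_in_flight(const struct draw_fences *d)
{
   return d->top_signaled && !d->bottom_signaled;
}

/* Number of draws that look in flight at the time of the hang. */
static size_t count_draws_in_flight(const struct draw_fences *draws,
                                    size_t count)
{
   size_t n = 0;
   for (size_t i = 0; i < count; i++) {
      if (draw_in_flight(&draws[i]))
         n++;
   }
   return n;
}

/* Index of the first in-flight draw, which is the most likely culprit,
 * or -1 if no draw is in flight. */
static int first_suspect_draw(const struct draw_fences *draws, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (draw_in_flight(&draws[i]))
         return (int)i;
   }
   return -1;
}
```

As the commit message notes, a signaled top-of-pipe fence does not prove that a draw has truly started, so a count like this is an upper bound rather than an exact answer.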
Nicolai Hähnle e8bb8758dd ddebug: use an atomic increment when numbering files
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:03 +01:00
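A minimal sketch of numbering dump files with an atomic increment, assuming C11 `<stdatomic.h>` rather than whatever helper the commit itself uses. The counter, the function name, and the decimal `%08u` suffix are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical process-wide counter shared by every thread that writes
 * dump files. Static storage makes it start at zero. */
static atomic_uint dump_index;

/* Format a dump file name into buf and return the index that was used.
 * atomic_fetch_add returns the previous value, so two threads calling
 * this concurrently can never receive the same index. */
static unsigned next_dump_filename(char *buf, size_t size, const char *dir,
                                   const char *proc_name, int pid)
{
   unsigned index = atomic_fetch_add(&dump_index, 1);

   snprintf(buf, size, "%s/%s_%d_%08u", dir, proc_name, pid, index);
   return index;
}
```

With a plain `index++` instead, two threads could read the same value and overwrite each other's dump file.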
Nicolai Hähnle d6710fe874 dd/util: extract dd_get_debug_filename_and_mkdir
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:03 +01:00
Nicolai Hähnle 8491fcafab gallium/u_dump: add and use util_dump_transfer_usage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:02 +01:00
Nicolai Hähnle 9b8033a4a7 gallium/u_dump: add util_dump_ns
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:02 +01:00
Nicolai Hähnle 6f4a03b08a gallium/u_dump: export util_dump_ptr
Change format to %p while we're at it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:01:02 +01:00
Nicolai Hähnle 125a915052 radeonsi: implement PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE
v2: use uncached system memory for the fence, and use the CPU to
    clear it so we never read garbage when checking the fence

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:55 +01:00
Nicolai Hähnle e4627ac8fb radeonsi: document some subtle details of fence_finish & fence_server_sync
v2: remove the change to si_fence_server_sync, we'll handle that more
    robustly

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:50 +01:00
Nicolai Hähnle 14b9fa75e4 gallium: add pipe_context::callback
For running post-draw operations inside the driver thread. ddebug will
use it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:50 +01:00
Nicolai Hähnle 2bdfbb0e53 gallium/u_threaded: implement pipe_context::set_log_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:49 +01:00
Nicolai Hähnle 244536d3d6 gallium/u_threaded: avoid syncs for get_query_result
Queries should still get marked as flushed when flushes are executed
asynchronously in the driver thread.

To this end, the management of the unflushed_queries list is moved into
the driver thread.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:49 +01:00
Nicolai Hähnle 609a230375 gallium/u_threaded: implement asynchronous flushes
This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.

v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
  relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:42 +01:00
Nicolai Hähnle 11b380ed0c gallium/u_threaded: mark queries flushed only for non-deferred flushes
The driver uses (and must use) the flushed flag of queries as a hint that
it does not have to check for synchronization with currently queued up
commands. Deferred flushes do not actually flush queued up commands, so
we must not set the flushed flag for them.

Found by inspection.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:42 +01:00
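The rule stated in this commit can be sketched as below. The flag name, its value, and the query struct are hypothetical stand-ins, not the definitions from Gallium's headers or from u_threaded_context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the deferred-flush flag bit. */
#define SKETCH_FLUSH_DEFERRED (1u << 0)

/* Hypothetical query object: only the "flushed" hint is modeled. */
struct sketch_query {
   bool flushed;
};

/* Called after a flush. A deferred flush does not submit the queued
 * commands, so the queries must keep flushed == false; otherwise the
 * driver would skip synchronization that it still needs. */
static void mark_queries_after_flush(struct sketch_query *queries,
                                     size_t count, unsigned flags)
{
   if (flags & SKETCH_FLUSH_DEFERRED)
      return;

   for (size_t i = 0; i < count; i++)
      queries[i].flushed = true;
}
```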
Nicolai Hähnle 78a4750d91 radeonsi: move fence functions to si_fence.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:42 +01:00
Nicolai Hähnle e6dbc804a8 winsys/amdgpu: handle cs_add_fence_dependency for deferred/unsubmitted fences
The idea is to fix the following interleaving of operations
that can arise from deferred fences:

 Thread 1 / Context 1          Thread 2 / Context 2
 --------------------          --------------------
 f = deferred flush
 <------- application-side synchronization ------->
                               fence_server_sync(f)
                               ...
                               flush()
 flush()

We will now stall in fence_server_sync until the flush of context 1
has completed.

This scenario was unlikely to occur previously, because applications
seem to be doing

 Thread 1 / Context 1          Thread 2 / Context 2
 --------------------          --------------------
 f = glFenceSync()
 glFlush()
 <------- application-side synchronization ------->
                               glWaitSync(f)

... and indeed they probably *have* to use this ordering to avoid
deadlocks in the GLX model, where all GL operations conceptually
go through a single connection to the X server. However, it's less
clear whether applications have to do this with other WSI (i.e. EGL).
Besides, even this sequence of GL commands can be translated into
the Gallium-level sequence outlined above when Gallium threading
and asynchronous flushes are used. So it makes sense to be more
robust.

As a side effect, we no longer busy-wait on submission_in_progress.

We won't enable asynchronous flushes on radeon, but add a
cs_add_fence_dependency stub anyway to document the potential
issue.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 14:00:22 +01:00
Nicolai Hähnle 1e5c9cf590 gallium: add PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE bits
These bits are intended to be used by the ddebug hang detection and are
named in analogy to the Vulkan stage bits (and the corresponding Radeon
pipeline event).

Hang detection needs fences on the granularity of individual commands,
which nothing else really covers. The closest alternative would have
been PIPE_QUERY_GPU_FINISHED, but (a) queries are a per-context object
and we really want a per-screen object, (b) queries don't offer a
wait with timeout, and (c) in any case, PIPE_QUERY_GPU_FINISHED is
meant to imply that GPU caches are flushed, which the new bits
explicitly aren't.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 13:58:16 +01:00
Nicolai Hähnle ea6df1ce37 gallium: add PIPE_FLUSH_ASYNC and PIPE_FLUSH_HINT_FINISH
Also document some subtleties of pipe_context::flush.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 13:58:16 +01:00
Nicolai Hähnle c50743f61c gallium: remove unused and deprecated u_time.h
Cc: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:57:22 +01:00
Nicolai Hähnle 222a2fb998 util: move os_time.[ch] to src/util
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:57:21 +01:00
Nicolai Hähnle f76a6cb337 radeonsi: always use async compiles when creating shader/compute states
With Gallium threaded contexts, creating shader/compute states is
effectively a screen operation, so we should not use context state.

In particular, this allows us to avoid using the context's LLVM
TargetMachine.

This isn't an issue yet because u_threaded_context filters out non-async
debug callbacks, and we disable threaded contexts for debug contexts.
However, we may want to change that in the future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:20 +01:00
Nicolai Hähnle b650fc09c3 radeonsi: fix potential use-after-free of debug callbacks
Found by inspection.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:20 +01:00
Nicolai Hähnle dd7c273e87 radeonsi: move pipe debug callback to si_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:19 +01:00
Nicolai Hähnle f0d3a4de75 util: move pipe_barrier into src/util and rename to util_barrier
The #if guard is probably not 100% equivalent to the previous PIPE_OS
check, but if anything it should be an over-approximation (are there
pthread implementations without barriers?), so people will get either
a good implementation or compile errors that are easy to fix.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:53:19 +01:00
Nicolai Hähnle 28c95cdb29 gallium: add async debug message forwarding helper
v2: use util_vasprintf for Windows portability

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2017-11-09 11:53:19 +01:00
Nicolai Hähnle 0dcf30e550 gallium: clarify the constraints on sampler_view_destroy
r600 expects the context that created the sampler view to still be alive
(there is a per-context list of sampler views).

svga currently bails when the context of destruction is not the same as
creation.

The GL state tracker, which is the only one that runs into the
multi-context subtleties (due to share groups), already guarantees that
sampler views are destroyed before their context of creation is destroyed.

Most drivers are context-agnostic, so the warning message in
pipe_sampler_view_release doesn't really make sense.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:50:54 +01:00
Nicolai Hähnle 0f54ee6072 radeonsi: reduce the scope of sel->mutex in si_shader_select_with_key
We only need the lock to guard changes in the variant linked list. The
actual compilation can happen outside the lock, since we use the ready
fence as a guard.

v2: fix double-unlock

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:37:51 +01:00
Nicolai Hähnle 4f493c79ee radeonsi: use ready fences on all shaders, not just optimized ones
There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:

  Thread 1                         Thread 2
  --------                         --------
  si_shader_select_with_key
    begin compiling the first
    variant
    (guarded by sel->mutex)
                                   si_bind_XX_shader
                                     select first_variant by default
                                     as state->current
                                   si_shader_select_with_key
                                     match state->current and early-out

Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.

The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).

It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.

Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:37:51 +01:00
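The idea of a per-shader ready fence can be sketched with POSIX threads primitives. This is an illustration under assumptions, not radeonsi code: the struct names, the `pm4` placeholder field, and the functions are hypothetical, and the real driver uses its own fence type.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot fence: starts unsignaled, is signaled once. */
struct ready_fence {
   pthread_mutex_t lock;
   pthread_cond_t cond;
   bool signaled;
};

static void ready_fence_init(struct ready_fence *f)
{
   pthread_mutex_init(&f->lock, NULL);
   pthread_cond_init(&f->cond, NULL);
   f->signaled = false;
}

static void ready_fence_signal(struct ready_fence *f)
{
   pthread_mutex_lock(&f->lock);
   f->signaled = true;
   pthread_cond_broadcast(&f->cond);
   pthread_mutex_unlock(&f->lock);
}

static void ready_fence_wait(struct ready_fence *f)
{
   pthread_mutex_lock(&f->lock);
   while (!f->signaled)
      pthread_cond_wait(&f->cond, &f->lock);
   pthread_mutex_unlock(&f->lock);
}

/* Hypothetical shader variant; pm4 stands in for the compiled state. */
struct sketch_shader {
   struct ready_fence ready;
   int pm4;
};

/* Compile side (thread 1 in the diagram). The signature matches what
 * pthread_create expects for a thread entry point. */
static void *compile_variant(void *data)
{
   struct sketch_shader *shader = data;

   shader->pm4 = 1234;                 /* pretend compilation result */
   ready_fence_signal(&shader->ready); /* publish it */
   return NULL;
}

/* Use side (thread 2 in the diagram). Even on the early-out path that
 * never takes the selector mutex, waiting on the fence guarantees the
 * compiled state exists before it is read. */
static int use_shader(struct sketch_shader *shader)
{
   ready_fence_wait(&shader->ready);
   return shader->pm4;
}
```

Once the fence has signaled, the wait takes an uncontended lock and returns immediately, which is why a scheme like this need not slow down the fast path.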
Ilia Mirkin f317f72f73 r600g: use SIMPLE_FLOAT for blending to enable some optimizations
Radeonsi also sets this flag. Seems to avoid pulling up the destination
RT value when the dst blend factor is zero if it's not otherwise being
loaded. Among other things, it allows blending to overwrite infinity/NaN
values in the destination RT.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-08 22:35:27 -05:00
Ilia Mirkin 35433494f3 nv50: make blending work so that zero wins in a multiplication
This matches nvc0 behavior, tested with the fbo-float-nan piglit.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2017-11-08 22:32:43 -05:00
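What "zero wins in a multiplication" means can be shown with a CPU-side sketch. The hardware blending itself is not modeled here; the function name is hypothetical and only illustrates the arithmetic rule.

```c
#include <math.h>

/* IEEE 754 multiplication yields NaN for 0 * NaN and 0 * infinity.
 * Under a "zero wins" rule, a zero factor forces the product to zero
 * regardless of the other operand. */
static float mul_zero_wins(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}
```

With this rule, a zero blend factor can overwrite NaN or infinity already stored in the destination, which is the case the fbo-float-nan piglit test exercises.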
Timothy Arceri 87f02ddfd1 amdgpu: use simple mtx
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-09 12:07:48 +11:00
Timothy Arceri f0857fe87b mesa: use simple mtx in core mesa
Results from x11perf -copywinwin10 on Eric's SKL:
   4.33338% ± 0.905054% (n=40)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Yogesh Marathe <yogesh.marathe@intel.com>
2017-11-09 12:07:48 +11:00
Andreas Boll 6e4d65f674 broadcom/vc5: Add vc5_drm.h to the release tarball
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5",
       for BCM7268.")

Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-08 18:30:45 +00:00
Gert Wollny 6905d005ef clover: use the unified check for c++11 instead of the gcc version number
So far clover based its test for compiler support on the version of gcc,
while in reality support for c++11 is required. This patch replaces the
version check by the check unified for all modules that require c++11.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-08 16:03:38 +00:00
Gert Wollny 8f18528cea swr: Replace the check for c++11 by the unified version
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-11-08 16:03:38 +00:00
Emil Velikov 0cd0958544 targets/opencl: don't hardcode the icd file install to /etc/...
Use $(sysconfdir) instead of hardcoding /etc.

While the OpenCL spec expects the file in /etc, people building their
stack can override that, esp. !Linux users.

Furthermore this removes a fundamental violation, which results in the
system file being overwritten even as one explicitly sets --prefix
and/or DESTDIR.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-By: Aaron Watry <awatry@gmail.com>
2017-11-08 14:10:07 +00:00
Tobias Droste 5d61fa4e68 gallivm: Use new LLVM fast-math-flags API
LLVM 6 changed the API on the fast-math-flags:
https://reviews.llvm.org/rL317488

NOTE: This also enables the new flag 'ApproxFunc' to allow for
approximations for library functions (sin, cos, ...). I'm not completely
convinced that this is something mesa should do.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-11-08 10:44:19 +01:00
Marek Olšák 7f33e94e43 amd/addrlib: update to latest version
This uses C++11 initializer lists.

I just overwrote all Mesa files with internal addrlib and discarded
hunks that we should probably keep, but I might have missed something.

The code depending on ADDR_AM_BUILD is removed. We can add it back next
time if needed.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-08 00:55:13 +01:00
Eric Anholt 3bfcd31e98 broadcom/vc5: Flush the job when it grows over 1GB.
Fixes GL_OUT_OF_MEMORY from streaming-texture-leak (and will hopefully
keep piglit from ooming on my no-swap platform, as well).
2017-11-07 12:58:03 -08:00
Eric Anholt 80da60947b broadcom/vc5: Fix pausing of transform feedback.
Gallium disables it by removing the streamout buffers, not by binding a
program that doesn't have TF outputs.  Fixes piglit
"ext_transform_feedback2/counting with pause"
2017-11-07 12:58:00 -08:00
Eric Anholt 25d199f67d broadcom/vc5: Add support for GL_RASTERIZER_DISCARD
Fixes piglit discard-drawarrays.
2017-11-07 12:57:49 -08:00
Eric Anholt 9ccb6621be broadcom/vc5: Add partial transform feedback query support.
We have to compute the queries in software, so we're counting the
primitives by hand.  We still need to make sure to not increment the
PRIMITIVES_EMITTED if we overflowed, but leave that for later.
2017-11-07 12:57:43 -08:00
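Counting primitives by hand amounts to deriving a primitive count from the vertex count of each draw. The sketch below shows that arithmetic under assumptions: the enum, struct, and function names are hypothetical and are not taken from the vc5 driver, and the overflow handling mentioned above is not modeled.

```c
/* Hypothetical primitive modes, limited to the common non-adjacency ones. */
enum sketch_prim {
   SKETCH_POINTS,
   SKETCH_LINES,
   SKETCH_LINE_STRIP,
   SKETCH_TRIANGLES,
   SKETCH_TRIANGLE_STRIP,
   SKETCH_TRIANGLE_FAN,
};

/* Number of complete primitives produced by a draw with verts vertices. */
static unsigned prims_for_vertices(enum sketch_prim mode, unsigned verts)
{
   switch (mode) {
   case SKETCH_POINTS:
      return verts;
   case SKETCH_LINES:
      return verts / 2;
   case SKETCH_LINE_STRIP:
      return verts >= 2 ? verts - 1 : 0;
   case SKETCH_TRIANGLES:
      return verts / 3;
   case SKETCH_TRIANGLE_STRIP:
   case SKETCH_TRIANGLE_FAN:
      return verts >= 3 ? verts - 2 : 0;
   }
   return 0;
}

/* Hypothetical software counter backing a primitives query. */
struct sketch_tf_counter {
   unsigned long long primitives;
};

/* Accumulate the primitives of one (possibly instanced) draw. */
static void count_draw(struct sketch_tf_counter *counter,
                       enum sketch_prim mode, unsigned verts,
                       unsigned instances)
{
   counter->primitives +=
      (unsigned long long)prims_for_vertices(mode, verts) * instances;
}
```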
Eric Anholt 4f33344e7a broadcom/vc5: Add occlusion query support.
Fixes all of piglit's OQ tests.
2017-11-07 12:56:40 -08:00
Eric Anholt bd24f4890f broadcom/vc5: Skip emitting textures that aren't used.
Fixes crashes when ARB_fp uses texture[1] but not 0, as in piglit's
fp-fragment-position.
2017-11-07 09:40:25 -08:00
Eric Anholt 3d5e62dcfa broadcom/vc5: Add missing SRGBA8 ETC2 support.
Fixes piglit oes_compressed_etc2_texture-miptree srgb8-alpha8.
2017-11-07 09:40:25 -08:00
Eric Anholt 6079f7c3c3 broadcom/vc5: Disable early Z test when the FS writes Z.
Fixes piglit early-z.
2017-11-07 09:40:25 -08:00