Commit Graph

110721 Commits

Author SHA1 Message Date
Marek Olšák b3a26d4628 glsl: fix and clean up NV_compute_shader_derivatives support
- make sure compute shader derivatives are exposed for all extensions
- unify duplicated code

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-02 16:09:24 -04:00
Marek Olšák 20909284f2 st/dri: decrease input lag by syncing sooner in SwapBuffers
This is done by:
- decreasing the number of frames in flight by 1
- flushing before throttling in SwapBuffers
  (instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
    draw frame 2
    wait frame 0
    flush frame 2
    present frame 2

    The input lag is 2 frames.

Now:
    draw frame 2
    flush frame 2
    wait frame 1
    present frame 2

    The input lag is 1 frame. Flushing is done before waiting, because
    otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
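
The two orderings above can be modelled in a few lines. This is only an illustrative sketch, not the st/dri code: the function name `swap_buffers_steps` and the `flush_first` flag are hypothetical, and the lag is simply the distance between the frame being presented and the frame being waited on.

```python
def swap_buffers_steps(frame, flush_first):
    """Return (ordered steps, input lag in frames) for presenting `frame`.

    Hypothetical model of the ordering described in the commit message.
    """
    if flush_first:
        # New behavior: one fewer frame in flight, flush before waiting.
        waited = frame - 1
        steps = [
            f"draw frame {frame}",
            f"flush frame {frame}",
            f"wait frame {waited}",
            f"present frame {frame}",
        ]
    else:
        # Old behavior: wait on an older frame first, then flush.
        waited = frame - 2
        steps = [
            f"draw frame {frame}",
            f"wait frame {waited}",
            f"flush frame {frame}",
            f"present frame {frame}",
        ]
    return steps, frame - waited


old_steps, old_lag = swap_buffers_steps(2, flush_first=False)
new_steps, new_lag = swap_buffers_steps(2, flush_first=True)
```

Flushing first means the device already has frame 2 queued while the CPU waits on frame 1, so it does not go idle.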
2019-05-02 16:09:24 -04:00
Erik Faye-Lund d30ce03bc0 meson: add build-summary
This roughly mirrors what we get from autotools. There are a few
differences, though:

1. The "exec_prefix" output has been dropped. Meson doesn't support
   this, so it makes no sense here.
2. The "llvm-config" output has been dropped. Meson abstracts dependency
   discovery a bit more than our autotools build-system does, so it's
   not easy to get this information as-is.
3. HUD extra stats, SWR archs, Shared/Static libs and CFLAGS / CXXFLAGS /
   LDFLAGS have been dropped. These can be inspected by "meson configure".
4. How we set defines works quite differently in our Meson build-system,
   and the result isn't quite the same. In particular, the DEFINES output
   has been dropped, to avoid having to refactor the code too much.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109326
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-02 18:30:29 +00:00
Erik Faye-Lund 2127403439 meson: give dri- and gallium-drivers separate vars
Variables are cheap, and there's little reason for the dri and gallium
drivers to work on the same variable for the driver list. So let's split
these in two separate lists instead.

This makes it easier to inspect these after the fact, for instance
for generating a summary of build-settings.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-02 18:30:29 +00:00
Erik Faye-Lund 28f18915b8 meson: lift driver-collection out into parent build-file
This way we can mark the dri_drivers and dri_link arrays as temporary,
as all knowledge about them is contained in a single build-file with
a clearly visible, limited life-span.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
2019-05-02 18:30:29 +00:00
Rob Clark c14b13d0ff docs: mark KHR_blend_equation_advanced done on a6xx
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Rob Clark 8c77e669a8 freedreno/a6xx: smaller hammer for fb barrier
We just need to do a sequence of commands to flush the cache.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark 6fa8a6d60f freedreno/a6xx: KHR_blend_equation_advanced support
Wire up support to sample from the fb (and force GMEM rendering when we
have fb reads).  The existing GLSL IR lowering for blend_equation_advanced
does the rest.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark 650246523b freedreno/ir3: fb read support
Lower load_output to txf_ms_fb and add support for the new texture fetch
instruction.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark 0704ddb2e5 freedreno/drm: expose GMEM_BASE address
Needed for sampling from tile buffer (GMEM).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark a99c360a46 nir: add pass to lower fb reads
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark a2c89a85f4 nir: fix lower_wpos_ytransform in load_frag_coord case
Apparently we never hit this path.  Or at least haven't for a rather
long time.  But in either case (load_deref or load_frag_coord), we can
just directly use the intrinsic's ssa dest.  So stop passing the
nir_variable (which would be NULL in the load_frag_coord case) around
and instead just use &intr->dest.ssa.

(This ofc means we need to set up the cursor to insert *after* the
instruction, which seems to be another bug of the original
implementation.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark 691d5a825a nir: rework tex instruction printing
The extra comma at the end was annoying me.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-05-02 11:19:22 -07:00
Rob Clark ca3eb5db66 freedreno/ir3: add some ubo range related asserts
And a comment..  since we are mixing units of bytes/dwords/vec4,
hopefully this will avoid some unit confusion.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Rob Clark e941faf3e8 freedreno/ir3: add IR3_SHADER_DEBUG flag to disable ubo lowering
It isn't quite as simple as not running the pass, since with packed
varyings we get load_ubo for block==0 (ie. the "real" uniforms).  So
instead run the pass normally but decline to lower anything in
block > 0.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Rob Clark f697f61590 freedreno/ir3: fix lowered ubo region alignment
Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but
emitted by EXT_SRC_ADDR) we need to keep them 4*vec4 aligned.  Which the
code already mostly did, except for aligning the first UBO region itself
(ie. the one after block==0 which is the "real" uniforms).

Fixes: 893425a607 freedreno/ir3: Push UBOs to constant file
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@chromium.org>
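
A minimal sketch of the alignment rule described above, assuming a vec4 is 16 bytes (4 dwords) and that offsets are tracked in vec4 units. The names `align` and `layout_regions` are hypothetical and do not come from the ir3 code; the point is only that the first region after the block==0 uniforms is rounded up as well.

```python
VEC4_BYTES = 16        # one vec4 = 4 dwords = 16 bytes
REGION_ALIGN_VEC4 = 4  # each UBO region must start on a 4*vec4 boundary


def align(value, alignment):
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


def layout_regions(uniform_size_vec4, ubo_sizes_bytes):
    """Return the start offset (in vec4 units) of each lowered UBO region.

    `uniform_size_vec4` is the size of the block==0 ("real") uniforms.
    """
    # Align the first region too, not only the ones that follow it.
    offset = align(uniform_size_vec4, REGION_ALIGN_VEC4)
    offsets = []
    for size_bytes in ubo_sizes_bytes:
        offsets.append(offset)
        size_vec4 = align(size_bytes, VEC4_BYTES) // VEC4_BYTES
        offset = align(offset + size_vec4, REGION_ALIGN_VEC4)
    return offsets


# 5 vec4 of plain uniforms, then UBO regions of 48 and 64 bytes.
example_offsets = layout_regions(5, [48, 64])
```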
2019-05-02 11:19:22 -07:00
Rob Clark 32925f4072 freedreno/ir3: fix shader variants vs UBO analysis
Otherwise we zero out the state again, but all the UBO loads that we
could lower are already lowered.  End result is that we didn't emit the
uniforms for lowered UBO access in any case where multiple shader
variants are used.

Fixes: 893425a607 freedreno/ir3: Push UBOs to constant file
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-05-02 11:19:22 -07:00
Lionel Landwerlin ff4168c418 vulkan/overlay: add TODO list
Keen on having other people contribute.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 17:02:57 +01:00
Lionel Landwerlin 99cb2d325f vulkan/overlay: make overridden functions static
And fix the unused CmdDrawIndirect.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-02 17:02:57 +01:00
Lionel Landwerlin f2afd6bd76 vulkan/overlay: make overlay size configurable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-02 17:02:55 +01:00
Lionel Landwerlin 7d908038ad vulkan/overlay: add a frame counter option
This is useful to normalize the numbers written into the output file,
as those numbers are accumulated over a period of time and a number of
frames.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-02 17:02:35 +01:00
Lionel Landwerlin 81fd6ba7cc vulkan/overlay: record all select metrics into output file
The output looks something like this (csv style) :

fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns)
480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174
467.80, 234, 500214, 234, 1412, 1176, 648, 5635680, 109436188, 117743760
424.37, 213, 501923, 213, 2130, 1704, 623, 5132448, 99657292, 105474683
472.15, 237, 501962, 237, 2370, 1896, 667, 5710752, 110924644, 122226004
411.32, 206, 500826, 206, 2060, 1648, 709, 4963776, 96491764, 95333273
458.87, 230, 501228, 230, 2300, 1840, 634, 5542080, 107758204, 123112090
475.01, 238, 501044, 238, 2380, 1904, 631, 5734848, 111477480, 122087426
471.08, 236, 500972, 236, 2360, 1888, 655, 5686656, 110498496, 114816162

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
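
As a rough illustration of how such a file might be consumed, the sketch below parses two of the rows shown above and divides a counter by the `frame` column to get a per-frame value. Treating `frame` as the number of frames in the sampling period is an assumption about the output, and `per_frame` is a hypothetical helper, not part of the overlay layer.

```python
import csv
import io

# Header and two data rows copied from the sample output above.
SAMPLE = """\
fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns)
480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174
424.37, 213, 501923, 213, 2130, 1704, 623, 5132448, 99657292, 105474683
"""


def per_frame(text, column):
    """Return `column` divided by the frame count, for every row."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [float(row[column]) / int(row["frame"]) for row in reader]


draws_per_frame = per_frame(SAMPLE, "draw_indexed")
```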
2019-05-02 17:02:34 +01:00
Lionel Landwerlin 74a9fdd8a2 vulkan/overlay: add a margin to the size of the window
Looks a bit better.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-02 17:02:07 +01:00
Lionel Landwerlin 7ba50d8040 vulkan/overlay: add no display option
In case you're just interested in data being recorded to the output
file.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-05-02 17:02:07 +01:00
Lionel Landwerlin ea7a6fa980 vulkan/overlay: add pipeline statistic & timestamps support
v2: switch to VkBase{In,Out}Structure

v3: Add timestamps at begin/end of primary command buffers to estimate
    gpu time spent per submission (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
2019-05-02 17:02:06 +01:00
Lionel Landwerlin 4438188f49 vulkan/overlay: record stats in command buffers and accumulate on exec/submit
This significantly reworks how numbers displayed are computed. We
accumulate operations written into command buffers and add those to
the device when submitted to a queue. These collected values are then
used to compute per frame overlay data.

We also accumulate the data over the sampling fps period to produce
numbers for that period of time.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
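
The accumulation scheme described above can be sketched as follows. This is a simplified model with hypothetical class and method names, not the layer's actual data structures: counters live on each command buffer while it is recorded, get folded into a primary on execute, and are added to the device only when submitted.

```python
from collections import Counter


class CommandBuffer:
    """Records per-command-buffer statistics while commands are written."""

    def __init__(self):
        self.stats = Counter()

    def draw_indexed(self):
        self.stats["draw_indexed"] += 1

    def execute_commands(self, secondary):
        # Executing a secondary folds its counters into this buffer.
        self.stats.update(secondary.stats)


class Device:
    """Accumulates statistics of submitted command buffers per frame."""

    def __init__(self):
        self.frame_stats = Counter()

    def queue_submit(self, command_buffers):
        self.frame_stats["submit"] += 1
        for cmd in command_buffers:
            self.frame_stats.update(cmd.stats)

    def present(self):
        # Hand the frame's totals to the overlay and start a new frame.
        snapshot = dict(self.frame_stats)
        self.frame_stats.clear()
        return snapshot


secondary = CommandBuffer()
secondary.draw_indexed()
primary = CommandBuffer()
primary.draw_indexed()
primary.execute_commands(secondary)

device = Device()
device.queue_submit([primary])
frame_totals = device.present()
```

A command buffer that is recorded but never submitted contributes nothing, which is the reason for counting at submit time rather than at record time.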
2019-05-02 17:02:06 +01:00
Lionel Landwerlin 9eddceef44 vulkan/overlay: update help printout
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 17:02:06 +01:00
Lionel Landwerlin a1e6b5e9be vulkan/util: generate a helper function to return pNext struct sizes
This will be used to copy chains of structures so that we can alter
some of them.

v2: Drop vk_util.h include (Eric)
    Use VkBaseInStructure directly (Eric)

v3: Drop --platforms= param to generator script, instead produce a
    file with #ifdef guards based on which platforms are compiled.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
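
To give an idea of the v3 approach, here is a small sketch of a generator that emits a C size helper with platform-specific cases wrapped in #ifdef. It is not the script from this commit: the hard-coded `STRUCTS` table, `generate_size_helper` and the emitted function name are assumptions for illustration, and a real generator would derive the list from the Vulkan registry.

```python
# (sType enum, C struct name, platform guard or None) -- illustrative subset.
STRUCTS = [
    ("VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2",
     "VkPhysicalDeviceFeatures2", None),
    ("VK_STRUCTURE_TYPE_WAYLAND_SURFACE_CREATE_INFO_KHR",
     "VkWaylandSurfaceCreateInfoKHR", "VK_USE_PLATFORM_WAYLAND_KHR"),
]


def generate_size_helper(structs, name="vk_structure_type_size"):
    """Return C source for a function mapping sType to sizeof(struct)."""
    lines = [
        f"size_t {name}(const VkBaseInStructure *item)",
        "{",
        "   switch (item->sType) {",
    ]
    for stype, struct, guard in structs:
        if guard:
            # Only compiled when the platform is enabled at build time.
            lines.append(f"#ifdef {guard}")
        lines.append(f"   case {stype}: return sizeof({struct});")
        if guard:
            lines.append("#endif")
    lines += [
        "   default: return 0; /* unknown structure */",
        "   }",
        "}",
    ]
    return "\n".join(lines) + "\n"


source = generate_size_helper(STRUCTS)
```

Emitting the guards into the generated file means the generator itself no longer needs to know which platforms are being built.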
2019-05-02 17:02:02 +01:00
Tomeu Vizoso ad7c9ba0ec panfrost/midgard: Skip liveness analysis for instructions without dest
[Alyssa: Add comment explanation]

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-02 15:29:48 +00:00
Tomeu Vizoso a5dddc2d42 panfrost/midgard: Skip register allocation if there's no work to do
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-05-02 15:29:41 +00:00
Eric Engestrom 7c15a87aea gitlab-ci: add scons windows build using mingw
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 15:10:59 +00:00
Eric Engestrom a34ee4dec7 egl: hard-code destroy function instead of passing it around as a pointer
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-05-02 14:44:16 +00:00
Connor Abbott 6ec4ed48fc nir/search: Add debugging code to dump the pattern matched
This was useful while debugging the previous commit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-02 16:14:06 +02:00
Connor Abbott 7ce86e6938 nir/search: Add automaton-based pre-searching
nir_opt_algebraic is currently one of the most expensive NIR passes,
because of the many different patterns we've added over the years. Even
though patterns are already sorted by opcode, there are still way too
many patterns for common opcodes like bcsel and fadd, which means that
many patterns are tried but only a few actually match. One way to fix
this is to add a pre-pass over the code that scans it using an automaton
constructed beforehand, similar to the automatons produced by lex and
yacc for parsing source code. This automaton has to walk the SSA graph
and recognize possible pattern matches.

It turns out that the theory to do this is quite mature already, having
been developed for instruction selection as well as other non-compiler
things. I followed the presentation in the dissertation cited in the
code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep
the naming similar. To create the automaton, we have to perform
something like the classical NFA to DFA subset construction used by lex,
but it turns out that actually computing the transition table for all
possible states would be way too expensive, with the dissertation
reporting times of almost half an hour for an example of size similar to
nir_opt_algebraic. Instead, we adopt one of the "filter" approaches
explained in the dissertation, which trade much faster table generation
and table size for a few more table lookups per instruction at runtime.
I chose the filter which resulted in the fastest table generation time,
with medium table size. Right now, the table generation takes around .5
seconds, despite being implemented in pure Python, which I think is good
enough. Based on the numbers in the dissertation, the other choice might
make table compilation time 25x slower to get 4x smaller table size, but
I don't think that's worth it. As of now, we get the following binary
size before and after this patch:

    text    data     bss       dec     hex  filename
11979455  464720  730864  13175039  c908ff  before i965_dri.so
12037835  616244  791792  13445871  cd2aef  after i965_dri.so

There are a number of places where I've simplified the automaton by
getting rid of details in the LHS patterns rather than complicate things
to deal with them. For example, right now the automaton doesn't
distinguish between constants with different values. This means that it
isn't as precise as it could be, but the decrease in compile time is
still worth it -- these are the compilation time numbers for a shader-db
run with my (admittedly old) database on Intel skylake:

Difference at 95.0% confidence
	-42.3485 +/- 1.375
	-7.20383% +/- 0.229926%
	(Student's t, pooled s = 1.69843)

We can always experiment with making it more precise later.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
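
The idea of pre-searching can be shown with a toy bottom-up tree automaton. This sketch is not the implementation from this commit and does not use the filter approach from the dissertation: it builds transitions lazily and memoizes them, and all names (`PreSearch`, `candidates`, the tuple encoding of expressions) are hypothetical. Like the real automaton, it ignores constant values and repeated variables, so it only narrows down which patterns are worth trying.

```python
WILDCARD = "*"


def normalize(pattern):
    """Replace every variable (a plain string) with a wildcard."""
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(normalize(arg) for arg in pattern[1:])
    return WILDCARD


def subterms(pattern, out):
    """Collect a pattern and all of its nested sub-patterns into `out`."""
    out.add(pattern)
    if isinstance(pattern, tuple):
        for arg in pattern[1:]:
            subterms(arg, out)


class PreSearch:
    def __init__(self, patterns):
        self.roots = [normalize(p) for p in patterns]
        self.terms = set()
        for root in self.roots:
            subterms(root, self.terms)
        self.table = {}  # (opcode, child states) -> state, filled lazily

    def transition(self, op, child_states):
        key = (op, child_states)
        if key not in self.table:
            state = {WILDCARD}  # a wildcard matches any value
            for term in self.terms:
                if (isinstance(term, tuple) and term[0] == op
                        and len(term) - 1 == len(child_states)
                        and all(arg in s
                                for arg, s in zip(term[1:], child_states))):
                    state.add(term)
            self.table[key] = frozenset(state)
        return self.table[key]

    def state_of(self, expr):
        """Compute the set of sub-patterns matching `expr`, bottom-up."""
        if not isinstance(expr, tuple):
            return frozenset({WILDCARD})
        children = tuple(self.state_of(arg) for arg in expr[1:])
        return self.transition(expr[0], children)

    def candidates(self, expr):
        """Indices of the patterns that might match at the root of `expr`."""
        state = self.state_of(expr)
        return [i for i, root in enumerate(self.roots) if root in state]


PATTERNS = [
    ("fadd", ("fmul", "a", "b"), "c"),
    ("fadd", "a", ("fneg", "a")),
    ("bcsel", "a", "b", "b"),
]
pre = PreSearch(PATTERNS)
```

Note that `pre.candidates(("fadd", "x", ("fneg", "y")))` still reports the second pattern even though `x` and `y` differ; the full matcher has to reject it afterwards, which is the precision trade-off mentioned above.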
2019-05-02 16:14:06 +02:00
Samuel Pitoiset 08be23bfde radv: set WD_SWITCH_ON_EOP=1 when drawing primitives from a stream output buffer
According to RadeonSI, this seems to be required by the hardware
to avoid GPU hangs. I think I just forgot to set that bit when I
implemented VK_EXT_transform_feedback.

This fixes a GPU hang with Space Engineers and DXVK.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291
Fixes: b4eb029062 ("radv: implement VK_EXT_transform_feedback")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-02 15:55:46 +02:00
Brian Paul 48107b5a2b glsl: fix typo in #warning message
Trivial.  Spotted by Eric Engestrom.
2019-05-02 06:32:57 -06:00
Brian Paul f0f7c3b03a svga: add SVGA_NO_LOGGING env var (v2)
valgrind crashes when we try to initialize host logging.  This
env var can be used to disable logging.

v2: rebase onto "svga: move host logging to winsys".

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2019-05-02 06:09:35 -06:00
Charmaine Lee 9c5f407b0b svga: move host logging to winsys
This patch adds a host_log interface to svga_winsys and
moves the host logging code to the winsys layer.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
2019-05-02 06:09:35 -06:00
Eric Engestrom da8d9e2d88 wsi/wayland: document lack of vkAcquireNextImageKHR timeout support
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:51:03 +00:00
Daniel Stone 9826e04eca vulkan/wsi/wayland: Respect non-blocking AcquireNextImage
If the client has requested that AcquireNextImage not block at all, with
a timeout of 0, then don't make any blocking calls.

This will still potentially block infinitely given a non-infinite
timeout, but the fix for that is much more involved.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Chad Versace <chadversary@chromium.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108540
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
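
The control flow this describes might look roughly like the sketch below. It is a model under stated assumptions, not the WSI code: `busy` is a hypothetical per-image flag list, and `dispatch_pending` / `dispatch_blocking` stand in for a non-blocking and a blocking way of processing compositor events.

```python
VK_SUCCESS = "VK_SUCCESS"
VK_NOT_READY = "VK_NOT_READY"


def acquire_next_image(busy, timeout, dispatch_pending, dispatch_blocking):
    """Return (result, image index) for a simplified swapchain model."""
    # Process events that are already queued; this never blocks.
    dispatch_pending()
    while True:
        for index, is_busy in enumerate(busy):
            if not is_busy:
                busy[index] = True
                return VK_SUCCESS, index
        if timeout == 0:
            # The caller asked not to block at all.
            return VK_NOT_READY, None
        # May block indefinitely: finite timeouts are not honored here.
        dispatch_blocking()


calls = []
images = [True, True]  # every image is still in use
result = acquire_next_image(
    images, 0,
    lambda: calls.append("pending"),
    lambda: calls.append("blocking"),
)
```

With a timeout of 0 and no free image, only the non-blocking dispatch runs and the caller gets VK_NOT_READY back immediately.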
2019-05-02 11:51:03 +00:00
Erik Faye-Lund 8a67e4d30a docs: reorder heading and notice
All other pages have the heading as the first thing in the article. Let's
clean this up for consistency.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund 561c2b9bfa docs: drop centered heading for faq
The FAQ is the only article we have that uses a centered heading, which
makes it look odd compared to the other articles. Let's drop the
centering for consistency.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund da4994f252 docs: turn faq-index into an ordered list
HTML already has a way of doing automatically ordered lists, so let's
use that instead of open-coding one.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund afda72dc10 docs: replace empty list with a none-paragraph
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund a4ee15d5fe docs: fix closing of list-items
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund b9eaeffaba docs: fixup list-item tags
The list items need to contain everything that is part of the item, not
just the first paragraph.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund 830821aaa4 docs: fix closing of paragraphs
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund 02a5698017 docs: add missing lists
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund 767c517816 docs: fixup bad paragraphing
This markup seems to assume paragraphs survive across block-elements,
which isn't the case. Let's rectify that.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00
Erik Faye-Lund b877722d75 docs: remove stray list-start
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-05-02 11:09:16 +00:00