100646 Commits

Author SHA1 Message Date
Timothy Arceri 68ce0ec222 nir: calculate trip count for more loops
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).

For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
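
A minimal sketch of the idea in 68ce0ec222, assuming a loop that counts up with a positive step. The helper names below (imin, max_trip_count, actual_trip_count) are hypothetical and are not the NIR loop analysis API. The point is that the constant operand of the min() bounds the trip count even though x is unknown at compile time.

```c
/* Hypothetical helper: the smaller of two ints, like imin() in the message. */
int imin(int a, int b)
{
   return a < b ? a : b;
}

/* Upper bound on the trip count of
 *    for (int i = init; i < imin(x, limit); i += step)
 * that holds for every value of x. Assumes step > 0. */
unsigned max_trip_count(int init, int step, int limit)
{
   if (limit <= init)
      return 0;
   /* ceil((limit - init) / step) */
   return (unsigned)((limit - init + step - 1) / step);
}

/* Trip count for a known x, found by simply running the loop. */
unsigned actual_trip_count(int init, int step, int limit, int x)
{
   unsigned n = 0;
   for (int i = init; i < imin(x, limit); i += step)
      n++;
   return n;
}
```

For the example loop in the message, max_trip_count(0, 1, 4) is 4, and actual_trip_count never exceeds that bound whatever x is.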
Timothy Arceri e8a8937a04 nir: add partial loop unrolling support
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.

The code is written so that we could use partial unrolling
more generally, but for now it's only used when we have guessed
the trip count.

We use partial unrolling for this guessed trip count because it's
possible that any out-of-bounds array access doesn't otherwise affect
the shader, e.g. the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on it's possible for nir_opt_dead_cf()
to then remove the loop in some cases.

A RenderDoc capture from the Rise of the Tomb Raider benchmark
reports the following change in an affected compute shader:

GPU duration: 350 -> 325 microseconds

shader-db results radeonsi VEGA (NIR backend):

SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)

shader-db results i965 SKL:

total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21

total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9

vkpipeline-db results VEGA:

Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
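
A rough source-level picture of what e8a8937a04 describes, written as plain C rather than NIR. The function names and the guessed trip count of 3 are assumptions made for illustration. The unrolled copies are nested, and a copy of the original loop sits in the innermost branch so that the result stays correct when the guess turns out to be too small.

```c
/* Reference loop: sums the first n elements of a. */
int sum_loop(const int *a, int n)
{
   int sum = 0;
   for (int i = 0; i < n; i++)
      sum += a[i];
   return sum;
}

/* The same loop partially unrolled with a guessed trip count of 3. */
int sum_partially_unrolled(const int *a, int n)
{
   int sum = 0;
   int i = 0;
   if (i < n) {               /* unrolled iteration 1 */
      sum += a[i];
      i++;
      if (i < n) {            /* unrolled iteration 2 */
         sum += a[i];
         i++;
         if (i < n) {         /* unrolled iteration 3 */
            sum += a[i];
            i++;
            /* Copy of the original loop: only runs if the guess was low. */
            for (; i < n; i++)
               sum += a[i];
         }
      }
   }
   return sum;
}
```

If later passes can prove that the remaining loop never executes, it is dead code and can be dropped, which matches the role the message gives to nir_opt_dead_cf().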
Timothy Arceri fba5d275db nir: add new partially_unrolled bool to nir_loop
In order to stop continuously partially unrolling the same loop,
we add the bool partially_unrolled to nir_loop. We add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also, nir_loop_info is never cloned.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
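
Why a flag like the one in fba5d275db matters can be shown with a toy pass. Everything here (struct toy_loop and both functions) is hypothetical and only mirrors the idea, not the nir_loop layout: a pass that is re-run until it reports no progress needs some persistent marker, or it would unroll the same loop forever.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a loop node; only the flag matters here. */
struct toy_loop {
   bool partially_unrolled;
   unsigned unroll_count;   /* how many times the pass unrolled this loop */
};

/* One run of a toy unrolling pass. Returns true if it changed the loop. */
bool toy_partial_unroll(struct toy_loop *loop)
{
   if (loop->partially_unrolled)
      return false;         /* already handled: report no progress */
   loop->unroll_count++;
   loop->partially_unrolled = true;
   return true;
}

/* Re-run the pass until it reports no progress. Returns the number of
 * runs that made a change. */
unsigned toy_optimize(struct toy_loop *loop)
{
   unsigned runs = 0;
   while (toy_partial_unroll(loop))
      runs++;
   return runs;
}
```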
Timothy Arceri 03a452b7d0 nir: add guess trip count support to loop analysis
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.

v2: check if the induction var is used to index more than a single
    array and if so get the size of the smallest array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-12 00:52:30 +00:00
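
The v2 note in 03a452b7d0 amounts to taking a minimum over array lengths. A small sketch, assuming the lengths of all arrays indexed by the induction variable have already been collected; guess_trip_count is a hypothetical name, not a function from the loop analysis code.

```c
#include <stddef.h>

/* Returns the length of the smallest array indexed by the induction
 * variable, or 0 if no such array was found (no guess possible). */
unsigned guess_trip_count(const unsigned *array_lengths, size_t count)
{
   unsigned guess = 0;
   for (size_t i = 0; i < count; i++) {
      if (guess == 0 || array_lengths[i] < guess)
         guess = array_lengths[i];
   }
   return guess;
}
```

The smallest array is the safe choice because indexing it any further would already be out of bounds.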
Tomeu Vizoso 97f2d04d5e panfrost: Add support for PAN_MESA_DEBUG
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-12 00:30:27 +00:00
Tomeu Vizoso f0b1bbebdd panfrost/midgard: Add support for MIDGARD_MESA_DEBUG
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-03-12 00:30:27 +00:00
Xavier Bouchoux c5236fc6e2 nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'
The 'dxc' HLSL-to-SPIR-V compiler appears to emit 2 (Unknown) in the Depth field
when the image is not sampled and the value is not needed.

Previously, shaders failed with:

SPIR-V parsing FAILED:
    In file ../src/compiler/spirv/spirv_to_nir.c:1412
    !is_shadow
    632 bytes into the SPIR-V binary

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-11 23:28:39 +01:00
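
For context on c5236fc6e2: the Depth operand of OpTypeImage takes three values in the SPIR-V specification (0 = not a depth image, 1 = depth image, 2 = no indication). The sketch below is an assumption about how such an assert can trip, not the actual patch: if the operand is converted straight to a bool, the value 2 counts as "shadow". Both function names are hypothetical.

```c
#include <stdbool.h>

/* Naive conversion: any non-zero Depth value, including 2, yields true. */
bool is_shadow_naive(unsigned depth)
{
   return depth != 0;
}

/* Stricter reading: only an explicit 1 marks a depth (shadow) image;
 * 2 (no indication) is treated as non-shadow. */
bool is_shadow_strict(unsigned depth)
{
   return depth == 1;
}
```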
Kenneth Graunke d75f84cb65 iris: Fix write enable in pinning of depth/stencil resources
We may bind new Z/S buffers (which come via the framebuffer CSO,
triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled.

The next draw may enable Z or S writes (which come via the ZSA CSO,
triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update
our pin to have the write flag.

So, update pinning if either dirty flag changes.  To clarify, pass
cso_zsa to the pinning function rather than pulling the random values
out of ice->state, which unfortunately have to exist for the resolve
code since iris_depth_stencil_alpha_state only exists in iris_state.c.
2019-03-11 15:04:08 -07:00
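
The "either dirty flag" condition from d75f84cb65 can be sketched as a bitmask test. The TOY_DIRTY_* bit values are made up for the example and the function name is hypothetical; the real IRIS_DIRTY_* definitions are not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define TOY_DIRTY_DEPTH_BUFFER     (UINT64_C(1) << 0)
#define TOY_DIRTY_WM_DEPTH_STENCIL (UINT64_C(1) << 1)
#define TOY_DIRTY_OTHER            (UINT64_C(1) << 2)

/* The pin must be refreshed when a new Z/S buffer is bound or when the
 * depth/stencil write enables may have changed. */
bool toy_needs_zs_repin(uint64_t dirty)
{
   return (dirty & (TOY_DIRTY_DEPTH_BUFFER |
                    TOY_DIRTY_WM_DEPTH_STENCIL)) != 0;
}
```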
Kenneth Graunke 863e810a19 iris: Refactor depth/stencil buffer pinning into a helper.
This avoids the code duplication that caused me to put things in the
wrong place in the previous commit.  One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.
2019-03-11 15:04:08 -07:00
Kenneth Graunke 9302414f8b iris: Move depth/stencil flushes so they actually do something
Commit d6dd57d43c (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place.  I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos.  This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.

This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers.  i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.
2019-03-11 15:04:08 -07:00
Christian Gmeiner 076a7095bb st/dri: allow direct UYVY import
Push this format to the pipe driver unchanged.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-11 22:19:11 +01:00
Kenneth Graunke 04ff2e3fbb iris: Fix TES gl_PatchVerticesIn handling.
1. If we switch the TCS for one with a different number of output
   vertices, then the TES's gl_PatchVerticesIn value will change.
   We need to re-upload in this case.  For now, re-emit constants
   whenever the TCS/TES are swapped out.

2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
   the TCS info.  Since it's a passthrough, we can just use the
   primitive's patch count (like the TCS gl_PatchVerticesIn does).

Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-11 14:07:16 -07:00
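
The two cases listed in 04ff2e3fbb reduce to one selection. A sketch with hypothetical parameter names, assuming the caller already knows whether a TCS is bound:

```c
#include <stdbool.h>

/* Value the TES should see for gl_PatchVerticesIn.
 * With a TCS, it is the TCS's output vertex count; without one, the
 * patch passes through, so the primitive's patch vertex count is used. */
unsigned tes_patch_vertices_in(bool has_tcs,
                               unsigned tcs_vertices_out,
                               unsigned prim_patch_vertices)
{
   return has_tcs ? tcs_vertices_out : prim_patch_vertices;
}
```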
Kenneth Graunke 2f51cb5e67 iris: Rework default tessellation level uploads
Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels.  This simplifies
the state upload code a bit.

Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-11 14:07:12 -07:00
Timur Kristóf fd5075e059 iris: Face should be a system value.
This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which
despite its name is not a TGSI-specific capability, just lets
the state tracker know that it should generate a system value
for FACE.

This is needed if we want to run tgsi_to_nir on iris.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-11 14:02:40 -07:00
Eric Anholt 3a9e2d6085 vc4: Switch the post-RA scheduler over to the DAG datastructure.
Just a small code reduction from shared infrastructure.
2019-03-11 13:14:37 -07:00
Eric Anholt 33886474d6 v3d: Use the DAG datastructure for QPU instruction scheduling.
Just a small code reduction from shared infrastructure.
2019-03-11 13:14:32 -07:00
Eric Anholt d6d83b34ee vc4: Reuse list_for_each_entry_rev(). 2019-03-11 13:14:32 -07:00
Eric Anholt 7c01ddbf7f v3d: Reuse list_for_each_entry_rev(). 2019-03-11 13:14:32 -07:00
Eric Anholt 7a727c1a12 vc4: Switch over to using the DAG datastructure for QIR scheduling.
Just a small code reduction from shared infrastructure.
2019-03-11 13:14:18 -07:00
Eric Anholt 0533d2d95c util: Add a DAG datastructure.
I keep writing this for various schedulers.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-03-11 13:13:52 -07:00
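
As an illustration of why schedulers keep needing a structure like the one in 0533d2d95c, here is a toy dependency DAG with a simple list scheduler. It is a sketch under stated assumptions, not the interface added to util: all toy_dag_* names are hypothetical, and the fixed-size adjacency matrix is chosen only to keep the example short.

```c
#define TOY_MAX_NODES 8

struct toy_dag {
   unsigned num_nodes;                                /* <= TOY_MAX_NODES */
   unsigned char edge[TOY_MAX_NODES][TOY_MAX_NODES];  /* edge[p][c]: c depends on p */
   unsigned parent_count[TOY_MAX_NODES];              /* unscheduled parents */
};

void toy_dag_init(struct toy_dag *dag, unsigned num_nodes)
{
   dag->num_nodes = num_nodes;
   for (unsigned i = 0; i < TOY_MAX_NODES; i++) {
      dag->parent_count[i] = 0;
      for (unsigned j = 0; j < TOY_MAX_NODES; j++)
         dag->edge[i][j] = 0;
   }
}

void toy_dag_add_edge(struct toy_dag *dag, unsigned parent, unsigned child)
{
   if (!dag->edge[parent][child]) {   /* ignore duplicate edges */
      dag->edge[parent][child] = 1;
      dag->parent_count[child]++;
   }
}

/* Repeatedly picks the lowest-numbered node with no unscheduled parents.
 * Writes the schedule to order[] and returns how many nodes were placed
 * (fewer than num_nodes only if the graph has a cycle). Consumes the
 * parent counts, so the DAG cannot be scheduled twice. */
unsigned toy_dag_schedule(struct toy_dag *dag, unsigned *order)
{
   unsigned char done[TOY_MAX_NODES] = {0};
   unsigned scheduled = 0;

   for (;;) {
      unsigned pick = dag->num_nodes;
      for (unsigned n = 0; n < dag->num_nodes; n++) {
         if (!done[n] && dag->parent_count[n] == 0) {
            pick = n;
            break;
         }
      }
      if (pick == dag->num_nodes)
         break;                        /* nothing ready */

      done[pick] = 1;
      order[scheduled++] = pick;
      for (unsigned c = 0; c < dag->num_nodes; c++) {
         if (dag->edge[pick][c])
            dag->parent_count[c]--;    /* child has one fewer pending parent */
      }
   }
   return scheduled;
}
```

A real instruction scheduler would pick among ready nodes with a cost heuristic instead of taking the lowest index.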
Kristian H. Kristensen 5f0a922c27 freedreno/a6xx: Remove extra parens
There's a warning about this now.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-11 11:37:53 -07:00
Kristian H. Kristensen 08c452bef7 freedreno: Use c_vis_args and no_override_init_args
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-11 11:37:53 -07:00
Chia-I Wu 24af64baa5 turnip: preliminary support for Wayland WSI 2019-03-11 10:02:13 -07:00
Chia-I Wu ae82b5df88 turnip: preliminary support for tu_GetImageSubresourceLayout 2019-03-11 10:02:13 -07:00
Chad Versace 6cb5fd0d71 turnip: Use Vulkan 1.1 names instead of KHR
That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.
2019-03-11 10:02:13 -07:00
Chia-I Wu 949ce2745d turnip: preliminary support for tu_CmdDraw 2019-03-11 10:02:13 -07:00
Chia-I Wu f9b34622cd turnip: preliminary support for draw state binding
This adds support for tu_CmdBindPipeline, tu_CmdBindVertexBuffers,
etc.
2019-03-11 10:02:13 -07:00
Chia-I Wu 54b7a57c22 turnip: add draw_cs to tu_cmd_buffer
It will hold draw commands.
2019-03-11 10:02:13 -07:00
Chia-I Wu 1cdbab016e turnip: parse VkPipelineVertexInputStateCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu d17096b9b1 turnip: parse VkPipelineShaderStageCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu a7d842c97c turnip: compile VkPipelineShaderStageCreateInfo
Compile all shaders and upload the binaries to a BO.
2019-03-11 10:02:13 -07:00
Chia-I Wu 970a8fec96 turnip: preliminary support for shader modules
Save SPIR-V in tu_shader_module.  Translation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile.  Both will be called during pipeline creation.
2019-03-11 10:02:13 -07:00
Chia-I Wu 9e0d878787 turnip: parse VkPipeline{Multisample,ColorBlend}StateCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu bec0abf294 turnip: parse VkPipelineDepthStencilStateCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu 9496b377ff turnip: parse VkPipelineRasterizationStateCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu b4884761e8 turnip: parse VkPipelineViewportStateCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu 1bea6a91cb turnip: parse VkPipelineInputAssemblyStateCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu c584c2e86c turnip: parse VkPipelineDynamicStateCreateInfo 2019-03-11 10:02:13 -07:00
Chia-I Wu df48cb7b3e turnip: create a less dummy pipeline
Still dummy, but at least it is created from tu_pipeline_builder.
2019-03-11 10:02:13 -07:00
Chia-I Wu 57327626dc turnip: simplify tu_cs sub-streams usage
Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check.  Callers are no
longer required to call them (but can still do so if they choose to).
2019-03-11 10:02:13 -07:00
Chia-I Wu 59419bb691 turnip: fix tu_cs sub-streams
Update cs->start in tu_cs_end_sub_stream.  Otherwise, the entry
would include commands from all prior sub-streams.
2019-03-11 10:02:13 -07:00
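
The bug fixed by 59419bb691 is easy to reproduce in a toy command stream. The sub_cs and sub_entry types below are hypothetical and much simpler than tu_cs; they only show why the start marker has to advance when a sub-stream ends.

```c
#include <stdint.h>

/* Toy command stream: indices into a fixed buffer. */
struct sub_cs {
   uint32_t buf[16];
   uint32_t start;   /* first dword of the current sub-stream */
   uint32_t cur;     /* next dword to write */
};

/* Range of dwords that makes up one finished sub-stream. */
struct sub_entry {
   uint32_t offset;
   uint32_t size;
};

/* No bounds checking here; callers must stay within buf. */
void sub_cs_emit(struct sub_cs *cs, uint32_t value)
{
   cs->buf[cs->cur++] = value;
}

struct sub_entry sub_cs_end_sub_stream(struct sub_cs *cs)
{
   struct sub_entry entry = { cs->start, cs->cur - cs->start };
   /* Without this update, the next entry would start at the old offset
    * and include the commands of every earlier sub-stream. */
   cs->start = cs->cur;
   return entry;
}
```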
Chia-I Wu c0567e84db turnip: tu_cs_emit_array
Array version of tu_cs_emit.  Useful for updating multiple
consecutive array-like registers, or loading a shader binary with
SS6_DIRECT.
2019-03-11 10:02:13 -07:00
Chia-I Wu fffaa9b4b3 turnip: add tu_cs_discard_entries
We will start a draw IB at the beginning of a subpass and consume it
at the end of the subpass.  With tu_cs_discard_entries, we can reuse
the same tu_cs for all subpasses.
2019-03-11 10:02:13 -07:00
Chia-I Wu 10c5013442 turnip: more/better asserts for tu_cs
Asserting (cur < end) in tu_cs_emit catches far fewer programming
errors than asserting (cur < reserved_end).  We should never
write more commands than we have reserved.

Assert IB is non-empty and sane in tu_cs_emit_ib.
2019-03-11 10:02:13 -07:00
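
The difference between the two asserts discussed in 10c5013442 can be sketched with a toy stream that tracks a reservation separately from the buffer end. The chk_cs type and its functions are hypothetical, not the tu_cs implementation.

```c
#include <assert.h>
#include <stdint.h>

struct chk_cs {
   uint32_t buf[16];
   uint32_t *cur;            /* next dword to write */
   uint32_t *reserved_end;   /* end of the space the caller reserved */
   uint32_t *end;            /* end of the whole buffer */
};

void chk_cs_init(struct chk_cs *cs)
{
   cs->cur = cs->buf;
   cs->reserved_end = cs->buf;
   cs->end = cs->buf + 16;
}

/* Reserve room for count dwords starting at cur. */
void chk_cs_reserve(struct chk_cs *cs, uint32_t count)
{
   assert(count <= (uint32_t)(cs->end - cs->cur));
   cs->reserved_end = cs->cur + count;
}

void chk_cs_emit(struct chk_cs *cs, uint32_t value)
{
   /* Stricter than (cur < end): also catches a caller that reserved
    * too little, even when the buffer itself still has room. */
   assert(cs->cur < cs->reserved_end);
   *cs->cur++ = value;
}
```

With only the (cur < end) check, a caller that reserves 2 dwords and emits 3 would go unnoticed until the buffer itself overflowed.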
Chia-I Wu aa7dd6cb7f turnip: use 32-bit offset in tu_cs_entry
We neither support nor expect BOs to be that big in tu_cs.
2019-03-11 10:02:13 -07:00
Chia-I Wu b8a5e10d0d turnip: mark IBs for dumping
Includes IBs in kernel cmdbuf dumps.
2019-03-11 10:02:13 -07:00
Eric Engestrom 4a48dd9fb8 turnip: use the platform defines in vk.xml instead of hard-coding them
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen 0d12bcbfa7 turnip: Add todo for copies. 2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen 51115e7201 turnip: Add buffer->image DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.buffer_to_image.*
2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen 6616563472 turnip: Add image->buffer DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.image_to_buffer.*
2019-03-11 10:02:13 -07:00