KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	bd17bdc56b	glsl/lower_vector_derefs: Don't use a temporary for TCS outputs Tessellation control shader outputs act as if they have memory backing them and you can have multiple writes to different components of the same vector in-flight at the same time. When this happens, the load vec store pattern that gets used by ir_triop_vector_insert doesn't yield the correct results. Instead, just emit a sequence of conditional assignments. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-13 02:10:31 +00:00
Jason Ekstrand	20c4578c55	glsl/list: Add a list variant of insert_after Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-13 02:10:31 +00:00
Jason Ekstrand	83fdefc062	nir/loop_unroll: Fix out-of-bounds access handling The previous code was completely broken when it came to constructing the undef values. I'm not sure how it ever worked. For the case of a copy that reads an undefined value, we can just delete the copy because the destination is a valid undefined value. This saves us the effort of trying to construct a value for an arbitrary copy_deref intrinsic. Fixes: `e8a8937a04` "nir: add partial loop unrolling support" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 21:06:39 -05:00
Jason Ekstrand	c056609c43	anv: Ignore VkRenderPassInputAttachementAspectCreateInfo We don't care about the information but there's no sense in throwing a debug warning about it. It's harmless but annoying to users. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109984 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-03-12 21:06:39 -05:00
Eric Anholt	486b181fd7	v3d: Fix leak of the renderonly struct on screen destruction. This makes v3d match vc4's destroy path. Fixes: `e113b21cb7` ("v3d: Add renderonly support.")	2019-03-12 16:15:40 -07:00
Eric Anholt	0c874c18cd	v3d: Fix leak of the mem_ctx after the DAG refactor. Noticed while trying to get a CTS run again. Fixes: `33886474d6` ("v3d: Use the DAG datastructure for QPU instruction scheduling.")	2019-03-12 16:15:40 -07:00
Grigori Goronzy	acfd88204e	glx: add support for GLX_ARB_create_context_no_error (v3) v2: Only reject no-error contexts for too-old GL if we're actually trying to create a no-error context (Adam Jackson) v3: Fix share contexts (Adam Jackson) Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-12 19:12:21 -04:00
Samuel Pitoiset	ae77f12368	radv: set the maximum number of IBs per submit to 192 This fixes random SteamVR corruption, see https://github.com/ValveSoftware/SteamVR-for-Linux/issues/181 Fixes: `4d30f2c6f4` ("radv/winsys: remove the max IBs per submit limit for the fallback path") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-12 22:15:45 +01:00
Danylo Piliaiev	9c80be956f	anv: Fix destroying descriptor sets when pool gets reset pool->next and pool->free_list were reset before their usage in anv_descriptor_pool_free_set Fixes: `775aabdd` "anv: destroy descriptor sets when pool gets reset" Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-12 17:09:37 +00:00
Eric Anholt	ccce940947	v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER. This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from 11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target devices.	2019-03-12 09:04:25 -07:00
Jason Ekstrand	6d5d89d25a	intel/nir: Vectorize all IO The IO scalarization pass that we run to help with linking end up turning some shader I/O such as that for tessellation and geometry shaders into many scalar URB operations rather than one vector one. To alleviate this, we now vectorize the I/O once again. This fixes a 10% performance regression in the GfxBench tessellation test that was caused by scalarizing. Shader-db results on Kaby Lake: total instructions in shared programs: 15224023 -> 15220871 (-0.02%) instructions in affected programs: 342009 -> 338857 (-0.92%) helped: 1236 HURT: 443 total spills in shared programs: 23471 -> 23465 (-0.03%) spills in affected programs: 6 -> 0 helped: 1 HURT: 0 total fills in shared programs: 31770 -> 31766 (-0.01%) fills in affected programs: 4 -> 0 helped: 1 HURT: 0 Cycles was just a lot of churn do to moves being different places. Most of the pure churn in instructions was +/- one or two instructions in fragment shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510 Fixes: `4434591bf5` "intel/nir: Call nir_lower_io_to_scalar_early" Fixes: `8d8222461f` "intel/nir: Enable nir_opt_find_array_copies" Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-03-12 15:34:06 +00:00
Jason Ekstrand	5ef2b8f1f2	nir: Add a pass for lowering IO back to vector when possible This pass tries to turn scalar and array-of-scalar IO variables into vector IO variables whenever possible. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-12 15:34:06 +00:00
Rhys Perry	0f025bbccc	ac/nir: fix 16-bit ssbo stores Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-12 15:51:52 +01:00
pal1000	7f89fd17ed	scons: Compatibility with Scons development version string This ensures Mesa3D build doesn't fail in this case as encountered when bisecting Scons source code while regression testing https://bugs.freedesktop.org/show_bug.cgi?id=109443 and when testing 3.0.5.a.2 Technical details: Scons version string has consistently been in this format: MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd] so these formulas should strip alpha/beta flags and return Scons version: - as string - `'.'.join(SCons.__version__.split('.')[:3])` - as tuple of integers - `tuple(map(int, SCons.__version__.split('.')[:3]))` - v2: Fixed Scons version retrieval formulas as string and tuple of integers. - v3: Fixed Scons version string format description. Cc: "19.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-03-12 14:22:34 +00:00
Tapani Pälli	bef354321b	anv: revert "anv: release memory allocated by glsl types during spirv_to_nir" This reverts commit `47fc359822`. Reason is that patch did not take in to account situation where we might have both OpenGL and Vulkan using glsl_types at the same time. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-12 14:12:36 +02:00
Connor Abbott	1bbe58c214	radeonsi/nir: Use nir stripping pass This reduces compilation time for my shader-db collection from around 40 seconds to 30, vs. 19 seconds for TGSI. There are still some shaders that TGSI caches but NIR doesn't, partly because of more aggressive cross-stage optimizations with NIR. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 10:49:48 +01:00
Connor Abbott	5b2ec9c81e	nir: Add a stripping pass for improved cacheability Oftentimes various nir shaders after lowering will be the same, or almost the same. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worse offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-12 10:49:48 +01:00
Samuel Pitoiset	6403171843	radv: fix pointSizeRange limits The values should match the ones that are emitted. This fixes new CTS dEQP-VK.rasterization.primitive_size.points.*. Fixes: `f4e499ec79` ("radv: add initial non-conformant radv vulkan driver") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-12 09:00:32 +01:00
Sagar Ghuge	bbef6c2d5f	iris: Flag fewer dirty bits in BLORP v2: 1) Skip flagging IRIS_DIRTY_DEPTH_BUFFER if BLORP_BATCH_NO_EMIT_DEPTH_STENCIL is set (Kenneth Graunke) 2) Add missing flags (Kenneth Graunke) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-11 22:46:39 -07:00
Timothy Arceri	cb2898f478	st/glsl_to_nir: fix incorrect arrary access This fixes a segfault when we try to access the array using a -1 when the array wasn't allocated in the first place. Before `7536af670b` we would just access a pre-allocated array that was also load/stored to/from the shader cache. But now the cache will no longer allocate these arrays if they are empty. The change resulted in tests such as the following segfaulting when run with a warm shader cache. tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test	2019-03-12 14:47:21 +11:00
Brian Paul	02c2863df5	nir: silence a couple new compiler warnings [33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'. ../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’: ../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘\|\|’ [-Wparentheses] if (ind == NULL \|\| ind && (ind)->type != basic_induction \|\| ^ [85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'. ../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’: ../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable] nir_cf_node unroll_loc = ^ Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 14:34:51 +11:00
Alyssa Rosenzweig	587ad37e72	panfrost: Identify fragment_extra flags The fragment_extra structure contains additional fields extending the MRT framebuffer descriptor, snuck in between the main framebuffer descriptor and the render targets. Its fields include those related to transaction elimination and depth/stencil buffers. This patch identifies the flags field (previously just "unk" with some magic values) as well as identifying some (but not all) flags set by the driver. The process of identifying flags brought a bug to light where transaction elimination (checksumming) could not be enabled unless AFBC was in-use. This issue is now resolved. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:42 +00:00
Alyssa Rosenzweig	e57ea53acf	panfrost: Document "depth-buffer writeback" bit This bit, if set, causes the depth buffer to be copied from GPU tile memory to the provided depth buffer in main memory. If not set, the GPU will not access the main memory (saving considerable memory bandwidth if depth results are not actually used). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-12 02:37:42 +00:00
Alyssa Rosenzweig	2df4537f91	panfrost: Support linear depth textures This combination has not yet been seen "in the wild" in traces, but to support linear depth FBOs, ~bruteforce reveals this bit pattern is necessary. It's not yet clear why the meanings of 0x1 and 0x2 are essentially flipped (tiled vs linear for colour, linear vs some sort of tiled for depth). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	9f25a4e65c	panfrost: Allocate dedicated slab for linear BOs Previously, linear BOs shared memory with each other to minimize kernel round-trips / latency, as well as to work around a bug in the free_slab function. These concerns are invalid now, but continuing to use the slab allocator for BOs resulted in memory allocation errors. This issue was aggravated, though not introduced (so not a real regression) in the previous commit. v2 (unreviewed): Fix bug in v1 preventing munmaps from working Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	f9dc1ebc0d	panfrost: Determine framebuffer format bits late Again, these formats are only properly known at the time of fragment job emit. Rather than hardcoding the format, at least for MFBD we begin to construct the format bits on-demand. This cleans up the code, futureproofs for ES3 framebuffer formats, and should fix bugs regarding FBO colour swizzles. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	7ba18cdfa9	panfrost: Delay color buffer setup In an effort to cleanup framebuffer management code, we delay colour buffer setup until the FRAGMENT job is actually emitted, allowing the AFBC and linear codepaths to be unified. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	536bcaa68f	panfrost: Combine has_afbc/tiled in layout enum AFBC, tiled, and linear BO layouts are mutually exclusive; they should be coupled via a single enum rather than ad hoc checks of booleans. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	d93c5c3148	panfrost: Cleanup needless if in create_bo I'm not sure why we were checking for these additional criteria (likely inherited from some other driver); remove the needless checks to cleanup the code and perhaps fix some bugs down the line. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>	2019-03-12 02:37:41 +00:00
Kenneth Graunke	1467deb543	i965: Reimplement all the PIPE_CONTROL rules. This implements virtually all documented PIPE_CONTROL restrictions in a centralized helper. You now simply ask for the operations you want, and the pipe control "brain" will figure out exactly what pipe controls to emit to make that happen without tanking your system. The hope is that this will fix some intermittent flushing issues as well as GPU hangs. However, it also has a high risk of causing GPU hangs and other regressions, as this is a particularly sensitive area and poking the bear isn't always advisable. Mark Janes noted that this patch helps with some GPU hangs on Icelake. This does re-enable the VF Invalidate => Write Immediate workaround on Gen8, which had been disabled (bug 103787) due to GPU hangs. The old code did this workaround after another which would have added CS stall bits, so it missed a workaround. The new code orders them properly and appears to work. v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for production hardware. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Kenneth Graunke	c6af96d1bc	i965: Use genxml for emitting PIPE_CONTROL. While this does add a bunch of boilerplate, it also protects us against the hardware moving bits, or changing their meaning. For something as finnicky as PIPE_CONTROL, the extra safety seems worth it. We turn PIPE_CONTROL_* into an bitfield of arbitrary flags, and then pack them appropriately. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Kenneth Graunke	2c6f712408	i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE. Clearer name. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Kenneth Graunke	aa139f0980	i965: Move some genX infrastructure to genX_boilerplate.h. This will let us make multiple genX_*.c files, without copy and pasting all this boilerplate. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Brian Paul	ecb708fada	gallium/winsys/kms: fix incomplete type compilation failure Fixes: ../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c: In function ‘kms_sw_displaytarget_from_handle’: ../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c:402:60: error: dereferencing pointer to incomplete type ‘const struct pipe_resource’ templ->format, ^ Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-11 20:08:16 -06:00
Brian Paul	04544d852c	drisw: fix incomplete type compilation failure Fixes: ../src/gallium/winsys/sw/dri/dri_sw_winsys.c: In function ‘dri_sw_displaytarget_display’: ../src/gallium/winsys/sw/dri/dri_sw_winsys.c:255:39: error: dereferencing pointer to incomplete type ‘struct pipe_box’ offset = dri_sw_dt->stride * box->y; ^ Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-11 20:08:16 -06:00
Brian Paul	45c6da5a48	docs: try to improve the Meson documentation (v2) Add new Introduction and Advanced Usage sections. Spell out a few more details, like "ninja install". Improve the layout around example commands. Fix grammatical errors and tighten up the text. Explain the --prefix option. v2: Remove language about 'ninja clean' and move link to Meson information about separate build directories earlier in the page. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-11 20:08:16 -06:00
Brian Paul	187a527ed7	st/mesa: minor refactoring of texture/sampler delete code Rename st_texture_free_sampler_views() to st_delete_texture_sampler_views() to align with st_DeleteTextureObject(), its only caller. Move the call to st_texture_release_all_sampler_views() from st_DeleteTextureObject() to st_delete_texture_sampler_views() so all the sampler view clean-up code is in one place. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	70a2ede112	st/mesa: rename st_texture_release_sampler_view() To st_texture_release_context_sampler_view() to be more clear that it's context-specific. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	41adb3d6df	st/mesa: add/improve sampler view comments Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	c7d2504625	st/mesa: move around some code in st_context.c st_init_driver_functions() is only called in st_context.c so there's no need for the prototype in st_context.h To avoid a forward declaration of st_init_driver_functions() in st_context.c, we need to move around several other functions. No functional change. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	b29d827f09	st/mesa: move utility functions, macros into new st_util.h file To de-clutter st_context.h. Clean up remaining function prototypes in st_context.h. The st_vp_uses_current_values() helper is only used in st_context.c so move it there. The st_get_active_states() function is only used in st_context.c so remove its prototype in st_context.h Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Juan A. Suarez Romero	775aabdd01	anv: destroy descriptor sets when pool gets reset As stated in Vulkan spec: "Resetting a descriptor pool recycles all of the resources from all of the descriptor sets allocated from the descriptor pool back to the descriptor pool, and the descriptor sets are implicitly freed." This fixes dEQP-VK.api.descriptor_pool.* Fixes: `14f6275c92` "anv/descriptor_set: add reference counting for..." Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-03-11 20:40:31 -05:00
Timothy Arceri	3235a942c1	nir: find induction/limit vars in iand instructions This will be used to help find the trip count of loops that look like the following: while (a < x && i < 8) { ... i++; } Where the NIR will end up looking something like this: vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */) loop { ... vec1 1 ssa_12 = ilt ssa_225, ssa_11 vec1 1 ssa_17 = ilt ssa_226, ssa_1 vec1 1 ssa_18 = iand ssa_12, ssa_17 vec1 1 ssa_19 = inot ssa_18 if ssa_19 { ... break } else { ... } } On RADV this unrolls a bunch of loops in F1-2017 shaders. Totals from affected shaders: SGPRS: 4112 -> 4136 (0.58 %) VGPRS: 4132 -> 4052 (-1.94 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 515444 -> 587720 (14.02 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 194 -> 196 (1.03 %) Wait states: 0 -> 0 (0.00 %) It also unrolls a couple of loops in shader-db on radeonsi. Totals from affected shaders: SGPRS: 128 -> 128 (0.00 %) VGPRS: 64 -> 64 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 6880 -> 9504 (38.14 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 16 -> 16 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	67c3478482	nir: pass nir_op to calculate_iterations() Rather than getting this from the alu instruction this allows us some flexibility. In the following pass we instead pass the inverse op. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	11e8f8a166	nir: add get_induction_and_limit_vars() helper to loop analysis This helps make find_trip_count() a little easier to follow but will also be used by a following patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	f219f6114d	nir: add helper to return inversion op of a comparison This will be used to help find the trip count of loops that look like the following: while (a < x && i < 8) { ... i++; } Where the NIR will end up looking something like this: vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */) loop { ... vec1 1 ssa_12 = ilt ssa_225, ssa_11 vec1 1 ssa_17 = ilt ssa_226, ssa_1 vec1 1 ssa_18 = iand ssa_12, ssa_17 vec1 1 ssa_19 = inot ssa_18 if ssa_19 { ... break } else { ... } } So in order to find the trip count we need to find the inverse of ilt. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	090feaacdc	nir: simplify the loop analysis trip count code a little Here we create a helper is_supported_terminator_condition() and use that rather than embedding all the trip count code inside a switch. The new helper will also be used in a following patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	7571de8eaa	nir: unroll some loops with a variable limit For some loops can have a single terminator but the exact trip count is still unknown. For example: for (int i = 0; i < imin(x, 4); i++) ... Shader-db results radeonsi (all affected are from Tropico 5): Totals from affected shaders: SGPRS: 144 -> 152 (5.56 %) VGPRS: 124 -> 108 (-12.90 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5180 -> 6640 (28.19 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 17 -> 21 (23.53 %) Wait states: 0 -> 0 (0.00 %) Shader-db results i965 (SKL): total loops in shared programs: 3808 -> 3802 (-0.16%) loops in affected programs: 6 -> 0 helped: 6 HURT: 0 vkpipeline-db results RADV (Unrolls some Skyrim VR shaders): Totals from affected shaders: SGPRS: 304 -> 304 (0.00 %) VGPRS: 296 -> 292 (-1.35 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 15756 -> 25884 (64.28 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 29 -> 29 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: fix bug where last iteration would get optimised away by mistake. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	68ce0ec222	nir: calculate trip count for more loops This adds support to loop analysis for loops where the induction variable is compared to the result of min(variable, constant). For example: for (int i = 0; i < imin(x, 4); i++) ... We add a new bool to the loop terminator struct in order to differentiate terminators with this exit condition. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	e8a8937a04	nir: add partial loop unrolling support This adds partial loop unrolling support and makes use of a guessed trip count based on array access. The code is written so that we could use partial unrolling more generally, but for now it's only use when we have guessed the trip count. We use partial unrolling for this guessed trip count because its possible any out of bounds array access doesn't otherwise affect the shader e.g the stores/loads to/from the array are unused. So we insert a copy of the loop in the innermost continue branch of the unrolled loop. Later on its possible for nir_opt_dead_cf() to then remove the loop in some cases. A Renderdoc capture from the Rise of the Tomb Raider benchmark, reports the following change in an affected compute shader: GPU duration: 350 -> 325 microseconds shader-db results radeonsi VEGA (NIR backend): SGPRS: 1008 -> 816 (-19.05 %) VGPRS: 684 -> 432 (-36.84 %) Spilled SGPRs: 539 -> 0 (-100.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 39708 -> 45812 (15.37 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 105 -> 144 (37.14 %) Wait states: 0 -> 0 (0.00 %) shader-db results i965 SKL: total instructions in shared programs: 13098265 -> 13103359 (0.04%) instructions in affected programs: 5126 -> 10220 (99.38%) helped: 0 HURT: 21 total cycles in shared programs: 332039949 -> 331985622 (-0.02%) cycles in affected programs: 289252 -> 234925 (-18.78%) helped: 12 HURT: 9 vkpipeline-db results VEGA: Totals from affected shaders: SGPRS: 184 -> 184 (0.00 %) VGPRS: 448 -> 448 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 26076 -> 24428 (-6.32 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 5 -> 5 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00

... 2 3 4 5 6 ...

109320 Commits All Branches Search

109320 Commits

All Branches