KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Erico Nunes	4577eb7b7c	nir/algebraic: add lowering for fsign The mali utgard pp doesn't support a sign instruction. In the ARM offline shader compiler, the sign function is implemented using sub(gt(0.0, a), lt(0.0, a)). This is a generic optimization, so implement it in the nir level when lower_fsign is set, alongside the lowering for isign. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-19 15:42:23 +00:00
Jason Ekstrand	c6463f8ac2	nir: Add a nir_src_as_intrinsic() helper Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Jason Ekstrand	85c35885b3	nir: Rework nir_src_as_alu_instr to not take a pointer Other nir_src_as_* functions just take a nir_src. It's not that much more memory copying and the constness preserving really isn't worth the cognitive dissonance. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Jason Ekstrand	eee994e769	nir: Drop "struct" from some nir_* declarations Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Marek Olšák	d3ce8a7f6b	nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possible Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-16 10:24:19 -04:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Alyssa Rosenzweig	2ce4adefa5	nir: Add nir_lower_viewport_transform On Mali hardware (supported by Panfrost and Lima), the fixed-function transformation from world-space to screen-space coordinates is done in the vertex shader prior to writing out the gl_Position varying, rather than in dedicated hardware. This commit adds a shared NIR pass for implementing coordinate transformation and lowering gl_Position writes into screen-space gl_Position writes. v2: Run directly on derefs before io/vars are lowered to cleanup the code substantially. Thank you to Qiang for this suggestion! v3: Bikeshed continues. v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment. Ian and Qiang's reviews are from v3, but no real functional changes from v4. Rob's review is from v4. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Suggested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-14 19:15:13 +00:00
Christian Gmeiner	b6bed115a5	nir: add lower_ftrunc Port TGSI TRUNC lowering to nir Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-13 17:54:48 +00:00
Jason Ekstrand	18ed82b084	nir: Add a pass for selectively lowering variables to scratch space This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-12 15:59:31 -07:00
Karol Herbst	4a3c04a11f	glsl/nir: add support for lowering bindless images_derefs v2: handle atomics as well make use of nir_rewrite_image_intrinsic v3: remove call to nir_remove_dead_derefs v4: (Timothy Arceri) dont actually call lowering yet Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Timothy Arceri	035759b61b	nir/i965/freedreno/vc4: add a bindless bool to type size functions This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	3b2a9ffd60	nir: move brw_nir_rewrite_image_intrinsic into common code Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Jason Ekstrand	6279074de1	nir: Get rid of global registers We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Jason Ekstrand	b28bad89b9	nir: Get rid of nir_register::is_packed All we ever do is initialize it to zero, clone it, print it, and validate it. No one ever sets or uses it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Timothy Arceri	e30804c602	nir/radv: remove restrictions on opt_if_loop_last_continue() When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-09 11:29:41 +10:00
Rob Clark	1ae0c030cb	nir: add lower_all_io_to_elements I need this part of lower_all_io_to_temps but without the actual lowering to temps part. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Ian Romanick	2cf59861a8	nir: Add partial redundancy elimination for compares This pass attempts to dectect code sequences like if (x < y) { z = y - x; ... } and replace them with sequences like t = x - y; if (t < 0) { z = -t; ... } On architectures where the subtract can generate the flags used by the if-statement, this saves an instruction. It's also possible that moving an instruction out of the if-statement will allow nir_opt_peephole_select to convert the whole thing to a bcsel. Currently only floating point compares and adds are supported. Adding support for integer will be a challenge due to integer overflow. There are a couple possible solutions, but they may not apply to all architectures. v2: Fix a typo in the commit message and a couple typos in comments. Fix possible NULL pointer deref from result of push_block(). Add missing (-A + B) case. Suggested by Caio. v3: Fix is_not_const_zero to work correctly with types other than nir_type_float32. Suggested by Ken. v4: Add some comments explaining how this works. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:53 -07:00
Ian Romanick	c6ee46a753	nir: Add nir_alu_srcs_negative_equal v2: Move bug fix in get_neg_instr from the next patch to this patch (where it was intended to be in the first place). Noticed by Caio. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:52 -07:00
Ian Romanick	be1cc3552b	nir: Add nir_const_value_negative_equal v2: Rebase on 1-bit Boolean changes. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:52 -07:00
Timothy Arceri	e76ae39ae2	nir: add support for user defined select control This will allow us to make use of the selection control support in spirv and the GL support provided by EXT_control_flow_attributes. Note this only supports if-statements as we dont support switches in NIR. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841	2019-03-27 02:39:12 +00:00
Timothy Arceri	b56451f82c	nir: add support for user defined loop control This will allow us to make use of the loop control support in spirv and the GL support provided by EXT_control_flow_attributes. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841	2019-03-27 02:39:12 +00:00
Jason Ekstrand	40074ebf74	nir: Add texture sources and intrinsics for bindless On Intel, we have both bindless and bindful and we'd like to use them at the same time if we can so we need to be able to distinguish at the NIR level between the two. This also fixes nir_lower_tex to properly handle bindless in its tex_texture_size and get_texture_lod helpers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-25 16:12:09 -05:00
Jason Ekstrand	3bd5457641	nir: Add a lowering pass for non-uniform resource access Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-25 15:00:36 -05:00
Jason Ekstrand	39da1deb49	nir/lower_io: Add a bounds-checked 64-bit global address format Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-25 14:40:54 -05:00
Iago Toral Quiroga	3766334923	compiler/nir: add lowering for 16-bit flrp And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Iago Toral Quiroga	ca31df6f1f	compiler/nir: add lowering option for 16-bit fmod And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Samuel Pitoiset	23d30f4099	spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass This lowering isn't needed for RADV because AMDGCN has two instructions. It will be disabled for RADV in an upcoming series. While we are at it, factorize a little bit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-22 19:41:46 +01:00
Karol Herbst	d8a0658d8b	nir/lower_tex: Add support for tg4 offsets lowering Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Karol Herbst	71c66c254b	nir: add support for gather offsets Values inside the offsets parameter of textureGatherOffsets are required to be constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET, GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET]. As this range is never outside [-32, 31] for all existing drivers inside mesa, we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr. Right now only Nvidia hardware supports this in hardware, so we can turn this on inside Nouveau for the NIR path as it is already enabled with the TGSI one. v2: use memcpy instead of for loops add missing bits to nir_instr_set don't show offsets if they are all 0 v3: default offsets aren't all 0 v4: rename offsets -> tg4_offsets rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Jason Ekstrand	0b7e5bdbd4	nir: Constant values are per-column not per-component Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-20 09:26:56 -05:00
Jason Ekstrand	cbfe31ccbe	Revert "nir: const `nir_call_instr::callee`" This reverts commit `db57db5317`. When building IR, nothing is really immutable and, since C has no concept of constness propagating beyond the first pointer, we have to be vary careful with how we use it. To just throw const into a function like this is a lie. Instead, we should just drop the unneeded const in spirv_to_nir which this commit does along with the revert.	2019-03-19 10:19:42 -05:00
Eric Engestrom	db57db5317	nir: const `nir_call_instr::callee` Fixes: `c95afe56a8` "nir/spirv: handle kernel function parameters" Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-03-19 12:51:53 +00:00
Jason Ekstrand	35b8f6f40b	nir: Add a new pass to lower array dereferences on vectors This pass was originally written for lowering TCS output reads and writes but it is also applicable just about anything including UBOs, SSBOs, and shared variables. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 23:10:27 -05:00
Jason Ekstrand	c8d42c8cf6	nir: Rename nir_address_format_vk_index_offset to not be vk It's just a 32-bit index and offset. We're going to want to use it in GL as well so stop talking about Vulkan. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Caio Marcelo de Oliveira Filho	822a8865e4	nir: Add a pass to combine store_derefs to same vector v2: (all from Jason) Reuse existing function for the end of the block combinations. Check the SSA values are coming from the right place in tests. Document the case when the store to array_deref is reused. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-13 08:39:16 -07:00
Jason Ekstrand	5ef2b8f1f2	nir: Add a pass for lowering IO back to vector when possible This pass tries to turn scalar and array-of-scalar IO variables into vector IO variables whenever possible. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-12 15:34:06 +00:00
Connor Abbott	5b2ec9c81e	nir: Add a stripping pass for improved cacheability Oftentimes various nir shaders after lowering will be the same, or almost the same. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worse offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-12 10:49:48 +01:00
Timothy Arceri	68ce0ec222	nir: calculate trip count for more loops This adds support to loop analysis for loops where the induction variable is compared to the result of min(variable, constant). For example: for (int i = 0; i < imin(x, 4); i++) ... We add a new bool to the loop terminator struct in order to differentiate terminators with this exit condition. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	fba5d275db	nir: add new partially_unrolled bool to nir_loop In order to stop continuously partially unrolling the same loop we add the bool partially_unrolled to nir_loop, we add it here rather than in nir_loop_info because nir_loop_info is only set via loop analysis and is intended to be cleared before each analysis. Also nir_loop_info is never cloned. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	03a452b7d0	nir: add guess trip count support to loop analysis This detects an induction variable used as an array index to guess the trip count of the loop. This enables us to do a partial unroll of the loop, which can eventually result in the loop being eliminated. v2: check if the induction var is used to index more than a single array and if so get the size of the smallest array. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Jason Ekstrand	e02959f442	nir/lower_doubles: Inline functions directly in lower_doubles Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ouselves. This means that there's no more nasty functions lying around that the caller needs to worry about cleaning up. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	f25ca337b4	nir/deref: Expose nir_opt_deref_impl Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	de8d80f9cc	nir/inline_functions: Break inlining into a builder helper This pulls the guts of function inlining into a builder helper so that it can be used elsewhere. The rest of the infrastructure is still needed for most inlining cases to ensure that everything gets inlined and only ever once. However, there are use-cases where you just want to inline one little thing. This new helper also has a neat trick where it can seamlessly inline a function from one nir_shader into another. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	9314084237	nir: Teach loop unrolling about 64-bit instruction lowering The lowering we do for 64-bit instructions can cause a single NIR ALU instruction to blow up into hundreds or thousands of instructions potentially with control flow. If loop unrolling isn't aware of this, it can unroll a loop 20 times which contains a nir_op_fsqrt which we then lower to a full software implementation based on integer math. Those 20 invocations suddenly get a lot more expensive than NIR loop unrolling currently expects. By giving it an approximate estimate function, we can prevent loop unrolling from going to town when it shouldn't. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	ebb3695376	nir: Expose double and int64 op_to_options_mask helpers We already have one internally for int64 but we don't have a similar one for doubles so we'll have to make one. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Iago Toral Quiroga	ca2b5e9069	compiler/nir: add an is_conversion field to nir_op_info This is set to True only for numeric conversion opcodes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Karol Herbst	272e927d0e	nir/spirv: initial handling of OpenCL.std extension opcodes Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. v2: add hadd proof from Jason move some of the lowering into opt_algebraic and create new nir opcodes simplify nextafter lowering fix normalize lowering for inf rework upsample to use nir_pack_bits add missing files to build systems v3: split lines of iadd/sub_sat expressions Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Timur Kristóf	6684e039eb	nir: Add multiplier argument to nir_lower_uniforms_to_ubo. Note that locations can be set in different units, and the multiplier argument caters to supporting these different units. For example, st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4, while tgsi_to_nir uses bytes, so the multiplier should be 16. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	909d1f50f3	nir: Move nir_lower_uniforms_to_ubo to compiler/nir. The nir_lower_uniforms_to_ubo function is useful outside of mesa/state_tracker, and in fact is needed to produce NIR for drivers that have the PIPE_CAP_PACKED_UNIFORMS capability. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Jason Ekstrand	5c96120b5c	intel,nir: Lower TXD with min_lod when the sampler index is not < 16 When we have a larger sampler index, we get into the "high sampler" scenario and need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Fixes: `cb98e0755f` "intel/fs: Support min_lod parameters on texture..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-04 23:56:39 +00:00

1 2 3 4 5 ...

409 Commits