KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Caio Marcelo de Oliveira Filho	9fdded0cc3	src/compiler: use new hash table and set creation helpers Replace calls to create hash tables and sets that use _mesa_hash_pointer/_mesa_key_pointer_equal with the helpers _mesa_pointer_hash_table_create() and _mesa_pointer_set_create(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-01-14 10:49:28 -08:00
Jason Ekstrand	821b6861ec	nir/gcm: Support deref instructions Even though no one's been brave enough to ever use this pass, I like to keep it functionally working. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-12 17:55:49 -06:00
Rhys Perry	0210243923	nir: fix copy-paste error in nir_lower_constant_initializers Fixes: `393b59e077` ('nir: Rework nir_lower_constant_initializers() to handle functions') Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-10 10:51:52 -06:00
Matt Turner	2623653126	nir: Unset metadata debug bit if no progress made NIR metadata validation verifies that the debug bit was unset (by a call to nir_metadata_preserve) if a NIR optimization pass made progress on the shader. With the expectation that the NIR shader consists of only a single main function, it has been safe to call nir_metadata_preserve() iff progress was made. However, most optimization passes calculate progress per-function and then return the union of those calculations. In the case that an optimization pass makes progress only on a subset of the functions in the shader metadata validation will detect the debug bit is still set on any unchanged functions resulting in a failed assertion. This patch offers a quick solution (short of a larger scale refactoring which I do not wish to undertake as part of this series) that simply unsets the debug bit on unchanged functions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	e633fae5cb	nir: Add lowering support for 64-bit operations to software Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	fe2cbcf3ee	nir: Create nir_builder in nir_lower_doubles_impl() We're going to use it more in a future patch, and this avoids a lot of gross code. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	ecb115eb3f	nir: Add and set info::uses_64bit Will be used to communicate that a shader uses 64-bit operations to the concerned lowering passes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	41f3e9e5f5	nir: Implement lowering of 64-bit shift operations Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	62d55f1281	nir: Wire up int64 lowering functions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Jason Ekstrand	adab27e741	nir: Add some more int64 lowering helpers [mattst88]: Found in an old branch of Jason's. Jason implemented: inot, iand, ior, iadd, isub, ineg, iabs, compare, imin, imax, umin, umax Matt implemented: ixor, bcsel, b2i, i2b, i2i8, i2i16, i2i32, i2i64, u2u8, u2u16, u2u32, u2u64, and fixed ilt Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Matt Turner	dde73e646f	nir: Tag entrypoint for easy recognition by nir_shader_get_entrypoint() We're going to have multiple functions, so nir_shader_get_entrypoint() needs to do something a little smarter. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	393b59e077	nir: Rework nir_lower_constant_initializers() to handle functions Previously it assumed that only a single function (the entrypoint) existed and attempted to lower constant initializers of shader outputs for each function, for instance.	2019-01-09 16:42:40 -08:00
Eric Anholt	211b826790	nir: Make nir_deref_instr_build/get_const_offset actually use size_align. I think this was copy-and-paste mistake -- nir_opt_large_constants was passing in glsl_get_natural_size_align_bytes() given brw_nir.c's arguments to the opt pass. I wanted to reuse this function for handling constant offsets of arrays of images in V3D. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-08 15:40:53 -08:00
Eric Anholt	6051c11d17	nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results. V3D returns the texels in a different order in the resulting vec4 from what GLSL wants, so we need to put in a swizzle. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 13:03:41 -08:00
Caio Marcelo de Oliveira Filho	baabfb1959	nir: fix warning in nir_lower_io.c Initialize the variable with NULL. Fixes the following In file included from ../src/compiler/nir/nir_lower_io.c:34: ../src/compiler/nir/nir_lower_io.c: In function ‘nir_lower_explicit_io’: ../src/compiler/nir/nir.h:668:11: warning: ‘addr’ may be used uninitialized in this function [-Wmaybe-uninitialized] return src; ^~~ ../src/compiler/nir/nir_lower_io.c:735:17: note: ‘addr’ was declared here nir_ssa_def *addr; ^~~~ v2: Avoid using a 'default' case so we get help from the compiler when new deref types are added. (Lionel) Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 12:29:56 -08:00
Karol Herbst	d0c6ef2793	nir: rename global/local to private/function memory the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:51:46 +01:00
Jason Ekstrand	63b9aa2e25	spirv: Add support for using derefs for UBO/SSBO access For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	e90b738f20	nir/vulkan: Add a descriptor type to vulkan resource intrinsics Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	f393b10b3f	nir/lower_io: Add "explicit" IO lowering This new pass is for lowering explicitly laid out memory coming in from SPIR-V or a similar source. It's quite a bit more complicated than the normal lower_io because we have to be able to handle matrices. The way the stride information is stored for matrices is awkward and dealing with row-major matrices is especially painful. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	52dd43c7ef	nir/validate: Allow array derefs on vectors in more modes Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	013ee5732b	nir/intrinsics: Add access flags to load/store_deref Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	7755171e4c	nir/intrinsics: Allow deref sources to consume anything This commit adds a new num_components value for intrinsic sources of -1 which means that it consumes everything and the number of components effectively isn't validated. This is useful for deref sources which just take the result of the deref and we leave it up to the driver to decide what that size should be. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	d0fe52a456	nir/validate: Allow derefs in phi nodes We added this assert when first moving derefs over to instructions to ensure that deref chains could go all the way back to the variables. Now that we're going to start using derefs for things that we can do variable pointers on such as UBOs and SSBOs, we need to be able to run derefs through phi nodes, selects, and basically anything else. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	7e85480a67	nir/remove_dead_variables: Properly handle deref casts We already detect any incomplete deref chains (where the deref is used for something other than another deref or a load/store) and flag the variable as used thanks to deref_used_for_not_store. All that's left to do is to properly skip casts when cleaning up. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	78d80f7db2	nir/deref: Skip over casts in fixup_deref_modes This pass is used when, for instance, we lazily change the mode of variables rather than replacing the variable with a new one. Since we only do this in cases where we know we have full deref chains, it's ok to just skip them in fixup_deref_modes. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	d8e3edb784	nir/deref: Support casts and ptr_as_array in comparisons The code which constructs deref paths already gives you the path starting at the nearest deref_cast or deref_var. All we need to do for casts is handle the case where the start of the path isn't a deref_var. For ptr_as_array derefs, we just bail if we have any after the divergence point between the two derefs. We may be able to do better in the future but this works for now. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	a1c688517d	nir/opt_deref: Properly optimize ptr_as_array derefs When handling casts, we can't blindly propagate the parent of a cast into a ptr_as_array deref because doing so might loose the stride information from the cast. Instead, before we can propagate into ptr_as_array derefs, we need to check that the cast is a cast of an array deref and that the stride matches. For other types of derefs, we can continue to propagate casts as normal because they don't need the stride. We also add an optimization which can combine a ptr_as_array deref with it parent if it is also an array deref of some form. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	427558a717	nir/validate: Don't allow derefs in if conditions Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	e94a027af8	nir: Add a ptr_as_array deref type These correspond directly to SPIR-V's OpPtrAccessChain. As such, they treat whatever their parent gives them as if it's the first element in some array and dereferences that array. If the parent is, itself, an array deref, then the two indices can just be added together to get the final array deref. However, it can also be used in cases where what you have is a dereference to some random vec2 value somewhere. In this case, we require a cast before the ptr_as_array and use the ptr_stride field in the cast to provide a stride for the ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	fc9c4f89b8	nir: Move propagation of cast derefs to a new nir_opt_deref pass We're going to want to do more deref optimizations going forward and this gives us a central place to do them. Also, cast propagation will get a bit more complicated with the addition of ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	6cebeb4f71	glsl_type: Add support for explicitly laid out matrices and arrays SPIR-V allows for matrix and array types to be decorated with explicit byte stride decorations and matrix types to be decorated row- or column-major. This commit adds support to glsl_type to encode this information. Because this doesn't work nicely with std430 and std140 alignments, we add asserts to ensure that we don't use any of the std430 or std140 layout functions with explicitly laid out types. In SPIR-V, the layout information for matrices is applied to the parent struct member instead of to the matrix type itself. However, this is gets rather clumsy when you're walking derefs trying to compute offsets because, the moment you hit a matrix, you have to crawl back the deref chain and find the struct. Instead, we take the same path here as we've taken in spirv_to_nir and put the decorations on the matrix type itself. This also subtly adds support for strided vector types. These don't come up in SPIR-V directly but you can get one as the result of taking a column from a row-major matrix or a row from a column-major matrix. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	d34f19feba	glsl_type: Drop the glsl_get_array_instance C helper It was added in `bce6f99875` even though it's completely redundant with glsl_array_type(). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	a700a82bda	nir: Distinguish between normal uniforms and UBOs Previously, NIR had a single nir_var_uniform mode used for atomic counters, UBOs, samplers, images, and normal uniforms. This commit splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform is still a bit of a catch-all but the nir_var_ubo is specific to UBOs. While we're at it, we also rename shader_storage to ssbo to follow the convention. We need this so that we can distinguish between normal uniforms and UBO access at the deref level without going all the way back variable and seeing if it has an interface type. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	c9a4135e14	nir: Allow storing to shader_storage I have no idea how shader_storage made it into the list of banned variable modes for stores but it clearly should be allowed. This only doesn't cause us a problem today because we never actually use derefs on shader_storage variables. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	cd93b0a670	nir/validate: Require array indices to match the deref bit size This doesn't currently change anything because array indices are required to be 32 bits and all derefs are also 32 bits. However, we will one day have 64-bit derefs for OpenCL. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	bfe31c5e46	nir/builder: Add nir_i2i and nir_u2u helpers which take a bit size Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com	2019-01-08 00:38:29 +00:00
Timothy Arceri	6dade5d534	nir: avoid uninitialized variable warning Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231	2019-01-07 10:57:00 +11:00
Eric Anholt	f217a94542	nir: Add nir_lower_tex options to lower sampler return formats. I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and vc4, but nir could potentially do some useful stuff for us (like avoiding unpack/repacks) if we give it the information. v2: Skip lowering for txs/query_levels v3: Fix a crash on old-style shadow v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack the enum. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:59:57 -08:00
Eric Anholt	a74f2aeb4f	nir: Allow nir_format_unpack_int/sint to unpack larger values. For V3D, I want to unpack 4-16-bit packed integers for 8 and 16-bit integer samplers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:59:30 -08:00
Caio Marcelo de Oliveira Filho	bbf9ee9b18	nir: remove dead code from copy_prop_vars When copy_prop_vars also took care of dead write handling, intrin was used as part of store_to_entry. Now it isn't, so this assignment isn't used really used. Add a comment clarifying what happens to intrin. Fixes: `4dfa7adc10` "nir: Remove handling of dead writes from copy_prop_vars" Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:18:41 -08:00
Timothy Arceri	4d3f6cb973	nir: merge some basic consecutive ifs After trying multiple times to merge if-statements with phis between them I've come to the conclusion that it cannot be done without regressions. The problem is for some shaders we end up with a whole bunch of phis for the merged ifs resulting in increased register pressure. So this patch just merges ifs that have no phis between them. This seems to be consistent with what LLVM does so for radeonsi we only see a change (although its a large change) in a single shader. Shader-db results i965 (SKL): total instructions in shared programs: 13098176 -> 13098152 (<.01%) instructions in affected programs: 1326 -> 1302 (-1.81%) helped: 4 HURT: 0 total cycles in shared programs: 332032989 -> 332037583 (<.01%) cycles in affected programs: 60665 -> 65259 (7.57%) helped: 0 HURT: 4 The cycles estimates reported by shader-db for i965 seem inaccurate as the only difference in the final code is the removal of the redundent condition evaluations and jumps. Also the biggest code reduction (~7%) for radeonsi was in a tomb raider tressfx shader but for some reason this does not get merged for i965. Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 232 -> 232 (0.00 %) VGPRS: 164 -> 164 (0.00 %) Spilled SGPRs: 59 -> 59 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 14584 -> 13520 (-7.30 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 13 -> 13 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-01-03 15:17:16 +11:00
Timothy Arceri	19cafe8084	nir: add rewrite_phi_predecessor_blocks() helper This will also be used by the if merge pass in the following commit. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-01-03 15:17:16 +11:00
Timothy Arceri	5122fbc4ba	nir: simplify does_varying_match() Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Timothy Arceri	8d05ee2005	nir: make use of does_varying_match() helper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Timothy Arceri	0016166d19	nir: make nir_opt_remove_phis_impl() static Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Caio Marcelo de Oliveira Filho	7d6babf995	nir: add a way to print the deref chain Makes debugging easier when we care about the deref chain and not the deref instruction itself. To make it take a const pointer, constify some of the static functions in nir_print.c. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 10:09:04 -08:00
Iago Toral Quiroga	95b7c29c2c	compiler/spirv: handle 16-bit float in radians() and degrees() v2: - use nir_imm_fmul helper (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	aeee683780	compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	5fc9ad1cb0	compiler/nir: add a nir_b2f() helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Timothy Arceri	70be9afccb	nir: link time opt duplicate varyings If we are outputting the same value to more than one output component rewrite the inputs to read from a single component. This will allow the duplicate varying components to be optimised away by the existing opts. shader-db results i965 (SKL): total instructions in shared programs: 12869230 -> 12860886 (-0.06%) instructions in affected programs: 322601 -> 314257 (-2.59%) helped: 3080 HURT: 8 total cycles in shared programs: 317792574 -> 317730593 (-0.02%) cycles in affected programs: 2584925 -> 2522944 (-2.40%) helped: 2975 HURT: 477 shader-db results radeonsi (VEGA): SGPRS: 31576 -> 31664 (0.28 %) VGPRS: 17484 -> 17064 (-2.40 %) Spilled SGPRs: 184 -> 167 (-9.24 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 583340 -> 569368 (-2.40 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 6162 -> 6270 (1.75 %) Wait states: 0 -> 0 (0.00 %) vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 14880 -> 15080 (1.34 %) VGPRS: 10872 -> 10888 (0.15 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 674016 -> 668396 (-0.83 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 2708 -> 2704 (-0.15 %) Wait states: 0 -> 0 (0.00 % V2: bunch of tidy ups suggested by Jason Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	d828694b80	nir: rework nir_link_opt_varyings() This just cleans things up a little and make things more safe for derefs. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	c0aba8b0dc	nir: add can_replace_varying() helper This will be reused by the following patch. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	50de3f80a8	nir: rename nir_link_constant_varyings() nir_link_opt_varyings() The following patches will add support for an additional optimisation so this function will no longer just optimise varying constants. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Iago Toral Quiroga	d6110d4d54	intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs The former expects to see SSA-only things, but the latter injects registers. The assertions in the lowering where not seeing this because they asserted on the bit_size values only, not on the is_ssa field, so add that assertion too. Fixes: `11dc130779` "nir: Add a bool to int32 lowering pass" CC: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-20 08:02:44 +01:00
Caio Marcelo de Oliveira Filho	947f7b452a	nir: properly find the entry to keep in copy_prop_vars When copy propagation handles a store/copy, it iterates the current copy entries to remove aliases, but keeps the "equal" entry (if exists) to be updated. The removal step may swap the entries around (to ensure there are no holes), invalidating previous iteration pointers. The bug was saving such pointer to use later. Change the code to first perform the removals and then find the remaining right entry. This was causing updates to be lost since they were being made to an entry that was not part of the current copies. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624 Fixes: `b3c6146925` "nir: Copy propagation between blocks" Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-19 09:33:36 -08:00
Caio Marcelo de Oliveira Filho	0ddc911f4d	nir: properly clear the entry sources in copy_prop_vars When updating a copy entry source value from a "non-SSA" (the data come from a copy instruction) to a "SSA" (the data or parts of it come from SSA values), it was possible to hold invalid data in ssa[0] depending on the writemask. Because the union, ssa[0] could contain a pointer to a nir_deref_instr left-over from previous non-SSA usage. Change code to clean up the array before use to avoid invalid data around. Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-19 08:35:48 -08:00
Ian Romanick	96c4b135e3	nir/algebraic: Don't put quotes around floating point literals The quotation marks around 1.0 cause it to be treated as a string instead of a floating point value. The generator then treats it as an arbitrary variable replacement, so any iand involving a ('ineg', ('b2i', a)) matches. v2: Remove misleading comment about sized literals (suggested by Timothy). Add assertion that the name of a varible is entierly alphabetic (suggested by Jason). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Fixes: `6bcd2af086` ("nir/algebraic: Add some optimizations for D3D-style Booleans") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075	2018-12-18 23:28:31 -08:00
Sagar Ghuge	933c44bcc4	nir: Add a new lowering option to lower 3D surfaces from txd to txl. Tested on gen9. v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-18 13:44:09 -08:00
Jason Ekstrand	5dad1abfdc	nir/dead_write_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes clear_unused_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	fa40a58fd9	nir/copy_prop_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes apply_barrier_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	cf7fb39805	nir/lower_wpos_center: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	867fe35a16	nir/lower_io_to_scalar: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	3fe0363dda	nir/lower_io_arrays_to_elements: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8cc0f92492	nir/linking_helpers: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8410cf66d7	nir/propagate_invariant: Skip unknown vars If we can't find the variable from the deref, just assume it isn't invariant and continue on. This can happen if, for instance, we're writing to a deref that points into an SSBO. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Ian Romanick	29e4b949b4	Revert "nir/lower_indirect: Bail early if modes == 0" "There's no point in walking the program if we're never going to actually lower anything." Except we might lower compacted local arrays. In that case, modes will be 0, but there is still lowering to be done. This reverts commit `7f75cf2a94`. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081 Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org>	2018-12-18 10:47:54 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Eric Anholt	708d8f4d0a	nir: Fix clamping of uints for image store lowering. I botched some copy-and-paste and clamped to signed int max instead of uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on skl. Fixes: `d3e046e76c` ("nir: Pull some of intel's image load/store format conversion to nir_format.h") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-17 20:02:22 +00:00
Ian Romanick	9dc135efa1	nir: Release per-block metadata in nir_sweep nir_sweep already marks all metadata invalid, so it is safe to release the memory here too. mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475 gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811 deus ex mankind divided 148: 62,845,304 => 62,829,640 deus ex mankind divided 2890: 71,922,686 => 71,922,686 dirt showdown 676: 69,238,607 => 69,238,607 dolphin ubershaders 210: 77,822,072 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	7adafd6e1c	nir: Fix holes in nir_instr Found using pahole. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331 gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571 deus ex mankind divided 148: 62,887,728 => 62,845,304 deus ex mankind divided 2890: 72,399,750 => 71,922,686 dirt showdown 676: 69,464,023 => 69,238,607 dolphin ubershaders 210: 78,359,728 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	8161a87b24	nir/phi_builder: Use per-value hash table to store [block] -> def mapping Replace the old array in each value with a hash table in each value. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,399,750 dirt showdown 676: 74,466,431 => 69,464,023 dolphin ubershaders 210: 109,630,376 => 78,359,728 Run-time change for a full run on shader-db on my Haswell desktop (with -march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9 seconds on a 237 second run. The first time I sent this version of this patch out, the run-time data was quite different. I had misconfigured the script that ran the test, and none of the tests from higher GLSL versions were run. These are generally more complex shaders, and they are more affected by this change. The previous version of this patch used a single hash table for the whole phi builder. The mapping was from [value, block] -> def, so a separate allocation was needed for each [value, block] tuple. There was quite a bit of per-allocation overhead (due to ralloc), so the patch was followed by a patch that added the use of the slab allocator. The results of those two patches was not quite as good: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,402,222 * dirt showdown 676: 74,466,431 => 72,443,591 * dolphin ubershaders 210: 109,630,376 => 81,034,320 * The * denote tests that are better now. In the tests that are the same in both patches, the "after" peak memory usage was at a different location. I did not check the local peaks. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Jason Ekstrand	6bcd2af086	nir/algebraic: Add some optimizations for D3D-style Booleans D3D Booleans use a 32-bit 0/-1 representation. Because this previously matched NIR exactly, we didn't have to really optimize for it. Now that we have 1-bit Booleans, we need some specific optimizations to chew through the D3D12-style Booleans. Shader-db results on Kaby Lake: total instructions in shared programs: 15136811 -> 14967944 (-1.12%) instructions in affected programs: 2457021 -> 2288154 (-6.87%) helped: 8318 HURT: 10 total cycles in shared programs: 373544524 -> 359701825 (-3.71%) cycles in affected programs: 151029683 -> 137186984 (-9.17%) helped: 7749 HURT: 682 total loops in shared programs: 4431 -> 4399 (-0.72%) loops in affected programs: 32 -> 0 helped: 21 HURT: 0 total spills in shared programs: 10290 -> 10051 (-2.32%) spills in affected programs: 2532 -> 2293 (-9.44%) helped: 18 HURT: 18 total fills in shared programs: 22203 -> 21732 (-2.12%) fills in affected programs: 3319 -> 2848 (-14.19%) helped: 18 HURT: 18 Note that a large chunk of the improvement fixing regressions caused by switching to 1-bit Booleans. Previously, our ability to optimize D3D booleans was improved by using the D3D representation directly in NIR. Now that NIR does 1-bit bools, we need a few more optimizations. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3b30814791	nir/algebraic: Optimize 1-bit Booleans Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	44227453ec	nir: Switch to using 1-bit Booleans for almost everything This is a squash of a few distinct changes: glsl,spirv: Generate 1-bit Booleans Revert "Use 32-bit opcodes in the NIR producers and optimizations" Revert "nir/builder: Generate 32-bit bool opcodes transparently" nir/builder: Generate 1-bit Booleans in nir_build_imm_bool Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	11dc130779	nir: Add a bool to int32 lowering pass We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	191a1dce92	nir: Add 1-bit Boolean opcodes We also have to add support for 1-bit integers while we're here so we get 1-bit variants of iand, ior, and inot. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	615cc26b97	nir/algebraic: Generalize an optimization This just makes it nicely scale across bit sizes. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	487514ae61	nir/large_constants: Properly handle 1-bit bools Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3191a82372	nir: Add support for 1-bit data types This commit adds support for 1-bit Booleans and integers. Booleans obviously take a value of true or false. Because we have to define the semantics of 1-bit signed and unsigned integers, we define uint1_t to take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit arithmetic is then well-defined in the usual way, just with fewer bits. The definition of int1_t and uint1_t doesn't usually matter but we do need something for purposes of constant folding. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	2fe8708ffd	nir/constant_expressions: Rework Boolean handling This commit contains three related changes. First, we define boolN_t for N = 8, 16, and 64 and move the definition of boolN_vec to the loop with the other vec definitions. Second, there's no reason why we need the != 0 on the source because that happens implicitly when it's converted to bool. Third, for destinations, we use a signed integer type and just do -(int)bool_val which will give us the 0/-1 behavior we want and neatly scales to all bit widths. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_$[fiu]lt$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ge$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ne$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]eq$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fi]$ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_$[fiu]lt$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ge$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ne$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]eq$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fi]$ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	b569093566	nir/algebraic: Make an optimization more specific Later in this series, bool is not going to imply 32-bit. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	517099809a	nir: Drop support for lower_b2f This was originally added for the out-of-tree Mali driver but I think we've all agreed it's easy enough for them to just do in their back-end. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	4bb1a34727	nir/algebraic: Optimize x2b(xneg(a)) -> a Shader-db results on Kaby Lake: total instructions in shared programs: 15072525 -> 15072525 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 This helps prevent regressions in later commits. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3595a0abf4	nir/constant_folding: Fix source bit size logic Instead of looking at input_sizes[i] which contains the number of components for each source, we look at the bit size of input_types[i]. This fixes a regression in the 1-bit boolean series though I have no idea how we haven't seen it before now. Fixes: `35baee5dce` "nir/constant_folding: fix incorrect bit-size check" Fixes: `9076c4e289` "nir: update opcode definitions for different bit sizes" Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	e17426058c	nir/lower_idiv: Use ilt instead of bit twiddling The previous code was creating a boolean by doing an arithmetic right- shift by 31 which produces a boolean which is true if the argument is negative. This is the same as the expression r < 0 which is much simpler and doesn't depend on NIR's representation of booleans. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-16 21:03:02 +00:00
Rhys Perry	ed4020fabe	nir: fix constness in nir_intrinsic_align() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-16 14:56:10 +00:00
Ian Romanick	ba5402ec9a	nir/phi_builder: Internal users should use nir_phi_builder_value_set_block_def too Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-14 07:36:05 -08:00
Timothy Arceri	a2ec78883f	nir: fix opt_if_loop_last_continue() The pass did not correctly handle loops ending in: if ssa_7 { block block_8: /* preds: block_7 / continue / succs: block_1 / } else { block block_9: / preds: block_7 / break / succs: block_11 */ } The break will get eliminated by another opt but if this pass gets called first (as it does on RADV) we ended up inserting instructions after the break. Fixes: `5921a19d4b` ("nir: add if opt opt_if_loop_last_continue()") Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-12-14 17:21:35 +11:00
Eric Anholt	4407e688cd	nir: Move intel's half-float image store lowering to to nir_format.h. I needed the same function for v3d. This was originally in `d3e046e76c` ("nir: Pull some of intel's image load/store format conversion to nir_format.h") before we made am istake about simplifying the function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 12:24:26 -08:00
Eric Anholt	c2c44dba7a	nir: Print the format of image variables. This helps a lot when debugging image load/store lowering on large testcases. Unfortunately the Mesa enum name stuff is under src/mesa and we can't get at it from the compiler. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 12:24:12 -08:00
Jason Ekstrand	74492ebad9	nir: Add a pass for lowering integer division by constants It's a reasonably well-known fact in the world of compilers that integer divisions by constants can be replaced by a multiply, an add, and some shifts. This commit adds such an optimization to NIR for easiest case of udiv. Other division operations will be added in following commits. In order to provide some additional driver control, the pass takes a minimum bit size to optimize. Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Ian Romanick	090e282407	nir: Add a saturated unsigned integer add opcode Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	39198a1238	nir/lower_int64: Add support for [iu]mul_high Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Jason Ekstrand	9525971e2b	nir: Allow [iu]mul_high on non-32-bit types Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Alejandro Piñeiro	c7bdcd67aa	nir: remove unused variable To avoid the following warning: ./src/compiler/nir/nir_loop_analyze.c:807:16: warning: unused variable ‘ns’ [-Wunused-variable] nir_shader *ns = impl->function->shader; Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-13 16:35:21 +01:00
Eric Anholt	d3e046e76c	nir: Pull some of intel's image load/store format conversion to nir_format.h I needed the same functions for v3d. Note that the color value in the Intel lowering has already been cut down to image.chans num_components. v2: Drop the half float one, since it was a 1-liner after cleanup. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 16:09:43 -08:00
Eric Anholt	19c7cba2ab	nir: Add some more consts to the nir_format_convert.h helpers. Most of the bits were constant, but a few were missed. Avoids warnings from v3d's upcoming static const bits declarations. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 16:09:37 -08:00
Timothy Arceri	9e6b39e1d5	nir: detect more induction variables This allows loop analysis to detect inductions variables that are incremented in both branches of an if rather than in a main loop block. For example: loop { block block_1: /* preds: block_0 block_7 / vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20 vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4 vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4 vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21 vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22 vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9 vec1 32 ssa_14 = ige ssa_8, ssa_5 / succs: block_2 block_3 / if ssa_14 { block block_2: / preds: block_1 / break / succs: block_8 / } else { block block_3: / preds: block_1 / / succs: block_4 / } block block_4: / preds: block_3 / vec1 32 ssa_15 = ilt ssa_6, ssa_8 / succs: block_5 block_6 / if ssa_15 { block block_5: / preds: block_4 / vec1 32 ssa_16 = iadd ssa_8, ssa_7 vec1 32 ssa_17 = load_const (0x3f800000 / 1.000000/) / succs: block_7 / } else { block block_6: / preds: block_4 / vec1 32 ssa_18 = iadd ssa_8, ssa_7 vec1 32 ssa_19 = load_const (0x3f800000 / 1.000000/) / succs: block_7 / } block block_7: / preds: block_5 block_6 / vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18 vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4 vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19 / succs: block_1 */ } Unfortunatly GCM could move the addition out of the if for us (making this patch unrequired) but we still cannot enable the GCM pass without regressions. This unrolls a loop in Rise of The Tomb Raider. vkpipeline-db results (VEGA): Totals from affected shaders: SGPRS: 88 -> 96 (9.09 %) VGPRS: 56 -> 52 (-7.14 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 2168 -> 4560 (110.33 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 4 -> 4 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211	2018-12-13 10:58:35 +11:00
Timothy Arceri	c03d6e61cc	nir: reword code comment Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:58:35 +11:00
Timothy Arceri	48b40380e3	nir: in loop analysis track actual control flow type This will allow us to improve analysis to find more induction variables. Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:58:35 +11:00
Danylo Piliaiev	5921a19d4b	nir: add if opt opt_if_loop_last_continue() Removing the last continue can allow more loops to unroll. Also inserting code into the if branch can allow the various if opts to progress further. The insertion of some loops into the if branch also reduces VGPR use in some shaders. vkpipeline-db results (VEGA): Totals from affected shaders: SGPRS: 6552 -> 6576 (0.37 %) VGPRS: 6544 -> 6532 (-0.18 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 481952 -> 478032 (-0.81 %) bytes LDS: 13 -> 13 (0.00 %) blocks Max Waves: 241 -> 242 (0.41 %) Wait states: 0 -> 0 (0.00 %) Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 168 -> 168 (0.00 %) VGPRS: 144 -> 140 (-2.78 %) Spilled SGPRs: 157 -> 157 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 8524 -> 8488 (-0.42 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 7 -> 7 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: (Timothy Arceri): - allow for continues in either branch - move any trailing loops inside the if as well as blocks. - leave nir_opt_trivial_continues() to actually remove the continue. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211	2018-12-13 10:58:35 +11:00
Timothy Arceri	721566bddb	nir: rework force_unroll_array_access() Here we rework force_unroll_array_access() so that we can reuse the induction variable detection in a following patch. Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:39:51 +11:00
Timothy Arceri	48135f175c	nir: factor out some of the complex loop unroll code to a helper Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:34:48 +11:00
Jason Ekstrand	ca98902d09	nir: Document the function inlining process This has thrown a few people off recently and it's good to have the process and all the rational for it documented somewhere. A comment at the top of nir_inline_functions seems as good a place as any. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-12-12 08:32:32 -06:00
Jason Ekstrand	4ef8f46fd1	nir/lower_tex: Add lowering for some min_lod cases Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	4a691cfa7e	nir/lower_tex: Modify txd instructions instead of replacing them I don't know if one is better than the other or not but this approach has the advantage that we never forget to copy information over and we're not hard-coding quite as many assumptions. It's also a lot simpler and much less code. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	5a968ae473	nir/lower_tex: Simplify lower_gradient logic Instead of having to call two different lower_gradient functions based on whether or not it's a cube, just make lower_gradient handle cubes. This significantly simplifies some of the logic. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	caeffe7549	spirv: Add support for MinLod Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Rob Clark	9e3fc0c1e0	nir: fix spelling typo Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-11 15:51:34 -05:00
Dylan Baker	6d3cbbbe15	meson: Add nir_algebraic_parser_test to suites Just to make it easier to run a nir tests together. Fixes: `a0ae12ca91` ("nir/algebraic: Add unit tests for bitsize validation") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-10 09:14:44 -08:00
Timothy Arceri	032f247921	nir: make use of new nir_cf_list_clone_and_reinsert() helper Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Timothy Arceri	6b961eb534	nir: add a new nir_cf_list_clone_and_reinsert() helper Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Timothy Arceri	03d7c65ad8	nir: clarify some nit_loop_info member names Following commits will introduce additional fields such as guessed_trip_count. Renaming these will help avoid confusion as our unrolling feature set grows. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Timothy Arceri	de0aee7638	nir: small tidy ups for nir_loop_analyze() Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Connor Abbott	2845c49218	nir: Fixup algebraic test for variable-sized conversions b2i can now take any size boolean in preparation for 1-bit booleans, so the error message printed is slightly different. Fixes: `dca6cd9ce6` ("nir: Make boolean conversions sized just like the others") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108961 Cc: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-07 16:07:51 +01:00
Vinson Lee	b4fd59075b	nir/algebraic: Make algebraic_parser_test.sh executable. Fixes make check permission error. ../../bin/test-driver: line 107: ./nir/tests/algebraic_parser_test.sh: Permission denied FAIL nir/tests/algebraic_parser_test.sh (exit status: 126) Fixes: `a0ae12ca91` ("nir/algebraic: Add unit tests for bitsize validation") Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2018-12-06 11:48:20 -08:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one if 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Jason Ekstrand	be98b1db38	nir/opt_algebraic: Add 32-bit specifiers to a bunch of booleans Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:03 -06:00
Jason Ekstrand	2715080d65	nir/opt_algebraic: Drop bit-size suffixes from conversions Suffixes are dropped from a bunch of conversion opcodes when it makes sense to do so. Others are kept if we really do want the bit-size restriction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:01 -06:00
Jason Ekstrand	ff8e3d3b7b	nir/opt_algebraic: Simplify an optimization using the new search ops Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:58 -06:00
Jason Ekstrand	05af952a11	nir/algebraic: Add support for unsized conversion opcodes All conversion opcodes require a destination size but this makes constructing certain algebraic expressions rather cumbersome. This commit adds support to nir_search and nir_algebraic for writing conversion opcodes without a size. These meta-opcodes match any conversion of that type regardless of destination size and the size gets inferred from the sizes of the things being matched or from other opcodes in the expression. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:56 -06:00
Jason Ekstrand	4925290ab1	nir/algebraic: Refactor codegen a bit Instead of using an OrderedDict, just have a (necessarily sorted) array of transforms and a set of opcodes. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:54 -06:00
Jason Ekstrand	d6aac618fb	nir/algebraic: Clean up some __str__ cruft Both of these things are already handled in the Value base class so we don't need to handle them explicitly in Constant. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:52 -06:00
Jason Ekstrand	85f0ea9d8f	nir/opcodes: Rename tbool to tbool32 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:49 -06:00
Jason Ekstrand	03571a7a6c	nir/opcodes: Pull in the type helpers from constant_expressions While we're at it, we rework them a bit to all use regular expressions and assert more. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:06 -06:00
Connor Abbott	a0ae12ca91	nir/algebraic: Add unit tests for bitsize validation The non-failure path can be tested by just compiling mesa and then testing it, but the failure paths won't be hit unless you make a mistake, so it's best to test them with some unit tests. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-05 17:57:40 +01:00
Connor Abbott	29a1450e28	nir/algebraic: Rewrite bit-size inference Before this commit, there were two copies of the algorithm: one in C, that we would use to figure out what bit-size to give the replacement expression, and one in Python, that emulated the C one and tried to prove that the C algorithm would never fail to correctly assign bit-sizes. That seemed pretty fragile, and likely to fall over if we make any changes. Furthermore, the C code was really just recomputing more-or-less the same thing as the Python code every time. Instead, we can just store the results of the Python algorithm in the C datastructure, and consult it to compute the bitsize of each value, moving the "brains" entirely into Python. Since the Python algorithm no longer has to match C, it's also a lot easier to change it to something more closely approximating an actual type-inference algorithm. The algorithm used is based on Hindley-Milner, although deliberately weakened a little. It's a few more lines than the old one, judging by the diffstat, but I think it's easier to verify that it's correct while being as general as possible. We could split this up into two changes, first making the C code use the results of the Python code and then rewriting the Python algorithm, but since the old algorithm never tracked which variable each equivalence class, it would mean we'd have to add some non-trivial code which would then get thrown away. I think it's better to see the final state all at once, although I could also try splitting it up. v2: - Replace instances of "== None" and "!= None" with "is None" and "is not None". - Rename first_src to first_unsized_src - Only merge the destination with the first unsized source, since the sources have already been merged. - Add a comment explaining what nir_search_value::bit_size now means. v3: - Fix one last instance to use "is not" instead of != - Don't try to be so clever when choosing which error message to print based on whether we're in the search or replace expression. - Fix trailing whitespace. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-05 17:57:40 +01:00
Matt Turner	017199d2d2	mesa: Revert INTEL_fragment_shader_ordering support This extension is not properly tested (testing for GL_ARB_fragment_shader_interlock is not sufficient), and since this was noted in review on August 28th no tests have been sent. Revert "i965: Add INTEL_fragment_shader_ordering support." Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering" This reverts commit `03ecec9ed2`. This reverts commit `119435c877`. Cc: mesa-stable@lists.freedesktop.org Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Anholt <eric@anholt.net>	2018-12-03 15:37:37 -08:00
Józef Kucia	94bfb8bf38	nir: Fix assert in print_intrinsic_instr(). Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-29 16:29:37 +00:00
Jason Ekstrand	199a0353d6	nir/derefs: Add a nir_derefs_do_not_alias enum value This makes some of the code more clear. Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-11-28 14:29:25 -06:00
Matt Turner	1a210268b8	nir: Call fflush() at the end of nir_print_shader() We normally call with stderr which is unbuffered, so this won't affect that, but it does let me call nir_print_shader(nir, fopen("log", "w+")) from gdb and actually get the whole shader in my file. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-27 22:29:53 -08:00
Jonathan Marek	3e7186d472	nir: add fceil lowering lowers ceil(x) as -floor(-x) Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Iago Toral Quiroga	8e73b57634	Revert "nir/builder: Assert that intN_t immediates fit" This reverts commit `1f29f4db1e`. For this to work the compiler must ensure that it never puts the values that arrive to this helper into unsigned variables at any point in its processing, since that would not apply sign extension to the value and it would break the expectations here. Unfortunately, we use uint64_t extensively to pass and copy things around, so some times we get to this helper with values that are not properly sign extended to 64-bit. Here is an example for an 8-bit value that comes from a switch case: (gdb) p /x x $1 = 0xffffffd6 The value seems to have been sign extended to 32-bit at some point getting proper sign extension, but then copied into a uint64_t which wont' apply sign extension, breaking the expectations of the assertion. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-21 08:12:50 +01:00
Iago Toral Quiroga	387888e3b7	nir/from_ssa: fix bit-size of temporary register Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-21 08:07:22 +01:00
Dylan Baker	a999798daa	meson: Add tests to suites Meson test has a concepts of suites, which allow tests to be grouped together. This allows for a subtest of tests to be run only (say only the tests for nir). A test can be added to more than one suite, but for the most part I've only added a test to a single suite, though I've added a compiler group that includes nir, glsl, and glcpp tests. To use this you'll need to invoke meson test directly, instead of ninja test (which always runs all targets). it can be invoked as: `meson test -C builddir --suite $suitename` (meson test has addition options that are pretty useful). Tested-By: Gert Wollny <gert.wollny@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-20 09:09:22 -08:00
Kenneth Graunke	5b682143da	nir: Make nir_lower_clip_vs optionally work with variables. The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:16 -08:00
Kenneth Graunke	d0f746b645	nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs. I'll want the variables in the next patch. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:16 -08:00
Kenneth Graunke	63c8696874	nir: Inline lower_clip_vs() into nir_lower_clip_vs(). It's now called exactly once, and there's not really any distinction. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:14 -08:00
Kenneth Graunke	bfa789aceb	nir: Use nir_shader_get_entrypoint in nir_lower_clip_vs(). Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:31:20 -08:00
Dave Airlie	c8a35285f0	nir: handle shared pointers in lowering indirect derefs. Check if the base ends up with no variable, and continue if we see that case outside the loop. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-20 05:36:52 +10:00
Dave Airlie	760859cac2	nir: move getting deref from var after we check deref type. I posted a load of hacks before to do this, Jason suggested this, just check the deref mode, not the variable mode and delay getting the variable until we know the type. avoids crashes when derefing shared memory pointers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-20 05:36:38 +10:00
Jason Ekstrand	060817b2fa	intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values It's not at all intel-specific; the formula is dictated by OpenGL and Vulkan. The only intel-specific thing is that we need the lowering. As a nice side-effect, the new version is variable-group-size ready. Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-11-19 09:57:41 -06:00
Jason Ekstrand	d34fd81e76	nir: Add alignment parameters to SSBO, UBO, and shared access This also changes spirv_to_nir and glsl_to_nir to set them. The one place that doesn't set them is shared memory access lowering in nir_lower_io. That will have to be updated before any consumers of it can effectively use these new alignments. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2018-11-15 19:59:42 -06:00
Jason Ekstrand	fb127f7729	nir/lower_io: Add shared to get_io_offset_src Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:31 -06:00
Jason Ekstrand	f16bd8a9fe	nir/builder: Add a nir_pack/unpack/bitcast helpers The new helpers can generate any pack/unpack operation including those for which we do not have specific opcodes and they express a bitcast in terms of these pack/unpack operations. In particular, the new helpers properly handle 8-bit types. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:28 -06:00
Jason Ekstrand	b77d68b78e	nir/builder: Add iadd_imm and imul_imm helpers The pattern of adding or multiplying an integer by an immediate is fairly common especially in deref chain handling. This adds a helper for it and uses it a few places. The advantage to the helper is that it automatically handles bit sizes for you. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-15 19:59:27 -06:00
Jason Ekstrand	1f29f4db1e	nir/builder: Assert that intN_t immediates fit This assert won't catch all mistakes with this helper but it will at least ensure that the top bits are all zero or all one which should help catch bugs. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:26 -06:00
Jason Ekstrand	4266932c0b	nir/lower_alu_to_scalar: Don't try to lower unpack_32_2x16 It messes up when trying to lower. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:09 -06:00
Andrii Simiklit	7aca650122	compiler: avoid 'unused variable' warnings 1. nir/nir_lower_vars_to_ssa.c:691:21: warning: unused variable ‘var’ nir_variable *var = path->path[0]->var; v2: Changes for some part of 'may be used uninitialized' warnings were removed, seems like it is a compiler issue. ( Eric Engestrom <eric.engestrom@intel.com> ) Possible like this one: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46684 This issue is flagged as duplicate but an original one is not closed yet. Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-14 13:35:38 +00:00
Gert Wollny	4bba280937	nir: Allow to skip integer ops in nir_lower_to_source_mods Some hardware supports source mods only for float operations. Make it possible to skip lowering to source mods in these cases. v2: use option flags instead of a boolean (Jason Ekstrand) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-14 08:59:26 +01:00
Karol Herbst	099728b115	nir: replace nir_load_system_value calls with appropiate builder functions this helps reduce the overall code changes when a bit_size parameter is added to nir_load_system_value Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-11-14 02:09:11 +01:00
Karol Herbst	80db331c2d	nir: add const_index parameters to system value builder function this allows to replace some nir_load_system_value calls with the specific system value constructor Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-11-14 02:09:11 +01:00
Plamena Manolova	c5f3013cba	nir: Don't lower the local work group size if it's variable. If the local work group size is variable it won't be available at compile time so we can't lower it in nir_lower_system_values(). Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-13 10:57:04 +02:00
Timothy Arceri	34dffcf913	nir: add support for removing redundant stores to copy prop var For example the following type of thing is seen in TCS from a number of Vulkan and DXVK games: vec1 32 ssa_557 = deref_var &oPatch (shader_out float) vec1 32 ssa_558 = intrinsic load_deref (ssa_557) () vec1 32 ssa_559 = deref_var &oPatch@42 (shader_out float) vec1 32 ssa_560 = intrinsic load_deref (ssa_559) () vec1 32 ssa_561 = deref_var &oPatch@43 (shader_out float) vec1 32 ssa_562 = intrinsic load_deref (ssa_561) () intrinsic store_deref (ssa_557, ssa_558) (1) /* wrmask=x / intrinsic store_deref (ssa_559, ssa_560) (1) / wrmask=x / intrinsic store_deref (ssa_561, ssa_562) (1) / wrmask=x */ No shader-db changes on i965 (SKL). vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 7832 -> 7728 (-1.33 %) VGPRS: 6476 -> 6740 (4.08 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 469572 -> 456596 (-2.76 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 989 -> 960 (-2.93 %) Wait states: 0 -> 0 (0.00 %) The Max Waves and VGPRS changes here are misleading. What is happening is a bunch of TCS outputs are being optimised away as they are now recognised as unused. This results in more varyings being compacted via nir_compact_varyings() which can result in more register pressure when they are not packed in an optimal way. This is an existing problem independent of this patch. I've run some benchmarks and haven't noticed any performance regressions in affected games. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 15:19:36 +11:00
Christian Gmeiner	c6aaafa3a1	nir: add lowering for ffloor Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 21:57:25 +01:00
Lionel Landwerlin	8a15f06d19	nir/lower_tex: Add AYUV lowering support Byte ordering is : 0: V 1: U 2: Y 3: A v2: Split refactoring of alpha channel (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1) Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)	2018-11-12 13:22:54 +00:00
Lionel Landwerlin	0a30c33e83	nir/lower_tex: add alpha channel parameter for yuv lowering We're about to introduce AYUV support which provides its own alpha channel. So give alpha as a parameter and set it to 1 on exising formats. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-12 13:22:54 +00:00
Timothy Arceri	a068958692	nir: don't pack varyings ints with floats unless flat Fixes: `1c9c42d16b` ("nir: add varying component packing helpers") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 15:38:56 +11:00
Timothy Arceri	d40dd05553	nir: add new linking opt nir_link_constant_varyings() This pass moves constant outputs to the consuming shader stage where possible. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-10 11:41:00 +11:00
Iago Toral Quiroga	35baee5dce	nir/constant_folding: fix incorrect bit-size check nir_alu_type_get_type_size takes a type as parameter and we were passing a bit-size instead, which did what we wanted by accident, since a bit-size of zero matches nir_type_invalid, which has a size of 0 too. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-09 08:22:15 +01:00
Jason Ekstrand	61e15348c4	nir: Add a read_mask helper for ALU instructions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:22 -06:00
Timothy Arceri	769ae9fb7f	nir: fix condition propagation when src has a swizzle We cannot use nir_build_alu() to create the new alu as it has no way to know how many components of the src we will use. This results in it guessing the max number of components from one of its inputs. Fixes the following CTS tests: dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_geom dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_tessc dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_vert Fixes: `2975422ceb` ("nir: propagates if condition evaluation down some alu chains") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-03 00:44:01 +11:00
Timothy Arceri	c7bdda8aa5	nir: allow propagation of if evaluation for bcsel Shader-db results Skylake: total instructions in shared programs: 13109035 -> 13109024 (<.01%) instructions in affected programs: 4777 -> 4766 (-0.23%) helped: 11 HURT: 0 total cycles in shared programs: 332090418 -> 332090443 (<.01%) cycles in affected programs: 19474 -> 19499 (0.13%) helped: 6 HURT: 4 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-02 15:56:34 +11:00
Timothy Arceri	5b757b4097	nir: fix if condition propagation for alu use We need to update the cursor before we check if the alu use is dominated by the if condition. Previously we were checking if the current location of the alu instruction was dominated by the if condition which would miss some optimisation opportunities. Fixes: `a3b4cb3458` ("nir/opt_if: Rework condition propagation") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-01 09:22:55 +11:00
Eric Anholt	8265dfaa87	nir: Allow using nir_lower_io_to_scalar_early on VS input vars. This will be used on V3D to cut down the size of the VS inputs in the VPM (memory area for sharing data between shader stages). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-30 10:46:52 -07:00
Brian Paul	9007c0ed26	nir: fix yet another MSVC build break Trivial.	2018-10-29 11:15:12 -06:00
Jason Ekstrand	19064b8c3a	nir: Add a pass for gathering transform feedback info This is different from the GL_ARB_spirv pass because it generates a much simpler data structure that isn't tied to OpenGL and mtypes.h. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-29 17:09:08 +01:00
Brian Paul	7e64e39f8b	nir: Fix array initializer Empty initializer is not standard C. This fixes MSVC build. Trivial.	2018-10-26 12:35:48 -06:00
Jason Ekstrand	5bfce5fcc2	nir/system_values: Use the bit size from the load_deref This isn't a great solution for bit-sizes but we don't have a particularly convenient way to get a bit size from the system value enum and this keeps the lowering pass from changing it. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	a3b4cb3458	nir/opt_if: Rework condition propagation Instead of doing our own constant folding, we just emit instructions and let constant folding happen. This is substantially simpler and lets us use the nir_imm_bool helper instead of dealing with the const_value's ourselves. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	4cd8a58595	nir/search: Use the nir_imm_* helpers from nir_builder This requires that we rework the interface a bit to use nir_builder but that's a nice little modernization anyway. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	6e32115bd6	nir/builder: Handle 16-bit floats in nir_imm_floatN_t Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	ff45649bc2	nir/builder: Add a nir_imm_true/false helpers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	249e32ab17	nir/constant_folding: Use nir_src_as_bool for discard_if Missed one while converting to the nir_src_as_* helpers. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	6de1869e86	nir/constant_folding: Add an unreachable to a switch Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	28bb6abd1d	nir/validate: Print when the validation failed Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Eric Engestrom	e27902a261	util: use C99 declaration in the for-loop set_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Eric Engestrom	bb84fa146f	util: use C99 declaration in the for-loop hash_table_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Jose Fonseca	d9a04196d9	nir: Fix array initializer. Empty initializer is not standard C. This fixes MSVC build. Trivial.	2018-10-24 11:37:09 +01:00
Juan A. Suarez Romero	3112da346b	nir: fix nir_copy_propagation test Use nir_src_comp_as_uint() to read the proper second component, as nir_src_as_uint() returns the first one. v2: Use nir_src_comp_as_uint() [Jason] Fixes: `16870de8a0` ("nir: Use nir_src_is_const and nir_src_as_* in core code") Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108532 Tested-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-24 09:13:24 +02:00
Samuel Pitoiset	7c694cbfa4	nir: add linking helper nir_link_xfb_varyings() The linking opts shouldn't try removing or compacting XFB varyings in the consumer. To avoid this we copy the always_active_io flag from the producer. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-24 08:21:29 +11:00
Jason Ekstrand	ecb7775e1c	nir/algebraic: Fix a typo in the bit size validation code The conon_bit_class and canon_var_class variables got switched. Fixes: `932c650e0b` "nir/algebraic: Loosen a restriction on variables" Reported-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-23 12:22:29 -05:00
Jason Ekstrand	bf441d22a7	nir/algebraic: Provide descriptive asserts for bit size checks This will hopefully make debugging opt_algebraic bit-size compile failures easier. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	932c650e0b	nir/algebraic: Loosen a restriction on variables Previously, we would fail if a variable had an assigned but unknown bit size X and we tried to assign it an actual bit size. However, this is ok because, at the time we do the search, the variable does have an actual bit size and it will match X because of the NIR rules. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	ea9e651423	nir/algebraic: A bit of validation refactoring' We rename some local variables in validate() to be more readable and plumb the var through to get/set_var_bit_class instead of the var index. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	641f4be8e8	nir/algebraic: Make internal classes str-able Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	6068be543b	nir/algebraic: Generalize an optimization There's nothing boolean about (a \| ~a) ~> -1 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	69618a8678	nir/algebraic: Use bool internally instead of bool32 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	16870de8a0	nir: Use nir_src_is_const and nir_src_as_* in core code Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	ce36f412c9	nir/search_helpers: Use nir_src_is_const and friends This not only makes them safe for more bit sizes but it also fixes a bug in is_zero_to_one where it would return true for constant NaN. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	7bae7828aa	nir/search: Use nir_src_is_const and friends Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	bca5c2c688	nir: Add some new helpers for working with const sources Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Dave Airlie	ff281e6204	nir: fix clip cull lowering to not assert if GLSL already lowered. If GLSL has already done the lowering, we'd rather not crash in this pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-15 18:53:48 -07:00
Caio Marcelo de Oliveira Filho	b3c6146925	nir: Copy propagation between blocks Extend the pass to propagate the copies information along the control flow graph. It performs two walks, first it collects the vars that were written inside each node. Then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk. This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible -- as the SSA copy propagation pass won't apply to those. A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merge equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis. The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops). Annotated shader-db results for Skylake: total instructions in shared programs: 15105796 -> 15105451 (<.01%) instructions in affected programs: 152293 -> 151948 (-0.23%) helped: 96 HURT: 17 All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering what other other optimizations kick. In those cases the copies would be propagated after lowering to SSA. In few HELPs we are actually helping doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/. total cycles in shared programs: 566048861 -> 565954876 (-0.02%) cycles in affected programs: 151461830 -> 151367845 (-0.06%) helped: 2933 HURT: 2950 A lot of noise on both sides. total loops in shared programs: 4603 -> 4603 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 11085 -> 11073 (-0.11%) spills in affected programs: 23 -> 11 (-52.17%) helped: 1 HURT: 0 The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them. total fills in shared programs: 23143 -> 23089 (-0.23%) fills in affected programs: 2718 -> 2664 (-1.99%) helped: 27 HURT: 0 All from shaders/dolphin/ubershaders/. LOST: 0 GAINED: 0 The other generations follow the same overall shape. The spills and fills HURTs are all from the same game. shader-db results for Broadwell. total instructions in shared programs: 15402037 -> 15401841 (<.01%) instructions in affected programs: 144386 -> 144190 (-0.14%) helped: 86 HURT: 9 total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) Properly merge the component information in written map instead of replacing. (Jason) Explain how removal from written arrays works. (Jason) Use mode directly from deref instead of getting the var. (Jason) v4: Register the local written mode for calls. (Jason) Prefer cf_node instead of node. (Jason) Clarify that remove inside iteration only works in backward iterations. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	dc349f07b5	nir: Take call instruction into account in copy_prop_vars Calls are not used yet (functions are inlined), but since new code is already taking them into account, do it here too. The convention here and in other places is that no writable memory is assumed to remain unchanged, as well as global variables. Also, explicitly state the modes affected (instead of using the reverse logic) in one of the apply_for_barrier_modes calls. Suggested by Jason. v2: Consider local vars used by a call to be conservative, SPIR-V has such cases. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	797f01c220	nir: Add tests for copy propagation of derefs Also tests for removal of redundant loads, that we currently handle as part of the copy propagation. Note some tests involve multiple blocks and are currently DISABLED because they (expectedly) fail. v2: Add missing DISABLED prefix to "multi block" tests. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	4dfa7adc10	nir: Remove handling of dead writes from copy_prop_vars These are covered by another pass now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	cb126cf67a	nir: Separate dead write removal into its own pass Instead of doing this as part of the existing copy_prop_vars pass. Separation makes easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions; while the dead write removal depends on information from later instructions ("have any instruction used this deref before overwrite it?"). Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block. v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio) (all from Jason) Do not cache nir_deref_path. Not worthy for this patch. Clear unused writes when hitting a call instruction. Clean up enumeration of modes for barriers. Move metadata calls to the inner function. v3: For copies, use the vector length to calculate the mask. (all from Jason) Use nir_component_mask_t when applicable. Rename functions for clarity. Consider local vars used by a call to be conservative (SPIR-V has such cases). Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	a02fd7000d	nir: Add tests for dead write elimination Note at the moment the pass called is nir_opt_copy_prop_vars, because dead write elimination is implemented there. Also added tests that involve identifying dead writes in multiple blocks (e.g. the overwrite happens in another block). Those currently fail as expected, so are marked to be skipped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	bbda2a17f7	nir: Add test file for vars related passes Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	c869646b7d	nir: Add nir_imm_ivec2 helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Eric Anholt	7d77fe1bcc	nir: Expose nir_remove_unused_io_vars(). For gallium drivers where you want to do some linking at variant compile time, you don't have the other producer/consumer shader on hand to modify. By exposing the inner function, the driver can have the used varyings in the compiled shader cache key and still do linking. This is also useful for V3D, where the binning shader wants to only output position and TF varyings. We've been removing those after nir_lower_io, but this will be less driver-specific code and let more of the shader get DCEed early in NIR. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-15 17:16:44 -07:00
Eric Anholt	b788ab6d5c	nir: Be sure to fix deref modes after demoting shader i/o vars to global. Fixes assertion failures when calling nir_remove_unused_varyings() or nir_remove_unused_io_vars(). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-15 17:16:44 -07:00
Kenneth Graunke	ed169c9ad2	nir: Create sampler2D variables in nir_lower_{bitmap,drawpixels}. This is needed for nir_gather_info to actually count the new textures, since it operates solely on variables. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-14 23:35:35 -07:00
Samuel Pitoiset	4b74f05f6b	spirv/nir: handle memory access qualifiers for SSBO loads/stores v2: - change how the access qualifiers are accumulated v3: - duplicate members in struct_member_decoration_cb() - handle access qualifiers on variables - remove access qualifiers handling in _vtn_variable_load_store() - fix setting access qualifiers on type->array_element Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net	2018-10-12 08:42:08 +02:00
Jason Ekstrand	d7e0d47b9d	nir: Add a bunch of b2[if] optimizations The b2f and b2i conversions always produce zero or one which are both representable in every type and size. Since b2i and b2f support all bit sizes, we can just get rid of the conversion opcode. total instructions in shared programs: 15089335 -> 15084368 (-0.03%) instructions in affected programs: 212564 -> 207597 (-2.34%) helped: 896 HURT: 0 total cycles in shared programs: 369831123 -> 369826267 (<.01%) cycles in affected programs: 2008647 -> 2003791 (-0.24%) helped: 693 HURT: 216 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Ian Romanick	a68dd47b91	nir/algebraic: Simplify fsat of fsign These allows us to not support fsign.sat in the Intel compiler backend, and that will simplify some later changes. No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Ian Romanick	1546204cdd	nir/algebraic: sign(x)xx is abs(x)*x shader-db results: All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15106023 -> 15105981 (<.01%) instructions in affected programs: 300 -> 258 (-14.00%) helped: 6 HURT: 0 helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00% 95% mean confidence interval for instructions value: -7.00 -7.00 95% mean confidence interval for instructions %-change: -14.00% -14.00% Instructions are helped. total cycles in shared programs: 566050327 -> 566050075 (<.01%) cycles in affected programs: 2826 -> 2574 (-8.92%) helped: 6 HURT: 0 helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42 helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92% 95% mean confidence interval for cycles value: -44.30 -39.70 95% mean confidence interval for cycles %-change: -8.95% -8.88% Cycles are helped. No changes on Gen6 or earlier. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Ian Romanick	10f4a8871e	nir: Add helper functions to get the instruction that generated a nir_src Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Jason Ekstrand	dd553bc67f	nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansions The ssa_for_alu_src helper will correctly handle swizzles and other source modifiers for you. The expansions for unpack_half_2x16, pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards to swizzles. The brokenness of unpack_half_2x16 was causing rendering errors in Rise of the Tomb Raider on Intel ever since `c11833ab24` which added an extra copy propagation to the optimization pipeline and caused us to start seeing swizzles where we hadn't seen any before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926 Fixes: `9ce901058f` "nir: Add lowering of nir_op_unpack_half_2x16." Fixes: `9b8786eba9` "nir: Add lowering support for packing opcodes." Tested-by: Alex Smith <asmith@feralinteractive.com> Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-10-04 12:43:59 -05:00
Jason Ekstrand	00f385e6d4	nir/from_ssa: Don't rewrite derefs destinations to registers We already call nir_rematerialize_derefs_in_use_blocks_impl prior to calling nir_lower_ssa_defs_to_regs_block so the assertion that all deref uses in the block should hold. This fixes the following CTS test when SPIR-V optimization recipe 1: dEQP-VK.glsl.struct.local.loop_nested_struct_array_vertex Fixes: `606eb56ab9` "intel/nir: Only lower load/store derefs" Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-02 10:24:56 -05:00
Jason Ekstrand	bfc89c668e	nir/cf: Remove phi sources if needed in nir_handle_add_jump If the block in which the jump is inserted is the predecessor of a phi then we need to remove phi sources otherwise the phi may end up with things improperly connected. This fixes the following CTS test when dEQP is run with SPIR-V optimization recipe 1: dEQP-VK.glsl.functions.control_flow.return_in_nested_loop_vertex Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-02 10:24:56 -05:00
Juan A. Suarez Romero	0c82e3603e	nir: add initializer data to fix MSVC compile error CC: Jason Ekstrand <jason@jlekstrand.net> Fixes: 82799a5d1b8 ("nir: Add a small pass to rematerialize derefs per-block") Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-09-19 11:46:44 +02:00
Jason Ekstrand	976046a8d8	nir: Add some asserts that we don't put derefs in phis The lcssa and phis_to_regs passes are used by various NIR optimizations that modify the CFG. Putting a couple of asserts will help ensure that we don't accidentally put derefs in phis as part of an optimization pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-09-19 02:00:49 -05:00
Jason Ekstrand	864c780566	nir/opt_if: Re-materialize derefs in use blocks before peeling loops Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107879 Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-19 02:00:49 -05:00
Jason Ekstrand	0796c3934e	nir/loop_unroll: Re-materialize derefs in use blocks before unrolling When we're about to re-arrange a bunch of blocks, it's a good idea to make sure that we don't have deref uses crossing block boundaries. Otherwise we may end up with a deref going through a phi and that would be bad. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-19 01:59:40 -05:00
Jason Ekstrand	7d1d1208c2	nir: Add a small pass to rematerialize derefs per-block This pass re-materializes deref instructions on a per-block basis to ensure that every use of a deref occurs in the same block as the instruction which uses it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-19 01:59:40 -05:00
Timothy Arceri	21e34bab09	nir: add loop unroll support for complex wrapper loops In GLSL IR we cheat with switch statements and simply convert them into loops with a single iteration. This allowed us to make use of the existing jump instruction handling provided by the loop handing code, it also allows dead code to be cleaned up once we have wrapped the code in a loop. However using loops in this way created previously unrollable loops which limits further optimisations. Here we provide a way to unroll loops that end in a break and have multiple other exits. All shader-db changes are from the dolphin uber shaders. There is a small amount of HURT shaders but in general the improvements far exceed the HURT. shader-db results IVB: total instructions in shared programs: 10018187 -> 10016468 (-0.02%) instructions in affected programs: 104080 -> 102361 (-1.65%) helped: 36 HURT: 15 total cycles in shared programs: 220065064 -> 154529655 (-29.78%) cycles in affected programs: 126063017 -> 60527608 (-51.99%) helped: 51 HURT: 0 total loops in shared programs: 2515 -> 2308 (-8.23%) loops in affected programs: 903 -> 696 (-22.92%) helped: 51 HURT: 0 total spills in shared programs: 4370 -> 4124 (-5.63%) spills in affected programs: 1397 -> 1151 (-17.61%) helped: 9 HURT: 12 total fills in shared programs: 4581 -> 4419 (-3.54%) fills in affected programs: 2201 -> 2039 (-7.36%) helped: 9 HURT: 15 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-14 16:07:36 +10:00
Timothy Arceri	2975422ceb	nir: propagates if condition evaluation down some alu chains v2: - only allow nir_op_inot or nir_op_b2i when alu input is 1. - use some helpers as suggested by Jason. v3: - evaluate alu op for single input alu ops - add helper function to decide if to propagate through alu - make use of nir_before_src in another spot shader-db IVB results: total instructions in shared programs: 9993483 -> 9993472 (-0.00%) instructions in affected programs: 1300 -> 1289 (-0.85%) helped: 11 HURT: 0 total cycles in shared programs: 219476091 -> 219476059 (-0.00%) cycles in affected programs: 7675 -> 7643 (-0.42%) helped: 10 HURT: 1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-14 16:07:36 +10:00
Timothy Arceri	ef4ad7baf1	nir: evaluate if condition uses inside the if branches Since we know what side of the branch we ended up on we can just replace the use with a constant. All the spill changes in shader-db are from Dolphin uber shaders, despite some small regressions the change is clearly positive. V2: insert new constant after any phis in the use->parent_instr->type == nir_instr_type_phi path. v3: - use nir_after_block_before_jump() for inserting const - check dominance of phi uses correctly v4: - create some helpers as suggested by Jason. v5 (Jason Ekstrand): - Use LIST_ENTRY to get the phi src shader-db results IVB: total instructions in shared programs: 9999201 -> 9993483 (-0.06%) instructions in affected programs: 163235 -> 157517 (-3.50%) helped: 132 HURT: 2 total cycles in shared programs: 231670754 -> 219476091 (-5.26%) cycles in affected programs: 143424120 -> 131229457 (-8.50%) helped: 115 HURT: 24 total spills in shared programs: 4383 -> 4370 (-0.30%) spills in affected programs: 1656 -> 1643 (-0.79%) helped: 9 HURT: 18 total fills in shared programs: 4610 -> 4581 (-0.63%) fills in affected programs: 374 -> 345 (-7.75%) helped: 6 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-14 16:07:36 +10:00
Dylan Baker	8396043f30	Replace uses of _mesa_bitcount with util_bitcount and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem in nir for platforms that don't have popcount or popcountll, such as 32bit msvc. v2: - Fix additional uses of _mesa_bitcount added after this was originally written Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-07 10:21:26 -07:00
Jason Ekstrand	44ec31cd75	nir: Drop the vs_inputs_dual_locations option It was very inconsistently handled; the only things that made use of it were glsl_to_nir, glspirv, and nir_gather_info. In particular, nir_lower_io completely ignored it so anyone using nir_lower_io on 64-bit vertex attributes was going to be in for a shock. Also, as of the previous commit, it's set by every driver that supports 64-bit vertex attributes. There's no longer any reason to have it be an option so let's just delete it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-06 16:07:50 -05:00
Jason Ekstrand	0909a57b63	radeonsi/nir: Set vs_inputs_dual_locations and let NIR do the remap We were going out of our way to disable dual-location re-mapping in NIR only to then do the remapping in st_glsl_to_nir.cpp. Presumably, this was so that double_inputs would be correct for the core state tracker. However, now that we've it to gl_program::DualSlotInputs which is unaffected by NIR lowering, we can let NIR lower things for us. The one tricky bit here is that we have to remap the inputs_read bitfield back to the single-slot convention for the gallium state tracker to use. Since radeonsi is the only NIR-capable gallium driver that also supports GL_ARB_vertex_attrib_64bit, we only have to worry about radeonsi when making core gallium state tracker changes. Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-06 16:07:50 -05:00
Jason Ekstrand	25efd787cf	compiler: Move double_inputs to gl_program::DualSlotInputs Previously, we had two field in shader_info: double_inputs_read and double_inputs. Presumably, the one was for all double inputs that are read and the other is all that exist. However, because nir_gather_info regenerates these two values, there is a possibility, if a variable gets deleted, that the value of double_inputs could change over time. This is a problem because double_inputs is used to remap the input locations to a two-slot-per-dvec3/4 scheme for i965. If that mapping were to change between glsl_to_nir and back-end state setup, we would fall over when trying to map the NIR outputs back onto the GL location space. This commit changes the way slot re-mapping works. Instead of the double_inputs field in shader_info, it adds a DualSlotInputs bitfield to gl_program. By having it in gl_program, we more easily guarantee that NIR passes won't touch it after it's been set. It also makes more sense to put it in a GL data structure since it's really a mapping from GL slots to back-end and/or NIR slots and not really a NIR shader thing. Tested-by: Alejandro Piñeiro <apinheiro@igalia.com> (ARB_gl_spirv tests) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-06 16:07:50 -05:00
Jason Ekstrand	09f1de97a7	anv,i965: Lower away image derefs in the driver Previously, the back-end compiler turn image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	0de003be03	nir: Add handle/index-based image intrinsics Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	3942943819	nir: Use a bitfield for image access qualifiers This commit expands the current memory access enum to contain the extra two bits provided for images. We choose to follow the SPIR-V convention of NonReadable and NonWriteable because readonly implies that you can read so readonly + writeonly doesn't make as much sense as NonReadable + NonWriteable. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	37f7983bcc	intel/compiler: Do image load/store lowering to NIR This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	15d39f474b	nir: Make image load/store intrinsics variable-width Instead of requiring 4 components, this allows them to potentially use fewer. Both the SPIR-V and GLSL paths still generate vec4 intrinsics so drivers which assume 4 components should be safe. However, we want to be able to shrink them for i965. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	7cdf8f9339	nir/format_convert: Fix a bitmask in unpack_11f11f10f Fixes: `4e337b42f9` "nir/format_convert: Add pack/unpack for R11F_G11F_B10F" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	1f7be4968f	nir/format_convert: Rename pack_r11g11b10f to pack_11f11f10f This matches the unpack function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	7bd0363d6f	nir/format_convert: Add [us]norm conversion helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	152fdeddbb	nir/format_convert: Rename nir_format_bitcast_uint_vec We have a name for that, it's called a uvec. This just makes the function name a bit shorter. While we're here, we also add an assert for one of the assumptions this function makes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	7c5df52bdc	nir/format_convert: Add vec mask and sign-extend helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	ea4f200864	nir/format_convert: Add support for unpacking signed integers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	80c424148b	nir/opcodes: Make unpack_half_2x16_split_* variable-width There is nothing inherent about these opcodes that requires them to only take scalars. It's very convenient if we let them take vectors as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	d448fa3ae3	nir/algebraic: Add some max/min optimizations Found by inspection. This doesn't help much now but we'll see this pattern with images if you load UNORM and then store UNORM. Shader-db results on Kaby Lake: total instructions in shared programs: 15166916 -> 15166910 (<.01%) instructions in affected programs: 761 -> 755 (-0.79%) helped: 6 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	4dd5263663	nir/algebraic: Add more extract_[iu](8\|16) optimizations This adds the "(a << N) >> M" family of mask or sign-extensions. Not a huge win right now but this pattern will soon be generated by NIR format lowering code. Shader-db results on Kaby Lake: total instructions in shared programs: 15166918 -> 15166916 (<.01%) instructions in affected programs: 36 -> 34 (-5.56%) helped: 2 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	116b47fe3c	nir/algebraic: Be more careful converting ushr to extract_u8/16 If it's not the right bit-size, it may not actually be the correct extraction. For now, we'll only worry about 32-bit versions. Fixes: `905ff86198` "nir: Recognize open-coded extract_u16" Fixes: `76289fbfa8` "nir: Recognize open-coded extract_u8" Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Timothy Arceri	5db981952a	nir: add loop unroll support for wrapper loops This adds support for unrolling the classic do { // ... } while (false) that is used to wrap multi-line macros. GLSL IR also wraps switch statements in a loop like this. shader-db results IVB: total loops in shared programs: 2515 -> 2512 (-0.12%) loops in affected programs: 33 -> 30 (-9.09%) helped: 3 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Timothy Arceri	0f450b57a1	nir/opt_loop_unroll: Remove unneeded phis if we make progress Now that SSA values can be derefs and they have special rules, we have to be a bit more careful about our LCSSA phis. In particular, we need to clean up in case LCSSA ended up creating a phi node for a deref. This avoids validation issues with some CTS tests with the following patch, but its possible this we could also see the same problem with the existing unrolling passes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Timothy Arceri	5a6b04d94b	nir: add complex_loop bool to loop info In order to be sure loop_terminator_list is an accurate representation of all the jumps in the loop we need to be sure we didn't encounter any other complex behaviour such as continues, nested breaks, etc during analysis. This will be used in the following patch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Timothy Arceri	fef6325e58	nir: always attempt to find loop terminators This will help later patches with unrolling loops that end with a break i.e. loops the always exit on their first interation. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Caio Marcelo de Oliveira Filho	f172a77dd8	nir: Remove outdated comment Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 08:11:03 -07:00
Kevin Rogovin	119435c877	mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering This extension provides new GLSL built-in function beginFragmentShaderOrderingIntel() that guarantees (taking wording of GL_INTEL_fragment_shader_ordering extension) that any memory transactions issued by shader invocations from previous primitives mapped to same xy window coordinates (and same sample when per-sample shading is active), complete and are visible to the shader invocation that called beginFragmentShaderOrderingINTEL(). One advantage of INTEL_fragment_shader_ordering over ARB_fragment_shader_interlock is that it provides a function that operates as a memory barrie (instead of a defining a critcial section) that can be called under arbitary control flow from any function (in contrast the begin/end of ARB_fragment_shader_interlock may only be called once, from main(), under no control flow. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-28 17:15:10 +03:00
Jason Ekstrand	07a227f543	nir: Pull block_ends_in_jump into nir.h We had two different implementations in different files. May as well have one and put it in nir.h. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-27 02:15:38 -05:00
Jason Ekstrand	53072582dc	nir: Add an array copy optimization This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for a OpLoad of a large composite followed by OpStore. It can always be improved later if needed. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:47:47 -05:00
Jason Ekstrand	be8d009908	nir: Add a array-of-vector variable shrinking pass This pass looks for variables with vector or array-of-vector types and narrows the type to only the components used. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:46:56 -05:00
Jason Ekstrand	fa6417495c	nir: Add an array splitting pass This pass looks for array variables where at least one level of the array is never indirected and splits it into multiple smaller variables. This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through arrays of arrays and can detect indirects on just one level or even see that arr[i][0][5] does not alias arr[i][1][j]. This pass exists to help other passes more easily see through arrays of arrays. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. v2 (Jason Ekstrand): - Better comments and naming (some from Caio) - Rework to use one hash map instead of two v2.1 (Jason Ekstrand): - Fix a couple of bugs that were added in the rework including one which basically prevented it from running Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:44:14 -05:00
Jason Ekstrand	26eb077ec4	nir: Add a structure splitting pass This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through structures and considers them to be "split". This pass exists to help other passes more easily see through structure variables. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:44:14 -05:00
Ian Romanick	0842655ac6	nir: Add floating point atomic min, max, and compare-swap instrinsics Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	69ce7baa9e	nir: Add floating point atomic add instrinsics Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Caio Marcelo de Oliveira Filho	410de0e3f1	nir: Give end_block its own index Since there's no particular reason for the index to be 0, choose an index that is not used by other block. This is convenient when we store "per-block" data in an array AND look for the successors data (e.g. any kind of backwards data-flow analysis). v2: Add a note about end_block's index. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 14:41:26 -07:00
Caio Marcelo de Oliveira Filho	8364ec3fce	nir: Skip common instructions when comparing deref paths Deref paths may share the same deref instructions in their chains, e.g. ssa_100 = deref_var A ssa_101 = deref_struct "array_field" of ssa_100 ssa_102 = deref_array "[1]" of ssa_101 ssa_103 = deref_struct "field_a" of ssa_102 ssa_104 = deref_struct "field_a" of ssa_103 when comparing the two last deref instructions, their paths will share a common sequence ssa_100, ssa_101, ssa_102. This patch skips to next iteration if the deref instructions are the same. Path[0] (the var) is still handled specially, so in the case above, only ssa_101 and ssa_102 will be skipped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 14:41:26 -07:00
Caio Marcelo de Oliveira Filho	5196041e93	nir: Export deref comparison functions Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 14:41:26 -07:00
Jason Ekstrand	68ae66542a	nir/vars_to_ssa: Don't build deref nodes for non-local variables Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 14:17:38 -05:00
Kai Wasserbäch	6f0647c0b2	nir: mark prev_block as MAYBE_UNUSED in opt_peel_loop_initial_if Only used, when asserts are enabled. Fixes an unused-variable warning with gcc-8: ../../../src/compiler/nir/nir_opt_if.c: In function 'opt_peel_loop_initial_if': ../../../src/compiler/nir/nir_opt_if.c:109:15: warning: unused variable 'prev_block' [-Wunused-variable] nir_block prev_block = ^~~~~~~~~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-18 10:34:15 +10:00
Timothy Arceri	d0803dea11	nir: allow more nested loops to be unrolled The innermost check was added to stop us from unrolling multiple loops in a single pass, and to stop outer loops from unrolling. When we successfully unroll a loop we need to run the analysis pass again before deciding if we want to go ahead an unroll a second loop. However the logic was flawed because it never tried to unroll any nested loops other than the first innermost loop it found. If this innermost loop is not unrolled we end up skipping all other nested loops. This unrolls a loop in a Deus Ex: MD shader on ultra settings and also unrolls a loop in a shader from the game Prey when running on DXVK. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-18 09:03:13 +10:00
Alejandro Piñeiro	d6c8066663	nir/glsl: make nir_remap_attributes public As we plan to reuse it for ARB_gl_spirv implementation. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:27 +02:00
Alejandro Piñeiro	5332d7582d	nir: add how_declared to nir_variable.data Equivalent to the already existing how_declared at GLSL IR. The only difference is that we are not adding all the declaration_type available on GLSL, only the one that we will use on the short term. We would add more mode if needed on the future. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:26 +02:00
Mathieu Bridon	2ee1c86d71	meson: Build with Python 3 Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter. Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-10 15:15:09 -07:00
Mathieu Bridon	1e668ca111	python: Better check for integer types Python 3 lost the long type: now everything is an int, with the right size. This commit makes the script compatible with Python 2 (where we check for both int and long) and Python 3 (where we only check for int). Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:19 -07:00
Mathieu Bridon	14f1ab998f	python: Do not mix bytes and unicode strings Mixing the two is a long-standing recipe for errors in Python 2, so much so that Python 3 now completely separates them. This commit stops treating both as if they were the same, and in the process makes the script compatible with both Python 2 and 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:19 -07:00
Mathieu Bridon	d9ca4a172e	python: Use the right function for the job The code was just reimplementing itertools.combinations_with_replacement in a less efficient way. This does change the order of the results slightly, but it should be ok. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:18 -07:00
Mathieu Bridon	ba1ebf2ee1	python: Specify the template output encoding We're trying to write a unicode string (i.e decoded) to a file opened in binary (i.e encoded) mode. In Python 2 this works, because of the automatic conversion between byte and unicode strings. In Python 3 this fails though, as no automatic conversion is attempted. This change makes the scripts compatible with both versions of Python. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-07 13:28:35 -07:00
Ian Romanick	3b07d28f81	nir: Transform expressions of b2f(a) and b2f(b) to a == b All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276886 -> 14276838 (<.01%) instructions in affected programs: 312 -> 264 (-15.38%) helped: 2 HURT: 0 total cycles in shared programs: 532578395 -> 532570985 (<.01%) cycles in affected programs: 682562 -> 675152 (-1.09%) helped: 374 HURT: 4 helped stats (abs) min: 2 max: 200 x̄: 20.39 x̃: 18 helped stats (rel) min: 0.07% max: 11.64% x̄: 1.25% x̃: 1.28% HURT stats (abs) min: 2 max: 114 x̄: 53.50 x̃: 49 HURT stats (rel) min: 0.06% max: 11.70% x̄: 5.02% x̃: 4.15% 95% mean confidence interval for cycles value: -21.30 -17.91 95% mean confidence interval for cycles %-change: -1.30% -1.06% Cycles are helped. Sandy Bridge total instructions in shared programs: 10488123 -> 10488075 (<.01%) instructions in affected programs: 336 -> 288 (-14.29%) helped: 2 HURT: 0 total cycles in shared programs: 150260379 -> 150260439 (<.01%) cycles in affected programs: 4726 -> 4786 (1.27%) helped: 0 HURT: 2 No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	c658b6c4c8	nir: Transform expressions of b2f(a) and b2f(b) to a ^^ b All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276892 -> 14276886 (<.01%) instructions in affected programs: 484 -> 478 (-1.24%) helped: 2 HURT: 0 total cycles in shared programs: 532578397 -> 532578395 (<.01%) cycles in affected programs: 3522 -> 3520 (-0.06%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	3aca80aabc	nir: Transform expressions of b2f(a) and b2f(b) to !(a && b) All Gen platforms had pretty similar results. (Skylake shown) total cycles in shared programs: 532578400 -> 532578397 (<.01%) cycles in affected programs: 2784 -> 2781 (-0.11%) helped: 1 HURT: 1 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.26% max: 0.26% x̄: 0.26% x̃: 0.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	1713c97181	nir: Transform expressions of b2f(a) and b2f(b) to a && b No changes on any Gen platform. v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	4425f4786a	nir: Transform expressions of b2f(a) and b2f(b) to !(a \|\| b) All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276961 -> 14276892 (<.01%) instructions in affected programs: 3215 -> 3146 (-2.15%) helped: 28 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2 helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92% 95% mean confidence interval for instructions value: -2.87 -2.06 95% mean confidence interval for instructions %-change: -5.73% -2.95% Instructions are helped. total cycles in shared programs: 532577068 -> 532578400 (<.01%) cycles in affected programs: 121864 -> 123196 (1.09%) helped: 35 HURT: 30 helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22 helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86% HURT stats (abs) min: 2 max: 246 x̄: 93.80 x̃: 36 HURT stats (rel) min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58% 95% mean confidence interval for cycles value: -5.02 46.01 95% mean confidence interval for cycles %-change: -0.99% 1.65% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781299 -> 7781342 (<.01%) instructions in affected programs: 22300 -> 22343 (0.19%) helped: 13 HURT: 40 helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3 helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43% 95% mean confidence interval for instructions value: 0.23 1.39 95% mean confidence interval for instructions %-change: -1.18% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 177878928 -> 177879332 (<.01%) cycles in affected programs: 383298 -> 383702 (0.11%) helped: 7 HURT: 43 helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10 helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40% HURT stats (abs) min: 2 max: 38 x̄: 11.02 x̃: 12 HURT stats (rel) min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09% 95% mean confidence interval for cycles value: 5.21 10.95 95% mean confidence interval for cycles %-change: -0.51% 0.21% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	6b3670ae80	nir: Transform -fabs(a) >= 0 to a == 0 All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276964 -> 14276961 (<.01%) instructions in affected programs: 411 -> 408 (-0.73%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.47% max: 1.96% x̄: 1.04% x̃: 0.68% total cycles in shared programs: 532577062 -> 532577068 (<.01%) cycles in affected programs: 1093 -> 1099 (0.55%) helped: 1 HURT: 1 helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 helped stats (rel) min: 7.77% max: 7.77% x̄: 7.77% x̃: 7.77% HURT stats (abs) min: 22 max: 22 x̄: 22.00 x̃: 22 HURT stats (rel) min: 2.48% max: 2.48% x̄: 2.48% x̃: 2.48% Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	46e7c340d4	nir: Transform expressions of b2f(a) and b2f(b) to a \|\| b All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	be7d3ba34a	nir: Transform -fabs(a) < 0 to a != 0 Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if a is NaN. All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277216 -> 14277184 (<.01%) instructions in affected programs: 2300 -> 2268 (-1.39%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.00 x̃: 3 helped stats (rel) min: 0.48% max: 15.15% x̄: 4.41% x̃: 1.01% 95% mean confidence interval for instructions value: -6.45 -1.55 95% mean confidence interval for instructions %-change: -9.96% 1.13% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 532577848 -> 532577990 (<.01%) cycles in affected programs: 17486 -> 17628 (0.81%) helped: 2 HURT: 5 helped stats (abs) min: 2 max: 6 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.06% max: 1.81% x̄: 0.93% x̃: 0.93% HURT stats (abs) min: 6 max: 50 x̄: 30.00 x̃: 26 HURT stats (rel) min: 0.55% max: 2.17% x̄: 1.19% x̃: 1.02% 95% mean confidence interval for cycles value: -1.06 41.63 95% mean confidence interval for cycles %-change: -0.58% 1.74% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	d49eab2757	nir: Rearrange bcsel with two bcsel sources All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277220 -> 14277216 (<.01%) instructions in affected programs: 422 -> 418 (-0.95%) helped: 2 HURT: 0 total cycles in shared programs: 532577908 -> 532577848 (<.01%) cycles in affected programs: 2800 -> 2740 (-2.14%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	b92fded6eb	nir: Collapse more repeated bcsels on the same argument All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277230 -> 14277220 (<.01%) instructions in affected programs: 751 -> 741 (-1.33%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2 helped stats (rel) min: 1.23% max: 1.40% x̄: 1.32% x̃: 1.32% 95% mean confidence interval for instructions value: -3.42 -1.58 95% mean confidence interval for instructions %-change: -1.47% -1.17% Instructions are helped. total cycles in shared programs: 532577947 -> 532577908 (<.01%) cycles in affected programs: 10641 -> 10602 (-0.37%) helped: 4 HURT: 3 helped stats (abs) min: 1 max: 40 x̄: 13.75 x̃: 7 helped stats (rel) min: 0.11% max: 3.08% x̄: 1.10% x̃: 0.60% HURT stats (abs) min: 2 max: 8 x̄: 5.33 x̃: 6 HURT stats (rel) min: 0.13% max: 0.55% x̄: 0.30% x̃: 0.23% 95% mean confidence interval for cycles value: -20.69 9.55 95% mean confidence interval for cycles %-change: -1.63% 0.63% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-04 01:12:03 -07:00
Ian Romanick	408330ed48	nir: Don't compare i2f or u2i with zero Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277620 -> 14277230 (<.01%) instructions in affected programs: 36905 -> 36515 (-1.06%) helped: 101 HURT: 6 helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6 helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51% HURT stats (abs) min: 1 max: 28 x̄: 10.00 x̃: 1 HURT stats (rel) min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47% 95% mean confidence interval for instructions value: -4.59 -2.70 95% mean confidence interval for instructions %-change: -1.90% -1.41% Instructions are helped. total cycles in shared programs: 532580716 -> 532577947 (<.01%) cycles in affected programs: 940575 -> 937806 (-0.29%) helped: 92 HURT: 12 helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62 helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41% HURT stats (abs) min: 10 max: 1112 x̄: 160.58 x̃: 63 HURT stats (rel) min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20% 95% mean confidence interval for cycles value: -50.66 -2.59 95% mean confidence interval for cycles %-change: -2.09% -0.73% Cycles are helped. total spills in shared programs: 8116 -> 8124 (0.10%) spills in affected programs: 200 -> 208 (4.00%) helped: 0 HURT: 2 total fills in shared programs: 11086 -> 11094 (0.07%) fills in affected programs: 436 -> 444 (1.83%) helped: 0 HURT: 2 Ivy Bridge and Haswell had similar results. (Haswell shown) total instructions in shared programs: 12979054 -> 12978067 (<.01%) instructions in affected programs: 33633 -> 32646 (-2.93%) helped: 120 HURT: 2 helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13 helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17% HURT stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18 HURT stats (rel) min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00% 95% mean confidence interval for instructions value: -9.19 -6.99 95% mean confidence interval for instructions %-change: -5.27% -3.62% Instructions are helped. total cycles in shared programs: 411212880 -> 411199636 (<.01%) cycles in affected programs: 696441 -> 683197 (-1.90%) helped: 107 HURT: 5 helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146 helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88% HURT stats (abs) min: 2 max: 50 x̄: 24.00 x̃: 22 HURT stats (rel) min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25% 95% mean confidence interval for cycles value: -136.96 -99.54 95% mean confidence interval for cycles %-change: -9.75% -6.53% Cycles are helped. total spills in shared programs: 78623 -> 78631 (0.01%) spills in affected programs: 66 -> 74 (12.12%) helped: 0 HURT: 2 total fills in shared programs: 80104 -> 80108 (<.01%) fills in affected programs: 133 -> 137 (3.01%) helped: 0 HURT: 2 No changes on Sandy Bridge, Iron Lake, or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	a3845616a2	nir: Remove f2i(i2f(x)) conversions Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277978 -> 14277620 (<.01%) instructions in affected programs: 36957 -> 36599 (-0.97%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4 helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36% 95% mean confidence interval for instructions value: -7.06 -2.24 95% mean confidence interval for instructions %-change: -1.28% -0.77% Instructions are helped. total cycles in shared programs: 532584581 -> 532580716 (<.01%) cycles in affected programs: 973591 -> 969726 (-0.40%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32 helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19% HURT stats (abs) min: 8280 max: 8280 x̄: 8280.00 x̃: 8280 HURT stats (rel) min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10% 95% mean confidence interval for cycles value: -386.98 286.59 95% mean confidence interval for cycles %-change: -1.41% -0.81% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8127 -> 8116 (-0.14%) spills in affected programs: 108 -> 97 (-10.19%) helped: 1 HURT: 0 total fills in shared programs: 11090 -> 11086 (-0.04%) fills in affected programs: 440 -> 436 (-0.91%) helped: 1 HURT: 1 Haswell total instructions in shared programs: 12979174 -> 12979054 (<.01%) instructions in affected programs: 9040 -> 8920 (-1.33%) helped: 14 HURT: 1 helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6 helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19% 95% mean confidence interval for instructions value: -13.58 -2.42 95% mean confidence interval for instructions %-change: -3.94% -1.01% Instructions are helped. total cycles in shared programs: 411227148 -> 411212880 (<.01%) cycles in affected programs: 630506 -> 616238 (-2.26%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38 helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17% 95% mean confidence interval for cycles value: -2544.28 641.88 95% mean confidence interval for cycles %-change: -6.89% -0.94% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78626 -> 78623 (<.01%) spills in affected programs: 42 -> 39 (-7.14%) helped: 1 HURT: 0 total fills in shared programs: 80111 -> 80104 (<.01%) fills in affected programs: 140 -> 133 (-5.00%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11684101 -> 11684030 (<.01%) instructions in affected programs: 3080 -> 3009 (-2.31%) helped: 4 HURT: 1 helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5 helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15% 95% mean confidence interval for instructions value: -45.59 17.19 95% mean confidence interval for instructions %-change: -9.38% -1.56% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 258407697 -> 258389653 (<.01%) cycles in affected programs: 328323 -> 310279 (-5.50%) helped: 5 HURT: 0 helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32 helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60% 95% mean confidence interval for cycles value: -11616.71 4399.11 95% mean confidence interval for cycles %-change: -16.56% -2.03% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 4537 -> 4528 (-0.20%) spills in affected programs: 64 -> 55 (-14.06%) helped: 1 HURT: 0 total fills in shared programs: 4823 -> 4815 (-0.17%) fills in affected programs: 189 -> 181 (-4.23%) helped: 1 HURT: 1 Sandy Bridge total instructions in shared programs: 10488464 -> 10488449 (<.01%) instructions in affected programs: 272 -> 257 (-5.51%) helped: 3 HURT: 0 helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5 helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49% total cycles in shared programs: 150263359 -> 150263263 (<.01%) cycles in affected programs: 7978 -> 7882 (-1.20%) helped: 3 HURT: 0 helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32 helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23% No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	ea6c276436	nir: Mark the 0.0 < abs(a) transformation as imprecise Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if the source is NaN. No shader-db changes on any platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Timothy Arceri	d5175d21c7	nir: add fall through comment to nir_gather_info This stops Coverity reporting a defect and helps make the code less error-prone. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-03 09:30:57 +10:00
Jason Ekstrand	7f75cf2a94	nir/lower_indirect: Bail early if modes == 0 There's no point in walking the program if we're never going to actually lower anything. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-01 18:02:28 -07:00
Dylan Baker	34998aae18	nir/meson: fix c vs cpp args for nir test Fixes: `d1992255bb` ("meson: Add build Intel "anv" vulkan driver") Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-01 12:51:22 -07:00
Mathieu Bridon	ad363913e6	python: Explicitly add the 'L' suffix on Python 3 Python 2 had two integer types: int and long. Python 3 dropped the latter, as it made the int type automatically support bigger numbers. As a result, Python 3 lost the 'L' suffix on integer litterals. This probably doesn't make much difference when compiling the generated C code, but adding it explicitly means that both Python 2 and 3 generate the exact same C code anyway, which makes it easier to compare and check for discrepencies when moving to Python 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-01 14:26:19 +01:00
Mathieu Bridon	e40200e0aa	python: Don't abuse hex() The hex() builtin returns a string containing the hexa-decimal representation of an integer. When the argument is not an integer, then the function calls that object's __hex__() method, if one is defined. That method is supposed to return a string. While that's not explicitly documented, that string is supposed to be a valid hexa-decimal representation for a number. Python 2 doesn't enforce this though, which is why we got away with returning things like 'NIR_TRUE' which are not numbers. In Python 3, the hex() builtin instead calls an object's __index__() method, which itself must return an integer. That integer is then automatically converted to a string with its hexa-decimal representation by the rest of the hex() function. As a result, we really can't make this compatible with Python 3 as it is. The solution is to stop using the hex() builtin, and instead use a hex() object method, which can return whatever we want, in Python 2 and 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-01 14:26:19 +01:00
Neil Roberts	1e3f61d1d5	nir/gather_info: Set info.gs.uses_streams Whenever a non-zero stream is written to it now sets uses_streams to true. This reflects the code in validate_geometry_shader_emissions for GLSL. v2: set uses_streams at gather_info instead that at spirv to nir (Jason Ekstrand) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-07-31 13:18:28 +02:00
Neil Roberts	3fd5b4c7aa	nir: Add members for the explicit XFB properties to nir_variable These are copied from the from the corresponding values in ir_variable. The intention is to eventually use them in a pure-NIR linker. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-31 13:18:28 +02:00
Jason Ekstrand	05fb2f88ec	nir/instr_set: Fix nir_instrs_equal for derefs We weren't returning at the end of the nir_isntr_type_deref case in nir_instrs_equal and it was falling through to the default of false. While we're at it, make the default unreachable because all statements in the switch now have their own returns. Had we done that before, we would have caught this bug a long time ago. Fixes: `19a4662a54` "nir: Add a deref instruction type" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Thomas Helland<thomashelland90@gmail.com>	2018-07-29 13:39:35 -07:00
Jason Ekstrand	9a4ab4c120	nir: Take if uses into account in ssa_def_components_read Fixes: `d800b7daa5` "nir: Add a helper for figuring out what..." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-07-29 13:39:35 -07:00
Karol Herbst	bc0e0c2818	nir/lower_int64: mark all metadata as dirty v2: use nir_metadata_preserve preserve metadata in case of !progress Fixes: `074f5ba0b5` "nir: Add a simple int64 lowering pass" Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-07-28 19:59:28 +02:00
Kenneth Graunke	488972222c	i965: Combine both gl_PatchVerticesIn lowering passes. Until now, we had separate passes for lowering gl_PatchVerticesIn to a statically known constant (for TES inputs when linked against a TCS), and a uniform in the other cases. Annoyingly, one had to be run before nir_lower_system_values, and the other afterward. This simplified the passes, but made life painful for the callers. This patch combines both into a single pass. If you give it a non-zero static count, it uses that. If you give it Mesa state slots, it turns it back into a built-in uniform. Otherwise, it does nothing. This also moves the i965 uniform lowering out to shared code. v2: Make token arrays const. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-07-26 21:51:36 -07:00
Eric Anholt	d934d3206e	nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform. This is controlled by a new nir_shader_compiler_options flag, and fixes dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-07-26 11:00:34 -07:00
Samuel Pitoiset	6465bf0015	nir: remove wrong assertion in print_var_decl() This breaks printing input/output variables with more than 4 components like mat4. Fixes: `1beef89ad8` ("nir: prepare for bumping up max components to 16") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-26 08:57:38 +02:00
Jason Ekstrand	b3b170ade9	nir: Add a couple of iand/ior optimizations Spotted in a shader in Batman: Arkham City. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-07-24 20:39:43 -07:00
Jason Ekstrand	f214baf72f	nir/serialize: Alloc constants off the variable nir_sweep assumes that constants area always allocated off the variable to which they belong. Violating this assumption causes them to get freed early and leads to use-after-free bugs. Fixes: `120da00975` "nir: add serialization and deserialization" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107366 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Mark Janes <mark.a.janes@intel.com>	2018-07-24 12:34:07 -07:00
Karol Herbst	7f95564a22	nir: rename f2f16_undef to f2f16 we need rounding modes on other conversions involving floats and it is easier to rename f2f16_undef than renaming all the other ones. v2: rebased on master Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-07-24 20:40:05 +02:00
Karol Herbst	2083cfb6eb	nir: add builtin builder also move some of the GLSL builtins over we will need for implementing some OpenCL builtins v2: replace NIR_IMM_FP by nir_imm_floatN_t in ported code fix up changes caused by swizzle rework Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-07-24 20:40:05 +02:00
Mathieu Bridon	9ebd8372b9	python: Use range() instead of xrange() Python 2 has a range() function which returns a list, and an xrange() one which returns an iterator. Python 3 lost the function returning a list, and renamed the function returning an iterator as range(). As a result, using range() makes the scripts compatible with both Python versions 2 and 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-07-24 11:07:04 -07:00
Mathieu Bridon	022d2a381d	python: Better use iterators In Python 2, iterators had a .next() method. In Python 3, instead they have a .__next__() method, which is automatically called by the next() builtin. In addition, it is better to use the iter() builtin to create an iterator, rather than calling its __iter__() method. These were also introduced in Python 2.6, so using it makes the script compatible with Python 2 and 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-07-24 11:07:04 -07:00
Mathieu Bridon	5530cb1296	python: Better iterate over dictionaries In Python 2, dictionaries have 2 sets of methods to iterate over their keys and values: keys()/values()/items() and iterkeys()/itervalues()/iteritems(). The former return lists while the latter return iterators. Python 3 dropped the method which return lists, and renamed the methods returning iterators to keys()/values()/items(). Using those names makes the scripts compatible with both Python 2 and 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-07-24 11:07:04 -07:00

... 4 5 6 7 8 ...

1510 Commits