KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jordan Justen	31b35916dd	nir: Add int64/doubles options into nir_shader_compiler_options This will allow the options to be visible under nir_shader->options, which will allow the gallium state_tracker to use the driver preferred settings during glsl_to_nir. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:41 -08:00
Ian Romanick	bae0c36751	nir/algebraic: Optimize away an fsat of a b2f The b2f can only produce 0.0 or 1.0, so the fsat does nothing. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-02 13:58:56 -08:00
Ian Romanick	ecc9ffa778	nir/algebraic: Replace a-fract(a) with floor(a) I noticed this while looking at a shader that was affected by Tim's "more loop unrolling" series. In review, Tim Arceri asked: > Why the hurt on Gen6+ is this something that should be in the late > optimisations pass? As far as I can tell, it's just because our scheduler is terrible. In all the fragment shaders that I looked at (some hurt shaders were from other stages), only one of the SIMD8 or SIMD16 version would be hurt. In many of those case, the other SIMD width is improved (e.g., shaders/closed/steam/brutal-legend/3990.shader_test). Often it looks like the scheduler decides to differently schedule a SEND the occurs somewhere early in the shader. Once that happens, everything is different. I looked at one vertex shader that was hurt (from Goat Simulator). In that case, both the floor and fract are used. The optimization eliminates the add, and it should allow better scheduling. In the area of the FRC and RNDD instructions, the scheduler does the right thing. However, later in the shader a MAD and and ADD get scheduled differently, and that makes it slightly worse. In light of this, I tried adding some "is_used_once" mark-up, and that did not fix all the cycles regressions. It also did a lot more harm than good on SKL (helped 82 vs. hurt 241). All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437001 -> 15435259 (-0.01%) instructions in affected programs: 213651 -> 211909 (-0.82%) helped: 988 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59% 95% mean confidence interval for instructions value: -1.89 -1.63 95% mean confidence interval for instructions %-change: -1.23% -1.05% Instructions are helped. total cycles in shared programs: 383007378 -> 382997063 (<.01%) cycles in affected programs: 1650825 -> 1640510 (-0.62%) helped: 679 HURT: 302 helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14 helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98% HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7 HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53% 95% mean confidence interval for cycles value: -13.05 -7.98 95% mean confidence interval for cycles %-change: -0.86% -0.50% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5043616 -> 5043010 (-0.01%) instructions in affected programs: 119691 -> 119085 (-0.51%) helped: 432 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39% 95% mean confidence interval for instructions value: -1.58 -1.23 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 128139812 -> 128135762 (<.01%) cycles in affected programs: 3829724 -> 3825674 (-0.11%) helped: 602 HURT: 0 helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6 helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10% 95% mean confidence interval for cycles value: -8.40 -5.05 95% mean confidence interval for cycles %-change: -0.22% -0.16% Cycles are helped. Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-03-01 12:43:25 -08:00
Ian Romanick	d40640efe8	nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a \|\| b)) I have not investigated the result of doing this during code generation. That should be possible, but it would be a bit more effort. All Gen6+ platforms had nearly identical results. (Skylake shown) total cycles in shared programs: 370961508 -> 370961367 (<.01%) cycles in affected programs: 5174 -> 5033 (-2.73%) helped: 2 HURT: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8206587 -> 8206589 (<.01%) instructions in affected programs: 1325 -> 1327 (0.15%) helped: 0 HURT: 2 total cycles in shared programs: 187657422 -> 187657428 (<.01%) cycles in affected programs: 11566 -> 11572 (0.05%) helped: 0 HURT: 2 This change has almost no effect right now. However, removing this patch (but leaving the patch "intel/fs: Generate if instructions with inverted conditions") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071804 -> 15071806 (<.01%) instructions in affected programs: 640 -> 642 (0.31%) helped: 0 HURT: 2 total cycles in shared programs: 369914348 -> 369916569 (<.01%) cycles in affected programs: 27900 -> 30121 (7.96%) helped: 4 HURT: 15 helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3 helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40% HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81 HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91% 95% mean confidence interval for cycles value: 12.68 221.11 95% mean confidence interval for cycles %-change: 3.09% 21.23% Cycles are HURT. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	eae19f5f19	nir/algebraic: Replace i2b used by bcsel or if-statement with comparison All of the helped shaders are in Deus Ex. I looked at a couple shaders, and they have a pattern like: vec1 32 ssa_373 = i2b32 ssa_345.w vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0 ... vec1 32 ssa_377 = ine ssa_345.w, ssa_0 if ssa_377 { ... vec1 32 ssa_416 = i2b32 ssa_385.w vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374 ... } The massive help occurs because the i2b32 is removed, then other passes determine that ssa_374 must be ssa_20 inside the if-statement allowing the first bcsel to also be deleted. v2: Rebase on 1-bit Boolean changes. v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by Bas. Skylake total instructions in shared programs: 15241394 -> 15186287 (-0.36%) instructions in affected programs: 890583 -> 835476 (-6.19%) helped: 355 HURT: 0 helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149 helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59% 95% mean confidence interval for instructions value: -165.07 -145.39 95% mean confidence interval for instructions %-change: -6.42% -5.77% Instructions are helped. total cycles in shared programs: 373846583 -> 371023357 (-0.76%) cycles in affected programs: 118972102 -> 116148876 (-2.37%) helped: 343 HURT: 14 helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089 helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77% HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019 HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11% 95% mean confidence interval for cycles value: -8723.28 -7093.12 95% mean confidence interval for cycles %-change: -2.57% -2.02% Cycles are helped. total spills in shared programs: 32401 -> 23465 (-27.58%) spills in affected programs: 24457 -> 15521 (-36.54%) helped: 343 HURT: 0 total fills in shared programs: 37866 -> 31765 (-16.11%) fills in affected programs: 18889 -> 12788 (-32.30%) helped: 343 HURT: 0 Broadwell and Haswell had similar results. (Haswell shown) Haswell total instructions in shared programs: 13764783 -> 13750679 (-0.10%) instructions in affected programs: 1176256 -> 1162152 (-1.20%) helped: 334 HURT: 21 helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47 helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37% HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1 HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03% 95% mean confidence interval for instructions value: -43.99 -35.47 95% mean confidence interval for instructions %-change: -1.35% -1.08% Instructions are helped. total cycles in shared programs: 386511910 -> 385402528 (-0.29%) cycles in affected programs: 143831110 -> 142721728 (-0.77%) helped: 327 HURT: 39 helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570 helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96% HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997 HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24% 95% mean confidence interval for cycles value: -3375.59 -2686.60 95% mean confidence interval for cycles %-change: -0.92% -0.64% Cycles are helped. total spills in shared programs: 100480 -> 97846 (-2.62%) spills in affected programs: 84702 -> 82068 (-3.11%) helped: 316 HURT: 21 total fills in shared programs: 96877 -> 94369 (-2.59%) fills in affected programs: 69167 -> 66659 (-3.63%) helped: 316 HURT: 9 No changes on Ivy Bridge or earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Caio Marcelo de Oliveira Filho	1458aa1f78	nir/copy_prop_vars: handle indirect vector elements Differently than the direct case, the indirect array derefs of vector are handled like regular derefs, with the exception that we ignore any vector entry that has SSA values when performing a load. Such SSA values don't help loading of the indirect unless we emit an if-ladder. Copy_derefs are supported for indirects. Also enable two tests that now pass. v2: Remove unnecessary temporaries. Be clearer when identifying the case where copy_entry doesn't help when we are dealing with an indirect array_deref (of a vector). (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	6c0de78cc2	nir/copy_prop_vars: prefer using entries from equal derefs When looking up an entry to use, always prefer an equal match, as it more likely to contain reusable SSA or derefs to propagate. This will be necessary when adding entries with array derefs of vectors, because we don't want the vector if the equal entry (an array deref of that vector) is present. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	61965afd00	nir/copy_prop_vars: add tests for indirect array deref Both on an actual array and on a vector, and an extra test on a vector mixing direct and indirect access. The vector tests are disabled and will be enabled by a later commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	96c32d7776	nir/copy_prop_vars: handle load/store of vector elements When direct array deref is used on a vector type (for loads and stores), copy_prop_vars is now smart to propagate values it knows about. Given a 'vec4 v', storing to v[3] will update the copy entry for v and it is equivalent to a write to v.w. Loading from v[1] will try first to see if there's a known value for v.y -- and drop the load in that case. The copy entries still always refer to the entire vectors, so the operations happen on the parent deref (the 'vector') and the values are fixed accordingly. It might be the case now that certain entries have not only different SSA defs in each element but also those come from different components than they are set to, because stores to individual elements always come from a SSA definition with a single component. Tests related to these cases are now enabled. v2: Instead of asserting on invalid indices, "load" an undef and remove the store. (Jason) v3: Merge code path for the cases of is_array_deref_of_vector into the regular code path. Add a base_index parameter to value_set_from_value. (code changes by Jason) v4: Removed the get_entry_for_deref helper, now being used only once. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho	33dafdc024	nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS Also replace uses of 0xf with the appropriate full mask created from the number of components. Note that an increase of MAX might make us change how the data is stored later on, but for now at least we make sure the pass is not hardcoded. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho	e84c841fb0	nir/copy_prop_vars: rename/refactor store_to_entry helper The name reflected this function role back when the pass also did dead write elimination. So rename it to what it does now, which is setting a value using another value; and narrow the argument list. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Gert Wollny	b7201a468d	nir: Add posibility to not lower to source mod 'abs' for ops with three sources This is useful for r600 since there the abs source modifier is not supported for ops with three sources v2: Use correct logic to enable lowering to abs source mod (Eric Anhold) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-27 11:04:06 +00:00
Kasireddy, Vivek	78fb3fd17e	nir/lower_tex: Add support for XYUV lowering The memory layout associated with this format would be: Byte: 0 1 2 3 Component: V U Y X Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 13:08:51 +00:00
Tapani Pälli	22267feff1	nir: initialize value in copy_prop_vars_block Fixes following valgrind warning: ==27561== Conditional jump or move depends on uninitialised value(s) ==27561== at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78) ==27561== by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797) Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-26 08:56:25 +02:00
Eric Anholt	7c1bf075f3	nir: Just return when asked to rewrite uses of an SSA def to itself. The nir_builder swizzling improvement to not emit extra MOVs resulted in nir_lower_tex() trying to rewrite an SSA def to itself, triggering the assert on all texturing in v3d. There's no work to be done in this case, so just stop asserting. Fixes: `743700be1f` ("nir/builder: Don't emit no-op swizzles") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 21:25:24 +00:00
Daniel Schürmann	0bd45f96b9	nir: Use SM5 properties to optimize shift(a@32, iand(31, b)) This is a common pattern from HLSL->SPIRV translation and supported in HW by all current NIR backends. vkpipeline-db results anv (SKL): total instructions in shared programs: `6403130` -> 6402380 (-0.01%) instructions in affected programs: 204084 -> 203334 (-0.37%) helped: 208 HURT: 0 total cycles in shared programs: 1915629582 -> 1918198408 (0.13%) cycles in affected programs: 1158892682 -> 1161461508 (0.22%) helped: 107 HURT: 86 shader-db results on i965 (KBL): total instructions in shared programs: 15284592 -> 15284568 (<.01%) instructions in affected programs: 81683 -> 81659 (-0.03%) helped: 24 HURT: 0 total cycles in shared programs: 375013622 -> 375013932 (<.01%) cycles in affected programs: 40169618 -> 40169928 (<.01%) helped: 13 HURT: 9 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 12:59:44 -06:00
Daniel Schürmann	0525bdc225	nir: Define shifts according to SM5 specification. SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts are defined to only use the least significant bits. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 12:59:43 -06:00
Jason Ekstrand	743700be1f	nir/builder: Don't emit no-op swizzles The nir_swizzle helper is used some on it's own but it's also called by nir_channel and nir_channels which are used everywhere. It's pretty quick to check while we're walking the swizzle anyway whether or not it's an identity swizzle. If it is, we now don't bother emitting the instruction. Sure, copy-prop will clean it up for us but there's no sense making more work for the optimizer than we have to. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-24 20:01:27 -06:00
Jason Ekstrand	724371c6b9	nir/split_vars: Don't compact vectors unnecessarily Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-24 20:01:18 -06:00
Caio Marcelo de Oliveira Filho	4c160b6bd8	nir: fix MSVC build Zero initialize struct with {0} instead of {}.	2019-02-22 22:38:05 -08:00
Caio Marcelo de Oliveira Filho	eb13211997	nir/copy_prop_vars: add tests for load/store elements of vectors Test using array deref on vectors in loads and stores. These are marked DISABLED_ as this optimization is currently not done. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	4f3809d389	nir: nir_build_deref_follower accept array derefs of vectors Code itself already supports it, just make sure we can use it for those cases. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	c4beadd28e	nir/copy_prop_vars: change test helper to get intrinsics Replace find_next_intrinsic(intrinsic, after) with get_intrinsic(intrinsic, index). This makes slightly more convenient to check the resulting loads/stores/copies, since in most tests we know which one we care about. The cost is to perform more traversals, but for such tests this is not a problem. Added the ASSERT_EQ() on count to some tests missing it, so the indices queried are always expected to find something. Also, drop two nir_print_shader leftover calls in a test. v2: Remove redundant assertions. nir_src_comp_as_uint already assert what we need. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	fdcb9779d9	nir/copy_prop_vars: keep track of components in copy_entry When a copy_entry is SSA, store not only the nir_ssa_def* for each component, but also the source component they come from. At the moment this is always a match (i.e. 'component[i] == i'), because all the operations for a copy_entry happen using definitions with the same size. This prepares the code for array_derefs of vectors, in which 'component[i] != i'. Also, extract setting all SSA components into a function of its own. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	6624decbb5	nir/copy_prop_vars: add debug helpers Disabled by default, to be used during development. Adding those so I don't rewrite some ad-hoc version of them everytime I'm working with this pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	60d9bb9ff5	nir/copy_prop_vars: don't get confused by array_deref of vectors For now these derefs are not handled, so don't let these get into the copies list -- which would cause wrong propagations. For load_derefs, do nothing. For store_derefs, invalidate whatever the store is writing to. For copy_derefs, invalidate whatever the copy is writing to. These cases will happen once derefs to SSBOs/UBOs are kept around long enough to get optimized by copy_prop_vars. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Timothy Arceri	f48527e51a	nir: allow nir_lower_phis_to_scalar() on more src types Rather than only lowering if all srcs are scalarizable we instead check that at least one src is scalarizable. We change undef type to return false otherwise it will cause regressions when it is the only scalarizable src. total instructions in shared programs: 13219105 -> 13024547 (-1.47%) instructions in affected programs: 1153797 -> 959239 (-16.86%) helped: 581 HURT: 74 total cycles in shared programs: 333968972 -> 324807922 (-2.74%) cycles in affected programs: 129809402 -> 120648352 (-7.06%) helped: 571 HURT: 131 total spills in shared programs: 57947 -> 29130 (-49.73%) spills in affected programs: 53364 -> 24547 (-54.00%) helped: 351 HURT: 0 total fills in shared programs: 51310 -> 25468 (-50.36%) fills in affected programs: 44882 -> 19040 (-57.58%) helped: 351 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-23 11:11:51 +11:00
Timothy Arceri	d9e08e753b	nir: clone instruction set rather than removing individual entries This reduces the time spent in nir_opt_cse() by almost a half. The massif tool from callgrind reported no change in peak memory use with the large doliphin uber shaders I used for testing. Reviewed-by: Thomas Helland<thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 08:36:36 +11:00
Jason Ekstrand	f98fd9d15a	nir/lower_clip_cull: Fix an incorrect assert Copy+paste error. It was supposed to test cull and not clip. Fixes: `4e69fba534` "nir: Rewrite lower_clip_cull_distance_arrays..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-21 12:05:12 -06:00
Jason Ekstrand	f9b2f10a41	nir: Fix a compile warning	2019-02-21 09:44:42 -06:00
Alejandro Piñeiro	0629b2a462	nir, glsl: move pixel_center_integer/origin_upper_left to shader_info.fs On GLSL that info is set as a layout qualifier when redeclaring gl_FragCoord, so somehow tied to a specific variable. But in practice, they behave as a global of the shader. On ARB programs they are set using a global OPTION (defined at ARB_fragment_coord_conventions), and on SPIR-V using ExecutionModes, that are also not tied specifically to the builtin. This patch moves that info from nir variable and ir variable to nir shader and gl_program shader_info respectively, so the map is more similar to SPIR-V, and ARB programs, instead of more similar to GLSL. FWIW, shader_info.fs already had pixel_center_integer, so this change also removes some redundancy. Also, as struct gl_program also includes a shader_info, we removed gl_program::OriginUpperLeft and PixelCenterInteger, as it would be superfluous. This change was needed because recently spirv_to_nir changed the order in which execution modes and variables are handled, so the variables didn't get the correct values. Now the info is set on the shader itself, and we don't need to go back to the builtin variable to set it. Fixes: `e68871f6a` ("spirv: Handle constants and types before execution modes") v2: (Jason) * glsl_to_nir: get the info before glsl_to_nir, while all the rest of the info gathering is happening * prog_to_nir: gather the info on a general info-gathering pass, not on variable setup. v3: (Jason) * Squash with the patch that removes that info from ir variable * anv: assert that OriginUpperLeft is true. It should be already set by spirv_to_nir. * blorp: set origin_upper_left on its core "compile fragment shader", not just on some specific places (for this we added an helper on a previous patch). * prog_to_nir: no need to gather specifically this fragcoord modes as the full gl_program shader_info is copied. * spirv_to_nir: assert that we are a fragment shader when handling this execution modes. v4: (reported by failing gitlab pipeline #18750) * state_tracker: update too due changes on ir.h/gl_program v5: * blorp: minor change after change on previous patch * radeonsi: update due this change. v6: (Timothy Arceri) * prog_to_nir: remove extra whitespace * shader_info: don't use :1 on origin_upper_left * glsl: program.fs.origin_upper_left/pixel_center_integer can be move out of the shader list loop	2019-02-21 11:47:59 +01:00
Jason Ekstrand	1a93fc382b	nir/xfb: Handle compact arrays in gather_xfb_info This makes us properly handle gl_ClipDistance and gl_CullDistance. Fixes: `19064b8c` "nir: Add a pass for gathering transform feedback info" Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-21 00:08:42 +00:00
Jason Ekstrand	558c314504	nir/xfb: Work in terms of components rather than slots We needed to better handle cases where a chunk of a variable starts at some non-zero location_frac and rolls over into the next slot but may not be more than 4 dwords. For example, if gl_CullDistance is an array of 3 things and has location_frac = 2, it will span across two vec4s but is not, itself, bigger than a vec4. If you ignore the clip/cull special case, it's not allowed to happen for anything else because the only things that can span more than one slot is dvec3 and dvec4 and they're both bigger than a vec4. The current code uses this attrib_slot thing where we count attribute slots and iterate over them. However, that doesn't work in the case above because gl_CullDistance will have an attrib_slot count of 1 even though it does span two slots. We could fix this by adjusting attrib_slot but we already have comp_mask and it's easier to just handle it that way. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-21 00:08:42 +00:00
Jason Ekstrand	4e69fba534	nir: Rewrite lower_clip_cull_distance_arrays to do a lot less lowering Instead of going to all the work of to combine them into one array, just make two arrays and use location_frac to colocate them within CLIP0. Then the back-end can sort things out and stack them on top of each other. Thanks to `ef99f4c8`, we also don't need to set compact anymore. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-21 00:08:42 +00:00
Jason Ekstrand	8f0fe71cc5	nir/xfb: Properly align 64-bit values Fixes: `19064b8c` "nir: Add a pass for gathering transform feedback info" Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-21 00:08:42 +00:00
Timothy Arceri	03783253b1	nir: remove non-ssa support from nir_copy_prop() Even in a very basic shader this reduces the time spent in nir_copy_prop() by ~17%. No shader-db changes for radeonsi NIR or i965. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 10:18:24 +11:00
Kenneth Graunke	d6337b59f6	nir: Don't forget if-uses in new nir_opt_dead_cf liveness check Commit `08bfd710a2`. (nir/dead_cf: Stop relying on liveness analysis) introduced a new check that iterated through a SSA def's uses, to see if it's used. But it only checked normal uses, and not uses which are part of an 'if' condition. This led to it thinking more nodes were dead than possible. Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test (and related tests) with the out-of-tree Iris driver. Fixes: `08bfd710a2` nir/dead_cf: Stop relying on liveness analysis Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 09:44:06 -08:00
Kenneth Graunke	535251487b	nir: Don't reassociate add/mul chains containing only constants The idea here is to reassociate a * (b * c) into (a * c) * b, when b is a non-constant value, but a and c are constants, allowing them to be combined. But nothing was enforcing that 'b' must be non-constant, which meant that running opt_algebraic in a loop would never terminate if the IR contained non-folded constant expressions like 256 * 0.5 * 2. Normally, we call constant folding in such a loop too, but IMO it's better for nir_opt_algebraic to be robust and not rely on that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581 Fixes: `32e266a9a5` i965: Compile fp64 funcs only if we do not have 64-bit hardware support Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-16 23:36:14 -08:00
Timothy Arceri	a801196ec9	nir: remove simple dead if detection from nir_opt_dead_cf() This was probably useful when it was first written, however it looks to be no longer necessary. As far as I can tell these days dce is smart enough to remove useless instructions from if branches. Once this is done nir_opt_peephole_select() will end up removing the empty if. Removing this support reduces the dolphin uber shader compilation time spent in nir_opt_dead_cf() by a little over 7x. No shader-db changes on i965 or radeonsi. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-02-16 10:45:31 +11:00
Ian Romanick	979b43b347	nir/algebraic: Simplify comparison with sequential integers starting with 0 All of the affected shaders are Unreal4 demos. All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437170 -> 15437001 (<.01%) instructions in affected programs: 21536 -> 21367 (-0.78%) helped: 43 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4 helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80% 95% mean confidence interval for instructions value: -4.07 -3.79 95% mean confidence interval for instructions %-change: -0.83% -0.77% Instructions are helped. total cycles in shared programs: 383007896 -> 383007378 (<.01%) cycles in affected programs: 158640 -> 158122 (-0.33%) helped: 38 HURT: 4 helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6 helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19% HURT stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2 HURT stats (rel) min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -16.90 -7.77 95% mean confidence interval for cycles %-change: -0.39% -0.19% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8213746 -> 8213745 (<.01%) instructions in affected programs: 127 -> 126 (-0.79%) helped: 1 HURT: 0 total cycles in shared programs: 187734146 -> 187734144 (<.01%) cycles in affected programs: 2132 -> 2130 (-0.09%) helped: 1 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-15 11:11:02 -08:00
Ian Romanick	ad05920258	nir/algebraic: Convert some f2u to f2i Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec says: It is undefined to convert a negative floating-point value to an uint. Assuming that (uint)some_float behaves like (uint)(int)some_float allows some optimizations in the i965 backend to proceed. This basically undoes the small amount of damage done by "intel/compiler: Avoid propagating inequality cmods if types are different". v2: Replicate part of the commit message as a comment in the code. Suggested by Jason. shader-db results compairing before "intel/compiler: Avoid propagating inequality cmods if types are different" and after this commit: Skylake total cycles in shared programs: 383007996 -> 383007896 (<.01%) cycles in affected programs: 85208 -> 85108 (-0.12%) helped: 13 HURT: 8 helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6 helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14% HURT stats (abs) min: 2 max: 12 x̄: 5.00 x̃: 3 HURT stats (rel) min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07% 95% mean confidence interval for cycles value: -9.31 -0.21 95% mean confidence interval for cycles %-change: -0.24% <.01% Cycles are helped. Broadwell total cycles in shared programs: 415251194 -> 415251370 (<.01%) cycles in affected programs: 83750 -> 83926 (0.21%) helped: 7 HURT: 13 helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12 helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30% HURT stats (abs) min: 2 max: 36 x̄: 19.69 x̃: 22 HURT stats (rel) min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47% 95% mean confidence interval for cycles value: 0.76 16.84 95% mean confidence interval for cycles %-change: <.01% 0.37% Inconclusive result (%-change mean confidence interval includes 0). Haswell total instructions in shared programs: 13823885 -> 13823886 (<.01%) instructions in affected programs: 2249 -> 2250 (0.04%) helped: 0 HURT: 1 total cycles in shared programs: 390094243 -> 390094001 (<.01%) cycles in affected programs: 85640 -> 85398 (-0.28%) helped: 15 HURT: 6 helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18 helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42% HURT stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 2 HURT stats (rel) min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04% 95% mean confidence interval for cycles value: -17.36 -5.69 95% mean confidence interval for cycles %-change: -0.44% -0.14% Cycles are helped. Ivy Bridge total cycles in shared programs: 180986448 -> 180986552 (<.01%) cycles in affected programs: 34835 -> 34939 (0.30%) helped: 0 HURT: 10 HURT stats (abs) min: 2 max: 18 x̄: 10.40 x̃: 10 HURT stats (rel) min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30% 95% mean confidence interval for cycles value: 4.67 16.13 95% mean confidence interval for cycles %-change: 0.20% 0.35% Cycles are HURT. Sandy Bridge total cycles in shared programs: 154603969 -> 154603970 (<.01%) cycles in affected programs: 171514 -> 171515 (<.01%) helped: 25 HURT: 14 helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04% HURT stats (abs) min: 1 max: 8 x̄: 3.29 x̃: 3 HURT stats (rel) min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11% 95% mean confidence interval for cycles value: -0.91 0.96 95% mean confidence interval for cycles %-change: -0.02% 0.04% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-15 11:11:02 -08:00
Juan A. Suarez Romero	1fb24080b7	nir: remove jump from two merging jump-ending blocks In opt_peel_initial_if optimization, when moving the continue list to end of the continue block, before the jump, could happen that the continue list itself also ends with a jump. This would mean that we would have two jump instructions in a row: the first one from the continue list and the second one from the contine block. As inserting an instruction after a jump is not allowed (and it does not make sense, as it will not be executed), remove the jump from the continue block and keep the one from continue list, as it will be executed first. CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-15 15:16:24 +01:00
Juan A. Suarez Romero	69be9934a7	nir: move ALU instruction before the jump instruction opt_split_alu_of_phi moves ALU instruction to the end of continue block. But if the continue block ends with a jump instruction (an explicit "continue" instruction) then the ALU must be inserted before the jump, as it is illegal to add instructions after the jump. CC: Ian Romanick <ian.d.romanick@intel.com> Fixes: `0881e90c09` ("nir: Split ALU instructions in loops that read phis") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-15 15:14:36 +01:00
Jason Ekstrand	08bfd710a2	nir/dead_cf: Stop relying on liveness analysis The liveness analysis pass is fairly expensive because it has to build large bit-sets and run a fix-point algorithm on them. Instead of requiring liveness for detecting if values escape a CF node, just take advantage of the structured nature of NIR and use block indices instead. This only requires the block index metadata which is the fastest we have metadata to generate. No shader-db changes on Kaby Lake Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-14 23:06:29 -06:00
Jason Ekstrand	b50465d197	nir/dead_cf: Inline cf_node_has_side_effects We want to handle live SSA values differently and it's going to involve walking the instructions. We can make it a single instruction walk if we combine it with cf_node_has_side_effects. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-14 23:05:28 -06:00
Jason Ekstrand	b14d7a6b60	nir: Silence a couple of warnings in release builds [28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'. ../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’: ../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable] unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0}; ^~~~~~~~~~ [36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'. ../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function] instr_each_src_and_dest_is_ssa(nir_instr *instr) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-14 16:04:35 -06:00
Eric Anholt	42d2cae907	nir: Move panfrost's isign lowering to nir_opt_algebraic. I wanted to reuse this from v3d. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-14 00:32:30 +00:00
Timothy Arceri	68baf96824	nir: turn an ssa check in nir_search into an assert Everything should be in ssa form when we call this. This is a hotpath so replace the check with an assert. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-02-14 09:35:32 +11:00
Timothy Arceri	46a4d2c867	nir: turn ssa check into an assert Everthing should be in ssa form when this is called. Checking for it here is expensive so turn this into an assert instead. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-02-14 09:35:32 +11:00
Timothy Arceri	0a89c9779a	nir: prehash instruction in nir_instr_set_add_or_rewrite() There is no need to hash the instruction twice, especially as we end up adding it in the majority of cases. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-14 09:35:32 +11:00
Caio Marcelo de Oliveira Filho	017349997f	nir: fix example in opt_peel_loop_initial_if description Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-12 20:33:20 -08:00
Karol Herbst	7e08f22a72	nir/opt_if: don't mark progress if nothing changes if we have something like this: loop { ... if x { break; } else { continue; } } opt_if_loop_last_continue returns true marking progress allthough nothing changes. Fixes: `5921a19d4b` "nir: add if opt opt_if_loop_last_continue()" Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-13 00:21:35 +01:00
Tapani Pälli	19a85a704b	nir: add option to use scaling factor when sampling planes YUV lowering Patch adds nir_lower_tex_options as parameter to sample_plane so that we don't need to extend nir_tex_instr for this. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-12 08:41:20 +02:00
Kenneth Graunke	f5c7df4dc9	nir: Gather texture bitmasks in gl_nir_lower_samplers_as_deref. Eric and I would like a bitmask of which samplers are used, similar to prog->SamplersUsed, but available in NIR. The linker uses SamplersUsed for resource limit checking, but later optimizations may eliminate more samplers. So instead of propagating it through, we gather a new one. While there, we also gather the existing textures_used_by_txf bitmask. Gathering these bitfields in nir_shader_gather_info is awkward at best. The main reason is that it introduces an ordering dependency between the two passes. If gathering runs before lower_samplers_as_deref, it can't look at var->data.binding. If the driver doesn't use the full lowering to texture_index/texture_array_size (like radeonsi), then the gathering can't use those fields. Gathering might be run early /and/ late, first to get varying info, and later to update it after variant lowering. At this point, should gathering work on pre-lowered or post-lowered code? Pre-lowered is also harder due to the presence of structure types. Just doing the gathering when we do the lowering alleviates these ordering problems. This fixes ordering issues in i965 and makes the txf info gathering work for radeonsi (though they don't use it). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:45 -08:00
Kenneth Graunke	120f9b8362	nir: Use sampler derefs in drawpixels and bitmap lowering. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:44 -08:00
Jason Ekstrand	9e6a6ef0d4	nir/deref: Rematerialize parents in rematerialize_derefs_in_use_blocks When nir_rematerialize_derefs_in_use_blocks_impl was first written, I attempted to optimize things a bit by not bothering to re-materialize the sources of deref instructions figuring that the final caller would take care of that. However, in the case of more complex deref chains where the first link or two lives in block A and then another link and the load/store_deref intrinsic live in block B it doesn't work. The code in rematerialize_deref_in_block looks at the tail of the chain, sees that it's already in block B and skips it, not realizing that part of the chain also lives in block A. The easy solution here is to just rematerialize deref sources of deref instructions as well. This may potentially lead to a few more deref instructions being created by the conditions required for that to actually happen are fairly unlikely and, thanks to the caching, it's all linear time regardless. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603 Fixes: `7d1d1208c2` "nir: Add a small pass to rematerialize derefs per-block" Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-11 10:57:23 -06:00
Ian Romanick	b031c64349	nir: Convert a bcsel with only phi node sources to a phi node v2: Remove the original ALU instruciton after all of its readers are modified to read the new ALU instruction. v3: Fix an issue where a bcsel that may not be executed on a loop iteration due to a break statement is converted to a phi (and therefore incorrectly "executed"). Noticed by Tim. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109216 Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	0881e90c09	nir: Split ALU instructions in loops that read phis A single shader in Unigine Superposition is affected by this change. A single iadd is moved to the end of a loop. This iadd is involved in a complex set of logic to terminate the loop, and an extra mov instruction is inserted. This shader really needs the optimization suggested by bugzilla #94747, and I expect that to make this tiny regression go away. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15047543 -> 15047545 (<.01%) instructions in affected programs: 565 -> 567 (0.35%) helped: 0 HURT: 2 total cycles in shared programs: 369977253 -> 369978253 (<.01%) cycles in affected programs: 127910 -> 128910 (0.78%) helped: 0 HURT: 2 v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid infinite optimization loops. Remove the original ALU instruciton after all of its readers are modified to read the new ALU instruction. v3: Extend to the more general case. The if the prev-block value from the phi is not undef, this means the ALU instruction has to be duplicated in both the prev-block and the continue-block. Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	0c0c69729b	nir: Select phi nodes using prev_block instead of continue_block This simplifies some changes coming later. Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	8d8f80af3a	nir: Refactor code that checks phi nodes in opt_peel_loop_initial_if This will be used in a couple more places soon. The function name is... horribly long. Neither Matt nor I could think of any thing that was shorter and still more descriptive than "is_phi_foo". I'm willing to entertain suggestions. Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	4d65d2b12e	nir: Document some fields of nir_loop_terminator Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	78169870e4	nir: Silence zillions of unused parameter warnings in release builds Fixes: `cd56d79b59` "nir: check NIR_SKIP to skip passes by name" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Timothy Arceri	26aa460940	nir: rewrite varying component packing There are a number of reasons for the rewrite. 1. Adding support for packing tess patch varyings in a sane way. 2. Making use of qsort allowing the code to be much easier to follow. 3. Fixes a bug where different interp types caused component packing to be skipped for all varyings in some scenarios. 4. Allows us to add a crude live range analysis for deciding which components should be packed together. This support can optionally be added in a future patch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Timothy Arceri	2f53260417	nir: add is_packing_supported_for_type() helper This will be used in the following patches to determine if we support packing the components of a varying. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Timothy Arceri	7b01d5c354	nir: add support for marking used patches when packing varyings This adds support needed for marking the varyings as used but we don't actually support packing patches in this patch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Kenneth Graunke	15c6902117	nir: Avoid splitting compact arrays into per-element variables. Compact arrays are used for special variables like clip and cull distances, or tessellation levels. Drivers using compact arrays assume that these values will always be actual arrays. We don't want to turn a float[1] gl_CullDistance into a single float; that would confuse drivers. Today, i965 uses compact arrays, and Gallium drivers use nir_lower_io_arrays_to_elements, so we haven't had any overlap that would demonstrate the issue. Iris will use both. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	ba9dcc80fb	nir: Avoid clip/cull distance lowering multiple times. A couple places in st/nir assume that cull distances have been lowered away, so it will need to call this lowering pass for drivers which opt out of the GLSL IR lowering. The Intel backend also calls this pass, for i965 and anv. We need to only do it once. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	5730364d69	nir: Bail on clip/cull distance lowering if GLSL IR already did it. We have a GLSL IR pass to convert clip/cull distance float[] arrays into vec4[2] arrays. In `ff281e6204`, we attempted to skip this pass if the GLSL IR lowering had already run. But, that code was not quite right, as we forgot to strip away the per-vertex IO array layer for geometry and tessellation shader varyings. If the GLSL IR pass has run, the variables will not be marked as "compact". So we can simply check that and bail. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	3327c93510	nir: Record info->fs.pixel_center_integer in lower_system_values radeonsi uses a system value for gl_FragCoord rather than an input var. These get translated into load_frag_coord NIR intrinsics, which lose the pixel_center_integer and origin_upper_left decorations. To cope with this, Tim added a shader_info field for pixel_center_integer, and made glsl_to_nir set it accordingly. prog_to_nir also needs to handle these fragcoord conventions. Instead of duplicating the logic to set the info field, just move it to nir_lower_system_values so it'll happen regardless of who makes the NIR. (For what it's worth, we don't need an info flag for origin_upper_left, because radeonsi lowers origin conventions in nir_lower_wpos_ytransform before nir_lower_system_values destroys the variable and qualifiers.) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:51:52 -08:00
Jason Ekstrand	36734987a5	nir/deref: Drop zero ptr_as_array derefs They are effectively (&x)[0] or *&x which does nothing. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-05 15:17:19 -06:00
Jonathan Marek	4f0a3c9f9e	nir: add missing vec opcodes in lower_bool_to_float Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-05 15:34:15 +00:00
Caio Marcelo de Oliveira Filho	51547bbc5a	nir: keep the phi order when splitting blocks All things being equal is better to keep the original order. Since the new block is empty, push the phis in order to tail. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>	2019-02-04 20:41:13 -08:00
Matt Turner	9de90caca8	nir: Optimize double-precision lower_round_even() Use the trick of adding and then subtracting 2**52 (52 is the number of explicit mantissa bits a double-precision floating-point value has) to implement round-to-even. Cuts the number of instructions on SKL of the piglit test fs-roundEven-double.shader_test from 109 to 21. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-01-29 15:02:23 -08:00
Jason Ekstrand	9e34781aef	nir: Allow SSBOs and global to alias Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-26 13:41:50 -06:00
Jason Ekstrand	9839ce8bf9	nir/validate: Allow array derefs of vectors for nir_var_mem_global Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:39:18 -06:00
Jason Ekstrand	5f5503d498	nir/lower_io: Add support for nir_var_mem_global Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:39:18 -06:00
Jason Ekstrand	314d2c90c3	nir/lower_io: Add a 32 and 64-bit global address formats These are simple scalar addresses. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-26 13:39:18 -06:00
Jason Ekstrand	e461926ef2	nir: Add load/store/atomic global intrinsics These correspond roughly to reading/writing OpenCL global pointers. The idea is that they just take a bare address and load/store from it. Of course, exactly what this address means is driver-dependent. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:39:18 -06:00
Jason Ekstrand	39925d60ec	anv: Add pipeline cache support for xfb_info Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:56 -06:00
Alejandro Piñeiro	6b50b0a4a8	nir/xfb: distinguish array of structs vs array of blocks Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	ac704e777c	nir/xfb: Properly handle arrays of blocks Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Alejandro Piñeiro	5649a0a6e8	nir/xfb: don't assert when xfb_buffer/stride is present but not xfb_offset In order to allow nir_gather_xfb_info to be used on OpenGL, specifically ARB_gl_spirv. So, from OpenGL 4.6 spec, section 11.1.2.1, "Output Variables": "outputs specifying both an XfbBuffer and an Offset are captured, while outputs not specifying both of these are not captured. Values are captured each time the shader writes to such a decorated object." This implies that are captured if both are present, and not if one of those are lacking. Technically, it doesn't explicitly point that having just one or the other is a mistake. In some cases, glslang is adding some extra XfbBuffer without XfbOffset around, and mentioning that technically that is not a bug (see issue#1526) And for the case of Vulkan, as the same glslang issue mentions, it is not clear if that should be a mistake or not. But even if it is a mistake, it is not really needed to be checked on the driver, and we can let the validation layers to check that. v2: simplify explicit_xfb_buffer and explicit_offset checks (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	4f99ac9144	nir/xfb: Fix offset accounting for dvec3/4 Before, we were double-counting the component slots when we had a dvec3 or dvec4. Instead, just add them in once and manually offset the recorded output offset. Fixes: `19064b8c` "nir: Add a pass for gathering transform feedback info" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	96fa23bca5	nir: Preserve offsets in lower_io_to_scalar_early Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Samuel Pitoiset	b2bbd978d0	nir: fix lowering arrays to elements for XFB outputs If we have a transform feedback output like: float[2] x2_out (VARYING_SLOT_VAR1.x, 0, 0) which is lowered by nir_lower_io_arrays_to_elements to, float x2_out (VARYING_SLOT_VAR1.x, 0, 0) float x2_out@5 (VARYING_SLOT_VAR2.x, 0, 0) We have to update the destination offset to avoid overwriting the same value. v2 (Jason Ekstrand): - Compute the correct offsets for arrays of vectors and/or doubles Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Samuel Pitoiset	9f4e0aa7c1	nir: do not remove varyings used for transform feedback When a xfb buffer is explicitely declared on a varying variable, we shouldn't remove it at link time. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	ca8c6c9781	nir: Mark deref UBO and SSBO access as non-scalar Fixes: `63b9aa2e25` "spirv: Add support for using derefs for..." Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 18:41:47 -06:00
Karol Herbst	8bb46de08b	mesa: add MESA_SHADER_KERNEL used for CL kernels Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 20:36:41 +01:00
Karol Herbst	0a793c78a3	nir: add bit_size parameter to system values with multiple allowed bit sizes v2: add assert to verify we have at least one valid bit_size v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:17:18 +01:00
Karol Herbst	4125211e9c	nir: add legal bit_sizes to intrinsics With OpenCL some system values match the address bits, but in GLSL we also have some system values being 64 bit like subgroup masks. With this it is possible to adjust the builder functions so that depending on the bit_sizes the correct bit_size is used or an additional argument is added in case of multiple possible values. v2: validate dest bit_size v3: generate hex values in python code remove useless imports rename and move bit_sizes v4: add 1 to legal bit_sizes for front_face Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:16:51 +01:00
Karol Herbst	27bd07e230	nir/validate: allow to check against a bitmask of bit_sizes Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:16:51 +01:00
Karol Herbst	acdad24585	nir/spirv: handle SpvStorageClassCrossWorkgroup v2: rename nir_var_global to nir_var_mem_global Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:42 +01:00
Karol Herbst	36a76b7192	nir: rename nir_var_shared to nir_var_mem_shared Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	6fefd69724	nir: rename nir_var_ssbo to nir_var_mem_ssbo Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	3afc1e068f	nir: rename nir_var_ubo to nir_var_mem_ubo Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	9b24028426	nir: rename nir_var_function to nir_var_function_temp Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	e5daef9587	nir: rename nir_var_private to nir_var_shader_temp Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Caio Marcelo de Oliveira Filho	cd56d79b59	nir: check NIR_SKIP to skip passes by name Passes' function names, separated by comma, listed in NIR_SKIP environment variable will be skipped in debug mode. The mechanism is hooked into the _PASS macro, like NIR_PRINT. The extra macro NIR_SKIP is available as a developer convenience, to skip at pointer other than the passes entry points. v2: Fix typo in NIR_SKIP macro. (Bas) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-18 12:31:49 -08:00
Bas Nieuwenhuizen	8424cd8fbd	nir: Account for atomics in copy propagation. Otherwise writes get propagated across atomics if no barrier is used. Without barrier writes should still be visible in the same invocation, so an atomic has to be considered a write. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: `b3c6146925` "nir: Copy propagation between blocks" Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass"	2019-01-18 00:55:35 +01:00
Jason Ekstrand	2d2737dcfe	nir: Add a bool to float32 lowering pass From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering. ior: fmax instead of fadd allows removing the fsat. inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better with the scalar instruction set. Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-01-14 19:27:06 +00:00
Caio Marcelo de Oliveira Filho	9fdded0cc3	src/compiler: use new hash table and set creation helpers Replace calls to create hash tables and sets that use _mesa_hash_pointer/_mesa_key_pointer_equal with the helpers _mesa_pointer_hash_table_create() and _mesa_pointer_set_create(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-01-14 10:49:28 -08:00
Jason Ekstrand	821b6861ec	nir/gcm: Support deref instructions Even though no one's been brave enough to ever use this pass, I like to keep it functionally working. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-12 17:55:49 -06:00
Rhys Perry	0210243923	nir: fix copy-paste error in nir_lower_constant_initializers Fixes: `393b59e077` ('nir: Rework nir_lower_constant_initializers() to handle functions') Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-10 10:51:52 -06:00
Matt Turner	2623653126	nir: Unset metadata debug bit if no progress made NIR metadata validation verifies that the debug bit was unset (by a call to nir_metadata_preserve) if a NIR optimization pass made progress on the shader. With the expectation that the NIR shader consists of only a single main function, it has been safe to call nir_metadata_preserve() iff progress was made. However, most optimization passes calculate progress per-function and then return the union of those calculations. In the case that an optimization pass makes progress only on a subset of the functions in the shader metadata validation will detect the debug bit is still set on any unchanged functions resulting in a failed assertion. This patch offers a quick solution (short of a larger scale refactoring which I do not wish to undertake as part of this series) that simply unsets the debug bit on unchanged functions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	e633fae5cb	nir: Add lowering support for 64-bit operations to software Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	fe2cbcf3ee	nir: Create nir_builder in nir_lower_doubles_impl() We're going to use it more in a future patch, and this avoids a lot of gross code. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	ecb115eb3f	nir: Add and set info::uses_64bit Will be used to communicate that a shader uses 64-bit operations to the concerned lowering passes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	41f3e9e5f5	nir: Implement lowering of 64-bit shift operations Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	62d55f1281	nir: Wire up int64 lowering functions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Jason Ekstrand	adab27e741	nir: Add some more int64 lowering helpers [mattst88]: Found in an old branch of Jason's. Jason implemented: inot, iand, ior, iadd, isub, ineg, iabs, compare, imin, imax, umin, umax Matt implemented: ixor, bcsel, b2i, i2b, i2i8, i2i16, i2i32, i2i64, u2u8, u2u16, u2u32, u2u64, and fixed ilt Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Matt Turner	dde73e646f	nir: Tag entrypoint for easy recognition by nir_shader_get_entrypoint() We're going to have multiple functions, so nir_shader_get_entrypoint() needs to do something a little smarter. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	393b59e077	nir: Rework nir_lower_constant_initializers() to handle functions Previously it assumed that only a single function (the entrypoint) existed and attempted to lower constant initializers of shader outputs for each function, for instance.	2019-01-09 16:42:40 -08:00
Eric Anholt	211b826790	nir: Make nir_deref_instr_build/get_const_offset actually use size_align. I think this was copy-and-paste mistake -- nir_opt_large_constants was passing in glsl_get_natural_size_align_bytes() given brw_nir.c's arguments to the opt pass. I wanted to reuse this function for handling constant offsets of arrays of images in V3D. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-08 15:40:53 -08:00
Eric Anholt	6051c11d17	nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results. V3D returns the texels in a different order in the resulting vec4 from what GLSL wants, so we need to put in a swizzle. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 13:03:41 -08:00
Caio Marcelo de Oliveira Filho	baabfb1959	nir: fix warning in nir_lower_io.c Initialize the variable with NULL. Fixes the following In file included from ../src/compiler/nir/nir_lower_io.c:34: ../src/compiler/nir/nir_lower_io.c: In function ‘nir_lower_explicit_io’: ../src/compiler/nir/nir.h:668:11: warning: ‘addr’ may be used uninitialized in this function [-Wmaybe-uninitialized] return src; ^~~ ../src/compiler/nir/nir_lower_io.c:735:17: note: ‘addr’ was declared here nir_ssa_def *addr; ^~~~ v2: Avoid using a 'default' case so we get help from the compiler when new deref types are added. (Lionel) Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 12:29:56 -08:00
Karol Herbst	d0c6ef2793	nir: rename global/local to private/function memory the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:51:46 +01:00
Jason Ekstrand	63b9aa2e25	spirv: Add support for using derefs for UBO/SSBO access For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	e90b738f20	nir/vulkan: Add a descriptor type to vulkan resource intrinsics Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	f393b10b3f	nir/lower_io: Add "explicit" IO lowering This new pass is for lowering explicitly laid out memory coming in from SPIR-V or a similar source. It's quite a bit more complicated than the normal lower_io because we have to be able to handle matrices. The way the stride information is stored for matrices is awkward and dealing with row-major matrices is especially painful. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	52dd43c7ef	nir/validate: Allow array derefs on vectors in more modes Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	013ee5732b	nir/intrinsics: Add access flags to load/store_deref Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	7755171e4c	nir/intrinsics: Allow deref sources to consume anything This commit adds a new num_components value for intrinsic sources of -1 which means that it consumes everything and the number of components effectively isn't validated. This is useful for deref sources which just take the result of the deref and we leave it up to the driver to decide what that size should be. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	d0fe52a456	nir/validate: Allow derefs in phi nodes We added this assert when first moving derefs over to instructions to ensure that deref chains could go all the way back to the variables. Now that we're going to start using derefs for things that we can do variable pointers on such as UBOs and SSBOs, we need to be able to run derefs through phi nodes, selects, and basically anything else. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	7e85480a67	nir/remove_dead_variables: Properly handle deref casts We already detect any incomplete deref chains (where the deref is used for something other than another deref or a load/store) and flag the variable as used thanks to deref_used_for_not_store. All that's left to do is to properly skip casts when cleaning up. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	78d80f7db2	nir/deref: Skip over casts in fixup_deref_modes This pass is used when, for instance, we lazily change the mode of variables rather than replacing the variable with a new one. Since we only do this in cases where we know we have full deref chains, it's ok to just skip them in fixup_deref_modes. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	d8e3edb784	nir/deref: Support casts and ptr_as_array in comparisons The code which constructs deref paths already gives you the path starting at the nearest deref_cast or deref_var. All we need to do for casts is handle the case where the start of the path isn't a deref_var. For ptr_as_array derefs, we just bail if we have any after the divergence point between the two derefs. We may be able to do better in the future but this works for now. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	a1c688517d	nir/opt_deref: Properly optimize ptr_as_array derefs When handling casts, we can't blindly propagate the parent of a cast into a ptr_as_array deref because doing so might loose the stride information from the cast. Instead, before we can propagate into ptr_as_array derefs, we need to check that the cast is a cast of an array deref and that the stride matches. For other types of derefs, we can continue to propagate casts as normal because they don't need the stride. We also add an optimization which can combine a ptr_as_array deref with it parent if it is also an array deref of some form. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	427558a717	nir/validate: Don't allow derefs in if conditions Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	e94a027af8	nir: Add a ptr_as_array deref type These correspond directly to SPIR-V's OpPtrAccessChain. As such, they treat whatever their parent gives them as if it's the first element in some array and dereferences that array. If the parent is, itself, an array deref, then the two indices can just be added together to get the final array deref. However, it can also be used in cases where what you have is a dereference to some random vec2 value somewhere. In this case, we require a cast before the ptr_as_array and use the ptr_stride field in the cast to provide a stride for the ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	fc9c4f89b8	nir: Move propagation of cast derefs to a new nir_opt_deref pass We're going to want to do more deref optimizations going forward and this gives us a central place to do them. Also, cast propagation will get a bit more complicated with the addition of ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	6cebeb4f71	glsl_type: Add support for explicitly laid out matrices and arrays SPIR-V allows for matrix and array types to be decorated with explicit byte stride decorations and matrix types to be decorated row- or column-major. This commit adds support to glsl_type to encode this information. Because this doesn't work nicely with std430 and std140 alignments, we add asserts to ensure that we don't use any of the std430 or std140 layout functions with explicitly laid out types. In SPIR-V, the layout information for matrices is applied to the parent struct member instead of to the matrix type itself. However, this is gets rather clumsy when you're walking derefs trying to compute offsets because, the moment you hit a matrix, you have to crawl back the deref chain and find the struct. Instead, we take the same path here as we've taken in spirv_to_nir and put the decorations on the matrix type itself. This also subtly adds support for strided vector types. These don't come up in SPIR-V directly but you can get one as the result of taking a column from a row-major matrix or a row from a column-major matrix. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	d34f19feba	glsl_type: Drop the glsl_get_array_instance C helper It was added in `bce6f99875` even though it's completely redundant with glsl_array_type(). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	a700a82bda	nir: Distinguish between normal uniforms and UBOs Previously, NIR had a single nir_var_uniform mode used for atomic counters, UBOs, samplers, images, and normal uniforms. This commit splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform is still a bit of a catch-all but the nir_var_ubo is specific to UBOs. While we're at it, we also rename shader_storage to ssbo to follow the convention. We need this so that we can distinguish between normal uniforms and UBO access at the deref level without going all the way back variable and seeing if it has an interface type. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	c9a4135e14	nir: Allow storing to shader_storage I have no idea how shader_storage made it into the list of banned variable modes for stores but it clearly should be allowed. This only doesn't cause us a problem today because we never actually use derefs on shader_storage variables. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	cd93b0a670	nir/validate: Require array indices to match the deref bit size This doesn't currently change anything because array indices are required to be 32 bits and all derefs are also 32 bits. However, we will one day have 64-bit derefs for OpenCL. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	bfe31c5e46	nir/builder: Add nir_i2i and nir_u2u helpers which take a bit size Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com	2019-01-08 00:38:29 +00:00
Timothy Arceri	6dade5d534	nir: avoid uninitialized variable warning Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231	2019-01-07 10:57:00 +11:00
Eric Anholt	f217a94542	nir: Add nir_lower_tex options to lower sampler return formats. I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and vc4, but nir could potentially do some useful stuff for us (like avoiding unpack/repacks) if we give it the information. v2: Skip lowering for txs/query_levels v3: Fix a crash on old-style shadow v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack the enum. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:59:57 -08:00
Eric Anholt	a74f2aeb4f	nir: Allow nir_format_unpack_int/sint to unpack larger values. For V3D, I want to unpack 4-16-bit packed integers for 8 and 16-bit integer samplers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:59:30 -08:00
Caio Marcelo de Oliveira Filho	bbf9ee9b18	nir: remove dead code from copy_prop_vars When copy_prop_vars also took care of dead write handling, intrin was used as part of store_to_entry. Now it isn't, so this assignment isn't used really used. Add a comment clarifying what happens to intrin. Fixes: `4dfa7adc10` "nir: Remove handling of dead writes from copy_prop_vars" Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:18:41 -08:00
Timothy Arceri	4d3f6cb973	nir: merge some basic consecutive ifs After trying multiple times to merge if-statements with phis between them I've come to the conclusion that it cannot be done without regressions. The problem is for some shaders we end up with a whole bunch of phis for the merged ifs resulting in increased register pressure. So this patch just merges ifs that have no phis between them. This seems to be consistent with what LLVM does so for radeonsi we only see a change (although its a large change) in a single shader. Shader-db results i965 (SKL): total instructions in shared programs: 13098176 -> 13098152 (<.01%) instructions in affected programs: 1326 -> 1302 (-1.81%) helped: 4 HURT: 0 total cycles in shared programs: 332032989 -> 332037583 (<.01%) cycles in affected programs: 60665 -> 65259 (7.57%) helped: 0 HURT: 4 The cycles estimates reported by shader-db for i965 seem inaccurate as the only difference in the final code is the removal of the redundent condition evaluations and jumps. Also the biggest code reduction (~7%) for radeonsi was in a tomb raider tressfx shader but for some reason this does not get merged for i965. Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 232 -> 232 (0.00 %) VGPRS: 164 -> 164 (0.00 %) Spilled SGPRs: 59 -> 59 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 14584 -> 13520 (-7.30 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 13 -> 13 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-01-03 15:17:16 +11:00
Timothy Arceri	19cafe8084	nir: add rewrite_phi_predecessor_blocks() helper This will also be used by the if merge pass in the following commit. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-01-03 15:17:16 +11:00
Timothy Arceri	5122fbc4ba	nir: simplify does_varying_match() Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Timothy Arceri	8d05ee2005	nir: make use of does_varying_match() helper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Timothy Arceri	0016166d19	nir: make nir_opt_remove_phis_impl() static Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Caio Marcelo de Oliveira Filho	7d6babf995	nir: add a way to print the deref chain Makes debugging easier when we care about the deref chain and not the deref instruction itself. To make it take a const pointer, constify some of the static functions in nir_print.c. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 10:09:04 -08:00
Iago Toral Quiroga	95b7c29c2c	compiler/spirv: handle 16-bit float in radians() and degrees() v2: - use nir_imm_fmul helper (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	aeee683780	compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	5fc9ad1cb0	compiler/nir: add a nir_b2f() helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Timothy Arceri	70be9afccb	nir: link time opt duplicate varyings If we are outputting the same value to more than one output component rewrite the inputs to read from a single component. This will allow the duplicate varying components to be optimised away by the existing opts. shader-db results i965 (SKL): total instructions in shared programs: 12869230 -> 12860886 (-0.06%) instructions in affected programs: 322601 -> 314257 (-2.59%) helped: 3080 HURT: 8 total cycles in shared programs: 317792574 -> 317730593 (-0.02%) cycles in affected programs: 2584925 -> 2522944 (-2.40%) helped: 2975 HURT: 477 shader-db results radeonsi (VEGA): SGPRS: 31576 -> 31664 (0.28 %) VGPRS: 17484 -> 17064 (-2.40 %) Spilled SGPRs: 184 -> 167 (-9.24 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 583340 -> 569368 (-2.40 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 6162 -> 6270 (1.75 %) Wait states: 0 -> 0 (0.00 %) vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 14880 -> 15080 (1.34 %) VGPRS: 10872 -> 10888 (0.15 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 674016 -> 668396 (-0.83 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 2708 -> 2704 (-0.15 %) Wait states: 0 -> 0 (0.00 % V2: bunch of tidy ups suggested by Jason Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	d828694b80	nir: rework nir_link_opt_varyings() This just cleans things up a little and make things more safe for derefs. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	c0aba8b0dc	nir: add can_replace_varying() helper This will be reused by the following patch. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	50de3f80a8	nir: rename nir_link_constant_varyings() nir_link_opt_varyings() The following patches will add support for an additional optimisation so this function will no longer just optimise varying constants. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Iago Toral Quiroga	d6110d4d54	intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs The former expects to see SSA-only things, but the latter injects registers. The assertions in the lowering where not seeing this because they asserted on the bit_size values only, not on the is_ssa field, so add that assertion too. Fixes: `11dc130779` "nir: Add a bool to int32 lowering pass" CC: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-20 08:02:44 +01:00
Caio Marcelo de Oliveira Filho	947f7b452a	nir: properly find the entry to keep in copy_prop_vars When copy propagation handles a store/copy, it iterates the current copy entries to remove aliases, but keeps the "equal" entry (if exists) to be updated. The removal step may swap the entries around (to ensure there are no holes), invalidating previous iteration pointers. The bug was saving such pointer to use later. Change the code to first perform the removals and then find the remaining right entry. This was causing updates to be lost since they were being made to an entry that was not part of the current copies. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624 Fixes: `b3c6146925` "nir: Copy propagation between blocks" Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-19 09:33:36 -08:00
Caio Marcelo de Oliveira Filho	0ddc911f4d	nir: properly clear the entry sources in copy_prop_vars When updating a copy entry source value from a "non-SSA" (the data come from a copy instruction) to a "SSA" (the data or parts of it come from SSA values), it was possible to hold invalid data in ssa[0] depending on the writemask. Because the union, ssa[0] could contain a pointer to a nir_deref_instr left-over from previous non-SSA usage. Change code to clean up the array before use to avoid invalid data around. Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-19 08:35:48 -08:00
Ian Romanick	96c4b135e3	nir/algebraic: Don't put quotes around floating point literals The quotation marks around 1.0 cause it to be treated as a string instead of a floating point value. The generator then treats it as an arbitrary variable replacement, so any iand involving a ('ineg', ('b2i', a)) matches. v2: Remove misleading comment about sized literals (suggested by Timothy). Add assertion that the name of a varible is entierly alphabetic (suggested by Jason). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Fixes: `6bcd2af086` ("nir/algebraic: Add some optimizations for D3D-style Booleans") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075	2018-12-18 23:28:31 -08:00
Sagar Ghuge	933c44bcc4	nir: Add a new lowering option to lower 3D surfaces from txd to txl. Tested on gen9. v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-18 13:44:09 -08:00
Jason Ekstrand	5dad1abfdc	nir/dead_write_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes clear_unused_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	fa40a58fd9	nir/copy_prop_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes apply_barrier_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	cf7fb39805	nir/lower_wpos_center: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	867fe35a16	nir/lower_io_to_scalar: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	3fe0363dda	nir/lower_io_arrays_to_elements: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8cc0f92492	nir/linking_helpers: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8410cf66d7	nir/propagate_invariant: Skip unknown vars If we can't find the variable from the deref, just assume it isn't invariant and continue on. This can happen if, for instance, we're writing to a deref that points into an SSBO. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Ian Romanick	29e4b949b4	Revert "nir/lower_indirect: Bail early if modes == 0" "There's no point in walking the program if we're never going to actually lower anything." Except we might lower compacted local arrays. In that case, modes will be 0, but there is still lowering to be done. This reverts commit `7f75cf2a94`. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081 Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org>	2018-12-18 10:47:54 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Eric Anholt	708d8f4d0a	nir: Fix clamping of uints for image store lowering. I botched some copy-and-paste and clamped to signed int max instead of uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on skl. Fixes: `d3e046e76c` ("nir: Pull some of intel's image load/store format conversion to nir_format.h") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-17 20:02:22 +00:00
Ian Romanick	9dc135efa1	nir: Release per-block metadata in nir_sweep nir_sweep already marks all metadata invalid, so it is safe to release the memory here too. mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475 gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811 deus ex mankind divided 148: 62,845,304 => 62,829,640 deus ex mankind divided 2890: 71,922,686 => 71,922,686 dirt showdown 676: 69,238,607 => 69,238,607 dolphin ubershaders 210: 77,822,072 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	7adafd6e1c	nir: Fix holes in nir_instr Found using pahole. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331 gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571 deus ex mankind divided 148: 62,887,728 => 62,845,304 deus ex mankind divided 2890: 72,399,750 => 71,922,686 dirt showdown 676: 69,464,023 => 69,238,607 dolphin ubershaders 210: 78,359,728 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	8161a87b24	nir/phi_builder: Use per-value hash table to store [block] -> def mapping Replace the old array in each value with a hash table in each value. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,399,750 dirt showdown 676: 74,466,431 => 69,464,023 dolphin ubershaders 210: 109,630,376 => 78,359,728 Run-time change for a full run on shader-db on my Haswell desktop (with -march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9 seconds on a 237 second run. The first time I sent this version of this patch out, the run-time data was quite different. I had misconfigured the script that ran the test, and none of the tests from higher GLSL versions were run. These are generally more complex shaders, and they are more affected by this change. The previous version of this patch used a single hash table for the whole phi builder. The mapping was from [value, block] -> def, so a separate allocation was needed for each [value, block] tuple. There was quite a bit of per-allocation overhead (due to ralloc), so the patch was followed by a patch that added the use of the slab allocator. The results of those two patches was not quite as good: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,402,222 * dirt showdown 676: 74,466,431 => 72,443,591 * dolphin ubershaders 210: 109,630,376 => 81,034,320 * The * denote tests that are better now. In the tests that are the same in both patches, the "after" peak memory usage was at a different location. I did not check the local peaks. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Jason Ekstrand	6bcd2af086	nir/algebraic: Add some optimizations for D3D-style Booleans D3D Booleans use a 32-bit 0/-1 representation. Because this previously matched NIR exactly, we didn't have to really optimize for it. Now that we have 1-bit Booleans, we need some specific optimizations to chew through the D3D12-style Booleans. Shader-db results on Kaby Lake: total instructions in shared programs: 15136811 -> 14967944 (-1.12%) instructions in affected programs: 2457021 -> 2288154 (-6.87%) helped: 8318 HURT: 10 total cycles in shared programs: 373544524 -> 359701825 (-3.71%) cycles in affected programs: 151029683 -> 137186984 (-9.17%) helped: 7749 HURT: 682 total loops in shared programs: 4431 -> 4399 (-0.72%) loops in affected programs: 32 -> 0 helped: 21 HURT: 0 total spills in shared programs: 10290 -> 10051 (-2.32%) spills in affected programs: 2532 -> 2293 (-9.44%) helped: 18 HURT: 18 total fills in shared programs: 22203 -> 21732 (-2.12%) fills in affected programs: 3319 -> 2848 (-14.19%) helped: 18 HURT: 18 Note that a large chunk of the improvement fixing regressions caused by switching to 1-bit Booleans. Previously, our ability to optimize D3D booleans was improved by using the D3D representation directly in NIR. Now that NIR does 1-bit bools, we need a few more optimizations. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3b30814791	nir/algebraic: Optimize 1-bit Booleans Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	44227453ec	nir: Switch to using 1-bit Booleans for almost everything This is a squash of a few distinct changes: glsl,spirv: Generate 1-bit Booleans Revert "Use 32-bit opcodes in the NIR producers and optimizations" Revert "nir/builder: Generate 32-bit bool opcodes transparently" nir/builder: Generate 1-bit Booleans in nir_build_imm_bool Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	11dc130779	nir: Add a bool to int32 lowering pass We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	191a1dce92	nir: Add 1-bit Boolean opcodes We also have to add support for 1-bit integers while we're here so we get 1-bit variants of iand, ior, and inot. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	615cc26b97	nir/algebraic: Generalize an optimization This just makes it nicely scale across bit sizes. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	487514ae61	nir/large_constants: Properly handle 1-bit bools Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3191a82372	nir: Add support for 1-bit data types This commit adds support for 1-bit Booleans and integers. Booleans obviously take a value of true or false. Because we have to define the semantics of 1-bit signed and unsigned integers, we define uint1_t to take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit arithmetic is then well-defined in the usual way, just with fewer bits. The definition of int1_t and uint1_t doesn't usually matter but we do need something for purposes of constant folding. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	2fe8708ffd	nir/constant_expressions: Rework Boolean handling This commit contains three related changes. First, we define boolN_t for N = 8, 16, and 64 and move the definition of boolN_vec to the loop with the other vec definitions. Second, there's no reason why we need the != 0 on the source because that happens implicitly when it's converted to bool. Third, for destinations, we use a signed integer type and just do -(int)bool_val which will give us the 0/-1 behavior we want and neatly scales to all bit widths. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' */.c sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	b569093566	nir/algebraic: Make an optimization more specific Later in this series, bool is not going to imply 32-bit. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	517099809a	nir: Drop support for lower_b2f This was originally added for the out-of-tree Mali driver but I think we've all agreed it's easy enough for them to just do in their back-end. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	4bb1a34727	nir/algebraic: Optimize x2b(xneg(a)) -> a Shader-db results on Kaby Lake: total instructions in shared programs: 15072525 -> 15072525 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 This helps prevent regressions in later commits. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3595a0abf4	nir/constant_folding: Fix source bit size logic Instead of looking at input_sizes[i] which contains the number of components for each source, we look at the bit size of input_types[i]. This fixes a regression in the 1-bit boolean series though I have no idea how we haven't seen it before now. Fixes: `35baee5dce` "nir/constant_folding: fix incorrect bit-size check" Fixes: `9076c4e289` "nir: update opcode definitions for different bit sizes" Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	e17426058c	nir/lower_idiv: Use ilt instead of bit twiddling The previous code was creating a boolean by doing an arithmetic right- shift by 31 which produces a boolean which is true if the argument is negative. This is the same as the expression r < 0 which is much simpler and doesn't depend on NIR's representation of booleans. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-16 21:03:02 +00:00
Rhys Perry	ed4020fabe	nir: fix constness in nir_intrinsic_align() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-16 14:56:10 +00:00
Ian Romanick	ba5402ec9a	nir/phi_builder: Internal users should use nir_phi_builder_value_set_block_def too Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-14 07:36:05 -08:00
Timothy Arceri	a2ec78883f	nir: fix opt_if_loop_last_continue() The pass did not correctly handle loops ending in: if ssa_7 { block block_8: /* preds: block_7 / continue / succs: block_1 / } else { block block_9: / preds: block_7 / break / succs: block_11 */ } The break will get eliminated by another opt but if this pass gets called first (as it does on RADV) we ended up inserting instructions after the break. Fixes: `5921a19d4b` ("nir: add if opt opt_if_loop_last_continue()") Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-12-14 17:21:35 +11:00
Eric Anholt	4407e688cd	nir: Move intel's half-float image store lowering to to nir_format.h. I needed the same function for v3d. This was originally in `d3e046e76c` ("nir: Pull some of intel's image load/store format conversion to nir_format.h") before we made am istake about simplifying the function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 12:24:26 -08:00
Eric Anholt	c2c44dba7a	nir: Print the format of image variables. This helps a lot when debugging image load/store lowering on large testcases. Unfortunately the Mesa enum name stuff is under src/mesa and we can't get at it from the compiler. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 12:24:12 -08:00
Jason Ekstrand	74492ebad9	nir: Add a pass for lowering integer division by constants It's a reasonably well-known fact in the world of compilers that integer divisions by constants can be replaced by a multiply, an add, and some shifts. This commit adds such an optimization to NIR for easiest case of udiv. Other division operations will be added in following commits. In order to provide some additional driver control, the pass takes a minimum bit size to optimize. Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Ian Romanick	090e282407	nir: Add a saturated unsigned integer add opcode Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	39198a1238	nir/lower_int64: Add support for [iu]mul_high Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Jason Ekstrand	9525971e2b	nir: Allow [iu]mul_high on non-32-bit types Reviewed-by: Ian Romanick ian.d.romanick@intel.com	2018-12-13 17:49:48 +00:00
Alejandro Piñeiro	c7bdcd67aa	nir: remove unused variable To avoid the following warning: ./src/compiler/nir/nir_loop_analyze.c:807:16: warning: unused variable ‘ns’ [-Wunused-variable] nir_shader *ns = impl->function->shader; Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-13 16:35:21 +01:00
Eric Anholt	d3e046e76c	nir: Pull some of intel's image load/store format conversion to nir_format.h I needed the same functions for v3d. Note that the color value in the Intel lowering has already been cut down to image.chans num_components. v2: Drop the half float one, since it was a 1-liner after cleanup. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 16:09:43 -08:00
Eric Anholt	19c7cba2ab	nir: Add some more consts to the nir_format_convert.h helpers. Most of the bits were constant, but a few were missed. Avoids warnings from v3d's upcoming static const bits declarations. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 16:09:37 -08:00
Timothy Arceri	9e6b39e1d5	nir: detect more induction variables This allows loop analysis to detect inductions variables that are incremented in both branches of an if rather than in a main loop block. For example: loop { block block_1: /* preds: block_0 block_7 / vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20 vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4 vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4 vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21 vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22 vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9 vec1 32 ssa_14 = ige ssa_8, ssa_5 / succs: block_2 block_3 / if ssa_14 { block block_2: / preds: block_1 / break / succs: block_8 / } else { block block_3: / preds: block_1 / / succs: block_4 / } block block_4: / preds: block_3 / vec1 32 ssa_15 = ilt ssa_6, ssa_8 / succs: block_5 block_6 / if ssa_15 { block block_5: / preds: block_4 / vec1 32 ssa_16 = iadd ssa_8, ssa_7 vec1 32 ssa_17 = load_const (0x3f800000 / 1.000000/) / succs: block_7 / } else { block block_6: / preds: block_4 / vec1 32 ssa_18 = iadd ssa_8, ssa_7 vec1 32 ssa_19 = load_const (0x3f800000 / 1.000000/) / succs: block_7 / } block block_7: / preds: block_5 block_6 / vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18 vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4 vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19 / succs: block_1 */ } Unfortunatly GCM could move the addition out of the if for us (making this patch unrequired) but we still cannot enable the GCM pass without regressions. This unrolls a loop in Rise of The Tomb Raider. vkpipeline-db results (VEGA): Totals from affected shaders: SGPRS: 88 -> 96 (9.09 %) VGPRS: 56 -> 52 (-7.14 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 2168 -> 4560 (110.33 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 4 -> 4 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211	2018-12-13 10:58:35 +11:00

... 2 3 4 5 6 ...

1510 Commits