KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Danylo Piliaiev	b8d486f298	nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8 Adreno GPUs has native instruction for unsigned and mixed dot_4x8 but not signed dot product. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:20:39 +02:00
Daniel Schürmann	17ecd0b31a	nir/opt_algebraic: lower fneg_hi/lo to fmul This pattern, found in the FSR upscaling shader, helps the vectorization efforts by keeping the chain of vectorized instructions intact. Radeon can optimize it to per-component fneg modifiers. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>	2021-12-21 13:23:37 +01:00
Rhys Perry	403ae3b48e	nir/algebraic: optimize more 64-bit imul with constant source Two 64-bit shifts and an addition are usually faster than the several multiplications nir_lower_int64 creates. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>	2021-12-17 18:51:24 +00:00
Rhys Perry	a2d8c5b26d	nir/algebraic: optimize a*#b & -4 fossil-db (Sienna Cichlid): Totals from 611 (0.47% of 128647) affected shaders: CodeSize: 3096680 -> 3090976 (-0.18%) Instrs: 570494 -> 569249 (-0.22%) Latency: 5765865 -> 5759619 (-0.11%) InvThroughput: 969840 -> 967608 (-0.23%) VClause: 9690 -> 9688 (-0.02%) Copies: 42884 -> 42894 (+0.02%); split: -0.01%, +0.03% PreVGPRs: 28290 -> 28288 (-0.01%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13752>	2021-12-03 13:41:07 +00:00
Rhys Perry	12294026d5	nir/algebraic: optimize Cyberpunk 2077's open-coded bitfieldReverse() fossil-db (Sienna Cichlid): Totals from 9 (0.01% of 128647) affected shaders: CodeSize: 29900 -> 28640 (-4.21%) Instrs: 5677 -> 5443 (-4.12%) Latency: 96561 -> 95025 (-1.59%) Copies: 571 -> 544 (-4.73%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13673>	2021-11-05 09:31:04 +00:00
Jason Ekstrand	b71bdc3404	nir/algebraic: Add some opts for comparisons of comparisons Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Jason Ekstrand	7abf3955ca	nir/algebraic: Add some boolean optimizations Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Jason Ekstrand	c8b2be0b95	nir/algebraic: Lower fisfinite Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Ian Romanick	cb28361642	nir/algebraic: Small optimizations for SpvOpFOrdNotEqual and SpvOpFUnordEqual No shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 144380118 -> 143692823 (-0.5%) SENDs in all programs: 6920822 -> 6920822 (+0.0%) Loops in all programs: 38299 -> 38299 (+0.0%) Cycles in all programs: 8434782176 -> 8423078994 (-0.1%) Spills in all programs: 206830 -> 204469 (-1.1%) Fills in all programs: 318737 -> 313660 (-1.6%) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>	2021-10-06 01:53:47 +00:00
Rhys Perry	a1af902531	nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants This allows for more MAD/FMA instructions to be created. fossil-db (Sienna Cichlid): Totals from 50134 (33.46% of 149839) affected shaders: VGPRs: 2436536 -> 2436000 (-0.02%); split: -0.05%, +0.03% SpillSGPRs: 13136 -> 13135 (-0.01%); split: -0.02%, +0.02% CodeSize: 206621424 -> 206278292 (-0.17%); split: -0.23%, +0.07% MaxWaves: 1116804 -> 1117448 (+0.06%); split: +0.07%, -0.01% Instrs: 38977460 -> 38862886 (-0.29%); split: -0.33%, +0.04% Latency: 832425389 -> 827432260 (-0.60%); split: -0.63%, +0.03% InvThroughput: 184193457 -> 183563350 (-0.34%); split: -0.37%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7458>	2021-09-17 17:28:26 +00:00
Rhys Perry	41ecef7855	nir: add sdot_2x16 and udot_2x16 opcodes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Rhys Perry	ae00f5af61	nir: separate lower_add_sat Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Samuel Pitoiset	cff106c4b6	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(fabs(b), -a) and fmin(-fmax(b, a)) to fmin(-fabs(b), -a). fossils-db (Sienna Cichlid): Totals from 34 (0.02% of 150170) affected shaders: CodeSize: 388540 -> 387748 (-0.20%) Instrs: 74621 -> 74423 (-0.27%) Latency: 1039407 -> 1039011 (-0.04%) InvThroughput: 208364 -> 208150 (-0.10%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12519>	2021-08-25 07:18:24 +02:00
Ian Romanick	a6db40605e	nir/algebraic: Add some extract optimizations These help quite a bit when vectored versions of SpvOpSDotKHR and friends are emitted as packed versions and then lowered. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	839495efc6	nir/algebraic: Add lowering for dot_4x8 instructions v2: Fix copy-and-paste bugs in lowering patterns. v3: Add has_sudot_4x8 flag. Requested by Rhys. v4: Since the names of the opcodes changed from dp4 to dot_4x8, also change the names of the lowering helpers. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	806cd2341c	nir/algebraic: Basic patterns for dot_4x8 v2: Add and modify patterns to let constant folding do better. v3: Remove '(is_not_zero)' from the patterns that try to combine addends. I honestly don't know why I had it there in the first place, and nothing in my deep git logs could help clue me in. Noticed by Alyssa. Remover patterns that detect open-coded udot_4x8. Suggested by Alyssa and Jason. Add missing sudot_4x8 patterns. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Daniel Schürmann	2cf164feb9	nir/opt_algebraic: optimize flrp(fadd, fadd, x) only if fadd are used_once Totals from 201 (0.13% of 150170) affected shaders: (GFX10.3) VGPRs: 13880 -> 13856 (-0.17%) CodeSize: 1517328 -> 1518124 (+0.05%); split: -0.04%, +0.10% MaxWaves: 3184 -> 3192 (+0.25%) Instrs: 285487 -> 285569 (+0.03%); split: -0.06%, +0.08% Latency: 7774066 -> 7780877 (+0.09%); split: -0.10%, +0.19% InvThroughput: 1936341 -> 1935287 (-0.05%); split: -0.07%, +0.02% SClause: 11446 -> 11448 (+0.02%); split: -0.01%, +0.03% Copies: 17500 -> 17506 (+0.03%); split: -0.51%, +0.55% Branches: 8174 -> 8180 (+0.07%); split: -0.13%, +0.21% PreVGPRs: 12507 -> 12427 (-0.64%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>	2021-08-24 16:10:30 +00:00
Samuel Pitoiset	f4b858e746	Revert "nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)" This is wrong for negative values. This reverts commit `07cd30ca29`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12515>	2021-08-24 08:58:38 +00:00
Samuel Pitoiset	07cd30ca29	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a) Found with Cyberpunk 2077. fossils-db (GFX10.3): Totals from 128 (2.34% of 5465) affected shaders: CodeSize: 769720 -> 767656 (-0.27%); split: -0.27%, +0.00% Instrs: 145748 -> 145229 (-0.36%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11604>	2021-08-23 17:53:38 +00:00
Ian Romanick	f0a8a9816a	nir: intel/compiler: Add and use nir_op_pack_32_4x8_split A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results in a lot of shifts and MOVs. When that pattern can be recognized, the individual 8-bit components can be packed much more efficiently. v2: Rebase on `b4369de27f` ("nir/lower_packing: use shader_instructions_pass") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Ian Romanick	89f639c0ca	nir/algebraic: Remove spurious conversions from inside logic ops Not only does this eliminate a bunch of unnecessary type converting MOVs, but it can also enable some SWAR. The dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does something about like: c = a.x ^ b.x; d = a.y ^ b.y; e = a.z ^ b.z; After this change, it looks more like: uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b); c = extract_u8(t, 0); d = extract_u8(t, 1); e = extract_u8(t, 2); On Ice Lake, this results in: SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Ian Romanick	a147717a93	nir/algebraic: Optimize some extract forms resulting from 8-bit lowering This eliminates some spurious, size-converting moves. For example, on Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag: SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends v2: Condition two of the patterns on !options->lower_extract_byte. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Rhys Perry	ed70b256ce	nir: add ffma creation helpers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	4ec4d862c2	nir/algebraic: add is_used_once to dot product reassociation optimization This improves register usage. fossil-db (Sienna Cichlid, on top of !9805): Totals from 4317 (2.88% of 149839) affected shaders: VGPRs: 352592 -> 351704 (-0.25%); split: -1.48%, +1.23% SpillSGPRs: 182 -> 248 (+36.26%) CodeSize: 31601192 -> 31587624 (-0.04%); split: -0.09%, +0.04% MaxWaves: 56964 -> 57298 (+0.59%); split: +2.48%, -1.90% Instrs: 5973557 -> 5974122 (+0.01%); split: -0.05%, +0.06% Latency: 72088175 -> 72253033 (+0.23%); split: -0.36%, +0.59% InvThroughput: 14978160 -> 14798919 (-1.20%); split: -1.29%, +0.09% VClause: 100994 -> 98645 (-2.33%); split: -3.05%, +0.73% SClause: 278206 -> 276820 (-0.50%); split: -0.54%, +0.04% Copies: 200264 -> 199556 (-0.35%); split: -1.17%, +0.82% Branches: 86410 -> 85930 (-0.56%); split: -0.56%, +0.01% PreSGPRs: 207355 -> 207759 (+0.19%); split: -0.00%, +0.20% PreVGPRs: 314646 -> 310911 (-1.19%); split: -1.35%, +0.17% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	f95a16be72	nir/algebraic: reassociate add chains for more MAD/FMA-friendly code fossil-db (GFX10.3): Totals from 25866 (17.68% of 146267) affected shaders: VGPRs: 1625456 -> 1644936 (+1.20%); split: -0.05%, +1.24% SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03% CodeSize: 161604460 -> 161458052 (-0.09%); split: -0.11%, +0.02% MaxWaves: 454842 -> 452160 (-0.59%); split: +0.04%, -0.63% Instrs: 30652596 -> 30456446 (-0.64%); split: -0.65%, +0.01% Latency: 723098749 -> 722084247 (-0.14%); split: -0.21%, +0.07% InvThroughput: 166023468 -> 165506875 (-0.31%); split: -0.36%, +0.05% fossil-db (GFX10): Totals from 25866 (17.68% of 146267) affected shaders: VGPRs: 1593576 -> 1611976 (+1.15%); split: -0.09%, +1.25% SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03% CodeSize: 162294468 -> 162154456 (-0.09%); split: -0.11%, +0.02% MaxWaves: 477448 -> 474166 (-0.69%); split: +0.10%, -0.79% Instrs: 30820164 -> 30625805 (-0.63%); split: -0.65%, +0.02% Latency: 723190249 -> 722273445 (-0.13%); split: -0.20%, +0.08% InvThroughput: 163114872 -> 162582966 (-0.33%); split: -0.37%, +0.04% fossil-db (GFX9): Totals from 25866 (17.67% of 146401) affected shaders: SGPRs: 2167808 -> 2169920 (+0.10%); split: -0.09%, +0.19% VGPRs: 1649404 -> 1667592 (+1.10%); split: -0.43%, +1.53% CodeSize: 161273556 -> 161281996 (+0.01%); split: -0.07%, +0.08% MaxWaves: 114910 -> 113519 (-1.21%); split: +0.10%, -1.31% Instrs: 31557180 -> 31403708 (-0.49%); split: -0.50%, +0.02% Latency: 899594793 -> 898786283 (-0.09%); split: -0.19%, +0.10% InvThroughput: 412265691 -> 411551698 (-0.17%); split: -0.28%, +0.11% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	110bcb4919	nir/algebraic: add various ffma optimizations fossil-db (GFX10.3): Totals from 7532 (5.15% of 146267) affected shaders: VGPRs: 414696 -> 414304 (-0.09%); split: -0.18%, +0.08% CodeSize: 33393444 -> 33375908 (-0.05%); split: -0.13%, +0.08% MaxWaves: 149854 -> 150094 (+0.16%); split: +0.27%, -0.11% Instrs: 6279823 -> 6271364 (-0.13%); split: -0.18%, +0.05% Latency: 60308898 -> 60296025 (-0.02%); split: -0.13%, +0.11% InvThroughput: 13770542 -> 13745192 (-0.18%); split: -0.24%, +0.06% fossil-db (GFX10): Totals from 7532 (5.15% of 146267) affected shaders: VGPRs: 406664 -> 405564 (-0.27%); split: -0.39%, +0.12% CodeSize: 33544656 -> 33527568 (-0.05%); split: -0.13%, +0.08% MaxWaves: 158584 -> 158858 (+0.17%); split: +0.30%, -0.13% Instrs: 6316242 -> 6307913 (-0.13%); split: -0.18%, +0.05% Latency: 60243290 -> 60232844 (-0.02%); split: -0.13%, +0.11% InvThroughput: 13643345 -> 13620171 (-0.17%); split: -0.24%, +0.07% fossil-db (GFX9): Totals from 7543 (5.15% of 146401) affected shaders: SGPRs: 546384 -> 547472 (+0.20%); split: -0.08%, +0.28% VGPRs: 412636 -> 411896 (-0.18%); split: -0.27%, +0.09% CodeSize: 33216196 -> 33210564 (-0.02%); split: -0.12%, +0.11% MaxWaves: 38771 -> 38789 (+0.05%); split: +0.17%, -0.12% Instrs: 6419878 -> 6414891 (-0.08%); split: -0.18%, +0.11% Latency: 70972327 -> 70922754 (-0.07%); split: -0.15%, +0.08% InvThroughput: 33949039 -> 33909258 (-0.12%); split: -0.20%, +0.08% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Eric Engestrom	f1eae2f8bb	python: drop python2 support Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3674>	2021-08-14 21:44:32 +00:00
Ian Romanick	84d2e53789	Revert "nir/algebraic: Convert some f2u to f2i" Per https://gitlab.freedesktop.org/mesa/mesa/-/issues/5178#note_1019666, the assumption fundamental to this optimization is false. Section 2.4.1 (Float to Integer) of Ivy Bridge PRMs describes the situation. The wording of the section is somewhat confusing (because it doesn't clearly delineate between signed and unsigned integers), but the last two rows of the table make it clear that F->UD conversion clamps negative float values to 0. All other hardware mentioned in that thread seems to behave the same way. The real problem is that, with hardware that behaves in this ways, converting f2u(2147483648.0) to f2i(2147483648.0) changes the bit pattern that would be produced from 0x80000000 to 0x7fffffff. This reverts commit `ad05920258`. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12297>	2021-08-10 22:16:13 +00:00
Rhys Perry	4e2b94331b	nir/algebraic: improve irem by power-of-two optimization Requires one less instruction. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	b009467b81	nir/algebraic: add optimizations for imul(a, INT_MIN) is_pos_power_of_two would catch this, but nir_op_imul has signed sources, so is_neg_power_of_two catches it instead, which creates a useless nir_op_ineg. fossil-db (Sienna Cichlid): Totals from 1014 (0.68% of 150170) affected shaders: CodeSize: 3592296 -> 3592288 (-0.00%); split: -0.00%, +0.00% Instrs: 671211 -> 670426 (-0.12%) Latency: 5268917 -> 5268479 (-0.01%); split: -0.01%, +0.00% InvThroughput: 2187349 -> 2187343 (-0.00%); split: -0.00%, +0.00% VClause: 8634 -> 8636 (+0.02%) Copies: 97585 -> 97604 (+0.02%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	65cd5a0f22	nir/algebraic: don't optimize umod/imod/irem if lower_bitops=true Match the udiv/idiv/imul by power-of-two optimizations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Rhys Perry	ec4b425f59	nir/algebraic: fix imod by negative power-of-two If "a" is a multiple of "b", then the result would have been "b" instead of 0. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: `0ef5f3552f` ("nir: add strength reduction pattern for imod/irem with pow2 divisor.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12039>	2021-08-09 11:00:39 +00:00
Dave Airlie	ad92c2b253	nir: add fisnormal lowering just lower the 32-bit version for now. Thanks to alyssa for this suggested lowering. Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12207>	2021-08-06 14:27:48 +10:00
Sagar Ghuge	06ab737686	nir: Add optimizations for iadd3 This patch also adds has_iadd3 bit to give more control if backend supports ternary add instruction or not. v2: - Add patterns in late optimization (Connor Abbott) Suggested-by: Alyssa/Jason Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11596>	2021-07-16 15:59:56 +00:00
Jason Ekstrand	2e08bae9b3	nir,vc4: Suffix a bunch of unorm 4x8 opcodes _vc4 Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11463>	2021-06-21 09:04:08 -05:00
Rhys Perry	1cbcfb8b38	nir, nir/algebraic: add byte/word insertion instructions Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>	2021-06-08 08:57:42 +00:00
Rhys Perry	edae3e5623	nir/algebraic: optimize extract of extract Found in some sottr shaders (originally iand(ishr(a, 16), 0xffff)) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3151>	2021-06-08 08:57:42 +00:00
Ian Romanick	d246c31ec1	nir/algebraic: Add algebraic opt for float comparisons with identical operands. The flt version could have been added in `56e21647e2`, but our collective understanding of NaN and comparisons was poor in 2015. The new "is_a_number" predicate makes the others possible. All of the helped shaders in shader-db are either from Mad Max or Skia. Some of the Skia shaders just get decimated by this change: instructions helped: shaders/skia/580-4.shader_test FS SIMD8: 81 -> 29 (-64.20%) (scheduled: top-down) I looked at a couple of those shaders, and they had sequences like: vec1 32 ssa_44 = flt32 ssa_32, ssa_32 vec1 32 ssa_45 = b32csel ssa_44, ssa_43, ssa_0 vec1 32 ssa_46 = fge32 ssa_32, ssa_32 vec1 32 ssa_47 = b32csel ssa_46, ssa_0, ssa_45 vec1 32 ssa_48 = iand ssa_46, ssa_44 vec1 32 ssa_49 = b32csel ssa_48, ssa_43, ssa_0 ssa_44 is replaced with False. Then ssa_47 selects between ssa_0 and ssa_0, so ssa_47 and ssa_46 are eliminated. ssa_48 is (False && don't care), so ssa_48 and ssa_49 are eliminated. After that, many calculations now involve constants of zero, so they are optimized down too. So it continues until there's not much left! Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> All Intel platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 21072238 -> 21071386 (<.01%) instructions in affected programs: 33722 -> 32870 (-2.53%) helped: 146 HURT: 1 helped stats (abs) min: 1 max: 62 x̄: 5.84 x̃: 2 helped stats (rel) min: 0.19% max: 62.35% x̄: 4.09% x̃: 1.07% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.20% max: 0.20% x̄: 0.20% x̃: 0.20% 95% mean confidence interval for instructions value: -7.94 -3.65 95% mean confidence interval for instructions %-change: -5.87% -2.25% Instructions are helped. total cycles in shared programs: 856203326 -> 856192238 (<.01%) cycles in affected programs: 749966 -> 738878 (-1.48%) helped: 148 HURT: 0 helped stats (abs) min: 1 max: 1226 x̄: 74.92 x̃: 18 helped stats (rel) min: 0.07% max: 49.70% x̄: 2.69% x̃: 0.46% 95% mean confidence interval for cycles value: -104.82 -45.02 95% mean confidence interval for cycles %-change: -4.01% -1.37% Cycles are helped. LOST: 4 GAINED: 0 Fossil-db results: Tiger Lake Instructions in all programs: 160915223 -> 160898354 (-0.0%) SENDs in all programs: 6812780 -> 6812780 (+0.0%) Loops in all programs: 38340 -> 38340 (+0.0%) Cycles in all programs: 7434144207 -> 7433978462 (-0.0%) Spills in all programs: 192582 -> 192582 (+0.0%) Fills in all programs: 304537 -> 304537 (+0.0%) Ice Lake Instructions in all programs: 145296298 -> 145279531 (-0.0%) SENDs in all programs: 6863692 -> 6863692 (+0.0%) Loops in all programs: 38334 -> 38334 (+0.0%) Cycles in all programs: 8800257014 -> 8800088384 (-0.0%) Spills in all programs: 216880 -> 216880 (+0.0%) Fills in all programs: 334248 -> 334248 (+0.0%) Skylake Instructions in all programs: 135891664 -> 135874910 (-0.0%) SENDs in all programs: 6802946 -> 6802946 (+0.0%) Loops in all programs: 38331 -> 38331 (+0.0%) Cycles in all programs: 8444273433 -> 8444130932 (-0.0%) Spills in all programs: 194839 -> 194839 (+0.0%) Fills in all programs: 301114 -> 301114 (+0.0%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	64bcfc3a17	nir/algebraic: Rearrange some logic-joined comparisons and reduce On Skylake and Broadwell, a single big compute shader in Dirt Rally has spills and fills REALLY helped. That same shader is hurt very slightly for spills and fills on Ice Lake. v2: Move the patterns earlier to be nearer other patterns that are similar. Mark the replacement fmin and fmax exact. Both suggested by Rhys. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tiger Lake total instructions in shared programs: 21073812 -> 21073041 (<.01%) instructions in affected programs: 77608 -> 76837 (-0.99%) helped: 522 HURT: 33 helped stats (abs) min: 1 max: 26 x̄: 1.58 x̃: 1 helped stats (rel) min: 0.22% max: 14.29% x̄: 1.29% x̃: 1.02% HURT stats (abs) min: 1 max: 8 x̄: 1.67 x̃: 1 HURT stats (rel) min: 0.25% max: 3.42% x̄: 1.06% x̃: 0.86% 95% mean confidence interval for instructions value: -1.57 -1.20 95% mean confidence interval for instructions %-change: -1.25% -1.05% Instructions are helped. total cycles in shared programs: 856224346 -> 856211096 (<.01%) cycles in affected programs: 2394231 -> 2380981 (-0.55%) helped: 603 HURT: 25 helped stats (abs) min: 1 max: 5218 x̄: 59.37 x̃: 28 helped stats (rel) min: 0.06% max: 5.61% x̄: 1.52% x̃: 1.37% HURT stats (abs) min: 2 max: 21394 x̄: 901.92 x̃: 10 HURT stats (rel) min: 0.02% max: 5.90% x̄: 0.95% x̃: 0.59% 95% mean confidence interval for cycles value: -93.61 51.41 95% mean confidence interval for cycles %-change: -1.50% -1.34% Inconclusive result (value mean confidence interval includes 0). LOST: 1 GAINED: 1 Ice Lake total instructions in shared programs: 20025692 -> 20024554 (<.01%) instructions in affected programs: 104981 -> 103843 (-1.08%) helped: 738 HURT: 0 helped stats (abs) min: 1 max: 30 x̄: 1.54 x̃: 1 helped stats (rel) min: 0.31% max: 10.53% x̄: 1.20% x̃: 1.06% 95% mean confidence interval for instructions value: -1.66 -1.43 95% mean confidence interval for instructions %-change: -1.26% -1.14% Instructions are helped. total cycles in shared programs: 979474407 -> 979422333 (<.01%) cycles in affected programs: 4136364 -> 4084290 (-1.26%) helped: 759 HURT: 59 helped stats (abs) min: 2 max: 11010 x̄: 72.78 x̃: 28 helped stats (rel) min: 0.03% max: 6.43% x̄: 1.23% x̃: 1.02% HURT stats (abs) min: 1 max: 698 x̄: 53.66 x̃: 8 HURT stats (rel) min: 0.02% max: 24.05% x̄: 1.64% x̃: 0.33% 95% mean confidence interval for cycles value: -97.08 -30.24 95% mean confidence interval for cycles %-change: -1.14% -0.91% Cycles are helped. total spills in shared programs: 10568 -> 10569 (<.01%) spills in affected programs: 102 -> 103 (0.98%) helped: 0 HURT: 1 total fills in shared programs: 11347 -> 11349 (0.02%) fills in affected programs: 277 -> 279 (0.72%) helped: 0 HURT: 1 LOST: 2 GAINED: 2 Skylake total instructions in shared programs: 18190419 -> 18188523 (-0.01%) instructions in affected programs: 102502 -> 100606 (-1.85%) helped: 791 HURT: 0 helped stats (abs) min: 1 max: 676 x̄: 2.40 x̃: 1 helped stats (rel) min: 0.34% max: 20.23% x̄: 1.41% x̃: 1.23% 95% mean confidence interval for instructions value: -4.07 -0.72 95% mean confidence interval for instructions %-change: -1.47% -1.34% Instructions are helped. total cycles in shared programs: 960737969 -> 960498951 (-0.02%) cycles in affected programs: 4435351 -> 4196333 (-5.39%) helped: 804 HURT: 67 helped stats (abs) min: 1 max: 198540 x̄: 300.54 x̃: 24 helped stats (rel) min: 0.03% max: 25.41% x̄: 1.21% x̃: 0.92% HURT stats (abs) min: 2 max: 680 x̄: 39.06 x̃: 6 HURT stats (rel) min: 0.05% max: 23.98% x̄: 1.12% x̃: 0.19% 95% mean confidence interval for cycles value: -722.03 173.20 95% mean confidence interval for cycles %-change: -1.15% -0.91% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 9757 -> 9722 (-0.36%) spills in affected programs: 138 -> 103 (-25.36%) helped: 1 HURT: 0 total fills in shared programs: 9861 -> 9576 (-2.89%) fills in affected programs: 564 -> 279 (-50.53%) helped: 1 HURT: 0 LOST: 5 GAINED: 2 Broadwell total instructions in shared programs: 17853870 -> 17852414 (<.01%) instructions in affected programs: 101276 -> 99820 (-1.44%) helped: 777 HURT: 0 helped stats (abs) min: 1 max: 264 x̄: 1.87 x̃: 1 helped stats (rel) min: 0.34% max: 8.44% x̄: 1.37% x̃: 1.23% 95% mean confidence interval for instructions value: -2.54 -1.21 95% mean confidence interval for instructions %-change: -1.42% -1.32% Instructions are helped. total cycles in shared programs: 1029846029 -> 1029725458 (-0.01%) cycles in affected programs: 4435791 -> 4315220 (-2.72%) helped: 813 HURT: 43 helped stats (abs) min: 2 max: 68560 x̄: 149.95 x̃: 24 helped stats (rel) min: 0.02% max: 73.73% x̄: 1.43% x̃: 0.92% HURT stats (abs) min: 2 max: 726 x̄: 31.12 x̃: 13 HURT stats (rel) min: 0.01% max: 8.43% x̄: 0.62% x̃: 0.31% 95% mean confidence interval for cycles value: -299.58 17.87 95% mean confidence interval for cycles %-change: -1.63% -1.02% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 20333 -> 20307 (-0.13%) spills in affected programs: 151 -> 125 (-17.22%) helped: 1 HURT: 0 total fills in shared programs: 25899 -> 25775 (-0.48%) fills in affected programs: 573 -> 449 (-21.64%) helped: 1 HURT: 0 LOST: 5 GAINED: 0 Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Haswell shown) total instructions in shared programs: 16417658 -> 16416320 (<.01%) instructions in affected programs: 96495 -> 95157 (-1.39%) helped: 774 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 1.73 x̃: 1 helped stats (rel) min: 0.33% max: 9.80% x̄: 1.52% x̃: 1.20% 95% mean confidence interval for instructions value: -1.83 -1.63 95% mean confidence interval for instructions %-change: -1.59% -1.46% Instructions are helped. total cycles in shared programs: 1037104346 -> 1037080579 (<.01%) cycles in affected programs: 3787747 -> 3763980 (-0.63%) helped: 791 HURT: 53 helped stats (abs) min: 1 max: 5411 x̄: 65.87 x̃: 32 helped stats (rel) min: 0.02% max: 21.17% x̄: 1.44% x̃: 1.18% HURT stats (abs) min: 2 max: 14160 x̄: 534.72 x̃: 18 HURT stats (rel) min: 0.02% max: 15.37% x̄: 5.70% x̃: 0.54% 95% mean confidence interval for cycles value: -69.39 13.07 95% mean confidence interval for cycles %-change: -1.19% -0.80% Inconclusive result (value mean confidence interval includes 0). LOST: 12 GAINED: 2 GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8132855 -> 8132703 (<.01%) instructions in affected programs: 8782 -> 8630 (-1.73%) helped: 38 HURT: 0 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 1.66% max: 3.23% x̄: 1.77% x̃: 1.72% 95% mean confidence interval for instructions value: -4.00 -4.00 95% mean confidence interval for instructions %-change: -1.88% -1.65% Instructions are helped. total cycles in shared programs: 238300850 -> 238298568 (<.01%) cycles in affected programs: 257202 -> 254920 (-0.89%) helped: 62 HURT: 2 helped stats (abs) min: 4 max: 58 x̄: 36.90 x̃: 50 helped stats (rel) min: 0.15% max: 1.55% x̄: 0.87% x̃: 1.12% HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.12% max: 0.22% x̄: 0.17% x̃: 0.17% 95% mean confidence interval for cycles value: -41.34 -29.98 95% mean confidence interval for cycles %-change: -0.95% -0.73% Cycles are helped. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 145296888 -> 145296346 (-0.0%) SENDs in all programs: 6863696 -> 6863696 (+0.0%) Loops in all programs: 38334 -> 38334 (+0.0%) Cycles in all programs: 8800262303 -> 8800258950 (-0.0%) Spills in all programs: 216880 -> 216880 (+0.0%) Fills in all programs: 334248 -> 334248 (+0.0%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	adc2835646	nir/algebraic: Mark some more logic-joined comparison reductions as exact If the values are known to be numbers, the the replacements are exact. This is only applied to the patterns with constants. Constants should always be numbers, and shaders with NaN constants should be handled in a different way. No shader-db or fossil-db changes on any Intel platform. The intention is to make these patterns more future proof. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	23bbf3932b	nir/algebraic: Mark some more comparison reductions exact Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> All Haswell and later Intel platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 21049056 -> 21048939 (<.01%) instructions in affected programs: 4716 -> 4599 (-2.48%) helped: 39 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 3.00 x̃: 3 helped stats (rel) min: 0.99% max: 5.43% x̄: 2.80% x̃: 2.51% 95% mean confidence interval for instructions value: -3.46 -2.54 95% mean confidence interval for instructions %-change: -3.22% -2.38% Instructions are helped. total cycles in shared programs: 855141411 -> 855141159 (<.01%) cycles in affected programs: 54491 -> 54239 (-0.46%) helped: 28 HURT: 5 helped stats (abs) min: 2 max: 34 x̄: 12.82 x̃: 12 helped stats (rel) min: 0.06% max: 2.73% x̄: 0.94% x̃: 0.75% HURT stats (abs) min: 2 max: 52 x̄: 21.40 x̃: 6 HURT stats (rel) min: 0.11% max: 2.46% x̄: 0.90% x̃: 0.56% 95% mean confidence interval for cycles value: -13.72 -1.55 95% mean confidence interval for cycles %-change: -1.01% -0.31% Cycles are helped. Tiger Lake Instructions in all programs: 160902191 -> 160899554 (-0.0%) SENDs in all programs: 6812435 -> 6812435 (+0.0%) Loops in all programs: 38225 -> 38225 (+0.0%) Cycles in all programs: 7428581420 -> 7428555881 (-0.0%) Spills in all programs: 192582 -> 192582 (+0.0%) Fills in all programs: 304539 -> 304539 (+0.0%) A lot of fragment shaders in Shadow of the Tomb Raider were helped, and a bunch of vertex shaders in Octopath Traveler were hurt. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	7d85dc4f35	nir/algebraic: Equality comparison inversions require sources be numbers v2: Update A630 expected image checksum for minetest.trace. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tiger Lake total instructions in shared programs: 21036690 -> 21049485 (0.06%) instructions in affected programs: 852085 -> 864880 (1.50%) helped: 240 HURT: 2514 helped stats (abs) min: 1 max: 46 x̄: 2.45 x̃: 2 helped stats (rel) min: 0.15% max: 4.30% x̄: 0.79% x̃: 0.55% HURT stats (abs) min: 1 max: 198 x̄: 5.32 x̃: 2 HURT stats (rel) min: 0.06% max: 10.71% x̄: 1.48% x̃: 1.04% 95% mean confidence interval for instructions value: 4.14 5.15 95% mean confidence interval for instructions %-change: 1.23% 1.34% Instructions are HURT. total cycles in shared programs: 856045255 -> 855816220 (-0.03%) cycles in affected programs: 16743786 -> 16514751 (-1.37%) helped: 790 HURT: 1973 helped stats (abs) min: 1 max: 10766 x̄: 627.97 x̃: 18 helped stats (rel) min: <.01% max: 32.59% x̄: 3.01% x̃: 0.64% HURT stats (abs) min: 1 max: 4078 x̄: 135.36 x̃: 18 HURT stats (rel) min: <.01% max: 54.56% x̄: 2.80% x̃: 0.82% 95% mean confidence interval for cycles value: -131.36 -34.42 95% mean confidence interval for cycles %-change: 0.88% 1.40% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). total spills in shared programs: 9771 -> 9766 (-0.05%) spills in affected programs: 47 -> 42 (-10.64%) helped: 1 HURT: 0 total fills in shared programs: 9451 -> 9430 (-0.22%) fills in affected programs: 91 -> 70 (-23.08%) helped: 1 HURT: 0 LOST: 16 GAINED: 51 All Intel GPUs from Sandybridge through Ice Lake had similar results. (Ice Lake shown) total instructions in shared programs: 20024781 -> 20025568 (<.01%) instructions in affected programs: 103309 -> 104096 (0.76%) helped: 12 HURT: 389 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.20% max: 2.70% x̄: 1.36% x̃: 1.37% HURT stats (abs) min: 1 max: 8 x̄: 2.06 x̃: 1 HURT stats (rel) min: 0.05% max: 7.14% x̄: 1.25% x̃: 0.95% 95% mean confidence interval for instructions value: 1.78 2.15 95% mean confidence interval for instructions %-change: 1.06% 1.28% Instructions are HURT. total cycles in shared programs: 979419070 -> 979439180 (<.01%) cycles in affected programs: 4968711 -> 4988821 (0.40%) helped: 60 HURT: 381 helped stats (abs) min: 1 max: 1296 x̄: 96.92 x̃: 26 helped stats (rel) min: <.01% max: 27.10% x̄: 1.64% x̃: 0.65% HURT stats (abs) min: 1 max: 7320 x̄: 68.04 x̃: 30 HURT stats (rel) min: <.01% max: 19.77% x̄: 1.32% x̃: 0.87% 95% mean confidence interval for cycles value: 10.25 80.95 95% mean confidence interval for cycles %-change: 0.69% 1.15% Cycles are HURT. LOST: 1 GAINED: 2 GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8128474 -> 8132527 (0.05%) instructions in affected programs: 642323 -> 646376 (0.63%) helped: 12 HURT: 1972 helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4 helped stats (rel) min: 0.72% max: 1.72% x̄: 1.09% x̃: 0.83% HURT stats (abs) min: 1 max: 16 x̄: 2.07 x̃: 3 HURT stats (rel) min: 0.12% max: 7.14% x̄: 0.77% x̃: 0.70% 95% mean confidence interval for instructions value: 1.99 2.10 95% mean confidence interval for instructions %-change: 0.74% 0.79% Instructions are HURT. total cycles in shared programs: 238280994 -> 238294376 (<.01%) cycles in affected programs: 8841250 -> 8854632 (0.15%) helped: 84 HURT: 1192 helped stats (abs) min: 4 max: 64 x̄: 12.50 x̃: 8 helped stats (rel) min: 0.02% max: 1.61% x̄: 0.28% x̃: 0.17% HURT stats (abs) min: 2 max: 198 x̄: 12.11 x̃: 12 HURT stats (rel) min: 0.02% max: 8.03% x̄: 0.28% x̃: 0.14% 95% mean confidence interval for cycles value: 9.65 11.32 95% mean confidence interval for cycles %-change: 0.22% 0.27% Cycles are HURT. No fossil-db changes on any Intel platform. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	4246c2869c	nir/algebraic: Invert comparisons less often This fixes the piglit test range_analysis_fsat_of_nan.shader_test. That test contains some code like o = saturate(X) > 0 ? vec4(1.0, 0.0, 0.0, 1.0) : vec4(0.0, 1.0, 0.0, 1.0); A clever optimizer will convert this to o = vec4(float(saturate(X) > 0), float(!(saturate(X) > 0)), 0, 1); Due to the ordering of optimizations in the compiler, the `saturate` operations are removed. This is safe even in the presense of NaN. o = vec4(float(X > 0), float(!(X > 0)), 0, 1); Since the calculations are not marked precise, an overzealous optimizer may reduce this to o = vec4(float(X > 0), float(X <= 0), 0, 1); This will result in black being output. The GLSL spec gives quite a bit of leeway with respect to NaN, but that seems too far. The shader author asked for a result of red or green. A result of black is still "undefined behavior," but it's also a little mean. This also enables CSE to do its job better. v2: Update A530 expected image checksum for minetest.trace. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4531 Fixes: `0dbda153aa` ("nir/algebraic: Flag inexact optimizations") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tiger Lake total instructions in shared programs: 21041563 -> 21041789 (<.01%) instructions in affected programs: 992066 -> 992292 (0.02%) helped: 526 HURT: 548 helped stats (abs) min: 1 max: 16 x̄: 2.48 x̃: 2 helped stats (rel) min: 0.04% max: 5.56% x̄: 0.74% x̃: 0.49% HURT stats (abs) min: 1 max: 27 x̄: 2.80 x̃: 2 HURT stats (rel) min: 0.04% max: 4.55% x̄: 0.59% x̃: 0.38% 95% mean confidence interval for instructions value: -0.00 0.42 95% mean confidence interval for instructions %-change: -0.12% <.01% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 855885569 -> 856118189 (0.03%) cycles in affected programs: 343637248 -> 343869868 (0.07%) helped: 907 HURT: 541 helped stats (abs) min: 1 max: 7724 x̄: 206.45 x̃: 36 helped stats (rel) min: <.01% max: 29.97% x̄: 1.01% x̃: 0.37% HURT stats (abs) min: 1 max: 14177 x̄: 776.09 x̃: 31 HURT stats (rel) min: <.01% max: 29.94% x̄: 1.24% x̃: 0.35% 95% mean confidence interval for cycles value: 84.30 237.00 95% mean confidence interval for cycles %-change: -0.32% -0.01% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). LOST: 3 GAINED: 5 Ice Lake total instructions in shared programs: 20027107 -> 20025352 (<.01%) instructions in affected programs: 1068856 -> 1067101 (-0.16%) helped: 1153 HURT: 273 helped stats (abs) min: 1 max: 14 x̄: 1.83 x̃: 1 helped stats (rel) min: 0.03% max: 5.66% x̄: 0.61% x̃: 0.35% HURT stats (abs) min: 1 max: 15 x̄: 1.29 x̃: 1 HURT stats (rel) min: 0.16% max: 1.30% x̄: 0.58% x̃: 0.60% 95% mean confidence interval for instructions value: -1.33 -1.13 95% mean confidence interval for instructions %-change: -0.43% -0.34% Instructions are helped. total cycles in shared programs: 979499227 -> 979448725 (<.01%) cycles in affected programs: 344261539 -> 344211037 (-0.01%) helped: 1079 HURT: 441 helped stats (abs) min: 1 max: 9384 x̄: 147.78 x̃: 48 helped stats (rel) min: <.01% max: 31.83% x̄: 0.90% x̃: 0.33% HURT stats (abs) min: 1 max: 7220 x̄: 247.07 x̃: 32 HURT stats (rel) min: <.01% max: 31.30% x̄: 1.52% x̃: 0.53% 95% mean confidence interval for cycles value: -70.01 3.56 95% mean confidence interval for cycles %-change: -0.35% -0.05% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 10564 -> 10568 (0.04%) spills in affected programs: 143 -> 147 (2.80%) helped: 0 HURT: 1 total fills in shared programs: 11343 -> 11347 (0.04%) fills in affected programs: 287 -> 291 (1.39%) helped: 0 HURT: 1 LOST: 3 GAINED: 2 Skylake total instructions in shared programs: 18192274 -> 18190128 (-0.01%) instructions in affected programs: 1000188 -> 998042 (-0.21%) helped: 1149 HURT: 55 helped stats (abs) min: 1 max: 14 x̄: 1.92 x̃: 1 helped stats (rel) min: 0.04% max: 6.67% x̄: 0.67% x̃: 0.42% HURT stats (abs) min: 1 max: 2 x̄: 1.05 x̃: 1 HURT stats (rel) min: 0.16% max: 0.55% x̄: 0.27% x̃: 0.26% 95% mean confidence interval for instructions value: -1.87 -1.69 95% mean confidence interval for instructions %-change: -0.67% -0.58% Instructions are helped. total cycles in shared programs: 960856054 -> 960728040 (-0.01%) cycles in affected programs: 340840968 -> 340712954 (-0.04%) helped: 1079 HURT: 233 helped stats (abs) min: 1 max: 7640 x̄: 170.95 x̃: 46 helped stats (rel) min: <.01% max: 30.20% x̄: 0.96% x̃: 0.28% HURT stats (abs) min: 1 max: 6864 x̄: 242.23 x̃: 26 HURT stats (rel) min: <.01% max: 34.64% x̄: 2.10% x̃: 0.22% 95% mean confidence interval for cycles value: -135.62 -59.53 95% mean confidence interval for cycles %-change: -0.59% -0.25% Cycles are helped. LOST: 15 GAINED: 1 Broadwell total instructions in shared programs: 17855624 -> 17853580 (-0.01%) instructions in affected programs: 1012209 -> 1010165 (-0.20%) helped: 1105 HURT: 52 helped stats (abs) min: 1 max: 13 x̄: 1.90 x̃: 1 helped stats (rel) min: 0.03% max: 6.67% x̄: 0.67% x̃: 0.36% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.13% max: 0.52% x̄: 0.26% x̃: 0.25% 95% mean confidence interval for instructions value: -1.86 -1.67 95% mean confidence interval for instructions %-change: -0.68% -0.58% Instructions are helped. total cycles in shared programs: 1029905447 -> 1029840699 (<.01%) cycles in affected programs: 347102680 -> 347037932 (-0.02%) helped: 1007 HURT: 211 helped stats (abs) min: 1 max: 1360 x̄: 89.76 x̃: 48 helped stats (rel) min: <.01% max: 16.26% x̄: 0.69% x̃: 0.25% HURT stats (abs) min: 1 max: 1297 x̄: 121.51 x̃: 20 HURT stats (rel) min: <.01% max: 31.31% x̄: 1.21% x̃: 0.20% 95% mean confidence interval for cycles value: -62.39 -43.92 95% mean confidence interval for cycles %-change: -0.47% -0.25% Cycles are helped. total spills in shared programs: 20335 -> 20333 (<.01%) spills in affected programs: 19 -> 17 (-10.53%) helped: 2 HURT: 0 total fills in shared programs: 25905 -> 25899 (-0.02%) fills in affected programs: 23 -> 17 (-26.09%) helped: 2 HURT: 0 LOST: 9 GAINED: 0 Haswell total instructions in shared programs: 16418516 -> 16417293 (<.01%) instructions in affected programs: 223785 -> 222562 (-0.55%) helped: 590 HURT: 67 helped stats (abs) min: 1 max: 15 x̄: 2.19 x̃: 1 helped stats (rel) min: 0.03% max: 6.52% x̄: 0.87% x̃: 0.60% HURT stats (abs) min: 1 max: 2 x̄: 1.04 x̃: 1 HURT stats (rel) min: 0.04% max: 1.85% x̄: 0.44% x̃: 0.25% 95% mean confidence interval for instructions value: -2.01 -1.71 95% mean confidence interval for instructions %-change: -0.80% -0.67% Instructions are helped. total cycles in shared programs: 1037179754 -> 1037084874 (<.01%) cycles in affected programs: 352541071 -> 352446191 (-0.03%) helped: 1093 HURT: 182 helped stats (abs) min: 1 max: 888 x̄: 111.03 x̃: 64 helped stats (rel) min: <.01% max: 27.30% x̄: 0.84% x̃: 0.20% HURT stats (abs) min: 1 max: 6777 x̄: 145.49 x̃: 21 HURT stats (rel) min: <.01% max: 24.10% x̄: 1.99% x̃: 0.29% 95% mean confidence interval for cycles value: -88.10 -60.73 95% mean confidence interval for cycles %-change: -0.58% -0.29% Cycles are helped. total spills in shared programs: 17457 -> 17456 (<.01%) spills in affected programs: 12 -> 11 (-8.33%) helped: 1 HURT: 0 total fills in shared programs: 20387 -> 20385 (<.01%) fills in affected programs: 15 -> 13 (-13.33%) helped: 1 HURT: 0 LOST: 6 GAINED: 1 Ivy Bridge and earlier platforms had similar results. (Ivy Bridge shown) total instructions in shared programs: 15515482 -> 15513998 (<.01%) instructions in affected programs: 239739 -> 238255 (-0.62%) helped: 573 HURT: 57 helped stats (abs) min: 1 max: 20 x̄: 2.73 x̃: 2 helped stats (rel) min: 0.03% max: 9.84% x̄: 0.94% x̃: 0.55% HURT stats (abs) min: 1 max: 2 x̄: 1.39 x̃: 1 HURT stats (rel) min: 0.09% max: 1.85% x̄: 0.52% x̃: 0.35% 95% mean confidence interval for instructions value: -2.57 -2.14 95% mean confidence interval for instructions %-change: -0.89% -0.73% Instructions are helped. total cycles in shared programs: 584509880 -> 584463152 (<.01%) cycles in affected programs: 11765280 -> 11718552 (-0.40%) helped: 661 HURT: 152 helped stats (abs) min: 1 max: 3073 x̄: 101.99 x̃: 32 helped stats (rel) min: <.01% max: 34.38% x̄: 1.46% x̃: 0.50% HURT stats (abs) min: 1 max: 6637 x̄: 136.10 x̃: 15 HURT stats (rel) min: <.01% max: 24.19% x̄: 1.75% x̃: 0.25% 95% mean confidence interval for cycles value: -82.79 -32.16 95% mean confidence interval for cycles %-change: -1.11% -0.61% Cycles are helped. LOST: 9 GAINED: 0 Tiger Lake Instructions in all programs: 160905127 -> 160900949 (-0.0%) SENDs in all programs: 6812418 -> 6812085 (-0.0%) Loops in all programs: 38225 -> 38225 (+0.0%) Cycles in all programs: 7431911114 -> 7433914697 (+0.0%) Spills in all programs: 192582 -> 192582 (+0.0%) Fills in all programs: 304539 -> 304537 (-0.0%) Ice Lake Instructions in all programs: 145296733 -> 145292370 (-0.0%) SENDs in all programs: 6863818 -> 6863485 (-0.0%) Loops in all programs: 38219 -> 38219 (+0.0%) Cycles in all programs: 8798257570 -> 8800204360 (+0.0%) Spills in all programs: 216880 -> 216880 (+0.0%) Fills in all programs: 334250 -> 334248 (-0.0%) Skylake Instructions in all programs: 135891485 -> 135887357 (-0.0%) SENDs in all programs: 6803031 -> 6802698 (-0.0%) Loops in all programs: 38216 -> 38216 (+0.0%) Cycles in all programs: 8442221881 -> 8444201959 (+0.0%) Spills in all programs: 194839 -> 194839 (+0.0%) Fills in all programs: 301116 -> 301114 (-0.0%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	49177b9e2f	nir/algebraic: Tautology replacements require sources be numbers It seems worth the small amount of damage to give an extra cushion of not having to debug problems later. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> All Intel platforms had similar results. (Tiger Lake shown) total instructions in shared programs: 21043197 -> 21043359 (<.01%) instructions in affected programs: 4409 -> 4571 (3.67%) helped: 0 HURT: 25 HURT stats (abs) min: 1 max: 16 x̄: 6.48 x̃: 5 HURT stats (rel) min: 0.39% max: 15.38% x̄: 4.59% x̃: 4.40% 95% mean confidence interval for instructions value: 4.37 8.59 95% mean confidence interval for instructions %-change: 2.93% 6.26% Instructions are HURT. total cycles in shared programs: 856175986 -> 856176921 (<.01%) cycles in affected programs: 58908 -> 59843 (1.59%) helped: 0 HURT: 25 HURT stats (abs) min: 7 max: 70 x̄: 37.40 x̃: 38 HURT stats (rel) min: 0.27% max: 5.63% x̄: 1.87% x̃: 1.39% 95% mean confidence interval for cycles value: 31.11 43.69 95% mean confidence interval for cycles %-change: 1.35% 2.39% Cycles are HURT. No fossil-db changes on any Intel platform. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Ian Romanick	d69ba58644	nir/algebraic: Remove some optimizations of comparisons with fsat When most of these patterns were created, we believed, incorrectly, that fsat(NaN) was NaN. We have since realized that fsat(NaN) is zero. Originally, this changed the patterns to use is_a_number. This didn't help any shaders, so it's easier to just drop the optimizations. This commit crossed paths with `4c3ad4d065` ("nir/algebraic: mark more optimization with fsat(NaN) as inexact") and `bc123c396a` ("nir/algebraic: mark some optimizations with fsat(NaN) as inexact"). Given that these don't impact very many shaders, it seems safer to just remove them. As discussed in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8716, I tried modifying these patterns to use !(b cmp a). Unfortunately, on Intel GPUs, the results were much worse than just removing the patterns altogether. Some other related patterns will be addressed in later commits. There are still a number of patterns that use the identity fsat(1-X) == 1 - fsat(X). If X is NaN, the former is zero while the latter is 1.0. I haven't evaluted these patterns yet. If changes are needed in these patterns, it should be a separate commit anyway. v2: Replace arrow `=>` with `->` in comments because the `=>` looks a lot like `<=` comparison. Suggested by Rhys. Fixes: `92b75c126b` ("nir/algebraic: Replace checks that a value is between (or not) [0, 1]") Fixes: `a7f0c57673` ("nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1)") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> All Intel hardware had similar results. (Ice Lake shown) total instructions in shared programs: 20029060 -> 20029670 (<.01%) instructions in affected programs: 69236 -> 69846 (0.88%) helped: 0 HURT: 263 HURT stats (abs) min: 1 max: 20 x̄: 2.32 x̃: 1 HURT stats (rel) min: 0.30% max: 11.11% x̄: 1.35% x̃: 0.98% 95% mean confidence interval for instructions value: 1.86 2.78 95% mean confidence interval for instructions %-change: 1.18% 1.52% Instructions are HURT. total cycles in shared programs: 979821278 -> 979834425 (<.01%) cycles in affected programs: 1476848 -> 1489995 (0.89%) helped: 49 HURT: 204 helped stats (abs) min: 1 max: 812 x̄: 102.31 x̃: 20 helped stats (rel) min: 0.01% max: 21.43% x̄: 2.23% x̃: 0.52% HURT stats (abs) min: 2 max: 2600 x̄: 89.02 x̃: 16 HURT stats (rel) min: 0.04% max: 27.27% x̄: 1.49% x̃: 0.72% 95% mean confidence interval for cycles value: 13.18 90.75 95% mean confidence interval for cycles %-change: 0.29% 1.25% Cycles are HURT. No fossil-db changes. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10012>	2021-05-20 01:39:35 +00:00
Jesse Natalie	d7ca0319d7	nir: Add relaxed 24bit opcodes These are equivalent to the 32bit opcodes if there are no more efficient 24bit opcodes available, but inputs are guaranteed to already be 24bit, so the 24bit opcodes can be used instead if they exist and are efficient. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10549>	2021-05-05 22:06:42 +00:00
Gert Wollny	a199697642	nir/opt_algebraic: optimizations for add umax/umin with zero For unsigned comparisons with zero these ops can be eliminated. v2: Add comparison optimizations with -1 (Rhys Perry) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> (v1) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10583>	2021-05-04 09:33:32 +02:00
Jesse Natalie	3c8bcdc863	nir: Add a new opcode for [un]packing doubles HLSL doesn't support bitcasting a 64bit integer to a double. DXIL doesn't have generic pack/unpack instructions, so we lower those to integer bitwise ops. As a result, NIR generic double pack/unpack would require our backend to emit a bitcast to get a double, but we want to match HLSL semantics and emit MakeDouble/SplitDouble. Adding a dedicated opcode for double pack/unpack allows us to add a pass to emit that instead, which lets our backend emit the right instruction to pack and unpack doubles. Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>	2021-04-09 01:54:33 +00:00
Gert Wollny	0f5b3c37c5	nir: Add opcodes for fused comp + csel and optimizations Some backends, like r600 support a fused version of int and float compare against zero and and csel. Adding these opcodes here makes it possible to optimize this in nir. v2: Add rules for float compare + csel Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>	2021-03-22 15:19:46 +01:00
Gert Wollny	a5747f8ab3	nir: add opcodes for find_msb_rev and lowering Some hardware supports a version of find_msb where the bits are counted starting at the high bit, and this needs some lowering to obtain the value that is expected by find_msb Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9452>	2021-03-22 15:19:46 +01:00

1 2 3 4 5 ...

408 Commits