KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Emma Anholt	f6c5b1d6c6	nir: Split usub_sat lowering flag from uadd_sat. Intel vec4 would like to do uadd_sat, but use lowering for usub_sat. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17637>	2022-07-22 17:54:28 +00:00
Georg Lehmann	aac8ddae2f	nir/opt_algebraic: Optimize [ui](add\|sub)_sat with 0. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17468>	2022-07-13 07:34:09 +00:00
Rhys Perry	bc1ea2fda9	nir/algebraic: optimize bcsel(c, fsin/cos_amd(a), fsin/cos_amd(b)) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10587>	2022-07-07 22:18:08 +00:00
Ian Romanick	a2a2fbc510	nir/algebraic: Fix NaN-unsafe fcsel patterns For example, the proof for this pattern (('bcsel', ('flt', 'a@32', 0), 'b@32', 'c@32'), ('fcsel_ge', a, c, b)), would be bcsel(a < 0, b, c) bcsel(!(a < 0), c, b) bcsel(a >= 0, c, b) fcsel_ge(a, c, b) However, !(a < 0) => (a >= 0) is well known to produce different results if `a` is NaN. Instead of that replacement, use this replacement: bcsel(a < 0, b, c) bcsel(-0 < -a, b, c) bcsel(0 < -a, b, c) fcsel_gt(-a, b, c) This is NaN-safe and exact. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Fixes: `0f5b3c37c5` ("nir: Add opcodes for fused comp + csel and optimizations") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>	2022-06-22 19:26:59 +00:00
Georg Lehmann	bfc25d6ec9	nir: Add optional lowering for mul_32x16. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13895>	2022-06-01 17:09:25 +00:00
Jason Ekstrand	836ff4b586	nir/algebraic: Add two more pack/unpack rules Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16591>	2022-05-23 14:10:54 +00:00
Gert Wollny	3749a6ecd2	nir: honor lower_double options for ffloor and ffract v2: Don't lower ffloor@64 to ffract@64 when both ops are to be lowered. Settle on ffloor in opt_algebraic because it can be lowered to other ops in lower_double_ops. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Jason Ekstrand <jason.ekstrand@collabora.com> (v1) Reviewed-by: Emma Anholt <emma@anholt.net> (v1) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16431>	2022-05-16 15:03:05 +00:00
Georg Lehmann	bc5c68fc08	nir/opt_algebraic: Optimize Doom Eternal's word extract by LSB. Foz-db GFX10_3: Totals from 419 (0.31% of 134913) affected shaders: CodeSize: 4126032 -> 4121756 (-0.10%) Instrs: 783608 -> 782541 (-0.14%) Latency: 7889664 -> 7888521 (-0.01%); split: -0.02%, +0.00% InvThroughput: 1315690 -> 1314863 (-0.06%); split: -0.06%, +0.00% VClause: 11826 -> 11830 (+0.03%) SClause: 27736 -> 27734 (-0.01%) Copies: 50493 -> 50428 (-0.13%); split: -0.13%, +0.01% PreSGPRs: 23264 -> 23265 (+0.00%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16436>	2022-05-12 17:10:41 +00:00
Jason Ekstrand	df1876f615	nir: Mark negative re-distribution on fadd as imprecise Otherwise, it would mutate `fneg(fadd(-0, 0))` into `fadd(0, -0)` which isn't correct since -0 + (+0) = +0 + (-0) = +0. This fixes the OpenCL contraction tests on Iris. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16041>	2022-05-12 00:05:10 +00:00
Georg Lehmann	60c9a45562	nir/opt_algebraic: Simple xor/ishr optimizations. The first pattern here removes the xor-swap pattern. Foz-DB GFX10_3: Totals from 305 (0.23% of 134913) affected shaders: CodeSize: 1589040 -> 1585164 (-0.24%) Instrs: 284344 -> 283375 (-0.34%) Latency: 4205148 -> 4198472 (-0.16%); split: -0.16%, +0.00% InvThroughput: 708745 -> 708739 (-0.00%) Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16411>	2022-05-10 19:29:31 +00:00
Georg Lehmann	66e917fff6	nir/opt_algebraic: Fix mask in shift by constant combining. The comment above is correct, but the code to calculate the mask was broken. No Foz-db changes outside of noise. Fixes: `0e6581b87d` ("nir/algebraic: Reassociate shift-by-constant of shift-by-constant") Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15990>	2022-05-10 18:47:21 +00:00
Karol Herbst	a2c9e1cb50	nir: add 16 and 64 bit fisnormal lowering Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16206>	2022-04-28 18:36:52 +00:00
Gert Wollny	47d3f7c69f	nir: Don't optimize to 64 bit fsub if the driver doesn't support it Fixes: `a4840e15ab` r600: Use nir-to-tgsi instead of TGSI when the NIR debug opt is disabled. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16130>	2022-04-27 00:01:20 +00:00
Jason Ekstrand	1755730362	nir: Lower all bit sizes of usub_borrow It's not clear why this is restricted to 32-bit besides that being the only bit size where GLSL has an intrinsic for this. All drivers that set this probably want it lowered for all bit sizes as far as I can tell. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6353 Fixes: `8a3e344180` ("nir/opt_algebraic: Fix some expressions with ambiguous bit sizes") Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16146>	2022-04-25 21:27:09 +00:00
Emma Anholt	e4aa5f7889	nir: Skip fround_even on already-integral values. Just like the other make-the-float-an-integer opcodes. Noticed in a gallium nine shader run through TGSI-to-NIR, where the array index had been floored by the user, but got implicitly rounded by DX9 array indexing. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>	2022-04-16 13:07:09 -07:00
Emma Anholt	6947016b46	nir: Add lowering for fround_even on r300. When we put NIR in the compiler stack for r300, indirect addressing broke for gallium nine. DX's array indirects round the float value, so the DX shader gets mapped to a TGSI "ARR ADDR[0] src.x" instruction. Translating that to NIR maps to r0[f2i32(fround(src.x))]. While we might hope that in translation back using nir-to-tgsi after optimization we would recognize the construct and emit ARR again, that's going to be error prone (think "what if src.x is in a NIR register?") so we need a fallback plan. r300 will be able to handle this lowering, so get it in place first to fix the regression. Fixes: #6297 Fixes: `7d2ea9b0ed` ("r300: Request NIR shaders from mesa/st and use NIR-to-TGSI.") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>	2022-04-16 13:07:09 -07:00
Georg Lehmann	16be909936	nir: Add an option to lower 64bit iadd_sat. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>	2022-03-28 20:02:52 +00:00
Georg Lehmann	922916bf64	nir: Move lower_usub_sat64 to nir_lower_int64_options. Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>	2022-03-28 20:02:52 +00:00
Ian Romanick	06eb9fb125	nir/algebraic: Optimize some cases of (sXX(a, b) != 0.0) I noticed the SGE case while looking at the output of shaders/closed/steam/trine-2/fp-3.shader_test on i915g. These are especially bad on i915 that needs two instructions to implement SNE. An alternative would be to duplicate the sne(sXX(a, b), 0.0) rules in an algebraic pass that occurs after bool_to_float. Doing the work earlier seems preferable. i915 total instructions in shared programs: 788274 -> 788223 (<.01%) instructions in affected programs: 666 -> 615 (-7.66%) helped: 5 HURT: 0 helped stats (abs) min: 9 max: 12 x̄: 10.20 x̃: 9 helped stats (rel) min: 5.00% max: 11.11% x̄: 8.12% x̃: 8.16% 95% mean confidence interval for instructions value: -12.24 -8.16 95% mean confidence interval for instructions %-change: -10.81% -5.43% Instructions are helped. LOST: 0 GAINED: 2 The two gained shaders are assembly fragment programs in Euro Truck Simulator 2. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15210>	2022-03-03 00:07:58 +00:00
Bas Nieuwenhuizen	d1530a3f3b	Revert "nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants" This reverts commit `a1af902531`. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5423 Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14532>	2022-01-21 16:58:11 +00:00
Rhys Perry	495debebad	nir/algebraic: optimize expressions using fmulz/ffmaz Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	f2fbba7920	nir/algebraic: optimize open-coded fmulz/ffmaz This pattern will be found in future versions of D3D9 DXVK. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	312a284980	nir/algebraic: add ignore_exact() wrapper Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Rhys Perry	7f05ea3793	nir: add nir_op_fmulz and nir_op_ffmaz Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13436>	2022-01-20 22:54:42 +00:00
Connor Abbott	9c9e8c3349	nir: Reorder ffma and fsub combining It's relatively common to do something like "a * b - c", which on most GPUs can be implemented in a single instruction. Before opt_algebraic_late this will be something like "fadd(fmul(a, b), fneg(c))", and we want to turn it into "ffma(a, b, fneg(c))". But because the fsub pattern was first we instead turned it into "fsub(fmul(a, b), c)". Fix this by reordering them. Selected shader-db results on freedreno: total instructions in shared programs: 1561330 -> 1551619 (-0.62%) instructions in affected programs: 780272 -> 770561 (-1.24%) helped: 1941 HURT: 491 helped stats (abs) min: 1 max: 147 x̄: 7.98 x̃: 4 helped stats (rel) min: 0.07% max: 30.77% x̄: 4.36% x̃: 3.17% HURT stats (abs) min: 1 max: 307 x̄: 11.76 x̃: 5 HURT stats (rel) min: 0.09% max: 18.71% x̄: 2.26% x̃: 1.38% 95% mean confidence interval for instructions value: -4.57 -3.41 95% mean confidence interval for instructions %-change: -3.21% -2.84% Instructions are helped. total nops in shared programs: 358926 -> 356263 (-0.74%) nops in affected programs: 167116 -> 164453 (-1.59%) helped: 1395 HURT: 859 helped stats (abs) min: 1 max: 108 x̄: 6.80 x̃: 3 helped stats (rel) min: 0.17% max: 100.00% x̄: 19.15% x̃: 10.57% HURT stats (abs) min: 1 max: 307 x̄: 7.95 x̃: 3 HURT stats (rel) min: 0.00% max: 381.82% x̄: 20.04% x̃: 10.00% 95% mean confidence interval for nops value: -1.77 -0.59 95% mean confidence interval for nops %-change: -5.55% -2.87% Nops are helped. total non-nops in shared programs: 1202404 -> 1195356 (-0.59%) non-nops in affected programs: 496682 -> 489634 (-1.42%) helped: 1951 HURT: 265 helped stats (abs) min: 1 max: 39 x̄: 4.02 x̃: 3 helped stats (rel) min: 0.07% max: 15.38% x̄: 2.97% x̃: 1.96% HURT stats (abs) min: 1 max: 22 x̄: 2.97 x̃: 2 HURT stats (rel) min: 0.05% max: 10.00% x̄: 1.14% x̃: 0.75% 95% mean confidence interval for non-nops value: -3.38 -2.99 95% mean confidence interval for non-nops %-change: -2.60% -2.36% Non-nops are helped. total systall in shared programs: 288317 -> 292975 (1.62%) systall in affected programs: 87876 -> 92534 (5.30%) helped: 388 HURT: 431 helped stats (abs) min: 1 max: 214 x̄: 14.39 x̃: 8 helped stats (rel) min: 0.25% max: 100.00% x̄: 22.12% x̃: 11.96% HURT stats (abs) min: 1 max: 232 x̄: 23.77 x̃: 7 HURT stats (rel) min: 0.00% max: 1300.00% x̄: 51.71% x̃: 17.30% 95% mean confidence interval for systall value: 3.07 8.30 95% mean confidence interval for systall %-change: 9.49% 23.97% Systall are HURT. (The systall hurt is probably just due to having fewer instructions to hide latency with.) Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14554>	2022-01-18 17:44:50 +00:00
Danylo Piliaiev	b8d486f298	nir/algebraic: Separate has_dot_4x8 into has_sdot_4x8 and has_udot_4x8 Adreno GPUs have native instructions for unsigned and mixed dot_4x8 but not for the signed dot product. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13986>	2022-01-10 13:20:39 +02:00
Daniel Schürmann	17ecd0b31a	nir/opt_algebraic: lower fneg_hi/lo to fmul This pattern, found in the FSR upscaling shader, helps the vectorization efforts by keeping the chain of vectorized instructions intact. Radeon can optimize it to per-component fneg modifiers. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13688>	2021-12-21 13:23:37 +01:00
Rhys Perry	403ae3b48e	nir/algebraic: optimize more 64-bit imul with constant source Two 64-bit shifts and an addition are usually faster than the several multiplications nir_lower_int64 creates. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14227>	2021-12-17 18:51:24 +00:00
Rhys Perry	a2d8c5b26d	nir/algebraic: optimize a*#b & -4 fossil-db (Sienna Cichlid): Totals from 611 (0.47% of 128647) affected shaders: CodeSize: 3096680 -> 3090976 (-0.18%) Instrs: 570494 -> 569249 (-0.22%) Latency: 5765865 -> 5759619 (-0.11%) InvThroughput: 969840 -> 967608 (-0.23%) VClause: 9690 -> 9688 (-0.02%) Copies: 42884 -> 42894 (+0.02%); split: -0.01%, +0.03% PreVGPRs: 28290 -> 28288 (-0.01%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13752>	2021-12-03 13:41:07 +00:00
Rhys Perry	12294026d5	nir/algebraic: optimize Cyberpunk 2077's open-coded bitfieldReverse() fossil-db (Sienna Cichlid): Totals from 9 (0.01% of 128647) affected shaders: CodeSize: 29900 -> 28640 (-4.21%) Instrs: 5677 -> 5443 (-4.12%) Latency: 96561 -> 95025 (-1.59%) Copies: 571 -> 544 (-4.73%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13673>	2021-11-05 09:31:04 +00:00
Jason Ekstrand	b71bdc3404	nir/algebraic: Add some opts for comparisons of comparisons Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Jason Ekstrand	7abf3955ca	nir/algebraic: Add some boolean optimizations Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Jason Ekstrand	c8b2be0b95	nir/algebraic: Lower fisfinite Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13167>	2021-10-07 18:21:11 +00:00
Ian Romanick	cb28361642	nir/algebraic: Small optimizations for SpvOpFOrdNotEqual and SpvOpFUnordEqual No shader-db changes on any Intel platform. Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Instructions in all programs: 144380118 -> 143692823 (-0.5%) SENDs in all programs: 6920822 -> 6920822 (+0.0%) Loops in all programs: 38299 -> 38299 (+0.0%) Cycles in all programs: 8434782176 -> 8423078994 (-0.1%) Spills in all programs: 206830 -> 204469 (-1.1%) Fills in all programs: 318737 -> 313660 (-1.6%) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12320>	2021-10-06 01:53:47 +00:00
Rhys Perry	a1af902531	nir/algebraic: distribute fmul(fadd(a, b), c) when b and c are constants This allows for more MAD/FMA instructions to be created. fossil-db (Sienna Cichlid): Totals from 50134 (33.46% of 149839) affected shaders: VGPRs: 2436536 -> 2436000 (-0.02%); split: -0.05%, +0.03% SpillSGPRs: 13136 -> 13135 (-0.01%); split: -0.02%, +0.02% CodeSize: 206621424 -> 206278292 (-0.17%); split: -0.23%, +0.07% MaxWaves: 1116804 -> 1117448 (+0.06%); split: +0.07%, -0.01% Instrs: 38977460 -> 38862886 (-0.29%); split: -0.33%, +0.04% Latency: 832425389 -> 827432260 (-0.60%); split: -0.63%, +0.03% InvThroughput: 184193457 -> 183563350 (-0.34%); split: -0.37%, +0.03% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7458>	2021-09-17 17:28:26 +00:00
Rhys Perry	41ecef7855	nir: add sdot_2x16 and udot_2x16 opcodes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Rhys Perry	ae00f5af61	nir: separate lower_add_sat Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12617>	2021-09-03 13:21:27 +00:00
Samuel Pitoiset	cff106c4b6	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(fabs(b), -a) and fmin(-fmax(b, a), b) to fmin(-fabs(b), -a). fossils-db (Sienna Cichlid): Totals from 34 (0.02% of 150170) affected shaders: CodeSize: 388540 -> 387748 (-0.20%) Instrs: 74621 -> 74423 (-0.27%) Latency: 1039407 -> 1039011 (-0.04%) InvThroughput: 208364 -> 208150 (-0.10%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12519>	2021-08-25 07:18:24 +02:00
Ian Romanick	a6db40605e	nir/algebraic: Add some extract optimizations These help quite a bit when vectored versions of SpvOpSDotKHR and friends are emitted as packed versions and then lowered. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	839495efc6	nir/algebraic: Add lowering for dot_4x8 instructions v2: Fix copy-and-paste bugs in lowering patterns. v3: Add has_sudot_4x8 flag. Requested by Rhys. v4: Since the names of the opcodes changed from dp4 to dot_4x8, also change the names of the lowering helpers. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Ian Romanick	806cd2341c	nir/algebraic: Basic patterns for dot_4x8 v2: Add and modify patterns to let constant folding do better. v3: Remove '(is_not_zero)' from the patterns that try to combine addends. I honestly don't know why I had it there in the first place, and nothing in my deep git logs could help clue me in. Noticed by Alyssa. Remove patterns that detect open-coded udot_4x8. Suggested by Alyssa and Jason. Add missing sudot_4x8 patterns. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12142>	2021-08-24 19:58:57 +00:00
Daniel Schürmann	2cf164feb9	nir/opt_algebraic: optimize flrp(fadd, fadd, x) only if fadd are used_once Totals from 201 (0.13% of 150170) affected shaders: (GFX10.3) VGPRs: 13880 -> 13856 (-0.17%) CodeSize: 1517328 -> 1518124 (+0.05%); split: -0.04%, +0.10% MaxWaves: 3184 -> 3192 (+0.25%) Instrs: 285487 -> 285569 (+0.03%); split: -0.06%, +0.08% Latency: 7774066 -> 7780877 (+0.09%); split: -0.10%, +0.19% InvThroughput: 1936341 -> 1935287 (-0.05%); split: -0.07%, +0.02% SClause: 11446 -> 11448 (+0.02%); split: -0.01%, +0.03% Copies: 17500 -> 17506 (+0.03%); split: -0.51%, +0.55% Branches: 8174 -> 8180 (+0.07%); split: -0.13%, +0.21% PreVGPRs: 12507 -> 12427 (-0.64%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12061>	2021-08-24 16:10:30 +00:00
Samuel Pitoiset	f4b858e746	Revert "nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a)" This is wrong for negative values. This reverts commit `07cd30ca29`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12515>	2021-08-24 08:58:38 +00:00
Samuel Pitoiset	07cd30ca29	nir/opt_algebraic: optimize fmax(-fmin(b, a), b) -> fmax(b, -a) Found with Cyberpunk 2077. fossils-db (GFX10.3): Totals from 128 (2.34% of 5465) affected shaders: CodeSize: 769720 -> 767656 (-0.27%); split: -0.27%, +0.00% Instrs: 145748 -> 145229 (-0.36%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11604>	2021-08-23 17:53:38 +00:00
Ian Romanick	f0a8a9816a	nir: intel/compiler: Add and use nir_op_pack_32_4x8_split A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results in a lot of shifts and MOVs. When that pattern can be recognized, the individual 8-bit components can be packed much more efficiently. v2: Rebase on `b4369de27f` ("nir/lower_packing: use shader_instructions_pass") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Ian Romanick	89f639c0ca	nir/algebraic: Remove spurious conversions from inside logic ops Not only does this eliminate a bunch of unnecessary type converting MOVs, but it can also enable some SWAR. The dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag test does something about like: c = a.x ^ b.x; d = a.y ^ b.y; e = a.z ^ b.z; After this change, it looks more like: uint t = i8vec3AsUint(a) ^ i8vec3AsUint(b); c = extract_u8(t, 0); d = extract_u8(t, 1); e = extract_u8(t, 2); On Ice Lake, this results in: SIMD8 shader: 41 instructions. 1 loops. 3804 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 31 instructions. 1 loops. 2844 cycles. 0:0 spills:fills, 5 sends Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Ian Romanick	a147717a93	nir/algebraic: Optimize some extract forms resulting from 8-bit lowering This eliminates some spurious, size-converting moves. For example, on Ice Lake this helps dEQP-VK.spirv_assembly.type.vec3.i8.bitwise_xor_frag: SIMD8 shader: 56 instructions. 1 loops. 4444 cycles. 0:0 spills:fills, 5 sends SIMD8 shader: 52 instructions. 1 loops. 4164 cycles. 0:0 spills:fills, 5 sends v2: Condition two of the patterns on !options->lower_extract_byte. Suggested by Lionel. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>	2021-08-18 22:03:37 +00:00
Rhys Perry	ed70b256ce	nir: add ffma creation helpers Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	4ec4d862c2	nir/algebraic: add is_used_once to dot product reassociation optimization This improves register usage. fossil-db (Sienna Cichlid, on top of !9805): Totals from 4317 (2.88% of 149839) affected shaders: VGPRs: 352592 -> 351704 (-0.25%); split: -1.48%, +1.23% SpillSGPRs: 182 -> 248 (+36.26%) CodeSize: 31601192 -> 31587624 (-0.04%); split: -0.09%, +0.04% MaxWaves: 56964 -> 57298 (+0.59%); split: +2.48%, -1.90% Instrs: 5973557 -> 5974122 (+0.01%); split: -0.05%, +0.06% Latency: 72088175 -> 72253033 (+0.23%); split: -0.36%, +0.59% InvThroughput: 14978160 -> 14798919 (-1.20%); split: -1.29%, +0.09% VClause: 100994 -> 98645 (-2.33%); split: -3.05%, +0.73% SClause: 278206 -> 276820 (-0.50%); split: -0.54%, +0.04% Copies: 200264 -> 199556 (-0.35%); split: -1.17%, +0.82% Branches: 86410 -> 85930 (-0.56%); split: -0.56%, +0.01% PreSGPRs: 207355 -> 207759 (+0.19%); split: -0.00%, +0.20% PreVGPRs: 314646 -> 310911 (-1.19%); split: -1.35%, +0.17% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
Rhys Perry	f95a16be72	nir/algebraic: reassociate add chains for more MAD/FMA-friendly code fossil-db (GFX10.3): Totals from 25866 (17.68% of 146267) affected shaders: VGPRs: 1625456 -> 1644936 (+1.20%); split: -0.05%, +1.24% SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03% CodeSize: 161604460 -> 161458052 (-0.09%); split: -0.11%, +0.02% MaxWaves: 454842 -> 452160 (-0.59%); split: +0.04%, -0.63% Instrs: 30652596 -> 30456446 (-0.64%); split: -0.65%, +0.01% Latency: 723098749 -> 722084247 (-0.14%); split: -0.21%, +0.07% InvThroughput: 166023468 -> 165506875 (-0.31%); split: -0.36%, +0.05% fossil-db (GFX10): Totals from 25866 (17.68% of 146267) affected shaders: VGPRs: 1593576 -> 1611976 (+1.15%); split: -0.09%, +1.25% SpillSGPRs: 11729 -> 11725 (-0.03%); split: -0.07%, +0.03% CodeSize: 162294468 -> 162154456 (-0.09%); split: -0.11%, +0.02% MaxWaves: 477448 -> 474166 (-0.69%); split: +0.10%, -0.79% Instrs: 30820164 -> 30625805 (-0.63%); split: -0.65%, +0.02% Latency: 723190249 -> 722273445 (-0.13%); split: -0.20%, +0.08% InvThroughput: 163114872 -> 162582966 (-0.33%); split: -0.37%, +0.04% fossil-db (GFX9): Totals from 25866 (17.67% of 146401) affected shaders: SGPRs: 2167808 -> 2169920 (+0.10%); split: -0.09%, +0.19% VGPRs: 1649404 -> 1667592 (+1.10%); split: -0.43%, +1.53% CodeSize: 161273556 -> 161281996 (+0.01%); split: -0.07%, +0.08% MaxWaves: 114910 -> 113519 (-1.21%); split: +0.10%, -1.31% Instrs: 31557180 -> 31403708 (-0.49%); split: -0.50%, +0.02% Latency: 899594793 -> 898786283 (-0.09%); split: -0.19%, +0.10% InvThroughput: 412265691 -> 411551698 (-0.17%); split: -0.28%, +0.11% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8056>	2021-08-16 17:19:45 +00:00
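
Note on commit a2a2fbc510 (NaN-unsafe fcsel patterns): the reasoning in that commit message can be checked on a few float inputs. The following is a minimal sketch in plain Python, not NIR. The helpers `bcsel`, `fcsel_ge` and `fcsel_gt` are hypothetical stand-ins written for this note; their semantics (pick the second argument when the first is `>= 0` or `> 0`, otherwise the third) are an assumption inferred from the commit message, not code taken from Mesa.

```python
import math

# Hypothetical stand-ins for the NIR opcodes named in the commit message.
# Assumed semantics: fcsel_ge(a, x, y) = x if a >= 0 else y, and likewise
# fcsel_gt with a strict comparison.
def bcsel(cond, x, y):
    return x if cond else y

def fcsel_ge(a, x, y):
    return x if a >= 0.0 else y

def fcsel_gt(a, x, y):
    return x if a > 0.0 else y

B, C = "b", "c"

def original(a):
    # bcsel(a < 0, b, c)
    return bcsel(a < 0.0, B, C)

def unsafe(a):
    # Old replacement: fcsel_ge(a, c, b), which relies on !(a < 0) == (a >= 0).
    return fcsel_ge(a, C, B)

def safe(a):
    # New replacement: fcsel_gt(-a, b, c), which relies on (a < 0) == (0 < -a).
    return fcsel_gt(-a, B, C)

samples = [-1.5, -0.0, 0.0, 2.0, math.inf, -math.inf, math.nan]

# Inputs for which each replacement disagrees with the original expression.
mismatch_unsafe = [a for a in samples if original(a) != unsafe(a)]
mismatch_safe = [a for a in samples if original(a) != safe(a)]

print("unsafe replacement differs for:", mismatch_unsafe)
print("safe replacement differs for:", mismatch_safe)
```

On these samples only NaN separates the old replacement from the original expression, because every ordered comparison with NaN is false; the negated form agrees on all of them, including both signed zeros.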


433 Commits
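
Note on commits 07cd30ca29, f4b858e746 and cff106c4b6 (the fmax/fmin rewrite): the revert message says the first rewrite "is wrong for negative values". The following is a small sketch in plain Python, assuming `fmax` and `fmin` behave like ordinary maximum and minimum on finite, non-NaN inputs; the function names `original`, `reverted` and `relanded` are labels chosen for this note, not Mesa identifiers.

```python
def fmax(x, y):
    return max(x, y)

def fmin(x, y):
    return min(x, y)

def original(a, b):
    # Expression found in the shaders: fmax(-fmin(b, a), b)
    return fmax(-fmin(b, a), b)

def reverted(a, b):
    # First rewrite (07cd30ca29), later reverted: fmax(b, -a)
    return fmax(b, -a)

def relanded(a, b):
    # Second rewrite (cff106c4b6): fmax(fabs(b), -a)
    return fmax(abs(b), -a)

values = [-3.0, -1.0, 0.5, 2.0]
pairs = [(a, b) for a in values for b in values]

# (a, b) pairs where each rewrite disagrees with the original expression.
bad_reverted = [(a, b) for a, b in pairs if original(a, b) != reverted(a, b)]
bad_relanded = [(a, b) for a, b in pairs if original(a, b) != relanded(a, b)]

print("reverted rewrite differs for:", bad_reverted)
print("relanded rewrite differs for:", bad_relanded)
```

The identity behind the second rewrite is -min(b, a) = max(-b, -a), so the original expression equals max(-b, -a, b) = max(|b|, -a). Dropping the absolute value only matters when b is negative, which is where the first rewrite fails in this sketch.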