KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Marius Hillenbrand	c5d6e57e42	llvmpipe: Use lp_build_round_arch on IBM Z (s390x) LLVM has all the required intrinsics available on IBM Z, so use them for rounding operations (they will be implemented as a single instruction). This change makes the test case lp_test_arit pass, because it avoids using the buggy generic code. v2: update .gitlab-ci/cross-xfail-s390x to reflect passing lp_test_arit Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13927>	2021-11-23 17:49:02 +00:00
Dave Airlie	b9aee98912	gallivm: use pmulhrsw to make aos sampling more accurate. This uses pmulhrsw avx2 and ssse3 variants. It fixes the precision of texture filtering calculations. However it does leave these paths inaccurate on platforms that don't support it. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13075>	2021-09-29 21:55:23 +00:00
Dave Airlie	0d3b285360	gallivm: use llvm intrinsics for 16-bit round/trunc/roundeven Otherwise the inf translations don't seem to work, and the VK CTS fails Fixes VK CTS dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic* Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11816>	2021-09-16 04:15:41 +00:00
Dave Airlie	836b0ace10	gallivm/nir: handle 16-bit exp/lod using intrinsics. This just passes the 16-bit float versions to the llvm intrinsics Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11816>	2021-09-16 04:15:41 +00:00
Dave Airlie	6decb1b896	gallivm: add 16-bit sin/cos via llvm intrinsic Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11816>	2021-09-16 04:15:41 +00:00
Erik Faye-Lund	08a12feb6e	gallivm: use lp_build_log2_safe for pow lp_build_log2 isn't robust enough to handle special cases for pow, so let's use lp_build_log2_safe instead. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11800>	2021-08-05 17:18:57 +00:00
Ian Romanick	2d5b64818f	gallivm: Remove unused GALLIVM_NAN_RETURN_NAN In the review, Roland says, "I think the unused nan behaviors was there just for completeness, so it can easily go." Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>	2021-05-04 00:13:34 +00:00
Ian Romanick	61624934f6	gallivm: Use GALLIVM_NAN_RETURN_OTHER_SECOND_NONNAN for norm clamping Since the second source is always a constant that is known to be a number, this should have the same performance as GALLIVM_NAN_BEHAVIOR_UNDEFINED. A lofty goal is to eventually remove GALLIVM_NAN_BEHAVIOR_UNDEFINED. There's still a lot of (mostly implicit) users, and I don't feel like tackling that right now. :) Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>	2021-05-04 00:13:34 +00:00
Dave Airlie	6adbf6c86c	llvmpipe: add reduction mode support Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9423>	2021-03-12 16:02:25 +10:00
Rob Clark	a9618e7c42	util: Add accessor for util_cpu_caps In release builds, there should be no change, but in debug builds the assert will help us catch undefined behavior resulting from using util_cpu_caps before it is initialized. With fix for u_half_test for MSVC from Jesse Natalie squashed in. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9266>	2021-02-26 18:31:19 +00:00
Adam Jackson	10bcf25766	gallivm: Silence a warning at -Og ../src/gallium/auxiliary/gallivm/lp_bld_arit.c: In function ‘lp_build_round_arch’: ../src/gallium/auxiliary/gallivm/lp_bld_arit.c:2042:7: warning: ‘intrinsic_root’ may be used uninitialized in this function [-Wmaybe-uninitialized] 2042 | lp_format_intrinsic(intrinsic, sizeof intrinsic, intrinsic_root, bld->vec_type); Can't happen, mark it unreachable. Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8724>	2021-02-18 20:59:43 +00:00
Dave Airlie	9845c1636c	gallivm: add support for 8/16-bit mul_hi This 32x32 code only needs small tweaks for this case. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7309>	2020-10-26 11:03:51 +10:00
Dave Airlie	d12cdc9374	gallivm: fix pow(0, y) to be 0 The log2(0) was producing bad results. Fixes: piglit pow tests on llvmpipe. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6907>	2020-09-29 21:51:22 +00:00
Dave Airlie	b31e8460a6	gallivm/nir: allow 64-bit arit ops Fixes: dEQP-VK.glsl.builtin.precision_double.round.* dEQP-VK.glsl.builtin.precision_double.roundeven.* dEQP-VK.glsl.builtin.precision_double.trunc.* Reviewed-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6381>	2020-08-31 13:20:11 +10:00
Roland Scheidegger	045f05a2f6	gallivm: Fix saturated signed psub/padd intrinsics on llvm 8 LLVM 8 did remove both the signed and unsigned sse2/avx intrinsics in the end, and provide arch-independent llvm intrinsics instead. Fixes a crash when using snorm framebuffers (tested with piglit arb_color_buffer_float-render GL_RGBA8_SNORM -auto). Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Dave Airlie <airlied@redhat.com> CC: <mesa-stable@lists.freedesktop.org>	2019-10-17 17:42:16 +02:00
Adam Jackson	96b592696f	gallium: Require LLVM >= 3.9 To go any further than this would be to break the current version of Android. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-09-11 17:00:43 +00:00
Adam Jackson	9abf7d5755	gallium: Require LLVM >= 3.6 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-09-11 17:00:43 +00:00
Adam Jackson	4fdd455eeb	gallium: Require LLVM >= 3.4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-09-11 17:00:43 +00:00
Eric Engestrom	ba73564b52	gallivm: drop LLVM<3.3 code paths as no build system allows that Suggested-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-09-06 22:26:29 +01:00
Eric Engestrom	1c1c477470	gallivm: replace more complex 3.x version check with LLVM_VERSION_MAJOR/MINOR Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Michel Dänzer <mdaenzer@redhat.com>	2019-09-06 22:26:29 +01:00
Eric Engestrom	08890068c5	gallivm: replace major llvm version checks with LLVM_VERSION_MAJOR Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Michel Dänzer <mdaenzer@redhat.com>	2019-09-06 22:26:29 +01:00
Roland Scheidegger	332b21db55	gallivm: use fallback code for mul_hi with llvm >= 7.0 LLVM 7.0 ditched the pmulu intrinsics. This is only a trivial patch to use the fallback code instead. It'll likely produce atrocious code since the pattern doesn't match what llvm itself uses in its autoupgrade paths, hence the pattern won't be recognized. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=111496 Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-29 16:55:49 +02:00
Jose Fonseca	3573412981	gallivm: Improve lp_build_rcp_refine. Use the alternative more accurate expression from https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division v2: Use lp_build_fmuladd as suggested by Roland Tested by enabling this code path, and running lp_test_arit. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-06-28 11:48:12 +01:00
Roland Scheidegger	dded2edf8b	gallivm: fix saturated signed add / sub with llvm 9 llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and now llvm 9 removed the signed versions as well - they were proposed for removal earlier, but the pattern to recognize those was very complex, so it wasn't done then. However, instead of these arch-specific intrinsics, there's now arch-independent intrinsics for saturated add / sub, both for signed and unsigned, so use these. They should have only advantages (work with arbitrary vector sizes, optimal code for all archs), although I don't know how well they work in practice for other archs (at least for x86 they do the right thing). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454 Reviewed-by: Brian Paul <brianp@vmware.com>	2019-04-17 17:42:13 +02:00
Matt Turner	70a7ece035	gallivm: Return true from arch_rounding_available() if NEON is available LLVM uses the single instruction "FRINTI" to implement llvm.nearbyint. Fixes the rounding tests of lp_test_arit. Bug: https://bugs.gentoo.org/665570 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-01-24 11:07:24 -08:00
Matt Turner	2d48d5116b	gallivm: Use nextafterf(0.5, 0.0) as rounding constant The common truncf(x + 0.5) fails for the floating-point value just less than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after rounding is 1.0, thus truncf does not produce the desired value. The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before truncating. This works for all values. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-11-28 11:22:47 -08:00
Roland Scheidegger	8e1be9a34a	gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0 These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader, rather llvm will just do a call to a null function in the jitted shaders making it difficult to detect when intrinsics vanish.) Luckily the signed ones are still there, I helped convincing llvm removing them is a bad idea for now, since while the unsigned ones have sort of agreed-upon simplest patterns to replace them with, this is not the case for the signed ones, and they require _significantly_ more complex patterns - to the point that the recognition is IMHO probably unlikely to ever work reliably in practice (due to other optimizations interfering). (Even for the relatively trivial unsigned patterns, llvm already added test cases where recognition doesn't work, unsaturated add followed by saturated add may produce atrocious code.) Nevertheless, it seems there's a serious quest to squash all cpu-specific intrinsics going on, so I'd expect patches to nuke them as well to resurface. Adapt the existing fallback code to match the simple patterns llvm uses and hope for the best. I've verified with lp_test_blend that it does produce the expected saturated assembly instructions. Though our cmp/select build helpers don't use boolean masks, but it doesn't seem to interfere with llvm's ability to recognize the pattern. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231 Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-08-24 07:50:13 +02:00
Ian Romanick	d76c204d05	util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero The new name makes the zero-input behavior more obvious. The next patch adds a new function with different zero-input behavior. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-03-29 14:09:23 -07:00
Roland Scheidegger	b5957cee92	llvmpipe: fix snorm blending The blend math gets a bit funky due to inverse blend factors being in range [0,2] rather than [-1,1], our normalized math can't really cover this. src_alpha_saturate blend factor has a similar problem too. (Note that piglit fbo-blending-formats test is mostly useless for anything but unorm formats, since not just all src/dst values are between [0,1], but the tests are crafted in a way that the results are between [0,1] too.) v2: some formatting fixes, and fix a fairly obscure (to debug) issue with alpha-only formats (not related to snorm at all), where blend optimization would think it could simplify the blend equation if the blend factors were complementary, however was using the completely unrelated rgb blend factors instead of the alpha ones... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-11-21 04:06:29 +01:00
Tim Rowley	0023b5ae67	gallivm: allow arch rounding with avx512 Fixes piglit vs-roundeven-{float,vec[234]} with simd16 VS. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2017-11-02 10:24:54 -05:00
Roland Scheidegger	52b73caaf4	gallivm: don't use pabs intrinsic with llvm version >= 6 The intrinsic is gone, causing shader compilation to crash. While here, also change the fallback code to match what llvm's auto-updater of these intrinsics would do (except that there will still be zext/trunc instructions in there), which should ensure that the sequence gets recognized and fused back into a pabs in the end (I didn't test this, and it's possible even the old sequence would get recognized, but I don't see a reason why we shouldn't use the same sequence in any case). Tested-by: Vinson Lee <vlee@freedesktop.org>	2017-10-07 00:54:09 +02:00
Roland Scheidegger	f4df21ed95	gallivm: don't try to use fast rcp for fdiv The use of fast rcp instruction is disabled, and will always fall back to use a division instead (1 / x). Hence, if we get a division opcode, it doesn't make much sense trying to split that into rcp/mul. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-01-24 19:12:46 +01:00
Nicolai Hähnle	b46a9c570f	gallivm: fix [IU]MUL_HI regression harder The fix in commit `88f791db75` was insufficient for radeonsi because the vector case was not handled properly. It seems piglit only covers the scalar case, unfortunately. Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.* Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-11-10 13:17:10 +01:00
Nicolai Hähnle	88f791db75	gallivm: fix [IU]MUL_HI regression This patch does two things: 1. It separates the host-CPU code generation from the generic code generation. This guards against accidentally breaking things for radeonsi in the future. 2. It makes sure we actually use both arguments and don't just compute a square :-p Fixes a regression introduced by commit `29279f44b3` Cc: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-11-08 16:25:54 +01:00
Roland Scheidegger	29279f44b3	gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function This is used by shader umul_hi/imul_hi functions (and soon by draw). It's actually useful separating this out on its own, however the real reason for doing it is because we're using an optimized sse2 version, since the code llvm generates is atrocious (since there's no widening mul in llvm, and it does not recognize the widening mul pattern, so it generates code for real 64x64->64bit mul, which the cpu can't do natively, in contrast to 32x32->64bit mul which it could do). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-11-08 03:41:26 +01:00
Roland Scheidegger	6f2f0daeb4	gallivm: Use native packs and unpacks for the lerps For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw). Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion. This is only used for avx2 256bit pack/unpack for now. Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-10-19 01:44:59 +02:00
José Fonseca	e088390c7d	gallivm: Basic AVX2 support. v2: pblendb -> pblendvb Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-10-04 23:36:20 +01:00
Roland Scheidegger	b0cf99165a	gallivm: don't use integer min/max sse intrinsics with llvm >= 3.9 Apparently, these are deprecated. There's some AutoUpgrade feature which is supposed to promote these to cmp/select, which apparently doesn't work with jit code. It is possible it's not actually even meant to work (see the bug filed against llvm which couldn't provide an answer either) but in any case this is meant to be only temporary unless the intrinsics are really illegal. So, just use the fallback code (which should be cmp/select, we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages to optimize this back to pmin/pmax in the end). This addresses https://llvm.org/bugs/show_bug.cgi?id=28176 CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Tested-by: Vinson Lee <vlee@freedesktop.org> Tested-by: Aaron Watry <awatry@gmail.com>	2016-06-20 17:19:03 +02:00
Jose Fonseca	2b4cee0571	gallivm: Never emit llvm.fmuladd on LLVM 3.3. Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't handle llvm.fmuladd and instead falls back to a C function, which in turn causes a segfault on Windows. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-06-10 16:17:04 +01:00
Jose Fonseca	320d1191c6	gallivm: Use llvm.fmuladd.*. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-06-10 13:47:35 +01:00
Roland Scheidegger	9247570d42	gallivm: eliminate an unnecessary AND with unorm lerps Instead of doing an add and then masking out the upper bits, we can simply do an add with a half wide type (this, of course, assumes the hw can actually do it...), so we'll get the required zero in the upper bits automatically. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-05-27 19:11:28 +02:00
Brian Paul	e522a76226	gallivm: s/Elements/ARRAY_SIZE/ Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-04-27 10:23:19 -06:00
Roland Scheidegger	bd07e20d20	gallivm: make sampling more robust against bogus coordinates Some cases (especially these using fract for coord wrapping) did not handle NaNs (or Infs) correctly - the following code assumed the fract result could not be outside [0,1], but if the input is a NaN (or +-Inf) the fract result was NaN - which then could produce out-of-bound offsets. (Note that the explicit NaN behavior changes for min/max on x86 sse don't result in actual changes in the generated jit code, but may on other architectures. Found by looking through all the wrap functions.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94955 No piglit changes. (v2: fix min/max typo in coord_mirror, add comment) Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Tested-by: Bruce Cherniak <bruce.cherniak@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-04-26 04:55:37 +02:00
Jose Fonseca	9586468c03	gallivm: Workaround LLVM PR 27332. The credit for finding and isolating this bug goes to Vinson and Roland. The buggy LLVM versions were found by doing opt -instcombine llvm-pr27332.ll > /dev/null where llvm-pr27332.ll is the IR from https://llvm.org/bugs/show_bug.cgi?id=27332#c3 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-13 16:42:55 +01:00
Roland Scheidegger	cb438d8b3e	gallivm: use llvm.nearbyint instead of llvm.round. We used to use sse roundps intrinsic directly, but switched to use the llvm intrinsics for rounding with `e4f01da15d`. However, llvm semantics follows standard math lib round function which is specced to do roundNearestAwayFromZero but we really want roundNearestEven (moreover, using round generates atrocious code since the cpu can't do it directly and it results in scalar calls to libm __roundf). So, use llvm.nearbyint instead, which does exactly the right thing, and even has the advantage of being available with llvm 3.3 too. (I've verified it actually generates a roundps instruction with llvm 3.3.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909 Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2016-04-13 11:13:03 +01:00
Jose Fonseca	7ad49daca6	gallivm: Introduce lp_format_intrinsic. For adding .v4f32 like suffixes to intrinsics, taking special care for scalar case, which was being often neglected. This fixes invalid IR when doing mipmap filtering on SSE2 (the only case where we'd use intrinsics with scalars.) Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-04 00:06:09 +01:00
Jose Fonseca	a293f57e13	gallivm: Use llvm.fabs. Exactly the same code. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-03 22:09:09 +01:00
Jose Fonseca	e4f01da15d	gallivm: Prefer backend agnostic intrinsic for rounding. We could unconditionally use these intrinsics, but performance with SSE2 would suck, as LLVM falls back to calling libm. lp_test_arit. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-04-03 22:09:07 +01:00
Edward O'Callaghan	147fd00bb3	gallium/auxiliary: Trivial code style cleanup Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2015-12-06 17:10:22 +01:00
Marek Olšák	0c805b6240	gallivm: add LLVMAttribute parameter to lp_build_intrinsic This will help remove some duplicated code from radeon. Reviewed-by: Dave Airlie <airlied@redhat.com>	2015-07-31 16:49:16 +02:00
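
Note on commit 2d48d5116b (rounding constant): the message explains that truncf(x + 0.5) gives the wrong answer for the largest value below 0.5, and that adding nextafterf(0.5, 0.0) instead avoids this. The following is a minimal sketch of that idea, not the gallivm code itself. It assumes Python doubles in place of C floats (math.nextafter needs Python 3.9 or later), and the function names and the copysign-based sign handling are hypothetical choices made for illustration.

```python
import math

# Largest double strictly below 0.5; the double-precision analogue of
# C's nextafterf(0.5, 0.0) mentioned in the commit message.
HALF_MINUS_ULP = math.nextafter(0.5, 0.0)


def round_naive(x: float) -> float:
    """trunc(|x| + 0.5) with the sign restored; wrong just below 0.5."""
    return math.copysign(float(math.trunc(abs(x) + 0.5)), x)


def round_fixed(x: float) -> float:
    """Same scheme, but add the value just below 0.5 before truncating."""
    return math.copysign(float(math.trunc(abs(x) + HALF_MINUS_ULP)), x)


if __name__ == "__main__":
    x = math.nextafter(0.5, 0.0)
    # x + 0.5 rounds up to exactly 1.0, so the naive version overshoots.
    print(round_naive(x), round_fixed(x))
    print(round_fixed(0.5), round_fixed(1.5), round_fixed(-0.5))
```

The sketch only covers finite inputs: math.trunc raises on infinities and NaN, which a real implementation would have to handle separately.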


162 Commits