KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
José Fonseca	2eddf91faf	gallivm: Workaround http://llvm.org/PR18600 We have code generation paths that carry out swizzles of AoS vectors via bitwise shifts, as these tend to generate more efficient code than straightforward byte shuffles. But when the input is a constant the additional bitwise arithmetic operations somehow don't really get constant propagated properly, evenutally causing assertion failure in InstCombine pass. Therefore avoid the bug by using the trivial shuffles for constant inputs. Although the sample LLVM IR can cause a crash with any LLVM version, this was only seen in practice with LLVM 3.2. Reviewed-by: Matthew McClure <mcclurem@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-01-28 14:27:27 +00:00
José Fonseca	8771285054	s/Tungsten Graphics/VMware/ Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 \| xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra$ph\\|hp$ics,\? [iI]nc\.\?$, Cedar Park$\?$, Austin$\?$, \(Texas\\|TX$\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics$,\? [iI]nc\.$\?$, Cedar Park$\?$, Austin$\?$, \(Texas\\|TX$\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/alanh@tungstengraphics.com/alanh@vmware.com/ s/jens@tungstengraphics.com/jowen@vmware.com/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g s/keithw\?@tungstengraphics.com/keithw@vmware.com/g s/michel@tungstengraphics.com/daenzer@vmware.com/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/zack@tungstengraphics.com/zackr@vmware.com/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <brianp@vmware.com>	2014-01-17 20:00:32 +00:00
Zack Rusin	93b953d139	llvmpipe: do constant buffer bounds checking in shaders It's possible to bind a smaller buffer as a constant buffer, than what the shader actually uses/requires. This could cause nasty crashes. This patch adds the architecture to pass the maximum allowable constant buffer index to the jit to let it make sure that the constant buffer indices are always within bounds. The behavior follows the d3d10 spec, which says the overflow should always return all zeros, and overflow is only defined as access beyond the size of the currently bound buffer. Accesses beyond the declared shader constant register size are not considered an overflow and expected to return garbage but consistent garbage (we follow the behavior which some wlk tests expect which is to return the actual values from the bound buffer). Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-01-16 16:33:57 -05:00
Roland Scheidegger	27d47bd42f	gallivm: fix pointer type for stmxcsr/ldmxcsr The argument is a i8 pointer not a i32 pointer (even though the value actually stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do leading to crashes. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-12-14 17:11:03 +01:00
Zack Rusin	155139059b	llvmpipe: fix blending with half-float formats The fact that we flush denorms to zero breaks our half-float conversion and blending. This patches enables denorms for blending. It's a little tricky due to the llvm bug that makes it incorrectly reorder the mxcsr intrinsics: http://llvm.org/bugs/show_bug.cgi?id=6393 Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com>	2013-12-10 16:39:48 -05:00
Roland Scheidegger	2983c039df	gallium: new shader cap bit for the amount of sampler views Ever since introducing separate sampler and sampler view max this was really missing. Every driver but llvmpipe reports the same number as number of samplers for now, so nothing should break. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-28 04:02:18 +01:00
Vinson Lee	7f56780915	gallivm: Ignore unknown file type in non-debug builds. Fixes "Uninitialized pointer read" defect reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-20 22:35:36 -08:00
Si Chen	e7a5905d8a	gallivm: Fix mask calculation for emit_kill_if. The exec_mask must be taken in consideration, just like emit_kill above. The tgsi_exec module has the same bug and should be fixed in a future change. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-11-19 19:16:18 +00:00
José Fonseca	a29e40a423	gallivm: Compile flag to debug TGSI execution through printfs. It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag. I had prototyped this for a while while debugging an issue, but finally cleaned this up and added a few more bells and whistles. v2: Use '$' as marker; better output. Thanks to Brian, Zack and Roland reviews. Here is a sample output. CONST[0].x = 0.00625000009 0.00625000009 0.00625000009 0.00625000009 CONST[0].y = -0.00714285718 -0.00714285718 -0.00714285718 -0.00714285718 CONST[0].z = -1 -1 -1 -1 CONST[0].w = 1 1 1 1 IN[0].x = 143.5 175.5 175.5 143.5 IN[0].y = 123.5 123.5 155.5 155.5 IN[0].z = 0 0 0 0 IN[0].w = 1 1 1 1 $ 1: RCP TEMP[0].w, IN[0].wwww TEMP[0].w = 1 1 1 1 $ 2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw TEMP[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 TEMP[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 3: MUL OUT[0].xy, TEMP[0], TEMP[0].wwww OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 4: MUL OUT[0].z, IN[0].zzzz, TEMP[0].wwww OUT[0].z = 0 0 0 0 $ 5: MOV OUT[0].w, TEMP[0] OUT[0].w = 1 1 1 1 $ 6: END OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 OUT[0].z = 0 0 0 0 OUT[0].w = 1 1 1 1	2013-11-14 14:04:28 +00:00
Roland Scheidegger	754319490f	gallivm,llvmpipe: fix float->srgb conversion to handle NaNs d3d10 requires us to convert NaNs to zero for any float->int conversion. We don't really do that but mostly seems to work. In particular I suspect the very common float->unorm8 path only really passes because it relies on sse2 pack intrinsics which just happen to work by luck for NaNs (float->int conversion in hw gives integer indeterminate value, which just happens to be -0x80000000 hence gets converted to zero in the end after pack intrinsics). However, float->srgb didn't get so lucky, because we need to clamp before blending and clamping resulted in NaN behavior being undefined (and actually got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp with defined nan behavior as we can handle the NaN for free this way. I suspect there's more bugs lurking in this area (e.g. converting floats to snorm) as we don't really use defined NaN behavior everywhere but this seems to be good enough. While here respecify nan behavior modes a bit, in particular the return_second mode didn't really do what we wanted. From the caller's perspective, we really wanted to say we need the non-nan result, but we already know the second arg isn't a NaN. So we use this now instead, which means that cpu architectures which actually implement min/max by always returning non-nan (that is adhering to ieee754-2008 rules) don't need to bend over backwards for nothing. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-14 12:24:55 +00:00
Roland Scheidegger	ea1f7d2894	gallivm: deduplicate some indirect register address code There's only one minor functional change, for immediates the pixel offsets are no longer added since the values are all the same for all elements in any case (it might be better if those weren't stored as soa vectors in the first place maybe). Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-11-08 03:38:32 +01:00
Roland Scheidegger	b35ea09349	gallivm: fix indirect addressing of inputs We weren't adding the soa offsets when constructing the indices for the gather functions. That meant that we were always returning the data in the first element. (Copied straight from the same fix for temps.) While here fix up a couple of broken comments in the fetch functions, plus don't name a straight float type float4 which is just confusing. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-11-06 18:20:54 +01:00
Roland Scheidegger	5ae31d7e1d	gallivm: optimize lp_build_minify for sse SSE can't handle true vector shifts (with variable shift count), so llvm is turning them into a mess of extracts, scalar shifts and inserts. It is however possible to emulate them in lp_build_minify with float muls, which should be way faster (saves over 20 instructions per 8-wide lp_build_minify). This wouldn't work for "generic" 32bit shifts though since we've got only 24bits of mantissa (actually for left shifts it would work by using sse41 int mul instead of float mul but not for right shifts). Note that this has very limited scope for now, since this is only used with per-pixel lod (otherwise we're avoiding the non-constant shift count by doing per-quad shifts manually), and only 1d textures even then (though the latter should change). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-05 23:32:24 +01:00
Vinson Lee	749cb89097	gallivm: Remove llvm::DisablePrettyStackTrace for LLVM >= 3.4. LLVM 3.4 r193971 removed llvm::DisablePrettyStackTrace and made the pretty stack trace opt-in rather than opt-out. The default value of DisablePrettyStackTrace has changed to true in LLVM 3.4 and newer. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60929 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-11-04 18:22:04 -08:00
Roland Scheidegger	2b2fc03beb	gallivm: implement fully accurate corner filtering for seamless cube maps d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it). This requires us to use different filtering code, since we need per-texel weights which our 2d lerp doesn't (and can't) do. And of course the (now per element) weights need to be adjusted too for it to work. Invoke the new filtering code whenever there's an edge to keep things simpler, as it will work for edges too not just corners but of course it's only needed with corners. More ugly code for not much gain but at least a hacked up cubemap demo shows very nice corners now... Not sure yet if and how this should be configurable... v2: incorporate feedback from Jose, only use special corner filtering code when there's a corner not when there's only an edge (as corner filtering code is slower, though a perf difference was only measureable when always forcing edge code). Plus some minor style fixes. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-10-25 01:29:14 +02:00
Roland Scheidegger	3bdd1074e1	gallivm: implement seamless cube filtering For seamless cube filtering it is necessary to determine new faces and new coords per sample. The logic for this is _seriously_ complex (what needs to happen is very "asymmetric" wrt face, x/y under/overflow), further complicated by the fact that if the 4 samples are in a corner (meaning we only have actually 3 samples, and all 3 are on different faces) then falling off the edge is happening _both_ on x and y axis simultaneously. There was a noticeable performance hit in mesa's cubemap demo when seamless filtering was forced on (just below 10 percent or so in a debug build, when disabling all filtering hacks, otherwise it would probably be a bit more) and when always doing the logic, hence use a branch which it only does it if any of the pixels in a quad (or in two quads) actually hit this. With that there was no measurable performance hit in the cubemap demo (neither in a debug nor release buidl), but this will vary (cubemap demo very rarely hits edges). Might also be different on other cpus, as this forces SoA sampling path which potentially can be quite a bit slower. Note that as for corners, this code gets all the 3 samples which actually exist right, and the 4th texel will simply be the same as one of the others, meaning that filter weights will be a bit wrong. This however should be enough for full OpenGL (but not d3d10) compliance. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-10-21 15:42:04 +02:00
Roland Scheidegger	9b3dbaf396	gallivm: kill old per-quad face selection code Not used since ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). v2: also get rid of 3 helper functions no longer used. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-10-10 04:32:57 +02:00
Roland Scheidegger	47d0613eb7	gallivm: handle explicit derivatives for cubemaps They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives there's at least some test which insists that in case of cubemaps the shared lod value MUST come from a pixel inside the primitive (due to the derivatives becoming different if a different larger major axis is chosen). v2: based on Brian's feedback, clean up code a bit. And use sign bit of major axis instead of pre-select s/t/r sign for coord mirroring (which should be the same in the end, saves 2 ands). Also fix two bugs with select/mirror of derivatives, the minor axes need to use major axis sign as well (instead of major derivative axis sign), and don't mistakenly use absolute values of major derivative and inverse major values. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-10-10 04:32:57 +02:00
Roland Scheidegger	ce1d8634aa	gallivm: ignore rho approximation for cube maps There's two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps when doing rho approximations (so the error actually goes from factor 2 at edges and sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face). 2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap derivatives (so don't need yet another debug var). Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-10-10 04:32:57 +02:00
Zack Rusin	87fe4a33d3	llvmpipe: implement 64 bit mul opcodes in llvmpipe Both the imul_hi and umul_hi are working with this patch. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-10-09 18:30:27 -04:00
Zack Rusin	c01c6a95b4	gallivm: support printing of 64 bit integers only 8 and 32 bit integers were supported before. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-10-09 18:29:05 -04:00
Roland Scheidegger	532dc8939f	gallivm: adjust wrap mode to CLAMP_TO_EDGE always for cube maps. Technically without seamless filtering enabled GL allows any wrap mode, which made sense when supporting true borders (can get seamless effect with border and CLAMP_TO_BORDER), but gallium doesn't support borders and d3d9 requires wrap modes to be ignored and it's a pain to fix up the sampler state (as it makes it texture dependent). It is difficult to imagine a situation where an app really wants another behavior so just cheat here. (It looks like some graphics hw (intel) actually requires this too hence it should be safe.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-09-19 17:14:36 +02:00
Roland Scheidegger	93b5f71179	gallivm: some bits of seamless cube filtering implementation Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a correct implementation for nearest filtering, and it's way better than using repeat wrap for instance for linear filtering (though obviously this doesn't actually do seamless filtering). v2: fix s/t wrap not r/s... Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-09-18 00:00:37 +02:00
José Fonseca	315f8f17d0	llvmpipe: Remove the special path for TGSI_OPCODE_EXP. It was wrong for EXP.y, as we clamped the source before computing the fractional part, and this opcode should be rarely used, so it's not worth the hassle.	2013-09-12 11:24:24 +01:00
Zack Rusin	e9f1f6ab42	gallivm: support indirect registers on both dimensions We support indirect addressing only on the vertex index, but some shaders also use indirect addressing on attributes. This patch adds support for indirect addressing on both dimensions inside gs arrays. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-09-06 15:05:27 -04:00
Roland Scheidegger	f37edb5e20	gallivm: handle unbound textures in texture sampling / texture queries Turns out we don't need to do much extra work for detecting this case, since we are guaranteed to get a empty static texture state in this case, hence just rely on format being 0 and return all zero then. Previously needed dummy textures (would just have crashed on format being 0 otherwise) which cannot return the correct result for size queries and when sampling textures with wrap modes using border. As a bonus should hugely increase performance when sampling unbound textures - too bad it isn't a useful feature :-). Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-30 23:20:03 +02:00
Roland Scheidegger	289faa7e23	gallivm: (trivial) don't pass sampler_unit variable down to filtering funcs The only reason this was needed was because the fetch texel function had to get the (dynamic) border color, but this is now done much earlier. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 23:20:03 +02:00
Roland Scheidegger	61add3cc3c	gallivm: don't use AoS path if min/mag filter are different with multiple lods Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing AoS path wouldn't be all that difficult (use all the same logic as SoA) but considered not worth it for now. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 23:20:03 +02:00
Roland Scheidegger	a479f34025	gallivm: support per-pixel min/mag filter in SoA path Since we can have per-pixel lod we should also honor the filter per-pixel (in fact we didn't honor it per quad neither in the multiple quad case). Do this by running the linear path and simply beating the weights into shape (the sample with the higher weight is the one which should have been chosen with nearest filtering hence adjust filter weight to 1.0/0.0 based on that). If all pixels use nearest filter (either min and mag) then still run just a nearest filter as this is way cheaper (probably around 4 times faster for 2d, more for 3d case) and it should be relatively rare that pixels really need different filtering. OTOH if all pixels would require linear don't do anything special since the linear path with filter adjustments shouldn't really be all that much more expensive than ordinary linear, and we think it's rare that min/mag filters are configured differently so there doesn't seem much value in trying to optimize this further. This does not yet fix the AoS path (though currently AoS is only used for single quads hence it could be considered less broken, just never honoring per-pixel filter decision but doing it per quad). v2: simplify code a bit (unify min linear and min nearest cases) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 02:16:45 +02:00
Roland Scheidegger	81cfcdbd87	gallivm: don't calculate square root of rho if we use accurate rho method While a sqrt here and there shouldn't hurt much (depending on the cpu) it is possible to completely omit it since rho is only used for calculating lod and there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for calculating lod this means we get a simple mul instead of sqrt (in case of nearest mip filter in fact we don't need to replace the sqrt with something else at all), only in some not very useful path this doesn't work (combined brilinear calculation of int level and fractional lod, accurate rho calc but brilinear filtering seems odd). Apart from being faster as an added bonus this should increase our crappy fractional accuracy of lod, since fast_log2 is only good for ~3bits and this should increase accuracy by one bit (though not used if dimension is just one as we'd need an extra mul there as we never had the squared rho in the first place). v2: use separate ilog2_sqrt function if we have squared rho. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 02:16:45 +02:00
Roland Scheidegger	10e40ad11d	gallivm: refactor num_lods handling This is just preparation for per-pixel (or per-quad in case of multiple quads) min/mag filter since some assumptions about number of miplevels being equal to number of lods no longer holds true. This change does not change behavior yet (though theoretically when forcing per-element path it might be slower with different min/mag filter since the code will respect this setting even when there's no mip maps now in this case, so some lod calcs will be done per-element just ultimately still the same filter used for all pixels). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 02:16:45 +02:00
Roland Scheidegger	ad9b5b9ae9	gallivm: fix min/mag switchover point for nearest/none mip filter Previously, the min/mag switchover point when using nearest/none mip filter was effectively -0.5 which can't be right. Looks like new OpenGL thinks it's ok if it's always 0.0 (older versions required 0.5 in some cases), let's hope everybody else thinks that's fine too. Refactor this slightly and get the per-quad/per-pixel min/mag decision values further down to sampling, though still only the first component is used yet. While here also fix code trying to skip lod bias application etc. when mipfilter is none, as this is still needed for determining min/mag filter. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-23 23:46:28 +02:00
Roland Scheidegger	bd0b6c5180	gallivm: do per-element lod for lod bias and explicit derivs too Except for explicit derivs with cube maps which are very bogus anyway. Just like explicit lod this is only used if no_quad_lod is set in GALLIVM_DEBUG env var. Minification is terrible on cpus which don't support true vector shifts (but should work correctly). Cannot do the min/mag filter decision (if they are different) per pixel though, only selecting different mip levels works. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-22 19:05:52 +02:00
Roland Scheidegger	33694a1800	gallivm: (trivial) fix int/uint border color clamping Just a copy & paste error. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=68409. Note that the test passing before probably simply means it doesn't verify clamping of the border color itself as required by the OpenGL spec. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-22 19:05:52 +02:00
Roland Scheidegger	6ff9008544	gallivm: (trivial) fix linear aos sampling of 3d compressed formats block size depth is always 1 even for compressed formats (unless someone invents true 3d compressed formats at least which we can't represent). Nearest (and soa) path had it right. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-22 19:05:52 +02:00
José Fonseca	fb62388d6a	gallium: Support PIPE_FORMAT_R10G10B10A2_UINT. Same as PIPE_FORMAT_B10G10R10A2_UINT but without the swizzling. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-08-22 12:14:15 +01:00
Roland Scheidegger	e6013e4bee	gallivm: unify sin and cos implementation The (complicated!) math is all identical, there's just minimal differences how sign bit is calculated plus there's an additional subtraction for the argument going into the polynomial for cos. The logic stays 100% the same (with a small exception, sign bit calculation for sin is minimally simplified, applying sign mask after xoring the arguments instead of applying it to each argument). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-21 22:05:53 +02:00
Roland Scheidegger	275d2efeed	gallivm: add comment for bogus min/mag filter selection with nearest mip filter Detected this hunting some other bug, not sure if it really needs fixing but it is definitely wrong. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-21 22:05:52 +02:00
Roland Scheidegger	21d8fa2759	gallivm: fix rho calculation for 1d case Was using wrong (undefined) vector element (the elements are at 0/2 position, not 0/1). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-21 22:05:52 +02:00
Roland Scheidegger	4b45b61fef	util: add avx2 and xop detection to cpu detection code Going to need this soon (not going to bother with avx2 intrinsics at this time but don't want to do workarounds for true vector shifts if llvm itself can use them just fine and won't need the gazillion instruction emulation). Not really tested other than my cpu returns 0 for these features... (I have no idea if llvm actually would emit avx2/xop instructions neither...) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Roland Scheidegger	9299128bf2	gallivm: fix bogus aos path detection Need to check the wrap mode of the actually used coords not a fixed 2. While checking more than necessary would only potentially disable aos and not cause any harm I'm pretty sure for 3d textures it could have caused assertion failures (if s,t coords have simple filter and r not). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Roland Scheidegger	fe92d7fab4	gallivm: do clamping of border color correctly for all formats Turns out it is actually very complicated to figure out what a format really is wrt range, as using channel information for determining unorm/snorm etc. doesn't work for a bunch of cases - namely compressed, subsampled, other. Also while here add clamping for uint/sint as well - d3d10 doesn't actually need this (can only use ld with these formats hence no border) and we could do this outside the shader for GL easily (due to the fixed texture/sampler relation) do it here too just so I can forget about it. v2: move border color clamping out of fetch texel. Also change it to clamp the whole border vector at once (and use vectorized load of border color), which saves a couple of instructions - needs some different handling of mixed signed/unsigned formats so skip the per channel stuff and just derive this from first channel except for special formats. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Roland Scheidegger	ac1a2714c7	gallivm: implement better control of per-quad/per-element/scalar lod There's a new debug value used to disable per-quad lod optimizations in fragment shader (ignored for vs/gs as the results are just too wrong typically). Also trying to detect if a supplied lod value is really a scalar (if it's coming from immediate or constant file) in which case sampler code can use this to stay on per-quad-lod path (in fact for explicit lod could simplify even further and use same lod for both quads in the avx case but this is not implemented yet). Still need to actually implement per-element lod bias (and derivatives), and need to handle per-element lod in size queries. v2: fix comments, prettify. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Zack Rusin	7115bc3940	draw: handle nan clipdistance If clipdistance for one of the vertices is nan (or inf) then the entire primitive should be discarded. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-08-15 16:26:32 -04:00
Roland Scheidegger	6ca18e06ae	gallivm: revert accidentally commited hunk That magic wasn't meant to be commited, need to work on some proper fix.	2013-08-15 19:26:39 +02:00
Roland Scheidegger	5626a84a00	gallivm: do per-sample depth comparison instead of doing it post-filter Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9) and definitely required by d3d10. This actually doesn't do it pre-filter but more "in-filter" as otherwise need to push the comparisons even further down into fetch code and this also trivially allows using a somewhat cheaper lerp. Doing it pre-filter would actually have some performance advantage for UNORM formats (because the comparisons should be done in texture format, we'd only need to convert the shadow ref coord to texture format once, but in turn would save converting the per-sample texture values to floats) but this gets a bit messy as this has implications for border color handling as well (which needs to be done prior to depth comparisons, hence would also need to convert border color to texture format too or use some other tricks like doing separate border color / shadow ref comparison and simply using that result directly when doing border replacement). Should make no difference for nearest filtering, and performance for linear filtering should be mostly the same too (essentially have one more comparison instruction per sample, and replace the sub/mul/add lerp with a sub/and/and/add special "lerp" which all in all shouldn't be much of a difference). v2: get rid of old code completely Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 18:42:20 +02:00
Roland Scheidegger	e58c2310b8	gallivm: already pass coords in the right place in the sampler interface This makes things a bit nicer, and more importantly it fixes an issue where a "downgraded" array texture (due to view reduced to 1 layer and addressed with (non-array) samplec instruction) would use the wrong coord as shadow reference value. (This could also be fixed by passing target through the sampler interface much the same way as is done for size queries, might do this eventually anyway.) And if we'd ever want to support (shadow) cube map arrays, we'd need 5 coords in any case. v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex using wrong shadow coord for 2d...). Plus need to project the shadow coord, and just for fun keep projecting the layer coord too. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 00:40:14 +02:00
Roland Scheidegger	d4b43cedb6	gallivm: change coordinate handling throughout functions Instead of passing s,t,r coordinates pass a coord array - the reason is that I need to pass more coords (in particular for shadow "coord", future will also need another one for cube map arrays) so just pass them as an array. Also, to simplify things, use fixed location for the shadow reference value I want to get rid of the silly "where is the right coord value" game. Keep old-style however for aos sampling (which is not going to need shadow coord, though for cube map arrays it still would need fixing). (Next patch will pass those through using the new arrangement directly from sampler interface.) v2: fix up soa split path (unreachable currently but still...) Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 00:40:14 +02:00
Roland Scheidegger	c6c55ad3e9	gallivm: fix border color with normalized texture formats We need to put border color into texture format color space which essentially means clamping for non-float, normalized formats (not entirely sure if we're also meant to quantize the float but it's probably ok not to do it thankfully). For OpenGL we could do this easily outside generated code due to the 1:1 sampler/texture correspondence but not for d3d10 which is terrible (as we recalculate a constant over and over again per shader invocation). Fortunately border color should be rare enough that we don't care THAT much. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 00:40:14 +02:00
Roland Scheidegger	6991f86945	gallivm: implement new float comparison instructions returning integer masks FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the select. And just for consistency use the same appropriate ordered/unordered comparisons for the old opcodes as well. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-13 19:09:17 +02:00
Roland Scheidegger	cd2f26090a	gallivm: fix exec_mask interaction with geometry shader after end of main Because we must maintain an exec_mask even if there's currently nothing on the mask stack, we can still have an exec_mask at the end of the program. Effectively, this mask should be set back to default when returning from main. Without relying on END/RET opcode (I think it's valid to have neither) it is actually difficult to do this, as there doesn't seem any reasonable place to do it, so instead let's just say the exec_mask is invalid outside main (which it really is effectively). The problem is that geometry shader called end_primitive outside the shader (in the epilogue), and as a result used a bogus mask, leading to bugs if we had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid the mask combining function when called from outside the shader. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-12 23:33:00 +02:00
Roland Scheidegger	7147094ff2	gallivm: simplify geometry shader mask handling a bit Instead of reducing masks to 0/1 simply use the mask directly as -1. Also use some signed comparison instead of unsigned (as far as I understand these values have to be (very) small and signed means llvm doesn't have to apply additional logic to do the unsigned comparisons the cpu can't do). Saves a couple of instructions in some test geometry shader here. v2: that was a bit to much optimization, don't skip combining the masks... Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-12 23:33:00 +02:00
Roland Scheidegger	8c5283dc17	gallivm: (trivial) fix typo in argument declaration of lp_build_size_query_soa Was meant to match the name used elsewhere, spotted by Anthony.	2013-08-12 23:33:00 +02:00
Roland Scheidegger	894d4903e7	gallivm: set non-existing values really to zero in size queries for d3d10 My previous attempt at doing so double-failed miserably (minification of zero still gives one, and even if it would not the value was never written anyway). While here also rename the confusingly named int_vec bld as we have int vecs of different sizes, and rename need_nr_mips (as this also changes out-of-bounds behavior) to is_sviewinfo too. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-09 20:49:19 +02:00
Roland Scheidegger	b0f74250e1	gallivm: use texture target from shader instead of static state for size query d3d10 has no notion of distinct array resources neither at the resource nor sampler view level. However, shader dcl of resources certainly has, and d3d10 expects resinfo to return the values according to that - in particular a resource might have been a 1d texture with some array layers, then the sampler view might have only used 1 layer so it can be accessed both as 1d or 1d array texture (I think - the former definitely works). resinfo of a resource decleared as array needs to return number of array layers but non-array resource needs to return 0 (and not 1). Hence fix this by passing the target from the shader decl to emit_size_query and use that (in case of OpenGL the target will come from the instruction itself). Could probably do the same for actual sampling, though it may not matter there (as the bogus components will essentially get clamped away), possibly could wreak havoc though if it REALLY doesn't match (which is of course an error but still). Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-09 20:49:18 +02:00
Roland Scheidegger	38ad404f76	gallivm: honor d3d10's wishes of out-of-bounds behavior for texture size query Specifically, must return 0 for non-existent mip levels (and non-existent textures which is an unsolved problem) for everything but total mip count. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-09 20:49:18 +02:00
Roland Scheidegger	6ce54a81b2	gallivm: honor d3d10 floating point rules for shadow comparisons d3d10 specifies ordered comparisons for everything but not_equal which is unordered (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx). OpenGL probably doesn't care. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:58 +02:00
Roland Scheidegger	e1590b9690	gallivm: don't clamp reference value for shadow comparison for float formats This is wrong both for OpenGL and d3d. (In fact clamping is a side effect of converting to depth format, so this should really do quantization too at least in d3d10 for the comparisons to be truly correct.) Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	eac57bc223	gallivm: propagate scalar_lod to emit_size_query too Clearly the returned values need to be per-element if the lod is per element. Does not actually change behavior yet. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	ce0e66af0a	gallivm: fix out-of-bounds behavior for fetch/ld For d3d10 and ARB_robust_buffer_access_behavior, we are required to return 0 for out-of-bounds coordinates (for which we can just enable the code already there was just disabled). Additionally, also need to return 0 for out-of-bounds mip level and out-of-bounds layer. This changes the logic so instead of clamping the level/layer, an out-of-bound mask is computed instead in this case (actual clamping then can be omitted just like with coordinates, since we set the fetch offset to zero if that happens anyway). Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	2d9fea95e8	gallivm: fix comment wrt srgb accuracy. I think it's actually not good enough now...	2013-08-08 18:55:57 +02:00
Laurent Carlier	2572e3b4a1	gallivm: Fix build - Remove TargetOptions.RealignStack for llvm>=3.4 Since llvm -3.4svn r187618, TargetOptions doesn't provide RealignStack, so only enable it with llvm<3.4 This option must now be specified using function attributes, see LLVM commit r187618 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2013-08-06 15:31:48 -07:00
Roland Scheidegger	e7ed70a52e	gallivm: obey clarified shift behavior llvm shifts are undefined for shift counts exceeding (or matching) bit width, so need to apply a mask for the tgsi shift instructions. v2: only use mask for the tgsi shift instructions, not for the build shift helpers. None of the internal callers need this behavior, and while llvm can optimize away the masking for constants there are legitimate cases where it might not be able to do so even if we know that shift count must be smaller than type width (currently all such callers do not use the build shift helpers). Reviewed-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-02 03:49:57 +02:00
Roland Scheidegger	b1ed7202df	gallivm: use nearest rounding for float->unorm24 conversion Previously we were using truncation, which gives the correct result only for numbers in [0.5-1.0] range (because there's no mantissa bits to do any rounding there). This is frequently hit (and probably only used there) when converting fragment depth to depth format (d24s8 etc.) or otherwise dealing with depth format. v2: as spotted by Jose, get rid of extra type (src_type is already unsigned). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-31 17:09:02 +02:00
Roland Scheidegger	e08114fed7	gallivm: (trivial) git rid of assertion in float->uint conversion code Commit `8c3d3622d9` introduced a new assertion, but since it causes lp_test_conv failures remove it again and let's hope we don't really hit bugs caused by the potentially bogus code (it is possible the assert() caught some cases which work correctly too).	2013-07-29 13:23:56 +02:00
Roland Scheidegger	8c3d3622d9	gallivm: fix float->SNORM conversion Just like the UNORM case we need to use round to nearest, not trunc. (There's also another problem, we're using the formula for SNORM->float which will produce a value below -1.0 for the most negative value which according to both OpenGL and d3d10 would need clamping. However, no actual failures have been observed due to that hence keep cheating on that.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-27 16:41:29 +02:00
Roland Scheidegger	abcc40e7f0	gallivm: handle texel swizzles correctly for d3d10-style sample opcodes unlike OpenGL, the texel swizzle is embedded in the instruction, so honor that. (Technically we now execute both the sampler_view swizzle and the per-instruction swizzle but this should be quite ok.) v2: add documentation note as it's not obvious. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-27 16:41:29 +02:00
Vinson Lee	60c248c3af	gallivm: Remove NoFramePointerElimNonLeaf for LLVM >= 3.4. TargetOptions::NoFramePointerElimNonLeaf was removed in LLVM 3.4 r187093. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-07-25 09:50:07 -07:00
Vinson Lee	0ac3164708	gallivm: Remove dead code in lp_build_compare_ext. There are earlier returns for PIPE_FUNC_NEVER and PIPE_FUNC_ALWAYS. The switch value of 'func' cannot be either of those values. Fixes "Logically dead code" defects reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-07-24 23:47:34 -07:00
Zack Rusin	f7c06785d0	gallivm: add a log function that handles edge cases Same as log2_safe, which means that it can handle infs, 0s and nans. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-19 16:29:18 -04:00
Zack Rusin	018c69ac56	gallivm: export unordered/ordered cmp to a common function Only the floating point operarators change everything else is the same so it makes sense to share the code. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-19 16:29:18 -04:00
Zack Rusin	192c68b85a	gallivm: handle -inf, inf and nan's in sin/cos instructions sin/cos for anything not finite is nan and everything else has to be between [-1, 1]. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-19 16:29:17 -04:00
Zack Rusin	13e2cd2f2c	gallivm: add a version of log2 which handles edge cases That means that if input is: * - less than zero (to and including -inf) then NaN will be returned * - equal to zero (-denorm, -0, +0 or +denorm), then -inf will be returned * - +infinity, then +infinity will be returned * - NaN, then NaN will be returned It's a separate function because the checks are a little bit costly and in most cases are likely unnecessary. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-19 16:29:17 -04:00
Zack Rusin	7b672c1503	gallivm: fix edge cases in exp2 exp(0) has to be exactly 1, exp(-inf) has to be 0, exp(inf) has to be inf and exp(nan) has to be nan, this fixes all of those cases. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-19 16:29:17 -04:00
Zack Rusin	ab47bbecd6	gallivm: handle nan's in min/max Both D3D10 and OpenCL say that if one the inputs is nan then the other should be returned. To preserve that behavior the patch fixes both the sse and the non-sse paths in both functions and adds helper code for handling nans. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-19 16:29:17 -04:00
Roland Scheidegger	7fd30a8621	gallivm: (trivial) simplify lp_build_cos/lp_build_sin a tiny bit Use "or" instead of "add" (this is a classic select sequence, which at least newer llvm versions can actually recognize (3.2+?), and the "add" might prevent that - and we really don't want an add instead of an or with avx if it isn't recognized (even without avx logic ops might be cheaper)). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-17 18:16:34 +02:00
Kyle McMartin	87c3440567	llvmpipe: use MCJIT on ARM and AArch64 MCJIT is the only supported LLVM JIT on AArch64 and ARM (the regular JIT has bit-rotted badly on ARM and doesn't exist on AArch64.) Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Dave Airlie <airlied@gmail.com>	2013-07-17 17:29:01 +10:00
Roland Scheidegger	dc1cc928ed	llvmpipe: support sRGB framebuffers Just use the new conversion functions to do the work. The way it's plugged in into the blend code is quite hacktastic but follows all the same hacks as used by packed float format already. Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never worked anyway in the blend code and are thus disabled, and I don't think anyone is interested in L8/L8A8. Would need even more hacks otherwise. Unless I'm missing something, this is the last feature except MSAA needed for OpenGL 3.0, and for OpenGL 3.1 as well I believe. v2: prettify a bit, use separate function for packing. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-16 01:54:51 +02:00
Roland Scheidegger	796b73d1fe	gallivm: (trivial) use constant instead of exp2f() function Some lame compilers can't do exp2f() and as far as I can tell they can't do exp2() (with doubles) neither so instead of providing some workaround for that (wouldn't actually be too bad just replace with pow) and since it is used with a constant only just use the precalculated constant.	2013-07-14 02:39:33 +02:00
Roland Scheidegger	6bcbb0dc82	gallivm: handle srgb-to-linear and linear-to-srgb conversions srgb-to-linear is using 3rd degree polynomial for now which should be _just_ good enough. Reverse is using some rational polynomials and is quite accurate, though not hooked into llvmpipe's blend code yet and hence unused (untested). Using a table might also be an option (for srgb-to-linear especially). This does not enable any new features yet because EXT_texture_srgb was already supported via util_format fallbacks, but performance was lacking probably due to the external function call (the table used by the util_format_srgb code may not be all that much slower on its own). Some performance figures (taken from modified gloss, replaced both base and sphere texture to use GL_SRGB instead of GL_RGB, measured on 1Ghz Sandy Bridge, the numbers aren't terribly accurate): normal gloss, aos, 8-wide: 47 fps normal gloss, aos, 4-wide: 48 fps normal gloss, forced to soa, 8-wide: 48 fps normal gloss, forced to soa, 4-wide: 47 fps patched gloss, old code, soa, 8-wide: 21 fps patched gloss, old code, soa, 4-wide: 24 fps patched gloss, new code, soa, 8-wide: 41 fps patched gloss, new code, soa, 4-wide: 38 fps So there's a performance hit but it seems acceptable, certainly better than using the fallback. Note the new code only works for 4x8bit srgb formats, others (L8/L8A8) will continue to use the old util_format fallback, because I can't be bothered to write code for formats noone uses anyway (as decoding is done as part of lp_build_unpack_rgba_soa which can only handle block type width of 32). Compressed srgb formats should get their own path though eventually (it is going to be expensive in any case, first decompress, then convert). No piglit regressions. v2: use lp_build_polynomial instead of ad-hoc polynomial construction, also since keeping both linear to srgb functions for now make sure both are compiled (since they share quite some code just integrate into the same function). v3: formatting fixes and bugfix in the complicated (disabled) linear-to-srgb path. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-13 18:42:17 +02:00
Roland Scheidegger	9b8d97e5bf	gallivm: better support for fast rsqrt We had to disable fast rsqrt before because it wasn't precise enough etc. However in situations when we know we're not going to need more precision we can still use a fast rsqrt (which can be several times faster than the quite expensive sqrt). Hence introduce a new helper which does exactly that - it is probably not useful calling it in some situations if there's no fast rsqrt available so make it queryable if it's available too. v2: use fast_rsqrt consistently instead of rsqrt_fast, fix indentation, let rsqrt use fast_rsqrt. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-13 18:42:17 +02:00
Zack Rusin	00cd455bd5	gallium: fixup definitions of the rsq and sqrt GLSL spec says that rsq is undefined for src<=0, but the D3D10 spec says it needs to be a NaN, so lets stop taking an absolute value of the source which completely breaks that behavior. For the gl program we can simply insert an extra abs instrunction which produces the desired behavior there. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-07-11 20:19:04 -04:00
José Fonseca	1b0d29b5da	gallivm: Eliminate redundant lp_build_select calls. lp_build_cmp already returns 0 / ~0, so the lp_build_select call is unnecessary. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-12 15:40:16 +01:00
Brian Paul	46205ab8cc	tgsi: rename the TGSI fragment kill opcodes TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional kill (if any src component < 0). The later was unconditional kill. At one time KILP was supposed to work with NV-style condition codes/predicates but we never had that in TGSI. This patch renames both opcodes: TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0) TGSI_OPCODE_KILP -> KILL (unconditional kill) Note: I didn't just transpose the opcode names to help ensure that I didn't miss updating any code anywhere. I believe I've updated all the relevant code and comments but I'm not 100% sure that some drivers had this right in the first place. For example, the radeon driver might have llvm.AMDGPU.kill and llvm.AMDGPU.kilp mixed up. Driver authors should review their code. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-12 08:32:51 -06:00
Brian Paul	f501baabdb	tgsi: fix-up KILP comments KILP is really unconditional fragment kill. We've had KIL and KILP transposed forever. I'll fix that next. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-12 08:32:51 -06:00
Zack Rusin	63386b2f66	util: treat denorm'ed floats like zero The D3D10 spec is very explicit about treatment of denorm floats and the behavior is exactly the same for them as it would be for -0 or +0. This makes our shading code match that behavior, since OpenGL doesn't care and on a few cpu's it's faster (worst case the same). Float16 conversions will likely break but we'll fix them in a follow up commit. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-07-09 23:30:55 -04:00
Roland Scheidegger	9ef49cfd84	gallivm: (trivial) fix using one lod instead of per-quad lod for texel fetch The logic for choosing number of lods was bogus. (The code should ultimately handle the case of only one lod even with multiple quads but currently can't.)	2013-07-05 18:07:51 +02:00
José Fonseca	45f174ce40	gallivm: Remove bogus assert. It is perfectly valid for the swizzle to be bigger than 2. For example the texel offsets could be SAMPLE ..., IMM[0].zzz What is not correct is for chan_index to be bigger than 2. Trivial.	2013-07-05 14:35:54 +01:00
Roland Scheidegger	4dbca8672b	gallivm: (trivial) fix bogus assertion for per-element lod with 1d resources The assertion was always broken but the code unused until enabling the per-element lod code. Fixes piglit texelFetch vs isampler1D and similar tests (only run with GL 3.0 version override).	2013-07-05 01:19:23 +02:00
Roland Scheidegger	f3bbf65929	gallivm: do per-pixel lod calculations for explicit lod d3d10 requires per-pixel lod calculations for explicit lod, lod bias and explicit derivatives, and we should probably do it for OpenGL too - at least if they are used from vertex or geometry shaders (so doesn't apply to lod bias) this doesn't just affect neighboring pixels. Some code was already there to handle this so fix it up and enable it. There will no doubt be a performance hit unfortunately, we could do better if we'd knew we had a real vector shift instruction (with variable shift count) but this requires AVX2 on x86 (or a AMD Bulldozer family cpu). Don't do anything for lod bias and explicit derivatives yet, though no special magic should be needed for them neither. Likewise, the size query is still broken just the same. v2: Use information if lod is a (broadcast) scalar or not. The idea would be to base this on the actual value, for now just pretend it's a scalar in fs and not a scalar otherwise (so, per-pixel lod is only used in gs/vs but same code is generated for fs as before). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-07-04 19:42:04 +02:00
José Fonseca	84f367e69a	gallivm: Simplify intrinsic name construction. Just noticed this could be slightly shortened when fixing MSVC build. Trivial.	2013-07-02 13:12:31 +01:00
José Fonseca	4c859901ce	gallivm: Fix MSVC build.	2013-07-02 06:41:32 +01:00
José Fonseca	e621ec816d	gallivm: Fix indirect immediate registers. If reg->Register.Indirect is true then the immediate is not truly a constant LLVM expression. There is no performance regression in using LLVMBuildBitCast, as it will fallback to LLVMConstBitCast internally when the argument is a constant. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-07-02 06:30:06 +01:00
Zack Rusin	34546d61c1	draw/gallivm: export overflow arithmetic to its own file We'll be reusing this code so lets put it in a common file and use it in the draw module. Signed-off-by: Zack Rusin <zackr@vmware.com>	2013-06-28 04:24:24 -04:00
Adam Jackson	2151d893fb	gallium: Fix llvmpipe on big-endian machines Squashed commit of the following: commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3 Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:07 2013 -0400 gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 0d65131649a8aa140e2db228ba779d685c4333e3 Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:07 2013 -0400 gallivm: Fix big-endian machines This adds a bit-shift count to the format table, and adds the concept of vector or bitwise alignment on gathers. Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 9740bda9b7dc894b629ed38be9b51059ce90818f Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:07 2013 -0400 llvmpipe: Fix convert_to_blend_type on big-endian Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit ae037c2de0f029e4e99371c0de25560484f0d8df Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:06 2013 -0400 util: Convert color pack to packed formats This fixes them on big-endian. Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:06 2013 -0400 graw-xlib: Convert to packed formats Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0 Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:06 2013 -0400 format: Convert to packed formats Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 417b60bc66eb450e68a92ab0e47f76e292b385e6 Author: Adam Jackson <ajax@redhat.com> Date: Tue Jun 18 12:25:06 2013 -0400 st/dri: Convert to packed formats Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 0934b2e022a5e0847d312c40734e2b44cac52fd8 Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:06 2013 -0400 st/xlib: Convert to packed formats Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit a307ea3c3716a706963acce7966b5e405ba11db9 Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:06 2013 -0400 gbm: Convert to packed formats Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:06 2013 -0400 tests: Convert to packed formats Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 2f77fe3ee524945eacd546efcac34f7799fb3124 Author: Adam Jackson <ajax@redhat.com> Date: Tue Jun 18 13:07:37 2013 -0400 gallium: Document packed formats Signed-off-by: Adam Jackson <ajax@redhat.com> commit 1f1017159ce951f922210a430de9229f91f62714 Author: Richard Sandiford <r.sandiford@uk.ibm.com> Date: Tue Jun 18 12:25:06 2013 -0400 gallium: Introduce 32-bit packed format names These are for interacting with buffers natively described in terms of bit shifts, like X11 visuals: uint32_t xyzw8888 = (x << 0) \| (y << 8) \| (z << 16) \| (w << 24); Define these in terms of (endian-dependent) aliases to the array-style format names. Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com> commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a Author: Adam Jackson <ajax@redhat.com> Date: Mon Jun 3 12:10:32 2013 -0400 gallium: Document format name conventions v2: - Fix a channel name thinko (Michel Dänzer) - Elaborate on SCALED versus INT - Add links to DirectX and FOURCC docs Signed-off-by: Adam Jackson <ajax@redhat.com> commit df4d269e7fb62051a3c029b84147465001e5776e Author: Adam Jackson <ajax@redhat.com> Date: Tue Jun 18 12:25:06 2013 -0400 gallivm: Remove all notion of byte-swapping Signed-off-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2013-06-24 09:48:56 -04:00
Roland Scheidegger	957c040eb8	gallivm: (trivial) remove duplicated code block (including comment)	2013-06-13 00:41:43 +02:00
Richard Sandiford	ba6cd796dd	llvmpipe: Use saturating add/sub for UNORM formats lp_build_add and lp_build_sub have fallback code for cases that cannot be handled by known intrinsics. For UNORM formats, this code was using modulo rather than saturating arithmetic. This fixes some rendering issues for a gnome session on System z. It also fixes various piglit tests on z, such as spec/ARB_color_buffer_float/GL_RGBA8-render. The patch deliberately doesn't tackle the more complicated SNORM case. Tested against piglit on x86_64 and System z with no regressions. Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>	2013-06-10 16:20:45 -04:00
Roland Scheidegger	213c207b3a	gallivm: work around slow code generated for interleaving 128bit vectors We use 128bit vector interleave for untwiddling in the blend code (with 256bit vectors). llvm generates terrible code for this for some reason, so instead of generating a shuffle for 2 128bit vectors use a extract/insert shuffle instead (it only seems to matter we're not using 128bit wide vectors for the shuffle). This decreases instruction count of the blend code generated for a rgba8 render target without blending from 169 to 113 with llvm 3.1 and from 136 to 114 in llvm 3.2/3.3, and I got a ~8% (llvm 3.1) and ~5% (3.2/3.3) performance improvement in gears. (The generated code is still not terribly good as we could actually avoid the interleaving completely but llvm can't know this.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-06-08 17:33:51 +02:00
Brian Paul	14541dacab	tgsi: replace tgsi_file_names tgsi_file_names[] with tgsi_file_name() function This change came from the discovery that the STATIC_ASSERT to check that the number of register file strings didn't actually work. Similar changes could be made for the other string arrays in tgsi_string.c Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-06-07 09:23:24 -06:00
Richard Sandiford	7bdf1f2f1a	gallium: System z support The main change is to use MCJIT rather than the old JIT, which will never be supported for System z. The endianness part is by example since the patch was tested on a glibc system. Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com> Signed-off-by: Brian Paul <brianp@vmware.com>	2013-06-05 08:36:24 -06:00
Roland Scheidegger	44993c1808	gallivm: enhance special sse2 4x4f and 2x8f -> 1x16ub conversion There's no good reason why it can't handle 2x4f->1x8ub, 1x4f->1x4ub and 1x8f->1x8ub cases, there might be legitimate reasons why we don't have enough input vectors for a full destination vector, and using pack intrinsics should still be much better than using generic conversion (it looks like convert_alpha from the blend code might hit this though I suspect it could be avoided). v2: add another test vector format to lp_test_conv so this gets tested. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-06-05 00:29:46 +02:00
Roland Scheidegger	ce82523db9	gallivm: (trivial) fix lp_build_concat_n The code was designed to handle no-op concat but failed (unless the caller was using same pointer for src and dst). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-06-05 00:29:46 +02:00
Roland Scheidegger	458a9a0f85	gallivm: fix out-of-bounds access with mirror_clamp_to_edge address mode Surprising this bug survived so long, we were missing a clamp (in the linear filtering version). (Valgrind complained a lot about invalid reads with piglit texwrap, I've also seen spurios failures in this test which might have happened due to this. Valgrind probably didn't complain before the alignment reduction in llvmpipe to 4x4 since the test is using tiny textures so the reads were still always well within allocated area.) While here, also do an effective clamp (after half subtraction) of [0,length-0.5] instead of [0, length-1] which saves an instruction (the filtering weight could be different due to this, but only if both texels point to the same max texel so it doesn't matter). (Both changes are borrowed from PIPE_TEX_CLAMP_TO_EDGE case.) Note: This is a candidate for the stable branches. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-06-01 20:03:59 +02:00
Roland Scheidegger	070a9afb54	llvmpipe: handle z32s8x24 depth/stencil format We need to split up the depth and stencil values in this case, and there's some new logic required to handle float depth and stencil simultaneously. Also make sure we get the 64bit zs clear values and masks propagated correctly.	2013-05-18 00:32:33 +02:00
Roland Scheidegger	d7e811c0b0	gallivm: handle z32s8x24 format for sampling Since we can only sample either depth or stencil but not both only load the required bits which makes things a bit easier (it requires special handling since the format doesn't fit into 32bit). The logic for deciding if depth or stencil should be sampled is a bit odd, but seems to be what other drivers and statetrackers do: if it's a format with both depth and stencil (or just with depth) then sample depth, for sampling stencil a sampler view format with only stencil is required. Also while here fix up stencil sampling for other formats as well, though this isn't supported by mesa (ARB_stencil_texturing), and while blits would use it they don't work neither since they'd also need stencil export. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-05-18 00:31:49 +02:00
José Fonseca	6166ffeaf7	gallivm: Eliminate 8.8 fixed point intermediates from AoS sampling path. This change was meant as a stepping stone to use PMADDUBSW SSSE3 instruction, but actually this refactoring by itself yields a 10% speedup on texture intensive shaders (e.g, Google Earth's ocean water w/o S3TC on a Ivy Bridge machine), while giving yielding exactly the same results, whereas PMADDUBSW only gave an extra 5%, at the expense of 2bits of precision in the interpolation. I belive that the speedup of this change comes from the reduced register pressure (as 8.8 fixed point intermediates take twice the space of 8bit unorm). Also, not dealing with 8.8 simplifies lp_bld_sample_aos.c code substantially -- it's no longer necessary to have code duplicated for low and high register halfs. Note about lp_build_sample_mipmap(): the path for num_quads > 1 is never executed (as it is faster on AVX to split the 256bit wide texture computation into two 128bit chunks, in order to leverage integer opcodes). This path might be useful in the future, so in order to verify this change did not break that path I had to apply this change: @@ -1662,11 +1662,11 @@ lp_build_sample_soa(struct gallivm_state gallivm, / * we only try 8-wide sampling with soa as it appears to * be a loss with aos with AVX (but it should work). * (It should be faster if we'd support avx2) / - if (num_quads == 1 \|\| !use_aos) { + if (/ num_quads == 1 \|\| ! / use_aos) { if (num_quads > 1) { if (mip_filter == PIPE_TEX_MIPFILTER_NONE) { LLVMValueRef index0 = lp_build_const_int32(gallivm, 0); / and then run texfilt mesademo: LP_NATIVE_VECTOR_WIDTH=256 ./texfilt Ran whole piglit without regressions. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-05-17 20:23:00 +01:00
José Fonseca	5aaa4bafe0	gallivm: Add and use lp_build_lerp_3d. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-05-17 20:22:50 +01:00
José Fonseca	cb9913cdab	gallivm: Support pointers in lp_build_print_value(). Trivial.	2013-05-16 15:14:10 +01:00
Zack Rusin	386327c48f	gallivm/soa: implement indirect addressing in immediates The support is analogous to the way we handle indirect addressing in temporaries, except that we don't have to worry about storing (after declarations) and thus we'll able to keep using the old code when indirect addressing isn't used. In other words we're still using constants directly, unless the instruction has immediate register with indirect addressing. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-05-14 03:09:15 -04:00
Vinson Lee	ff256ec068	gallivm: Fix build with LLVM >= 3.4 r181680. Tested-by: Laurent Carlier <lordheavym@gmail.com> Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2013-05-14 09:06:14 -07:00
Stéphane Marchesin	1c56fc1025	draw/llvm: Add additional llvm optimization passes It helps a bit with vertex shader performance on i915g (a couple percent faster with openarena). I have tried most other passes, and they weren't showing any measurable improvement. Note that my vertex shaders didn't have loops, so maybe the loop optimizations could still be useful in the future. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-05-08 22:05:54 -07:00
Chia-I Wu	75a48a53d8	tgsi: fix operand type of TGSI_OPCODE_NOT It should be TGSI_TYPE_UNSIGNED, not TGSI_TYPE_FLOAT. Fixed also gallivm not_emit_cpu() to use uint build context. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Acked-by: Roland Scheidegger <sroland@vmware.com>	2013-05-08 11:03:49 +08:00
Tom Stellard	730c90a70e	gallivm: Fix build for LLVM < 3.3 The C API versions of the LLVM multithreaded functions were added in LLVM 3.3.	2013-05-06 11:17:03 -07:00
Tom Stellard	55eb8eaaa8	gallivm: Move LLVMStartMultithreaded() static initializer into gallivm This does not solve all of the problems with using LLVM in a multithreaded enivronment, but it should help in some cases. Reviewed-by: Mathias.Froehlich@web.de	2013-05-06 09:06:03 -07:00
Zack Rusin	999cd79c9e	tgsi: allow negation of all integer types It's valid because we reuse certain arithmetic operations for both signed and unsigned types (e.g. uadd, umad, which have a bit unfortunate naming) Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-05-02 02:43:42 -04:00
Armin K	4742f9b00b	gallivm: Fix build with LLVM 3.3 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2013-05-02 09:00:37 -07:00
Adam Jackson	16296cc843	gallivm: Fix altivec intrinsics for 8xi16 add/sub Signed-off-by: Adam Jackson <ajax@redhat.com>	2013-05-02 10:34:08 -04:00
Zack Rusin	f9f57312de	gallivm: fix indirect addressing of temps in soa mode we weren't adding the soa offsets when constructing the indices for the gather functions. That meant that we were always returning the data in the first vertex/primitive/pixel in the SoA structure and not correctly fetching from all structures. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-27 01:18:51 -04:00
José Fonseca	542c5b3703	gallivm: Fix trivial out-of-bounds indirection in lp_build_cube_lookup(). Courtesy of clang: src/gallium/auxiliary/gallivm/lp_bld_sample.c:1483:10: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds] tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02); ^ ~ src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here LLVMValueRef ddx_ddy[2], tmp[2], rho_vec; ^ src/gallium/auxiliary/gallivm/lp_bld_sample.c:1487:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds] rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]); ^ ~ src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here LLVMValueRef ddx_ddy[2], tmp[2], rho_vec; ^ src/gallium/auxiliary/gallivm/lp_bld_sample.c:1491:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds] rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]); ^ ~ src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here LLVMValueRef ddx_ddy[2], tmp[2], rho_vec; ^	2013-04-26 08:44:37 +01:00
Tom Stellard	ead4db420e	gallivm: Fix build with LLVM >= r180063	2013-04-23 11:53:05 -04:00
José Fonseca	c0538860bf	gallivm: Fix assignment of unsigned values to OUT register. TEMP is not the only register file that accept unsigned. OUT too. Actually, what determines the appropriate type of the destination value is not the opcode, but rather the register. Also cleanup/simplify code. Add a few more asserts, but also make code more robust by handling graceful if assert fails. This fixes segfault / assertion in the included vert-uadd.sh graw shader. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-22 18:23:42 +01:00
José Fonseca	9fb5b2f45c	Revert "gallivm: Emit vector selects." It caused inumerous regressions (LLVM 3.1) in blending. In particular: - lp_test_blend type=u8nx16 rgb_func=sub rgb_src_factor=zero rgb_dst_factor=inv_src_color alpha_func=rev_sub alpha_src_factor=one alpha_dst_factor=const_color ... MISMATCH Src: 0 0 0 b5 49 29 0 a2 0 21 de 0 c3 1b ec 0 Src1: 2d 85 14 0 f8 0 79 a1 99 0 d8 0 59 16 0 0 Dst: 0 a9 97 0 c0 0 78 0 0 8b aa f0 bd 0 78 f6 Con: 7d 0 c0 0 0 bb 77 0 0 0 50 0 40 51 0 0 Res: 0 0 0 0 0 29 0 0 0 0 c8 0 97 1b e3 0 Ref: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 type=u8nx16 rgb_func=max rgb_src_factor=one rgb_dst_factor=inv_const_color alpha_func=min alpha_src_factor=zero alpha_dst_factor=inv_src1_alpha ... MISMATCH Src: d 0 0 e9 0 37 35 f0 62 0 0 b2 e9 f7 0 5c Src1: 8f 0 bf 0 a8 5 0 0 c4 0 d7 7 92 a 0 17 Dst: cb 0 1e 0 0 0 19 8e 0 4d 0 0 0 0 3 46 Con: aa 5a 5f 8f 0 0 bc 92 0 88 0 0 b7 8a c0 88 Res: 44 0 13 0 0 0 7 8e 0 24 0 0 0 0 1 40 Ref: 44 0 13 0 0 37 35 0 62 24 0 0 e9 f7 1 0 This reverts commit `1e266c7ef0`.	2013-04-21 09:07:19 +01:00
José Fonseca	f701a5a0fe	gallivm: Disable LLVM 2.7 workaround on other versions. 2.7 was a particularly trouble ridden release. Furthermore, the bug no longer can be reproduced ever since the first_level state was taken in account. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-20 23:25:36 +01:00
José Fonseca	1e266c7ef0	gallivm: Emit vector selects. They are supported on LLVM 3.1, at least on x86. (I haven't tested on PPC though.) Actually lp_build_linear_mip_levels() already has been emitting them for some time. This avoids intrinsics, which tend to be an obstacle for certain optimization passes. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-20 23:25:36 +01:00
Roland Scheidegger	85974e5fee	gallivm: implement switch opcode Should be able to handle all things which make this tricky to implement. Fallthroughs, including most notably into/out of default, should be handled correctly but are quite a mess. If we see largely unoptimized switches in the wild should probably think about some "real" switch optimization pass, e.g. things like this: switch case1 someinst brk case2 default case3 someinst brk case4 someinst endswitch are legal, but the pointless case2/case3 statements not only cause condition evaluation but will turn this into a "fake" fallthrough case (because mask and defaultmask are already updated for case2 when default is encountered) requiring executing code twice. If default is at the end though, there's never any code re-execution, and if that's not the case if there's no fallthrough in (not even a fake one) and out of default there's no code re-execution neither. v2: add comments, and use enum for break type instead of magic boolean. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-20 02:27:53 +02:00
Roland Scheidegger	8f5d4283c0	gallivm: use uint build context for mask instead of float Unsurprisingly noone was using it except for grabbing builder. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-20 02:27:53 +02:00
Roland Scheidegger	107550e71a	gallivm/tgsi: fix up breakc It seems there was a typo in gallivm breakc handling (I am actually still not sure it is really needed but otherwise that statement really should go away). Also fix the wrong src argument type, even though they weren't really used. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-20 02:27:53 +02:00
Roland Scheidegger	443950c6aa	gallivm: increase nesting limit to 66 This is still not really correct, since at least for sm 4.0 the nesting limit is 64 per subroutine, and subroutine nesting itself has a limit of 32, so since we have a flat stack we'd need 32*64. But this should probably be better fixed with per-subroutine stacks, since otherwise these structures get really big (like 100kB for the lp_exec_mask). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-20 02:27:53 +02:00
José Fonseca	dbb690872e	gallivm: Fix half floats with MCJIT. Prevents: LLVM ERROR: Cannot select: intrinsic %llvm.x86.vcvtph2ps.128	2013-04-19 10:13:19 +01:00
Roland Scheidegger	50cbcf0c46	gallivm: change cubemaps / derivatives handling, take 55 Turns out the previous "fix" for handling per-pixel face selection and derivatives didn't work out that well - the derivatives were wrong by quite a bit, in theory transformation of the derivatives into cube space should work, but would be _a lot_ more work than the "simplified" transform used. So, for explicit derivatives, I'm just giving up and go back to not honoring them. For implicit derivatives (and the fake explicit ones) however we try something a little different, we just calculate rho as we would for a 3d texture, that is after scaling the coords by the inverse major axis. This gives the same results as calculating the derivs after projection of the coords to the same face as long as all pixels hit the same face (and only without rho_no_opt, otherwise it should be a bit worse). And when not all pixels are hitting the same face, the results aren't so hot but not catastrophically bad (I believe not off by more than a factor of 2 without no_rho_approx and not more than sqrt(2) with no_rho_approx). I think this is better than just picking the wrong face but who knows... Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-18 17:06:43 +02:00
Roland Scheidegger	0d07f05ee8	gallivm: Add no_rho_approx debug option This will calculate rho correctly as sqrt(max((ds/dx)^2 + (dt/dx)^2 + (dr/dx)^2), (ds/dx)^2 + (dt/dx)^2 + (dr/dx)^2)) instead of max(\|ds/dx\|,\|dt/dx\|,\|dr/dx\|,\|ds/dy\|,\|dt/dy,\|dr/dy\|) (for 3 coords - 2 coords work analogous, for 1 coord there's no point doing the exact version), for both implicit and explicit derivatives. While such approximation seems to be allowed in OpenGL some APIs may be less forgiving, and the error can be quite large (sqrt(2) for 2 coords, sqrt(3) for 3 coords so wrong by nearly one mip level in the latter case). This also helps to single out "real" bugs from "expected" ones, so it is debug only (though at least combined with no_brilinear I didn't really see much of a performance difference but only tested with a debug build - at least with implicit mipmaps the instruction count is almost exactly the same though the instructions are more complex (1 sqrt and mul/adds instead of and/max mostly). The code when the option isn't set stays exactly the same. v2: rename no_rho_opt to no_rho_approx. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-18 17:04:01 +02:00
José Fonseca	6e833d4d09	gallivm: Drop pos arg from lp_build_tgsi_soa. Never used. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-18 14:18:13 +01:00
Zack Rusin	cb58c79efb	gallivm/gs: fix indirect addressing in geometry shaders We were always treating the vertex index as a scalar but when the shader is using indirect addressing it will be a vector of indices for each channel. This was causing some nasty crashes insides LLVM. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-17 15:28:54 -07:00
Zack Rusin	f01f754ca1	draw/gs: make sure geometry shaders don't overflow The specification says that the geometry shader should exit if the number of emitted vertices is bigger or equal to max_output_vertices and we can't do that because we're running in the SoA mode, which means that our storing routines will keep getting called on channels that have overflown (even though they will be masked out, but we just can't skip them). So we need some scratch area where we can keep writing the overflown vertices without overwriting anything important or crashing. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-16 23:38:47 -07:00
Zack Rusin	b739376cff	gallivm/gs: fix the end primitive calls The issue with SOA execution and end_primitive opcode is that it can be executed both when we haven't emitted any vertices, in which case we don't want to emit an empty primitive, and when the execution mask is zero and the execution should be skipped. We handled only the latter of those conditions. Now we're combining the execution mask with a mask created from emitted vertices to handle both cases. As a result we don't need the pending_end_primitive flag which was broken because it was static and could be affected by both above mentioned conditions at run-time. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-16 23:38:46 -07:00
José Fonseca	b8f6858fcb	gallivm: JIT symbol resolution with linux perf. Details on docs/llvmpipe.html Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-17 16:50:52 +01:00
José Fonseca	50b3fc6204	gallium: Disambiguate TGSI_OPCODE_IF. TGSI_OPCODE_IF condition had two possible interpretations: - src.x != 0.0f - Mesa statetracker when PIPE_SHADER_CAP_INTEGERS was false either for vertex and fragment shaders - gallivm/llvmpipe - postprocess - vl state tracker - vega state tracker - most old drivers - old internal state trackers - many graw examples - src.x != 0U - Mesa statetracker when PIPE_SHADER_CAP_INTEGERS was true for both vertex and fragment shaders - tgsi_exec/softpipe - r600 - radeonsi - nv50 And drivers that use draw module also were a mess (because Mesa would emit float IFs, but draw module supports native integers so it would interpret IF arg as integers...) This sort of works if the source argument is limited to float +0.0f or +1.0f, integer 0, but would fail if source is float -0.0f, or integer in the float NaN range. It could also fail if source is integer 1, and hardware flushes denormalized numbers to zero. But with this change there are now two opcodes, IF and UIF, with clear meaning. Drivers that do not support native integers do not need to worry about UIF. However, for backwards compatibility with old state trackers and examples, it is advisable that native integer capable drivers also support the float IF opcode. I tried to implement this for r600 and radeonsi based on the surrounding code. I couldn't do this for nouveau, so I just shunted IF/UIF together, which matches the current behavior. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> v2: - Incorporate Roland's feedback. - Fix r600_shader.c merge conflict. - Fix typo in radeon, spotted by Michel Dänzer. - Incorporte Christoph Bumiller's patch to handle TGSI_OPCODE_IF(float) properly in nv50/ir.	2013-04-17 10:54:08 +01:00
José Fonseca	f61b7da80e	gallium: Eliminate TGSI_OPCODE_IFC. Never used or implemented. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-17 10:54:08 +01:00
Roland Scheidegger	1d6eb23f2d	gallivm: fix small but severe bug in handling multiple lod level strides Inserting the value for the second quad in the wrong place for the following shuffle. This meant the row or image stride was undefined which is quite catastrophic, can lead to bogus texels fetched or just segfault. This code is only hit for SoA path currently, still surprising it didn't crash more or caused more visible issues (I think llvm used a broadcast shuffle for the undefined parts of the vector, hence the undefined value for the second quad was just the same as that from the first quad, so as long as both quads hit the same mip level everything was fine, and since lower mips always have the same large stride it made it less likely to hit out-of-bound memory in case of differing lods). Note: this is a candidate for stable branches. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-15 15:23:40 +02:00
Zack Rusin	fe29f99293	gallivm/tgsi: handle untyped moves both mov and ucmp can be used to move variables of any type. correctly note that about ucmp in the tgsi_info and make sure gallivm can handle that by correctly casting the untyped moves. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-10 12:37:17 -07:00
Zack Rusin	d56f2d5267	gallivm: fix loops and conditionals within GS We were using simple temporaries, without using alloca or phi nodes which meant that on every iteration of the loop our temporaries, which were holding the number of vertices and primitives which were emitted, were being reset to zero. Now we're using alloca to allocate those variables to preserve them across conditionals. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-10 12:33:59 -07:00
Zack Rusin	7466e0b6c8	gallivm: fix unsigned divide and remainder opcodes We want to both make sure we never divide by zero to not generate sigfpe and that divide by zero is guaranteed to return 0xffffffff. Based on José idea. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-10 12:31:22 -07:00
Zack Rusin	1ad4a4eeb3	gallivm: fix breakc we break when the mask values are 0 not, 1, plus it's bit comparison not a floating point comparison. This fixes both. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-04-10 12:25:34 -07:00
Roland Scheidegger	9eef86bb55	gallivm: some minor cube map cleanup The ar_ge_as_at variable was just very very confusing since the condition was actually the other way around (as_at_ge_ar). So change the condition (and the selects depending on it) to match the variable name. And also change the chosen major axis in case the coord values are the same. OpenGL doesn't care one bit which one is chosen in this case but it looks like dx10 would require z chosen over y, and y chosen over x (previously did x chosen over y, y chosen over z). Since it's all the same effort just honor dx10's wishes. (Though actually, for some prefered orderings, we could save one (or two with derivatives) selects since the tnewx and tnewz (and the corresponding dmax values) are the same.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 23:22:10 +02:00
Zack Rusin	be9a42e980	llvmpipe: implement ucmp and add a test for it Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-04 12:09:55 -07:00
Roland Scheidegger	ce5096a0a9	gallivm: honor explicit derivatives values for cube maps. This is trivial now, though need to make sure we pass all the necessary derivative values (which is 3 each for ddx/ddy not 2). Passes piglit arb_shader_texture_lod-texgradcube test. v2: add the forgotten abs() for all incoming derivatives (discovered by new piglit arb_shader_texture_lod-texgradcube test, though more by luck as it was failing only for exactly one pixel...). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 01:03:42 +02:00
Roland Scheidegger	f621015cb5	gallivm: do per-pixel cube face selection (finally!!!) This proved to be tricky, the problem is that after selection/mirroring we cannot calculate reasonable derivatives (if not all pixels in a quad end up on the same face the derivatives could get "randomly" exceedingly large). However, it is actually quite easy to simply calculate the derivatives before selection/mirroring and then transform them similar to the cube coordinates (they only need selection/projection, but not mirroring as we're not interested in the sign bit, of course). While there is a tiny bit more work to do (need to calculate derivs for 3 coords instead of 2, and additional selects) it also simplifies things somewhat for the coord selection itself (as we save some broadcast aos shuffles, and we don't need to calculate the average vector) - hence if derivatives aren't needed this should actually be faster. Also, this has the benefit that this will (trivially) work for explicit derivatives too, which we completely ignored before that (will be in a separate commit for better trackability). Note that while the way for getting rho looks very different, it should result in "nearly" the same values as before (the "nearly" is only because before the code would choose the face based on an "average" vector and hence the derivatives calculated according to this face, where now (for implicit derivatives) the derivatives are projected on the face selected for the first (top-left) pixel in a quad, so not necessarly the same face). The transformation done might not quite be state-of-the-art, calculating length(dx,dy) as max(dx,dy) certainly isn't neither but this stays the same as before (that is I think a better transform would _somehow_ take the "derivative major axis" into account so that derivative changes in the major axis wouldn't get ignored). Should solve some accuracy problems with cubemaps (can easily be seen with the cubemap demo when switching wrapping/filtering), though we still don't do seamless filtering to fix it completely (so not per-sample but per-pixel is certainly better than per-quad and already sufficient for accurate results with nearest tex filter). As for performance, it seems to be a tiny bit faster too (maybe 3% or so with cubemap demo). Which I'd have expected with nearest/nearest filtering where this will be less instructions, but the difference seems to actually be larger with linear/linear_mipmap_linear where it is slightly more instructions, probably the code appears less serialized allowing better scheduling (on a sandy bridge cpu). It actually seems to be now at least as fast as the old path using a conditional when using 128bit vectors too (that is probably more a result of testing with a newer cpu though), for now that old path is still there but unused. No piglit regressions. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 01:03:42 +02:00
Roland Scheidegger	bdfbeb9633	gallivm: minor rho calculation optimization for 1 or 3 coords Using a different packing for the single coord case should save a shuffle. Plus some minor style fixes. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-04 01:03:42 +02:00
Roland Scheidegger	067a0ae420	gallivm: use f16c hw support for float->half and half->float conversion Should be way faster of course on cpus supporting this (includes AMD Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)). Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge. Reviewed-by: Brian Paul <brianp@vmware.com>	2013-04-04 01:03:42 +02:00
Zack Rusin	d8543bd752	draw: Implement support for primitive id We were largely ignoring primitive id. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-03 10:16:25 -07:00

1 2 3 4 5 ...

939 Commits