KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Roland Scheidegger	ca4f0baca2	gallivm: fix somewhat broken NaN behavior for exp2 I actually screwed that up in `754319490f`, mistakenly thinking the code actually wanted the non-nan result before. So, introduce that missing nan behavior case and use that instead. For sse, there's no actual change in the resulting code at all, the fallback code wouldn't have done the right thing though. Of course, the actual issue I saw with pow() was completely unrelated... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-08-30 01:34:41 +02:00
Roland Scheidegger	9da75f96bc	gallivm: handle cube map arrays for texture sampling Pretty easy, just make sure that all paths testing for PIPE_TEXTURE_CUBE also recognize PIPE_TEXTURE_CUBE_ARRAY, and add the layer * 6 calculation to the calculated face. Also handle it for texture size query, looks like OpenGL wants the number of cubes, not layers (so need division by 6). No piglit regressions. v2: fix up adding cube layer to face for seamless filtering (needs to happen after calculating per-sample face). Undetected by piglit unfortunately. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)	2014-08-30 01:33:02 +02:00
Vinson Lee	c04a6d5c29	gallivm: Fix build with LLVM >= 3.6 r215967. This LLVM 3.6 commit changed EngineBuilder constructor. commit 3f4ed32b4398eaf4fe0080d8001ba01e6c2f43c8 Author: Rafael Espindola <rafael.espindola@gmail.com> Date: Tue Aug 19 04:04:25 2014 +0000 Make it explicit that ExecutionEngine takes ownership of the modules. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215967 91177308-0d34-0410-b5e6-96231b3b80d8 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2014-08-20 15:24:44 +09:00
Marek Olšák	c10332bbb8	gallium: remove PIPE_SHADER_CAP_MAX_ADDRS This limit is fixed in Mesa core and cannot be changed. It only affects ARB_vertex_program and ARB_fragment_program. The minimum value for ARB_vertex_program is 1 according to the spec. The maximum value for ARB_vertex_program is limited to 1 by Mesa core. The value should be zero for ARB_fragment_program, because it doesn't support ARL. Finally, drivers shouldn't mess with these values arbitrarily. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2014-08-11 21:53:57 +02:00
Darius Goad	5492296318	gallivm: Handle MSAA textures in emit_fetch_texels This support is preliminary due to the fact that MSAA is not actually implemented. However, this patch does fix the piglit test: spec/!OpenGL 3.2/glsl-resource-not-bound 2DMS (bug #79740). (v2 RS: don't emit 4th coord as explicit lod) Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-08-08 18:54:08 +02:00
Roland Scheidegger	394ea139c7	draw: hack around weird primitive id input in gs The distinction between system values and ordinary inputs is not very obvious in gallium - further fueled by the fact that they use the same semantic names. Still, if there's any value which imho really is a system value, it's the primitive id input into the gs (while earlier (tessleation) stages could read it, it is _always_ generated by the system). For some odd reason though (which I'd classify as a bug but seems too complicated to fix) the glsl compiler in mesa treats this as an ordinary varying, and everything else after that (including the state tracker and other drivers) just go along with that. But input fetching in gs for llvm based draw was definitely limited to the ordinary (2-dimensional) inputs so only worked with other state trackers, the code was also additionally relying on tgsi_scan_shader filling uses_primid correctly which did not happen neither (would set it only for all stages if it was a system value, but only set it for the fragment shader if it was an input value). This fixes piglit glsl-1.50-geometry-primitive-id-restart and primitive-id-in in llvmpipe. Reviewed-by: Brian Paul <brianp@vmware.com>	2014-08-08 18:54:08 +02:00
Jan Vesely	e28136343b	gallivm: Fix build with latest LLVM Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2014-08-05 12:52:56 +09:00
Roland Scheidegger	c3c33756ff	gallivm: fix cube map array (and cube map shadow with bias) handling In particular need to handle TEX2/TXB2/TXL2 opcodes. cube map shadow with bias already used TXB2 which didn't work before at all, despite that there's by default no piglit change (but using no_quad_lod and no_rho_opt indeed passes some more tex-miplevel-selection tests). The actual sampling code still won't handle cube map arrays. Reviewed-by: Brian Paul <brianp@vmware.com>	2014-08-05 04:13:17 +02:00
Roland Scheidegger	5a12155503	gallivm: fix up out-of-bounds level when using conformant out-of-bound behavior When using (d3d10) conformant out-of-bound behavior for texel fetching (currently always enabled) the level still needs to be set to a safe value even though the offset in the end won't get used because the level is used to look up the mip offset itself and the actual strides, which might otherwise crash. For simplicity, we'll use level 0 in this case (this ought to be safe, llvmpipe does not actually fill in level 0 information if first_level is larger, but some random strides / offsets shouldn't hurt as ultimately we always use offset 0 in this case). Fixes a crash in some in-house test where random huge levels appear in lp_build_fetch_texel() (the test actually uses level 0 always but if the fetching happens in a block with a execution mask random values may appear). CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-07-31 01:31:06 +02:00
Marek Olšák	04f2c88f45	gallium: rename shader cap MAX_CONSTS to MAX_CONST_BUFFER_SIZE This new name isn't so confusing. I also changed the gallivm limit, because it looked wrong. Reviewed-by: Brian Paul <brianp@vmware.com> v2: use sizeof(float[4])	2014-07-28 23:57:08 +02:00
Tom Stellard	fea996c2aa	gallium: Add PIPE_SHADER_CAP_DOUBLES This is for reporting whether or not double precision floating-point operations are supported. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2014-07-02 15:31:52 -04:00
Aaron Watry	564821c917	gallivm: Fix build after LLVM commit 211259 Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2014-06-20 19:49:18 -05:00
Roland Scheidegger	cad60420d5	gallivm: set mcpu when initializing llvm execution engine Previously llvm detected cpu features automatically when the execution engine was created (based on host cpu). This is no longer the case, which meant llvm was then not able to emit some of the intrinsics we used as we didn't specify any sse attributes (only on avx supporting systems this was not a problem since despite at least some llvm versions enabling it anyway we always set this manually). So, instead of trying to figure out which MAttrs to set just set MCPU. This fixes https://bugs.freedesktop.org/show_bug.cgi?id=77493. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Tested-by: Vinson Lee <vlee@freedesktop.org>	2014-06-19 16:58:00 +02:00
Marek Olšák	1df7199fc9	gallium: implement ARB_texture_query_levels The extension is always supported if GLSL 1.30 is supported. Softpipe and llvmpipe support is also added (trivial). Radeon and nouveau support is already done. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-06-19 00:17:36 +02:00
Roland Scheidegger	56335b4441	gallivm: fix SCALED -> NORM conversions Such conversions (which are most likely rather pointless in practice) were resulting in shifts with negative shift counts and shifts with counts the same as the bit width. This was always undefined in llvm, the code generated was rather horrendous but happened to work. So make sure such shifts are filtered out and replaced with something that works (the generated code is still just as horrendous as before). This fixes lp_test_format, https://bugs.freedesktop.org/show_bug.cgi?id=73846. v2: prettify by using build context shift helpers. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-06-18 19:52:57 +02:00
José Fonseca	172ef0c5a5	gallivm: Disable workaround for PR12833 on LLVM 3.2+. Fixed upstream.	2014-05-23 11:37:47 +01:00
José Fonseca	2c02f34fcc	gallivm: Support MCJIT on Windows. It works fine, though it requires using ELF objects. With this change there is nothing preventing us to switch exclusively to MCJIT, everywhere. It's still off though.	2014-05-23 11:37:47 +01:00
Roland Scheidegger	1e9cbbb1c4	llvmpipe: do IR counting for shader cache management after optimization. `2ea923cf57` had the side effect of IR counting now being done after IR optimization instead of before. Some quick analysis shows that there's roughly 1.5 times more IR instructions before optimization than after, hence the effective shader cache size got quite a bit smaller. Could counter this with an increase of the instruction limit but it probably makes more sense to count them after optimizations, so move that code. Reviewed-by: Brian Paul <brianp@vmware.com>	2014-05-19 17:07:41 +02:00
Roland Scheidegger	3bf2d86c09	gallivm: (trivial) fix compilation with llvm 3.1, 3.2 I actually checked the getModuleIdentifier() function exists with 3.1 but missed that the file moved... This fixes https://bugs.freedesktop.org/show_bug.cgi?id=78803	2014-05-17 02:03:35 +02:00
Roland Scheidegger	3a1da0abee	gallivm: print out how long it takes to optimize shader IR. Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print warnings for unoptimized code). While some unexpectedly long shader compile times for some shaders were fixed with `8a9f5ecdb1` this should help recognize such problems in the future. For now though only available in debug builds (which are not always suitable for such analysis). And since this uses system time, it might not be all that accurate (even llvmpipe's own rasterization threads might be running at the same time, or just other tasks). (llvmpipe also has LP_DEBUG=counters but this only gives an average per shader and the the total time for all shaders.) This prints information like this: optimizing module fs17_variant0 took 1 msec optimizing module setup_variant_0 took 0 msec optimizing module draw_llvm_vs_variant0 took 9 msec optimizing module draw_llvm_vs_variant0 took 12 msec optimizing module fs17_variant1 took 2 msec v2: rebase for recent gallivm compilation changes, and print time for whole modules instead of functions (otherwise it would be very spammy since it would include all trivial inline sse2 functions), using the shiny new module names, prying them off LLVM using new helper (not available through C bindings). Per function timings, while possibly giving more information (if there'd be a problem only in for instance the partial not the whole function), don't seem all that useful for now. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-05-16 22:50:14 +02:00
Roland Scheidegger	26cac02c51	gallivm: give more verbose names to modules When we had just one module "gallivm" was an appropriate name. But now we have modules containing all functions for a particular variant, so give it a corresponding name (this is really just for helping debugging). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-05-16 22:50:14 +02:00
Roland Scheidegger	b416645387	gallivm: remove optimization workaround when not having sse 4.1 This workaround doesn't list any llvm version, but it was introduced 2010-06-10 (`e277d5c1f6`). It is unlikely this bug is still present in llvm versions we support (3.1+). There's no specific test listed, but I ran lp_test_arit (which uses the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and this pass enabled without issues. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-05-16 01:09:34 +02:00
Roland Scheidegger	93731fbeec	gallivm: remove workaround for reversing optimization pass order. 32bit code generation and llvm >= 2.7 used a different optimization pass order - this code was initially introduced (2010-07-23) by `815e79e72c`, apparently due to buggy code being generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8 devel). It seems very highly likely that whatever this bug was it has been fixed in newer llvm versions, though there's no easy way to test this - the mentioned piglit test has been removed years ago, and even if you'd build it I'm sceptical the glsl compiler would still produce the required code to trigger it. I have no idea what a good order of passes is, but just remove the workaround and use the same order everywhere. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-05-16 01:09:34 +02:00
Roland Scheidegger	8a9f5ecdb1	gallivm: only fetch pointers to constant buffers once In `1d35f77228` support for multiple constant buffers was introduced. This meant we had another indirection, and we did resolve the indirection for each constant buffer access. This looks very reasonable since llvm can figure out if it's the same pointer, however it turns out that this can cause llvm compilation time to go through the roof and beyond (I've seen cases in excess of factor 100, e.g. from 50 ms to more than 10 seconds (!)), with all the additional time spent in IR optimization passes (and in the end all of it in DominatorTree::dominate()). I've been unable to narrow it down a bit more (only some shaders seem affected, seemingly without much correlation to overall shader complexity or constant usage) but it is easily avoidable by doing the buffer lookups themeselves just once (at constant buffer declaration time). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-05-14 16:23:33 +02:00
Roland Scheidegger	18c6454ad1	gallivm: fix output stream flushing in error case for disassembly. When there's an error, also need to flush the stream, otherwise an assertion is hit (meaning you don't actually see the error neither).	2014-05-14 16:23:33 +02:00
José Fonseca	b18b7781b2	gallivm: Remove lp_func_delete_body. Not necessary, now that we will free the whole module (hence all function bodies) immediately after compiling. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:05:00 +01:00
José Fonseca	a6f5cc66db	gallivm: Remove gallivm_free_function. Unused. Deprecated by gallivm_free_ir(). Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:05:00 +01:00
Frank Henigman	865d0312c0	gallivm: Separate freeing LLVM intermediate data from freeing final code. Split free_gallivm_state() into two steps. First step is gallivm_free_ir() which cleans up the LLVM scaffolding used to generate code while preserving the code itself. Second step is gallivm_free_code() to free the memory occupied by the code. v2: s/gallivm_teardown/gallivm_free_ir/ (Jose) Signed-off-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:05:00 +01:00
Frank Henigman	2c73102dc3	gallivm: One code memory pool with deferred free. Provide a JITMemoryManager derivative which puts all generated code into one memory pool instead of creating a new one each time code is generated. This saves significant memory per shader as the pool size is 512K and a small shader occupies just several K. This memory manager also defers freeing generated code until you tell it to do so, making it possible to destroy the LLVM engine while keeping the code, thus enabling future memory savings. v2: Fix compilation errors with LLVM 3.4 (Jose) Signed-off-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:05:00 +01:00
José Fonseca	2ea923cf57	gallivm: Run passes per module, not per function. This is how it is meant to be done nowadays. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:05:00 +01:00
José Fonseca	920933e09e	gallivm: Use LLVM global context. I saw that LLVM internally uses its global context for some things, even when we use our own. Given ours is also global, might as well use LLVM's. However, sepearate contexts can still be enabled with a simple source code modification, for when the need/benefit arises. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:05:00 +01:00
José Fonseca	69f0835ff1	gallivm: Stop using module providers. Nowadays LLVMModuleProviderRef is just an alias for LLVMModuleRef, so its use just causes unnecessary confusion. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:05:00 +01:00
José Fonseca	9cf67e51b0	gallivm,draw,llvmpipe: Remove support for versions of LLVM prior to 3.1. Older versions haven't been tested probably don't work anyway. But more importantly, code supporting it is hindering further work. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-05-14 11:04:59 +01:00
Brian Paul	baec25635d	gallivm: add PIPE_SHADER_CAP_PREFERRED_IR switch case, remove default Return PIPE_SHADER_IR_TGSI for the PIPE_SHADER_CAP_PREFERRED_IR query. Remove default switch case so we learn of missing switch cases at compile time. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2014-05-07 11:32:11 -06:00
Roland Scheidegger	64d6460a56	gallivm: fix 2 leaks in disassembly code don't leak the MCSubtargetInfo (not really big, was already fixed with llvm master) and TargetMachine (big). While this is only used for debugging the leak is large enough to get you into trouble in some cases. Tested with llvm 3.1 and master. Before (llvm 3.1), GALLIVM_DEBUG=asm glxgears: ==14152== LEAK SUMMARY: ==14152== definitely lost: 105,228 bytes in 20 blocks ==14152== indirectly lost: 347,252 bytes in 261 blocks ==14152== possibly lost: 866,625 bytes in 1,453 blocks ==14152== still reachable: 7,344,677 bytes in 6,494 blocks ==14152== suppressed: 0 bytes in 0 blocks After: ==13799== LEAK SUMMARY: ==13799== definitely lost: 3,108 bytes in 6 blocks ==13799== indirectly lost: 0 bytes in 0 blocks ==13799== possibly lost: 804,143 bytes in 1,429 blocks ==13799== still reachable: 7,314,267 bytes in 6,473 blocks ==13799== suppressed: 0 bytes in 0 blocks Reviewed-by: Brian Paul <brianp@vmware.com>	2014-05-01 16:13:38 +02:00
José Fonseca	1527a545a4	gallivm: Fix wrong operator in lp_exec_default. Courtesy of MSVC static code analyser. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-04-24 14:49:53 +01:00
Roland Scheidegger	f23d1160c2	gallivm: fix compilation with llvm 3.5 r206241+ Just adjust to the ever-changing API, pass in MCContext when creating the MCDisassembler. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2014-04-16 19:57:47 +02:00
Zack Rusin	b1909b260f	draw/llvm: improve debugging output a bit it's useful to know what the llvmbuildstore arguments are going to be before executing it because it can crash and make sure to print out the inputs only if we're not generating a gs because it fetches inputs differently. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-03-26 15:58:59 -04:00
Roland Scheidegger	3b421daf32	gallivm: fix no-op n:n lp_build_resize() This can get called in some circumstances if both src type and dst type have same width (seen with float32->unorm32). While this particular case was bogus anyway let's just fix that as it can work trivially (due to the way it was called it actually worked anyway apart from the assert). Reviewed-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-03-26 01:44:23 +01:00
Roland Scheidegger	9477d8c862	llvmpipe: add support for b5g6r5_srgb The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA (and vice versa), fix this to handle also 16bit 565-style srgb formats. Still not really all that generic, things like r10g10b10a2_srgb or r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not require more work to not crash but near certainly need some higher precision calculation) but not needed right now. The code is not fully optimized for this (could use more direct calculation instead of expanding to 8-bit range first) but should be good enough. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-03-21 17:23:38 +01:00
Jeff Muizelaar	ff1e850eec	gallivm: optimize repeat linear npot code in the aos int path Similar to the other cases, shift some weight/coord calculations to int space. This should be slightly faster (on x86 sse it should actually safe one instruction, and generally int instructions are cheaper).	2014-03-14 19:41:18 +01:00
Roland Scheidegger	9954f01497	gallivm: use correct rounding for nearest wrap mode (in the aos int path) The previous code used coords which were calculated as (int) (f_coord * tex_size * 256) >> 8. This is not only unnecessarily complex but can give the wrong texel due to rounding for negative coords (as an example, after denormalization coords from -1.0 to 0.0 should give -1, but this will give -1 for numbers from -1.0-1/256 - 0.0-1/256. Instead, juse use ifloor, dropping the shift stuff. Unfortunately, this will most likely be slower - with arch rounding available it shouldn't be too bad (trades a int shift for a round but also saves an int mul (which is shared by all coords) but otherwise it's a mess.	2014-03-14 19:41:18 +01:00
Jeff Muizelaar	88637e5764	gallivm: use correct rounding for linear wrap mode (in the aos int path) The previous method for converting coords to ints was sligthly inaccurate (effectively losing 1bit from the 8bit lerp weight). This is probably especially noticeable when trying to draw a pixel-aligned texture. As an example, for a 100x100 texture after dernormalization the texture coords in this case would turn up as 0.5, 1.5, 2.5, 3.5, 4.5, ... After the mul by 256, conversion to int and 128 subtraction, they end up as 0, 256, 512, 768, 1024, ... which gets us the correct coords/weights of 0/0, 1/0, 2/0, 3/0, 4/0, ... But even LSB errors (which are unavoidable) in the input coords may cause these coords/weights to be wrong, e.g. for a coord of 3.49999 we'd get a coord/weight of 2/255 instead. Fix this by using round-to-nearest int instead of FPToSi (trunc). Should be equally fast on x86 sse though other archs probably suffer a little.	2014-03-14 19:41:18 +01:00
Roland Scheidegger	b2b2a2c06c	gallivm: add smallfloat to float conversion not relying on cpu denorm handling The previous code relied on cpu denorm support for converting small float formats (such r11g11b10_float and r16_float) to floats, otherwise denorms are flushed to zero. We worked around that in llvmpipe blend code by reenabling denorms, but this did nothing for texture sampling. Now it would be possible to reenable it there too but I'm not really a fan of messing with fpu flags (and it seems we can't actually do it reliably with llvm in any case looking at some bug reports). (Not to mention if you actually have a lot of denorms in there, you can expect some order-of-magnitude slowdown with x86 cpus.) So instead use code which adjusts exponents etc. directly hence not relying on cpu denorm support for the rescaling mul. (We still need the fpu flag handling as we can't do float-to-smallfloat without using cpu denorms at least for now - I actually wanted to keep both the old and new code and using one or the other depending on from where it's called but that didn't work out as the parameter would have to be passed through too many layers than I'd like.) Reviewed-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Si Chen <sichen@vmware.com>	2014-02-20 18:41:42 +01:00
Zack Rusin	efb152dd04	gallivm: make sure analysis works with large number of immediates We need to handle a lot more immediates and in order to do that we also switch from allocating this structure on the stack to allocating it on the heap. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-02-05 19:40:53 -05:00
Zack Rusin	69ee3f431f	gallivm: handle huge number of immediates We only supported up to 256 immediates, which isn't enough. We had code which was allocating immediates as an allocated array, but it was always used along a statically backed array for performance reasons. This commit adds code to skip that performance optimization and always use just the dynamically allocated immediates if the number of them is too great. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-02-05 19:40:53 -05:00
Zack Rusin	8507afc97f	gallivm: allow large numbers of temporaries The number of allowed temporaries increases almost with every iteration of an api. We used to support 128, then we started increasing and the newer api's support 4096+. So if we notice that the number of temporaries is larger than our statically allocated storage would allow we just treat them as indexable temporaries and allocate them as an array from the start. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-02-05 19:40:53 -05:00
Roland Scheidegger	4a7da3bec5	gallivm: fix F2U opcode Previously, we were really doing F2I. And also move it to generic section. (Note that for llvmpipe the code generated is definitely bad, due to lack of unsigned conversions with sse. I think though what llvm does (using scalar conversions to 64bit signed either with x87 fpu (32bit) or sse (64bit) including lots of domain changes is quite suboptimal, could do something like is_large = arg >= 2^31 half_arg = 0.5 * arg small_c = fptoint(arg) large_c = fptoint(half_arg) << 1 res = select(is_large, large_c, small_c) which should be much less instructions but that's something llvm should do itself.) This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs GL 3.0 version override to run.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com>	2014-02-05 17:45:31 +01:00
Zack Rusin	9bace99d77	gallivm: fix opcode and function nesting gallivm soa code supported only a single level of nesting for control flow opcodes (if, switch, loops...) but the d3d10 spec clearly states that those are nested within functions. To support nesting of conditionals inside functions we need to store the nesting data inside function contexts and keep a stack of those. Furthermore we make sure that if nesting for subroutines is deeper than 32 then we simply ignore all subsequent 'call' invocations. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-02-03 13:29:14 -05:00
Brian Paul	c20b48c48e	gallivm: add a few const qualifiers Trivial.	2014-02-02 06:52:36 -07:00
José Fonseca	2eddf91faf	gallivm: Workaround http://llvm.org/PR18600 We have code generation paths that carry out swizzles of AoS vectors via bitwise shifts, as these tend to generate more efficient code than straightforward byte shuffles. But when the input is a constant the additional bitwise arithmetic operations somehow don't really get constant propagated properly, evenutally causing assertion failure in InstCombine pass. Therefore avoid the bug by using the trivial shuffles for constant inputs. Although the sample LLVM IR can cause a crash with any LLVM version, this was only seen in practice with LLVM 3.2. Reviewed-by: Matthew McClure <mcclurem@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-01-28 14:27:27 +00:00
José Fonseca	8771285054	s/Tungsten Graphics/VMware/ Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 \| xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra$ph\\|hp$ics,\? [iI]nc\.\?$, Cedar Park$\?$, Austin$\?$, \(Texas\\|TX$\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics$,\? [iI]nc\.$\?$, Cedar Park$\?$, Austin$\?$, \(Texas\\|TX$\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/alanh@tungstengraphics.com/alanh@vmware.com/ s/jens@tungstengraphics.com/jowen@vmware.com/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g s/keithw\?@tungstengraphics.com/keithw@vmware.com/g s/michel@tungstengraphics.com/daenzer@vmware.com/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/zack@tungstengraphics.com/zackr@vmware.com/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <brianp@vmware.com>	2014-01-17 20:00:32 +00:00
Zack Rusin	93b953d139	llvmpipe: do constant buffer bounds checking in shaders It's possible to bind a smaller buffer as a constant buffer, than what the shader actually uses/requires. This could cause nasty crashes. This patch adds the architecture to pass the maximum allowable constant buffer index to the jit to let it make sure that the constant buffer indices are always within bounds. The behavior follows the d3d10 spec, which says the overflow should always return all zeros, and overflow is only defined as access beyond the size of the currently bound buffer. Accesses beyond the declared shader constant register size are not considered an overflow and expected to return garbage but consistent garbage (we follow the behavior which some wlk tests expect which is to return the actual values from the bound buffer). Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2014-01-16 16:33:57 -05:00
Roland Scheidegger	27d47bd42f	gallivm: fix pointer type for stmxcsr/ldmxcsr The argument is a i8 pointer not a i32 pointer (even though the value actually stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do leading to crashes. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-12-14 17:11:03 +01:00
Zack Rusin	155139059b	llvmpipe: fix blending with half-float formats The fact that we flush denorms to zero breaks our half-float conversion and blending. This patches enables denorms for blending. It's a little tricky due to the llvm bug that makes it incorrectly reorder the mxcsr intrinsics: http://llvm.org/bugs/show_bug.cgi?id=6393 Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com>	2013-12-10 16:39:48 -05:00
Roland Scheidegger	2983c039df	gallium: new shader cap bit for the amount of sampler views Ever since introducing separate sampler and sampler view max this was really missing. Every driver but llvmpipe reports the same number as number of samplers for now, so nothing should break. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-28 04:02:18 +01:00
Vinson Lee	7f56780915	gallivm: Ignore unknown file type in non-debug builds. Fixes "Uninitialized pointer read" defect reported by Coverity. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-20 22:35:36 -08:00
Si Chen	e7a5905d8a	gallivm: Fix mask calculation for emit_kill_if. The exec_mask must be taken in consideration, just like emit_kill above. The tgsi_exec module has the same bug and should be fixed in a future change. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-11-19 19:16:18 +00:00
José Fonseca	a29e40a423	gallivm: Compile flag to debug TGSI execution through printfs. It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag. I had prototyped this for a while while debugging an issue, but finally cleaned this up and added a few more bells and whistles. v2: Use '$' as marker; better output. Thanks to Brian, Zack and Roland reviews. Here is a sample output. CONST[0].x = 0.00625000009 0.00625000009 0.00625000009 0.00625000009 CONST[0].y = -0.00714285718 -0.00714285718 -0.00714285718 -0.00714285718 CONST[0].z = -1 -1 -1 -1 CONST[0].w = 1 1 1 1 IN[0].x = 143.5 175.5 175.5 143.5 IN[0].y = 123.5 123.5 155.5 155.5 IN[0].z = 0 0 0 0 IN[0].w = 1 1 1 1 $ 1: RCP TEMP[0].w, IN[0].wwww TEMP[0].w = 1 1 1 1 $ 2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw TEMP[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 TEMP[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 3: MUL OUT[0].xy, TEMP[0], TEMP[0].wwww OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 $ 4: MUL OUT[0].z, IN[0].zzzz, TEMP[0].wwww OUT[0].z = 0 0 0 0 $ 5: MOV OUT[0].w, TEMP[0] OUT[0].w = 1 1 1 1 $ 6: END OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976 OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316 OUT[0].z = 0 0 0 0 OUT[0].w = 1 1 1 1	2013-11-14 14:04:28 +00:00
Roland Scheidegger	754319490f	gallivm,llvmpipe: fix float->srgb conversion to handle NaNs d3d10 requires us to convert NaNs to zero for any float->int conversion. We don't really do that but mostly seems to work. In particular I suspect the very common float->unorm8 path only really passes because it relies on sse2 pack intrinsics which just happen to work by luck for NaNs (float->int conversion in hw gives integer indeterminate value, which just happens to be -0x80000000 hence gets converted to zero in the end after pack intrinsics). However, float->srgb didn't get so lucky, because we need to clamp before blending and clamping resulted in NaN behavior being undefined (and actually got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp with defined nan behavior as we can handle the NaN for free this way. I suspect there's more bugs lurking in this area (e.g. converting floats to snorm) as we don't really use defined NaN behavior everywhere but this seems to be good enough. While here respecify nan behavior modes a bit, in particular the return_second mode didn't really do what we wanted. From the caller's perspective, we really wanted to say we need the non-nan result, but we already know the second arg isn't a NaN. So we use this now instead, which means that cpu architectures which actually implement min/max by always returning non-nan (that is adhering to ieee754-2008 rules) don't need to bend over backwards for nothing. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-14 12:24:55 +00:00
Roland Scheidegger	ea1f7d2894	gallivm: deduplicate some indirect register address code There's only one minor functional change, for immediates the pixel offsets are no longer added since the values are all the same for all elements in any case (it might be better if those weren't stored as soa vectors in the first place maybe). Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-11-08 03:38:32 +01:00
Roland Scheidegger	b35ea09349	gallivm: fix indirect addressing of inputs We weren't adding the soa offsets when constructing the indices for the gather functions. That meant that we were always returning the data in the first element. (Copied straight from the same fix for temps.) While here fix up a couple of broken comments in the fetch functions, plus don't name a straight float type float4 which is just confusing. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-11-06 18:20:54 +01:00
Roland Scheidegger	5ae31d7e1d	gallivm: optimize lp_build_minify for sse SSE can't handle true vector shifts (with variable shift count), so llvm is turning them into a mess of extracts, scalar shifts and inserts. It is however possible to emulate them in lp_build_minify with float muls, which should be way faster (saves over 20 instructions per 8-wide lp_build_minify). This wouldn't work for "generic" 32bit shifts though since we've got only 24bits of mantissa (actually for left shifts it would work by using sse41 int mul instead of float mul but not for right shifts). Note that this has very limited scope for now, since this is only used with per-pixel lod (otherwise we're avoiding the non-constant shift count by doing per-quad shifts manually), and only 1d textures even then (though the latter should change). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-11-05 23:32:24 +01:00
Vinson Lee	749cb89097	gallivm: Remove llvm::DisablePrettyStackTrace for LLVM >= 3.4. LLVM 3.4 r193971 removed llvm::DisablePrettyStackTrace and made the pretty stack trace opt-in rather than opt-out. The default value of DisablePrettyStackTrace has changed to true in LLVM 3.4 and newer. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60929 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-11-04 18:22:04 -08:00
Roland Scheidegger	2b2fc03beb	gallivm: implement fully accurate corner filtering for seamless cube maps d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it). This requires us to use different filtering code, since we need per-texel weights which our 2d lerp doesn't (and can't) do. And of course the (now per element) weights need to be adjusted too for it to work. Invoke the new filtering code whenever there's an edge to keep things simpler, as it will work for edges too not just corners but of course it's only needed with corners. More ugly code for not much gain but at least a hacked up cubemap demo shows very nice corners now... Not sure yet if and how this should be configurable... v2: incorporate feedback from Jose, only use special corner filtering code when there's a corner not when there's only an edge (as corner filtering code is slower, though a perf difference was only measureable when always forcing edge code). Plus some minor style fixes. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-10-25 01:29:14 +02:00
Roland Scheidegger	3bdd1074e1	gallivm: implement seamless cube filtering For seamless cube filtering it is necessary to determine new faces and new coords per sample. The logic for this is _seriously_ complex (what needs to happen is very "asymmetric" wrt face, x/y under/overflow), further complicated by the fact that if the 4 samples are in a corner (meaning we only have actually 3 samples, and all 3 are on different faces) then falling off the edge is happening _both_ on x and y axis simultaneously. There was a noticeable performance hit in mesa's cubemap demo when seamless filtering was forced on (just below 10 percent or so in a debug build, when disabling all filtering hacks, otherwise it would probably be a bit more) and when always doing the logic, hence use a branch which it only does it if any of the pixels in a quad (or in two quads) actually hit this. With that there was no measurable performance hit in the cubemap demo (neither in a debug nor release buidl), but this will vary (cubemap demo very rarely hits edges). Might also be different on other cpus, as this forces SoA sampling path which potentially can be quite a bit slower. Note that as for corners, this code gets all the 3 samples which actually exist right, and the 4th texel will simply be the same as one of the others, meaning that filter weights will be a bit wrong. This however should be enough for full OpenGL (but not d3d10) compliance. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-10-21 15:42:04 +02:00
Roland Scheidegger	9b3dbaf396	gallivm: kill old per-quad face selection code Not used since ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). v2: also get rid of 3 helper functions no longer used. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-10-10 04:32:57 +02:00
Roland Scheidegger	47d0613eb7	gallivm: handle explicit derivatives for cubemaps They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives there's at least some test which insists that in case of cubemaps the shared lod value MUST come from a pixel inside the primitive (due to the derivatives becoming different if a different larger major axis is chosen). v2: based on Brian's feedback, clean up code a bit. And use sign bit of major axis instead of pre-select s/t/r sign for coord mirroring (which should be the same in the end, saves 2 ands). Also fix two bugs with select/mirror of derivatives, the minor axes need to use major axis sign as well (instead of major derivative axis sign), and don't mistakenly use absolute values of major derivative and inverse major values. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-10-10 04:32:57 +02:00
Roland Scheidegger	ce1d8634aa	gallivm: ignore rho approximation for cube maps There's two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps when doing rho approximations (so the error actually goes from factor 2 at edges and sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face). 2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap derivatives (so don't need yet another debug var). Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-10-10 04:32:57 +02:00
Zack Rusin	87fe4a33d3	llvmpipe: implement 64 bit mul opcodes in llvmpipe Both the imul_hi and umul_hi are working with this patch. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-10-09 18:30:27 -04:00
Zack Rusin	c01c6a95b4	gallivm: support printing of 64 bit integers only 8 and 32 bit integers were supported before. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-10-09 18:29:05 -04:00
Roland Scheidegger	532dc8939f	gallivm: adjust wrap mode to CLAMP_TO_EDGE always for cube maps. Technically without seamless filtering enabled GL allows any wrap mode, which made sense when supporting true borders (can get seamless effect with border and CLAMP_TO_BORDER), but gallium doesn't support borders and d3d9 requires wrap modes to be ignored and it's a pain to fix up the sampler state (as it makes it texture dependent). It is difficult to imagine a situation where an app really wants another behavior so just cheat here. (It looks like some graphics hw (intel) actually requires this too hence it should be safe.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-09-19 17:14:36 +02:00
Roland Scheidegger	93b5f71179	gallivm: some bits of seamless cube filtering implementation Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a correct implementation for nearest filtering, and it's way better than using repeat wrap for instance for linear filtering (though obviously this doesn't actually do seamless filtering). v2: fix s/t wrap not r/s... Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-09-18 00:00:37 +02:00
José Fonseca	315f8f17d0	llvmpipe: Remove the special path for TGSI_OPCODE_EXP. It was wrong for EXP.y, as we clamped the source before computing the fractional part, and this opcode should be rarely used, so it's not worth the hassle.	2013-09-12 11:24:24 +01:00
Zack Rusin	e9f1f6ab42	gallivm: support indirect registers on both dimensions We support indirect addressing only on the vertex index, but some shaders also use indirect addressing on attributes. This patch adds support for indirect addressing on both dimensions inside gs arrays. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-09-06 15:05:27 -04:00
Roland Scheidegger	f37edb5e20	gallivm: handle unbound textures in texture sampling / texture queries Turns out we don't need to do much extra work for detecting this case, since we are guaranteed to get a empty static texture state in this case, hence just rely on format being 0 and return all zero then. Previously needed dummy textures (would just have crashed on format being 0 otherwise) which cannot return the correct result for size queries and when sampling textures with wrap modes using border. As a bonus should hugely increase performance when sampling unbound textures - too bad it isn't a useful feature :-). Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-30 23:20:03 +02:00
Roland Scheidegger	289faa7e23	gallivm: (trivial) don't pass sampler_unit variable down to filtering funcs The only reason this was needed was because the fetch texel function had to get the (dynamic) border color, but this is now done much earlier. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 23:20:03 +02:00
Roland Scheidegger	61add3cc3c	gallivm: don't use AoS path if min/mag filter are different with multiple lods Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing AoS path wouldn't be all that difficult (use all the same logic as SoA) but considered not worth it for now. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 23:20:03 +02:00
Roland Scheidegger	a479f34025	gallivm: support per-pixel min/mag filter in SoA path Since we can have per-pixel lod we should also honor the filter per-pixel (in fact we didn't honor it per quad neither in the multiple quad case). Do this by running the linear path and simply beating the weights into shape (the sample with the higher weight is the one which should have been chosen with nearest filtering hence adjust filter weight to 1.0/0.0 based on that). If all pixels use nearest filter (either min and mag) then still run just a nearest filter as this is way cheaper (probably around 4 times faster for 2d, more for 3d case) and it should be relatively rare that pixels really need different filtering. OTOH if all pixels would require linear don't do anything special since the linear path with filter adjustments shouldn't really be all that much more expensive than ordinary linear, and we think it's rare that min/mag filters are configured differently so there doesn't seem much value in trying to optimize this further. This does not yet fix the AoS path (though currently AoS is only used for single quads hence it could be considered less broken, just never honoring per-pixel filter decision but doing it per quad). v2: simplify code a bit (unify min linear and min nearest cases) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 02:16:45 +02:00
Roland Scheidegger	81cfcdbd87	gallivm: don't calculate square root of rho if we use accurate rho method While a sqrt here and there shouldn't hurt much (depending on the cpu) it is possible to completely omit it since rho is only used for calculating lod and there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for calculating lod this means we get a simple mul instead of sqrt (in case of nearest mip filter in fact we don't need to replace the sqrt with something else at all), only in some not very useful path this doesn't work (combined brilinear calculation of int level and fractional lod, accurate rho calc but brilinear filtering seems odd). Apart from being faster as an added bonus this should increase our crappy fractional accuracy of lod, since fast_log2 is only good for ~3bits and this should increase accuracy by one bit (though not used if dimension is just one as we'd need an extra mul there as we never had the squared rho in the first place). v2: use separate ilog2_sqrt function if we have squared rho. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 02:16:45 +02:00
Roland Scheidegger	10e40ad11d	gallivm: refactor num_lods handling This is just preparation for per-pixel (or per-quad in case of multiple quads) min/mag filter since some assumptions about number of miplevels being equal to number of lods no longer holds true. This change does not change behavior yet (though theoretically when forcing per-element path it might be slower with different min/mag filter since the code will respect this setting even when there's no mip maps now in this case, so some lod calcs will be done per-element just ultimately still the same filter used for all pixels). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-30 02:16:45 +02:00
Roland Scheidegger	ad9b5b9ae9	gallivm: fix min/mag switchover point for nearest/none mip filter Previously, the min/mag switchover point when using nearest/none mip filter was effectively -0.5 which can't be right. Looks like new OpenGL thinks it's ok if it's always 0.0 (older versions required 0.5 in some cases), let's hope everybody else thinks that's fine too. Refactor this slightly and get the per-quad/per-pixel min/mag decision values further down to sampling, though still only the first component is used yet. While here also fix code trying to skip lod bias application etc. when mipfilter is none, as this is still needed for determining min/mag filter. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-23 23:46:28 +02:00
Roland Scheidegger	bd0b6c5180	gallivm: do per-element lod for lod bias and explicit derivs too Except for explicit derivs with cube maps which are very bogus anyway. Just like explicit lod this is only used if no_quad_lod is set in GALLIVM_DEBUG env var. Minification is terrible on cpus which don't support true vector shifts (but should work correctly). Cannot do the min/mag filter decision (if they are different) per pixel though, only selecting different mip levels works. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-22 19:05:52 +02:00
Roland Scheidegger	33694a1800	gallivm: (trivial) fix int/uint border color clamping Just a copy & paste error. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=68409. Note that the test passing before probably simply means it doesn't verify clamping of the border color itself as required by the OpenGL spec. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-22 19:05:52 +02:00
Roland Scheidegger	6ff9008544	gallivm: (trivial) fix linear aos sampling of 3d compressed formats block size depth is always 1 even for compressed formats (unless someone invents true 3d compressed formats at least which we can't represent). Nearest (and soa) path had it right. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-22 19:05:52 +02:00
José Fonseca	fb62388d6a	gallium: Support PIPE_FORMAT_R10G10B10A2_UINT. Same as PIPE_FORMAT_B10G10R10A2_UINT but without the swizzling. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-08-22 12:14:15 +01:00
Roland Scheidegger	e6013e4bee	gallivm: unify sin and cos implementation The (complicated!) math is all identical, there's just minimal differences how sign bit is calculated plus there's an additional subtraction for the argument going into the polynomial for cos. The logic stays 100% the same (with a small exception, sign bit calculation for sin is minimally simplified, applying sign mask after xoring the arguments instead of applying it to each argument). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-21 22:05:53 +02:00
Roland Scheidegger	275d2efeed	gallivm: add comment for bogus min/mag filter selection with nearest mip filter Detected this hunting some other bug, not sure if it really needs fixing but it is definitely wrong. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-21 22:05:52 +02:00
Roland Scheidegger	21d8fa2759	gallivm: fix rho calculation for 1d case Was using wrong (undefined) vector element (the elements are at 0/2 position, not 0/1). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-21 22:05:52 +02:00
Roland Scheidegger	4b45b61fef	util: add avx2 and xop detection to cpu detection code Going to need this soon (not going to bother with avx2 intrinsics at this time but don't want to do workarounds for true vector shifts if llvm itself can use them just fine and won't need the gazillion instruction emulation). Not really tested other than my cpu returns 0 for these features... (I have no idea if llvm actually would emit avx2/xop instructions neither...) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Roland Scheidegger	9299128bf2	gallivm: fix bogus aos path detection Need to check the wrap mode of the actually used coords not a fixed 2. While checking more than necessary would only potentially disable aos and not cause any harm I'm pretty sure for 3d textures it could have caused assertion failures (if s,t coords have simple filter and r not). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Roland Scheidegger	fe92d7fab4	gallivm: do clamping of border color correctly for all formats Turns out it is actually very complicated to figure out what a format really is wrt range, as using channel information for determining unorm/snorm etc. doesn't work for a bunch of cases - namely compressed, subsampled, other. Also while here add clamping for uint/sint as well - d3d10 doesn't actually need this (can only use ld with these formats hence no border) and we could do this outside the shader for GL easily (due to the fixed texture/sampler relation) do it here too just so I can forget about it. v2: move border color clamping out of fetch texel. Also change it to clamp the whole border vector at once (and use vectorized load of border color), which saves a couple of instructions - needs some different handling of mixed signed/unsigned formats so skip the per channel stuff and just derive this from first channel except for special formats. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Roland Scheidegger	ac1a2714c7	gallivm: implement better control of per-quad/per-element/scalar lod There's a new debug value used to disable per-quad lod optimizations in fragment shader (ignored for vs/gs as the results are just too wrong typically). Also trying to detect if a supplied lod value is really a scalar (if it's coming from immediate or constant file) in which case sampler code can use this to stay on per-quad-lod path (in fact for explicit lod could simplify even further and use same lod for both quads in the avx case but this is not implemented yet). Still need to actually implement per-element lod bias (and derivatives), and need to handle per-element lod in size queries. v2: fix comments, prettify. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-20 23:00:24 +02:00
Zack Rusin	7115bc3940	draw: handle nan clipdistance If clipdistance for one of the vertices is nan (or inf) then the entire primitive should be discarded. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-08-15 16:26:32 -04:00
Roland Scheidegger	6ca18e06ae	gallivm: revert accidentally commited hunk That magic wasn't meant to be commited, need to work on some proper fix.	2013-08-15 19:26:39 +02:00
Roland Scheidegger	5626a84a00	gallivm: do per-sample depth comparison instead of doing it post-filter Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9) and definitely required by d3d10. This actually doesn't do it pre-filter but more "in-filter" as otherwise need to push the comparisons even further down into fetch code and this also trivially allows using a somewhat cheaper lerp. Doing it pre-filter would actually have some performance advantage for UNORM formats (because the comparisons should be done in texture format, we'd only need to convert the shadow ref coord to texture format once, but in turn would save converting the per-sample texture values to floats) but this gets a bit messy as this has implications for border color handling as well (which needs to be done prior to depth comparisons, hence would also need to convert border color to texture format too or use some other tricks like doing separate border color / shadow ref comparison and simply using that result directly when doing border replacement). Should make no difference for nearest filtering, and performance for linear filtering should be mostly the same too (essentially have one more comparison instruction per sample, and replace the sub/mul/add lerp with a sub/and/and/add special "lerp" which all in all shouldn't be much of a difference). v2: get rid of old code completely Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 18:42:20 +02:00
Roland Scheidegger	e58c2310b8	gallivm: already pass coords in the right place in the sampler interface This makes things a bit nicer, and more importantly it fixes an issue where a "downgraded" array texture (due to view reduced to 1 layer and addressed with (non-array) samplec instruction) would use the wrong coord as shadow reference value. (This could also be fixed by passing target through the sampler interface much the same way as is done for size queries, might do this eventually anyway.) And if we'd ever want to support (shadow) cube map arrays, we'd need 5 coords in any case. v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex using wrong shadow coord for 2d...). Plus need to project the shadow coord, and just for fun keep projecting the layer coord too. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 00:40:14 +02:00
Roland Scheidegger	d4b43cedb6	gallivm: change coordinate handling throughout functions Instead of passing s,t,r coordinates pass a coord array - the reason is that I need to pass more coords (in particular for shadow "coord", future will also need another one for cube map arrays) so just pass them as an array. Also, to simplify things, use fixed location for the shadow reference value I want to get rid of the silly "where is the right coord value" game. Keep old-style however for aos sampling (which is not going to need shadow coord, though for cube map arrays it still would need fixing). (Next patch will pass those through using the new arrangement directly from sampler interface.) v2: fix up soa split path (unreachable currently but still...) Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 00:40:14 +02:00
Roland Scheidegger	c6c55ad3e9	gallivm: fix border color with normalized texture formats We need to put border color into texture format color space which essentially means clamping for non-float, normalized formats (not entirely sure if we're also meant to quantize the float but it's probably ok not to do it thankfully). For OpenGL we could do this easily outside generated code due to the 1:1 sampler/texture correspondence but not for d3d10 which is terrible (as we recalculate a constant over and over again per shader invocation). Fortunately border color should be rare enough that we don't care THAT much. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-15 00:40:14 +02:00
Roland Scheidegger	6991f86945	gallivm: implement new float comparison instructions returning integer masks FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the select. And just for consistency use the same appropriate ordered/unordered comparisons for the old opcodes as well. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-13 19:09:17 +02:00

1 2 3 4 5 ...

939 Commits