KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Paul Berry	7d4f1e6467	glsl: Fix array indexing when constant folding built-in functions. Mesa constant-folds built-in functions by using a miniature GLSL interpreter (see ir_function_signature::constant_expression_evaluate_expression_list()). This interpreter had a bug in its handling of array indexing, which caused expressions like "m[i][j]" (where m is a matrix) to be handled incorrectly. Specifically, it incorrectly treated j as indexing into the whole matrix (rather than indexing just into the vector m[i]); as a result the offset computed for m[i] was lost and m[i][j] was treated as m[j][0]. Fixes piglit tests inverse-mat[234].{vert,frag}. NOTE: This is a candidate for the 9.1 and 9.0 branches. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57436	2013-04-02 12:24:08 -07:00
Roland Scheidegger	450950c57a	gallivm: bring back optimized but incorrect float to smallfloat optimizations Conceptually the same as previously done in float_to_half. Should cut down number of instructions from 14 to 10 or so, but will promote some NaNs to Infs, so it's disabled. It gets a bit tricky though handling all the cases correctly... Passes basic tests either way (though there are no tests testing special cases, but some manual tests injecting them seemed promising). v2: style and comment fixes suggested by Jose Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-02 18:24:31 +02:00
Roland Scheidegger	3febc4a1cd	gallivm: consolidate code for float-to-half and float-to-packed conversion. This replaces the existing float-to-half implementation. There are definitely a couple of differences - the old implementation had unspecified(?) rounding behavior, and could at least in theory construct Inf values out of NaNs. NaNs and Infs should now always be properly propagated, and rounding behavior is now towards zero (note this means too large but non-Infinity values get propagated to max representable value, not Infinity). The implementation will definitely not match util code, however (which does nearest rounding, which also means too large values will get propagated to Infinity). Also fix a bogus round mask probably leading to rounding bugs... v2: fix a logic bug in handling infs/nans. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-02 18:24:31 +02:00
Vadim Girlin	9be624b3ef	r600g: don't reserve more stack space than required v5 Reduced stack size allows to run more threads in some cases, improving performance for the shaders that use stack (that is, for the shaders with control flow instructions). E.g. with unigine-based apps. v4: implement exact computation taking into account wavefront size v5: add cases for RV620, RS880 Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-04-02 19:34:14 +04:00
Vadim Girlin	7e04227f39	r600g: fix range handling for tgsi input declarations v2 Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>	2013-04-02 19:34:14 +04:00
Marek Olšák	f8502b7e71	gallium/hud: do .xxxx swizzling for the font texture in the fragment shader This allows using L8 and R8 for the font if I8 isn't supported. Tested-by: Brian Paul <brianp@vmware.com>	2013-04-02 16:57:57 +02:00
Brian Paul	98b64cc20f	hud: flush/unmap the vertex buffer before drawing The VMware svga driver is picky about making sure the VBO is unmapped before drawing. Reviewed-by: Marek Olšák <maraeo@gmail.com>	2013-04-02 08:17:28 -06:00
Brian Paul	bdd3770b78	draw: use pipe_transfer_unmap() to match pipe_transfer_map()	2013-04-02 08:17:28 -06:00
Roland Scheidegger	9b329f4c09	gallivm: fix signed small float to float conversion Introduced by `5f41e08cf3`, just a silly typo. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=62921.	2013-04-02 13:21:07 +02:00
Christian König	a0dca4409a	radeonsi: add instance divisor support v3 v2: reduce key size, don't copy key around to much. v3: remove key size reduction Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2013-04-02 13:01:43 +02:00
Christian König	cf9b31f78a	radeonsi: add start instance support This works different than on R600, we need to add the start instance manually. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2013-04-02 13:01:43 +02:00
Christian König	e4ed58763a	radeonsi: add instanceid support Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2013-04-02 13:01:43 +02:00
Christian König	83df955ca9	radeon/llvm: move system value fetching to common code This should be used by both SI and R600. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2013-04-02 13:01:42 +02:00
Michel Dänzer	c6efb4870b	radeonsi: Handle arbitrary 2-byte formats in resource_copy_region Fixes mplayer -vo vdpau OSD. NOTE: This is a candidate for the 9.1 branch. Reported-by: Igor Vagulin <igor.vagulin@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Christian König <christian.koenig@amd.com>	2013-04-02 11:42:35 +02:00
Maarten Lankhorst	6d20c646d6	nvc0: Fix fd leak in nvc0_create_decoder NOTE: This is a candidate for the 9.0 and 9.1 branches. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>	2013-04-02 10:25:26 +02:00
Aras Pranckevicius	b2eee0869f	GLSL: fix lower_jumps to report progress properly A fix for lower_jumps progress reporting, very much like similar in `c1e591eed`. NOTE: This is a candidate for stable branches. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-04-01 16:57:17 -07:00
Eric Anholt	62501c3af8	i965/fs: Allow CSE on pre-gen7 varying-index uniform loads All the other expression types allowed here have inst->mlen == 0, and this one has implied MRF writes for all of its payload, so nothing else in the implementation should need to change. Reduces SEND messages for loading from pull constants in kwin's Lanczos shader from 16 to 6. (Due to a deficiency in constant propagation, I can't use the hack I did in the previous commit to test the performance change) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch.	2013-04-01 16:17:26 -07:00
Eric Anholt	70b27e0e4b	i965/fs: Use LD messages for pre-gen7 varying-index uniform loads This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on my GM45 forced to load all uniforms through the varying-index path), but we get a whole vec4 at a time to reuse in the next commit. v2: Fix comment about channels in the other message. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.1 branch.	2013-04-01 16:17:26 -07:00
Eric Anholt	ce316f62ef	i965/fs: Don't double-emit SEND dependency workarounds at control flow. We weren't setting needs_dep[i] in the loops, so we'd continue on to potentially add the same workaround MOVs to the later basic block boundaries, too. We can either set needs_dep[i] to exit through the normal path, or we can just return since we know we're done. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:26 -07:00
Eric Anholt	3cf69b2284	i965/fs: Bake regs_written into the IR instead of recomputing it later. For sampler messages, it depends on the target gen, and on gen4 SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we should. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.1 branch.	2013-04-01 16:17:26 -07:00
Eric Anholt	8edc7cbe64	i965/fs: Clean up the setup of gen4 simd16 message destinations. I think this makes it much more obvious what's going on here. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:26 -07:00
Eric Anholt	9f43b84928	i965/fs: Do CSE on gen7's varying-index pull constant loads. This is our first CSE on a regs_written() > 1 instruction, so it takes a bit of extra fixup. Reduces the number of loads on kwin's Lanczos shader from 12 to 2. v2: Fix compiler warning (false positive on possibly-uninitialized variable) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) NOTE: This is a candidate for the 9.1 branch.	2013-04-01 16:17:25 -07:00
Eric Anholt	dca5fc1435	i965/fs: Improve performance of varying-index uniform loads on IVB. Like we have done for the VS and for constant-index uniform loads, we use the sampler engine to get caching in front of the L3 to avoid tickling the IVB L3 bug. This is also a bit of a functional change, as we're now loading a vec4 instead of a single dword, though we're not taking advantage of the other 3 components of the vec4 (yet). With the driver hacked to always take the varying-index path for all uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4). This a major fix for some blur shaders in compositors from the varying-index uniforms support I introduced in 9.1. v2: Move old offset computation into the pre-gen7 path. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch.	2013-04-01 16:17:25 -07:00
Eric Anholt	bc0e1591f6	i965/fs: Avoid inappropriate optimization with regs_written > 1. Right now we don't have anything with regs_written() > 1 and !inst->mlen, but that's about to change. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	740350c982	i965: Make the fragment shader pull constants index by dwords, not vec4s. We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement. Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases. No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	2f41a60145	i965: Make the constant surface interface take a normal byte size. This puts the rounding-up logic into the function itself instead of all the callers having to manage it. Also drop an "unused" comment in gen4, as the stride is used for texbos (and will be for uniforms soon). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	8c694dfe64	i965/fs: Move varying uniform offset compuation into the helper func. I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	59e858861c	i965/fs: Remove creation of a MOV instruction that's never used. We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:24 -07:00
Eric Anholt	1d6ead3804	i965/fs: Allow constant propagation into MACH. This happens quite a bit with varying-index uniform loads. We could also do better by avoiding the MACH entirely, but there's no reason not to at least take this step. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:24 -07:00
Vincent Lejeune	50fd9c4544	r600g/llvm: Update LLVM_REVISION.txt	2013-04-01 23:50:20 +02:00
Vincent Lejeune	8c8c4e3977	r600g/llvm: Use stack_size provided from llvm.	2013-04-01 23:43:57 +02:00
Vincent Lejeune	4ac0d85ca6	r600g/llvm: uses function attribute to pass shader type	2013-04-01 23:43:42 +02:00
Vincent Lejeune	af38695f51	r600g/llvm: Add support for cf_alu native encode	2013-04-01 23:43:27 +02:00
Haixia Shi	bc0cc2944f	ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays. If the active uniform is an array, then the length of the uniform name should include the three extra characters for the "[0]" suffix, which is required by the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform(). This avoids the situation where the output buffer does not have enough space to hold the "[0]" suffix, resulting in an incomplete array specification like "foobar[0". NOTE: This is a candidate for the 9.1 branch. Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728 Signed-off-by: Haixia Shi <hshi@chromium.org> Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-04-01 13:39:13 -07:00
Matt Turner	e2b40e253b	i965/fs: Fix bad interaction between tex swizzles and textureQueryLOD. Reported-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 13:11:43 -07:00
Eric Anholt	4ee892ee8a	i965: Remove the old brw_optimize() code. This is now done in the VS backend before instruction emit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 11:36:06 -07:00
Eric Anholt	4fee05b020	i965/vs: Add a pass to set dependency control fields on instructions. This is a more aggressive version of the old brw_optimize() path. Reduces cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 11:36:05 -07:00
Eric Anholt	229a51cdbe	i965: Dump shader source for linked shader programs. We dump shader source in ir_to_mesa.cpp, and we dump linked programs here, but we had no reference from the linked programs to their source. This was preventing improvement of shader-db to use linked shader programs instead of individual shader files (which is bogus, because it means we optimize out VS outputs, and don't interpolate FS inputs!) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 11:30:36 -07:00
Mike Lothian	777a7f2003	clover: Fix build with LLVM 3.3	2013-04-01 10:50:23 -07:00
Brian Paul	1165ff1af1	llvmpipe: use triangle subdivision to avoid fixed-point overflow issues If we're drawing to a surface that's 2048 x 2048 pixels or larger there's danger of fixed-point overflow in the triangle rasterization code. That leads to various rendering glitches. Rather than implement some intricate changes to the rasterization code, simply subdivide triangles into smaller subtriangles to avoid the issue. Only do this when the drawing surface is larger than 2048 by 2048. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-01 08:40:35 -06:00
Brian Paul	95df2b2883	mesa: remove platform checks around __builtin_ffs, __builtin_ffsll Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC, not just for specific platforms. Fixes Solaris build. Note: This is a candidate for the stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868 Signed-off-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-01 08:40:35 -06:00
Brian Paul	99811c344b	docs: add a new page documenting known application issues Let's try to update this when we find other broken applications... Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-01 08:40:35 -06:00
Brian Paul	fe30fa9ad6	drirc: set always_have_depth_buffer for Topogon Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-01 08:18:09 -06:00
Adam Jackson	e26d5940ff	gallivm: Minor comment cleanup Signed-off-by: Adam Jackson <ajax@redhat.com>	2013-04-01 09:45:38 -04:00
Dave Airlie	135bb3c1a9	mesa: fix texture storage multisample prototypes harder. I just noticed the warnings since I fixed the other bit. Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-04-01 19:54:56 +10:00
Vincent Lejeune	c3fb34ee8d	r600g/llvm: Update LLVM_REVISION	2013-03-31 21:37:20 +02:00
Vincent Lejeune	67a8ee7aaa	r600g/llvm: use native encode for tex	2013-03-31 21:35:47 +02:00
Dave Airlie	5b36bc05be	glapi: fix storage multisample build errors Reported on #radeon by udovdh Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-03-31 20:41:28 +10:00
Chris Forbes	2a528889a3	docs: mark ARB_texture_storage_multisample done Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:42 +13:00
Chris Forbes	d25b4d5e90	i965: enable ARB_texture_storage_multisample on Gen6+ This can be enabled everywhere that ARB_texture_multisample is supported -- ARB_texture_storage is supported on everything. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:40 +13:00
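
Note on 7d4f1e6467: the message describes m[i][j] being evaluated as m[j][0] because the offset computed for m[i] was lost. The following is only a rough illustration of that arithmetic, not Mesa's interpreter code. It assumes a 3x3 matrix stored as a flat, column-major list; the names ROWS, offset_buggy and offset_fixed are hypothetical and do not appear in the commit.

```python
# Hypothetical model: a 3x3 matrix kept as a flat, column-major list,
# so column i occupies slots [i * ROWS, i * ROWS + ROWS).
ROWS = 3  # components per column vector (assumed for this sketch)


def offset_buggy(i, j, rows=ROWS):
    """Behaviour described as the bug: j indexes the whole matrix,
    so the offset already computed for m[i] is discarded."""
    return j * rows  # same slot as m[j][0]


def offset_fixed(i, j, rows=ROWS):
    """Intended behaviour: j indexes only into the vector m[i]."""
    return i * rows + j


# Fill the matrix so that each element equals its own flat offset.
m = [float(k) for k in range(ROWS * ROWS)]

i, j = 2, 1
print("fixed:", m[offset_fixed(i, j)])  # element j of column i
print("buggy:", m[offset_buggy(i, j)])  # element 0 of column j
```

With i = 2 and j = 1 the two functions select different slots, which is the kind of discrepancy the inverse-mat piglit tests named in the commit message would expose.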

55885 Commits

All Branches