KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Chia-I Wu	5df62dce34	ilo: make constant buffer 0 upload optional Add ILO_KERNEL_SKIP_CBUF0_UPLOAD so that we can skip constant buffer 0 upload when the kernel does not need it.	2013-08-13 15:52:37 +08:00
Chia-I Wu	8b5b5fe394	Revert "ilo: initialize constant buffer SURFACE_STATE early" This reverts commit `a9b800aa81`. With push constant support, the constructed SURFACE_STATE is unused and wasted. The change only slows things down.	2013-08-13 15:24:58 +08:00
Armin K	f423eba46e	gbm: Link to libwayland-drm if Wayland EGL platform is enabled We were relying on libEGL to pull in libwayland-client symbols, but commit `2c2e64edab` cleaned up the symbol leak. https://bugs.freedesktop.org/show_bug.cgi?id=67962	2013-08-12 15:16:22 -07:00
Roland Scheidegger	cd2f26090a	gallivm: fix exec_mask interaction with geometry shader after end of main Because we must maintain an exec_mask even if there's currently nothing on the mask stack, we can still have an exec_mask at the end of the program. Effectively, this mask should be set back to default when returning from main. Without relying on END/RET opcode (I think it's valid to have neither) it is actually difficult to do this, as there doesn't seem to be any reasonable place to do it, so instead let's just say the exec_mask is invalid outside main (which it really is effectively). The problem is that geometry shader called end_primitive outside the shader (in the epilogue), and as a result used a bogus mask, leading to bugs if we had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid the mask combining function when called from outside the shader. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-12 23:33:00 +02:00
Roland Scheidegger	dfa7b72563	draw: simplify prim mask construction The code was quite weird, the second comparison was in fact a complete no-op and we can also do the comparison with the vector directly instead of scalar, which should not only be faster but it is way more obvious what that mask is actually going to look like. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-12 23:33:00 +02:00
Roland Scheidegger	7147094ff2	gallivm: simplify geometry shader mask handling a bit Instead of reducing masks to 0/1 simply use the mask directly as -1. Also use some signed comparison instead of unsigned (as far as I understand these values have to be (very) small and signed means llvm doesn't have to apply additional logic to do the unsigned comparisons the cpu can't do). Saves a couple of instructions in some test geometry shader here. v2: that was a bit too much optimization, don't skip combining the masks... Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-12 23:33:00 +02:00
Roland Scheidegger	84fce45321	draw: (trivial) dump tgsi for geometry shaders with GALLIVM_DEBUG_TGSI And dump the variant key too (same as vs does). Just so I can stop wondering why I see the tgsi dump for fs and vs but not gs...	2013-08-12 23:33:00 +02:00
Roland Scheidegger	8c5283dc17	gallivm: (trivial) fix typo in argument declaration of lp_build_size_query_soa Was meant to match the name used elsewhere, spotted by Anthony.	2013-08-12 23:33:00 +02:00
Kenneth Graunke	4d95efd146	i965/fs: Add dump_instruction() support for ARF destinations. CMP instructions use BRW_ARF_NULL as a destination. Prior to this patch, dump_instruction() decoded the destination as "???". Now it decodes BRW_ARF_NULL as "(null)" and other ARFs numerically. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2013-08-12 13:13:06 -07:00
Kenneth Graunke	ee7bfab068	i965/fs: Remove extraneous newline in dump_instruction() for CMP. This resulted in printouts like: 246: cmp.cmod.f0.0 ???, vgrf152, 0.000000f, (null), With this patch, CMP is properly printed on one line. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2013-08-12 13:13:04 -07:00
Kenneth Graunke	80e1c2f35f	i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible. Many GLSL shaders contain code of the form: x = condition ? foo : bar The compiler emits an ir_if tree for this, since each subexpression might be a complex tree that could have side-effects and short-circuit logic operations. However, the common case is to simply pick one of two constants or variable's values---which is exactly what SEL is for. Replacing IF/ELSE with SEL also simplifies the control flow graph, making optimization passes which work on basic blocks more effective. The shader-db statistics: total instructions in shared programs: 1655247 -> 1503234 (-9.18%) instructions in affected programs: 949188 -> 797175 (-16.02%) 2,970 shaders were helped, none hurt. Gained 181 SIMD16 programs. This helps Valve's Source Engine games (max -41.33%), The Cave (max -33.33%), Serious Sam 3 (max -18.64%), Yo Frankie! (max -30.19%), Zen Bound (max -22.22%), GStreamer (max -6.12%), and GLBenchmark 2.7 (max -1.94%). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2013-08-12 13:13:01 -07:00
Kenneth Graunke	2c32c3985c	i965/fs: Consider predicated SEL instructions as whole variable writes. The instruction (+f0.0) SEL dst, src0, src1 will write either src0 or src1 to dst, depending on the predicate. Unlike most predicated instructions, it always writes to dst. fs_inst::is_partial_write() is supposed to return true unless the whole register is guaranteed to be written. The !inst->predicated check makes sense for most instructions, which might not write the whole register, but SEL is a special case. This caused live interval analysis to ignore the destination of predicated SEL instructions when computing "def" information. Requires the previous commit to avoid regressions. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2013-08-12 13:12:59 -07:00
Kenneth Graunke	d21f542aa1	i965/fs: Explicitly disallow CSE on predicated instructions. The existing inst->is_partial_write() already disallows predicated instructions, so this has no functional change. However, it's worth doing explicitly since the CSE pass does not consider the flag register. This means it could blindly factor out operations that use the same sources, but which have different condition codes set. This prevents a regression in the next commit. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2013-08-12 13:12:57 -07:00
Kenneth Graunke	53d8cff63b	i965/fs: Log a performance warning if skipping 16-wide due to pulls. Usually, the driver creates both 8-wide and 16-wide variants of every fragment shader. When 16-wide compilation fails, it logs a performance warning explaining why only an 8-wide program exists. However, when there are pull parameters, the driver won't even bother trying the 16-wide compile (since it would fail). In this case, it failed to emit a performance warning, leaving no explanation for the missing 16-wide program. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2013-08-12 13:12:47 -07:00
Chia-I Wu	a9b800aa81	ilo: initialize constant buffer SURFACE_STATE early Fix ilo_gpe_init_view_surface_for_buffer to allow buffer to be NULL, and add ilo_gpe_set_view_surface_bo to set it later. This allows us to set up SURFACE_STATE early for constant buffers backed by user buffers.	2013-08-12 11:49:51 +08:00
Chia-I Wu	b2f79a3823	ilo: 3DSTATE_INDEX_BUFFER may be wrongly skipped In finalize_index_buffer(), when the current index buffer was destroyed due to u_upload_data(), it may happen that the new index buffer is at the same address as the old one. Comparing the pointers to the two buffers could fail to work, and 3DSTATE_INDEX_BUFFER would be incorrectly skipped. Holding a reference to the current index buffer before calling u_upload_data() should fix the problem.	2013-08-10 13:01:41 +08:00
Chris Forbes	637e6a0aa8	i965: add missing BRW_NEW_INTERPOLATION_MAP to state dump Makes this flag appear in the output for INTEL_DEBUG=state Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-08-10 20:29:12 +12:00
Chris Forbes	e114b13dae	i965: Add a new debug mode for the VUE map INTEL_DEBUG=vue now emits a listing of each slot in the VUE map, and the corresponding interpolation mode. V2: Fix whitespace issues. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-08-10 20:28:45 +12:00
Ian Romanick	5894898148	glsl: Don't allow const on out or inout function parameters Fixes piglit tests const-inout-parameter.frag and const-out-parameter.frag. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "9.2" <mesa-stable@lists.freedesktop.org>	2013-08-09 13:51:18 -07:00
Roland Scheidegger	894d4903e7	gallivm: set non-existing values really to zero in size queries for d3d10 My previous attempt at doing so double-failed miserably (minification of zero still gives one, and even if it would not the value was never written anyway). While here also rename the confusingly named int_vec bld as we have int vecs of different sizes, and rename need_nr_mips (as this also changes out-of-bounds behavior) to is_sviewinfo too. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-09 20:49:19 +02:00
Roland Scheidegger	b0f74250e1	gallivm: use texture target from shader instead of static state for size query d3d10 has no notion of distinct array resources neither at the resource nor sampler view level. However, shader dcl of resources certainly has, and d3d10 expects resinfo to return the values according to that - in particular a resource might have been a 1d texture with some array layers, then the sampler view might have only used 1 layer so it can be accessed both as 1d or 1d array texture (I think - the former definitely works). resinfo of a resource declared as array needs to return number of array layers but non-array resource needs to return 0 (and not 1). Hence fix this by passing the target from the shader decl to emit_size_query and use that (in case of OpenGL the target will come from the instruction itself). Could probably do the same for actual sampling, though it may not matter there (as the bogus components will essentially get clamped away), possibly could wreak havoc though if it REALLY doesn't match (which is of course an error but still). Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-09 20:49:18 +02:00
Roland Scheidegger	38ad404f76	gallivm: honor d3d10's wishes of out-of-bounds behavior for texture size query Specifically, must return 0 for non-existent mip levels (and non-existent textures which is an unsolved problem) for everything but total mip count. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-09 20:49:18 +02:00
Paul Berry	417dc8081b	glsl: Enable ARB_fragment_coord_conventions functionality in GLSL 1.50. GLSL 1.50 incorporates the functionality of the ARB_fragment_coord_conventions extension, so we need to make this functionality available even if the extension isn't enabled. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2013-08-09 10:35:06 -07:00
Paul Berry	13fedf2883	main: Fix deprecation of glLineWidth() From section E.1 (Profiles and Deprecated Features of OpenGL 3.0) of the OpenGL 3.0 spec: "LineWidth is not deprecated, but values greater than 1.0 will generate an INVALID VALUE error" From context it is clear that values greater than 1.0 should only generate an INVALID VALUE error in a forward-compatible context. The code was correctly quoting this spec text, but it was disallowing all line widths in forward-compatible contexts, instead of just widths greater than 1.0. This patch introduces the correct check, so that setting a line width of 1.0 or less is permitted. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-08-09 10:34:05 -07:00
Roland Scheidegger	836098f6b2	util: (trivial) fix asm input/output list for fxsave Otherwise gcc might do very unsafe optimizations, spotted by Uros Bizjak. Hopefully this time it's finally right?	2013-08-09 17:30:13 +02:00
Alex Deucher	c88783047e	r600g: disable GPUVM by default Cayman and trinity systems still seem to suffer from stability problems with GPUVM. This also fixes compute on these asics. It can still be enabled for testing by setting env var RADEON_VA=true. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=65958 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: "9.2" <mesa-stable@lists.freedesktop.org> CC: "9.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Christian König <christian.koenig@amd.com>	2013-08-09 10:51:25 -04:00
Zack Rusin	e8d8974f80	softpipe: fix the regressions softpipe has a really weird handling of the draw attrs, let's just not inject outputs in its data. Trivial.	2013-08-08 20:54:50 -04:00
Zack Rusin	662a4d4a12	draw: rewrite primitive assembler We can't be injecting the primitive id's in the pipeline because by that time the primitives have already been decomposed. To properly number the primitives we need to handle the adjacency primitives by hand. This patch moves the prim id injection into the original primitive assembler and completely removes the useless pipeline stage. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-08-08 20:54:25 -04:00
Zack Rusin	1d425c4c6d	draw: reset the vertex id when injecting new primitive id Without resetting the vertex id, with primitives where the same vertex is used with different primitives (e.g. tri/lines strips) our vbuf module won't re-emit those vertices with the changed primitive id. So let's reset the vertex id whenever injecting new primitive id to make sure that the vertex data is correctly emitted. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-08-08 20:54:03 -04:00
Zack Rusin	57cd326778	draw: cleanup the extra attribs Before inserting new front face and prim id outputs cleanup the old extra outputs, otherwise our cache will use previous output slots which will break as soon as outputs of the current shader don't match the last. Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2013-08-08 20:53:40 -04:00
Dieter Nützel	8f40fa0e7f	util: (trivial) fix more compile errors in u_cpu_detect (gcc/x86 this time). Oops. Should fix https://bugs.freedesktop.org/show_bug.cgi?id=67921	2013-08-09 01:25:54 +02:00
Chad Versace	2c2e64edab	egl: Do not export private symbols libEGL was incorrectly exporting all symbols, public and private. This patch adds -fvisibility=hidden to libEGL's linker flags to ensure that only symbols annotated with __attribute__((visibility("default"))) get exported. Sanity-checked with libEGL's builtin DRI2 driver and the i965 DRI driver by running Piglit on X/EGL and by running weston-gears on Weston as an X client. Sanity-checked with libEGL's Gallium driver (which is not built-in) and the swrast Gallium driver by running es2gears_x11. Kristian reviewed the symbol diff in `nm libEGL.so`. CC: "9.2" <mesa-stable@lists.freedesktop.org> CC: Ian Romanick <idr@freedesktop.org> Acked-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>	2013-08-08 15:17:51 -07:00
Kenneth Graunke	fb3d62fe3d	i965: Remember to call intel_prepare_render() before blitting. Otherwise, blits to the window system buffer may cause crashes, since dst_irb->mt may be NULL. This code is lifted straight out of brw_blorp_framebuffer()'s try_blorp_blit() helper. Fixes crashes in Piglit's fbo-sys-blit on systems without BLORP. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65919 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <idr@freedesktop.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com> Cc: "9.2" <mesa-stable@lists.freedesktop.org>	2013-08-08 12:12:47 -07:00
Roland Scheidegger	43076a55c2	util: (trivial) fix compile error with MSVC on x86	2013-08-08 19:08:57 +02:00
Roland Scheidegger	6ce54a81b2	gallivm: honor d3d10 floating point rules for shadow comparisons d3d10 specifies ordered comparisons for everything but not_equal which is unordered (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx). OpenGL probably doesn't care. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:58 +02:00
Roland Scheidegger	aa84f1ad55	softpipe: don't clamp reference value for shadow comparison for float formats Clamping is only done for fixed-point formats as part of conversion to texture format. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	e1590b9690	gallivm: don't clamp reference value for shadow comparison for float formats This is wrong both for OpenGL and d3d. (In fact clamping is a side effect of converting to depth format, so this should really do quantization too at least in d3d10 for the comparisons to be truly correct.) Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	eac57bc223	gallivm: propagate scalar_lod to emit_size_query too Clearly the returned values need to be per-element if the lod is per element. Does not actually change behavior yet. Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	c8572a9457	gallium: clarify SVIEWINFO opcode This opcode is quite problematic in tgsi, while it tries to mirror d3d10 resinfo it can't really do what's stated there due to missing the crazy return type modifiers. Hence specify this is ignored along with the swizzle. (Other options would be to have multiple opcodes or specify the ret type modifier maybe in dst_reg as there's padding bits left there but it is the only instruction allowing this.) Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	ce0e66af0a	gallivm: fix out-of-bounds behavior for fetch/ld For d3d10 and ARB_robust_buffer_access_behavior, we are required to return 0 for out-of-bounds coordinates (for which we can just enable the code that was already there but disabled). Additionally, also need to return 0 for out-of-bounds mip level and out-of-bounds layer. This changes the logic so instead of clamping the level/layer, an out-of-bound mask is computed instead in this case (actual clamping then can be omitted just like with coordinates, since we set the fetch offset to zero if that happens anyway). Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	883987503f	util: try much harder to set DAZ flag While so far this only causes some harmless test failures, there's lots more cpus with DAZ. All 64bit capable ones can do it (particularly relevant for AMD cpus as they supported sse3 very very late) but if really necessary we can check support for that for real with some more magic. (In fact just about ANY cpu with sse2 can support DAZ, I believe the only exception are first gen P4 (Willamette) and from those only early steppings which can't do it it's almost like intel forgot to add it... - a real pity though docs say you can't just try to set it as they will throw a GPF.) While this was meant to address https://bugs.freedesktop.org/show_bug.cgi?id=67672 it does not fix it. Most likely the tests need fixing as I don't think there's any guarantee about denorm handling in the reference math library functions if the flags aren't set to standard values. Nevertheless enabling DAZ on all cpus which can do it should be the right thing to do. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	e3b5e2db1b	util: implement table-based + linear interpolation linear-to-srgb conversion Should be much faster, seems to work in softpipe. While here (also it's now disabled) fix up the pow factor - the former value is what is in GL core it is however not actually accurate to fp32 standard (as it is 1.0/2.4), and if someone would do all the accurate math there's no reason to waste 8 mantissa bits or so... v2: use real table generating function instead of just printing the values (might take a bit longer as it does calculations on some 3+ million floats but much more descriptive obviously). Also fix up another inaccurate pow factor (this time in the python code) - wondering where the couple one bit errors came from :-(. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Zack Rusin <zackr@vmware.com>	2013-08-08 18:55:57 +02:00
Roland Scheidegger	2d9fea95e8	gallivm: fix comment wrt srgb accuracy. I think it's actually not good enough now...	2013-08-08 18:55:57 +02:00
Chia-I Wu	f9a4288bd2	ilo: get rid of GPE tables completely Move the estimate functions out of the tables and kill the tables.	2013-08-08 13:46:01 +08:00
Chia-I Wu	19204081ce	ilo: clean up GPE header inclusions This reduces the number of source files that need to be recompiled when GPE functions are changed, in addition to being a regular cleanup.	2013-08-08 13:41:10 +08:00
Chia-I Wu	e292b9362a	ilo: initialize alpha test state in ilo_gpe_init_dsa This could speed up BLEND_STATE and COLOR_CALC_STATE emission a bit.	2013-08-08 13:30:34 +08:00
Chia-I Wu	02496cd2b6	ilo: fold gen6_translate_index_size into the caller There is only one caller so fold it.	2013-08-08 13:10:36 +08:00
Chia-I Wu	1c19d0bb81	ilo: fold gen6_translate_depth_format into the caller There is only one caller so fold it.	2013-08-08 13:02:17 +08:00
Courtney Goeltzenleuchter	c2c5366ff2	ilo: Call GPE emit functions directly. Eliminate pipeline and GPE function vectors and have the pipeline functions call the GPE emit functions directly.	2013-08-08 11:39:21 +08:00
Courtney Goeltzenleuchter	4bc9daf923	ilo: move emit functions so that they can be inlined.	2013-08-08 11:39:21 +08:00
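
Commit 13fedf2883 above describes the corrected glLineWidth() check: widths greater than 1.0 are rejected only in a forward-compatible context, while widths of 1.0 or less stay valid there. A minimal sketch of that rule in C follows. The enum, the function name, and the `forward_compatible` flag are hypothetical names for illustration and are not taken from Mesa's source; the `width <= 0` rejection is the usual glLineWidth rule, included here as an assumption about the surrounding code.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch; not Mesa's real enums. */
enum line_width_result {
   LINE_WIDTH_OK,
   LINE_WIDTH_INVALID_VALUE
};

/*
 * Validate a requested line width.
 * forward_compatible is true when the context was created as a
 * forward-compatible one (an assumption of this sketch).
 */
static enum line_width_result
check_line_width(float width, bool forward_compatible)
{
   /* Non-positive widths are rejected in every context. */
   if (width <= 0.0f)
      return LINE_WIDTH_INVALID_VALUE;

   /* Wide lines are rejected only in forward-compatible contexts. */
   if (forward_compatible && width > 1.0f)
      return LINE_WIDTH_INVALID_VALUE;

   return LINE_WIDTH_OK;
}
```

Under these assumptions, `check_line_width(1.0f, true)` is accepted, which is the case the commit message says was wrongly disallowed before the fix.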
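
Commit ce0e66af0a above replaces clamping with an out-of-bounds mask: the fetch offset is forced to zero when a coordinate is out of range, and the result is forced to 0. The scalar C sketch below illustrates the idea for a 1D fetch only. It is not the gallivm code, which generates vectorized LLVM IR; `fetch_texel_1d` and its parameters are hypothetical, and the sketch assumes `width >= 1` so that offset 0 is always a valid load.

```c
#include <stdint.h>

/*
 * Fetch one texel from a 1D array, returning 0 for out-of-bounds x.
 * Assumes width >= 1 (hypothetical precondition of this sketch).
 */
static uint32_t
fetch_texel_1d(const uint32_t *texels, int width, int x)
{
   /* All ones when x lies in [0, width), all zeros otherwise. */
   uint32_t in_bounds = (x >= 0 && x < width) ? ~0u : 0u;

   /* Out of bounds: offset becomes 0, so the load itself stays valid. */
   uint32_t offset = (uint32_t)x & in_bounds;

   /* Out of bounds: the loaded value is masked away to 0. */
   return texels[offset] & in_bounds;
}
```

Because the offset is already zeroed by the mask, no separate clamp of `x` is needed, which matches the reasoning given in the commit message.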

58069 Commits