KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Eric Anholt	dca5fc1435	i965/fs: Improve performance of varying-index uniform loads on IVB. Like we have done for the VS and for constant-index uniform loads, we use the sampler engine to get caching in front of the L3 to avoid tickling the IVB L3 bug. This is also a bit of a functional change, as we're now loading a vec4 instead of a single dword, though we're not taking advantage of the other 3 components of the vec4 (yet). With the driver hacked to always take the varying-index path for all uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4). This is a major fix for some blur shaders in compositors from the varying-index uniforms support I introduced in 9.1. v2: Move old offset computation into the pre-gen7 path. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch.	2013-04-01 16:17:25 -07:00
Eric Anholt	bc0e1591f6	i965/fs: Avoid inappropriate optimization with regs_written > 1. Right now we don't have anything with regs_written() > 1 and !inst->mlen, but that's about to change. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	740350c982	i965: Make the fragment shader pull constants index by dwords, not vec4s. We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement. Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases. No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	2f41a60145	i965: Make the constant surface interface take a normal byte size. This puts the rounding-up logic into the function itself instead of all the callers having to manage it. Also drop an "unused" comment in gen4, as the stride is used for texbos (and will be for uniforms soon). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	8c694dfe64	i965/fs: Move varying uniform offset computation into the helper func. I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:25 -07:00
Eric Anholt	59e858861c	i965/fs: Remove creation of a MOV instruction that's never used. We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:24 -07:00
Eric Anholt	1d6ead3804	i965/fs: Allow constant propagation into MACH. This happens quite a bit with varying-index uniform loads. We could also do better by avoiding the MACH entirely, but there's no reason not to at least take this step. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 16:17:24 -07:00
Vincent Lejeune	50fd9c4544	r600g/llvm: Update LLVM_REVISION.txt	2013-04-01 23:50:20 +02:00
Vincent Lejeune	8c8c4e3977	r600g/llvm: Use stack_size provided from llvm.	2013-04-01 23:43:57 +02:00
Vincent Lejeune	4ac0d85ca6	r600g/llvm: uses function attribute to pass shader type	2013-04-01 23:43:42 +02:00
Vincent Lejeune	af38695f51	r600g/llvm: Add support for cf_alu native encode	2013-04-01 23:43:27 +02:00
Haixia Shi	bc0cc2944f	ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays. If the active uniform is an array, then the length of the uniform name should include the three extra characters for the "[0]" suffix, which is required by the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform(). This avoids the situation where the output buffer does not have enough space to hold the "[0]" suffix, resulting in an incomplete array specification like "foobar[0". NOTE: This is a candidate for the 9.1 branch. Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728 Signed-off-by: Haixia Shi <hshi@chromium.org> Reviewed-by: Stéphane Marchesin <marcheu@chromium.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-04-01 13:39:13 -07:00
Matt Turner	e2b40e253b	i965/fs: Fix bad interaction between tex swizzles and textureQueryLOD. Reported-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 13:11:43 -07:00
Eric Anholt	4ee892ee8a	i965: Remove the old brw_optimize() code. This is now done in the VS backend before instruction emit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 11:36:06 -07:00
Eric Anholt	4fee05b020	i965/vs: Add a pass to set dependency control fields on instructions. This is a more aggressive version of the old brw_optimize() path. Reduces cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 11:36:05 -07:00
Eric Anholt	229a51cdbe	i965: Dump shader source for linked shader programs. We dump shader source in ir_to_mesa.cpp, and we dump linked programs here, but we had no reference from the linked programs to their source. This was preventing improvement of shader-db to use linked shader programs instead of individual shader files (which is bogus, because it means we optimize out VS outputs, and don't interpolate FS inputs!) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-04-01 11:30:36 -07:00
Mike Lothian	777a7f2003	clover: Fix build with LLVM 3.3	2013-04-01 10:50:23 -07:00
Brian Paul	1165ff1af1	llvmpipe: use triangle subdivision to avoid fixed-point overflow issues If we're drawing to a surface that's 2048 x 2048 pixels or larger there's danger of fixed-point overflow in the triangle rasterization code. That leads to various rendering glitches. Rather than implement some intricate changes to the rasterization code, simply subdivide triangles into smaller subtriangles to avoid the issue. Only do this when the drawing surface is larger than 2048 by 2048. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-01 08:40:35 -06:00
Brian Paul	95df2b2883	mesa: remove platform checks around __builtin_ffs, __builtin_ffsll Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC, not just for specific platforms. Fixes Solaris build. Note: This is a candidate for the stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868 Signed-off-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-01 08:40:35 -06:00
Brian Paul	99811c344b	docs: add a new page documenting known application issues Let's try to update this when we find other broken applications... Reviewed-by: José Fonseca <jfonseca@vmware.com>	2013-04-01 08:40:35 -06:00
Brian Paul	fe30fa9ad6	drirc: set always_have_depth_buffer for Topogon Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2013-04-01 08:18:09 -06:00
Adam Jackson	e26d5940ff	gallivm: Minor comment cleanup Signed-off-by: Adam Jackson <ajax@redhat.com>	2013-04-01 09:45:38 -04:00
Dave Airlie	135bb3c1a9	mesa: fix texture storage multisample prototypes harder. I just noticed the warnings since I fixed the other bit. Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-04-01 19:54:56 +10:00
Vincent Lejeune	c3fb34ee8d	r600g/llvm: Update LLVM_REVISION	2013-03-31 21:37:20 +02:00
Vincent Lejeune	67a8ee7aaa	r600g/llvm: use native encode for tex	2013-03-31 21:35:47 +02:00
Dave Airlie	5b36bc05be	glapi: fix storage multisample build errors Reported on #radeon by udovdh Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-03-31 20:41:28 +10:00
Chris Forbes	2a528889a3	docs: mark ARB_texture_storage_multisample done Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:42 +13:00
Chris Forbes	d25b4d5e90	i965: enable ARB_texture_storage_multisample on Gen6+ This can be enabled everywhere that ARB_texture_multisample is supported -- ARB_texture_storage is supported on everything. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:40 +13:00
Chris Forbes	e0015c819c	mesa: allow multisample texture targets in [Get]TexParameter* ARB_texture_storage_multisample allows texture parameters to be queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets. Some parameters may also be set, with the following exceptions: - TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates INVALID_OPERATION - any state which appears in the `per-sampler` state table may not be set; generates INVALID_OPERATION V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:36 +13:00
Chris Forbes	b15c558c85	mesa: improve reported function name in Tex*Multisample Now that there are 4 variants, just pass the function name into teximagemultisample rather than reconstructing it. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:34 +13:00
Chris Forbes	9cbfe98bfc	mesa: add enable bit for ARB_texture_storage_multisample Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:32 +13:00
Chris Forbes	719974b54c	glapi: add definition of ARB_texture_storage_multisample Adds XML for the extension, dispatch_sanity enabling, and the two new entrypoints. These are both implemented by calling the shared teximagemultisample() with immutable=GL_TRUE. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:28 +13:00
Chris Forbes	788b0f8535	mesa: add support for immutable textures to teximagemultisample() The new entrypoints will come later, but this adds the actual logic for supporting immutable multisample textures: - The immutability flag is set as desired. - Attempting to modify an immutable multisample texture produces INVALID_OPERATION. Note: The extension spec does not mention adding this behavior to TexImage*Multisample, but it seems like the reasonable thing to do. V2: - Cover missing error cases (unsized formats; texture object zero) Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> [V1] Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:22 +13:00
Chris Forbes	7f32b9560b	mesa: extract _mesa_is_legal_tex_storage_format helper This is about to be used in teximagemultisample() when immutable=true. Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Brian Paul <brianp@vmware.com>	2013-03-31 22:19:13 +13:00
Kenneth Graunke	fdc5941972	mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros. These haven't been used since we deleted NV_vertex_program support. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-03-30 19:19:45 -07:00
Eric Anholt	0967c362bf	i965: Fix an inconsistency in the VUE map with gl_ClipVertex on gen4/5. We are intentionally not allocating a slot for gl_ClipVertex. But by leaving the bit set in the slots_valid, the fragment shader's computation of where varyings are in the URB entry coming out of the SF would be off by one. Fixes rendering in Freespace 2 SCP, and improves rendering in TF2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830 Tested-by: Joaquín Ignacio Aramendía <samsagax@gmail.com> NOTE: This is a candidate for the 9.1 branch. Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-03-30 17:24:18 -07:00
Eric Anholt	9dd19575d3	intel: Remove a never-taken debug print path. Alessandro Pignotti noted when I added this code in commit `0e723b135b` that it's in the else block for "if (busy)", so this debug print couldn't happen. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-03-30 17:23:50 -07:00
Brian Paul	c34bbe110d	st/mesa: add ir_lod case in GLSL->TGSI code to silence warning	2013-03-29 17:21:33 -06:00
Ian Romanick	e0131196ca	glsl: Generate masked write instead of vector array index for UBO lowering When reading a column from a row-major matrix, we would slot the single value read into the vector using an ir_dereference_array of the vector with a constant index. This will (eventually) get optimized to a masked-write, so just generate the masked write in the first place. v2: Remove unused variable 'chan'. Suggested by Ken. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Cc: Eric Anholt <eric@anholt.net>	2013-03-29 12:01:14 -07:00
Ian Romanick	65cc68f430	glsl: Replace open-coded dot-product with dot Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Cc: Eric Anholt <eric@anholt.net> Cc: Paul Berry <stereotype441@gmail.com>	2013-03-29 12:01:11 -07:00
Ian Romanick	dbf94d105a	glsl: Replace constant-index vector array accesses with swizzles Search and replace: ][0] -> ].x ][1] -> ].y ][2] -> ].z ][3] -> ].w Fixes piglit tests inverse-mat[234].{vert,frag}. These tests call the inverse function with constant parameters and expect proper constant folding to happen. My suspicion is that this patch papers over some bug in constant propagation involving array accesses. Either way, all of these accesses eventually get lowered to swizzles. This cuts out the middle man (saving a trivial amount of CPU). NOTE: This is a candidate for the 9.1 branch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Cc: Eric Anholt <eric@anholt.net> Cc: Paul Berry <stereotype441@gmail.com>	2013-03-29 12:01:07 -07:00
Ian Romanick	c770faea0a	glsl: Add missing bool case in glsl_type::get_scalar_type Since the case was missing, bvec4->get_scalar_type() would return bvec4, but vec4->get_scalar_type() would return float. NOTE: This is a candidate for stable branches. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2013-03-29 12:01:01 -07:00
Kenneth Graunke	57a502518e	i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards. "discard" instructions generate HALT instructions which jump to a final HALT near the end of the shader. Previously, fs_generator created this final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it to jump right before the FB write epilogue. This is normally good. However, INTEL_DEBUG=shader_time also has an epilogue section which records the final timestamp. The frontend emits IR for this just before FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering: 1. Shader Time Epilogue 2. Final HALT (where discards jump) 3. Framebuffer Write Epilogue This meant that discarded pixels completely skipped the shader time epilogue, causing no ending timestamp to be written. This obviously led to inaccurate results. This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just before any epilogue sections. This is where the final HALT should be generated, and makes it easy to ensure the correct ordering: 1. Final HALT 2. Shader Time Epilogue 3. Framebuffer Write Epilogue For shaders that don't discard, this opcode compiles away to nothing. The scheduler adds barrier dependencies to make sure that it doesn't get moved above any FS_OPCODE_DISCARD_JUMP instructions. One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to a mere 5.13 Gcycles. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-03-29 11:39:32 -07:00
Eric Anholt	20d846ce8b	i965: Add names for all instructions to dump_instruction() in FS and VS. I'd previously added the minimum names to understand my dumps, but this makes dumps in general much easier to read. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-03-29 11:39:21 -07:00
Matt Turner	ed6186f0e8	i965: Enable ARB_texture_query_lod. v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-03-29 10:21:14 -07:00
Matt Turner	b8aa9f7d3a	i965/fs: Generate LOD sampler message from ir_lod. v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-03-29 10:21:14 -07:00
Dave Airlie	110ca8b1f3	glsl: Implement ARB_texture_query_lod v2 [mattst88]: - Rebase. - #define GL_ARB_texture_query_lod to 1. - Remove comma after ir_lod in ir.h for MSVC. - Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp, opt_tree_grafting.cpp. - Rename textureQueryLOD to textureQueryLod, see https://www.khronos.org/bugzilla/show_bug.cgi?id=821 - Fix ir_reader of (lod ...). v3 [mattst88]: - Rename textureQueryLod to textureQueryLOD, pending resolution of Khronos 821. - Add ir_lod case to ir_to_mesa.cpp. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-03-29 10:20:26 -07:00
Matt Turner	0e0ab8a071	i965/fs: Use measured Gen7 instruction timings on Gen6. Benchmark comparison (x = before, + = after; distribution plot omitted). Before: N 23, Min 8083.78, Max 8287.83, Median 8205.55, Avg 8162.7461, Stddev 68.307951. After: N 23, Min 8107.56, Max 8358.74, Median 8224.33, Avg 8186.1765, Stddev 71.506301. No difference proven at 95.0% confidence Reviewed-by: Eric Anholt <eric@anholt.net>	2013-03-29 10:13:27 -07:00
Matt Turner	f085b21b25	i965/fs: Increase and document MAD latency on Gen7. 58% of mad(8) generated in shader-db are reading registers from the same bank. Reviewed-by: Eric Anholt <eric@anholt.net>	2013-03-29 10:13:27 -07:00
Matt Turner	414ea2f560	i965/fs: Add LRP instruction latency. Set its latency to what happens to be the default floating-point instruction latency. One day we may want to handle latency based on register bank information. Reviewed-by: Eric Anholt <eric@anholt.net>	2013-03-29 10:13:27 -07:00
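
The buffer-size problem described in commit bc0cc2944f (ACTIVE_UNIFORM_MAX_LENGTH and the "[0]" suffix) can be illustrated with a small sketch. This is not Mesa code: the function names `reported_name`, `active_uniform_max_length`, and `copy_name` are hypothetical, and the copy helper only approximates how a C-style query truncates a name to fit a caller-supplied buffer that must also hold a NUL terminator.

```python
def reported_name(name, is_array):
    # Hypothetical helper: array uniforms are reported with a "[0]" suffix.
    return name + "[0]" if is_array else name


def active_uniform_max_length(uniforms):
    # Longest reported name plus one character for the NUL terminator.
    # `uniforms` is a list of (name, is_array) pairs (an assumed layout).
    longest = 0
    for name, is_array in uniforms:
        longest = max(longest, len(reported_name(name, is_array)) + 1)
    return longest


def copy_name(name, is_array, buf_size):
    # Approximates a C-style copy: at most buf_size - 1 visible characters,
    # because the last slot is reserved for the NUL terminator.
    full = reported_name(name, is_array)
    return full[:max(buf_size - 1, 0)]


uniforms = [("foobar", True), ("mvp", False)]
size = active_uniform_max_length(uniforms)
print(size)                               # 10: "foobar[0]" plus NUL
print(copy_name("foobar", True, size))    # foobar[0]
print(copy_name("foobar", True, size - 1))  # foobar[0  (one slot too small)
```

If the reported maximum length leaves out any of the three suffix characters, a caller that sizes its buffer from that value gets a truncated name such as "foobar[0", which is the situation the commit message describes.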


55813 Commits All Branches Search
