KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Kristian Høgsberg	ee5fb8d1ba	i965: Generate vs code using scalar backend for BDW+ With everything in place, we can now use the scalar backend compiler for vertex shaders on BDW+. We make scalar vertex shaders the default on BDW+ but add a new vec4vs debug option to force the vec4 backend. No piglit regressions. Performance impact is minimal, I see a ~1.5 improvement on the T-Rex GLBenchmark case, but in general it's in the noise. Some of our internal synthetic, vs bounded benchmarks show great improvement, 20%-40% in some cases, but real-world cases are mostly unaffected. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:27 -08:00
Kristian Høgsberg	7ff457b930	i965: Clean up fs_visitor::run and rename to run_fs Now that fs_visitor::run is back to being only fragment shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT conditions and rename it to run_fs. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:23 -08:00
Kristian Høgsberg	8b6a797d74	i965: Add fs_visitor::run_vs() to generate scalar vertex shader code This patch uses the previous refactoring to add a new run_vs() method that generates vertex shader code using the scalar visitor and optimizer. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:19 -08:00
Kristian Høgsberg	bf23079379	i965: Rename brw_vec4_prog_data/key to brw_vue_prog_data/key These structs aren't vec4 specific, they are shared by shader stages operating on Vertex URB Entries (VUEs). VUEs are the data structures in the URB that hold vertex data between the pipeline geometry stages. Using vue in the name instead of vec4 makes a lot more sense, especially when we add scalar vertex shader support. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:16 -08:00
Kristian Høgsberg	3d10f0a98c	i965: Prepare for using the ATTR register file in the fs backend The scalar vertex shader will use the ATTR register file for vertex attributes. This patch adds support for the ATTR file to fs_visitor. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:11 -08:00
Kristian Høgsberg	df0966fb1a	i965: Consolidate code to get struct brw_sampler_prog_key_data This chunk of code is repeated in a few places, and we're going to add a MESA_SHADER_VERTEX case to it soon. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:08 -08:00
Kristian Høgsberg	c5b3878714	i965: Add new SIMD8 VS prog data flag This flag signals that we have a SIMD8 VS shader so we can set up the corresponding state accordingly. This boils down to setting the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull constant buffers use dword pitch. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:04 -08:00
Kristian Høgsberg	d9e29f5d88	i965: Add SIMD8 URB write low-level IR instruction This is all we need from the generator for SIMD8 vertex shaders. This opcode is just the send instruction, all the hard work will happen in the visitor using LOAD_PAYLOAD. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:29:00 -08:00
Kristian Høgsberg	686ef091a4	i965: Remove shader program argument and member from fs_generator Now that the caller passes in the shader debug name, we don't need this anymore. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:28:55 -08:00
Kristian Høgsberg	9a1af7b318	i965: Set shader name for generator from call site fs_generator no longer knows what stage it's generating code for, so we have to set the debug name of the shader from the call site. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:28:51 -08:00
Kristian Høgsberg	7bb9d33b8d	i965: Generalize fs_generator further This removes all stage specific data from the generator, and lets us create a generator for any stage. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:28:48 -08:00
Kristian Høgsberg	840e8fc920	i965: Don't copy propagate constants from sources with saturate We don't propagate the saturate bit and some instructions can't saturate at all. If the source has saturate set, just skip propagation. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-12-10 12:28:32 -08:00
Matt Turner	47aaabda47	i965: Replace 'noann' debug flag with 'ann'. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2014-12-10 10:19:16 -08:00
Matt Turner	1a2de7dce8	i965: Disable unlit-centroid workaround on Gen < 6. Back to the original commit (`8313f444`) adding the workaround, we were enabling it on gens <= 7, even though gens <= 5 can't do multisampling. I cannot find documentation that says that Sandybridge needs this workaround but in practice disabling it causes these piglit tests to fail: EXT_framebuffer_multisample/interpolation {2,4} centroid-deriv{,-disabled} On Ironlake: total instructions in shared programs: 4358478 -> 4349671 (-0.20%) instructions in affected programs: 117680 -> 108873 (-7.48%) A bunch of shaders in TF2, Portal 2, and L4D2 are cut by 25~30%. Cc: "10.4" <mesa-stable@lists.freedesktop.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-12-10 10:18:39 -08:00
Adrien Destugues	13e42fc025	hgl: traverse add-on entries * Allow using symlinks to add-ons when developing.	2014-12-10 14:01:01 +00:00
Alexander von Gluck IV	03e237e9f2	gallium/target: Haiku softpipe * Use print macro to fix warning on 64-bit systems	2014-12-10 14:01:01 +00:00
Alexander von Gluck IV	63d3f621e3	gallium/aux: Avoid redefining MAX * Can be redefined on some platforms through u_debug.h	2014-12-10 14:01:00 +00:00
Jan Vesely	3a18fc6058	clover: Use switch when creating kernel arguments. This way we get a warning if an enum value is not handled. v2: codestyle Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2014-12-10 15:48:20 +02:00
Dave Airlie	7f21cf7198	r600g: only init GS_VERT_ITEMSIZE on r600 On evergreen there are 4 regs, on r600/700 there is only one. Don't initialise regs and trash someone else's state. Not sure this fixes anything, but hey one less stupid. Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com> Cc: "10.3 10.4" mesa-stable@lists.freedesktop.org Signed-off-by: Dave Airlie <airlied@redhat.com>	2014-12-10 16:34:40 +10:00
Eric Anholt	8812dc503e	vc4: Do QPU scheduling across uniform loads. This means another pass of reordering the uniform data store, but it lets us pair up a lot more instructions. total instructions in shared programs: 44639 -> 43176 (-3.28%) instructions in affected programs: 36938 -> 35475 (-3.96%)	2014-12-09 21:19:11 -08:00
Eric Anholt	c5b544403f	vc4: Populate the delay field better, and schedule high delay first. This is a standard scheduling heuristic, and clearly helps. total instructions in shared programs: 46418 -> 44467 (-4.20%) instructions in affected programs: 42531 -> 40580 (-4.59%)	2014-12-09 18:32:36 -08:00
Eric Anholt	45a8923771	vc4: Skip raddr dependencies for 32-bit immediate loads. These don't have raddr fields.	2014-12-09 18:32:36 -08:00
Eric Anholt	f431b4f110	vc4: Mark VPM read setup as impacting VPM reads, not writes. Fixes assertion failures if we adjust scheduling priorities to emphasize VPM reads more.	2014-12-09 18:32:36 -08:00
Eric Anholt	cff8c96a0d	vc4: Refuse to merge instructions involving 32-bit immediate loads. An immediate load overwrites the mul and add operations, so you can't merge with them.	2014-12-09 18:32:36 -08:00
Aaron Watry	25db8729dc	clover: Fix build after llvm r223802 Signed-off-by: Aaron Watry <awatry at gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2014-12-09 19:28:50 -06:00
Rob Clark	69d23809d0	freedreno/a4xx: frag-coord / face fixes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:03:55 -05:00
Rob Clark	3dbcd25022	freedreno/a4xx: fix rendering to layer != 0 Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:03:40 -05:00
Rob Clark	6a5ba23fa6	freedreno/a4xx: temp hack for FLAT varyings Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:03:09 -05:00
Rob Clark	eb6fd3b8eb	freedreno/ir3: lower TXP as needed On a3xx, lower TXP for 3D textures, on a4xx lower all TXP. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:03:01 -05:00
Rob Clark	5b38a1740b	freedreno/a4xx: XA gpu hang at startup Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:02:45 -05:00
Rob Clark	1e3a732603	freedreno/a4xx: texture fixes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:01:49 -05:00
Rob Clark	5d7c9c9160	freedreno: cleanup slice alignment/setup Collapse things back into a setup_slices() which takes the desired alignment as a param. This gets things ready for a4xx which has some slightly different requirements. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:01:21 -05:00
Rob Clark	8ecbcbf0aa	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2014-12-09 18:01:10 -05:00
Rob Clark	219440ddeb	tgsi/lowering: add support to lower TXP (v2) v2: actually do perspective divide for RECT/SHADOWRECT Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2014-12-09 17:47:44 -05:00
Timothy Arceri	f1b5f2b157	mesa: use build flag to ensure stack is realigned on x86 Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits, but that is an assumption OpenGL drivers (or any dynamic library for that matter) can't afford to make as there are many closed- and open- source application binaries out there that only assume 4-byte stack alignment. V4: fix comment and indentation V3: move all sse4.1 build flag config to the same location and add comment as to why we need to do the realign V2: use $target_cpu rather than $host_cpu and setup build flags in config rather than makefile https://bugs.freedesktop.org/show_bug.cgi?id=86788 Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au> Reviewed-by: Matt Turner <mattst88@gmail.com> CC: "10.4" <mesa-stable@lists.freedesktop.org>	2014-12-10 07:35:38 +11:00
Marek Olšák	65ef78e861	draw: implement TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION Required by Nine. Tested with util_run_tests. It's added to softpipe, llvmpipe, and r300g/swtcl. Tested-by: David Heidelberg <david@ixit.cz>	2014-12-09 12:27:10 +01:00
Samuel Iglesias Gonsalvez	6cc7251185	main: return two minor digits for ES shading language version For OpenGL ES 3.0 spec, the minor number for SHADING_LANGUAGE_VERSION is always two digits, matching the OpenGL ES Shading Language Specification release number. For example, this query might return the string "3.00". This patch fixes the following dEQP test: dEQP-GLES3.functional.state_query.string.shading_language_version No piglit regression observed. Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-09 11:40:00 +01:00
Samuel Iglesias Gonsalvez	426a50e208	glsl: invariant qualifier is not valid for shader inputs in GLSL ES 3.00 GLSL ES 3.00 spec, chapter 4.6.1 "The Invariant Qualifier", Only variables output from a shader can be candidates for invariance. This includes user-defined output variables and the built-in output variables. As only outputs can be declared as invariant, an invariant output from one shader stage will still match an input of a subsequent stage without the input being declared as invariant. This patch fixes the following dEQP tests: dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage_precision dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage_precision dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_precision_invariant_input dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_invariant_input dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_precision_invariant_input dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_invariant_input No piglit regressions observed. v2: - Add spec content in the code Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-09 11:40:00 +01:00
Iago Toral Quiroga	e1ed4f2532	mesa: Recompute LegalTypesMask if the GL API has changed The current code computes ctx->Array.LegalTypesMask just once, however, computing this needs to consider ctx->API so we need to make sure that the API for that context has not changed if we intend to reuse the result. The context API can change, at least, if we go through _mesa_meta_begin, since that will always force API_OPENGL_COMPAT until we call _mesa_meta_end. If any operation in between these two calls triggers a call to update_array_format, then we might be caching a value for LegalTypesMask that will not be right once we have called _mesa_meta_end and restored the context API. Fixes the following 179 dEQP tests in i965: dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.* dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.fixed.* dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.fixed.* dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_draw.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_draw.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_draw.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_copy.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_copy.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_copy.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_read.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_read.fixed dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_read.fixed dEQP-GLES3.functional.vertex_arrays.multiple_attributes.input_types.3_fixed2 dEQP-GLES3.functional.draw.random.{2,18,28,68,83,106,109,156,181,191} Reviewed-by: Brian Paul <brianp@vmware.com>	2014-12-09 11:40:00 +01:00
Eduardo Lima Mitev	09cb149ba7	mesa: Returns zero samples when querying GL_NUM_SAMPLE_COUNTS when internal format is integer From GL ES 3.0 specification, section 6.1.15 Internal Format Queries (page 236), multisampling is not supported for signed and unsigned integer internal formats. Fixes 19 dEQP tests under 'dEQP-GLES3.functional.state_query.internal_format.*'. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-09 11:40:00 +01:00
Eduardo Lima Mitev	7894278717	mesa: Enables GL_RGB and GL_RGBA unsized internal formats for OpenGL ES 3.0 GL_RGB and GL_RGBA are valid internal formats on a GLES3 profile. See "Table 1. Unsized Internal Formats" at https://www.khronos.org/opengles/sdk/docs/man3/html/glTexImage2D.xhtml. Fixes 2 dEQP tests: - dEQP-GLES3.functional.state_query.internal_format.rgb_samples - dEQP-GLES3.functional.state_query.internal_format.rgba_samples Reviewed-by: Brian Paul <brianp@vmware.com>	2014-12-09 11:40:00 +01:00
Eduardo Lima Mitev	242ad32655	mesa: Considers GL_DEPTH_STENCIL_ATTACHMENT a valid argument for FBO invalidation under GLES3 In OpenGL and OpenGL-ES 3+, GL_DEPTH_STENCIL_ATTACHMENT is a valid attachment point for the family of functions that invalidate a framebuffer object (e.g, glInvalidateFramebuffer, glInvalidateSubFramebuffer, etc). Currently, a GL_INVALID_ENUM error is emitted for this attachment point. Fixes 21 dEQP test failures under 'dEQP-GLES3.functional.fbo.invalidate.*'. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-09 11:40:00 +01:00
Eric Anholt	8420a95692	vc4: Reserve rb31 instead of r3 for raddr conflict spills. This increases the cost of a raddr b conflict spill (save r3 to rb31, move src1 to r3, move rb31 back to r3 when done, instead of just move src1 to r3), but on average thanks to instruction pairing it's more worthwhile to have another accumulator. total instructions in shared programs: 46428 -> 46171 (-0.55%) instructions in affected programs: 38030 -> 37773 (-0.68%)	2014-12-09 01:04:46 -08:00
Eric Anholt	ab1b1fa6fb	vc4: Prioritize allocating accumulators to short-lived values. The register allocator walks from the end of the nodes array looking for trivially-allocatable things to put on the stack, meaning (assuming everything is trivially colorable and gets put on the stack in a single pass) the low node numbers get allocated first. The things allocated first happen to get the lower-numbered registers, which is to say the fast accumulators that can be paired more easily. When we previously made the nodes match the temporary register numbers, we'd end up putting the shader inputs (VS or FS) in the accumulators, which are often long-lived values. By prioritizing the shortest-lived values for allocation, we can get a lot more instructions that involve accumulators, and thus fewer conflicts for raddr and WS. total instructions in shared programs: 52870 -> 46428 (-12.18%) instructions in affected programs: 52260 -> 45818 (-12.33%)	2014-12-09 00:55:14 -08:00
Dave Airlie	0d4272cd8e	r600g: fix regression since UCMP change Since `d8da6decea` where the state tracker started using UCMP on cayman a number of tests regressed. this seems to be r600g is doing CNDGE_INT for UCMP which is >= 0, we should be doing CNDE_INT with reverse arguments. Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2014-12-09 11:54:46 +10:00
Matt Turner	2a0bef91ca	program: Delete dead _mesa_realloc_instructions. Dead since 2010 (commit `284ce209`). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-08 17:02:19 -08:00
Matt Turner	811a1836c8	swrast: Remove 'inline' from tex filter functions. Reduces .text size of mesa_dri_drivers.so (i965-only) by 62k, or 1.4%. Note that we don't remove inline from lerp_2d(), which has a comment above it saying it definitely should be inlined. Though, removing the inline keyword from it doesn't actually change the compiled code for me. Reviewed-by: Brian Paul <brianp@vmware.com>	2014-12-08 17:02:19 -08:00
Matt Turner	8af4aaf351	Don't cast the return value of malloc/realloc See commit `2b7a972e` for the Coccinelle script. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-08 17:02:19 -08:00
Matt Turner	f0a8bcd84e	Use calloc instead of malloc/memset-0 See commit `6bda027e` for the Coccinelle script. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-08 17:02:19 -08:00
Matt Turner	9019e5e195	Remove useless checks for NULL before freeing See commits `5067506e` and `b6109de3` for the Coccinelle script. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-12-08 17:02:19 -08:00
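
The last three commits in the list (8af4aaf351, f0a8bcd84e, 9019e5e195) apply mechanical C cleanups: dropping the cast on malloc/realloc results, replacing malloc followed by a zeroing memset with calloc, and removing NULL checks before free. A minimal sketch of the before and after shapes follows. `struct item` and the `item_array_*` helpers are hypothetical names made up for illustration; they are not Mesa code, and the Coccinelle scripts referenced in the commits are not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, for illustration only (not a Mesa struct). */
struct item {
   int id;
   float weight;
};

/* Before: cast on malloc, manual zeroing, NULL check before free. */
static struct item *
item_array_create_old(size_t count)
{
   struct item *items = (struct item *) malloc(count * sizeof(struct item));
   if (items)
      memset(items, 0, count * sizeof(struct item));
   return items;
}

static void
item_array_destroy_old(struct item *items)
{
   if (items)
      free(items);
}

/* After: in C, void * converts implicitly, so no cast is needed, and
 * calloc returns zero-initialized memory in a single call. */
static struct item *
item_array_create(size_t count)
{
   return calloc(count, sizeof(struct item));
}

static void
item_array_destroy(struct item *items)
{
   /* free(NULL) is defined to do nothing, so the check is redundant. */
   free(items);
}
```

Both versions hand back the same zeroed storage. The calloc form is shorter and leaves the size multiplication to the allocator.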


66852 Commits