KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Chris Forbes	f6a3fda25d	i965: vs: Add fixup for textureSize with cube array samplers V3: Fixed weird whitespace V4: Use sampler's type rather than variable's type; otherwise broken with arrays of samplers. (Thanks Eric) v5: Fix a couple more style nits (by anholt) Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:31 -08:00
Chris Forbes	1cb57ea493	i965/vs: Fix gen6+ math operand quirks in one place This causes immediate values to get moved to a temp on gen7, which is needed for an upcoming change but hadn't happened in the visitor until then. v2: Drop gen > 7 checks (doesn't exist), and style-fix comments (changes by anholt). Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:28 -08:00
Chris Forbes	0cda3382a6	i965: Add various plumbing for cubemap arrays V4: Fixed style nits Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:26:12 -08:00
Eric Anholt	2cae9f2d4a	i965/fs: Add empirically-determined instruction latencies for gen7. v2: Actually switch on the other math instructions mentioned in the comment. v3: Add timing data for textureSize(), and clean up some long comment lines. Testing shader_time of fs16 shaders on a few frames of various apps: nexuiz improved by 2.9% +/- 1.5% (n=10) no difference on GLB2.5 (n=36, outliers removed) no difference on GLB2.7 (n=25) etqw improved by 2.6% +/- 2.2% (n=25) no difference on lightsmark (n=25) Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:22 -08:00
Eric Anholt	4df1e18864	i965/fs: Fix the clock increment in scheduling. I've tested this to be true with various ALU ops on gen7 (with the exception of MADs, which go at either 3 or 4 cycles per dispatch). Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:14 -08:00
Eric Anholt	6255fc7426	i965/fs: Move the old gen4 bspec-based scheduling info to a helper func. For gen7 everything changes, and we have actual information on latency. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:10 -08:00
Eric Anholt	461a29783a	i965/fs: Set up gen7 UBO loads as sends from GRFs. This gives the instruction scheduler a chance to schedule between the loads, whereas before it was restricted due to the dependencies between the MRFs for setting them up. For one shader in gles3conform, it goes from getting stuck in register allocation for as long as anybody's bothered to leave it running down to 23 seconds, thanks to the LIFO scheduling. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:18:05 -08:00
Eric Anholt	456dbcc337	i965/fs: Before reg alloc, schedule instructions to reduce live ranges. This came from an idea by Ben Segovia. 16-wide pixel shaders are very important for latency hiding on i965, so we want to try really hard to get them. If scheduling an instruction makes some set of instructions available, those are probably the ones that make the instruction's result dead. By choosing those first, we'll have a tendency to reduce the amount of live data as opposed to creating more. Previously, we were sometimes getting this behavior out of the scheduler, which was what produced the scheduler's original performance wins on lightsmark. Unfortunately, that was mostly an accident of the lame instruction latency information that I had, which made it impossible to fix the actual scheduling for performance. Now that we've fixed the scheduling for setup for register allocation, we can safely update the latency parameters for the final schedule. In shader-db, we lose 37 16-wide shaders, but gain 90 new ones. 4 shaders that were spilling change how many registers spill, for a reduction of 70/3899 instructions. v2: Simplify the new loop. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:17:59 -08:00
Eric Anholt	ba864bfcfa	i965/fs: Add some optional debug printfs to scheduling. Seeing when instructions become available to schedule is really useful. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:17:55 -08:00
Eric Anholt	7a9f940cab	i965/fs: Schedule instructions both before and after register allocation. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-14 15:17:41 -08:00
Eric Anholt	1315f3b4b3	i965: Make sure that the shader_time report at context destroy happens. Otherwise, you end up with some report from within a second of context destroy, which is not what you really want for testing the impact of changes	2012-12-14 15:05:10 -08:00
Eric Anholt	81c247404a	i965: Print a total time for the different shader stages. Sometimes I've got a patch for a performance optimization that's not showing a statistically significant performance difference on reported FPS, but still seems like a good idea because it ought to reduce time spent in the shader. If I can see the total number of cycles spent in the shader stage being optimized, it may show that the patch is still worthwhile (or point out that it's actually broken in some way).	2012-12-14 15:05:10 -08:00
Eric Anholt	f74560f3fb	i965: Scale shader_time to compensate for resets. Some shaders experience resets more than others, which skews the numbers reported. Attempt to correct for this by linearly scaling according to the number of resets that happen. Note that this will not be accurate if invocations of shaders have varying times and longer invocations are more likely to reset. However, this should at least be better than the previous situation.	2012-12-14 15:05:10 -08:00
Eric Anholt	338b5f887d	i965: Adjust the split between shader_time_end() and shader_time_write(). I'm about to emit other kinds of writes besides time deltas, and it turns out with the frequency of resets, we couldn't really use the old time delta write() function more than once in a shader.	2012-12-14 15:05:10 -08:00
Paul Berry	ca7e891e8a	glsl/linker: Pack between varyings. This patch implements varying packing between varyings. Previously, each varying occupied components 0 through N-1 of its assigned varying slot, so there was no way to pack two varyings into the same slot. For example, if the varyings were a float, a vec2, a vec3, and another vec2, they would be stored as follows: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * flt x x x <vec2-> x x <--vec3---> x <vec2-> x x varyings (Each * represents a varying component, and the "x"s represent wasted space). This change packs the varyings together to eliminate wasted space between varyings, like so: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * <vec2-> <vec2-> flt <--vec3---> x x x x x x x x varyings Note that we take advantage of the sort order introduced in previous patches (vec4's first, then vec2's, then scalars, then vec3's) to minimize how often a varying is "double parked" (split across varying slots). Reviewed-by: Eric Anholt <eric@anholt.net> v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.	2012-12-14 10:51:21 -08:00
Paul Berry	df87722bec	glsl/linker: Pack within compound varyings. This patch implements varying packing within varyings that are composed of multiple vectors of size less than 4 (e.g. arrays of vec2's, or matrices with height less than 4). Previously, such varyings used up a full 4-wide varying slot for each constituent vector, meaning that some of the components of each varying slot went unused. For example, a mat4x3 would be stored as follows: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * <-column1-> x <-column2-> x <-column3-> x <-column4-> x matrix (Each * represents a varying component, and the "x"s represent wasted space). In addition to wasting precious varying components, this layout complicated transform feedback, since the constituents of the varying are expected to be output to the transform feedback buffer contiguously (e.g. without gaps between the columns, in the case of a matrix). This change packs the constituents of each varying together so that all wasted space is at the end. For the mat4x3 example, this looks like so: <----slot1----> <----slot2----> <----slot3----> <----slot4----> slots * * * * * * * * * * * * * * * * <-column1-> <-column2-> <-column3-> <-column4-> x x x x matrix Note that matrix columns 2 and 3 now cross a boundary between varying slots (a characteristic I call "double parking" of a varying). We don't bother trying to eliminate the wasted space at the end of the varying, since the patch that follows will take care of that. Since compiler back-ends don't (yet) support this packed layout, the lower_packed_varyings function is used to rewrite the shader into a form where each varying occupies a full varying slot. Later, if we add native back-end support for varying packing, we can make this lowering pass optional. Reviewed-by: Eric Anholt <eric@anholt.net> v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.	2012-12-14 10:51:18 -08:00
Paul Berry	4bb8661b1b	gallium: Disable varying packing on hardware with <=8 texture indirections. In practice this will disable varying packing on R300, R400, i915g, and nv30. Reviewed-by: Marek Olšák <maraeo@gmail.com>	2012-12-14 10:51:10 -08:00
Paul Berry	6ee500cfd2	mesa: Add an option so driver can opt out of varying packing. On hardware that supports a limited number of texture indirections, varying packing will consume an extra texture indirection, since ALU operations are needed in the fragment shader to unpack the varyings before any texturing can be done. This patch introduces a new driver option, ctx->Const.DisableVaryingPacking, which can be used by a driver to opt out of varying packing if the extra texture indirection is costly enough to outweigh the advantages of packing varyings. Reviewed-by: Marek Olšák <maraeo@gmail.com>	2012-12-14 10:49:32 -08:00
Paul Berry	1745a4d751	glsl: Add a lowering pass for packing varyings. This lowering pass generates GLSL code that manually packs varyings into vec4 slots, for the benefit of back-ends that don't support packed varyings natively. No functional change--the lowering pass is not yet used. Reviewed-by: Eric Anholt <eric@anholt.net> v2: Don't use ir_hierarchical_visitor--just loop over instructions directly. Also, make the names of the packed varyings include the names of the original varyings that were packed into them.	2012-12-14 10:49:21 -08:00
Paul Berry	f3993107f0	glsl/linker: Sort varyings by packing class, then vector size. This patch paves the way for varying packing by adding a sorting step before varying assignment, which sorts the varyings into an order that increases the likelihood of being able to find an efficient packing. First, varyings are sorted into "packing classes" by considering attributes that can't be mixed during varying packing--at the moment this includes base type (float/int/uint/bool) and interpolation mode (smooth/noperspective/flat/centroid), though later we will hopefully be able to relax some of these restrictions. The number of packing classes places an upper limit on the amount of space that must be wasted by varying packing, since in theory a shader might have 4n+1 components worth of varyings in each of m packing classes, resulting in 3m components worth of wasted space. Then, within each packing class, varyings are sorted by vector size, with vec4's coming first, then vec2's, then scalars, and then finally vec3's. The motivation for this order is that it ensures that the only vectors that might be "double parked" (with part of the vector in one varying slot and the remainder in another) are vec3's. Note that the varyings aren't actually packed yet, merely placed in an order that will facilitate packing. Reviewed-by: Eric Anholt <eric@anholt.net>	2012-12-14 10:49:12 -08:00
Paul Berry	eb989e37cb	glsl/linker: Subdivide the first phase of varying assignment. This patch further subdivides the loop that assigns varying locations into two phases: one phase to match up the varyings between shader stages, and one phase to assign them varying locations. In between the two phases the matched varyings are stored in a new data structure called varying_matches. This will free us to be able to assign varying locations in any order, which will pave the way for packing varyings. Note that the new varying_matches::assign_locations() function returns the number of varying slots that were used; this return value will be used in a future patch. Reviewed-by: Eric Anholt <eric@anholt.net>	2012-12-14 10:49:08 -08:00
Paul Berry	25ed3bef9b	glsl/linker: Defer recording transform feedback locations. This patch subdivides the loop that assigns varying locations into two phases: one phase to match up varyings between shader stages (and assign them varying locations), and a second phase to record the varying assignments for use by transform feedback. This paves the way for varying packing, which will require us to further subdivide the first phase. In addition, it lets us avoid a clumsy O(n^2) algorithm, since we can now record the locations of all transform feedback varyings in a single pass through the tfeedback_decls array, rather than have to iterate through the array after assigning each varying. Reviewed-by: Eric Anholt <eric@anholt.net>	2012-12-14 10:49:05 -08:00
Paul Berry	3e81c666db	glsl: Create a field to store fractional varying locations. Currently, the location of each varying is recorded in ir_variable as a multiple of the size of a vec4. In order to pack varyings, we need to be able to record, e.g. that a vec2 is stored in the second half of a varying slot rather than the first half. This patch introduces a field ir_variable::location_frac, which represents the offset within a vec4 where a varying's value is stored. Varyings that are not subject to packing will always have a location_frac value of zero. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2012-12-14 10:48:52 -08:00
Paul Berry	3c9c17db4a	glsl/linker: Make separate ir_variable field to mean "unmatched". Previously, the linker used a value of -1 in ir_variable::location to denote a generic input or output of the shader that had not yet been matched up to a variable in another pipeline stage. This patch introduces a new ir_variable field, is_unmatched_generic_inout, for that purpose. In future patches, this will allow us to separate the process of matching varyings between shader stages from the processes of assigning locations to those varying. That will in turn pave the way for packing varyings. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2012-12-14 10:48:38 -08:00
Paul Berry	50895d443a	glsl/linker: Always invalidate shader ins/outs, even in corner cases. Previously, link_invalidate_variable_locations() was only called during assign_attribute_or_color_locations() and assign_varying_locations(). This meant that in the corner case when there was only a vertex shader, and varyings were being captured by transform feedback, link_invalidate_variable_locations() wasn't being called for the varyings. This patch migrates the calls to link_invalidate_variable_locations() to link_shaders(), so that they will be called in all circumstances. In addition, it modifies the call semantics so that link_invalidate_variable_locations() need only be called once per shader stage (rather than once for inputs and once for outputs). Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2012-12-14 10:48:35 -08:00
Paul Berry	18392443d4	glsl/lower_clip_distance: Update symbol table. This patch modifies the clip distance lowering pass so that the new symbol it generates (glClipDistanceMESA) is added to the shader's symbol table. This will allow a later patch to modify the linker so that it finds transform feedback varyings using the symbol table rather than having to iterate through all the declarations in the shader. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2012-12-14 10:48:28 -08:00
Tapani Pälli	d249159fe6	android: build fix for libmesa_glsl_utils hash_table.c compilation requires ralloc.h include path Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>	2012-12-14 10:01:45 -08:00
Brian Paul	a12a8c910f	mesa: minor indentation fixes in texcompress_etc.c	2012-12-14 06:33:08 -07:00
Brian Paul	b29f2d5ff5	mesa: remove old swrast-based compressed texel fetch code	2012-12-14 06:33:08 -07:00
Brian Paul	7dc36a50de	swrast: use new core Mesa compressed texel fetch functions	2012-12-14 06:33:08 -07:00
Brian Paul	faa95fd7fa	mesa: reimplement _mesa_decompress_image() using new tex fetch code	2012-12-14 06:33:08 -07:00
Brian Paul	ccbe7db1e6	mesa: added _mesa_get_compressed_fetch_func()	2012-12-14 06:33:08 -07:00
Brian Paul	ad3e39bb6d	mesa: add new texel fetch code for etc formats	2012-12-14 06:33:07 -07:00
Brian Paul	cd7baf5bf4	mesa: add new texel fetch code for rgtc formats	2012-12-14 06:33:07 -07:00
Brian Paul	141d299965	mesa: add new texel fetch code for fxt formats	2012-12-14 06:33:07 -07:00
Brian Paul	a774eaa57e	mesa: add new texel fetch code for dxt formats	2012-12-14 06:33:07 -07:00
Brian Paul	2037a06da9	mesa: add compressed_fetch_func typedef This is a first step in removing the swrast-related code in core Mesa's texture compression files.	2012-12-14 06:33:07 -07:00
Brian Paul	90b7797a1d	swrast: merge get_texel_fetch_func() and set_fetch_functions() No real need for separate functions anymore.	2012-12-14 06:33:07 -07:00
Brian Paul	f4896cea04	swrast: make _mesa_get_texel_fetch_func() static Not called from any other file.	2012-12-14 06:33:07 -07:00
Dave Airlie	9e41b0badb	draw/llvmpipe: fix transform feedback position + enable other extensions This builds on the previous draw/softpipe patch. So llvmpipe does streamout calls after clip/viewport stages, but we have the pre-clip position stored for later use, so when we are doing transform feedback, and it's the position vertex, grab the vertex from the stored pre-clip position. The perfect fix is probably to add a codegen transform feedback stage in between shader and clip stages, but this is good enough for now. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-14 11:34:40 +10:00
Dave Airlie	55d37eb40e	draw: add support for later transform feedback extensions This adds support to draw for the new features of transform feedback. a) fix count_from_stream_output, using max_index+1 for now but it looks like it should be valid as it's derived from the vertex elements/vbo. b) fix striding and dst offsets in output buffers - was just wrong before. c) fix crash if tfb is suspended (so.num_targets == 0) This also enables the new features on softpipe. It should be possible to enable them on llvmpipe as well after this commit, but would need to schedule piglit runs. Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-12-14 11:34:15 +10:00
Tom Stellard	4330cfec8b	clover: Fix build since removal of pipe_surface::usage by commit `25409c6da8`	2012-12-13 20:04:34 +00:00
Maxence Le Dore	6d7d821e3d	r600g/radeonsi: Silence warnings Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2012-12-13 19:40:28 +00:00
Tom Stellard	c68babfc3c	clover: Add support for compiler flags Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2012-12-13 19:22:44 +00:00
Tom Stellard	7f71efcf7a	clover: Don't erase build info of devices not being built Every call to _cl_program::build() was erasing the binaries and logs for every device associated with the program. This is incorrect because it is possible to build a program for only a subset of devices and so any device not being built should not have this information erased. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2012-12-13 19:22:35 +00:00
Vincent Lejeune	c7f9fb37ea	r600g: use load_ar checks with llvm output. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2012-12-13 19:22:10 +00:00
Thierry Reding	60e05d7388	build: Fix AX_PROG_{CC,CXX}_FOR_BUILD macros Override the cross_compiling and ac_tool_prefix variables by reassigning to them instead of redefining the macros. Redefining them will actually cause the variable names to be replaced instead of their content. Furthermore push the definition of CPPFLAGS before running the checks for the build tools to avoid the host CPPFLAGS from leaking into the build CPPFLAGS. While at it drop the redefinition of AC_TRY_COMPILER which hasn't been used since autoconf 2.50 and make sure that all definitions are properly popped when done (LDFLAGS, ac_cv_prog_CPP, ac_cv_prog_CXXCPP). Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>	2012-12-13 10:58:11 -08:00
Roland Scheidegger	a460aea3f1	gallivm: fix texel fetch for array textures Since we don't call lp_build_sample_common() in the texel fetch path we missed the layer fixup code. If someone would have tried to do texelFetch with array textures it would have crashed for sure. Not really tested (can't run the piglit test being able to use texelFetch with array samplers for now with llvmpipe). Reviewed-by: José Fonseca <jfonseca@vmware.com>	2012-12-13 19:17:09 +01:00
Paul Berry	6267853055	mesa: Fix computation of default vertex attrib stride for 2_10_10_10 formats. Previously, if the client program didn't specify a stride when setting up a vertex attribute, we used _mesa_sizeof_type() to compute the size of the type, and multiplied it by the number of components. This didn't work for the 2_10_10_10 formats, since _mesa_sizeof_type() returns -1 for those types, resulting in all kinds of havoc, since it was causing the hardware to be programmed with a negative stride value. This patch adds a new function _mesa_bytes_per_vertex_attrib(), which is similar to the existing function _mesa_bytes_per_pixel(), but which computes the size of a vertex attribute based on the type and the number of components. For packed formats (currently only the 2_10_10_10 formats), it verifies that the number of components is correct and returns the size of the packed format. For unpacked formats, it returns the size of the type times the number of components. In addition, this patch adds an assertion so that if we ever forget to update _mesa_bytes_per_vertex_attrib() when adding a new vertex format, we'll see the problem quickly rather than having to debug a subtle conformance test failure. Fixes GLES3 conformance tests vertex_type_2_10_10_10_rev_{conversion,divisor,stride_pointer}.test. Reviewed-by: Brian Paul <brianp@vmware.com>	2012-12-13 10:09:03 -08:00
Matt Turner	11cea47246	mesa/uniform_query: Don't write to *params if there is an error The GL 3.1 and ES 3.0 specs say of glGetActiveUniformsiv: "If an error occurs, nothing will be written to params." So, make a pass through the indices and check that they're valid before the pass that actually writes to params. Checking pname happens on the first iteration of the second loop. Fixes es3conform's getactiveuniformsiv_for_nonexistent_uniform_indices test. NOTE: This is a candidate for the 9.0 branch. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-12-13 09:53:28 -08:00
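
The varying-packing commits above (f3993107f0, ca7e891e8a, 3e81c666db) describe a sort order (vec4's, then vec2's, then scalars, then vec3's) followed by contiguous placement into 4-component slots. The following is a minimal Python sketch of that idea, not the linker's actual code: it assumes a single packing class, numbers slots from 0 (the diagrams in the commit message count from 1), and every name in it (pack_varyings, Placement, SIZE_ORDER) is hypothetical. Only location_frac borrows its name from the ir_variable field mentioned in 3e81c666db.

```python
from typing import List, NamedTuple, Tuple

# Sort rank per component count: vec4 first, then vec2, scalar, vec3
# (the order described in commit f3993107f0).
SIZE_ORDER = {4: 0, 2: 1, 1: 2, 3: 3}


class Placement(NamedTuple):
    name: str
    size: int            # number of components (1..4)
    slot: int            # 0-based index of the vec4 slot where it starts
    location_frac: int   # component offset within that slot (0..3)
    double_parked: bool  # True if it spills into the next slot


def pack_varyings(varyings: List[Tuple[str, int]]) -> List[Placement]:
    """Place (name, size) varyings back to back after sorting by size rank.

    Assumption: all varyings share one packing class, so nothing prevents
    them from sharing a slot.
    """
    # sorted() is stable, so varyings of equal size keep their input order.
    ordered = sorted(varyings, key=lambda v: SIZE_ORDER[v[1]])
    placements = []
    offset = 0  # running component offset across all slots
    for name, size in ordered:
        slot, frac = divmod(offset, 4)
        placements.append(Placement(name, size, slot, frac, frac + size > 4))
        offset += size
    return placements


if __name__ == "__main__":
    # The example from commit ca7e891e8a: a float, a vec2, a vec3, a vec2.
    for p in pack_varyings([("flt", 1), ("v2a", 2), ("v3", 3), ("v2b", 2)]):
        print(p)
```

With this ordering, vec4's always start at offset 0 of a slot and vec2's at offset 0 or 2, so in this sketch only a vec3 can end up with frac + size > 4, which matches the "double parked" observation in the commit message.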

54179 Commits