Commit Graph

55924 Commits

Author SHA1 Message Date
Christian König e4ed58763a radeonsi: add instanceid support
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-02 13:01:43 +02:00
Christian König 83df955ca9 radeon/llvm: move system value fetching to common code
This should be used by both SI and R600.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-02 13:01:42 +02:00
Michel Dänzer c6efb4870b radeonsi: Handle arbitrary 2-byte formats in resource_copy_region
Fixes mplayer -vo vdpau OSD.

NOTE: This is a candidate for the 9.1 branch.

Reported-by: Igor Vagulin <igor.vagulin@gmail.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Christian König <christian.koenig@amd.com>
2013-04-02 11:42:35 +02:00
Maarten Lankhorst 6d20c646d6 nvc0: Fix fd leak in nvc0_create_decoder
NOTE: This is a candidate for the 9.0 and 9.1 branches.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
2013-04-02 10:25:26 +02:00
Aras Pranckevicius b2eee0869f GLSL: fix lower_jumps to report progress properly
A fix for lower_jumps progress reporting, very much like similar in
c1e591eed.

NOTE: This is a candidate for stable branches.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-04-01 16:57:17 -07:00
Eric Anholt 62501c3af8 i965/fs: Allow CSE on pre-gen7 varying-index uniform loads
All the other expression types allowed here have inst->mlen == 0, and this
one has implied MRF writes for all of its payload, so nothing else in the
implementation should need to change.

Reduces SEND messages for loading from pull constants in kwin's Lanczos
shader from 16 to 6.  (Due to a deficiency in constant propagation, I
can't use the hack I did in the previous commit to test the performance
change)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
2013-04-01 16:17:26 -07:00
Eric Anholt 70b27e0e4b i965/fs: Use LD messages for pre-gen7 varying-index uniform loads
This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.

v2: Fix comment about channels in the other message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
2013-04-01 16:17:26 -07:00
Eric Anholt ce316f62ef i965/fs: Don't double-emit SEND dependency workarounds at control flow.
We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too.  We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:26 -07:00
Eric Anholt 3cf69b2284 i965/fs: Bake regs_written into the IR instead of recomputing it later.
For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.
2013-04-01 16:17:26 -07:00
Eric Anholt 8edc7cbe64 i965/fs: Clean up the setup of gen4 simd16 message destinations.
I think this makes it much more obvious what's going on here.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:26 -07:00
Eric Anholt 9f43b84928 i965/fs: Do CSE on gen7's varying-index pull constant loads.
This is our first CSE on a regs_written() > 1 instruction, so it takes a
bit of extra fixup.  Reduces the number of loads on kwin's Lanczos shader
from 12 to 2.

v2: Fix compiler warning (false positive on possibly-uninitialized variable)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
NOTE: This is a candidate for the 9.1 branch.
2013-04-01 16:17:25 -07:00
Eric Anholt dca5fc1435 i965/fs: Improve performance of varying-index uniform loads on IVB.
Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug.  This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).

With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.

v2: Move old offset computation into the pre-gen7 path.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
2013-04-01 16:17:25 -07:00
Eric Anholt bc0e1591f6 i965/fs: Avoid inappropriate optimization with regs_written > 1.
Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 740350c982 i965: Make the fragment shader pull constants index by dwords, not vec4s.
We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency.  But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.

Note that this change only affects those messages that use the surface
format, like sampler LDs, but not to the untyped data cache loads we've
used in other cases.

No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 2f41a60145 i965: Make the constant surface interface take a normal byte size.
This puts the rounding-up logic into the function itself instead of all
the callers having to manage it.  Also drop an "unused" comment in gen4,
as the stride *is* used for texbos (and will be for uniforms soon).

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 8c694dfe64 i965/fs: Move varying uniform offset compuation into the helper func.
I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 59e858861c i965/fs: Remove creation of a MOV instruction that's never used.
We weren't inserting it into the list, so it did nothing.  This line was
replaced by the MOV/MUL block above.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:24 -07:00
Eric Anholt 1d6ead3804 i965/fs: Allow constant propagation into MACH.
This happens quite a bit with varying-index uniform loads.  We could also
do better by avoiding the MACH entirely, but there's no reason not to at
least take this step.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:24 -07:00
Vincent Lejeune 50fd9c4544 r600g/llvm: Update LLVM_REVISION.txt 2013-04-01 23:50:20 +02:00
Vincent Lejeune 8c8c4e3977 r600g/llvm: Use stack_size provided from llvm. 2013-04-01 23:43:57 +02:00
Vincent Lejeune 4ac0d85ca6 r600g/llvm: uses function attribute to pass shader type 2013-04-01 23:43:42 +02:00
Vincent Lejeune af38695f51 r600g/llvm: Add support for cf_alu native encode 2013-04-01 23:43:27 +02:00
Haixia Shi bc0cc2944f ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays.
If the active uniform is an array, then the length of the uniform name should
include the three extra characters for the "[0]" suffix, which is required by
the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform().

This avoids the situation where the output buffer does not have enough space
to hold the "[0]" suffix, resulting in an incomplete array specification like
"foobar[0".

NOTE: This is a candidate for the 9.1 branch.

Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-04-01 13:39:13 -07:00
Matt Turner e2b40e253b i965/fs: Fix bad interaction between tex swizzles and textureQueryLOD.
Reported-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 13:11:43 -07:00
Eric Anholt 4ee892ee8a i965: Remove the old brw_optimize() code.
This is now done in the VS backend before instruction emit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 11:36:06 -07:00
Eric Anholt 4fee05b020 i965/vs: Add a pass to set dependency control fields on instructions.
This is a more aggressive version of the old brw_optimize() path.  Reduces
cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 11:36:05 -07:00
Eric Anholt 229a51cdbe i965: Dump shader source for linked shader programs.
We dump shader source in ir_to_mesa.cpp, and we dump linked programs here,
but we had no reference from the linked programs to their source.  This
was preventing improvement of shader-db to use linked shader programs
instead of individual shader files (which is bogus, because it means we
optimize out VS outputs, and don't interpolate FS inputs!)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 11:30:36 -07:00
Mike Lothian 777a7f2003 clover: Fix build with LLVM 3.3 2013-04-01 10:50:23 -07:00
Brian Paul 1165ff1af1 llvmpipe: use triangle subdivision to avoid fixed-point overflow issues
If we're drawing to a surface that's 2048 x 2048 pixels or larger there's
danger of fixed-point overflow in the triangle rasterization code.  That
leads to various rendering glitches.

Rather than implement some intricate changes to the rasterization code,
simply subdivide triangles into smaller subtriangles to avoid the issue.
Only do this when the drawing surface is larger than 2048 by 2048.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-01 08:40:35 -06:00
Brian Paul 95df2b2883 mesa: remove platform checks around __builtin_ffs, __builtin_ffsll
Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC,
not just for specific platforms.  Fixes Solaris build.

Note: This is a candidate for the stable branches.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-01 08:40:35 -06:00
Brian Paul 99811c344b docs: add a new page documenting known application issues
Let's try to update this when we find other broken applications...

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-01 08:40:35 -06:00
Brian Paul fe30fa9ad6 drirc: set always_have_depth_buffer for Topogon
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-01 08:18:09 -06:00
Adam Jackson e26d5940ff gallivm: Minor comment cleanup
Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-04-01 09:45:38 -04:00
Dave Airlie 135bb3c1a9 mesa: fix texture storage multisample prototypes harder.
I just noticed the warnings since I fixed the other bit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-04-01 19:54:56 +10:00
Vincent Lejeune c3fb34ee8d r600g/llvm: Update LLVM_REVISION 2013-03-31 21:37:20 +02:00
Vincent Lejeune 67a8ee7aaa r600g/llvm: use native encode for tex 2013-03-31 21:35:47 +02:00
Dave Airlie 5b36bc05be glapi: fix storage multisample build errors
Reported on #radeon by udovdh

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-03-31 20:41:28 +10:00
Chris Forbes 2a528889a3 docs: mark ARB_texture_storage_multisample done
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:42 +13:00
Chris Forbes d25b4d5e90 i965: enable ARB_texture_storage_multisample on Gen6+
This can be enabled everywhere that ARB_texture_multisample is
supported -- ARB_texture_storage is supported on everything.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:40 +13:00
Chris Forbes e0015c819c mesa: allow multisample texture targets in [Get]TexParameter*
ARB_texture_storage_multisample allows texture parameters to be
queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY
targets.

Some parameters may also be set, with the following exceptions:

- TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates
   INVALID_OPERATION

- any state which appears in the `per-sampler` state table may not
  be set; generates INVALID_OPERATION

V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:36 +13:00
Chris Forbes b15c558c85 mesa: improve reported function name in Tex*Multisample
Now that there are 4 variants, just pass the function name into
teximagemultisample rather than reconstructing it.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:34 +13:00
Chris Forbes 9cbfe98bfc mesa: add enable bit for ARB_texture_storage_multisample
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:32 +13:00
Chris Forbes 719974b54c glapi: add definition of ARB_texture_storage_multisample
Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:28 +13:00
Chris Forbes 788b0f8535 mesa: add support for immutable textures to teximagemultisample()
The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:

- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
  INVALID_OPERATION.

Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.

V2: - Cover missing error cases (unsized formats; texture object zero)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V1] Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:22 +13:00
Chris Forbes 7f32b9560b mesa: extract _mesa_is_legal_tex_storage_format helper
This is about to be used in teximagemultisample() when immutable=true.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:13 +13:00
Kenneth Graunke fdc5941972 mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.
These haven't been used since we deleted NV_vertex_program support.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-30 19:19:45 -07:00
Eric Anholt 0967c362bf i965: Fix an inconsistency inb the VUE map with gl_ClipVertex on gen4/5.
We are intentionally not allocating a slot for gl_ClipVertex.  But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one.  Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <samsagax@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-03-30 17:24:18 -07:00
Eric Anholt 9dd19575d3 intel: Remove a never-taken debug print path.
Alessandro Pignotti noted when I added this code in commit
0e723b135b that it's in the else block for
"if (busy)", so this debug print couldn't happen.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-30 17:23:50 -07:00
Brian Paul c34bbe110d st/mesa: add ir_lod case in GLSL->TGSI code to silence warning 2013-03-29 17:21:33 -06:00
Ian Romanick e0131196ca glsl: Generated masked write instead of vector array index for UBO lowering
When reading a column from a row-major matrix, we would slot the single
value read into the vector using an ir_dereference_array of the vector
with a constant index.  This will (eventually) get optimized to a
masked-write, so just generate the masked write in the first place.

v2: Remove unused variable 'chan'.  Suggested by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
2013-03-29 12:01:14 -07:00