Commit Graph

55813 Commits

Author SHA1 Message Date
Eric Anholt dca5fc1435 i965/fs: Improve performance of varying-index uniform loads on IVB.
Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug.  This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).

With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.

v2: Move old offset computation into the pre-gen7 path.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
2013-04-01 16:17:25 -07:00
Eric Anholt bc0e1591f6 i965/fs: Avoid inappropriate optimization with regs_written > 1.
Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 740350c982 i965: Make the fragment shader pull constants index by dwords, not vec4s.
We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency.  But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.

Note that this change only affects those messages that use the surface
format, like sampler LDs, but not to the untyped data cache loads we've
used in other cases.

No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 2f41a60145 i965: Make the constant surface interface take a normal byte size.
This puts the rounding-up logic into the function itself instead of all
the callers having to manage it.  Also drop an "unused" comment in gen4,
as the stride *is* used for texbos (and will be for uniforms soon).

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 8c694dfe64 i965/fs: Move varying uniform offset compuation into the helper func.
I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:25 -07:00
Eric Anholt 59e858861c i965/fs: Remove creation of a MOV instruction that's never used.
We weren't inserting it into the list, so it did nothing.  This line was
replaced by the MOV/MUL block above.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:24 -07:00
Eric Anholt 1d6ead3804 i965/fs: Allow constant propagation into MACH.
This happens quite a bit with varying-index uniform loads.  We could also
do better by avoiding the MACH entirely, but there's no reason not to at
least take this step.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 16:17:24 -07:00
Vincent Lejeune 50fd9c4544 r600g/llvm: Update LLVM_REVISION.txt 2013-04-01 23:50:20 +02:00
Vincent Lejeune 8c8c4e3977 r600g/llvm: Use stack_size provided from llvm. 2013-04-01 23:43:57 +02:00
Vincent Lejeune 4ac0d85ca6 r600g/llvm: uses function attribute to pass shader type 2013-04-01 23:43:42 +02:00
Vincent Lejeune af38695f51 r600g/llvm: Add support for cf_alu native encode 2013-04-01 23:43:27 +02:00
Haixia Shi bc0cc2944f ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays.
If the active uniform is an array, then the length of the uniform name should
include the three extra characters for the "[0]" suffix, which is required by
the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform().

This avoids the situation where the output buffer does not have enough space
to hold the "[0]" suffix, resulting in an incomplete array specification like
"foobar[0".

NOTE: This is a candidate for the 9.1 branch.

Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-04-01 13:39:13 -07:00
Matt Turner e2b40e253b i965/fs: Fix bad interaction between tex swizzles and textureQueryLOD.
Reported-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 13:11:43 -07:00
Eric Anholt 4ee892ee8a i965: Remove the old brw_optimize() code.
This is now done in the VS backend before instruction emit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 11:36:06 -07:00
Eric Anholt 4fee05b020 i965/vs: Add a pass to set dependency control fields on instructions.
This is a more aggressive version of the old brw_optimize() path.  Reduces
cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 11:36:05 -07:00
Eric Anholt 229a51cdbe i965: Dump shader source for linked shader programs.
We dump shader source in ir_to_mesa.cpp, and we dump linked programs here,
but we had no reference from the linked programs to their source.  This
was preventing improvement of shader-db to use linked shader programs
instead of individual shader files (which is bogus, because it means we
optimize out VS outputs, and don't interpolate FS inputs!)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-01 11:30:36 -07:00
Mike Lothian 777a7f2003 clover: Fix build with LLVM 3.3 2013-04-01 10:50:23 -07:00
Brian Paul 1165ff1af1 llvmpipe: use triangle subdivision to avoid fixed-point overflow issues
If we're drawing to a surface that's 2048 x 2048 pixels or larger there's
danger of fixed-point overflow in the triangle rasterization code.  That
leads to various rendering glitches.

Rather than implement some intricate changes to the rasterization code,
simply subdivide triangles into smaller subtriangles to avoid the issue.
Only do this when the drawing surface is larger than 2048 by 2048.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-01 08:40:35 -06:00
Brian Paul 95df2b2883 mesa: remove platform checks around __builtin_ffs, __builtin_ffsll
Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC,
not just for specific platforms.  Fixes Solaris build.

Note: This is a candidate for the stable branches.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-01 08:40:35 -06:00
Brian Paul 99811c344b docs: add a new page documenting known application issues
Let's try to update this when we find other broken applications...

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-01 08:40:35 -06:00
Brian Paul fe30fa9ad6 drirc: set always_have_depth_buffer for Topogon
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-01 08:18:09 -06:00
Adam Jackson e26d5940ff gallivm: Minor comment cleanup
Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-04-01 09:45:38 -04:00
Dave Airlie 135bb3c1a9 mesa: fix texture storage multisample prototypes harder.
I just noticed the warnings since I fixed the other bit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-04-01 19:54:56 +10:00
Vincent Lejeune c3fb34ee8d r600g/llvm: Update LLVM_REVISION 2013-03-31 21:37:20 +02:00
Vincent Lejeune 67a8ee7aaa r600g/llvm: use native encode for tex 2013-03-31 21:35:47 +02:00
Dave Airlie 5b36bc05be glapi: fix storage multisample build errors
Reported on #radeon by udovdh

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-03-31 20:41:28 +10:00
Chris Forbes 2a528889a3 docs: mark ARB_texture_storage_multisample done
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:42 +13:00
Chris Forbes d25b4d5e90 i965: enable ARB_texture_storage_multisample on Gen6+
This can be enabled everywhere that ARB_texture_multisample is
supported -- ARB_texture_storage is supported on everything.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:40 +13:00
Chris Forbes e0015c819c mesa: allow multisample texture targets in [Get]TexParameter*
ARB_texture_storage_multisample allows texture parameters to be
queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY
targets.

Some parameters may also be set, with the following exceptions:

- TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates
   INVALID_OPERATION

- any state which appears in the `per-sampler` state table may not
  be set; generates INVALID_OPERATION

V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:36 +13:00
Chris Forbes b15c558c85 mesa: improve reported function name in Tex*Multisample
Now that there are 4 variants, just pass the function name into
teximagemultisample rather than reconstructing it.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:34 +13:00
Chris Forbes 9cbfe98bfc mesa: add enable bit for ARB_texture_storage_multisample
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:32 +13:00
Chris Forbes 719974b54c glapi: add definition of ARB_texture_storage_multisample
Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:28 +13:00
Chris Forbes 788b0f8535 mesa: add support for immutable textures to teximagemultisample()
The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:

- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
  INVALID_OPERATION.

Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.

V2: - Cover missing error cases (unsized formats; texture object zero)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V1] Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:22 +13:00
Chris Forbes 7f32b9560b mesa: extract _mesa_is_legal_tex_storage_format helper
This is about to be used in teximagemultisample() when immutable=true.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:13 +13:00
Kenneth Graunke fdc5941972 mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.
These haven't been used since we deleted NV_vertex_program support.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-30 19:19:45 -07:00
Eric Anholt 0967c362bf i965: Fix an inconsistency inb the VUE map with gl_ClipVertex on gen4/5.
We are intentionally not allocating a slot for gl_ClipVertex.  But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one.  Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <samsagax@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-03-30 17:24:18 -07:00
Eric Anholt 9dd19575d3 intel: Remove a never-taken debug print path.
Alessandro Pignotti noted when I added this code in commit
0e723b135b that it's in the else block for
"if (busy)", so this debug print couldn't happen.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-30 17:23:50 -07:00
Brian Paul c34bbe110d st/mesa: add ir_lod case in GLSL->TGSI code to silence warning 2013-03-29 17:21:33 -06:00
Ian Romanick e0131196ca glsl: Generated masked write instead of vector array index for UBO lowering
When reading a column from a row-major matrix, we would slot the single
value read into the vector using an ir_dereference_array of the vector
with a constant index.  This will (eventually) get optimized to a
masked-write, so just generate the masked write in the first place.

v2: Remove unused variable 'chan'.  Suggested by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
2013-03-29 12:01:14 -07:00
Ian Romanick 65cc68f430 glsl: Replace open-coded dot-product with dot
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
2013-03-29 12:01:11 -07:00
Ian Romanick dbf94d105a glsl: Replace constant-index vector array accesses with swizzles
Search and replace:

    ][0] -> ].x
    ][1] -> ].y
    ][2] -> ].z
    ][3] -> ].w

Fixes piglit tests inverse-mat[234].{vert,frag}.  These tests call the
inverse function with constant parameters and expect proper constant
folding to happen.  My suspicion is that this patch papers over some bug
in constant propagation involving array accesses.

Either way, all of these accesses eventually get lowered to swizzles.
This cuts out the middle man (saving a trivial amount of CPU).

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
2013-03-29 12:01:07 -07:00
Ian Romanick c770faea0a glsl: Add missing bool case in glsl_type::get_scalar_type
Since the case was missing bec4->get_scalar_type() would return bvec4,
but vec4->get_scalar_type() would return float.

NOTE: This is a candidate for stable branches.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-03-29 12:01:01 -07:00
Kenneth Graunke 57a502518e i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.
"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader.  Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue.  This is normally good.

However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp.  The frontend emits IR for this just before
FS_OPCODE_FB_WRITE.  Unfortunately, this led to the following ordering:

1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue

This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written.  This obviously
led to inaccurate results.

This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections.  This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:

1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue

For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.

One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 11:39:32 -07:00
Eric Anholt 20d846ce8b i965: Add names for all instructions to dump_instruction() in FS and VS.
I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 11:39:21 -07:00
Matt Turner ed6186f0e8 i965: Enable ARB_texture_query_lod.
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 10:21:14 -07:00
Matt Turner b8aa9f7d3a i965/fs: Generate LOD sampler message from ir_lod.
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 10:21:14 -07:00
Dave Airlie 110ca8b1f3 glsl: Implement ARB_texture_query_lod
v2 [mattst88]:
   - Rebase.
   - #define GL_ARB_texture_query_lod to 1.
   - Remove comma after ir_lod in ir.h for MSVC.
   - Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
     opt_tree_grafting.cpp.
   - Rename textureQueryLOD to textureQueryLod, see
     https://www.khronos.org/bugzilla/show_bug.cgi?id=821
   - Fix ir_reader of (lod ...).
v3 [mattst88]:
   - Rename textureQueryLod to textureQueryLOD, pending resolution of
     Khronos 821.
   - Add ir_lod case to ir_to_mesa.cpp.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 10:20:26 -07:00
Matt Turner 0e0ab8a071 i965/fs: Use measured Gen7 instruction timings on Gen6.
x before
+ after
+------------------------------------------------------------------------------+
|   x                                   x   +                                  |
|   xx  ++                              x   +                                  |
|   xx  ++ +                           xx   ++                                 |
|x xxx x+++++          +           xxx x*x+*+++ +         x                   +|
|   |_____|____________A______A____M____M_|_______|                            |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
    x  23       8083.78       8287.83       8205.55     8162.7461     68.307951
    +  23       8107.56       8358.74       8224.33     8186.1765     71.506301
    No difference proven at 95.0% confidence

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00
Matt Turner f085b21b25 i965/fs: Increase and document MAD latency on Gen7.
58% of mad(8) generated in shader-db are reading registers from the same
bank.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00
Matt Turner 414ea2f560 i965/fs: Add LRP instruction latency.
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00