55813 Commits

Author SHA1 Message Date
Matt Turner ad4507b355 i965/fs: Add Haswell cycle timings
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00
Matt Turner 7997e59b65 i965: Note that write-after-write dependencies are blocking.
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:26 -07:00
Matt Turner f91e371fee i965: Reword comment about the shared mathbox.
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:26 -07:00
Roland Scheidegger 5f41e08cf3 gallivm: consolidate some half-to-float and r11g11b10-to-float code
Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-29 16:39:40 +01:00
Chris Forbes 4412f3bc13 mesa: provide default implementation of QuerySamplesForFormat
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.

Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.

V2: - Move from intel to core mesa.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 20:54:36 +13:00
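The idea in the commit above (install a safe default hook so that drivers without their own implementation no longer crash) can be sketched roughly as follows. This is a simplified illustration, not Mesa's actual code: the struct, the function names, and the signature are hypothetical, and "returns 1 for everything" is read here as "reports a single supported sample count of 1".

```c
#include <stddef.h>

/* Hypothetical, simplified driver hook table; the real Mesa structures differ. */
struct driver_functions {
   /* Fills samples[] with supported sample counts; returns how many were written. */
   size_t (*query_samples_for_format)(unsigned internal_format, int samples[16]);
};

/* Assumed default for drivers without multisampling: one count, namely 1. */
static size_t
default_query_samples_for_format(unsigned internal_format, int samples[16])
{
   (void) internal_format;   /* the answer is the same for every format */
   samples[0] = 1;
   return 1;
}

/* Install the default first; a driver that supports multisampling
 * can overwrite the pointer afterwards. */
static void
init_driver_functions(struct driver_functions *f)
{
   f->query_samples_for_format = default_query_samples_for_format;
}
```

With the default installed during setup, callers never see a NULL hook, which is the failure mode the commit message describes for i915.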
Christoph Bumiller ee624ced36 nvc0: implement MP performance counters
There's more, but this only adds most of the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.
2013-03-29 00:33:01 +01:00
Christoph Bumiller 480359bcf6 nvc0: enable compression when supported 2013-03-29 00:33:01 +01:00
Christoph Bumiller 25722e3454 nvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count 2013-03-29 00:33:00 +01:00
Christoph Bumiller 443b247878 nv50,nvc0: fix 3d blits, restore viewport after blit 2013-03-29 00:33:00 +01:00
Christoph Bumiller 090e73fc46 nv50: fix 3D render target setup 2013-03-29 00:33:00 +01:00
Brian Paul b54ce3738a llvmpipe: put .bmp extension on dumped image files 2013-03-28 17:17:26 -06:00
Brian Paul e90c56bc4e llvmpipe: add 'f' suffix to 1.0 in fixed_to_float() 2013-03-28 17:17:26 -06:00
Brian Paul 499aa3ddb4 draw: fix some build breakage when LLVM is not used
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883
Tested-by: Vinson Lee <vlee@freedesktop.org>
2013-03-28 17:15:58 -06:00
Marek Olšák 9ad9141917 mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printing
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-28 20:02:50 +01:00
Kenneth Graunke 9fe47756b3 i965: Tidy shader time printing code by using printf's field widths.
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.

This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace.  Splitting it into "total" and "fs16" produces the
same output as before.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:44 -07:00
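The padding behaviour described in the commit above can be reproduced with a small sketch. The helper name `format_columns` is hypothetical and is not taken from the i965 source; only the `%-6s%-6s` format string comes from the commit message.

```c
#include <stdio.h>

/* Hypothetical helper: writes two left-justified columns, each padded to
 * at least 6 characters, into buf. Returns the number of characters
 * snprintf produced (excluding the terminating NUL). */
static int
format_columns(char *buf, size_t size, const char *a, const char *b)
{
   /* "%-6s" pads on the right up to a minimum width of 6; longer
    * strings are not truncated. */
   return snprintf(buf, size, "%-6s%-6s", a, b);
}
```

Calling `format_columns(buf, sizeof buf, "total", "fs16")` produces 12 characters ("total" padded to 6, then "fs16" padded to 6). Passing `"total fs16"` and `""` instead produces 16 characters, because the empty string is still padded to six spaces. That extra whitespace is why the commit splits the label in two.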
Eric Anholt 6192e9b377 i965/vs: Include URB payload setup in shader_time.
This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:41 -07:00
Eric Anholt 55feb19704 i965/vs: Use a send from a 2-register VGRF for shader time writes.
This will let us emit it later, after we're setting up MRFs for the
URB write.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:37 -07:00
Eric Anholt 130138030a i965/vs: Teach copy propagation about sends from GRFs.
This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:34 -07:00
Eric Anholt c3a22d42a8 i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.
v2: Fix silly bool handling, and don't add new tabs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:29 -07:00
Eric Anholt 47e795d861 i965/fs: Include everything but the final FB write in shader_time.
Previously, if you just wrote a constant color to the render target, no
time got noted at all.  This is convenient for doing single-instruction
timings, but not so much for actual program analysis.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:23 -07:00
Eric Anholt 5c5218ea61 i965/fs: Switch shader_time writes to using GRFs.
This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling.  This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:15 -07:00
Eric Anholt 5c039543db i965: Provide more detailed information to match shader_time to programs.
Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader
names, and I realized that it was really unclear.  I'd like to do even
better, like noting which one is the clear shader, but that would require
exposing the metaops struct to the driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:11 -07:00
Eric Anholt d2ba1c24b4 i965: Track ARB program state along with GLSL state for shader_time.
This will let us do much better printouts for non-GLSL programs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:01 -07:00
Marek Olšák a19f6e880a st/dri: fix crash with HUD and single buffering 2013-03-28 18:17:21 +01:00
Marek Olšák 6b5dfa42c9 st/mesa: remove leftover printfs from ReadPixels
Oops, I thought I had removed all debugging code.
2013-03-28 18:17:21 +01:00
Eric Anholt eda434921d i965/fs: Improve performance of copy propagation dataflow using bitsets.
Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 09:48:50 -07:00
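As a generic illustration of why bitsets help dataflow solvers (this is not the i965 implementation), the sketch below solves a textbook "available copies" problem with one bit per tracked copy, so each set intersection or union is a single word operation. The struct layout, the 32-copy limit, and all names are assumptions made for the example. Block 0 is assumed to be the entry block with no predecessors.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PREDS 4

/* Hypothetical per-block state; one bit per tracked copy (up to 32). */
struct block {
   uint32_t gen;       /* copies made in this block, still valid at its end */
   uint32_t kill;      /* copies invalidated by writes in this block */
   uint32_t in, out;   /* copies available on entry / on exit */
   int num_preds;
   int preds[MAX_PREDS];
};

/* Iterate to a fixed point:
 *   in[b]  = intersection of out[p] over all predecessors p
 *   out[b] = gen[b] | (in[b] & ~kill[b])
 */
static void
solve_available_copies(struct block *blocks, int num_blocks)
{
   for (int i = 0; i < num_blocks; i++) {
      blocks[i].in = 0;
      /* Optimistic start: everything available, except at the entry block. */
      blocks[i].out = (i == 0) ? blocks[i].gen : ~0u;
   }

   bool progress = true;
   while (progress) {
      progress = false;
      for (int i = 0; i < num_blocks; i++) {
         struct block *b = &blocks[i];
         uint32_t in = 0;   /* blocks without predecessors start empty */
         if (b->num_preds > 0) {
            in = ~0u;
            for (int p = 0; p < b->num_preds; p++)
               in &= blocks[b->preds[p]].out;
         }
         uint32_t out = b->gen | (in & ~b->kill);
         if (in != b->in || out != b->out) {
            b->in = in;
            b->out = out;
            progress = true;
         }
      }
   }
}
```

A real compiler pass would track more than 32 copies by using an array of words per block, but the per-word operations stay the same.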
Zack Rusin d066133a76 llvmpipe/draw: Fix texture sampling in geometry shaders
We weren't correctly propagating the samplers and sampler views
when they were related to geometry shaders.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin 186a6bffdd draw/llvm: Cleanup the store debugging code
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin 10964fc73d draw: Allocate the output buffer for output primitives
We were allocating the output buffer but using the input
primitives. We need to allocate that buffer using the
maximum number of output, not input, primitives.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin f20f981553 gallivm: Implement the breakc instruction
Required by more modern examples. Like BRK but with a condition.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin b66ffcf2f8 gallivm: implement implicit primitive flushing
TGSI semantics currently require an implicit endprim at the end
of GS if an ending primitive hasn't been emitted.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin e96f4e3b85 gallium/llvm: implement geometry shaders in the llvm paths
This commit implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin edcebe665d draw/gs: Fetch more than one primitive per invocation
Allows executing gs on up to 4 primitives at a time. Will also be
required by the llvm code because there we definitely don't want
to flush with just a single primitive.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:01 -07:00
Zack Rusin 014c4d1cd7 draw/gs: Abstract the portions of GS that are tgsi specific
To be able to add llvm paths later on we need to have some common
interface for them.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:01 -07:00
Zack Rusin a85c83e427 draw/llvm: Remove unused gs_constants from jit_context
The member was never used and we'll need to handle it differently
because gs will also need samplers/textures setup.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:01 -07:00
Zack Rusin 90ee8de700 graw/gs: add missing max output vertices to all tests
A few tests were missing this crucial property.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:01 -07:00
Jerome Glisse 3f7d9710e8 radeonsi: add cs tracing v3
Same as on r600: trace cs execution by writing the cs offset after each
state. This allows pinpointing a lockup inside the command stream and
narrows down the scope of lockup investigation.

v2: Use WRITE_DATA packet instead of WRITE_MEM
v3: Remove useless nop packet

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-03-27 11:38:02 -04:00
Chris Forbes 21a2dfa55d mesa: only check sample count if we actually wanted multisampling
Fixes various test fallout from 90b5a2425a on Pineview, which claims to
support ARB_internalformat_query but doesn't actually provide the
driverfunc.

That driver is still broken [GetInternalformativ will still segfault!]
but it was silly to be going through the sample count logic in the
nonmultisampling case at all.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-27 07:49:12 +13:00
Christian König c77159cc11 radeon/llvm: document LLVM commit
We need at least that revision to work correctly now.

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-03-26 15:08:00 +01:00
Christian König 1c10018925 radeonsi: add preloading for all samplers
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-03-26 12:57:43 +01:00
Christian König 0f6cf2bc79 radeonsi: add preloading of all constants
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-03-26 12:57:40 +01:00
Christian König 44e3224554 radeonsi: mark most intrinsics as readnone/nounwind
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-03-26 12:57:36 +01:00
Christian König 206f059e1f radeonsi: mark all loads as constant
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-03-26 12:57:33 +01:00
Christian König 86f6fc2f1d radeonsi: remove wqm intrinsic
Now the backend handles that itself.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-03-26 12:57:30 +01:00
Christian König 6249db73ea radeon/llvm: remove unneeded inclusion
The include isn't needed and the file has moved with LLVM master.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-03-26 12:57:23 +01:00
Christian König 0f001fbff1 glsl_to_tgsi: avoid creating arrays if driver doesn't support them
Avoid creating arrays if we replace indirect addressing anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-03-26 10:22:27 +01:00
Christian König 462de2e65f glsl_to_tgsi: make simplify_cmp work with arrays
Even when we have arrays it is possible for simplify_cmp
to work on temps, just not on arrays.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-03-26 10:22:27 +01:00
Marek Olšák 98a8e5b87e gallium/docs: document get_driver_query_info 2013-03-26 01:37:40 +01:00
Marek Olšák 8ddae684af r600g: add a driver query returning the amount of requested VRAM and GTT memory 2013-03-26 01:28:19 +01:00
Marek Olšák 2504380aaf r600g: add a driver query returning the number of draw_vbo calls
between begin_query and end_query
2013-03-26 01:28:19 +01:00