Commit Graph

60206 Commits

Author SHA1 Message Date
Matthew McClure 0319ea9ff6 llvmpipe: clamp fragment shader depth write to the current viewport depth range.
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.
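
As a rough illustration of the clamp described above (not the actual
generate_fs_loop code, which emits LLVM IR), the scalar logic might look like
the sketch below. The struct and function names are hypothetical, and the
fallback to viewport 0 for an out-of-range index is an assumption about the
semantics of the clamp helpers.

```c
/* Hypothetical stand-in for the per-viewport depth range visible to JIT code. */
struct viewport_depth_range {
    float min_depth;
    float max_depth;
};

/* Assumed semantics: an out-of-range index falls back to viewport 0. */
unsigned clamp_viewport_index(int idx, unsigned num_viewports)
{
    return (idx >= 0 && (unsigned)idx < num_viewports) ? (unsigned)idx : 0u;
}

/* Clamp a shader-written depth value to the selected viewport's range. */
float clamp_fragment_depth(float z,
                           const struct viewport_depth_range *vps,
                           unsigned num_viewports,
                           int viewport_index)
{
    const struct viewport_depth_range *vp =
        &vps[clamp_viewport_index(viewport_index, num_viewports)];

    if (z < vp->min_depth)
        return vp->min_depth;
    if (z > vp->max_depth)
        return vp->max_depth;
    return z;
}
```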

lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.

lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to fragment stage (via lp_jit_thread_data).

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-12-09 12:57:02 +00:00
Neil Roberts 992a2dbba8 wayland: Add support for eglSwapInterval
The Wayland EGL platform now respects the eglSwapInterval value. The value is
clamped to either 0 or 1 because it is difficult (and probably not useful) to
sync to more than 1 redraw.
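
A minimal sketch of that clamping, assuming a plain integer interval. The
function name is hypothetical and is not taken from the patch:

```c
/* Clamp a requested swap interval to the supported range, 0 or 1. */
int clamp_swap_interval(int requested)
{
    if (requested < 0)
        return 0;
    if (requested > 1)
        return 1;
    return requested;
}
```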

The main change is that if the swap interval is 0 then Mesa won't install a
frame callback so that eglSwapBuffers can be executed as often as necessary.
Instead it will do a sync request after the swap buffers. It will block for
the sync complete event in get_back_bo instead of the frame callback. The
compositor is likely to send a release event while processing the new buffer
attach and this makes sure we will receive that before deciding whether to
allocate a new buffer.

If there are no buffers available then instead of returning with an error,
get_back_bo will now poll the compositor by repeatedly sending sync requests
every 10ms. This is a last resort and in theory this shouldn't happen because
there should be no reason for the compositor to hold on to more than three
buffers. That means whenever we attach the fourth buffer we should always get
an immediate release event which should come in with the notification for the
first sync request that we are throttled to.

When the compositor is directly scanning out from the application's buffer it
may end up holding on to three buffers. These are the one that it is currently
scanning out from, one that has been given to DRM as the next buffer to flip
to, and one that has been attached and will be given to DRM as soon as the
previous flip completes. When we attach a fourth buffer to the compositor it
should replace that third buffer so we should get a release event immediately
after that. This patch therefore also changes the number of buffer slots to 4
so that we can accommodate that situation.

If DRM eventually gets a way to cancel a pending page flip then the compositors
can be changed to only need to hold on to two buffers and this value can be
put back to 3.

This also moves the vblank configuration defines from platform_x11.c to the
common egl_dri2.h header so they can be shared by both platforms.
2013-12-07 22:36:02 -08:00
Neil Roberts 25cc889004 wayland: Block for the frame callback in get_back_bo not dri2_swap_buffers
Consider a typical game-style main loop which might be like this:

while (1) {
	draw_something();
	eglSwapBuffers();
}

In this case the game is relying on eglSwapBuffers to throttle to a sensible
frame rate. Previously this game would end up using three buffers even though
it should only need two. This is because Mesa decides whether to allocate a
new buffer in get_back_bo which would be before it has tried to read any
events from the compositor so it wouldn't have seen any buffer release events
yet.

This patch just moves the block for the frame callback to get_back_bo.
Typically the compositor will send a release event immediately after one of
the attaches so if we block for the frame callback here then we can be sure to
have completed at least one roundtrip and received that release event after
attaching the previous buffer before deciding whether to allocate a new one.

dri2_swap_buffers always calls get_back_bo so even if the client doesn't
render anything we will still be sure to block for the frame callback. The code
to create the new frame callback has been moved to after this call so that we
can be sure to have cleared the previous frame callback before requesting a
new one.
2013-12-07 22:36:02 -08:00
Vinson Lee 965cde9232 glapi: Do not include dlfcn.h on Windows.
This patch fixes this MinGW build error.

  CC     glapi_gentable.lo
glapi_gentable.c:47:19: fatal error: dlfcn.h: No such file or directory

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-12-07 14:31:01 -08:00
Vincent Lejeune 797894036d r600/llvm: Allow arbitrary amount of temps in tgsi to llvm 2013-12-07 18:39:10 +01:00
Rob Clark a1d808638d freedreno/a3xx: add adreno 330 support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-12-07 09:37:24 -05:00
Rob Clark d36ae204d5 freedreno/a3xx/compiler: add ROUND
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-12-07 08:45:27 -05:00
Chris Forbes 88dc246630 mesa: Require per-sample shading if the `sample` qualifier is used.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-12-07 17:15:05 +13:00
Chris Forbes 2625a34bfc glsl: Populate gl_fragment_program::IsSample bitfield
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-12-07 17:15:03 +13:00
Chris Forbes 6429cc05ca mesa: add IsSample bitfield to gl_fragment_program
Drivers will need to look at this to decide if they need to do
per-sample fragment shader dispatch.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-12-07 17:15:01 +13:00
Chris Forbes 5d326fa963 glsl: Put `sample`-qualified varyings in their own packing classes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-12-07 17:14:59 +13:00
Chris Forbes 51c5fc85e1 glsl: Add ir support for `sample` qualifier; adjust compiler and linker
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-12-07 17:14:58 +13:00
Chris Forbes 51aa15aca2 glsl: Add frontend support for `sample` auxiliary storage qualifier
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-12-07 17:14:39 +13:00
Chris Forbes a1ca580240 i965: Don't flag gather quirks for Gen8+
My understanding is that Broadwell retains the same SCS mechanism
that Haswell has, so even if the underlying issue with this format
is not fixed, the w/a will be applied in SCS rather than needing
shader code.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-07 16:17:27 +13:00
Chris Forbes 83b83fb984 i965/Gen7: Allow CMS layout for multisample textures
Now that all the pieces are in place, this should provide
a nice performance boost for apps using multisample textures.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-07 16:10:04 +13:00
Chris Forbes 3122c2421a i965/vs: Sample from MCS surface when required
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-12-07 16:10:02 +13:00
Chris Forbes 7810162053 i965/fs: Sample from MCS surface when required
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-12-07 16:09:49 +13:00
Chris Forbes 7629c489c8 i965: Add shader opcode for sampling MCS surface
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-07 16:09:32 +13:00
Chris Forbes 27359b8079 i965/Gen7: Include bitfield in the sampler key for CMS layout
We need to emit extra shader code in this case to sample the
MCS surface first; we can't just blindly do this all the time
since IVB will sometimes try to access the MCS surface even if
disabled.

V3: Use actual MSAA layout from the texture's mt, rather
than computing what would have been used based on the format.
This is simpler and less fragile - there's at least one case where
we might want to have a texture's MSAA layout change based on what
the app does (CMS SINT falling back to UMS if the app ever attempts
to render to it with a channel disabled.)

This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain
an implementation detail of the miptree code.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-12-07 16:09:12 +13:00
Chris Forbes b1604841c2 i965/Gen7: Move decision to allocate MCS surface into intel_mipmap_create
This gives us correct behavior for both renderbuffers (which previously
worked) and multisample textures (which would never get an MCS surface
allocated, even if CMS layout was selected)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-07 16:08:55 +13:00
Chris Forbes 6ca9a6f4d7 i965/Gen7: emit mcs info for multisample textures
Previously this was only done for render targets.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-07 16:08:52 +13:00
Chris Forbes dfa952da97 i965/wm: Set copy of sample mask in 3DSTATE_PS correctly for Haswell
The bspec says:

"SW must program the sample mask value in this field so that it matches
with 3DSTATE_SAMPLE_MASK"

I haven't observed this to actually fix anything, but stumbled across it
while adding the rest of the support for CMS layout for multisample
textures.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-07 16:08:47 +13:00
Chris Forbes 8064b0f2c4 i965: refactor sample mask calculation
Haswell needs a copy of the sample mask in 3DSTATE_PS; this makes that
convenient.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-07 16:07:53 +13:00
Ian Romanick 758658850b glsl: Don't emit empty declaration warning for a struct specifier
The intention is that things like

   int;

will generate a warning.  However, we were also accidentally emitting
the same warning for things like

  struct Foo { int x; };
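
The intended rule can be sketched as a simple predicate. This is an
illustration only: the struct and its fields are hypothetical and do not
mirror the GLSL compiler's AST types.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a parsed declaration. */
struct decl_info {
    bool has_declarators;  /* e.g. the "x" in "int x;" */
    bool defines_struct;   /* the type specifier is a struct definition */
};

/* Warn for "int;" but not for "struct Foo { int x; };". */
bool should_warn_empty_declaration(const struct decl_info *d)
{
    return !d->has_declarators && !d->defines_struct;
}
```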

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68838
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Aras Pranckevicius <aras@unity3d.com>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
2013-12-06 08:06:54 -08:00
Thomas Hellstrom 453651e521 st/xa: Bump major version number to 2
For some reason this was left out when the version was changed...

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
2013-12-06 06:18:03 -08:00
Ben Skeggs 92ceb327ba nvc0: fixup gk110 and up not being listed in various switch statements
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2013-12-06 11:28:45 +10:00
Kenneth Graunke 26f3ff8a91 i965: Replace non-standard INLINE macro with "inline".
These are identical: main/compiler.h defines INLINE to "inline".

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-05 13:59:18 -08:00
Kenneth Graunke 11d9af7c0a i965: Don't use GL types in files shared with intel-gpu-tools.
sed -i -e 's/GLuint/unsigned/g' -e 's/GLint/int/g' \
       -e 's/GLfloat/float/g' -e 's/GLubyte/uint8_t/g' \
       -e 's/GLshort/int16_t/g' \
       brw_eu* brw_disasm.c brw_structs.h

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-05 13:59:18 -08:00
Kenneth Graunke a7bdd4cba8 i965: Drop trailing whitespace from the rest of the driver.
Performed via:
$ for file in *; do sed -i 's/  *$//g' "$file"; done

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-05 13:59:18 -08:00
Kenneth Graunke d542c45c75 i965: Drop trailing whitespace from files shared with intel-gpu-tools.
Performed via s/  *$//g.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-05 13:59:18 -08:00
José Fonseca 3be333ed30 tools/trace: More tweaks to state dumping.
- Ignore buffer format (it is totally arbitrary)
- Initialize state.
- Handle begin/end_query statements.
2013-12-05 13:35:06 +00:00
José Fonseca 9648b76dc4 trace: Reorder dumping of pipe_rasterizer_state.
Such that it matches the pipe_rasterizer_state declaration, making it
easier to double-check that all state is being actually dumped.

Trivial.
2013-12-05 13:35:06 +00:00
José Fonseca 10450cbbe6 trace: Dump pipe_sampler_state::seamless_cube_map.
Trivial.
2013-12-05 13:35:06 +00:00
Michel Dänzer 7435d9f77c radeonsi: Remove some stale XXX / FIXME comments
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-12-05 13:50:07 +09:00
Matt Turner cbb49cb2f7 i965: Emit better code for ir_unop_sign.
total instructions in shared programs: 1550449 -> 1550048 (-0.03%)
instructions in affected programs:     15207 -> 14806 (-2.64%)
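
The percentages in statistics like these are the relative change in
instruction count. A small sketch of that arithmetic (the function name is
hypothetical; this is not the shader-db script itself):

```c
/* Relative change from `before` to `after`, in percent. */
double percent_change(long before, long after)
{
    return 100.0 * (double)(after - before) / (double)before;
}
```

For the affected programs above, percent_change(15207, 14806) comes out at
roughly -2.64.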

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2013-12-04 20:05:44 -08:00
Matt Turner d30b2ed5f8 i965/fs: New peephole optimization to flatten IF/BREAK/ENDIF.
total instructions in shared programs: 1550713 -> 1550449 (-0.02%)
instructions in affected programs:     7931 -> 7667 (-3.33%)

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-12-04 20:05:44 -08:00
Matt Turner 9658b04fc4 i965/fs: Emit a MOV instead of a SEL if the sources are the same.
One program affected.

instructions in affected programs:     436 -> 428 (-1.83%)

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-12-04 20:05:44 -08:00
Matt Turner 4532cac06a i965/fs: Extend SEL peephole to handle only matching MOVs.
Before this patch, the following code would not be optimized even though
the first two instructions were common to the then and else blocks:

   (+f0) IF
   MOV dst0 ...
   MOV dst1 ...
   MOV dst2 ...
   ELSE
   MOV dst0 ...
   MOV dst1 ...
   MOV dst3 ...
   ENDIF

This commit extends the peephole to handle this case.
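
One way to picture the extension is as counting how many leading MOVs in the
"then" and "else" blocks write the same destination. The sketch below uses a
hypothetical, heavily simplified instruction type and is not the code from
the patch; the real pass has to check more than the destination.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction: an opcode and a destination only. */
enum opcode { OP_MOV, OP_OTHER };

struct inst {
    enum opcode op;
    int dst;
};

/* Count leading instruction pairs that are both MOVs to the same dst. */
size_t count_matching_movs(const struct inst *then_blk, size_t then_len,
                           const struct inst *else_blk, size_t else_len)
{
    size_t n = 0;

    while (n < then_len && n < else_len &&
           then_blk[n].op == OP_MOV && else_blk[n].op == OP_MOV &&
           then_blk[n].dst == else_blk[n].dst)
        n++;

    return n;
}
```

For the example above (dst0, dst1, dst2 against dst0, dst1, dst3) this counts
two matching pairs, which are the candidates for conversion to SEL.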

No shader-db changes.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:44 -08:00
Matt Turner 13de9f03f1 i965/fs: New peephole optimization to generate SEL.
fs_visitor::try_replace_with_sel optimizes only if statements whose
"then" and "else" bodies contain a single MOV instruction. It also
could not handle constant arguments, since they cause an extra MOV
immediate to be generated (since we haven't run constant propagation,
there are more than the single MOV).

This peephole fixes both of these and operates as a normal optimization
pass.

fs_visitor::try_replace_with_sel is still arguably necessary, since it
runs before pull constant loads are lowered.

total instructions in shared programs: 1559129 -> 1545833 (-0.85%)
instructions in affected programs:     167120 -> 153824 (-7.96%)
GAINED:                                13
LOST:                                  6

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-12-04 20:05:44 -08:00
Matt Turner fa227e7cbc i965/fs: Add SEL() convenience function.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-12-04 20:05:43 -08:00
Matt Turner 4b0ef4bf38 glsl: Use fabs() on floating point values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-12-04 20:05:43 -08:00
Matt Turner 8814806c97 i965: Print conditional mod in dump_instruction().
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner b9af66528e i965: Externalize conditional_modifier for use in dump_instruction().
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner 637dda1c30 i965: Print argument types in dump_instruction().
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner 21e92e74c8 i965: Externalize reg_encoding for use in dump_instruction().
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner 729fe77e3b i965/vec4: Don't print swizzles for immediate values.
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner 2b8e0a73fb i965/vec4: Print negate and absolute value for src args.
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner a85f1b7adf i965/vec4: Add support for printing HW_REGs in dump_instruction().
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner 942151af30 i965/fs: Print ARF registers properly in dump_instruction().
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:43 -08:00
Matt Turner 0e4053234d i965: Don't print extra (null) arguments in dump_instruction().
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-12-04 20:05:42 -08:00