67133 Commits

Author SHA1 Message Date
Jeremy Huddleston Sequoia 61711316f5 swrast: Fix -Wduplicate-decl-specifier warning
swrast.c:67:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
const char const *swrast_vendor_string = "Mesa Project";
           ^
swrast.c:68:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
const char const *swrast_renderer_string = "Software Rasterizer";
           ^
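
For illustration only, a minimal sketch of why the compiler complains and what the two well-formed spellings look like. Which of the two the actual fix chose is not stated in this message, so treat both declarations below as assumptions rather than the committed code.

```c
/* In `const char const *s`, both qualifiers apply to `char`, so the second
 * one is redundant and triggers -Wduplicate-decl-specifier. */

/* Pointer to const char: the characters are read-only, but the pointer
 * itself may be reassigned. */
const char *swrast_vendor_string = "Mesa Project";

/* Const pointer to const char: neither the characters nor the pointer
 * may be modified. This is probably what the doubled `const` intended. */
const char *const swrast_renderer_string = "Software Rasterizer";
```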

Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
2015-01-01 19:55:43 -08:00
Roy Spliet c3260f8d98 nv50/ir: Fold sat into mad
The mad instruction emitter already supported the saturate modifier,
but the ModifierFolding pass never tried folding cvt sat operations
in for NV50.

Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-01-01 21:40:35 -05:00
Ilia Mirkin 9e94b87b60 nv50/ir: fold MAD when one of the multiplicands is const
Fold MAD dst, src0, immed, src2 (or src0/immed swapped) when
 - immed = 0 -> MOV dst, src2
 - immed = +/- 1 -> ADD dst, src0, src2

These types of MAD patterns were observed in some st/nine shaders.
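
The folding rule above can be sketched as a small decision function. This is not the nv50/ir implementation: the names `fold_mad_immediate`, `fold_result`, and `negate_src0` are hypothetical, and the sketch assumes the -1 case is handled by negating src0 in the resulting ADD.

```c
#include <stdbool.h>

/* Hypothetical result type: which instruction the MAD collapses into. */
enum fold_op { FOLD_NONE, FOLD_MOV, FOLD_ADD };

struct fold_result {
    enum fold_op op;
    bool negate_src0; /* only meaningful when op == FOLD_ADD */
};

/* Decide how MAD dst, src0, immed, src2 can be simplified.
 * NaN/Inf handling of src0 is deliberately ignored in this sketch. */
static struct fold_result fold_mad_immediate(float immed)
{
    struct fold_result r = { FOLD_NONE, false };

    if (immed == 0.0f) {
        r.op = FOLD_MOV;          /* src0 * 0 + src2 -> src2 */
    } else if (immed == 1.0f) {
        r.op = FOLD_ADD;          /* src0 * 1 + src2 -> src0 + src2 */
    } else if (immed == -1.0f) {
        r.op = FOLD_ADD;          /* src0 * -1 + src2 -> -src0 + src2 */
        r.negate_src0 = true;
    }
    return r;
}
```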

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-01-01 21:40:35 -05:00
Alexander von Gluck IV 290553b6d6 gallium/state_tracker: Rewrite Haiku's state tracker
* More gallium-like
* Leverage stamps properly and don't call mesa functions
2015-01-01 21:33:36 -05:00
Marek Olšák b77eaafcdc radeonsi: fix warnings 2015-01-01 14:42:32 +01:00
Kenneth Graunke c633528cba i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.
This is a partial revert of c89306983c.
It split the {start,base}_vertex_location handling into several steps:

1. Set brw->draw.start_vertex_location = prim[i].start
   and brw->draw.base_vertex_location = prim[i].basevertex.
   (This happened once per _mesa_prim, in the main drawing loop.)
2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset
   appropriately.  (This happened in brw_prepare_shader_draw_parameters,
   which was called just after brw_prepare_vertices, as part of state
   upload, and only happened when BRW_NEW_VERTICES was flagged.)
3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim).

If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on
the second (or later) primitives, we would do step #1, but not #2.
The first _mesa_prim would get correct values, but subsequent ones
would only get the first half of the summation.

The reason I originally did this was because I needed the value of
gl_BaseVertexARB to exist in a buffer object prior to uploading
3DSTATE_VERTEX_BUFFERS.  I believed I wanted to upload the value
of 3DPRIMITIVE's "Base Vertex Location" field, which was computed
as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) +
brw->vb.start_vertex_bias.  The latter value wasn't available until
after brw_prepare_vertices, and the former weren't available in the
state upload code at all.  Hence the awkward split.

However, I believe that including brw->vb.start_vertex_bias was a
mistake.  It's an extra bias we apply when uploading vertex data into
VBOs, to move [min_index, max_index] to [0, max_index - min_index].

From the GL_ARB_shader_draw_parameters specification:
"<gl_BaseVertexARB> holds the integer value passed to the <baseVertex>
 parameter to the command that resulted in the current shader
 invocation.  In the case where the command has no <baseVertex>
 parameter, the value of <gl_BaseVertexARB> is zero."

I conclude that gl_BaseVertexARB should only include the baseVertex
parameter from glDraw*Elements*, not any internal biases we add for
optimization purposes.

With that in mind, gl_BaseVertexARB only needs prim[i].start or
prim[i].basevertex.  We can simply store that, and go back to computing
start_vertex_location and base_vertex_location in brw_emit_prim(), like
we used to.  This is much simpler, and should actually fix two bugs.

Fixes missing geometry in Unvanquished.

Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-31 17:10:47 -08:00
Kenneth Graunke faa615a798 i965: Use WARN_ONCE for the single-primitive-exceeded-aperture message.
This makes it show up via ARB_debug_output and is also less code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2014-12-31 17:06:51 -08:00
Eric Anholt a6f6d6188c u_primconvert: Fix leak of the upload BO on context destroy.
v2: Conditionalize it on having done any uploads (Turns out
    u_upload_destroy() isn't safe with a NULL arg).
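
A minimal sketch of the guard described in v2. The types and functions here (`upload_mgr`, `primconvert_ctx`, `upload_destroy`, `primconvert_destroy`) are hypothetical stand-ins, not the gallium API; the only property carried over from the message is that the destroy helper must not be called with NULL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the upload manager and the context. */
struct upload_mgr { void *buffer; };
struct primconvert_ctx { struct upload_mgr *upload; };

static int destroy_calls;

/* Like the helper described above, this is not NULL-safe:
 * it dereferences its argument unconditionally. */
static void upload_destroy(struct upload_mgr *upload)
{
    free(upload->buffer);
    free(upload);
    destroy_calls++;
}

/* Only tear down the upload manager if an upload ever created one. */
static void primconvert_destroy(struct primconvert_ctx *ctx)
{
    if (ctx->upload)
        upload_destroy(ctx->upload);
    ctx->upload = NULL;
}
```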

Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2014-12-31 13:50:17 -08:00
Eric Anholt 37478c638a vc4: Fix memory leak as of 0404e7fe0a.
Can't reset the CL before looking at how much we had put in it.
2014-12-31 11:34:28 -08:00
Ilia Mirkin be0311c962 nv50,nvc0: set vertex id base to index_bias
Fixes the piglits which check that gl_VertexID includes the base vertex
offset:
  arb_draw_indirect-vertexid elements
  gl-3.2-basevertex-vertexid

Note that this leaves out the original G80, for which this will continue
to fail. It could be fixed by passing a driver constbuf value in, but
that's beyond the scope of this change.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
2014-12-30 23:30:23 -05:00
Tiziano Bacocco 609c3e51f5 nv50,nvc0: implement half_pixel_center
LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in
rnndb; use that method to implement the rasterizer setting, used for
st/nine.

Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
2014-12-30 20:11:55 -05:00
Eric Anholt 3ba57bae47 vc4: Only render tiles where the scissor ever intersected them.
This gives a 2.7x improvement in x11perf -rect100, since we only end up
load/storing the x11perf window, not the whole screen.
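
As a rough sketch of the idea, one could accumulate the range of tiles touched by every scissor rectangle seen during a frame and load/store only that range. The names `tile_bounds` and `tile_bounds_add_scissor` are hypothetical, and the tile size is a caller-supplied assumption, not a value taken from the driver.

```c
#include <assert.h>

/* Inclusive range of tile coordinates touched so far. */
struct tile_bounds {
    unsigned min_x, min_y, max_x, max_y;
    int valid; /* 0 until the first scissor has been added */
};

/* Grow the bounds to cover the scissor [x0, x1) x [y0, y1), in pixels. */
static void tile_bounds_add_scissor(struct tile_bounds *b,
                                    unsigned x0, unsigned y0,
                                    unsigned x1, unsigned y1,
                                    unsigned tile_w, unsigned tile_h)
{
    assert(x1 > x0 && y1 > y0 && tile_w > 0 && tile_h > 0);

    unsigned tx0 = x0 / tile_w, ty0 = y0 / tile_h;
    unsigned tx1 = (x1 - 1) / tile_w, ty1 = (y1 - 1) / tile_h;

    if (!b->valid) {
        b->min_x = tx0; b->min_y = ty0;
        b->max_x = tx1; b->max_y = ty1;
        b->valid = 1;
        return;
    }
    if (tx0 < b->min_x) b->min_x = tx0;
    if (ty0 < b->min_y) b->min_y = ty0;
    if (tx1 > b->max_x) b->max_x = tx1;
    if (ty1 > b->max_y) b->max_y = ty1;
}
```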
2014-12-30 14:33:52 -08:00
Eric Anholt 0404e7fe0a vc4: Move draw call reset handling to a helper function.
This will be more important in the next commit, when there's more state to
reset to nonzero values, and I want an early exit from the submit
function.
2014-12-30 14:30:59 -08:00
Eric Anholt effb39e899 vc4: Drop the content of vc4_flush_resource().
The callers all follow it with a flush of the context, and the flush of
the context gives us more information about how things are being flushed.
2014-12-30 14:30:59 -08:00
Emil Velikov 64dcb2bb0a docs: add news item and link release notes for mesa 10.3.6/10.4.1
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-12-30 02:50:43 +00:00
Emil Velikov 4fa6024b5f docs: Add sha256 sums for the 10.4.1 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-12-30 02:45:36 +00:00
Emil Velikov 73ec4e2265 Add release notes for the 10.4.1 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-12-30 02:45:34 +00:00
Emil Velikov dd0f2f3695 docs: Add sha256 sums for the 10.3.6 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-12-30 02:45:30 +00:00
Emil Velikov 184246b6d9 Add release notes for the 10.3.6 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2014-12-30 02:45:29 +00:00
Matt Turner 6c18279b9f mesa: Remove __SSE4_1__ guards from sse_minmax.c.
See commit e07c9a288.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
2014-12-29 12:17:06 -08:00
Matt Turner 798c094e62 i965/vec4: Do separate copy followed by constant propagation after opt_vector_float().
total instructions in shared programs: 5877012 -> 5876617 (-0.01%)
instructions in affected programs:     33140 -> 32745 (-1.19%)

From before the commit that allows VF constant propagation (which hurt
some programs) to here, the results are:

total instructions in shared programs: 5877951 -> 5876617 (-0.02%)
instructions in affected programs:     123444 -> 122110 (-1.08%)

with no programs hurt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:08:18 -08:00
Matt Turner d61c519822 i965/vec4: Allow constant propagation of VF immediates.
total instructions in shared programs: 5877951 -> 5877012 (-0.02%)
instructions in affected programs:     155923 -> 154984 (-0.60%)

Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the
next commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:08:18 -08:00
Matt Turner c855f49c99 i965/vec4: Add parameter to skip doing constant propagation.
After CSEing some MOV ..., VF instructions we have code like

   mov tmp, [1F, 2F, 3F, 4F]VF
   mov r10, tmp
   mov r11, tmp
   ...
   use r10
   use r11

We want to copy propagate tmp into the uses of r10 and r11, but *not*
constant propagate the VF immediate into the uses of tmp.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:08:18 -08:00
Matt Turner bbdd3198a5 i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().
total instructions in shared programs: 5869005 -> 5868220 (-0.01%)
instructions in affected programs:     70208 -> 69423 (-1.12%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:08:18 -08:00
Matt Turner 7463e6d61b i965/vec4: Perform CSE on MOV ..., VF instructions.
Port of commit a28ad9d4 from the fs backend.

No shader-db changes since we don't emit MOV ..., VF instructions yet.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:08:18 -08:00
Matt Turner 44573458bd i965/vec4: Add pass to gather constants into a vector-float MOV.
Currently only handles consecutive instructions with the same
destination that collectively write all channels.

total instructions in shared programs: 5879798 -> 5869011 (-0.18%)
instructions in affected programs:     465236 -> 454449 (-2.32%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:08:18 -08:00
Matt Turner 7bc6e455e2 i965: Add support for saturating immediates.
I don't feel great about assert(!"unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:08:18 -08:00
Matt Turner 3978585bcc i965: Add fs_reg/src_reg constructors that take vf[4].
Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.

I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2014-12-29 10:05:03 -08:00
Alexander von Gluck IV 0c7f895995 gallium/target: Drop no longer needed Haiku viewport override
* Drop no longer needed mesa headers
* Haiku LLVM pipe working with LLVM 3.5.0 on x86_64
2014-12-27 06:12:54 +00:00
Alexander von Gluck IV 2b3a570920 gallium/st: Clean up Haiku depth mapping, fix colorspace errors 2014-12-27 05:55:29 +00:00
Eric Anholt cb5a37249c vc4: Handle unaligned accesses in CL emits.
As of 229bf4475f we started getting SIGBUS
from unaligned accesses on the hardware, for reasons I haven't figured
out.  However, we should be avoiding unaligned accesses anyway, and our CL
setup certainly would have produced them.
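
One common way to avoid unaligned stores is to go through memcpy instead of casting the cursor to a wider pointer type. The sketch below illustrates that technique only; `cl_put_u32` is a hypothetical name, and this message does not say which mechanism the driver actually uses.

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value at an arbitrarily aligned position in a command
 * list and return the advanced cursor. memcpy lets the compiler pick
 * byte-wise or word-wise stores that are valid for the target. */
static inline uint8_t *cl_put_u32(uint8_t *cursor, uint32_t value)
{
    memcpy(cursor, &value, sizeof(value));
    return cursor + sizeof(value);
}
```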
2014-12-25 15:47:39 -10:00
Eric Anholt db6e054eb0 vc4: Don't bother zero-initializing the shader reloc indices.
They should all be set to real values by the time they're read, and
ideally if you used valgrind you'd see uninitialized value uses.
2014-12-25 12:25:41 -10:00
Eric Anholt 0b607b54ce vc4: Fix the argument type for cl_u16().
It doesn't matter, since it just got truncated to 16 inside, anyway.
2014-12-25 12:25:41 -10:00
Alexander von Gluck IV 890ef622d6 egl: Fix non-dri SCons builds re #87657
* Revert change to egl main producing Shared Libraries
* Check for dri before including dri code
2014-12-25 10:34:49 -05:00
Michel Dänzer b3057f8097 radeonsi: Don't modify PA_SC_RASTER_CONFIG register value if rb_mask == 0
E.g. this could happen on older kernels which don't support the
RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in
si_write_harvested_raster_configs() doesn't deal with this correctly and
would probably mangle the value badly.

Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-12-25 12:06:22 +09:00
Eric Anholt 229bf4475f vc4: Optimize CL emits by doing size checks up front.
The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.

Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).
2014-12-24 10:28:26 -10:00
Eric Anholt 20e3a2430e vc4: Avoid repeated hindex lookups in the loop over tiles.
Improves norast performance of a microbenchmark by 11.1865% +/- 2.37673%
(n=20).
2014-12-24 08:28:33 -10:00
Kenneth Graunke 4616b2ef85 i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.
This was probably missed when moving from a fixed binding table layout
to a dynamic one that changes based on the shader.

Fixes newly proposed Piglit test fbo-mrt-new-bind.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87619
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Mike Stroyan <mike@LunarG.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
2014-12-24 00:15:40 -08:00
Kenneth Graunke b7f14e03e3 i965: Cache register write capability checks.
Our ability to perform register writes depends on the hardware and
kernel version.  It shouldn't ever change on a per-context basis,
so we only need to check once.

Checking introduces a synchronization point between the CPU and GPU:
even though we submit very few GPU commands, the GPU might be busy doing
other work, which could cause us to stall for a while.
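
The caching pattern described above might look roughly like this. It is a sketch under assumptions: `screen_caps`, `expensive_probe`, and `can_do_register_writes` are hypothetical names, and the probe is a stub standing in for the real check that synchronizes with the GPU.

```c
#include <stdbool.h>

/* Hypothetical per-screen cache shared by all contexts. */
struct screen_caps {
    bool checked;
    bool can_write_registers;
};

static int probe_count;

/* Stub for the expensive check that would stall on a busy GPU. */
static bool expensive_probe(void)
{
    probe_count++;
    return true;
}

/* Run the probe at most once; later callers reuse the cached answer. */
static bool can_do_register_writes(struct screen_caps *caps)
{
    if (!caps->checked) {
        caps->can_write_registers = expensive_probe();
        caps->checked = true;
    }
    return caps->can_write_registers;
}
```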

On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context
creation microbenchmark) by 6.14748% +/- 1.6837% (n=20).  With Unigine
Valley running in the background (to keep the GPU busy), it improves
performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
2014-12-24 00:15:40 -08:00
Rob Clark f332cf92b6 freedreno/ir3: split out legalize pass
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-23 19:53:01 -05:00
Rob Clark 4097ef6ee8 freedreno/ir3: ra debug
Some compile time RA debug

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2014-12-23 19:53:01 -05:00
Alexander von Gluck IV 402c808372 egl/haiku: Clean up SConscript whitespace 2014-12-23 09:07:58 -05:00
Alexander von Gluck IV 49ce07878d egl/dri2: Fix build of dri2 egl driver with SCons
* egl/dri2 was missing a SConscript
* Problem caught by Adrián Arroyo Calle
2014-12-23 09:07:58 -05:00
Alexander von Gluck IV e7ac21202d egl: Clean up Haiku visual creation
* Only create one struct
* 'final' also is a language conflict
* Some style cleanup
2014-12-23 09:07:58 -05:00
Alexander von Gluck IV 400b833592 egl: Add Haiku code and support
* This is the cleaned up work of the Haiku GCI student
  Adrián Arroyo Calle adrian.arroyocalle@gmail.com
* Several patches were consolidated to prevent
  unnecessary touching of non-related code
2014-12-23 09:07:57 -05:00
Timothy Arceri da4fb3e7a1 glsl: check if implicitly sized arrays match explicitly sized arrays across the same stage
V2: Improve error message.

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2014-12-23 19:32:56 +11:00
Chad Versace 414be86c96 i965: Use safer pointer arithmetic in gather_oa_results()
This patch reduces the likelihood of pointer arithmetic overflow bugs in
gather_oa_results(), like the one fixed by b69c7c5dac.

I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I get nervous when I see code patterns like this:

   (void*) + (int) * (int)

I smell 32-bit overflow all over this code.

This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any
potential overflow.
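
To illustrate why the retyping helps: when one operand of the multiplication is a `ptrdiff_t`, the `int` operand is converted before multiplying, so on a 64-bit build the product no longer overflows 32 bits. The helper names below are hypothetical and are not the functions touched by this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of snapshot `index`. Because snapshot_size is ptrdiff_t,
 * `index` is converted to ptrdiff_t before the multiplication. */
static ptrdiff_t snapshot_offset(int index, ptrdiff_t snapshot_size)
{
    return index * snapshot_size;
}

/* Address of snapshot `index` inside a raw buffer. */
static void *snapshot_at(void *base, int index, ptrdiff_t snapshot_size)
{
    return (uint8_t *)base + snapshot_offset(index, snapshot_size);
}
```

With both operands typed as `int`, the same product would be computed in 32-bit arithmetic and could overflow before being added to the pointer.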

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-22 15:47:14 -06:00
Chad Versace 225a09790d i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()
This patch reduces the likelihood of pointer arithmetic overflow bugs in
intel_texsubimage_tiled_memcpy(), like the one fixed by b69c7c5dac.

I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow
bug in a line of code that looks very similar to pointer arithmetic in
this function.

This patch conceptually applies the same fix as in b69c7c5dac. Instead
of retyping the variables, though, this patch adds some casts. (I tried
to retype the variables as ptrdiff_t, but it quickly got very messy. The
casts are cleaner).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-22 15:47:11 -06:00
Chad Versace aebcf26d82 i965: Fix intel_miptree_map() signature to be more 64-bit safe
This patch should diminish the likelihood of pointer arithmetic overflow
bugs, like the one fixed by b69c7c5dac.

Change the type of parameter 'out_stride' from int to ptrdiff_t. The
logic is that if you call intel_miptree_map() and use the value of
'out_stride', then you must be doing pointer arithmetic on 'out_ptr'.
Using ptrdiff_t instead of int should make it a little bit harder to hit
overflow bugs.

As a side-effect, some function-scope variables needed to be retyped to
avoid compilation errors.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2014-12-22 15:47:07 -06:00
Chad Versace d11bc9fe8d i965: Remove spurious casts in copy_image_with_memcpy()
If a pointer points to raw, untyped memory and is never dereferenced,
then declare it as 'void*' instead of casting it to 'void*'.

Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2014-12-22 15:46:54 -06:00