Commit Graph

54822 Commits

Author SHA1 Message Date
Chad Versace 20dfa501b3 i965/fs/gen7: Emit code for GLSL 3.00 pack/unpack operations (v4)
v2: Remove lewd comment. [for idr]
v3: - Optimize away tmp register for packHalf2x16. [for anholt, paul]
    - Improve comments. [for anholt, paul]
    - Reduce near-duplicate code by removing vec4_visitor emit_pack/unpack
      methods. [for chadv]
v4: Factor our UD/W register conversion into helper function. [for anholt]

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:31:06 -08:00
Chad Versace 203c12b18f i965/vs/gen7: Emit code for GLSL ES 3.00 pack/unpack operations (v3)
FIXME: This patch emits VS code that violates documented hardware
restrictions and then relies on undocumented behavior that results from
that violation.  This patch passes all tests, but should be fixed ASAP to
conform to the hardware documentation.

v2: Explain undocumented hardware behavior. Improve comments.
v3: Use ALU1 helper methods F32TO16() and F16TO32(). [for anholt]

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:11 -08:00
Chad Versace 7093558b31 i965: Quote the PRM on a HorzStride subtlety
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:11 -08:00
Chad Versace 7e21910f23 i965: Add opcodes for F32TO16 and F16TO32
The GLSL ES 3.00 operations packHalf2x16 and unpackHalf2x16 will emit
these opcodes.

- Define the opcodes BRW_OPCODE_{F32TO16,F16TO32}.
- Add the opcodes to the brw_disasm table.
- Define convenience functions brw_{F32TO16,F16TO32}.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace ee0ed52d69 i965: Lower the GLSL ES 3.00 pack/unpack operations (v2)
On gen < 7, we fully lower all operations to arithmetic and bitwise
operations.

On gen >= 7, we fully lower the Snorm2x16 and Unorm2x16 operations, and
partially lower the Half2x16 operations.

v2:
  - Comment that scalarization is needed only for SOA code [for idr].
  - Replace switch-statement with if-statement [for idr].
  - Remove misplaced hunk from previous patch [found by idr].

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace b9f56ea923 glsl: Add lowering pass for GLSL ES 3.00 pack/unpack operations (v4)
Lower them to arithmetic and bit manipulation expressions.

v2: Rewrite using ir_builder [for idr].
v3: Comment typos. [for mattst88]
v4: Fix arithmetic error in comments.
    Factor out a shift instruction.
    Don't heap allocate factory.instructions.
    [for paul]

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Matt Tuner <mattst88@gmail.com> (v3)
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v4)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace 9d7931ddf0 glsl: Fix type-deduction for and/or/xor expressions
In ir_expression's constructor, the cases for {bit,logic}_{and,or,xor}
failed to handle the case when both operands were vectors.

Note: This is a candidate for the stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace ccf87f2199 glsl: Reformat and/or/xor cases in ir_expression ctor
Replace tabs with spaces. According to docs/devinfo.html, Mesa's
indetation style is:
  indent -br -i3 -npcs --no-tabs infile.c -o outfile.c

This patch prevents whitespace weirdness in the next patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace f859e4fbd1 glsl/ir_builder: Add helpers for making if-statements
Add two overloaded variants of
    ir_if *if_tree()

The new functions allow one to chain together if-trees within a single C++
expression that resembles a real if-statement.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace a32bc53029 glsl/ir_builder: Add `enum writemask`
Using this enum improves the readibility of calls to assign(), whose third
argument is a writemask.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace a6479ef968 glsl/ir_factory: Add helper method for making an ir_constant
Add method ir_factory::constant.  This little method constructs an
ir_constant using the factory's mem_ctx.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace 5790174e37 glsl/ir_builder: Add more helpers for constructing expressions
Add the following functions, each of which construct the similarly named
ir expression:
    div, round_even, clamp

    equal, less, greater, lequal, gequal

    logic_not, logic_and, logic_or

    bit_not, bit_or, bit_and, lshift, rshift

    f2i, i2f, f2u, u2f, i2u, u2i

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace fafcbf52b7 glsl/ir_factory: Initialize members to NULL in constructor
This eliminates unexpected behavior due to unitialized values.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace 542c7a3022 glsl: Evaluate constant GLSL ES 3.00 pack/unpack expressions (v3)
That is, evaluate constant expressions of the following functions:
  packSnorm2x16  unpackSnorm2x16
  packUnorm2x16  unpackUnorm2x16
  packHalf2x16   unpackHalf2x16

v2: Reuse _mesa_pack_float_to_half and its inverse to evaluate
    pack/unpackHalf2x16. [for idr]
v3: Whitespace fixes. [for mattst88]
    Don't cast neg floats directly to uint16; use an intermediate cast to
    int16. [for paul]

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace 529b6d1f3d mesa: Remove rounding bias in _mesa_float_to_half()
Not all float32 values can be exactly represented as a float16.
_mesa_float_to_half() rounded such intermediate float32 values to zero by
truncating unrepresentable bits in the mantissa.

This patch improves _mesa_float_to_half() by rounding intermediate float32
values to the nearest float16; when the float32 is exactly between two
float16 values we round to the one with an even mantissa. This behavior is
preferred over the old behavior because:
  - It has reduced bias relative to the old behavior.

  - It reproduces the behavior of real hardware: opcode F32TO16 in
    Intel's GPU ISA.

  - By reproducing the behavior of the GPU (at least on Intel hardware),
    compile-time evaluation of constant packHalf2x16 GLSL expressions will
    result in the same value as if the expression were executed on the GPU.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Chad Versace eac030e38e mesa,glsl: Move round_to_even() from glsl to mesa/main (v2)
Move round_to_even's definition to mesa/main so that _mesa_float_to_half()
can use it in order to eliminate rounding bias.

In additon to moving the fuction definition, prefix its name with "_mesa",
just as all other functions in mesa/main are prefixed.

v2: Fix Android build.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:07 -08:00
Chad Versace 1fafd00839 glsl/standalone_scaffolding: Add stub for _mesa_warning()
A subsequent patch will add mesa/main/imports.c as a dependency to the
compiler, which in turn requires that _mesa_warning() be defined.

The real definition of _mesa_warning() is in mesa/main/errors.c, but to
pull that file into the standalone scaffolding would require transitively
pulling in the dispatch tables.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:11:41 -08:00
Chad Versace ee5921ad0d glsl: Extend ir_expression_operation for GLSL 3.00 pack/unpack functions (v2)
For each function {pack,unpack}{Snorm,Unorm,Half}2x16, add a corresponding
opcode to enum ir_expression_operation.  Validate the new opcodes in
ir_validate.cpp.

Also, add opcodes for scalarized variants of the Half2x16 functions.  (The
code generator for the i965 fragment shader requires that all vector
operations be scalarized.  A lowering pass, to be added later, will
scalarize the Half2x16 functions).

v2: Fix assertion message in ir_to_mesa [for idr].

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:11:41 -08:00
Chad Versace 3a88d71d35 glsl: Add IR lisp for GLSL ES 3.00 pack/unpack functions
For each of the following functions, add a declaration to
builtins/profiles/300es.glsl and create new file
builtins/ir/${funcname}.ir:

  packSnorm2x16  unpackSnorm2x16
  packUnorm2x16  unpackUnorm2x16
  packHalf2x16   unpackHalf2x16

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:11:41 -08:00
Chad Versace 6f8f919a53 glsl: Fix typo in comment
s/num_operands()/get_num_operands()/

Discovered because Eclipse failed to resolve the false reference.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:11:41 -08:00
Chad Versace ca7d332253 i965/disasm: Fix horizontal stride of dest registers
The bug: The printed horizontal stride was the numerical value of the
  BRW_HORIZONTAL_$N enum.
The fix: Translate the enum before printing.

Note: This is a candidate for the stable releases.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:10:46 -08:00
Paul Berry d1f2e9699f intel: Fix glCopyTexSubImage on buffers whose width >= 32kbytes
When possible, glCopyTexSubImage calls are performed using the
hardware blitter.  However, according to the Ivy Bridge PRM, Vol1
Part4, section 1.2.1.2 (Graphics Data Size Limitations):

    The BLT engine is capable of transferring very large quantities of
    graphics data. Any graphics data read from and written to the
    destination is permitted to represent a number of pixels that
    occupies up to 65,536 scan lines and up to 32,768 bytes per scan
    line at the destination. The maximum number of pixels that may be
    represented per scan line’s worth of graphics data depends on the
    color depth.

With an RGBA32F color buffer (which has 16 bytes per pixel) this
imposes a maximum width of 2048 pixels.  Other pixel formats have
accordingly larger limits.

To make matters worse, if the pitch of the buffer is 32k or greater,
intel_copy_texsubimage's call to intelEmitCopyBlit will overflow
intelEmitCopyBlit's src_pitch and dst_pitch parameters (which are
16-bit signed integers).

We can conveniently avoid both problems by avoiding use of the blitter
when the miptree's pitch is >= 32k.

Fixes gles3conform "framebuffer_blit_functionality_magnifying_blit"
tests when the buffer width is equal to 8192.

Note: this is very similar to the recent patch "intel: Fix ReadPixels
on buffers whose width >= 32kbytes" except that it applies to
glCopyTexSubImage instead of glReadPixels.  In a future patch it would
be nice to refactor the code so that (a) overflow is avoided, and (b)
intelEmitCopyBlit is responsible for checking whether the blitter can
handle the width, so that all callers of intelEmitCopyBlit work
properly, rather than just these two.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-01-24 18:35:08 -08:00
Paul Berry c6a50ddfcb glsl: Allow varying structs in GLSL ES 3.00 and GLSL 1.50.
Previously I thought that varying structs had been added to GLSL ES
3.00 by mistake, because chapter 11 of the GLSL ES 3.00 spec
("Counting of Inputs and Outputs") failed to mention how structs
should be handled.  Khronos has clarified
(https://cvs.khronos.org/bugzilla/show_bug.cgi?id=9828) that varying
structs are indeed required, and that chapter 11 will be modified to
indicate that the minimal reference packing algorithm flattens varying
structs to their individual components.

Mesa doesn't flatten varying structs to their individual components,
but this is ok, since it packs varyings of all kinds with no wasted
space at all (except where this is impossible due to differing
interpolation modes), so it will outperform the minimal reference
packing algorithm in all but the most pathological cases.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 16:30:49 -08:00
Paul Berry cd53457ffa glsl: Disable transform feedback of varying structs.
It is not clear from the GLSL ES 3.00 spec how transform feedback is
supposed to apply to varying structs:

- There is no specification for how the structure is to be packed when
  it is recorded into the transform feedback buffer.

- There is no reasonable value for GetTransformFeedbackVarying to
  return as the "type" of the variable.

We currently have a Khronos bug requesting clarification on how this
feature is supposed to work
(https://cvs.khronos.org/bugzilla/show_bug.cgi?id=9856).

This patch just disables transform feedback of varying structs for
now; we can implement the proper behaviour once we find out from
Khronos what it is.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 16:30:46 -08:00
Paul Berry 1ecd23dea9 glsl: Update lower_packed_varyings to handle varying structs.
This patch adds code to lower_packed_varyings to handle varyings of
type struct.  Varying structs are currently packed in the most naive
possible way (in declaration order, with no gaps), so there is a
potential loss of runtime efficiency.  In a later patch it would be
nice to replace this with a "flattening" approach (wherein a varying
struct is flattened to individual varyings corresponding to each of
its structure elements), so that the linker can align each structure
element independently.  However, that would require a significantly
more complex implementation.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 16:30:43 -08:00
Paul Berry 88e4bfde26 glsl: Generalize compute_packing_order for varying structs.
This patch paves the way for allowing varying structs by generalizing
varying_matches::compute_packing_order to handle any type of varying.
Previously, we packed in the order (vec4, vec2, float, vec3), with
matrices being packed according to the size of their columns.  Now, we
pack everything according to its number of components mod 4, in the
order (0, 2, 1, 3).

There is no behavioural change for vectors.  Matrices are now packed
slightly differently:

- mat2x2 gets assigned PACKING_ORDER_VEC4 instead of
  PACKING_ORDER_VEC2.  This is slightly better, because it guarantees
  that the matrix occupies a single varying slot.

- mat2x3 gets assigned PACKING_ORDER_VEC2 instead of
  PACKING_ORDER_VEC3.  This is kind of a wash.  Previously, mat2x3 had
  a 25% chance of having neither of its columns double parked, a 50%
  chance of having exactly one of its columns double parked, and a 25%
  chance of having both of its columns double parked.  Now it always
  has exactly one of its columns double parked.

- mat3x3 gets assigned PACKING_ORDER_SCALAR instead of
  PACKING_ORDER_VEC3.  This doesn't affect much, since in both cases
  there is no guarantee of how the matrix will be aligned.

- mat4x2 gets assigned PACKING_ORDER_VEC4 instead of
  PACKING_ORDER_VEC2.  This is slightly better for the same reason as
  in mat2x2.

- mat4x3 gets assigned PACKING_ORDER_VEC4 instead of
  PACKING_ORDER_VEC3.  This is slightly better for the same reason as
  in mat2x2.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 16:30:40 -08:00
Paul Berry 3680864c0b glsl: Disable structure splitting for shader ins/outs.
Previously, it didn't matter whether structure splitting tried to
split shader ins/outs, because structs were prohibited from being used
for shader ins/outs.  However, GLSL 3.00 ES supports varying structs.
In order for varying structs to work, we need to make sure that
structure splitting doesn't get applied to them, because if it does,
then the linker won't be able to match up varyings properly.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 16:30:37 -08:00
Paul Berry 42a29d89fd glsl: Eliminate ambiguity between function ins/outs and shader ins/outs
This patch replaces the three ir_variable_mode enums:

- ir_var_in
- ir_var_out
- ir_var_inout

with the following five:

- ir_var_shader_in
- ir_var_shader_out
- ir_var_function_in
- ir_var_function_out
- ir_var_function_inout

This eliminates a frustrating ambiguity: it used to be impossible to
tell whether an ir_var_{in,out} variable was a shader in/out or a
function in/out without seeing where the variable was declared in the
IR.  This complicated some optimization and lowering passes, and would
have become a problem for implementing varying structs.

In the lisp-style serialization of GLSL IR to strings performed by
ir_print_visitor.cpp and ir_reader.cpp, I've retained the names "in",
"out", and "inout" for function parameters, to avoid introducing code
churn to the src/glsl/builtins/ir/ directory.

Note: a couple of comments in the code seemed to indicate that we were
planning for a possible future in which geometry shaders could have
shader-scope inout variables.  Our GLSL grammar rejects shader-scope
inout variables, and I've been unable to find any evidence in the GLSL
standards documents (or extensions) that this will ever be allowed, so
I've eliminated these comments.

Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 16:30:30 -08:00
Paul Berry 7d51ead56e glsl: Clean up case statement in builtin_variables.cpp's add_variable.
The case statement purported to handle the addition of ir_var_const_in
and ir_var_inout builtin variables.  But no such variables exist.
This patch removes the unnecessary cases, and adds a comment
explaining why they're not needed.

Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 16:30:27 -08:00
Kenneth Graunke fce9e5d41b i965/vs: Do headerless texturing for texelFetchOffset().
For texelFetchOffset(), we just add the texel offsets to the coordinate
rather than using the message header's offset fields.  So we don't
actually need a header on Gen5+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-24 15:19:08 -08:00
Matt Turner 0412864ae8 libgl-xlib/build: Link with C++ when LLVM is used
Also link-in libX11 and libXext.

Tested-by: Brian Paul <brianp@vmware.com>
2013-01-24 14:00:27 -08:00
Paul Berry b50c0feb2c intel: Fix ReadPixels on buffers whose width >= 32kbytes
When possible, glReadPixels calls are performed using the hardware
blitter.  However, according to the Ivy Bridge PRM, Vol1 Part4,
section 1.2.1.2 (Graphics Data Size Limitations):

    The BLT engine is capable of transferring very large quantities of
    graphics data. Any graphics data read from and written to the
    destination is permitted to represent a number of pixels that
    occupies up to 65,536 scan lines and up to 32,768 bytes per scan
    line at the destination. The maximum number of pixels that may be
    represented per scan line’s worth of graphics data depends on the
    color depth.

With an RGBA32F color buffer (which has 16 bytes per pixel) this
imposes a maximum width of 2048 pixels.

To make matters worse, if the pitch of the buffer is 32k or greater,
intel_miptree_map_blit's call to intelEmitCopyBlit will overflow
intelEmitCopyBlit's src_pitch and dst_pitch parameters (which are
16-bit signed integers).

We can conveniently avoid both problems by avoiding the readpixels
blit path when the miptree's pitch is >= 32k.

Fixes gles3conform "half_float" tests when the buffer width is greater
than 2048.

Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
2013-01-24 13:17:07 -08:00
Ian Romanick ac158f8ee7 intel: callocing a 32 byte temp is silly, so don't
I believe that the size used to vary, so the dynamic allocation is
necessary.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-01-24 13:57:46 -05:00
Marek Olšák 7a23029b2f st/mesa: implement ARB_internalformat_query v2
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-01-24 18:39:28 +01:00
Marek Olšák 041234ee1e st/mesa: advertise OES_depth_texture_cube_map if GLSL 1.30 is supported
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-01-24 18:38:49 +01:00
Marek Olšák 4f0563a658 st/dri: disallow recursion in dri_flush
ST_FLUSH_FRONT may call driThrottle, which is implemented with dri_flush.
This prevents double flush as well as fence leaks caused by a recursion
in the middle of throttling.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58839

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-01-24 18:22:14 +01:00
Marek Olšák fffe3e0908 st/dri: add null-pointer check, remove duplicated local variable
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-01-24 18:22:14 +01:00
Tom Stellard 0261b4ecdb Revert "Revert "targets/opencl: Link against libgallium.la instead of libgallium.a""
This reverts commit 7824ab8070.

Now that we force linking with LLVM shared libs when building clover,
we can link against libgallium.la with no problems.
2013-01-24 15:45:32 +00:00
Tom Stellard cf69a591e1 configure.ac: Force use of LLVM shared libs with --enable-opencl v2
If we build clover with LLVM static libraries, then clover and also each
pipe_*.so driver that is built will contain their own static copy of
LLVM.  The recent automake changes have uncovered a problem where
the pipe_*.so drivers try to use clover's LLVM symbols.  This causes
LLVM's static registry objects to be initialized each time
a pipe_*.so driver is loaded by clover.  Initializing these objects
multiple times is not allowed and leads to assertion failures in the
LLVM code.

We can avoid all these problems by having clover and all the pipe_*.so
drivers link against the same LLVM shared library.

https://bugs.freedesktop.org/show_bug.cgi?id=59334
https://bugs.freedesktop.org/show_bug.cgi?id=59534

v2:
  - Fix shared library detection when LLVM is built with CMake
2013-01-24 15:45:18 +00:00
Tom Stellard 69d639ba8b configure.ac: Compute the required llvm static libraries only once
In order to determine which static LLVM libraries are needed we pass
a list of components to llvm-config and it generates the list of
library dependencies for us.  The advantage of only calling llvm-config
one time is that it can determine if two components depend on the same
library and then add it to the output list only once.  The old practice
of having each driver call llvm-config to add its own dependencies to
$(LLVM_LIBS) caused many libraries to be added to this variable multiple
times.
2013-01-24 15:44:53 +00:00
Michel Dänzer 35f0dc2cc7 radeonsi: Fall back to dummy pixel shader instead of trying indirect addressing.
Indirect addressing isn't fully handled yet.

Fixes crashes with piglit tests using indirect addressing.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
2013-01-24 08:46:48 +01:00
Marek Olšák 68cebb9a8f radeonsi: make sure copying of all texture formats is accelerated
[ Cherry-picked from r600g commit 7c371f4695 ]

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
2013-01-24 08:46:31 +01:00
Michel Dänzer de4e448095 radeonsi: Handle PIPE_FORMAT_L32A32_S/UINT for rendering.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
2013-01-24 08:46:31 +01:00
Michel Dänzer d0096dfa85 radeonsi: Make sure to use float number format for packed float colour formats.
These aren't covered by UTIL_FORMAT_TYPE_FLOAT.

Fixes 15 piglit (sub)tests.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
2013-01-24 08:46:31 +01:00
Ian Romanick 5bd86b26df intel: Enable S3TC extensions always
Always enable the use of pre-compressed texture data.  The ability to
perform on-line compression still requires the presence of libtxc_dxtn
or an explicit driconf over-ride.  Applications that just want to submit
precompessed data when an on-line compressor is not available can look
for the GL_EXT_texture_compression_dxt1 and
GL_ANGLE_texture_compression_dxt[35] extensions.

v2: Only enable the extensions that do not require on-line compression
by default.  The previous statement "This should not impact many (if
any) real applications." proved to be false for at least Sauerbraten.
This application mostly submits pre-compressed data, but it also can
submit uncompressed data that it asks the driver to compress.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Acked-by: Eric Anholt <eric@anholt.net> [v1]
Acked-by: Lee Salzman <lsalzman@gmail.com>
2013-01-23 23:38:04 -05:00
Ian Romanick 53f8251107 mesa: Like EXT_texture_compression_dxt1, advertise ANGLE_texture_compression_dxt in all APIs
This is technically outside the ANGLE spec, but it seems unlikely to
cause any harm.

v2: Simplify the extension checks by assuming the ANGLE extension will
always be enabled by any driver that enables the EXT.  Suggested by
Eric Anholt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Lee Salzman <lsalzman@gmail.com>
2013-01-23 23:38:04 -05:00
Ian Romanick d45c6c817d mesa: Simplify _mesa_choose_tex_format handling of compressed formats
For non-generic compressed format we assert two things:

1. The format has already been validated against the set of available
   extensions.

2. The driver only enables the extension if it supports all of the
   formats that are part of that extension.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-23 23:38:04 -05:00
Ian Romanick a021881ccd mesa: Use a single flag for the S3TC extensions that don't require on-line compression
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Lee Salzman <lsalzman@gmail.com>
2013-01-23 23:38:04 -05:00
Carl Worth 8059c2ea90 i965: Use swizzles to force R, G, and B to 0.0 for ALPHA textures.
Similar to the previous commit, we may be using a texture with actual RGBA
storage for the GL_ALPHA format, so force the color values to 0.0.

This commit fixes the following piglit (sub) tests:

	EXT_texture_snorm/fbo-blending-formats
		GL_ALPHA16_SNORM
	        GL_ALPHA8_SNORM
		GL_ALPHA_SNORM

Note: Haswell bypasses this swizzle code, so may require an independent fix
for this bug.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-23 17:41:09 -08:00
Carl Worth 33599433c7 i965: Use swizzles to force alpha to 1.0 for RED, RG, or RGB textures.
We may be using a texture with actual RGBA storage for these formats, so force
the alpha value read to 1.0.

This commit fixes the following piglit (sub) tests:

	ARB_texture_float/fb-blending-formats
		GL_RGB16F_ARB
	EXT_framebuffer_object/fbo-blending-formats
                GL_RGB10
		GL_RGB12
	        GL_RGB16
	EXT_texture_snorm/fbo-blending-formats
		GL_RGB16_SNORM
		GL_RGB8_SNORM
		GL_RGB_SNORM

These test improvements depend on the previous commit as well. That commit
smashes alpha to 1.0 for the case of ReadPixels (so fixes "FBO testing" as
reported by this test), while this commit smashes alpha to 1.0 for the case of
texturing (fixed the "window testing" as reported by this test).

Note: Haswell bypasses this swizzle code, so may require an independent fix
for this bug.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-01-23 17:40:52 -08:00