Commit Graph

330 Commits

Author SHA1 Message Date
Roland Scheidegger 26cac02c51 gallivm: give more verbose names to modules
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 22:50:14 +02:00
Roland Scheidegger 1d28650b55 llvmpipe: kill off llvmpipe_variant_count
Unused except it was increased for both fs and setup shader variants created.
Probably some leftover from ages ago.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-15 02:35:26 +02:00
José Fonseca 0b239d9ed9 llvmpipe: Delete unneeded LLVM stuff earlier.
Same as Frank's change to draw module but for llvmpipe module.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
Roland Scheidegger 9477d8c862 llvmpipe: add support for b5g6r5_srgb
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-03-21 17:23:38 +01:00
Roland Scheidegger 1d53603f1f llvmpipe: fix denorm handling for r11g11b10_float format when blending
The code re-enabling denorms for small float formats did not recognize
this format due to format handling hacks (mainly, the lp_type doesn't have
the floating bit set).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-01-31 19:51:06 +01:00
José Fonseca 8771285054 s/Tungsten Graphics/VMware/
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008.  Leaving the
old copyright name is creating unnecessary confusion, hence this change.

This was the sed script I used:

    $ cat tg2vmw.sed
    # Run as:
    #
    #   git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
    #

    # Rename copyrights
    s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
    /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
    s/TUNGSTEN GRAPHICS/VMWARE/g

    # Rename emails
    s/alanh@tungstengraphics.com/alanh@vmware.com/
    s/jens@tungstengraphics.com/jowen@vmware.com/g
    s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
    s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
    s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
    s/michel@tungstengraphics.com/daenzer@vmware.com/g
    s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
    s/zack@tungstengraphics.com/zackr@vmware.com/

    # Remove dead links
    s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g

    # C string src/gallium/state_trackers/vega/api_misc.c
    s/"Tungsten Graphics, Inc"/"VMware, Inc"/

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-01-17 20:00:32 +00:00
Brian Paul d6fa71fbb0 llvmpipe: handle NULL color buffer pointers
Fixes regression from 9baa45f78b

v2: incorporate a few small changes suggested by Roland.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-17 08:52:11 -08:00
Zack Rusin 93b953d139 llvmpipe: do constant buffer bounds checking in shaders
It's possible to bind a smaller buffer as a constant buffer, than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and expected to return garbage but consistent
garbage (we follow the behavior which some wlk tests expect which
is to return the actual values from the bound buffer).

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-16 16:33:57 -05:00
Si Chen 72c6d0e506 llvmpipe: Implement alpha_to_coverage for non-MSAA framebuffers.
Implement Alpha to Coverage by discarding a fragment alpha component is
less than 0.5.  This is a joint work of Jose and Si.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-07 16:04:42 +00:00
Matthew McClure e84a1ab3c4 llvmpipe: add plumbing for ARB_depth_clamp
With this patch llvmpipe will adhere to the ARB_depth_clamp enabled state when
clamping the fragment's zw value. To support this, the variant key now includes
the depth_clamp state. key->depth_clamp is derived from pipe_rasterizer_state's
(depth_clip == 0), thus depth clamp is only enabled when depth clip is disabled.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-12-11 18:24:21 +00:00
Zack Rusin 155139059b llvmpipe: fix blending with half-float formats
The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patches enables denorms for
blending. It's a little tricky due to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
2013-12-10 16:39:48 -05:00
Matthew McClure 0319ea9ff6 llvmpipe: clamp fragment shader depth write to the current viewport depth range.
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.

lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.

lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to fragment stage (via lp_jit_thread_data).

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-12-09 12:57:02 +00:00
Roland Scheidegger 754319490f gallivm,llvmpipe: fix float->srgb conversion to handle NaNs
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by luck for NaNs (float->int
conversion in hw gives integer indeterminate value, which just happens to be
-0x80000000 hence gets converted to zero in the end after pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending and clamping resulted in NaN behavior being undefined (and actually
got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp
with defined nan behavior as we can handle the NaN for free this way.
I suspect there's more bugs lurking in this area (e.g. converting floats to
snorm) as we don't really use defined NaN behavior everywhere but this seems
to be good enough.
While here respecify nan behavior modes a bit, in particular the return_second
mode didn't really do what we wanted. From the caller's perspective, we really
wanted to say we need the non-nan result, but we already know the second arg
isn't a NaN. So we use this now instead, which means that cpu architectures
which actually implement min/max by always returning non-nan (that is adhering
to ieee754-2008 rules) don't need to bend over backwards for nothing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-14 12:24:55 +00:00
Vinson Lee 76df7edacf llvmpipe: Remove unnecessary null check of shader.
shader has already been dereferenced earlier so cannot be null here.

Fixes "Dereference before null check" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-09-30 22:00:54 -07:00
José Fonseca 1569b3e536 llvmpipe: Fix rendering to PIPE_FORMAT_R10G10B10A2_UNORM.
We must take rounding in consideration when re-scaling to narrow
normalized channels, such as 2-bit normalized alpha.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-09-20 17:34:57 +01:00
Zack Rusin 27cedd8aec llvmpipe: fix pipeline statistics with a null ps
If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stancil and depth testing are disabled.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-08-14 18:23:36 -04:00
Roland Scheidegger 4ef19f7fec llvmpipe: clamp inputs for srgb render buffers
Usually with fixed point renderbuffers clamping is done as part of conversion.
However, since we blend in float format, we essentially skip all conversion
steps pre-blend but since this is still a fixed point renderbuffer we must
still clamp the inputs in this case. Makes no difference for piglit though.
Obviously we could skip this if fragment color clamping is enabled, but a)
this is deprecated in OpenGL (d3d never had it) and b) we don't support it
natively so it gets baked into the shader.
Also add some comment about logic ops being broken for srgb, luckily no test
tries to do that as there's no easy fix...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-07-18 19:04:20 +02:00
Roland Scheidegger e57b98bad3 llvmpipe: fix blending with SRC_ALPHA_SATURATE with some formats without alpha
We were fixing up the blend factor to ZERO, however this only works correctly
with fixed point render buffers where the input values are clamped to 0/1
(because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped
inputs). Haven't seen any failure anywhere due to that with fixed point SNORM
buffers (which clamp inputs to -1/1) but it should apply there as well (snorm
blending is rare, even opengl 4.3 doesn't require snorm rendertargets at all,
d3d10 requires them but they are not blendable).
Doesn't look like piglit hits this though (some internal testing hits the
float case at least). (With legacy OpenGL we could theoretically still use the
fixup to zero if the fragment color clamp is enabled, but we can't detect that
easily since we don't support native clamping hence it gets baked into the
shader.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-07-18 19:03:35 +02:00
Roland Scheidegger dc1cc928ed llvmpipe: support sRGB framebuffers
Just use the new conversion functions to do the work. The way it's plugged
in into the blend code is quite hacktastic but follows all the same hacks
as used by packed float format already.
Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit formats never
worked anyway in the blend code and are thus disabled, and I don't think anyone
is interested in L8/L8A8. Would need even more hacks otherwise.
Unless I'm missing something, this is the last feature except MSAA needed for
OpenGL 3.0, and for OpenGL 3.1 as well I believe.

v2: prettify a bit, use separate function for packing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-07-16 01:54:51 +02:00
Roland Scheidegger 2e4da1f594 llvmpipe: add support for nested / overlapping queries
OpenGL doesn't support this but d3d10 does.
It is a bit of a pain as it is necessary to keep track of queries
still active at the end of a scene, which is also why I cheat a bit
and limit the amount of simultaneously active queries to (arbitrary)
16 (simplifies things because don't have to deal with a real list
that way). I can't think of a reason why you'd really want large
numbers of overlapping/nested queries so it is hopefully fine.
(This only affects queries which need to be binned.)

v2: don't copy remainder of array when deleting an entry simply replace
the deleted entry with the last one (order doesn't matter).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-26 23:17:53 +02:00
Adam Jackson 2151d893fb gallium: Fix llvmpipe on big-endian machines
Squashed commit of the following:

commit 0857a7e105bfcbc4d1431b2cc56612094c747ca3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:07 2013 -0400

    gallivm: Fix lp_build_rgba8_to_fi32_soa for big endian

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 0d65131649a8aa140e2db228ba779d685c4333e3
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:07 2013 -0400

    gallivm: Fix big-endian machines

    This adds a bit-shift count to the format table, and adds the concept of
    vector or bitwise alignment on gathers.

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 9740bda9b7dc894b629ed38be9b51059ce90818f
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:07 2013 -0400

    llvmpipe: Fix convert_to_blend_type on big-endian

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit ae037c2de0f029e4e99371c0de25560484f0d8df
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    util: Convert color pack to packed formats

    This fixes them on big-endian.

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 5b05ac0c89ae092ea8ba5bba9f739708d7396b5c
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    graw-xlib: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 51396e7d098cb6ff794391cf11afe4dbf86dbea0
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    format: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 417b60bc66eb450e68a92ab0e47f76e292b385e6
Author: Adam Jackson <ajax@redhat.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    st/dri: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 0934b2e022a5e0847d312c40734e2b44cac52fd8
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    st/xlib: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit a307ea3c3716a706963acce7966b5e405ba11db9
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    gbm: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 53eebdd253e1960a645ea278f31d7ef6a6cf4aeb
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    tests: Convert to packed formats

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 2f77fe3ee524945eacd546efcac34f7799fb3124
Author: Adam Jackson <ajax@redhat.com>
Date:   Tue Jun 18 13:07:37 2013 -0400

    gallium: Document packed formats

    Signed-off-by: Adam Jackson <ajax@redhat.com>

commit 1f1017159ce951f922210a430de9229f91f62714
Author: Richard Sandiford <r.sandiford@uk.ibm.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    gallium: Introduce 32-bit packed format names

    These are for interacting with buffers natively described in terms of
    bit shifts, like X11 visuals:

        uint32_t xyzw8888 = (x << 0) | (y << 8) | (z << 16) | (w << 24);

    Define these in terms of (endian-dependent) aliases to the array-style
    format names.

    Reviewed-by: Adam Jackson <ajax@redhat.com>
    Signed-off-by: Richard Sandiford <r.sandiford@uk.ibm.com>

commit 6cc7ab1ee66ed668da78c1d951dfd7782b4e786a
Author: Adam Jackson <ajax@redhat.com>
Date:   Mon Jun 3 12:10:32 2013 -0400

    gallium: Document format name conventions

    v2:
    - Fix a channel name thinko (Michel Dänzer)
    - Elaborate on SCALED versus INT
    - Add links to DirectX and FOURCC docs

    Signed-off-by: Adam Jackson <ajax@redhat.com>

commit df4d269e7fb62051a3c029b84147465001e5776e
Author: Adam Jackson <ajax@redhat.com>
Date:   Tue Jun 18 12:25:06 2013 -0400

    gallivm: Remove all notion of byte-swapping

    Signed-off-by: Adam Jackson <ajax@redhat.com>

Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-06-24 09:48:56 -04:00
Roland Scheidegger 008fd03600 llvmpipe: improve alignment calculation for fetching/storing pixels
This was always doing per-pixel alignment which isn't necessary, except
for the buffer case (due to the per-element offset). The disabled code
for calculating it was incorrect because it assumed that always the full
block would be fetched, which may not be the case, so fix this up.
The original code failed for instance for r10g10b10a2 the alignment would
have been calculated as 4 (block_width) * 4 (bytes) so 16, but the actual
fetch may have only fetched 2 values at a time, hence only alignment 8 -
it is unclear what exactly would happen in this case (alignment larger
than size to fetch).
So just use the (already calculated) fetch size instead and get alignment
from that which should always work, no matter if fetching 1,2 or 4 pixels.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-05 00:29:47 +02:00
Roland Scheidegger ffe2a1ca3c llvmpipe: reduce alignment requirement for 1d resources from 4x4 to 4x1
For rendering to buffers, we cannot have any y alignment.
So make sure that tile clear commands only clear up to the fb width/height,
not more (do this for all resources actually as clearing more seems
pointless for other resources too). For the jit fs function, skip execution
of the lower half of the fragment shader for the 4x4 stamp completely,
for depth/stencil only load/store the values from the first row
(replace other row with undef).
For the blend function, also only load half the values from fs output,
replace the rest with undefs so that everything still operates on the
full 4x4 block to keep code the same between 4x1 and 4x4 (except for
load/store of course which also needs to skip (store) or replace these
values with undefs (load))., at the cost of slightly less optimal code
being produced in some cases.
Also reduce 1d and 1d array alignment too, because they can be handled the
same as buffers so don't need to waste memory.

v2: don't try to run special blend code for 4x1, (very) slightly less
complexity if we just use the same code as for 4x4 which may or may not
make it easier to optimize in the future (as we care a lot more about 4x4
performance than 1d).

v2: don't use undef values for unused fs src outputs with llvm 3.1 as it
apparently can trigger a bug in llvm.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-05 00:29:47 +02:00
Roland Scheidegger ef3e887084 llvmpipe: cleanup of generate_unswizzled_blend
Some parameters were used inconsistently, for instance not using
block_width/block_height/block_size for deferring number of pixels
but rather relying on guesses from the number of fragment shaders etc,
so fix this up (no actual change in behavior since the block size stays
fixed). (Though most of the code would work with different block_height,
with three exceptions, one being the hacked r11g11b10 conversions and
twiddle code which only work with block_height 2 not 1, and the last
one being blend vector type not being 128bit wide.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-06-05 00:29:47 +02:00
Roland Scheidegger 80e2cc0f97 llvmpipe: disable simple_shader optimization
This optimization disabled mask checks if the shader is simple enough.
While this should work correctly, the problem is that it can hide real issues
because shaders in practice are usually complex enough (8 instructions or 1
texture is already enough) so this doesn't get used, whereas dumbed-down
tests which should hit all the same code paths suddenly do something quite
different. This was the reason that bug 41787 could not be easily tracked as
stencil test not working correctly (piglit would in fact have failed some
tests without that optimization).
So disable it for now, it's unclear if it's much of a win in any case.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00
Roland Scheidegger e108716429 llvmpipe: fix early depth test / late depth write stencil issues
We actually did early depth/stencil test and late depth/stencil write even
when the shader could kill the fragment (alpha test or discard). Since it
matters for the new stencil value if the fragment is killed by depth/stencil
test or by the shader (in which case it will not reach the depth/stencil
test) this simply cannot work (we also would possibly skip writing the new
stencil value due to mask checks but this is a secondary issue).
So use late depth test / late depth write instead in this case.
(No piglit changes as it doesn't seem to hit such bogus early depth test
/ late depth write path.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00
Roland Scheidegger 82d7733b52 llvmpipe: fix issue with not writing new stencil values
We did mask checks between depth/stencil testing and depth/stencil write.
This meant that if the depth/stencil test killed off all fragments we never
actually wrote the new stencil value. This issue affected all early/late
test/write combinations.
So move the mask check after depth/stencil write (for early depth test,
could do the same for late depth test but might not be worth it at that
point so just skip it there).
This addresses https://bugs.freedesktop.org/show_bug.cgi?id=41787.
Piglit does not hit this issue because of the simple_shader optimization
in generate_fs_loop() which means we're skipping the mask checks.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-22 22:57:27 +02:00
Roland Scheidegger 070a9afb54 llvmpipe: handle z32s8x24 depth/stencil format
We need to split up the depth and stencil values in this case, and there's
some new logic required to handle float depth and stencil simultaneously.
Also make sure we get the 64bit zs clear values and masks propagated
correctly.
2013-05-18 00:32:33 +02:00
Roland Scheidegger ae507b6260 llvmpipe: get rid of depth swizzling.
Eliminating this we no longer need to copy between linear and swizzled layout.
This is probably not quite ideal since it's a bit more work for now, could do
some optimizations by moving depth testing outside the fragment shader loop
(but tricky for early depth test as we don't have neither the mask nor the
interpolated z in the right order handy).
The large amount of tile/untile code is no longer needed will be deleted
in next commit.
No piglit regressions.
v2: change a forgotten LAYOUT_NONE to LAYOUT_LINEAR.
v3: fix (bogus) uninitialized variable warnings, add comments, fix a bad type

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-05-03 21:36:20 +02:00
José Fonseca c08b04992a llvmpipe: Ignore depth-stencil state if format has no depth/stencil.
Prevents assertion failures inside the driver for such state combinations.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-20 23:25:36 +01:00
José Fonseca a930136977 llvmpipe: Support half integer pixel center fs coord.
Tested with graw/fs-fragcoord 2/3, and piglit
glsl-arb-fragment-coord-conventions.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-18 14:18:25 +01:00
José Fonseca b191be52f2 llvmpipe: Remove the static interpolation.
No longer used.

If we ever want the old behavior we can run a loop unroller pass.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-18 14:18:22 +01:00
José Fonseca 6e833d4d09 gallivm: Drop pos arg from lp_build_tgsi_soa.
Never used.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-18 14:18:13 +01:00
Zack Rusin e96f4e3b85 gallium/llvm: implement geometry shaders in the llvm paths
This commits implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Brian Paul c0f16df938 gallivm: init vars to silence warnings
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-25 12:24:11 -06:00
Vinson Lee 7d0c1f2437 llvmpipe: Fix assertions with assignment instead of comparison.
Fixes assign instead of compare defects reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-03-24 14:49:22 -07:00
Roland Scheidegger b101a094b5 llvmpipe: add EXT_packed_float render target format support
New conversion code to handle conversion from/to r11g11b10 AoS to/from
SoA floats, and also add code for conversion from rgb9e5 AoS to float SoA
(which works pretty much the same as r11g11b10 except for the packing).
(This code should also be used for texture sampling instead of
relying on u_format conversion but it's not yet, so rgb9e5 is unused.)
Unfortunately a crazy amount of hacks is necessary to get the conversion
code running in llvmpipe's generate_unswizzled_blend, which isn't well
suited for formats where the storage representation has nothing to do
with what's needed for blending (moreover, the conversion will convert
from packed AoS values, which is the storage format, to float SoA values,
because this is much more natural for the conversion, and likewise from
SoA values to packed AoS values - but the "blend" (which includes
trivial things like partial mask) works on AoS values, so incoming fs
values will go SoA->AoS, values from destination will go packed
AoS->SoA->AoS, then do blend, then AoS->SoA->packed AoS which probably
isn't the most efficient way though the shuffles are probably bearable).

Passes piglit fbo-blending-formats (with GL_EXT_packed_float parameter),
still need to verify Inf/NaNs (where most of the complexity in the
conversion comes from actually).

v2: drop the (very bogus) rgb9e5 part, and do component extraction
in the helper code for r11g11b10 to float conversion, making the code
slightly more compact (suggested by Jose), now that there are no other
callers left this works quite well. (Could do the same for the
opposite way but it's less than ideal there, final part of packing
needs to be done in caller anyway and there'd be another conditional.)

v3: minor style and comment fixes. Also fix a potential issue with
negative zero being potentially returned by max(src, zero) as we
don't have well-defined min/max behavior (fortunately no additonal cost).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-22 20:10:53 +01:00
Roland Scheidegger b6f15954b4 llvmpipe: Fix rendering into PIPE_FORMAT_X8*_UNORM.
Mesa state tracker recently started using PIPE_FORMAT_X8B8G8R8_UNORM,
causing segfaults in texture-packed-formats, because swizze[chan] was
0xff for padding channel (X).

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2013-02-22 09:00:45 +00:00
Roland Scheidegger 8b8bca06df llvmpipe: implement dual source blending
link up the fs outputs and blend inputs, and make sure the second blend source
is correctly loaded and converted (which is quite complex).
There's a slight refactoring of the monster generate_unswizzled_blend()
function where it makes sense to factor out alpha conversion (which needs
to run twice for dual source blend).
This passes piglit arb_blend_func_extended tests.

v2: remove new but ultimately not used function...

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-02-12 03:41:48 +01:00
Roland Scheidegger 67906f91c9 llvmpipe: first steps of adding dual source blend support
This adds support of the additional blending factors to the blend function
itself, and also enables testing of it in lp_test_blend (which passes).
Still need to add the glue code of linking fs shader outputs to blend inputs
in llvmpipe, and probably need to add special handling if destination doesn't
include alpha (which lp_test_blend doesn't test).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-02-08 16:32:30 -08:00
Roland Scheidegger 8e44f4117a llvmpipe: refactoring of visibility counter handling
There can be other per-thread data than just vis_counter, so pass a struct
around instead (some of our non-public code uses this already and this
difference is a major cause of merge pain).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-08 16:32:30 -08:00
José Fonseca 0ca384fb39 llvmpipe: Support Z16_UNORM as depth-stencil format.
Simply by adjusting the vector element width after/before
reading/writing the depth-stencil values.

Ran several GL_DEPTH_COMPONENT16 piglit tests without regressions.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-01-29 07:06:36 +00:00
Roland Scheidegger c789b981b2 gallivm: split sampler and texture state
Split the sampler interface to use separate sampler and texture (sampler_view)
state. This is needed to support dx10-style sampling instructions.
This is not quite complete since both draw/llvmpipe don't really track
textures/samplers independently yet, as well as the gallivm code not quite
using the right sampler or texture index respectively (but it should work
for the sampling codes used by opengl).
We are however losing some optimizations in the process, apply_max_lod will
no longer work, and we potentially could end up with more (unnecessary)
recompiles (if switching textures with/without mipmaps only so it shouldn't
be too bad).

v2: don't use different callback structs for sampler/sampler view functions
(which just complicates things), fix up sampling code to actually use the
right texture or sampler index, and similar for llvmpipe/draw actually
distinguish between samplers and sampler views.

v3: fix more of PIPE_MAX_SAMPLER / PIPE_MAX_SHADER_SAMPLER_VIEWS mismatches
(both in draw and llvmpipe), based on feedback from José get rid of unneeded
static sampler derived state.(which also fixes the only 2 piglit regressions
due to a forgotten assignment), fix comments based on Brian's feedback.

v4: remove some accidental unrelated whitespace changes

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-01-28 06:50:36 -08:00
Roland Scheidegger f2a87a1f5b llvmpipe: more fixes for integer color buffers
Cast back the fake floats to ints, and make sure we don't try to do scaling
in format conversion (which only makes sense with normalized values).
Also need to disable blending and alpha test (as per spec) for such buffers.
This makes fbo-blending from the piglit ext_texture_integer tests work for most
formats (some crash, and the luminance and intensity variants have the GB or
GBA channels respectively wrong).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-01-18 09:14:52 -08:00
Roland Scheidegger dc6bc3b642 llvmpipe: trivial code and comment cleanup.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-01-18 09:14:52 -08:00
Roland Scheidegger 8c84a82383 llvmpipe: fix using wrong format with MRT in blend code
We were passing in the rt index however this was always 0 for non-independent
blend case. (The format was only actually used to decide if the color mask
covered all channels so this went unnoticed and was discovered by accident.)
Additionally, there was a second problem because we do fixups in the key based
on color buffer format we cannot use non-independent blend anyway as the fixed
up values would never get used.
So always turn non-independent blending into independent.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-01-18 09:14:52 -08:00
Brian Paul 8ef27e8fa9 llvmpipe: remove unneeded draw_flush() call
This is redundant since we're calling draw_bind_fragment_shader()
which already does a flush.

v2: the redundant flush in llvmpipe_set_constant_buffer() has
already been removed by commit 3427466e6d

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-12-12 08:45:45 -07:00
Brian Paul 3427466e6d llvmpipe: support pipe_resource-based constant buffers
Before this we only supported user-based constant buffers.

First, we basically plumb pipe_constant_buffer objects through llvmpipe
rather than pipe_resource objects.

Second, update llvmpipe_set_constant_buffer() and try_update_scene_state()
so they understand both resource- and user-based constant buffers.

The problem with user constant buffers is the potential for use-after-free,
as seen in some WebGL tests.  The next patch will flip the switch for
resource-based const buffers.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-12-11 12:48:06 -07:00
José Fonseca 1d35f77228 gallivm,llvmpipe,draw: Support multiple constant buffers.
Support 16 (defined in LP_MAX_TGSI_CONST_BUFFERS) as opposed to 32 (as
defined by PIPE_MAX_CONSTANT_BUFFERS) because that would make the jit
context become unnecessarily large.

v2: Bump limit from 4 to 16 to cover ARB_uniform_buffer_object needs,
per Dave Airlie.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-07 15:03:07 +00:00
José Fonseca 294d8a71ef llvmpipe: Fix alignment.
My understanding and actual implementation of how the pixels are being
fetch differed.

This fixes bug 57863.

Trivial.
2012-12-04 19:33:04 +00:00
James Benton 16f0d70ffe llvmpipe: Implement PIPE_QUERY_TIMESTAMP and PIPE_QUERY_TIME_ELAPSED.
This required an update for the query storage in llvmpipe, there
can now be an active query per query type, so an occlusion query
can run at the same time as a time elapsed query.

Based on PIPE_QUERY_TIME_ELAPSED patch from Dave Airlie.

v2: fix up piglits for timers (also from Dave Airlie)

a) if we don't render anything the result is 0, so just
return the current time

b) add missing screen get_timestamp callback.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-12-03 17:21:57 +00:00
José Fonseca 6a2f2300a8 llvmpipe: Refactor convert_to/from_blend_type to convert in place.
This fixes the "Source and destination overlap in memcpy" valgrind
warnings.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-12-03 14:02:43 +00:00
José Fonseca 03aa3fd54b llvmpipe: Improve color buffer loads/stores alignment.
Tell LLVM the exact alignment we can guarantee, based on the fs block
dimensions, pixel format, and the alignment of the resource base pointer
and stride.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-12-03 14:02:43 +00:00
Vinson Lee f126f34c1d llvmpipe: Fix incorrect sizeof.
Fixes sizeof not portable defects reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-11-29 21:08:48 -08:00
José Fonseca 75da95c50a llvmpipe: Eliminate color buffer swizzling.
Now dead code.

Also had to remove the show_tiles/show_subtiles because now the color
buffers are always stored in their native format, so there is no longer
an easy way to paint the tile sizes.

Depth-stencil buffers are still swizzled.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-11-29 14:08:43 +00:00
José Fonseca 547efc76df llvmpipe: Don't use dynamically sized arrays.
Unfortunately for MSVC arrays with a constant variable size are still
considered dynamically sized.
2012-11-28 19:58:47 +00:00
James Benton fa1b481c09 llvmpipe: Unswizzled rendering.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-28 19:14:36 +00:00
Brian Paul b3538d3563 llvmpipe: combine vertex/fragment sampler state into an array
This will allow code consolidation in the next patch.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-08-06 08:33:17 -06:00
Roland Scheidegger 70a969f123 llvmpipe: use runtime loop instead of static loop for looping over quads
This can potentially cut shader program size by a factor of 4 for 4-wide
execution respectively 2 for 8-wide execution and while this ratios aren't
quite reached for more complex shaders it can be close.
Could not really measure a performance difference so far except for trivial
shaders (glxgears).
There seems to be a fair amount of unnecessary move's generated especially
at the beginning it might be possible to optimize those away somehow.
Things aren't quite as clean, some additional stuff needs to be done for
keeping both paths working (though llvm might be able to optimize this away).
glxgears seems to lose about 5-10% of performance, looking at the generated
shaders this is actually less than I'd think it would be - both 4 and 8-wide
shaders, despite containing a loop actually have about 10% more instructions
in total, and will have roughly 50% more executed instructions (though mostly
cheap ones). Need to figure out how to reduce overhead...

v2: keep complex interpolation for 4-wide mode, adapt to interface changes.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-07-20 20:17:15 +01:00
José Fonseca 3469715a8a gallivm,draw,llvmpipe: Support wider native registers.
Squashed commit of the following:

commit 7acb7b4f60dc505af3dd00dcff744f80315d5b0e
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 9 17:46:31 2012 +0100

    draw: Don't use dynamically sized arrays.

    Not supported by MSVC.

commit 5810c28c83647612cb372d1e763fd9d7780df3cb
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 9 17:44:16 2012 +0100

    gallivm,llvmpipe: Don't use expressions with PIPE_ALIGN_VAR().

    MSVC doesn't accept exceptions in _declspec(align(...)). Use a
    define instead.

commit 8aafd1457ba572a02b289b3f3411e99a3c056072
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 9 17:41:56 2012 +0100

    gallium/util: Make u_cpu_detect.h header C++ safe.

commit 5795248350771f899cfbfc1a3a58f1835eb2671d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 2 12:08:01 2012 +0100

    gallium/util: Add ULL suffix to large constants.

    As suggested by Andy Furniss: it looks like some old gcc versions
    require it.

commit 4c66c22727eff92226544c7d43c4eb94de359e10
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Jun 29 13:39:07 2012 +0100

    gallium/util: Truly disable INF/NAN tests on MSVC.

    Thanks to Brian for spotting this.

commit 8bce274c7fad578d7eb656d9a1413f5c0844c94e
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Jun 29 13:39:07 2012 +0100

    gallium/util: Disable INF/NAN tests on MSVC.

    Somehow they are not recognized as constants.

commit 6868649cff8d7fd2e2579c28d0b74ef6dd4f9716
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jul 5 15:05:24 2012 +0200

    gallivm: Cleanup the 2 x 8 float -> 16 ub special path in lp_build_conv.

    No behaviour change intended, like 7b98455fb40c2df84cfd3cdb1eb7650f67c8a751.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 5147a0949c4407e8bce9e41d9859314b4a9ccf77
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jul 5 14:28:19 2012 +0200

    gallivm: (trivial) fix issues with multiple-of-4 texture fetch

    Some formats can't handle non-multiple of 4 fetches I believe, but
    everything must support length 1 and multiples of 4.
    So avoid going to scalar fetch (which is very costly) just because length
    isn't 4.
    Also extend the hack to not use shift with variable count for yuv formats to
    arbitrary length (larger than 1) - doesn't matter how many elements we
    have we always want to avoid it unless we have variable shift count
    instruction (which we should get with avx2).

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 87ebcb1bd71fa4c739451ec8ca89a7f29b168c08
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jul 4 02:09:55 2012 +0200

    gallivm: (trivial) fix typo for wrap repeat mode in linear filtering aos code

    This would lead to bogus coordinates at the edges.
    (undetected by piglit because this path is only taken for block-based
    formats).

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit 3a42717101b1619874c8932a580c0b9e6896b557
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue Jul 3 19:42:49 2012 +0100

    gallivm: Fix TGSI integer translation with AVX.

commit d71ff104085c196b16426081098fb0bde128ce4f
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Jun 29 15:17:41 2012 +0100

    llvmpipe: Fix LLVM JIT linear path.

    It was not working properly because it was looking at the JIT function
    before it was actually compiled.

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit a94df0386213e1f5f9a6ed470c535f9688ec0a1b
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Jun 28 18:07:10 2012 +0100

    gallivm: Refactor lp_build_broadcast(_scalar) to share code.

    Doesn't really change the generated assembly, but produces more compact IR,
    and of course, makes code more consistent.

    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 66712ba2731fc029fa246d4fc477d61ab785edb5
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Jun 27 17:30:13 2012 +0100

    gallivm: Make LLVMContextRef a singleton.

    There are any places inside LLVM that depend on it.  Too many to attempt
    to fix.

    Reviewed-by: Brian Paul <brianp@vmware.com>

commit ff5fb7897495ac263f0b069370fab701b70dccef
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 28 18:15:27 2012 +0200

    gallivm: don't use 8-wide texture fetch in aos path

    This appears to be a slight loss usually.
    There are probably several reasons for that:
    - fetching itself is scalar
    - filtering is pure int code hence needs splitting anyway, same
      for the final texel offset calculations
    - texture wrap related code, which can be done 8-wide, is slightly more
      complex with floats (with clamp_to_edge) and float operations generally
      more costly hence probably not much faster overall
    - the code needed to split when encountering different mip levels for the
      quads, adding complexity
    So, just split always for aos path (but leave it 8-wide for soa, since we
    do 8-wide filtering there when possible).
    This should certainly be revisited if we'd have avx2 support.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit ce8032b43dcd8e8d816cbab6428f54b0798f945d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 27 18:41:19 2012 +0200

    gallivm: (trivial) don't extract fparts variable if not needed

    Did not have any consequences but unnecessary.

commit aaa9aaed8f80dc282492f62aa583a7ee23a4c6d5
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 27 18:09:06 2012 +0200

    gallivm: fix precision issue in aos linear int wrap code

    now not just passes at a quick glance but also with piglit...
    If we do the wrapping with floats, we also need to set the
    weights accordingly. We can potentially end up with different
    (integer) coordinates than what the integer calculations would
    have chosen, which means the integer weights calculated previously
    in this case are completely wrong. Well at least that's what I think
    happens, at least recalculating the weights helps.
    (Some day really should refactor all the wrapping, so we do whatever is
    fastest independent of 16bit int aos or 32bit float soa filtering.)

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit fd6f18588ced7ac8e081892f3bab2916623ad7a2
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Jun 27 11:15:53 2012 +0100

    gallium/util: Fix parsing of options with underscore.

    For example

      GALLIVM_DEBUG=no_brilinear

    which was being parsed as two options, "no" and "brilinear".

commit 09a8f809088178a03e49e409fa18f1ac89561837
Author: James Benton <jbenton@vmware.com>
Date:   Tue Jun 26 15:00:14 2012 +0100

    gallivm: Added a generic lp_build_print_value which prints a LLVMValueRef.

    Updated lp_build_printf to share common code.
    Removed specific lp_build_print_vecX.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit e59bdcc2c075931bfba2a84967a5ecd1dedd6eb0
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed May 16 15:00:23 2012 +0100

    draw,llvmpipe: Avoid named struct types on LLVM 3.0 and later.

    Starting with LLVM 3.0, named structures are meant not for debugging, but
    for recursive data types, previously also known as opaque types.

    The recursive nature of these types leads to several memory management
    difficulties.  Given that we don't actually need recursive types, avoid
    them altogether.

    This is an attempt to address fdo bugs 41791 and 44466. The issue is
    somewhat random so there's no easy way to check how effective this is.

    Cherry-picked from 9af1ba565d

commit df6070f618a203c7a876d984c847cde4cbc26bdb
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 27 14:42:53 2012 +0200

    gallivm: (trivial) fix typo in faster aos linear int wrap code

    no longer crashes, now REALLY tested.

commit d8f98dce452c867214e6782e86dc08562643c862
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 18:20:58 2012 +0200

    llvmpipe: (trivial) remove bogus optimization for float aos repeat wrap

    This optimization for nearest filtering on the linear path generated
    likely bogus results, and the int path didn't have any optimizations
    there since the only shader using force_nearest apparently uses
    clamp_to_edge not repeat wrap anyway.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit c4e271a0631087c795e756a5bb6b046043b5099d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 23:01:52 2012 +0200

    gallivm: faster repeat wrap for linear aos path too

    Even if we already have scaled integer coords, it's way faster to use
    the original float coord (plus some conversions) rather than use URem.
    The choice of what to do for texture wrapping is not really tied to int
    aos or float soa filtering though for some modes there can be some gains
    (because of easier weight calculations).

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 1174a75b1806e92aee4264ffe0ffe7e70abbbfa3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 14:39:22 2012 +0200

    gallivm: improve npot tex wrap repeat in linear soa path

    URem gets translated into series of scalar divisions so
    just about anything else is faster.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit f849ffaa499ed96fa0efd3594fce255c7f22891b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 00:40:35 2012 +0100

    gallivm: (trivial) fix near-invisible shift-space typo

    I blame the keyboard.

commit 5298a0b19fe672aebeb70964c0797d5921b51cf0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 16:24:28 2012 +0200

    gallivm: add new intrinsic helper to deal with arbitrary vector length

    This helper will split vectors which are too large for the hw, or expand
    them if they are too small, so a caller of a function using intrinsics which
    uses such sizes need not split (or expand) the vectors manually and the
    function will still use the intrinsic instead of dropping back to generic
    llvm code. It can also accept scalars for use with pseudo-vector intrinsics
    (only useful for float arguments, all x86 scalar simd float intrinsics use
    4vf32).
    Only used for lp_build_min/max() for now (also added the scalar float case
    for these while there). (Other basic binary functions could use it easily,
    whereas functions with a different interface would need different helpers.)
    Expanding vectors isn't widely used, because we always try to use
    build contexts with native hw vector sizes. But it might (or not) be nicer
    if this wouldn't need to be done, the generated code should in theory stay
    the same (it does get hit by lp_build_rho though already since we
    didn't have a intrinsic for the scalar lp_build_max case before).

    v2: incorporated Brian's feedback, and also made the scalar min/max case work
        instead of crash (all scalar simd float intrinsics take 4vf32 as argument,
        probably the reason why it wasn't used before).
        Moved to lp_bld_intr based on José's request, and passing intrinsic size
        instead of length.
        Ideally we'd derive the source type info from the passed in llvm value refs
        and process some llvmtype return type so we could handle intrinsics where
        the source and destination type isn't the same (like float/int conversions,
        packing instructions) but that's a bit too complicated for now.

    Reviewed-by: Brian Paul <brianp@vmware.com>
    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 01aa760b99ec0b2dc8ce57a43650e83f8c1becdf
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 16:19:18 2012 +0200

    gallivm: (trivial) increase max code size for shader disassembly

    64kB was just short of what I needed (which caused a crash) hence
    increase to 96kB (should probably be smarter about that).

commit 74aa739138d981311ce13076388382b5e89c6562
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 11:53:29 2012 +0100

    gallivm: simplify aos float tex wrap repeat nearest

    just handle pot and npot the same. The previous pot handling
    ended up with exactly the same instructions plus 2 more (leave it
    in the soa path though since it is probably still cheaper there).
    While here also fix a issue which would cause a crash after an assert.

commit 0e1e755645e9e49cfaa2025191e3245ccd723564
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 11:29:24 2012 +0100

    gallivm: (trivial) skip floor rounding in ifloor when not signed

    This was only done for the non-sse41 case before, but even with
    sse41 this is obviously unnecessary (some callers already call
    itrunc in this case anyway but some might not).

commit 7f01a62f27dcb1d52597b24825931e88bae76f33
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 11:23:12 2012 +0100

    gallivm: (trivial) fix bogus comments

commit 5c85be25fd82e28490274c468ce7f3e6e8c1d416
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Jun 20 11:51:57 2012 +0100

    translate: Free elt8_func/elt16_func too.

    These were leaking.

    Reviewed-by: Brian Paul <brianp@vmware.com>
    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit 0ad498f36fb6f7458c7cffa73b6598adceee0a6c
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 19 15:55:34 2012 +0200

    gallivm: fix bug for tex wrap repeat with linear sampling in aos float path

    The comparison needs to be against length not length_minus_one, otherwise
    the max texel is never chosen (for the second coordinate).

    Fixes piglit texwrap-1D-npot-proj (and 2D/3D versions).

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit d1ad65937c5b76407dc2499b7b774ab59341209e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 19 16:13:43 2012 +0200

    gallivm: simplify soa tex wrap repeat with npot textures and no mip filtering

    Similar to what is already done in aos sampling for the float path (but not
    the int path since we don't get normalized float coordinates there).
    URem is expensive and the calculation is done trivially with
    normalized floats instead (at least with sse41-capable cpus).
    (Some day should probably do the same for the mip filter path but it's much
    more complicated there hence the gain is smaller.)

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit e1e23f57ba9b910295c306d148f15643acc3fc83
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 18 20:38:56 2012 +0200

    llvmpipe: (trivial) remove duplicated function declaration

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 07ca57eb09e04c48a157733255427ef5de620861
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 18 20:37:34 2012 +0200

    llvmpipe: destroy setup variants on context destruction

    lp_delete_setup_variants() used to be called in garbage collection,
    but this no longer exists hence the setup shaders never got freed.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit ed0003c633859a45f9963a479f4c15ae0ef1dca3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 18 16:25:29 2012 +0100

    gallivm: handle different ilod parts for multiple quad sampling

    This fixes filtering when the integer part of the lod is not the same
    for all quads. I'm not fully convinced of that solution yet as it just
    splits the vector if the levels to be sampled from are different.
    But otherwise we'd need to do things like some minify steps, and getting
    mip level base address separately anyway hence it wouldn't really look
    like much of a win (and making the code even more complex).
    This should now give identical results to single quad sampling.

commit 8580ac4cfc43a64df55e84ac71ce1a774d33c0d2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 14 18:14:47 2012 +0200

    gallivm: de-duplicate sample code common to soa and aos sampling

    There doesn't seem to be any reason why this code dealing with cube face
    selection, lod and mip level calculation is separate in aos and
    soa sampling, and I am sick of having it to change in both places.

commit fb541e5f957408ce305b272100196f1e12e5b1e8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 14 18:15:41 2012 +0200

    gallivm: do mip filtering with per quad lod_fpart

    This gives better results for mip filtering, though the generated code might
    not be optimal. For now it also creates some artifacts if the lod_ipart isn't
    the same for all quads, since instead of using the same mip weight for all
    quads as previously (which just caused non-smooth gradients) this now will
    use the right weights but with the wrong mip level in this case (can easily
    be seen with things like texfilt, mipmap_tunnel).
    v2: use logic helper suggested by José, and fix issue with negative lod_fpart
        values

commit f1cc84eef7d826a20fab6cd8ccef9a275ff78967
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 13 18:35:25 2012 +0200

    gallivm: (trivial) fix bogus assert in lp_build_unpack_broadcast_aos_scalars

commit 7c17dbae8ae290df9ce0f50781a09e8ed640c044
Author: James Benton <jbenton@vmware.com>
Date:   Tue Jun 12 12:11:14 2012 +0100

    util: Reimplement half <-> float conversions.

    Removed u_half.py used to generate the table for previous method.

    Previous implementation of float to half conversion was faulty for
    denormalised and NaNs and would require extra logic to fix,
    thus making the speedup of using tables irrelevant.

commit 7762f59274070e1dd4b546f5cb431c2eb71ae5c3
Author: James Benton <jbenton@vmware.com>
Date:   Tue Jun 12 12:12:16 2012 +0100

    tests: Updated tests to properly handle NaN for half floats.

commit fa94c135aea5911fd93d5dfb6e6f157fb40dce5e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 11 18:33:10 2012 +0200

    gallivm: do mip level calculations per quad

    This is the final piece which shouldn't change the rendering output yet.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 23cbeaddfe03c09ca18c45d28955515317ffcf4c
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 9 00:54:21 2012 +0200

    gallivm: do per-quad cube face selection

    Doesn't quite fix the piglit cubemap test (not sure why actually)
    but doing per-quad face selection is doing the right thing and
    definitely an improvement.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit abfb372b3702ac97ac8b5aa80ad1b94a2cc39d33
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 11 18:22:59 2012 +0200

    gallivm: do all lod calculations per quad

    Still no functional change but lod is now converted to scalar after
    lod calculations.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 519368632747ae03feb5bca9c655eccbc5b751b4
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 16:46:10 2012 +0100

    gallivm: Added support for half-float to float conversion in lp_build_conv.

    Updated various utility functions to support this change.

commit 135b4d683a4c95f7577ba27b9bffa4a6fbd2c2e7
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 16:02:46 2012 +0100

    gallivm: Added function for half-float to float conversion.

    Updated lp_build_format_aos_array to support half-float source.

commit 37d648827406a20c5007abeb177698723ed86673
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 14:55:18 2012 +0100

    util: Updated u_format_tests to rigidly test half-float boundary values.

commit 2ad18165d96e578aa9046df7c93cb1c3284d8c6b
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 14:54:16 2012 +0100

    llvmpipe: Updated lp_test_format to properly handle Inf/NaN results.

commit 78740acf25aeba8a7d146493dd5c966e22c27b73
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 14:53:30 2012 +0100

    util: Added functions for checking NaN / Inf for double and half-floats.

commit 35e9f640ae01241f9e0d67fe893bbbf564c05809
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 24 21:05:13 2012 +0200

    gallivm: Fix calculating rho for 3d textures for the single-quad case

    Discovered by accident, this looks like a very old typo bug.

commit fc1220c636326536fd0541913154e62afa7cd1d8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 24 21:04:59 2012 +0200

    gallivm: do calcs per-quad in lp_build_rho

    Still convert to scalar at the end of the function.

commit 50a887ffc550bf310a6988fa2cea5c24d38c1a41
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon May 21 23:21:50 2012 +0200

    gallivm: (trivial) return scalar in lp_build_extract_range for length 1 vectors

    Our type system on top of llvm's one doesn't generally support vectors of
    length 1, instead using scalars. So we should return a scalar from this
    function instead of having to bitcast the vector with length 1 later elsewhere.

commit 80c71c621f9391f0f9230460198d861643324876
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 17:49:15 2012 +0100

    draw: Fixed bad merge error

commit c47401cfad0c9167de20ff560654f533579f452c
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 15:29:30 2012 +0100

    draw: Updated store_clip to store whole vectors instead of individual elements.

commit 2d9c1ad74b0b0b41861fffcecde39f09cc27f1cf
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 15:28:32 2012 +0100

    gallivm: Added lp_build_fetch_rgba_aos_array.

    A version of lp_build_fetch_rgba_aos which is targeted at simple array formats.

    Reads the whole vector from memory in one, instead of reading each element
    individually.

    Tested with mesa tests and demos.

commit ff7805dc2b6ef6d8b11ec4e54aab1633aef29ac8
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 15:27:40 2012 +0100

    gallivm: Added lp_build_pad_vector.

    This function pads a vector with undef to a desired length.

commit 701f50acef24a2791dabf4730e5b5687d6eb875d
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 17:27:19 2012 +0100

    util: Added util_format_is_array.

    This function checks whether a format description is in a simple array format.

commit 5e0a7fa543dcd009de26f34a7926674190fa6246
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 19:13:47 2012 +0100

    draw: Removed draw_llvm_translate_from and draw/draw_llvm_translate.c.

    This is "replaced" by adding an optimised path in lp_build_fetch_rgba_aos
    in an upcoming patch.

commit 8c886d6a7dd3fb464ecf031de6f747cb33e5361d
Author: James Benton <jbenton@vmware.com>
Date:   Wed May 16 15:02:31 2012 +0100

    draw: Modified store_aos to write the vector as one, not individual elements.

commit 37337f3d657e21dfd662c7b26d61cb0f8cfa6f17
Author: James Benton <jbenton@vmware.com>
Date:   Wed May 16 14:16:23 2012 +0100

    draw: Changed aos_to_soa to use lp_build_transpose_aos.

commit bd2b69ce5d5c94b067944d1dcd5df9f8e84548f1
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 19:14:27 2012 +0100

    draw: Changed soa_to_aos to use lp_build_transpose_aos.

commit 0b98a950d29a116e82ce31dfe7b82cdadb632f2b
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 18:57:45 2012 +0100

    gallivm: Added lp_build_transpose_aos which converts between aos and soa.

commit 69ea84531ad46fd145eb619ed1cedbe97dde7cb5
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 18:57:01 2012 +0100

    gallivm: Added lp_build_interleave2_half aimed at AVX unpack instructions.

commit 7a4cb1349dd35c18144ad5934525cfb9436792f9
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue May 22 11:54:14 2012 +0100

    gallivm: Fix build on Windows.

    MC-JIT not yet supported there.

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit afd105fc16bb75d874e418046b80d9cc578818a1
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:17:26 2012 +0100

    llvmpipe: Added a error counter to lp_test_conv.

    Useful for keeping track of progress when fixing errors!

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit b644907d08c10a805657841330fc23db3963d59c
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:16:46 2012 +0100

    llvmpipe: Changed known failures in lp_test_conv.

    To comply with the recent fixes to lp_bld_conv.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit d7061507bd94f6468581e218e61261b79c760d4f
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:14:38 2012 +0100

    llvmpipe: Added fixed point types tests to lp_test_conv.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit 146b3ea39b4726dbe125ac666bd8902ea3d6ca8c
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:26:35 2012 +0100

    llvmpipe: Changed lp_test_conv src/dst alignment to be correct.

    Now based on the define rather than a fixed number.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit f3b57441f834833a4b142a951eb98df0aa874536
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:06:44 2012 +0100

    gallivm: Fixed erroneous optimisation in lp_build_min/max.

    Previously assumed normalised was 0 to 1, but it can be -1 to 1
    if type is signed.
    Tested with lp_test_conv and lp_test_format, reduced errors.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit a0613382e5a215cd146bb277646a6b394d376ae4
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:04:49 2012 +0100

    gallivm: Compensate for lp_const_offset in lp_build_conv.

    Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
    Tested using lp_test_conv and lp_test_format, reduced errors.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit a3d2bf15ea345bc8a0664f8f441276fd566566f3
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:01:25 2012 +0100

    gallivm: Fixed overflow in lp_build_clamped_float_to_unsigned_norm.

    Tested with lp_test_conv and lp_test_format, reduced errors.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit e7b1e76fe237613731fa6003b5e1601a2e506207
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon May 21 20:07:51 2012 +0100

    gallivm: Fix build with LLVM 2.6

    Trivial, and useful.

commit d3c6bbe5c7f5ba1976710831281ab1b6a631082d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue May 15 17:15:59 2012 +0100

    gallivm: Enable MCJIT/AVX with vanilla LLVM 3.1.

    Add the necessary C++ glue, so that we don't need any modifications
    to the soon to be released LLVM 3.1.

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit 724a019a14d40fdbed21759a204a2bec8a315636
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon May 14 22:04:06 2012 +0100

    gallivm: Use HAVE_LLVM 0x0301 consistently.

commit af6991e2a3868e40ad599b46278551b794839748
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon May 14 21:49:06 2012 +0100

    gallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.

    Trivial.

commit 6f8a1d75458daae2503a86c6b030ecc4bb494e23
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Mon Apr 2 22:14:15 2012 -0700

    gallivm: Pass in a MCInstrInfo to createMCInstPrinter on llvm-3.1.

    llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 62555b6ed8760545794f83064e27cddcb3ce5284
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Tue Mar 27 21:51:17 2012 -0700

    gallivm: Fix method overriding in raw_debug_ostream.

    Use matching type qualifers to avoid method hiding.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 6a9bd784f4ac68ad0a731dcd39e5a3c39989f2be
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Tue Mar 13 22:40:52 2012 -0700

    gallivm: Fix createOProfileJITEventListener namespace with llvm-3.1.

    llvm-3.1svn r152620 refactored the OProfile profiling code.
    createOProfileJITEventListener was moved from the llvm namespace to the
    llvm::JITEventListener namespace.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit b674955d39adae272a779be85aa1bd665de24e3e
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Mon Mar 5 22:00:40 2012 -0800

    gallivm: Pass in a MCRegisterInfo to MCInstPrinter on llvm-3.1.

    llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
    MCRegisterInfo argument.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 11ab69971a8a31c62f6de74905dbf8c02884599f
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Wed Feb 29 21:20:53 2012 -0800

    Revert "gallivm: Change getExtent and readByte to non-const with llvm-3.1."

    This reverts commit d5a6c17254.

    llvm-3.1svn r151687 makes MemoryObject accessor members const again.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 339960c82d2a9f5c928ee9035ed31dadb7f45537
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon May 14 16:19:56 2012 +0200

    gallivm: (trivial) fix assertion failure for mipmapped 1d textures

    In lp_build_rho, we may end up with a 1-element vector (for mipmapped 1d
    textures), but in this case we require the type to be a non-vector type,
    so need a cast.

commit 9d73edb727bd6d196030dc3026b7bf0c574b3e19
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 10 18:12:07 2012 +0200

    gallivm: prepare for per-quad lod calculations for large vectors

    to be able to handle multiple quads at once in texture sampling and still
    do lod calculations per quad, it is necessary to get the per-quad derivatives
    into the lp_build_rho function.
    Until now these derivative values were just scalars, which isn't going to work.
    So we now use vectors, and since the interface needs to change we also do some
    different (slightly more efficient) packing of the values.
    For 8-wide vectors the packed derivative values for 3 coords would look like
    this, this scales to a arbitrary (multiple of 4) vector size:
    ds1dx ds1dy dt1dx dt1dy ds2dx ds2dy dt2dx dt2dy
    dr1dx dr1dy _____ _____ dr2dx dr2dy _____ _____
    The second vector will be unused for 1d and 2d textures.
    To facilitate future changes the derivative values are put into a struct, since
    quite some functions just pass these values through.
    The generated code seems to be very slightly better for 2d textures (with
    4-wide vectors) than before with sse2 (if you have a cpu with physical 128bit
    simd units - otherwise it's probably not a win).
    v2: suggestions from José, rename variables, add comments, use swizzle helper

commit 0aa21de0d31466dac77b05c97005722e902517b8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 10 18:10:31 2012 +0200

    gallivm: add undefined swizzle handling to lp_build_swizzle_aos

    This is useful for vectors with "holes", it lets llvm choose the most
    efficient shuffle instructions if some elements aren't needed without having to
    worry what elements to manually pick otherwise.

commit 00faf3f370e7ce92f5ef51002b0ea42ef856e181
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri May 4 17:25:16 2012 +0100

    gallivm: Get the LLVM IR optimization passes before JIT compilation.

    MC-JIT engine compiles the module immediately on creation, so the optimization
    passes were being run too late.

    So now we create a target data layout from a string, that matches the
    ABI parameters reported by the compiler.

    The backend optimization passes were always been run, so the performance
    improvement is modest (3% on multiarb mesa demo).

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 40a43f4e2ce3074b5ce9027179d657ebba68800a
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed May 2 16:03:54 2012 +0200

    gallivm: (trivial) fix wrong define used in lp_build_pack2

    should fix stack-smashing crashes.

commit e6371d0f4dffad4eb3b7a9d906c23f1c88a2ab9e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Apr 30 21:25:29 2012 +0200

    gallivm: add perf warnings when not using intrinsics with 256bit vectors

    Helper functions using integer sse2 intrinsics could split the vectors with AVX
    instead of using generic fallback (which should be faster).
    We don't actually expect to hit these paths (hence don't fix them up to actually
    do the vector splitting) so just emit warnings (for those functions where it's
    obvious doing split/intrinsic is faster than using generic path).
    Only emit warnings for 256bit vectors since we _really_ don't expect to hit
    arbitrary large vectors which would affect a lot more functions.
    The warnings do not actually depend on avx since the same logic applies to
    plain sse2 too (but of course again there's _really_ no reason we should hit
    these functions with 256bit vectors without avx).

commit 8a9ea701ea7295181e846c6383bf66a5f5e47637
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue May 1 20:37:07 2012 +0200

    gallivm: split vectors manually for avx in lp_build_pack2 (v2)

    There's 2 reasons for this:
    First, there's a llvm bug (fixed in 3.1) which generates tons of byte
    inserts/extracts otherwise, and second, more importantly, we want to use
    pack intrinsics instead of shuffles.
    We do this in lp_build_pack2 and not the calling code (aos sample path)
    because potentially other callers might find that useful too, even if
    for larger sequences of code using non-native vector sizes it might be
    better to manually split vectors.
    This should boost texture performance in the aos path considerably.
    v2: fix issues with intrinsics types with old llvm

commit 27ac5b48fa1f2ea3efeb5248e2ce32264aba466e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue May 1 20:26:22 2012 +0200

    llvmpipe: refactor lp_build_pack2 (v2)

    prettify, and it's unnecessary to assert when there's no intrinsic due to
    unsupported bit width - the shuffle path will work regardless.
    In contrast lp_build_packs2, should only rely on lp_build_pack2 doing the
    clamping for element sizes for which there is a sse2 intrinsic.
    v2: fix bug spotted by Jose regarding the intrinsic type for packusdw
    on old llvm versions.

commit ddf279031f0111de4b18eaf783bdc0a1e47813c8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue May 1 20:13:59 2012 +0200

    gallivm: add src width check in lp_build_packs2()

    not doing so would skip clamping even if no sse2 pack instruction is
    available, which is incorrect (in theory only, such widths would also always
    hit a (unnecessary) assertion in lp_build_pack2().

commit e7f0ad7fe079975eae7712a6e0c54be4fae0114b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Apr 27 15:57:00 2012 +0200

    gallivm: (trivial) fix crash-causing typo for npot textures with avx

commit 28a9d7f6f655b6ec508c8a3aa6ffefc1e79793a0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Apr 25 19:38:45 2012 +0200

    gallivm: (trivial) remove code mistakenly added twice.

commit d5926537316f8ff67ad0a52e7242f7c5478d919b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Apr 24 21:16:15 2012 +0200

    gallivm: add a new avx aos sample path (v2)

    Try to avoid mixing float and int address calculations. This does texture wrap
    modes with floats, and then the offset calculations still with ints (because
    of lack of precision with floats, though we could do some effort to make it work
    with not too large (16MB) textures).
    This also handles wrap repeat mode with npot-sized textures differently than
    either the old soa or aos int path (likely way faster but untested).
    Otherwise the actual address wrap code is largely similar to the soa path (not
    quite the same as this one also has some int code), it should get used by avx
    soa sampling later as well but doesn't handle more complex address modes yet
    (this will also have the benefit that we can use aos sampling path for all
    texture address modes).
    Generated code for that looks reasonable, but still does not split vectors
    explicitly for fetch/filter which means still get hit by llvm (fixed upstream)
    which generates hundreds of pinsrb/pextrb instead of two shuffles.
    It is not obvious though if it's much of a win over just doing address calcs
    4-wide but with ints, even if it is definitely much less instructions on avx.
    piglit's texwrap seems to look exactly the same but doesn't test
    neither the non-normalized nor the npot cases.
    v2: fix comments, prettify based on Brian's and Jose's feedback.

commit bffecd22dea66fb416ecff8cffd10dd4bdb73fce
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Apr 19 01:58:29 2012 +0200

    gallivm: refactor aos lp_build_sample_image_nearest/linear

    split them up to separate address calculations and fetching/filtering.
    Need this for being able to do 8-wide float address calcs and 4-wide
    fetch/filter later (for avx). Plus the functions were very big scary monsters
    anyway (in particular lp_build_sample_image_linear).

commit a80b325c57529adddcfa367f96f03557725c4773
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Apr 16 17:17:18 2012 +0200

    gallivm: fix lp_build_resize when truncating width but expanding vector size

    Missed this case which I thought was impossible - the assertion for it was
    right after the division by zero...
    (AoS) texture sampling may ask us to do this, for things like 8 4x32int
    vectors to 1 32x8int vector conversion (eventually, we probably don't want
    this to happen).

commit f9c8337caa3eb185830d18bce8b95676a065b1d7
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Apr 14 18:00:59 2012 +0200

    gallivm: fix cube maps with larger vectors

    This makes the branchless cube face selection code work with larger vectors.
    Because the complexity is quite high (cannot really be improved it seems,
    per-face selection would reduce complexity a lot but this leads to errors
    unless the derivatives are calculated all from the same face which almost
    doubles the work to be done) it is still slower than the branching version,
    hence only enable this with large vectors.
    It doesn't actually do per-quad face selection yet (only makes sense with
    matching lod selection, in fact it will select the same face for all pixels
    based on the average of the first four pixels for now) but only different
    shuffles are required to make it work (the branching version actually should
    work with larger vectors too now thanks to the improved horizontal add but of
    course it cannot be extended to really select the face per-quad unless doing
    branching per quad).

commit 7780c58869fc9a00af4f23209902db7e058e8a66
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 30 21:11:12 2012 +0100

    llvmpipe: (trivial) fix compiler warning

    and also clarify comment regarding availability of popcnt instruction.

commit a266dccf477df6d29a611154e988e8895892277e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 30 14:21:07 2012 +0100

    gallivm: remove unneeded members in lp_build_sample_context

    Minor cleanup, the texture width, height, depth aren't accessed in their
    scalar form anywhere. Makes it more obvious those values should probably be
    fetched already vectorized (but this requires more invasive changes)...

commit b678c57fb474e14f05e25658c829fc04d2792fff
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 29 15:53:55 2012 +0100

    gallivm: add a helper for concatenating vectors

    Similar to the extract_range helper intended to get around slow code generated
    by llvm for 128bit insertelements.
    Concatenating two 128bit vectors this way will result in a single vinsertf128
    operation rather than two 64bit stores plus one 128bit load, though it might be
    mildly useful for other purposes as well.

commit 415ff228bcd0cf5e44a4c15350a661f0f5520029
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 19:41:15 2012 +0100

    gallivm: add a custom 2x8f->1x16ub avx conversion path

    Similar to the existing 4x4f->1x16ub sse2 path, shaves off a couple
    instructions (min/max mostly) because it relies on pack intrinsics clamping.

commit 78c08fc89f8fbcc6dba09779981b1e873e2a0299
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 18:44:07 2012 +0100

    gallivm: add avx arithmetic intrinsics

    Add all avx intrinsics for arithmetic functions (with the exception
    of the horizontal add function which needs another look).
    Seems to pass basic tests.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit a586caa2800aa5ce54c173f7c0d4fc48153dbc4e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 15:31:35 2012 +0100

    gallivm: add avx logic intrinsics

    Add the blend intrinsics for 8-wide float and 4-wide double vectors.
    Since we lack 256bit int instructions these are used for int vectors as well,
    though obviously not for byte or word element values.
    The comparison intrinsics aren't extended for avx since these are only used
    for pre-2.7 llvm versions.

commit 70275e4c13c89315fc2560a4c488c0e6935d5caf
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 00:40:53 2012 +0100

    gallivm: new helper function for extract shuffles.

    Based on José's idea as we can need that in a couple places.
    Note that such shuffles should not be used lightly, since data layout
    of <4 x i8> is different to <16 x i8> for instance, hence might cause
    data rearrangement.

commit 4d586dbae1b0c55915dda1759d2faea631c0a1c2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 27 18:27:25 2012 +0100

    gallivm: (trivial) don't overallocate shuffle variable

    using wrong define meant huge array...

commit 06b0ec1f6d665d98c135f9573ddf4ba04b2121ad
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 27 17:54:20 2012 +0100

    gallivm: don't do per-element extract/insert for vector element resize

    Instead of doing per-element extract/insert if the src vectors
    and dst vector differ in total size (which generates atrocious code)
    first change the src vectors size by using shuffles to destination
    vector size.
    We can still do better than that on AVX for packing to color buffer
    (by exploiting pack intrinsics characteristics hence eleminating the
    need for some clamps) but this already generates much better code.

    v2: incorporate feedback from José, Keith and use shuffle instead of
    bitcasts/extracts. Due to llvm deficiencies the latter cause all data
    to get moved to GPRs and back in pieces (even though the data in the
    regs actually stays the same...).

commit c9970d70e05f95d3f52fe7d2cd794176a52693aa
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 23 19:33:19 2012 +0000

    gallivm: fix bug in simple position interpolation

    Accidental use of position attribute instead of just pixel coordinates.
    Caused failures in piglit glsl-fs-ceil and glsl-fs-floor.

commit d0b6fcdb008d04d7f73d3d725615321544da5a7e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 23 15:31:14 2012 +0000

    gallivm: fix emission of ceil opcode

    lp_build_ceil seems more appropriate than lp_build_trunc.
    This seems to be never hit though someone performs some ceil
    to floor magic.

commit d97fafed7e62ffa6bf76560a92ea246a1a26d256
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 22 11:46:52 2012 +0000

    gallivm: new vectorized path for cubemap calculations

    should be faster when adapted to multiple quads as only selection masks need to be different.
    The code is more or less a per-pixel version adapted to only do it per quad.
    A per pixel version would be much simpler (could drop 2 selects, 6 broadcasts and the messy
    horizontal add of 3 vectors at the expense of only 2 more absolute value instructions -
    would also just work for arbitary large vectors).
    This version doesn't yet work with larger vectors because the horizontal add isn't adjusted
    to be able to work with 2x4 vectors (and also because face selection wouldn't be done per
    quad just per block though that would be only a correctness issue just as with lod selection).
    The downside is this code is quite a bit slower. On a Core2 it can be sped up by disabling the
    hw blend instructions for selection and using logicop fallbacks instead, but it is still slower
    than the old code, hence leave that in for now. Probably will chose one or the other version
    based on vector length in the end.

commit b375fbb18a3fd46859b7fdd42f3e9908ea4ff9a3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 21 14:42:29 2012 +0000

    gallivm: fix optimized occlusion query intrinsic name

commit a9ba0a3b611e48efbb0e79eb09caa85033dbe9a2
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Mar 21 16:19:43 2012 +0000

    draw,gallivm,llvmpipe: Call gallivm_verify_function everywhere.

commit f94c2238d2bc7383e088b8845b7410439a602071
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 20 18:54:10 2012 +0000

    gallivm: optimize calculations for cube maps a bit

    this does some more vectorized calculations and uses horizontal adds if possible.
    A definite win with sse3 otherwise it doesn't seem to make much of a difference.
    In any case this is arithmetically identical, cannot handle larger vectors.
    Should be useful as a reference point against larger vector version later...

commit 21a2c1cf3c8e1ac648ff49e59fdc0e3be77e2ebb
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 20 15:16:27 2012 +0000

    llvmpipe: slight optimization of occlusion queries

    using movmskps when available.
    While this is slightly better for cpus without popcnt we should
    really sum the vectors ourselves (it is also possible to cast to i4 before
    doing the popcnt but that doesn't help that much neither since llvm
    is using some optimized popcnt version for i32)

commit 5ab5a35f216619bcdf55eed52b0db275c4a06c1b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 20 13:32:11 2012 +0000

    llvmpipe: fix occlusion queries with larger vectors

    need to adjust casts etc.

commit ff95e6fdf5f16d4ef999ffcf05ea6e8c7160b0d5
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Mar 19 20:15:25 2012 +0000

    gallivm: Restore optimization passes.

commit 57b05b4b36451e351659e98946dae27be0959832
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 19:34:22 2012 +0000

    llvmpipe: use existing min2 macro

commit bc9a20e19b4f600a439f45679451f2e87cd4b299
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 19:07:27 2012 +0000

    llvmpipe: add some safeguards against really large vectors

    As per José's suggestion, prevent things from blowing up if some cpu
    would have 1024bit or larger vectors.

commit 0e2b525e5ca1c5bbaa63158bde52ad1c1564a3a9
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 18:31:08 2012 +0000

    llvmpipe: fix mask generation for uberwide vectors

    this was the only piece preventing 16-wide vectors from working
    (apart from the LP_MAX_VECTOR_WIDTH define that is), which is the maximum
    as we don't get more pixels in the fragment shader at once.
    Hence adjust that so things could be tested properly with that size
    even though there seems to be no practical value.

commit 3c8334162211c97f3a11c7f64e9e5a2a91ad9656
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 18:19:41 2012 +0000

    llvmpipe: fix the simple interpolation method with larger vectors

    so both methods actually _really_ work now. Makes textures look
    nice with larger vectors...

commit 1cb0464ef8871be1778d43b0c56adf9c06843e2d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 17:26:35 2012 +0000

    llvmpipe: fix mask generation and position interpolation with 8-wide vectors

    trivial bugs, with these things start to look somewhat reasonable.
    Textures though have some swizzling issues it seems.

commit 168277a63ef5b72542cf063c337f2d701053ff4b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 16:04:03 2012 +0000

    llvmpipe: don't overallocate variables

    we never have more than 16 (stamp size) / 4 (minimum possible vector size).
    (With larger vectors those variables are still overallocated a bit.)

commit 409b54b30f81ed0aa9ed0b01affe15c72de9abd2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 15:56:48 2012 +0000

    llvmpipe: add some 32f8 formats to lp_test_conv

    Also add the ability to handle different sized vectors.

commit 55dcd3af8366ebdac0af3cdb22c2588f24aa18ce
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 15:47:27 2012 +0000

    gallivm: handle different sized vectors in conversion / pack

    only fully generic path for now (extract/insert per element).

commit 9c040f78c54575fcd94a8808216cf415fe8868f6
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sun Mar 18 00:58:28 2012 +0100

    llvmpipe: fix harmless use of unitialized values

commit 551e9d5468b92fc7d5aa2265db9a52bb1e368a36
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 16 23:31:21 2012 +0100

    gallivm: drop special path in extract_broadcast with different sized vectors

    Not needed, llvm can handle shuffles with different sized result vector just
    fine. Should hopefully generate the same code in the end, but simpler IR.

commit 44da531119ffa07a421eaa041f63607cec88f6f8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 16 23:28:49 2012 +0100

    llvmpipe: adapt interpolation for handling multiple quads at once

    this is still WIP there are actually two methods possible not quite
    sure what makes the most sense, so there's code for both for now:
    1) the iterative method as used before (compute attrib values at upper left
    corner of stamp and upper left corner of each quad initially).
    It is improved to handle more than one quad at once, and also do some more vectorized
    calculations initially for slightly better code - newer cpus have full throughput with
    4 wide float vectors, hence don't try to code up a path which might be faster if there's
    just one channel active per attribute.
    2) just do straight interpolation for each pixel.
    Method 2) is more work per quad, but less initially - if all quads are executed
    significantly more overall though. But this might change with larger vector lengths.
    This method would also be needed if we'd do some kind of active quad merging when
    operating on multiple quads at once.
    This path contains some hack to force llvm to generate better code, it is still far
    from ideal though, still generates far too many unnecessary register spills/reloads.
    Both methods should work with different sized vectors.
    Not very well tested yet, still seems to work with four-wide vectors, need changes
    elsewhere to be able to test with wider vectors.

commit be5d3e82e2fe14ad0a46529ab79f65bf2276cd28
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Mar 16 20:59:37 2012 +0000

    draw: Cleanup.

commit f85bc12c7fbacb3de2a94e88c6cd2d5ee0ec0e8d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Mar 16 20:43:30 2012 +0000

    gallivm: More module compilation refactoring.

commit d76f093198f2a06a93b2204857e6fea5fd0b3ece
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Mar 15 21:29:11 2012 +0000

    llvmpipe: Use gallivm_compile/free_function() in linear code.

    Should had been done before.

commit 122e1adb613ce083ad739b153ced1cde61dfc8c0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 13 14:47:10 2012 +0100

    llvmpipe: generate partial pixel mask for multiple quads

    still works with one quad, cannot be tested yet with more
    At least for now always fixed order with multiple quads.

commit 4c4f15081d75ed585a01392cd2dcce0ad10e0ea8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 8 22:09:24 2012 +0100

    llvmpipe: refactor state setup a bit

    Refactor to make it easier to emit (and potentially later fetch in fs)
    coefficients for multiple attributes at once.
    Need to think more about how to make this actually happen however, the
    problem is different attributes can have different interpolation modes,
    requiring different handling in both setup and fs (though linear and
    perspective handling is close).

commit 9363e49722ff47094d688a4be6f015a03fba9c79
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 8 19:23:23 2012 +0100

    llvmpipe: vectorize tri offset calc

    cuts number of instructions in quad-offset-factor from 107 to 75.
    This code actually duplicated the (scalar) code calculating the determinant
    except it used different vertex order (leading to different sign but it doesn't
    matter) hence llvm could not have figured out it's the same (of course with
    determinant vectorized in the other place that wouldn't have worked any longer
    neither).
    Note this particular piece doesn't actually vectorize well, not many arithmetic
    instructions left but tons of shuffle instructions...
    Probably would need to work on n tris at a time for better vectorization.

commit 63169dcb9dd445c94605625bf86d85306e2b4297
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 8 03:11:37 2012 +0100

    llvmpipe: vectorize some scalar code in setup

    reduces number of arithmetic instructions, and avoids loading
    vector x,y values twice (once as scalars once as vectors).
    Results in a reduction of instructions from 76 to 64 in fs setup for glxgears
    (16%) on a cpu with sse41.
    Since this code uses vec2 disguised as vec4, on old cpus which had physical
    64bit sse units (pre-Core2) it probably is less of a win in practice (and if
    you have no vectors you can only hope llvm eliminates the arithmetic for
    unneeded elements).

commit 732ecb877f951ab89bf503ac5e35ab8d838b58a1
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 7 00:32:24 2012 +0100

    draw: fix clipping

    bug introduced by 4822fea3f0440b5205e957cd303838c3b128419c broke
    clipping pretty badly (verified with lineclip test)

commit ef5d90b86d624c152d200c7c4056f47c3c6d2688
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 6 23:38:59 2012 +0100

    draw: don't store vertex header per attribute

    storing the vertex header once per attribute is totally unnecessary.
    Some quick look at the generated assembly says llvm in fact cannot optimize
    away the additional stores (maybe due to potentially aliasing pointers
    somewhere).
    Plus, this makes the code cleaner and also allows using a vector "or"
    instead of scalar ones.

commit 6b3a5a57b0b9850854cfbd7b586e4e50102dda71
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 6 19:11:01 2012 +0100

    draw: do the per-vertex "boolean" clipmask "or" with vectors

    no point extracting the values and doing it per component.
    Doesn't help that much since we still extract the values elsewhere anyway.

commit 36519caf1af40e4480251cc79a2d527350b7c61f
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 2 22:27:01 2012 +0100

    gallivm: fix lp_build_extract_broadcast with different sized vectors

    Fix the obviously wrong argument, so it doesn't blow up.

commit 76d0ac3ad85066d6058486638013afd02b069c58
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Mar 2 12:16:23 2012 +0000

    draw: Compile per module and not per function (WIP).

    Enough to get gears w/ LLVM draw + softpipe to work on AVX doing:

      GALLIUM_DRIVER=softpipe SOFTPIPE_USE_LLVM=yes glxgears

    But still hackish -- will need to rethink and refactor this.

commit 78e32b247d2a7a771be9a1a07eb000d1e54ea8bd
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 29 12:01:05 2012 +0000

    llvmpipe: Remove lp_state_setup_fallback.

    Never used.

commit 6895d5e40d19b4972c361e8b83fdb7eecda3c225
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Feb 27 19:14:27 2012 +0000

    llvmpipe: Don't emit EMMS on x86

    We already take precautions to ensure that LLVM never emits MMX code.

commit 4822fea3f0440b5205e957cd303838c3b128419c
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Feb 29 15:58:19 2012 +0100

    draw: modifications for larger vector sizes

    We want to be able to use larger vectors especially for running the vertex
    shader. With this patch we build soa vectors which might have a different
    length than 4.
    Note that aos structures really remain the same, only when aos structures
    are converted to soa potentially different sized vectors are used.
    Samplers probably don't work yet, didn't look at them.
    Testing done:
    glxgears works with both 128bit and 256bit vectors.

commit f4950fc1ea784680ab767d3dd0dce589f4e70603
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 29 15:51:57 2012 +0100

    gallivm: override native vector width with LP_NATIVE_VECTOR_WIDTH env var for debug

commit 6ad6dbf0c92f3bf68ae54e5f2aca035d19b76e53
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 29 15:51:24 2012 +0100

    draw: allocate storage with alignment according to native vector width

commit 7bf0e3e7c9bd2469ae7279cabf4c5229ae9880c1
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Feb 24 19:06:08 2012 +0000

    gallivm: Fix comment grammar.

    Was missing several words. Spotted by Roland.

commit b20f1b28eb890b2fa2de44a0399b9b6a0d453c52
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 19:22:09 2012 +0000

    gallivm: Use MC-JIT on LLVM 3.1 + (i.e, SVN)

    MC-JIT

    Note: MC-JIT is still WIP. For this to work correctly it requires
    LLVM changes which are not yet upstream.

commit b1af4dfcadfc241fd4023f4c3f823a1286d452c0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Feb 23 20:03:15 2012 +0100

    llvmpipe: use new lp_type_width() helper in lp_test_blend

commit 04e0a37e888237d4db2298f31973af459ef9c95f
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Feb 23 19:50:34 2012 +0100

    llvmpipe: clean up lp_test_blend a little

    Using variables just sized and aligned right makes it a bit more obvious
    what's going on.
    The test still only tests vector length 4.
    For AoS anything else probably isn't going to work.
    For SoA other lengths should work (at least with floats).

commit e61c393d3ec392ddee0a3da170e985fda885a823
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 17:48:30 2012 +0000

    gallivm: Ensure vector width consistency.

    Instead of assuming that everything is the max native size.

commit 330081ac7bc41c5754a92825e51456d231bf84dd
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 17:44:14 2012 +0000

    draw: More simd vector width consistency fixes.

commit d90ca002753596269e37297e2e6c139b19f29f03
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 17:43:00 2012 +0000

    gallivm: Remove unused lp_build_int32_vec4_type() helper.

commit cae23417824d75869c202aaf897808d73a2c1db0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Feb 23 17:32:16 2012 +0100

    gallivm: use global variable for native vector width instead of define

    We do not know the simd extensions (and hence the simd width we should use)
    available at compile time.
    At least for now keep a define for maximum vector width, since a global
    variable obviously can't be used to adjust alignment of automatic stack
    variables.
    Leave the runtime-determined value at 128 for now in all cases.

commit 51270ace6349acc2c294fc6f34c025c707be538a
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 15:41:02 2012 +0000

    gallivm: Add a hunk inadvertedly lost when rebasing.

commit bf256df9cfdd0236637a455cbaece949b1253e98
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 14:24:23 2012 +0000

    llvmpipe: Use consistent vector width in depth/stencil test.

commit 5543b0901677146662c44be2cfba655fd55da94b
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 14:19:59 2012 +0000

    draw: Use a consistent the vector register width.

    Instead of 4x32 sometimes, LP_NATIVE_VECTOR_WIDTH other times.

commit eada8bbd22a3a61f549f32fe2a7e408222e5c824
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 12:08:04 2012 +0000

    gallivm: Remove garbagge collection.

    MC-JIT will require one compilation per module (as opposed to one
    compilation per function), therefore no state will be shared,
    eliminating the need to do garbagge collection.

commit 556697ea0ed72e0641851e4fbbbb862c470fd7eb
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 10:33:41 2012 +0000

    gallivm: Move all native target initialization to lp_set_target_options().

commit c518e8f3f2649d5dc265403511fab4bcbe2cc5c8
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:52:32 2012 +0000

    llvmpipe: Create one gallivm instance for each test.

commit 90f10af8920ec6be6f2b1e7365cfc477a0cb111d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:48:08 2012 +0000

    gallivm: Avoid LLVMAddGlobalMapping() in lp_bld_assert().

    Brittle, complex, and unecesary. Just use function pointer constant.

commit 98fde550b33401e3fe006af59db4db628bcbf476
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:21:26 2012 +0000

    gallivm: Add a lp_build_const_func_pointer() helper.

    To be reused in all places where we want to call C code.

commit 6cfedadb62c2ce5af8d75969bc95a607f3ece118
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:44:41 2012 +0000

    gallivm: Cleanup/simplify lp_build_const_string_variable.

    - Move to lp_bld_const where it belongs
    - Rename to lp_build_const_string
    - take the length from the argument (and don't count the zero terminator twice)
    - bitcast the constant to generic i8 *

commit db1d4018c0f1fa682a9da93c032977659adfb68c
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 11:52:17 2012 +0000

    gallivm: Set NoFramePointerElimNonLeaf to true where supported.

commit 088614164aa915baaa5044fede728aa898483183
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Feb 22 19:38:47 2012 +0100

    llvmpipe: pass in/out pointers rather scalar floats in lp_bld_arit

    we don't want llvm to potentially optimize away the vectors (though it doesn't
    seem to currently), plus we want to be able to handle in/out vectors of arbitrary
    length.

commit 3f5c4e04af8a7592fdffa54938a277c34ae76b51
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Feb 21 23:22:55 2012 +0100

    gallivm: fix lp_build_sqrt() for vector length 1

    since we optimize away vectors with length 1 need to emit intrinsic
    without vector type.

commit 79d94e5f93ed8ba6757b97e2026722ea31d32c06
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 22 17:00:46 2012 +0000

    llvmpipe: Remove lp_test_round.

commit 81f41b5aeb3f4126e06453cfc78990086b85b78d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Feb 21 23:56:24 2012 +0100

    llvmpipe: subsume lp_test_round into lp_test_arit

    Much simpler, and since the arguments aren't passed as 128bit values can run
    on any arch.
    This also uses the float instead of the double versions of the c functions
    (which probably was the intention anyway).
    In contrast to lp_test_round the output is much less verbose however.
    Tested vector width of 32 to 512 bits - all pass except 32 (length 1) which
    crashes in lp_build_sqrt() due to wrong type.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit 945b338b421defbd274481d8c4f7e0910fd0e7eb
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 22 09:55:03 2012 +0000

    gallivm: Centralize the function compilation logic.

    This simplifies a lot of code.

    Also doing this in a central place will make it easier to carry out the
    changes necessary to use MC-JIT in the future.

gallivm: Fix typo in explicit derivative shuffle.

Trivial.

draw: make DEBUG_STORE work again

adapt to lp_build_printf() interface changes

Reviewed-by: José Fonseca <jfonseca@vmware.com>

draw: get rid of vecnf_from_scalar()

just use lp_build_broadcast directly (cannot assign a name but don't really
need it, vecnf_from_scalar() was producing much uglier IR due to using
repeated insertelement instead of insertelement+shuffle).

Reviewed-by: José Fonseca <jfonseca@vmware.com>

llvmpipe: fix typo in complex interpolation code

Fixes position interpolation when using complex mode
(piglit fp-fragment-position and similar)

Reviewed-by: José Fonseca <jfonseca@vmware.com>

draw: fix clipvertex/position storing again

This appears to be the result of a bad merge.
Fixes piglit tests relying on clipping, like a lot of the interpolation tests.

Reviewed-by: José Fonseca <jfonseca@vmware.com>

gallivm: Fix explicit derivative manipulation.

Same counter variable was being used in two nested loops. Use more
meanigful variable names for the counter to fix and avoid this.

gallivm: Prevent buffer overflow in repeat wrap mode for NPOT.

Based on Roland's patch, discussion, and review .

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallivm: Fix dims for TGSI_TEXTURE_1D in emit_tex.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallivm: Fix explicit volume texture derivatives.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallivm: fix 1d shadow texture sampling

Always r coordinate is used, hence need 3 coords not two
(the second one is unused).

Reviewed-by: José Fonseca <jfonseca@vmware.com>

gallivm: Enable AVX support without MCJIT, where available.

For now, this just enables AVX on Windows for testing.  If the code is
stable then we might consider prefering the old JIT wherever possible.

No change elsewhere.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-07-17 13:42:39 +01:00
Olivier Galibert c790c2c759 llvmpipe: Add vertex id support.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-06-19 14:40:44 -06:00
Olivier Galibert 46931ecf48 llvmpipe: Simplify and fix system variables fetch.
The system array values concept doesn't really because it expects the
system values to be fixed per call, which is wrong for gl_VertexID and
iffy for gl_SampleID.  So this patch does two things:

- kill the array, have emit_fetch_system_value directly pick the
  values it needs (only gl_InstanceID for now, as the previous code)

- correctly handle the expected type in emit_fetch_system_value

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-06-19 14:40:44 -06:00
James Benton f34e2f484b llvmpipe: Implement cylindrical wrapping.
Tested against mesa demos cylwrap and dx9 DCT address.exe which now passes 100%.

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-06-18 17:55:05 +01:00
José Fonseca 7a75e7d6e8 llvmpipe: Fix alpha testing precision on rgba8 formats.
This is a long standing problem, that recently surfaced with the change
to enable perspective correct color interpolation.

A fix for all possible formats is left to the future.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-05-22 19:23:49 +01:00
Olivier Galibert 982df3c1a5 llvmpipe: Color slot interpolation can be flat or perspective, not linear.
Fixes a bunch of glsl 1.10 interpolation piglit tests.

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-16 13:12:04 +01:00
Brian Paul 550de24c17 llvmpipe: add cast to silence warning 2012-05-11 16:16:11 -06:00
Marek Olšák bb4c5d72d7 Merge branch 'gallium-userbuf'
Conflicts:
	src/gallium/docs/source/screen.rst
	src/gallium/drivers/nv50/nv50_state.c
	src/gallium/include/pipe/p_defines.h
	src/mesa/state_tracker/st_draw.c
2012-05-11 16:38:13 +02:00
James Benton 0b0f4628d6 llvmpipe: Added support for color masks in AoS blending.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-02 10:12:48 +01:00
Marek Olšák 0b7d48cbad gallium: add void *user_buffer to pipe_constant_buffer
This reduces CPU overhead when updating constants.
2012-04-30 01:18:48 +02:00
Marek Olšák 507337864f gallium: change set_constant_buffer to be UBO-friendly 2012-04-30 01:09:57 +02:00
Tom Stellard 6b63e25b3d gallium: Prefix #defines in tgsi_exec.h with TGSI_ 2012-01-30 13:37:00 -05:00
José Fonseca ec4d691474 llvmpipe: Update for TGSI_INTERPOLATE_COLOR.
Not thoroughly tested nor reviewed. But should at least prevent the
assertion failure.
2012-01-11 17:35:14 +00:00
Dave Airlie 24668a38d1 llvmpipe: fix blending for intensity formats
This fixes the piglit fbo-blending-formats test for standard, ARB_texture_float
and EXT_texture_snorm.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-12-31 12:37:48 +00:00
José Fonseca 6cf7245f69 llvmpipe: Trim the fragment shader cached based on LLVM IR instruction count.
Number of fragment shader variants is not very representative of the
memory used by LLVM, neither is number of shader instructions, as often
texture sampling constitutes most of the generated code.

This change adds an additional trim criteria: least recently used
fragment shader variants will be freed until the total number of LLVM IR
instruction falls below a specified threshold.

Reviewed-by: Brian Paul <brianp@vmware.com>
2011-12-08 17:59:33 +00:00
José Fonseca f32c7232a8 llvmpipe,draw,gallivm: Ensure we don't walk beyond the end of the shader variant list.
u_simple_list.h uses a sentinel element, and not a NULL element. So
ensure list is not empty when reducing the list of shader variants.

Something I noticed while trying to free variants more aggressively.

Reviewed-by: Brian Paul <brianp@vmware.com>
2011-12-08 17:59:33 +00:00
José Fonseca d7edd5db31 llvmpipe: Remove unused variables.
Reviewed-by: Brian Paul <brianp@vmware.com>
2011-11-14 10:06:01 +00:00
José Fonseca b8d1242c0b llvmpipe: Prevent segfault during fs variant cache shrinking. 2011-09-29 17:43:38 +01:00
Brian Paul 7e86d9bd8c llvmpipe: implement TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=33284
2011-01-19 18:46:59 -07:00
Brian Paul 652901e95b Merge branch 'draw-instanced'
Conflicts:
	src/gallium/auxiliary/draw/draw_llvm.c
	src/gallium/drivers/llvmpipe/lp_state_fs.c
	src/glsl/ir_set_program_inouts.cpp
	src/mesa/tnl/t_vb_program.c
2011-01-15 10:24:08 -07:00
Vinson Lee 442fcd0620 llvmpipe: Remove unnecessary headers. 2010-12-22 00:55:41 -08:00
Brian Paul 1d6f3543a0 gallivm/llvmpipe: implement system values and instanceID 2010-12-08 19:04:11 -07:00
Brian Paul efc82aef35 gallivm/llvmpipe: squash merge of the llvm-context branch
This branch defines a gallivm_state structure which contains the
LLVMBuilderRef, LLVMContextRef, etc.  All data structures built with
this object can be periodically freed during a "garbage collection"
operation.

The gallivm_state object has to be passed to most of the builder
functions where LLVMBuilderRef used to be used.

Conflicts:
	src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
	src/gallium/drivers/llvmpipe/lp_state_setup.c
2010-11-30 16:35:12 -07:00
Brian Paul 2996ce72b1 llvmpipe: add a cast 2010-11-02 11:53:14 -06:00
Brian Paul 9fbf744389 llvmpipe: assign context's frag shader pointer before using it
The call to draw_bind_fragment_shader() was using the old fragment
shader.  This bug would have really only effected the draw module's
use of the fragment shader in the wide point stage.
2010-11-02 11:50:37 -06:00
José Fonseca 85a08f8fc7 gallivm: Remove the EMMS opcodes.
Unnecessary now that lp_set_target_options() successful disables MMX code
emission.
2010-10-28 20:42:02 +01:00
Keith Whitwell 0072acd447 Merge remote branch 'origin/master' into lp-setup-llvm
Conflicts:
	src/gallium/drivers/llvmpipe/lp_setup_coef.c
	src/gallium/drivers/llvmpipe/lp_setup_coef.h
	src/gallium/drivers/llvmpipe/lp_setup_coef_intrin.c
	src/gallium/drivers/llvmpipe/lp_setup_point.c
	src/gallium/drivers/llvmpipe/lp_setup_tri.c
	src/gallium/drivers/llvmpipe/lp_state_derived.c
	src/gallium/drivers/llvmpipe/lp_state_fs.h
2010-10-17 19:09:42 -07:00
José Fonseca a0add0446c llvmpipe: Fix bad refactoring.
'i' and 'chan' have random values here, which could cause a buffer
overflow in debug builds, if chan > 4.
2010-10-17 09:58:04 -07:00
Keith Whitwell ac98519c4e llvmpipe: validate color outputs against key->nr_cbufs 2010-10-15 14:49:13 +01:00
Keith Whitwell ffab84c9a2 llvmpipe: check shader outputs are non-null before using 2010-10-15 14:49:13 +01:00
Keith Whitwell 0a1c900103 llvmpipe: don't pass frontfacing as a float 2010-10-15 13:27:47 +01:00
Brian Paul 3d7479d705 llvmpipe: code to dump bytecode to file (disabled) 2010-10-14 17:28:24 -06:00
Keith Whitwell f0bd76f28d llvmpipe: don't try to emit non-existent color outputs 2010-10-14 14:08:20 +01:00
José Fonseca 95c18abb03 llvmpipe: Unbreak Z32_FLOAT.
Z32_FLOAT uses <4 x float> as intermediate/destination type,
instead of <4 x i32>.

The necessary bitcasts got removed with commit
5b7eb868fd

Also use depth/stencil type and build contexts consistently, and
make the depth pointer argument a ordinary <i8 *>, to catch this
sort of issues in the future (and also to pave way for Z16 and
Z32_FLOAT_S8_X24 support).
2010-10-13 15:25:15 +01:00
José Fonseca 986cb9d5cf llvmpipe: Use lp_tgsi_info. 2010-10-11 13:06:25 +01:00
José Fonseca 307df6a858 gallivm: Cleanup the rest of the flow module. 2010-10-09 21:39:14 +01:00
José Fonseca d45c379027 gallivm: Remove support for Phi generation.
Simply rely on mem2reg pass. It's easier and more reliable.
2010-10-09 20:14:03 +01:00
José Fonseca cc40abad51 gallivm: Don't generate Phis for execution mask. 2010-10-09 12:55:31 +01:00
Keith Whitwell 5b7eb868fd llvmpipe: clean up shader pre/postamble, try to catch more early-z
Specifically, can do early-depth-test even when alpahtest or
kill-pixel are active, providing we defer the actual z write until the
final mask is avaialable.

Improves demos/fire.c especially in the case where you get close to
the trees.
2010-10-09 11:44:45 +01:00
Keith Whitwell aa4cb5e2d8 llvmpipe: try to be sensible about whether to branch after mask updates
Don't branch more than once in quick succession.  Don't branch at the
end of the shader.
2010-10-09 11:44:45 +01:00
Keith Whitwell d2cf757f44 gallivm: specialized x8z24 depthtest path
Avoid unnecessary masking of non-existant stencil component.
2010-10-09 11:44:09 +01:00