Commit Graph

939 Commits

Author SHA1 Message Date
Zack Rusin f313b0c850 gallivm: cleanup the gs interface
Instead of void pointers use a base interface.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Roland Scheidegger 450950c57a gallivm: bring back optimized but incorrect float to smallfloat optimizations
Conceptually the same as previously done in float_to_half.
Should cut down number of instructions from 14 to 10 or so, but
will promote some NaNs to Infs, so it's disabled.
It gets a bit tricky though handling all the cases correctly...
Passes basic tests either way (though there are no tests testing special
cases, but some manual tests injecting them seemed promising).

v2: style and comment fixes suggested by Jose

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-02 18:24:31 +02:00
Roland Scheidegger 3febc4a1cd gallivm: consolidate code for float-to-half and float-to-packed conversion.
This replaces the existing float-to-half implementation.
There are definitely a couple of differences - the old implementation
had unspecified(?) rounding behavior, and could at least in theory
construct Inf values out of NaNs. NaNs and Infs should now always be
properly propagated, and rounding behavior is now towards zero
(note this means too large but non-Infinity values get propagated to max
representable value, not Infinity).
The implementation will definitely not match util code, however (which
does nearest rounding, which also means too large values will get
propagated to Infinity).

Also fix a bogus round mask probably leading to rounding bugs...
v2: fix a logic bug in handling infs/nans.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-02 18:24:31 +02:00
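The round-toward-zero behaviour described in the commit above can be illustrated with a minimal sketch. It assumes inputs that fit in float32, and the helper name `float_to_half_rtz` is hypothetical; it is a scalar model of the described semantics, not the gallivm implementation.

```python
import struct


def float_to_half_rtz(value):
    """Convert a float32-representable value to a 16-bit half pattern,
    rounding toward zero (illustrative helper, not the gallivm code)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    mant = bits & 0x7FFFFF

    if exp == 0xFF:
        # Inf stays Inf and NaN stays NaN; a NaN is never turned into Inf.
        return sign | (0x7C00 if mant == 0 else 0x7E00)

    half_exp = exp - 127 + 15
    if half_exp >= 31:
        # Finite but too large: clamp to the largest finite half (65504).
        return sign | 0x7BFF
    if half_exp <= 0:
        if half_exp < -10:
            return sign  # too small even for a half denormal
        # Half denormal: restore the implicit one, then truncate.
        return sign | ((mant | 0x800000) >> (14 - half_exp))
    # Normal case: drop the low 13 mantissa bits (truncation).
    return sign | (half_exp << 10) | (mant >> 13)
```

With nearest rounding, 65535.0 would become Infinity; with truncation it stays at the largest finite half, which is the difference from the util code that the commit message points out.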
Roland Scheidegger 9b329f4c09 gallivm: fix signed small float to float conversion
Introduced by 5f41e08cf3,
just a silly typo.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=62921.
2013-04-02 13:21:07 +02:00
Adam Jackson e26d5940ff gallivm: Minor comment cleanup
Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-04-01 09:45:38 -04:00
Roland Scheidegger 5f41e08cf3 gallivm: consolidate some half-to-float and r11g11b10-to-float code
Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-29 16:39:40 +01:00
Zack Rusin f20f981553 gallivm: Implement the breakc instruction
Required by more modern examples. Like BRK but with a condition.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin b66ffcf2f8 gallivm: implement implicit primitive flushing
TGSI semantics currently require an implicit endprim at the end
of GS if an ending primitive hasn't been emitted.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Zack Rusin e96f4e3b85 gallium/llvm: implement geometry shaders in the llvm paths
This commit implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00
Brian Paul eb92f89587 gallium: undef PACKAGE_* macros to silence warnings
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-25 12:24:11 -06:00
Roland Scheidegger 92b8a37fdf gallivm: move code for dealing with rgb9e5 and r11g11b10 formats to own file
This is really not generic conversion stuff and the code very particular to
these formats.
2013-03-24 22:54:45 +01:00
Roland Scheidegger b50e362dbb gallivm: Add code for rgb9e5 shared exponent format to float conversion
And use this (and the code for r11g11b10 packed float to float conversion)
in the soa texturing code (the generated code looks quite good).
Should be an order of magnitude faster probably than using the fallback
(not measured).
Tested with piglit texwrap GL_EXT_packed_float and
GL_EXT_texture_shared_exponent respectively (didn't find much else using
it).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-24 02:09:02 +01:00
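For reference, the rgb9e5 shared exponent decoding mentioned in the commit above can be sketched in scalar form. This follows the layout from GL_EXT_texture_shared_exponent (9-bit mantissas for red, green and blue in the low bits, a 5-bit shared exponent in the top bits, exponent bias 15); the function name is hypothetical and the code is not taken from gallivm.

```python
def rgb9e5_to_float(packed):
    """Decode one 32-bit rgb9e5 texel into an (r, g, b) tuple of floats."""
    exponent = (packed >> 27) & 0x1F
    # Bias of 15 plus 9 mantissa bits: each channel is mantissa * 2^(e - 24).
    scale = 2.0 ** (exponent - 15 - 9)
    red = packed & 0x1FF
    green = (packed >> 9) & 0x1FF
    blue = (packed >> 18) & 0x1FF
    return (red * scale, green * scale, blue * scale)


# Example: exponent 16 gives a scale of 2^-8, so mantissa 256 decodes to 1.0.
texel = (16 << 27) | (64 << 18) | (128 << 9) | 256
print(rgb9e5_to_float(texel))
```

The format has no sign bit, Inf or NaN, which is why the decode needs no special cases.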
Roland Scheidegger b101a094b5 llvmpipe: add EXT_packed_float render target format support
New conversion code to handle conversion from/to r11g11b10 AoS to/from
SoA floats, and also add code for conversion from rgb9e5 AoS to float SoA
(which works pretty much the same as r11g11b10 except for the packing).
(This code should also be used for texture sampling instead of
relying on u_format conversion but it's not yet, so rgb9e5 is unused.)
Unfortunately a crazy amount of hacks is necessary to get the conversion
code running in llvmpipe's generate_unswizzled_blend, which isn't well
suited for formats where the storage representation has nothing to do
with what's needed for blending (moreover, the conversion will convert
from packed AoS values, which is the storage format, to float SoA values,
because this is much more natural for the conversion, and likewise from
SoA values to packed AoS values - but the "blend" (which includes
trivial things like partial mask) works on AoS values, so incoming fs
values will go SoA->AoS, values from destination will go packed
AoS->SoA->AoS, then do blend, then AoS->SoA->packed AoS which probably
isn't the most efficient way though the shuffles are probably bearable).

Passes piglit fbo-blending-formats (with GL_EXT_packed_float parameter),
still need to verify Inf/NaNs (where most of the complexity in the
conversion comes from actually).

v2: drop the (very bogus) rgb9e5 part, and do component extraction
in the helper code for r11g11b10 to float conversion, making the code
slightly more compact (suggested by Jose), now that there are no other
callers left this works quite well. (Could do the same for the
opposite way but it's less than ideal there, final part of packing
needs to be done in caller anyway and there'd be another conditional.)

v3: minor style and comment fixes. Also fix a potential issue with
negative zero being potentially returned by max(src, zero) as we
don't have well-defined min/max behavior (fortunately no additional cost).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-22 20:10:53 +01:00
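As an illustration of the packed AoS to float SoA direction discussed in the commit above, here is a minimal scalar sketch of r11g11b10 decoding. It assumes the GL_EXT_packed_float layout (red in bits 0-10, green in bits 11-21, blue in bits 22-31; 5-bit exponents with bias 15; no sign bit). The helper names are hypothetical, and the real code works on vectors rather than single texels.

```python
def small_float_to_float(bits, mantissa_bits):
    """Decode an unsigned small float with a 5-bit exponent."""
    exponent = bits >> mantissa_bits
    mantissa = bits & ((1 << mantissa_bits) - 1)
    fraction = mantissa / (1 << mantissa_bits)
    if exponent == 31:
        # Inf and NaN are where most of the conversion complexity comes from.
        return float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:
        return fraction * 2.0 ** -14  # denormal (or zero)
    return (1.0 + fraction) * 2.0 ** (exponent - 15)


def r11g11b10_to_float(packed):
    """Unpack one 32-bit texel into an (r, g, b) tuple of floats."""
    red = small_float_to_float(packed & 0x7FF, 6)
    green = small_float_to_float((packed >> 11) & 0x7FF, 6)
    blue = small_float_to_float((packed >> 22) & 0x3FF, 5)
    return (red, green, blue)
```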
Roland Scheidegger 5af7b45986 gallivm: fix return opcode handling in main function of a shader
If we're in some conditional or loop we must not return, or the code
after the condition is never executed.
(v2): And, we also can't just continue as nothing happened, since the
mask update code would later check if we actually have a mask, so we
need to remember that there was a return in main where we didn't exit
(to illustrate this, a ret in an if clause would cause a mask update
which is still ok as we're in a conditional, but after the endif the
mask update code would drop the mask hence bringing execution back to
pixels which should have their execution mask set to zero by the ret).
Thanks to Christoph Bumiller for figuring this out.

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=62357.

Note: This is a candidate for the stable branches.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-19 18:04:05 +01:00
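The masking problem described in the commit above can be modelled with a small sketch. The class and method names here are hypothetical and the model is far simpler than the gallivm mask handling; it only shows why lanes that hit a ret inside a conditional have to stay disabled after the endif.

```python
class ExecMaskModel:
    """Toy per-lane execution mask with conditionals and a return mask."""

    def __init__(self, lanes):
        self.cond_stack = []
        self.cond = [True] * lanes
        # Lanes that executed a ret in main; remembered across endif.
        self.returned = [False] * lanes

    def active(self):
        return [c and not r for c, r in zip(self.cond, self.returned)]

    def begin_if(self, predicate):
        self.cond_stack.append(self.cond)
        self.cond = [c and p for c, p in zip(self.cond, predicate)]

    def end_if(self):
        # Popping the condition alone would re-enable returned lanes,
        # which is why the return mask is tracked separately.
        self.cond = self.cond_stack.pop()

    def ret(self):
        self.returned = [r or a for r, a in zip(self.returned, self.active())]


mask = ExecMaskModel(4)
mask.begin_if([True, False, True, False])
mask.ret()
mask.end_if()
print(mask.active())  # lanes 0 and 2 stay disabled after the endif
```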
Christian König 21190fbd56 tgsi: use separate structure for indirect address v2
To further improve the optimization of source and destination
indirect addressing we need the ability to store a reference
to the declaration of the addressed operands.

Since most of the fields in tgsi_src_register don't apply to
an indirect addressing operand, replace it with a separate
tgsi_ind_register structure and so make room for extra information.

v2: rename Declaration to ArrayID, put the ArrayID into () instead of []

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-03-19 13:38:32 +01:00
Roland Scheidegger 5c41d1c222 gallivm: clean up passing derivatives around
Previously, the derivatives were calculated and passed in a packed form
to the sample code (for implicit derivatives, explicit derivatives were
packed to the same format).
There are several reasons why this wasn't such a good idea:
1) the derivatives may not even be needed (not as bad as it sounds since
llvm will just throw the calculations needed for them away but still)
2) the special packing format really shouldn't be part of the sampler
interface
3) depending what the sample code actually does the derivatives will
be processed differently, hence there is no "ideal" packing. For cube
maps with explicit derivatives (which we don't do yet) for instance the
packing looked downright useless, and for non-isotropic filtering we'd
need different calculations too.

So, instead just pass the derivatives as is (for explicit derivatives),
or let the rho calculating sample code calculate them itself. This still
does exactly the same packing stuff for implicit derivatives for now,
though explicit ones are handled in a more straightforward manner (quick
estimates show performance should be quite similar, though it is much
easier to follow and also does the rho calculation per-pixel until the
end, which we eventually need for spec compliance anyway).

No piglit changes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-12 00:24:22 +01:00
Roland Scheidegger b3b3b389fa gallivm: add support for texel offsets for ordinary texturing.
This was previously only handled for texelFetch (much easier).
Depending on the wrap mode this works slightly differently (for somewhat
efficient implementation), hence have to do that separately in all roughly
137 places - it is easy if we use fixed point coords for wrapping, however
some wrapping modes are near impossible with fixed point (the repeat stuff)
hence we have to normalize the offsets if we can't do the wrapping in
unnormalized space (which is a division which is slow but should still be
much better than the alternative, which would be integer modulo for wrapping
which is just unusable). This should still give accurate results in all
cases that really matter, though it might be not quite conformant behavior
for some apis (but we have much worse problems there anyway even without
using offsets).
(Untested, no piglit test.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-02 02:54:30 +01:00
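A minimal sketch of the offset normalization described in the commit above, for a repeat wrap mode with nearest filtering. It assumes normalized coordinates and integer texel offsets; the function name is hypothetical, and the actual code handles many more wrap and filter combinations.

```python
import math


def wrap_repeat_with_offset(coord, offset, size):
    """Return the texel index for a normalized coord plus a texel offset."""
    # Normalize the offset (this is the division the commit mentions),
    # then wrap in normalized space instead of using an integer modulo.
    shifted = coord + offset / size
    wrapped = shifted - math.floor(shifted)
    return min(int(wrapped * size), size - 1)


print(wrap_repeat_with_offset(0.0, -1, 8))  # wraps around to the last texel
```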
Maxence Le Doré 0845d16976 gallivm: fix mis-matching AOS instruction emission
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2013-02-27 20:23:01 +00:00
Roland Scheidegger c0ba1080df draw: make sure pipeline is revalidated when sampler views or samplers change.
Since with llvm execution parts of sampler view and sampler state are baked into
the shader, we need to revalidate otherwise the wrong shader might get used.
(Not completely sure but I think this would not be required for non-llvm case,
along with everything else in these functions.)
This caused bugs in piglit arb_texture_buffer_object-formats, because we never
noticed that the view format changed.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-25 20:38:23 +01:00
Roland Scheidegger 20183177a5 llvmpipe: support GL_ARB_texture_buffer_object/GL_ARB_texture_buffer_range
This also fixes not honoring first/last_layer view parameters for array
textures, plus not honoring last_level view parameter for all textures
(neither is really used by OpenGL).
This mostly passes piglit arb_texture_buffer_object tests (it needs, however,
glsl 140 version override, plus GL 3.1 override, the latter only because
mesa does not allow ARB_tbo in non-core contexts).
Most arb_texture_buffer_object tests pass, with the exception of
arb_texture_buffer_object-formats. With "arb" parameter it passes most weirdo
formats before it segfaults in the state tracker, this looks to be some issue
with using legacy formats in core context (fails the same in softpipe).
With "core" parameter it passes with "fs", however fails with "vs" (for most
formats). This will be fixed later (debugging shows we're completely missing
the shader recompile depending on format).

v2: based on Jose's feedback, fix comments, variable/function names.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-25 20:38:23 +01:00
Roland Scheidegger 83f7cde182 gallivm: fix indirect src register fetches requiring bitcast
For constant and temporary register fetches, the bitcasts weren't done
correctly for the indirect case, leading to crashes due to type mismatches.
Simply do the bitcasts after fetching (much simpler than fixing up the load
pointer for the various cases).

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=61036

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-20 19:37:30 +01:00
Roland Scheidegger f1ab67c13a gallivm/tgsi: fix issues with sample opcodes
We need to encode them as Texture instructions since the NumOffsets field
is encoded there. However, we don't encode the actual target in there, this
is derived from the sampler view src later.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-16 02:40:59 +01:00
Roland Scheidegger cb2e678294 gallivm/tgsi: fix src modifier fetching with non-float types.
Need to take the type into account. Also, if we want to allow
mov's with modifiers we need to pick a type (assume float).

v2: don't allow all modifiers on all types, in particular don't allow
absolute on non-float types and don't allow negate on unsigned.
Also treat UADD as signed (despite the name) since it is used
for handling both signed and unsigned integer arguments and otherwise
modifiers don't work.
Also add tgsi docs clarifying this.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-16 02:40:51 +01:00
Roland Scheidegger c25ae5d27b gallivm: fix issues with trunc/round/floor/ceil with no arch rounding
The emulation of these if there's no rounding instruction available
is a bit more complicated than what the code did.
In particular, doing fp-to-int/int-to-fp will not work if the exponent
is large enough (and with NaNs, Infs). Hence such values need to be filtered
out and the original value returned in this case (which fortunately should
always be exact). This comes at the expense of performance (if your cpu
doesn't support rounding instructions).
Furthermore, floor/ifloor/ceil/iceil were affected by precision issues for
values near negative (for floor) or positive (for ceil) zero, fix that as well
(fixing this issue might not actually be slower except for ceil/iceil if the
type is not signed which is probably rare - note iceil has no callers left
in any case).

Also add some new rounding test values in lp_test_arit to actually test
for that stuff (which previously would have failed without sse41).

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=59701.
2013-02-16 02:40:44 +01:00
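The filtering described in the commit above can be sketched for floor. This is a scalar model under the assumption that inputs are float32 values; the 2^23 threshold and the function name are illustrative choices, not taken from the gallivm code.

```python
import math


def floor_via_int(value):
    """Emulate floor with an fp-to-int / int-to-fp round trip."""
    # NaN, Inf and large magnitudes cannot go through the integer round
    # trip; float32 values with |x| >= 2^23 are already integral, so the
    # original value is returned unchanged (and is exact).
    if math.isnan(value) or math.isinf(value) or abs(value) >= 2.0 ** 23:
        return value
    truncated = float(int(value))  # truncates toward zero
    # Truncation rounds negative non-integers up; correct by one.
    if truncated > value:
        return truncated - 1.0
    return truncated
```

Values just below zero, such as -0.5, are the case where a plain truncation gives the wrong answer (0 instead of -1).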
Roland Scheidegger 70daad6a99 gallivm: DIV shouldn't be deprecated.
(Though it looks like glsl won't emit it.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-16 02:40:36 +01:00
Roland Scheidegger 427d36a227 gallium: fix tgsi SAMPLE_L opcode to use separate source for explicit lod
It looks like using coord.w as explicit lod value is a mistake, most likely
because some dx10 docs had it specified that way. Seems this was changed though:
http://msdn.microsoft.com/en-us/library/windows/desktop/hh447229%28v=vs.85%29.aspx
- let's just hope it doesn't depend on runtime build version or something.
Not only would this need translation (so go against the stated goal these
opcodes should be close to dx10 semantics) but it would prevent usage of this
opcode with cube arrays, which is apparently possible:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb509699%28v=vs.85%29.aspx
(Note not only does this show cube arrays using explicit lod, but also the
confusion with this opcode: it lists an explicit lod parameter value, but then
states last component of location is used as lod).
(For "true" hw drivers, only nv50 had code to handle it, and it appears the
code was already right for the new semantics, though fix up the seemingly
wrong c/d arguments while there.)

v2: fix comment, separate out other changes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-12 16:51:11 +01:00
Roland Scheidegger 614982d320 gallivm: fix up size queries for dx10 sviewinfo opcode
Need to calculate the number of mip levels (if it would be worthwhile could
store it in dynamic state).
While here, the query code also used chan 2 for the lod value.
This worked with mesa state tracker but it seems safer to use chan 0.
Still passes piglit textureSize (with some handwaving), though the non-GL
parts are (largely) untested.

v2: clarify and expect the sviewinfo opcode to return ints, not floats,
just like the OpenGL textureSize (dx10 supports dst modifiers with resinfo).
Also simplify some code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-08 18:54:40 -08:00
Roland Scheidegger 0a8043bb76 gallivm: hook up dx10 sampling opcodes
They are similar to old-style tex opcodes but with separate sampler and
texture units (and other arguments in different places).
Also adjust the debug tgsi dump code.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-08 18:54:40 -08:00
Roland Scheidegger 49f8825c49 gallivm: fix typo in lp_build_mul_norm
The signed case didn't do what the comment indicated. Should increase rounding
precision (at the expense of performance since the former code was effectively
a no-op).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-02-08 16:32:30 -08:00
Brian Paul 2d367e40d9 gallivm: implement support for SQRT opcode
2013-02-04 09:33:44 -07:00
Roland Scheidegger cbf0f66631 gallivm,draw,llvmpipe: mass rename of unit->texture_unit/sampler_unit
Make it obvious what "unit" this is (no change in functionality).
draw still uses "unit" in places where it changes the shader by adding
texture sampling itself - it seems like this can't work with shaders
using dx10-style sample opcodes (can't mix gl-style and dx10-style
sample instructions in a shader).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-01-28 06:58:06 -08:00
Roland Scheidegger c789b981b2 gallivm: split sampler and texture state
Split the sampler interface to use separate sampler and texture (sampler_view)
state. This is needed to support dx10-style sampling instructions.
This is not quite complete since both draw/llvmpipe don't really track
textures/samplers independently yet, as well as the gallivm code not quite
using the right sampler or texture index respectively (but it should work
for the sampling codes used by opengl).
We are however losing some optimizations in the process, apply_max_lod will
no longer work, and we potentially could end up with more (unnecessary)
recompiles (if switching textures with/without mipmaps only so it shouldn't
be too bad).

v2: don't use different callback structs for sampler/sampler view functions
(which just complicates things), fix up sampling code to actually use the
right texture or sampler index, and similar for llvmpipe/draw actually
distinguish between samplers and sampler views.

v3: fix more of PIPE_MAX_SAMPLER / PIPE_MAX_SHADER_SAMPLER_VIEWS mismatches
(both in draw and llvmpipe), based on feedback from José get rid of unneeded
static sampler derived state (which also fixes the only 2 piglit regressions
due to a forgotten assignment), fix comments based on Brian's feedback.

v4: remove some accidental unrelated whitespace changes

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-01-28 06:50:36 -08:00
Roland Scheidegger 5785f22d23 gallivm: fix border color for integer textures
Need to bitcast the float border color (luckily we already get
the color as int just disguised as float).
Fixes piglit texwrap GL_EXT_texture_integer bordercolor.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-01-10 18:02:01 -08:00
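The difference between a bitcast and a value conversion, which the commit above relies on, can be shown with a short sketch. The helper names are hypothetical; the point is only that an integer border color stored in a float slot must be reinterpreted bit for bit.

```python
import struct


def int_bits_to_float(value):
    """Reinterpret a 32-bit signed integer pattern as a float32."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


def float_bits_to_int(value):
    """Reinterpret a float32 bit pattern as a 32-bit signed integer."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


disguised = int_bits_to_float(42)     # the int 42 stored in a float slot
print(float_bits_to_int(disguised))   # bitcast recovers the integer
print(int(disguised))                 # value conversion does not
```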
Roland Scheidegger 31884946b5 gallivm: more integer texture format fetch fixes
Change the texel type to int/uint instead of float throughout the sampling
code which makes it easier to catch errors (as llvm will complain about wrong
types if we mistakenly treat these values as real floats somewhere).
This should also get things like e.g. sampler swizzles (for unused channels)
right.
This fixes piglit texture_integer_glsl130 test.
Border color not working (crashing) yet.
(These formats are not exposed yet in llvmpipe.)

v2: couple cleanups according to José's comments

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-01-10 18:02:01 -08:00
Brian Paul 1b6ba9c4c8 gallivm: support more immediates in lp_build_tgsi_info()
Bump limit from 32 to 128.

Fixes http://bugs.freedesktop.org/show_bug.cgi?id=58545
2013-01-04 15:30:45 -07:00
Roland Scheidegger dc613f11dd gallivm: fix conversion for pure integer formats
Since the idea is to just expand or shrink the bit width but not otherwise do
conversion we also need to adjust the sign bit according to src, otherwise
the conversion code will incorrectly clamp the values. (Since this only works
for casting to ordinary floats the norm and fixed bits should always be fine.)

This fixes the remaining piglit attribs GL3 failures.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-12-18 01:57:35 +01:00
Roland Scheidegger 3d14b25030 gallivm: fix texel fetch for array textures (2)
a460aea3f1 wasn't entirely correct,
since all coords are already ints hence need to skip the iround.
Passes piglit texelFetch with sampler1DArray/sampler2DArray.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2012-12-17 11:50:27 +01:00
Roland Scheidegger a460aea3f1 gallivm: fix texel fetch for array textures
Since we don't call lp_build_sample_common() in the texel fetch path we missed
the layer fixup code. If someone would have tried to do texelFetch with array
textures it would have crashed for sure.
Not really tested (can't run the piglit test being able to use texelFetch with
array samplers for now with llvmpipe).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-12-13 19:17:09 +01:00
Tom Stellard ffe1794e0c gallivm: Lower TGSI_OPCODE_MUL to fmul by default
This fixes a number of crashes on r600g due to the fact that
lp_build_mul assumes vector types when optimizing mul to bit shifts.

This bug was uncovered by 0ad1fefd69
2012-12-10 19:22:37 -05:00
Dave Airlie 8000e7b4b6 llvmpipe: fix txq for 1d/2d arrays. (v3)
Noticed this would fail; we were doing two things wrong:

a) 1d arrays require the layers in height
b) minifying the layers field.

v2: don't change height code, fixup completely inside txq
as suggested by Roland.

v3: just add minify before texture array size

v1: Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-12-11 09:38:01 +10:00
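A minimal sketch of the size query semantics the commit above is fixing: the spatial dimensions are minified by the mip level, while the layer count of an array texture is not. The function names are hypothetical and the code is not the llvmpipe implementation.

```python
def minify(size, level):
    """Size of a dimension at the given mip level, never below 1."""
    return max(1, size >> level)


def txq_1d_array(width, layers, level):
    # The layer count is reported in the second component, unminified.
    return (minify(width, level), layers)


def txq_2d_array(width, height, layers, level):
    # Width and height shrink with the level; the layer count does not.
    return (minify(width, level), minify(height, level), layers)
```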
Dave Airlie 41f4f094c4 llvmpipe: increase texture target width to reflect increase
Now that we've gone over 7.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-12-11 09:37:55 +10:00
José Fonseca e7bbd9c243 gallivm: Rudimentary native integer support.
Just enough for draw module to work ok.

This improves "piglit attribs GL3", though something fishy is still
happening with certain unsigned integer values.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-07 15:03:07 +00:00
José Fonseca 3b7ce72625 gallivm: Allow indirection from TEMP registers too.
The ADDR file is cumbersome for native integer capable drivers.  We
should consider deprecating it eventually, but this just adds support
for indirection from TEMP registers.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-07 15:03:07 +00:00
José Fonseca 1d35f77228 gallivm,llvmpipe,draw: Support multiple constant buffers.
Support 16 (defined in LP_MAX_TGSI_CONST_BUFFERS) as opposed to 32 (as
defined by PIPE_MAX_CONSTANT_BUFFERS) because that would make the jit
context become unnecessarily large.

v2: Bump limit from 4 to 16 to cover ARB_uniform_buffer_object needs,
per Dave Airlie.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-07 15:03:07 +00:00
José Fonseca 5e99cd9159 gallivm: Fix lerping of (un)signed normalized numbers.
Several issues actually:

- Fix a regression in unsigned normalized in the rescaling
  [0, 255] to [0, 256]

- Ensure we use signed shifts where appropriate (instead of
  unsigned shifts)

- Refactor the code slightly -- move all the logic inside
  lp_build_lerp_simple().

This change, plus an adjustment in the tolerance of signed normalized
results in piglit fbo-blending-formats, fixes bug 57903.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-06 15:58:40 +00:00
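The rescaling and signed shift mentioned in the commit above can be sketched for 8-bit unsigned normalized values. This is an assumption about the general technique (mapping the weight from [0, 255] to [0, 256] so that a full weight reproduces the second value exactly); the function name is hypothetical and the code is not a transcription of lp_build_lerp_simple().

```python
def lerp_unorm8(weight, v0, v1):
    """Fixed-point lerp of two 8-bit values with an 8-bit weight."""
    # Rescale the weight from [0, 255] to [0, 256].
    scaled = weight + (weight >> 7)
    # The delta can be negative, so the shift must be an arithmetic
    # (signed) shift; Python's >> on ints already behaves that way.
    return v0 + (((v1 - v0) * scaled) >> 8)


print(lerp_unorm8(255, 10, 200))
print(lerp_unorm8(255, 200, 10))
```

Without the rescale, a weight of 255 would land one step short of the second value for large deltas.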
José Fonseca 33ffca713a gallivm: Fix lp_build_print_value of smaller integer types.
They need to be converted to the native integer type to prevent garbage
in higher order bits from being printed.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-06 15:58:40 +00:00
Vincent Lejeune 2d97f77b9f gallivm: Have a default emit function for min/max opcode
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
2012-12-05 18:31:18 +01:00
Vincent Lejeune 0a2f58f6ed gallivm: have a default emit function for fdiv/rcp
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
2012-12-05 18:30:39 +01:00
José Fonseca fb6d901ad2 gallivm: Re-add the kludge for lp_build_lerp of fixed point types.
I removed it in commit 7d44d354bd but
texture sample code still relies on it.

Not sure how to do this cleanly, so put it back for now.
2012-12-04 21:18:18 +00:00
José Fonseca 7d44d354bd gallivm: Generalize lp_build_mul and lp_build_lerp for signed normalized types.
This fixes fdo bug 57755 and most of the failures of piglit fbo-blending-formats
GL_EXT_texture_snorm.

GL_INTENSITY_SNORM is still failing, but problem is probably elsewhere,
as GL_R8_SNORM works fine.
2012-12-04 19:32:50 +00:00
Roland Scheidegger 041966801e gallivm: fix srgb format fetch
we need to rely on util code for fetching those, just like before
9f06061d50.
Fixes bugs 57699 and 57756.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-12-03 14:10:36 +00:00
Roland Scheidegger 587bd16d0d gallivm: drop border wrap clamping code
The border clamping code is unnecessary, since we don't care if a wrapped
coord value is -1 or <-1 (same for length vs. >length), in either case the
border handling code will mask out the offset and replace the texel value with
the border color.
Note that technically this is not entirely correct. Omitting clamping on the
float coords means that flt->int conversion may result in undefined values for
values of very large magnitude.
However there's no reason we should honor this here since:
a) we don't care for that for ordinary wrap modes in the aos code when
   converting coords and the problem is worse there (as we've got only
   effectively 24 instead of 32bits)
b) at least in some cases the clamping was done already in int space hence
   doing nothing to fix that problem.
c) with sse2 flt->int conversion with such values results in 0x80000000 which
   is just perfect (for clamp to border - not so much for the ordinary clamp to
   edge).

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-12-01 17:05:48 +01:00
José Fonseca 9c9c18a395 gallivm: Fix lp_build_float_to_half.
The current implementation was close but not fully correct: several
operations that should be done in floating point were being done in
integer.

Fixes piglit fbo-clear-formats GL_ARB_texture_float

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-11-29 16:52:42 +00:00
Roland Scheidegger b5918d8f1d gallivm: fix a trivial txq issue for 2d shadow and cube shadow samplers
untested (couldn't get the piglit test to run even with version overrides)
but seemed blatantly wrong.
In any case it would only affect an error case, and if that happens
all hope is probably lost anyway.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-11-29 15:31:46 +01:00
Roland Scheidegger 95e03914d8 gallivm: support array textures
Support 1d and 2d array textures (including shadow samplers),
and (as a side effect mostly) also shadow cube samplers.
Seems to pass the relevant piglit tests both for sampling and rendering
to (though some require version overrides).
Since we don't support render target indices rendering to array textures
is still restricted to a single layer at a time.
Also, the min/max layer in the sampler view (which is unnecessary for GL)
is ignored (always use all layers).

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-11-29 15:28:25 +01:00
José Fonseca 9f06061d50 util/u_format: Kill util_format_is_array().
It is buggy (it was giving wrong results for some of the formats with
padding), and util_format_description::is_array already does precisely
what's intended.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-11-29 14:08:42 +00:00
Adhemerval Zanella e25abacc18 gallivm: Fix format manipulation for big-endian
This patch fixes various format manipulation for big-endian
architectures.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:54:18 +00:00
Adhemerval Zanella b772d784b2 gallivm: Add byte-swap construct calls
This patch adds two more functions in type conversions header:
* lp_build_bswap: construct a call to llvm.bswap intrinsic for an
  element
* lp_build_bswap_vec: byte swap every element in a vector based on the
  input and output types.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:54:14 +00:00
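What the two byte-swap helpers in the commit above compute can be modelled in a few lines. This sketch only shows the arithmetic on plain integers; the Python function names are hypothetical, and the real helpers emit LLVM IR rather than computing values directly.

```python
def bswap(value, bits=32):
    """Reverse the byte order of an unsigned integer of the given width."""
    num_bytes = bits // 8
    masked = value & ((1 << bits) - 1)
    return int.from_bytes(masked.to_bytes(num_bytes, "little"), "big")


def bswap_vec(values, bits=32):
    """Byte swap every element of a vector, element by element."""
    return [bswap(v, bits) for v in values]


print(hex(bswap(0x11223344)))
```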
Adhemerval Zanella 86902b5134 gallivm: Fix vector constant for shuffle
This patch fixes the vector constant generation used for vector shuffle
for big-endian machines.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:54:10 +00:00
Adhemerval Zanella 29ba79b2c9 gallivm: clear Altivec NJ bit
This patch enforces clearing the NJ bit in the Altivec VSCR register so
denormal numbers are handled as expected by the IEEE standard.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:52:05 +00:00
Adhemerval Zanella 43ce9efdbf gallivm: Altivec floating-point rounding
This patch adds Altivec intrinsics for float vector types. It changes
the SSE-specific definitions to platform-neutral ones and adds the calls
to the Altivec intrinsic builder.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:52:00 +00:00
Adhemerval Zanella dd5c580816 gallivm: Altivec vector add/sub intrinsics
This patch adds correct vector addition and subtraction intrinsics when
using Altivec with PPC. The current code uses the default path and the LLVM
backend ends up issuing carry-out arithmetic instructions where saturated
ones are expected.

It also includes a fix for PowerPC, where char is unsigned by default,
resulting in bogus values for vector shifting.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:51:53 +00:00
Adhemerval Zanella 2ea7d3dabd gallivm: Altivec vector max/min intrinsics
This patch adds the PPC Altivec max/min intrinsics for
supported Altivec vector types (16xi8, 8xi16, 4xi32, 4xf32).

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:51:46 +00:00
Adhemerval Zanella 31c63b058e gallivm: Altivec pack/unpack intrinsics
This patch adds PPC Altivec support for pack/unpack operations using Altivec
supported vector type (8xi8, 16xi16, 4xi32, 4xf32).

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-29 11:51:41 +00:00
James Benton fa1b481c09 llvmpipe: Unswizzled rendering.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-28 19:14:36 +00:00
James Benton 1d3789bccb gallivm: Updated lp_build_const_mask_aos to input number of channels.
Also updated lp_build_const_mask_aos_swizzled to reflect this.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-28 19:14:36 +00:00
James Benton e66ec7c46b gallivm: Added support for float to half-float conversion in lp_build_conv.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-28 19:14:36 +00:00
James Benton d7a8390a82 gallivm: Changed lp_build_pad_vector to correctly handle scalar argument.
Removed the lp_type argument as it was unnecessary.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-28 19:14:36 +00:00
James Benton 71c6fe76c0 gallivm: Add a function to generate lp_type for a format.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-28 19:14:36 +00:00
James Benton cd548836a1 gallivm: Add support for unorm16 in lp_build_mul.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-11-28 19:14:20 +00:00
Roland Scheidegger 406b76ca32 gallivm: fix multiple lods with different min/mag filter and wide vectors
broken since 529fe420ba,
I forgot some code, only added the comment...
Fixes bug 57644.
2012-11-28 18:07:27 +01:00
James Benton 978df710f2 gallivm: Fix bug in lp_build_one which would incorrectly return a vector for length 1.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-11-27 16:23:04 +00:00
Roland Scheidegger 529fe420ba gallivm: use the new mip per quad handling in texture fetch path
No longer have to split fetching into quads dynamically if mip levels
are not the same for all quads (aos sampling still always splits due
to performance reasons).
Instead handle multiple mip levels further down, minification etc. takes
this into account.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-11-27 03:30:55 +01:00
Roland Scheidegger 0b6554ba6f gallivm,llvmpipe: handle TXF (texelFetch) instruction, including offsets
This also adds some code to handle per-quad lods for more than 4-wide fetches,
because otherwise I'd have to integrate the texelFetch function into
the splitting stuff... (but it is not used yet outside texelFetch).
passes piglit fs-texelFetch-2D, fails fs-texelFetchOffset-2D due to I believe
a test error (results are undefined for out-of-bounds fetches, we return
whatever is at offset 0, whereas the test expects [0,0,0,1]).
Texel offsets are only handled by texelFetch for now, though the interface
can handle it for everything.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-11-27 03:26:49 +01:00
Roland Scheidegger 26097c4855 gallivm,draw,llvmpipe: use base ptr + mip offsets instead of mip pointers
This might have a slight overhead but handling mip offsets more like
the width (and image) strides should make some things easier (mip level
being just part of the offset calculation) later.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-11-12 21:02:59 +01:00
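The idea in 26097c4855 of folding the mip level into the offset calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the struct, its field names, and MAX_LEVELS are hypothetical and do not reflect the actual gallivm JIT texture layout.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVELS 4 /* hypothetical limit, for illustration only */

/* One base pointer plus per-level byte offsets, instead of one
 * pointer per mip level. */
struct tex_layout {
   const uint8_t *base;
   uint32_t mip_offsets[MAX_LEVELS];
   uint32_t row_stride[MAX_LEVELS];
};

/* The mip level is just one more term of the offset calculation,
 * alongside the row stride and the texel size. */
static const uint8_t *
texel_addr(const struct tex_layout *t, unsigned level,
           unsigned x, unsigned y, unsigned bytes_per_texel)
{
   size_t offset = (size_t)t->mip_offsets[level]
                 + (size_t)y * t->row_stride[level]
                 + (size_t)x * bytes_per_texel;
   return t->base + offset;
}
```

The extra addition per fetch is the "slight overhead" the commit mentions; in exchange, every level is addressed by the same formula.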
Alexander V. Nikolaev eaa8e56108 gallium/gallivm: code generation options for LLVM 3.1+
LLVM 3.1+ no longer has "extern unsigned llvm::StackAlignmentOverride"
and friends for configuring code generation options, like stack
alignment.

So I restrict assigning llvm::StackAlignmentOverride and the other
variables to LLVM 3.0 only, and wrote similar code using
TargetOptions.

This patch fixes a segfault in Wine when using llvmpipe built with LLVM 3.1.

Signed-off-by: Alexander V. Nikolaev <avn@daemon.hole.ru>
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
2012-10-28 10:34:26 +00:00
Brian Paul 369b5a311c gallivm/llvmpipe: fix 64-bit %ll format compiler warnings for mingw32
Use the PRIx64 and PRIu64 format macros from inttypes.h.  We made a
similar change in prog_print.c in df2d81ea59.
2012-10-26 10:59:29 -06:00
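A minimal sketch of the approach in 369b5a311c: the PRIx64 and PRIu64 macros from inttypes.h expand to the right length modifier for the platform, so the format string does not hard-code %ll. The helper name format_u64 is hypothetical and not part of Mesa.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value as hex and decimal. String literal
 * concatenation splices the macros into the format string. */
static int format_u64(char *buf, size_t size, uint64_t value)
{
   return snprintf(buf, size, "0x%" PRIx64 " (%" PRIu64 ")", value, value);
}
```

For a value of 255 this writes "0xff (255)" into the buffer and returns the length 10.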
José Fonseca aa2067c757 gallivm: Hide AVX support when requested by LP_NATIVE_VECTOR_WIDTH or unsupported by LLVM.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-10-17 18:07:43 +01:00
Will Schmidt 54821c0e99 gallivm: Use mcjit for ppc_64 architecture
Per commentary and direction in the LLVM community, support for ppc64 is
going into MCJIT rather than the old JIT.  There is no existing support
in prior llvm versions, so no need to specify LLVM version numbers.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-10-17 18:07:43 +01:00
José Fonseca 2ab6e67d90 Revert "gallivm: Don't use llvm.x86.avx.max/min.ps.256 inadvertently."
This reverts commit bf2edc776b.
2012-10-17 15:04:20 +01:00
José Fonseca bf2edc776b gallivm: Don't use llvm.x86.avx.max/min.ps.256 inadvertently.
Could happen when CPU supports AVX, but LLVM doesn't.
2012-10-12 18:52:28 +01:00
Roland Scheidegger d366520e85 gallivm: fix rsqrt failures
lp_build_rsqrt initially did not do any newton-raphson step. This meant that
precision was only ~11 bits, but this handled both input 0.0 and +infinity
correctly. It did not however handle input 1.0 accurately, and denormals
always generated infinity result.
Doing a newton-raphson step increased precision significantly (but notably
input 1.0 still doesn't give output 1.0), however this fails for inputs
0.0 and infinity (both result in NaNs).
Try to fix this up by using cmp/select but since this is all quite fishy
(and still doesn't handle denormals) disable for now. Note that even with
workarounds it should still have been faster since the fallback uses sqrt/div
(which both use the usually unpipelined and slow divider hw).
Also add some more test values to lp_test_arit and test lp_build_rcp() too while
there.

v2: based on José's feedback, avoid hacky infinity definition which doesn't
work with msvc (unfortunately using INFINITY won't cut it either on non-c99
compilers) in lp_build_rsqrt, and while here fix up the input infinity case
too (it's disabled anyway). Only test infinity input case if we have c99,
and use float cast for calculating reference rsqrt value so we really get
what we expect.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-10-12 18:51:18 +01:00
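The failure mode described in d366520e85 can be reproduced with scalar C arithmetic. This is a sketch, not the gallivm IR that lp_build_rsqrt emits. It assumes C99 (for INFINITY and isinf), assumes the hardware estimate is +infinity for input 0.0 and 0.0 for input +infinity, and uses hypothetical function names; the guarded variant stands in for the cmp/select fixup.

```c
#include <math.h>

/* One Newton-Raphson step for y ~= 1/sqrt(x):
 *    y' = y * (1.5 - 0.5 * x * y * y)
 * For x = 0 with y = +inf, or x = +inf with y = 0, the product
 * x * y is 0 * inf, which is NaN, so the refined result is NaN. */
static float rsqrt_refine(float x, float y)
{
   return y * (1.5f - 0.5f * x * y * y);
}

/* Keep the unrefined estimate for the two inputs where the step
 * would produce NaN. Denormals are still not handled. */
static float rsqrt_refine_guarded(float x, float y)
{
   if (x == 0.0f || isinf(x))
      return y;
   return rsqrt_refine(x, y);
}
```

For ordinary inputs the two functions agree; for example, refining the exact estimate 0.5 for input 4.0 returns 0.5 unchanged.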
Marek Olšák d284613422 llvmpipe: remove unused variables to fix compile warnings 2012-10-09 01:10:58 +02:00
José Fonseca 7eb5040197 gallivm,llvmpipe: Use 4-wide vectors on AMD Bulldozer.
8-wide vectors are slower.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-09-04 08:49:00 +01:00
Brian Paul f7af4beae5 gallivm: fix crash in lp_sampler_static_state()
Fixes WebGL conformance/uniforms/uniform-default-values.html crash.

We need to check for the null view pointer before accessing view->texture.

Fixes http://bugs.freedesktop.org/show_bug.cgi?id=53317

Note: This is a candidate for the 8.0 branch.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2012-08-10 09:45:25 -06:00
Brian Paul b4d6502fcd gallivm: remove unused src_elem_type variable
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-08-08 09:39:36 -06:00
Vinson Lee c3894bc2d5 gallivm: Add constructor for raw_debug_ostream.
Fixes uninitialized scalar field defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-08-06 22:07:31 -07:00
José Fonseca c30bf68946 gallivm: Prefer the standard JIT engine whenever possible.
Testing shows that the standard JIT engine retrofitted with AVX support is quite
stable and as capable of handling AVX instructions as MC-JIT is.

And the old JIT is much more memory efficient, as we don't need to
allocate one engine instance per shader, as we do for MC-JIT due to its
incompleteness.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-07-23 17:46:38 +01:00
Brian Paul c4d2a14d6e gallivm: silence uninitialized variable warnings 2012-07-17 14:41:29 -06:00
Roland Scheidegger bf484024b9 gallivm: (trivial) remove unnecessary bogus include 2012-07-17 17:11:18 +02:00
José Fonseca 3469715a8a gallivm,draw,llvmpipe: Support wider native registers.
Squashed commit of the following:

commit 7acb7b4f60dc505af3dd00dcff744f80315d5b0e
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 9 17:46:31 2012 +0100

    draw: Don't use dynamically sized arrays.

    Not supported by MSVC.

commit 5810c28c83647612cb372d1e763fd9d7780df3cb
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 9 17:44:16 2012 +0100

    gallivm,llvmpipe: Don't use expressions with PIPE_ALIGN_VAR().

    MSVC doesn't accept expressions in _declspec(align(...)). Use a
    define instead.

commit 8aafd1457ba572a02b289b3f3411e99a3c056072
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 9 17:41:56 2012 +0100

    gallium/util: Make u_cpu_detect.h header C++ safe.

commit 5795248350771f899cfbfc1a3a58f1835eb2671d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Jul 2 12:08:01 2012 +0100

    gallium/util: Add ULL suffix to large constants.

    As suggested by Andy Furniss: it looks like some old gcc versions
    require it.

commit 4c66c22727eff92226544c7d43c4eb94de359e10
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Jun 29 13:39:07 2012 +0100

    gallium/util: Truly disable INF/NAN tests on MSVC.

    Thanks to Brian for spotting this.

commit 8bce274c7fad578d7eb656d9a1413f5c0844c94e
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Jun 29 13:39:07 2012 +0100

    gallium/util: Disable INF/NAN tests on MSVC.

    Somehow they are not recognized as constants.

commit 6868649cff8d7fd2e2579c28d0b74ef6dd4f9716
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jul 5 15:05:24 2012 +0200

    gallivm: Cleanup the 2 x 8 float -> 16 ub special path in lp_build_conv.

    No behaviour change intended, like 7b98455fb40c2df84cfd3cdb1eb7650f67c8a751.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 5147a0949c4407e8bce9e41d9859314b4a9ccf77
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jul 5 14:28:19 2012 +0200

    gallivm: (trivial) fix issues with multiple-of-4 texture fetch

    Some formats can't handle non-multiple of 4 fetches I believe, but
    everything must support length 1 and multiples of 4.
    So avoid going to scalar fetch (which is very costly) just because length
    isn't 4.
    Also extend the hack to not use shift with variable count for yuv formats to
    arbitrary length (larger than 1) - doesn't matter how many elements we
    have we always want to avoid it unless we have variable shift count
    instruction (which we should get with avx2).

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 87ebcb1bd71fa4c739451ec8ca89a7f29b168c08
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jul 4 02:09:55 2012 +0200

    gallivm: (trivial) fix typo for wrap repeat mode in linear filtering aos code

    This would lead to bogus coordinates at the edges.
    (undetected by piglit because this path is only taken for block-based
    formats).

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit 3a42717101b1619874c8932a580c0b9e6896b557
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue Jul 3 19:42:49 2012 +0100

    gallivm: Fix TGSI integer translation with AVX.

commit d71ff104085c196b16426081098fb0bde128ce4f
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Jun 29 15:17:41 2012 +0100

    llvmpipe: Fix LLVM JIT linear path.

    It was not working properly because it was looking at the JIT function
    before it was actually compiled.

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit a94df0386213e1f5f9a6ed470c535f9688ec0a1b
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Jun 28 18:07:10 2012 +0100

    gallivm: Refactor lp_build_broadcast(_scalar) to share code.

    Doesn't really change the generated assembly, but produces more compact IR,
    and of course, makes code more consistent.

    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 66712ba2731fc029fa246d4fc477d61ab785edb5
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Jun 27 17:30:13 2012 +0100

    gallivm: Make LLVMContextRef a singleton.

    There are many places inside LLVM that depend on it.  Too many to attempt
    to fix.

    Reviewed-by: Brian Paul <brianp@vmware.com>

commit ff5fb7897495ac263f0b069370fab701b70dccef
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 28 18:15:27 2012 +0200

    gallivm: don't use 8-wide texture fetch in aos path

    This appears to be a slight loss usually.
    There are probably several reasons for that:
    - fetching itself is scalar
    - filtering is pure int code hence needs splitting anyway, same
      for the final texel offset calculations
    - texture wrap related code, which can be done 8-wide, is slightly more
      complex with floats (with clamp_to_edge) and float operations generally
      more costly hence probably not much faster overall
    - the code needed to split when encountering different mip levels for the
      quads, adding complexity
    So, just split always for aos path (but leave it 8-wide for soa, since we
    do 8-wide filtering there when possible).
    This should certainly be revisited if we'd have avx2 support.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit ce8032b43dcd8e8d816cbab6428f54b0798f945d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 27 18:41:19 2012 +0200

    gallivm: (trivial) don't extract fparts variable if not needed

    Did not have any consequences but unnecessary.

commit aaa9aaed8f80dc282492f62aa583a7ee23a4c6d5
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 27 18:09:06 2012 +0200

    gallivm: fix precision issue in aos linear int wrap code

    now not just passes at a quick glance but also with piglit...
    If we do the wrapping with floats, we also need to set the
    weights accordingly. We can potentially end up with different
    (integer) coordinates than what the integer calculations would
    have chosen, which means the integer weights calculated previously
    in this case are completely wrong. Well at least that's what I think
    happens, at least recalculating the weights helps.
    (Some day really should refactor all the wrapping, so we do whatever is
    fastest independent of 16bit int aos or 32bit float soa filtering.)

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit fd6f18588ced7ac8e081892f3bab2916623ad7a2
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Jun 27 11:15:53 2012 +0100

    gallium/util: Fix parsing of options with underscore.

    For example

      GALLIVM_DEBUG=no_brilinear

    which was being parsed as two options, "no" and "brilinear".
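
The parsing rule in this fix can be sketched as a tokenizer that treats the underscore as part of an option name. This is an illustration only: has_option and is_name_char are hypothetical names, and the real gallium/util parser may be structured differently.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* An option name consists of alphanumerics and underscores. */
static bool is_name_char(char c)
{
   return isalnum((unsigned char)c) || c == '_';
}

/* Return true if 'name' appears as a whole option in 'str'.
 * Any other character (comma, space, ...) separates options. */
static bool has_option(const char *str, const char *name)
{
   size_t name_len = strlen(name);
   const char *p = str;

   while (*p) {
      const char *start;

      while (*p && !is_name_char(*p))
         p++;                         /* skip separators */
      start = p;
      while (*p && is_name_char(*p))
         p++;                         /* consume one option name */

      if (name_len > 0 && (size_t)(p - start) == name_len &&
          memcmp(start, name, name_len) == 0)
         return true;
   }
   return false;
}
```

With this rule, "no_brilinear" is matched as a single option, and neither "no" nor "brilinear" is matched on its own.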

commit 09a8f809088178a03e49e409fa18f1ac89561837
Author: James Benton <jbenton@vmware.com>
Date:   Tue Jun 26 15:00:14 2012 +0100

    gallivm: Added a generic lp_build_print_value which prints a LLVMValueRef.

    Updated lp_build_printf to share common code.
    Removed specific lp_build_print_vecX.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit e59bdcc2c075931bfba2a84967a5ecd1dedd6eb0
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed May 16 15:00:23 2012 +0100

    draw,llvmpipe: Avoid named struct types on LLVM 3.0 and later.

    Starting with LLVM 3.0, named structures are meant not for debugging, but
    for recursive data types, previously also known as opaque types.

    The recursive nature of these types leads to several memory management
    difficulties.  Given that we don't actually need recursive types, avoid
    them altogether.

    This is an attempt to address fdo bugs 41791 and 44466. The issue is
    somewhat random so there's no easy way to check how effective this is.

    Cherry-picked from 9af1ba565d

commit df6070f618a203c7a876d984c847cde4cbc26bdb
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 27 14:42:53 2012 +0200

    gallivm: (trivial) fix typo in faster aos linear int wrap code

    no longer crashes, now REALLY tested.

commit d8f98dce452c867214e6782e86dc08562643c862
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 18:20:58 2012 +0200

    llvmpipe: (trivial) remove bogus optimization for float aos repeat wrap

    This optimization for nearest filtering on the linear path generated
    likely bogus results, and the int path didn't have any optimizations
    there since the only shader using force_nearest apparently uses
    clamp_to_edge not repeat wrap anyway.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit c4e271a0631087c795e756a5bb6b046043b5099d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 23:01:52 2012 +0200

    gallivm: faster repeat wrap for linear aos path too

    Even if we already have scaled integer coords, it's way faster to use
    the original float coord (plus some conversions) rather than use URem.
    The choice of what to do for texture wrapping is not really tied to int
    aos or float soa filtering though for some modes there can be some gains
    (because of easier weight calculations).

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 1174a75b1806e92aee4264ffe0ffe7e70abbbfa3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 14:39:22 2012 +0200

    gallivm: improve npot tex wrap repeat in linear soa path

    URem gets translated into series of scalar divisions so
    just about anything else is faster.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit f849ffaa499ed96fa0efd3594fce255c7f22891b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 26 00:40:35 2012 +0100

    gallivm: (trivial) fix near-invisible shift-space typo

    I blame the keyboard.

commit 5298a0b19fe672aebeb70964c0797d5921b51cf0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 16:24:28 2012 +0200

    gallivm: add new intrinsic helper to deal with arbitrary vector length

    This helper will split vectors which are too large for the hw, or expand
    them if they are too small, so a caller of a function using intrinsics which
    uses such sizes need not split (or expand) the vectors manually and the
    function will still use the intrinsic instead of dropping back to generic
    llvm code. It can also accept scalars for use with pseudo-vector intrinsics
    (only useful for float arguments, all x86 scalar simd float intrinsics use
    4vf32).
    Only used for lp_build_min/max() for now (also added the scalar float case
    for these while there). (Other basic binary functions could use it easily,
    whereas functions with a different interface would need different helpers.)
    Expanding vectors isn't widely used, because we always try to use
    build contexts with native hw vector sizes. But it might (or not) be nicer
    if this wouldn't need to be done, the generated code should in theory stay
    the same (it does get hit by lp_build_rho though already since we
    didn't have an intrinsic for the scalar lp_build_max case before).

    v2: incorporated Brian's feedback, and also made the scalar min/max case work
        instead of crash (all scalar simd float intrinsics take 4vf32 as argument,
        probably the reason why it wasn't used before).
        Moved to lp_bld_intr based on José's request, and passing intrinsic size
        instead of length.
        Ideally we'd derive the source type info from the passed in llvm value refs
        and process some llvmtype return type so we could handle intrinsics where
        the source and destination type isn't the same (like float/int conversions,
        packing instructions) but that's a bit too complicated for now.

    Reviewed-by: Brian Paul <brianp@vmware.com>
    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 01aa760b99ec0b2dc8ce57a43650e83f8c1becdf
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 16:19:18 2012 +0200

    gallivm: (trivial) increase max code size for shader disassembly

    64kB was just short of what I needed (which caused a crash) hence
    increase to 96kB (should probably be smarter about that).

commit 74aa739138d981311ce13076388382b5e89c6562
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 11:53:29 2012 +0100

    gallivm: simplify aos float tex wrap repeat nearest

    just handle pot and npot the same. The previous pot handling
    ended up with exactly the same instructions plus 2 more (leave it
    in the soa path though since it is probably still cheaper there).
    While here also fix an issue which would cause a crash after an assert.

commit 0e1e755645e9e49cfaa2025191e3245ccd723564
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 11:29:24 2012 +0100

    gallivm: (trivial) skip floor rounding in ifloor when not signed

    This was only done for the non-sse41 case before, but even with
    sse41 this is obviously unnecessary (some callers already call
    itrunc in this case anyway but some might not).

commit 7f01a62f27dcb1d52597b24825931e88bae76f33
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 25 11:23:12 2012 +0100

    gallivm: (trivial) fix bogus comments

commit 5c85be25fd82e28490274c468ce7f3e6e8c1d416
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Jun 20 11:51:57 2012 +0100

    translate: Free elt8_func/elt16_func too.

    These were leaking.

    Reviewed-by: Brian Paul <brianp@vmware.com>
    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit 0ad498f36fb6f7458c7cffa73b6598adceee0a6c
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 19 15:55:34 2012 +0200

    gallivm: fix bug for tex wrap repeat with linear sampling in aos float path

    The comparison needs to be against length not length_minus_one, otherwise
    the max texel is never chosen (for the second coordinate).

    Fixes piglit texwrap-1D-npot-proj (and 2D/3D versions).

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit d1ad65937c5b76407dc2499b7b774ab59341209e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Jun 19 16:13:43 2012 +0200

    gallivm: simplify soa tex wrap repeat with npot textures and no mip filtering

    Similar to what is already done in aos sampling for the float path (but not
    the int path since we don't get normalized float coordinates there).
    URem is expensive and the calculation is done trivially with
    normalized floats instead (at least with sse41-capable cpus).
    (Some day should probably do the same for the mip filter path but it's much
    more complicated there hence the gain is smaller.)

    Reviewed-by: José Fonseca <jfonseca@vmware.com>
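
The trade-off between URem and normalized floats for repeat wrapping can be sketched in scalar C. This is an assumption-laden illustration, not the code gallivm generates: the function names are hypothetical, and floor is written with an int round trip so the example needs no math library (the commit relies on cheap floor with sse41 instead).

```c
/* Floor for values that fit in an int. */
static float floor_small(float v)
{
   float t = (float)(int)v;      /* truncates toward zero */
   return t > v ? t - 1.0f : t;  /* step down for negative non-integers */
}

/* Repeat wrap from a normalized coordinate: take the fractional
 * part, scale by the texture size, and clamp in case rounding
 * pushes the result up to 'length'. Works for npot sizes. */
static int wrap_repeat_float(float s, int length)
{
   float frac = s - floor_small(s);
   int texel = (int)(frac * (float)length);
   return texel < length ? texel : length - 1;
}

/* Repeat wrap from an already scaled, non-negative integer
 * coordinate: needs an unsigned remainder (URem). */
static int wrap_repeat_urem(unsigned texel, unsigned length)
{
   return (int)(texel % length);
}
```

For a 6-texel texture, the coordinate 1.25 scales to texel 7, and both paths wrap it to texel 1.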

commit e1e23f57ba9b910295c306d148f15643acc3fc83
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 18 20:38:56 2012 +0200

    llvmpipe: (trivial) remove duplicated function declaration

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 07ca57eb09e04c48a157733255427ef5de620861
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 18 20:37:34 2012 +0200

    llvmpipe: destroy setup variants on context destruction

    lp_delete_setup_variants() used to be called in garbage collection,
    but this no longer exists hence the setup shaders never got freed.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit ed0003c633859a45f9963a479f4c15ae0ef1dca3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 18 16:25:29 2012 +0100

    gallivm: handle different ilod parts for multiple quad sampling

    This fixes filtering when the integer part of the lod is not the same
    for all quads. I'm not fully convinced of that solution yet as it just
    splits the vector if the levels to be sampled from are different.
    But otherwise we'd need to do things like some minify steps, and getting
    mip level base address separately anyway hence it wouldn't really look
    like much of a win (and making the code even more complex).
    This should now give identical results to single quad sampling.

commit 8580ac4cfc43a64df55e84ac71ce1a774d33c0d2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 14 18:14:47 2012 +0200

    gallivm: de-duplicate sample code common to soa and aos sampling

    There doesn't seem to be any reason why this code dealing with cube face
    selection, lod and mip level calculation is separate in aos and
    soa sampling, and I am sick of having to change it in both places.

commit fb541e5f957408ce305b272100196f1e12e5b1e8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 14 18:15:41 2012 +0200

    gallivm: do mip filtering with per quad lod_fpart

    This gives better results for mip filtering, though the generated code might
    not be optimal. For now it also creates some artifacts if the lod_ipart isn't
    the same for all quads, since instead of using the same mip weight for all
    quads as previously (which just caused non-smooth gradients) this now will
    use the right weights but with the wrong mip level in this case (can easily
    be seen with things like texfilt, mipmap_tunnel).
    v2: use logic helper suggested by José, and fix issue with negative lod_fpart
        values

commit f1cc84eef7d826a20fab6cd8ccef9a275ff78967
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 13 18:35:25 2012 +0200

    gallivm: (trivial) fix bogus assert in lp_build_unpack_broadcast_aos_scalars

commit 7c17dbae8ae290df9ce0f50781a09e8ed640c044
Author: James Benton <jbenton@vmware.com>
Date:   Tue Jun 12 12:11:14 2012 +0100

    util: Reimplement half <-> float conversions.

    Removed u_half.py used to generate the table for previous method.

    The previous implementation of float-to-half conversion was faulty for
    denormals and NaNs and would have required extra logic to fix,
    making the speedup from using tables irrelevant.

commit 7762f59274070e1dd4b546f5cb431c2eb71ae5c3
Author: James Benton <jbenton@vmware.com>
Date:   Tue Jun 12 12:12:16 2012 +0100

    tests: Updated tests to properly handle NaN for half floats.

commit fa94c135aea5911fd93d5dfb6e6f157fb40dce5e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 11 18:33:10 2012 +0200

    gallivm: do mip level calculations per quad

    This is the final piece which shouldn't change the rendering output yet.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 23cbeaddfe03c09ca18c45d28955515317ffcf4c
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 9 00:54:21 2012 +0200

    gallivm: do per-quad cube face selection

    Doesn't quite fix the piglit cubemap test (not sure why actually)
    but doing per-quad face selection is doing the right thing and
    definitely an improvement.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit abfb372b3702ac97ac8b5aa80ad1b94a2cc39d33
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 11 18:22:59 2012 +0200

    gallivm: do all lod calculations per quad

    Still no functional change but lod is now converted to scalar after
    lod calculations.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 519368632747ae03feb5bca9c655eccbc5b751b4
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 16:46:10 2012 +0100

    gallivm: Added support for half-float to float conversion in lp_build_conv.

    Updated various utility functions to support this change.

commit 135b4d683a4c95f7577ba27b9bffa4a6fbd2c2e7
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 16:02:46 2012 +0100

    gallivm: Added function for half-float to float conversion.

    Updated lp_build_format_aos_array to support half-float source.

commit 37d648827406a20c5007abeb177698723ed86673
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 14:55:18 2012 +0100

    util: Updated u_format_tests to rigidly test half-float boundary values.

commit 2ad18165d96e578aa9046df7c93cb1c3284d8c6b
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 14:54:16 2012 +0100

    llvmpipe: Updated lp_test_format to properly handle Inf/NaN results.

commit 78740acf25aeba8a7d146493dd5c966e22c27b73
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 14:53:30 2012 +0100

    util: Added functions for checking NaN / Inf for double and half-floats.

commit 35e9f640ae01241f9e0d67fe893bbbf564c05809
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 24 21:05:13 2012 +0200

    gallivm: Fix calculating rho for 3d textures for the single-quad case

    Discovered by accident, this looks like a very old typo bug.

commit fc1220c636326536fd0541913154e62afa7cd1d8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 24 21:04:59 2012 +0200

    gallivm: do calcs per-quad in lp_build_rho

    Still convert to scalar at the end of the function.

commit 50a887ffc550bf310a6988fa2cea5c24d38c1a41
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon May 21 23:21:50 2012 +0200

    gallivm: (trivial) return scalar in lp_build_extract_range for length 1 vectors

    Our type system on top of llvm's one doesn't generally support vectors of
    length 1, instead using scalars. So we should return a scalar from this
    function instead of having to bitcast the vector with length 1 later elsewhere.

commit 80c71c621f9391f0f9230460198d861643324876
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 17:49:15 2012 +0100

    draw: Fixed bad merge error

commit c47401cfad0c9167de20ff560654f533579f452c
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 15:29:30 2012 +0100

    draw: Updated store_clip to store whole vectors instead of individual elements.

commit 2d9c1ad74b0b0b41861fffcecde39f09cc27f1cf
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 15:28:32 2012 +0100

    gallivm: Added lp_build_fetch_rgba_aos_array.

    A version of lp_build_fetch_rgba_aos which is targeted at simple array formats.

    Reads the whole vector from memory in one, instead of reading each element
    individually.

    Tested with mesa tests and demos.

commit ff7805dc2b6ef6d8b11ec4e54aab1633aef29ac8
Author: James Benton <jbenton@vmware.com>
Date:   Tue May 22 15:27:40 2012 +0100

    gallivm: Added lp_build_pad_vector.

    This function pads a vector with undef to a desired length.

commit 701f50acef24a2791dabf4730e5b5687d6eb875d
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 17:27:19 2012 +0100

    util: Added util_format_is_array.

    This function checks whether a format description is in a simple array format.

commit 5e0a7fa543dcd009de26f34a7926674190fa6246
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 19:13:47 2012 +0100

    draw: Removed draw_llvm_translate_from and draw/draw_llvm_translate.c.

    This is "replaced" by adding an optimised path in lp_build_fetch_rgba_aos
    in an upcoming patch.

commit 8c886d6a7dd3fb464ecf031de6f747cb33e5361d
Author: James Benton <jbenton@vmware.com>
Date:   Wed May 16 15:02:31 2012 +0100

    draw: Modified store_aos to write the vector as one, not individual elements.

commit 37337f3d657e21dfd662c7b26d61cb0f8cfa6f17
Author: James Benton <jbenton@vmware.com>
Date:   Wed May 16 14:16:23 2012 +0100

    draw: Changed aos_to_soa to use lp_build_transpose_aos.

commit bd2b69ce5d5c94b067944d1dcd5df9f8e84548f1
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 19:14:27 2012 +0100

    draw: Changed soa_to_aos to use lp_build_transpose_aos.

commit 0b98a950d29a116e82ce31dfe7b82cdadb632f2b
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 18:57:45 2012 +0100

    gallivm: Added lp_build_transpose_aos which converts between aos and soa.

commit 69ea84531ad46fd145eb619ed1cedbe97dde7cb5
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 18:57:01 2012 +0100

    gallivm: Added lp_build_interleave2_half aimed at AVX unpack instructions.

commit 7a4cb1349dd35c18144ad5934525cfb9436792f9
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue May 22 11:54:14 2012 +0100

    gallivm: Fix build on Windows.

    MC-JIT not yet supported there.

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit afd105fc16bb75d874e418046b80d9cc578818a1
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:17:26 2012 +0100

    llvmpipe: Added an error counter to lp_test_conv.

    Useful for keeping track of progress when fixing errors!

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit b644907d08c10a805657841330fc23db3963d59c
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:16:46 2012 +0100

    llvmpipe: Changed known failures in lp_test_conv.

    To comply with the recent fixes to lp_bld_conv.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit d7061507bd94f6468581e218e61261b79c760d4f
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:14:38 2012 +0100

    llvmpipe: Added fixed point types tests to lp_test_conv.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit 146b3ea39b4726dbe125ac666bd8902ea3d6ca8c
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:26:35 2012 +0100

    llvmpipe: Changed lp_test_conv src/dst alignment to be correct.

    Now based on the define rather than a fixed number.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit f3b57441f834833a4b142a951eb98df0aa874536
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:06:44 2012 +0100

    gallivm: Fixed erroneous optimisation in lp_build_min/max.

    Previously assumed normalised was 0 to 1, but it can be -1 to 1
    if type is signed.
    Tested with lp_test_conv and lp_test_format, reduced errors.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>
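
For illustration only, and assuming the erroneous shortcut folded a min against zero into the constant zero: that fold is valid when the normalized range is 0 to 1, but not when it is -1 to 1. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Reference result of min(x, 0), computed without any shortcut. */
static float
min_zero_ref(float x)
{
   return x < 0.0f ? x : 0.0f;
}

/* Hypothetical check: replacing min(x, 0) by the constant 0 is only
 * valid if no value in the normalized range lies below 0, which is
 * the case for unsigned normalized types only. */
static bool
fold_min_zero_is_valid(bool is_signed)
{
   float lower_bound = is_signed ? -1.0f : 0.0f;

   assert(lower_bound <= 0.0f);
   return min_zero_ref(lower_bound) == 0.0f;
}
```

For a signed type the lower bound is -1, so `min(-1, 0)` is -1 and the fold would return the wrong value.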

commit a0613382e5a215cd146bb277646a6b394d376ae4
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:04:49 2012 +0100

    gallivm: Compensate for lp_const_offset in lp_build_conv.

    Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
    Tested using lp_test_conv and lp_test_format, reduced errors.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit a3d2bf15ea345bc8a0664f8f441276fd566566f3
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 18 16:01:25 2012 +0100

    gallivm: Fixed overflow in lp_build_clamped_float_to_unsigned_norm.

    Tested with lp_test_conv and lp_test_format, reduced errors.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit e7b1e76fe237613731fa6003b5e1601a2e506207
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon May 21 20:07:51 2012 +0100

    gallivm: Fix build with LLVM 2.6

    Trivial, and useful.

commit d3c6bbe5c7f5ba1976710831281ab1b6a631082d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue May 15 17:15:59 2012 +0100

    gallivm: Enable MCJIT/AVX with vanilla LLVM 3.1.

    Add the necessary C++ glue, so that we don't need any modifications
    to the soon to be released LLVM 3.1.

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>

commit 724a019a14d40fdbed21759a204a2bec8a315636
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon May 14 22:04:06 2012 +0100

    gallivm: Use HAVE_LLVM 0x0301 consistently.

commit af6991e2a3868e40ad599b46278551b794839748
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon May 14 21:49:06 2012 +0100

    gallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.

    Trivial.

commit 6f8a1d75458daae2503a86c6b030ecc4bb494e23
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Mon Apr 2 22:14:15 2012 -0700

    gallivm: Pass in a MCInstrInfo to createMCInstPrinter on llvm-3.1.

    llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 62555b6ed8760545794f83064e27cddcb3ce5284
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Tue Mar 27 21:51:17 2012 -0700

    gallivm: Fix method overriding in raw_debug_ostream.

    Use matching type qualifiers to avoid method hiding.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit 6a9bd784f4ac68ad0a731dcd39e5a3c39989f2be
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Tue Mar 13 22:40:52 2012 -0700

    gallivm: Fix createOProfileJITEventListener namespace with llvm-3.1.

    llvm-3.1svn r152620 refactored the OProfile profiling code.
    createOProfileJITEventListener was moved from the llvm namespace to the
    llvm::JITEventListener namespace.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit b674955d39adae272a779be85aa1bd665de24e3e
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Mon Mar 5 22:00:40 2012 -0800

    gallivm: Pass in a MCRegisterInfo to MCInstPrinter on llvm-3.1.

    llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
    MCRegisterInfo argument.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 11ab69971a8a31c62f6de74905dbf8c02884599f
Author: Vinson Lee <vlee@freedesktop.org>
Date:   Wed Feb 29 21:20:53 2012 -0800

    Revert "gallivm: Change getExtent and readByte to non-const with llvm-3.1."

    This reverts commit d5a6c17254.

    llvm-3.1svn r151687 makes MemoryObject accessor members const again.

    Signed-off-by: Vinson Lee <vlee@freedesktop.org>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 339960c82d2a9f5c928ee9035ed31dadb7f45537
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon May 14 16:19:56 2012 +0200

    gallivm: (trivial) fix assertion failure for mipmapped 1d textures

    In lp_build_rho, we may end up with a 1-element vector (for mipmapped 1d
    textures), but in this case we require the type to be a non-vector type,
    so need a cast.

commit 9d73edb727bd6d196030dc3026b7bf0c574b3e19
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 10 18:12:07 2012 +0200

    gallivm: prepare for per-quad lod calculations for large vectors

    to be able to handle multiple quads at once in texture sampling and still
    do lod calculations per quad, it is necessary to get the per-quad derivatives
    into the lp_build_rho function.
    Until now these derivative values were just scalars, which isn't going to work.
    So we now use vectors, and since the interface needs to change we also do some
    different (slightly more efficient) packing of the values.
    For 8-wide vectors the packed derivative values for 3 coords would look like
    this (the layout scales to an arbitrary vector size that is a multiple of 4):
    ds1dx ds1dy dt1dx dt1dy ds2dx ds2dy dt2dx dt2dy
    dr1dx dr1dy _____ _____ dr2dx dr2dy _____ _____
    The second vector will be unused for 1d and 2d textures.
    To facilitate future changes the derivative values are put into a struct, since
    quite some functions just pass these values through.
    The generated code seems to be very slightly better for 2d textures (with
    4-wide vectors) than before with sse2 (if you have a cpu with physical 128bit
    simd units - otherwise it's probably not a win).
    v2: suggestions from José, rename variables, add comments, use swizzle helper
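
For illustration only: a scalar C sketch of the packed layout described above. The struct and function names are hypothetical, and the unused lanes of the second vector are zeroed here purely to keep the example deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-quad derivatives for the coordinates s, t and r. */
struct quad_derivs {
   float dsdx, dsdy;
   float dtdx, dtdy;
   float drdx, drdy;
};

/* Pack num_quads quads into two vectors of 4 * num_quads floats each:
 *   vec0: ds1dx ds1dy dt1dx dt1dy ds2dx ds2dy dt2dx dt2dy ...
 *   vec1: dr1dx dr1dy _____ _____ dr2dx dr2dy _____ _____ ...
 * vec1 is only meaningful for 3d coordinates. */
static void
pack_quad_derivs(const struct quad_derivs *quads, size_t num_quads,
                 float *vec0, float *vec1)
{
   size_t i;

   assert(quads && vec0 && vec1);
   for (i = 0; i < num_quads; i++) {
      float *v0 = vec0 + 4 * i;
      float *v1 = vec1 + 4 * i;

      v0[0] = quads[i].dsdx;
      v0[1] = quads[i].dsdy;
      v0[2] = quads[i].dtdx;
      v0[3] = quads[i].dtdy;

      v1[0] = quads[i].drdx;
      v1[1] = quads[i].drdy;
      v1[2] = 0.0f;   /* unused lane */
      v1[3] = 0.0f;   /* unused lane */
   }
}
```

With two quads this yields the 8-wide layout shown in the message; with one quad it yields the 4-wide case.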

commit 0aa21de0d31466dac77b05c97005722e902517b8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu May 10 18:10:31 2012 +0200

    gallivm: add undefined swizzle handling to lp_build_swizzle_aos

    This is useful for vectors with "holes", it lets llvm choose the most
    efficient shuffle instructions if some elements aren't needed without having to
    worry what elements to manually pick otherwise.

commit 00faf3f370e7ce92f5ef51002b0ea42ef856e181
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri May 4 17:25:16 2012 +0100

    gallivm: Get the LLVM IR optimization passes before JIT compilation.

    MC-JIT engine compiles the module immediately on creation, so the optimization
    passes were being run too late.

    So now we create a target data layout from a string, that matches the
    ABI parameters reported by the compiler.

    The backend optimization passes were always being run, so the performance
    improvement is modest (3% on multiarb mesa demo).

    Reviewed-by: Roland Scheidegger <sroland@vmware.com>
    Reviewed-by: Brian Paul <brianp@vmware.com>

commit 40a43f4e2ce3074b5ce9027179d657ebba68800a
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed May 2 16:03:54 2012 +0200

    gallivm: (trivial) fix wrong define used in lp_build_pack2

    should fix stack-smashing crashes.

commit e6371d0f4dffad4eb3b7a9d906c23f1c88a2ab9e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Apr 30 21:25:29 2012 +0200

    gallivm: add perf warnings when not using intrinsics with 256bit vectors

    Helper functions using integer sse2 intrinsics could split the vectors with AVX
    instead of using the generic fallback (splitting should be faster).
    We don't actually expect to hit these paths (hence don't fix them up to actually
    do the vector splitting) so just emit warnings (for those functions where it's
    obvious doing split/intrinsic is faster than using generic path).
    Only emit warnings for 256bit vectors since we _really_ don't expect to hit
    arbitrarily large vectors which would affect a lot more functions.
    The warnings do not actually depend on avx since the same logic applies to
    plain sse2 too (but of course again there's _really_ no reason we should hit
    these functions with 256bit vectors without avx).

commit 8a9ea701ea7295181e846c6383bf66a5f5e47637
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue May 1 20:37:07 2012 +0200

    gallivm: split vectors manually for avx in lp_build_pack2 (v2)

    There are 2 reasons for this:
    First, there's an llvm bug (fixed in 3.1) which generates tons of byte
    inserts/extracts otherwise, and second, more importantly, we want to use
    pack intrinsics instead of shuffles.
    We do this in lp_build_pack2 and not the calling code (aos sample path)
    because potentially other callers might find that useful too, even if
    for larger sequences of code using non-native vector sizes it might be
    better to manually split vectors.
    This should boost texture performance in the aos path considerably.
    v2: fix issues with intrinsics types with old llvm

commit 27ac5b48fa1f2ea3efeb5248e2ce32264aba466e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue May 1 20:26:22 2012 +0200

    llvmpipe: refactor lp_build_pack2 (v2)

    prettify, and it's unnecessary to assert when there's no intrinsic due to
    unsupported bit width - the shuffle path will work regardless.
    In contrast, lp_build_packs2 should only rely on lp_build_pack2 doing the
    clamping for element sizes for which there is a sse2 intrinsic.
    v2: fix bug spotted by Jose regarding the intrinsic type for packusdw
    on old llvm versions.

commit ddf279031f0111de4b18eaf783bdc0a1e47813c8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue May 1 20:13:59 2012 +0200

    gallivm: add src width check in lp_build_packs2()

    not doing so would skip clamping even if no sse2 pack instruction is
    available, which is incorrect (in theory only, such widths would also always
    hit an (unnecessary) assertion in lp_build_pack2()).

commit e7f0ad7fe079975eae7712a6e0c54be4fae0114b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Apr 27 15:57:00 2012 +0200

    gallivm: (trivial) fix crash-causing typo for npot textures with avx

commit 28a9d7f6f655b6ec508c8a3aa6ffefc1e79793a0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Apr 25 19:38:45 2012 +0200

    gallivm: (trivial) remove code mistakenly added twice.

commit d5926537316f8ff67ad0a52e7242f7c5478d919b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Apr 24 21:16:15 2012 +0200

    gallivm: add a new avx aos sample path (v2)

    Try to avoid mixing float and int address calculations. This does texture wrap
    modes with floats, and then the offset calculations still with ints (because
    of lack of precision with floats, though we could make some effort to make it work
    with not too large (16MB) textures).
    This also handles wrap repeat mode with npot-sized textures differently than
    either the old soa or aos int path (likely way faster but untested).
    Otherwise the actual address wrap code is largely similar to the soa path (not
    quite the same as this one also has some int code), it should get used by avx
    soa sampling later as well but doesn't handle more complex address modes yet
    (this will also have the benefit that we can use aos sampling path for all
    texture address modes).
    Generated code for that looks reasonable, but still does not split vectors
    explicitly for fetch/filter, which means we still get hit by an llvm bug
    (fixed upstream) that generates hundreds of pinsrb/pextrb instead of two shuffles.
    It is not obvious though if it's much of a win over just doing address calcs
    4-wide but with ints, even if it is definitely far fewer instructions on avx.
    piglit's texwrap seems to look exactly the same but tests
    neither the non-normalized nor the npot cases.
    v2: fix comments, prettify based on Brian's and Jose's feedback.

commit bffecd22dea66fb416ecff8cffd10dd4bdb73fce
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Apr 19 01:58:29 2012 +0200

    gallivm: refactor aos lp_build_sample_image_nearest/linear

    split them up to separate address calculations and fetching/filtering.
    Need this for being able to do 8-wide float address calcs and 4-wide
    fetch/filter later (for avx). Plus the functions were very big scary monsters
    anyway (in particular lp_build_sample_image_linear).

commit a80b325c57529adddcfa367f96f03557725c4773
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Apr 16 17:17:18 2012 +0200

    gallivm: fix lp_build_resize when truncating width but expanding vector size

    Missed this case which I thought was impossible - the assertion for it was
    right after the division by zero...
    (AoS) texture sampling may ask us to do this, for things like 8 4x32int
    vectors to 1 32x8int vector conversion (eventually, we probably don't want
    this to happen).

commit f9c8337caa3eb185830d18bce8b95676a065b1d7
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Apr 14 18:00:59 2012 +0200

    gallivm: fix cube maps with larger vectors

    This makes the branchless cube face selection code work with larger vectors.
    Because the complexity is quite high (cannot really be improved it seems,
    per-face selection would reduce complexity a lot but this leads to errors
    unless the derivatives are calculated all from the same face which almost
    doubles the work to be done) it is still slower than the branching version,
    hence only enable this with large vectors.
    It doesn't actually do per-quad face selection yet (only makes sense with
    matching lod selection, in fact it will select the same face for all pixels
    based on the average of the first four pixels for now) but only different
    shuffles are required to make it work (the branching version actually should
    work with larger vectors too now thanks to the improved horizontal add but of
    course it cannot be extended to really select the face per-quad unless doing
    branching per quad).

commit 7780c58869fc9a00af4f23209902db7e058e8a66
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 30 21:11:12 2012 +0100

    llvmpipe: (trivial) fix compiler warning

    and also clarify comment regarding availability of popcnt instruction.

commit a266dccf477df6d29a611154e988e8895892277e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 30 14:21:07 2012 +0100

    gallivm: remove unneeded members in lp_build_sample_context

    Minor cleanup, the texture width, height, depth aren't accessed in their
    scalar form anywhere. Makes it more obvious those values should probably be
    fetched already vectorized (but this requires more invasive changes)...

commit b678c57fb474e14f05e25658c829fc04d2792fff
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 29 15:53:55 2012 +0100

    gallivm: add a helper for concatenating vectors

    Similar to the extract_range helper intended to get around slow code generated
    by llvm for 128bit insertelements.
    Concatenating two 128bit vectors this way will result in a single vinsertf128
    operation rather than two 64bit stores plus one 128bit load, though it might be
    mildly useful for other purposes as well.

commit 415ff228bcd0cf5e44a4c15350a661f0f5520029
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 19:41:15 2012 +0100

    gallivm: add a custom 2x8f->1x16ub avx conversion path

    Similar to the existing 4x4f->1x16ub sse2 path, shaves off a couple
    instructions (min/max mostly) because it relies on pack intrinsics clamping.
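
For illustration only: a scalar C model of why saturating pack instructions make explicit min/max clamps redundant. It mirrors the saturation of x86 packssdw (int32 to int16, signed) followed by packuswb (int16 to uint8, unsigned); the function names are hypothetical and the float-to-int step truncates for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturation from 32 to 16 bits, as packssdw does per lane. */
static int16_t
pack_s32_to_s16(int32_t v)
{
   if (v < INT16_MIN)
      return INT16_MIN;
   if (v > INT16_MAX)
      return INT16_MAX;
   return (int16_t)v;
}

/* Unsigned saturation from 16 to 8 bits, as packuswb does per lane. */
static uint8_t
pack_s16_to_u8(int16_t v)
{
   if (v < 0)
      return 0;
   if (v > UINT8_MAX)
      return UINT8_MAX;
   return (uint8_t)v;
}

/* Hypothetical float -> unorm8 conversion with no explicit clamp:
 * out-of-range inputs are clamped by the two packs. */
static uint8_t
float_to_ubyte(float f)
{
   int32_t i;

   assert(f > -1.0e6f && f < 1.0e6f);   /* keep the int conversion defined */
   i = (int32_t)(f * 255.0f);            /* truncates; a simplification */
   return pack_s16_to_u8(pack_s32_to_s16(i));
}
```

Inputs above 1.0 end up as 255 and negative inputs as 0 without any min/max on the float side.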

commit 78c08fc89f8fbcc6dba09779981b1e873e2a0299
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 18:44:07 2012 +0100

    gallivm: add avx arithmetic intrinsics

    Add all avx intrinsics for arithmetic functions (with the exception
    of the horizontal add function which needs another look).
    Seems to pass basic tests.

    Reviewed-by: José Fonseca <jfonseca@vmware.com>

commit a586caa2800aa5ce54c173f7c0d4fc48153dbc4e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 15:31:35 2012 +0100

    gallivm: add avx logic intrinsics

    Add the blend intrinsics for 8-wide float and 4-wide double vectors.
    Since we lack 256bit int instructions these are used for int vectors as well,
    though obviously not for byte or word element values.
    The comparison intrinsics aren't extended for avx since these are only used
    for pre-2.7 llvm versions.

commit 70275e4c13c89315fc2560a4c488c0e6935d5caf
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 28 00:40:53 2012 +0100

    gallivm: new helper function for extract shuffles.

    Based on José's idea as we can need that in a couple places.
    Note that such shuffles should not be used lightly, since data layout
    of <4 x i8> is different to <16 x i8> for instance, hence might cause
    data rearrangement.

commit 4d586dbae1b0c55915dda1759d2faea631c0a1c2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 27 18:27:25 2012 +0100

    gallivm: (trivial) don't overallocate shuffle variable

    using wrong define meant huge array...

commit 06b0ec1f6d665d98c135f9573ddf4ba04b2121ad
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 27 17:54:20 2012 +0100

    gallivm: don't do per-element extract/insert for vector element resize

    Instead of doing per-element extract/insert if the src vectors
    and dst vector differ in total size (which generates atrocious code)
    first change the src vectors size by using shuffles to destination
    vector size.
    We can still do better than that on AVX for packing to color buffer
    (by exploiting pack intrinsics characteristics hence eliminating the
    need for some clamps) but this already generates much better code.

    v2: incorporate feedback from José, Keith and use shuffle instead of
    bitcasts/extracts. Due to llvm deficiencies the latter cause all data
    to get moved to GPRs and back in pieces (even though the data in the
    regs actually stays the same...).

commit c9970d70e05f95d3f52fe7d2cd794176a52693aa
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 23 19:33:19 2012 +0000

    gallivm: fix bug in simple position interpolation

    Accidental use of position attribute instead of just pixel coordinates.
    Caused failures in piglit glsl-fs-ceil and glsl-fs-floor.

commit d0b6fcdb008d04d7f73d3d725615321544da5a7e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 23 15:31:14 2012 +0000

    gallivm: fix emission of ceil opcode

    lp_build_ceil seems more appropriate than lp_build_trunc.
    This never seems to be hit, though, since someone performs some ceil
    to floor magic.

commit d97fafed7e62ffa6bf76560a92ea246a1a26d256
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 22 11:46:52 2012 +0000

    gallivm: new vectorized path for cubemap calculations

    should be faster when adapted to multiple quads as only selection masks need to be different.
    The code is more or less a per-pixel version adapted to only do it per quad.
    A per pixel version would be much simpler (could drop 2 selects, 6 broadcasts and the messy
    horizontal add of 3 vectors at the expense of only 2 more absolute value instructions -
    would also just work for arbitrarily large vectors).
    This version doesn't yet work with larger vectors because the horizontal add isn't adjusted
    to be able to work with 2x4 vectors (and also because face selection wouldn't be done per
    quad just per block though that would be only a correctness issue just as with lod selection).
    The downside is this code is quite a bit slower. On a Core2 it can be sped up by disabling the
    hw blend instructions for selection and using logicop fallbacks instead, but it is still slower
    than the old code, hence leave that in for now. Probably will choose one or the other version
    based on vector length in the end.

commit b375fbb18a3fd46859b7fdd42f3e9908ea4ff9a3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 21 14:42:29 2012 +0000

    gallivm: fix optimized occlusion query intrinsic name

commit a9ba0a3b611e48efbb0e79eb09caa85033dbe9a2
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Mar 21 16:19:43 2012 +0000

    draw,gallivm,llvmpipe: Call gallivm_verify_function everywhere.

commit f94c2238d2bc7383e088b8845b7410439a602071
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 20 18:54:10 2012 +0000

    gallivm: optimize calculations for cube maps a bit

    this does some more vectorized calculations and uses horizontal adds if possible.
    A definite win with sse3 otherwise it doesn't seem to make much of a difference.
    In any case this is arithmetically identical, cannot handle larger vectors.
    Should be useful as a reference point against larger vector version later...

commit 21a2c1cf3c8e1ac648ff49e59fdc0e3be77e2ebb
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 20 15:16:27 2012 +0000

    llvmpipe: slight optimization of occlusion queries

    using movmskps when available.
    While this is slightly better for cpus without popcnt we should
    really sum the vectors ourselves (it is also possible to cast to i4 before
    doing the popcnt but that doesn't help that much either since llvm
    is using some optimized popcnt version for i32)
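
For illustration only: a scalar C model of counting passing pixels with a sign-bit mask followed by a population count. `movmsk4` mimics what movmskps does for four 32-bit lanes; the function names are hypothetical and this is not the code llvmpipe generates.

```c
#include <assert.h>
#include <stdint.h>

/* Gather the top bit of each of four 32-bit lanes into the low four
 * bits of the result, like movmskps. */
static unsigned
movmsk4(const uint32_t lanes[4])
{
   unsigned mask = 0;
   unsigned i;

   for (i = 0; i < 4; i++)
      mask |= (unsigned)(lanes[i] >> 31) << i;
   return mask;
}

/* Count the set bits of a 4-bit mask. */
static unsigned
popcount4(unsigned mask)
{
   unsigned count = 0;

   assert(mask < 16);
   while (mask) {
      count += mask & 1;
      mask >>= 1;
   }
   return count;
}

/* Number of lanes whose mask value has the top bit set, i.e. the
 * number of pixels to add to the occlusion counter. */
static unsigned
count_passed(const uint32_t lanes[4])
{
   return popcount4(movmsk4(lanes));
}
```

With an all-ones mask per passing pixel, the result is the number of passing pixels in the quad.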

commit 5ab5a35f216619bcdf55eed52b0db275c4a06c1b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 20 13:32:11 2012 +0000

    llvmpipe: fix occlusion queries with larger vectors

    need to adjust casts etc.

commit ff95e6fdf5f16d4ef999ffcf05ea6e8c7160b0d5
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Mar 19 20:15:25 2012 +0000

    gallivm: Restore optimization passes.

commit 57b05b4b36451e351659e98946dae27be0959832
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 19:34:22 2012 +0000

    llvmpipe: use existing min2 macro

commit bc9a20e19b4f600a439f45679451f2e87cd4b299
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 19:07:27 2012 +0000

    llvmpipe: add some safeguards against really large vectors

    As per José's suggestion, prevent things from blowing up if some cpu
    would have 1024bit or larger vectors.

commit 0e2b525e5ca1c5bbaa63158bde52ad1c1564a3a9
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 18:31:08 2012 +0000

    llvmpipe: fix mask generation for uberwide vectors

    this was the only piece preventing 16-wide vectors from working
    (apart from the LP_MAX_VECTOR_WIDTH define that is), which is the maximum
    as we don't get more pixels in the fragment shader at once.
    Hence adjust that so things could be tested properly with that size
    even though there seems to be no practical value.

commit 3c8334162211c97f3a11c7f64e9e5a2a91ad9656
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 18:19:41 2012 +0000

    llvmpipe: fix the simple interpolation method with larger vectors

    so both methods actually _really_ work now. Makes textures look
    nice with larger vectors...

commit 1cb0464ef8871be1778d43b0c56adf9c06843e2d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 17:26:35 2012 +0000

    llvmpipe: fix mask generation and position interpolation with 8-wide vectors

    trivial bugs; with these fixed, things start to look somewhat reasonable.
    Textures though have some swizzling issues it seems.

commit 168277a63ef5b72542cf063c337f2d701053ff4b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 16:04:03 2012 +0000

    llvmpipe: don't overallocate variables

    we never have more than 16 (stamp size) / 4 (minimum possible vector size).
    (With larger vectors those variables are still overallocated a bit.)

commit 409b54b30f81ed0aa9ed0b01affe15c72de9abd2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 15:56:48 2012 +0000

    llvmpipe: add some 32f8 formats to lp_test_conv

    Also add the ability to handle different sized vectors.

commit 55dcd3af8366ebdac0af3cdb22c2588f24aa18ce
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Mar 19 15:47:27 2012 +0000

    gallivm: handle different sized vectors in conversion / pack

    only fully generic path for now (extract/insert per element).

commit 9c040f78c54575fcd94a8808216cf415fe8868f6
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sun Mar 18 00:58:28 2012 +0100

    llvmpipe: fix harmless use of uninitialized values

commit 551e9d5468b92fc7d5aa2265db9a52bb1e368a36
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 16 23:31:21 2012 +0100

    gallivm: drop special path in extract_broadcast with different sized vectors

    Not needed, llvm can handle shuffles with different sized result vector just
    fine. Should hopefully generate the same code in the end, but simpler IR.

commit 44da531119ffa07a421eaa041f63607cec88f6f8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 16 23:28:49 2012 +0100

    llvmpipe: adapt interpolation for handling multiple quads at once

    this is still WIP. There are actually two methods possible and it is not
    quite clear what makes the most sense, so there's code for both for now:
    1) the iterative method as used before (compute attrib values at upper left
    corner of stamp and upper left corner of each quad initially).
    It is improved to handle more than one quad at once, and also do some more vectorized
    calculations initially for slightly better code - newer cpus have full throughput with
    4 wide float vectors, hence don't try to code up a path which might be faster if there's
    just one channel active per attribute.
    2) just do straight interpolation for each pixel.
    Method 2) is more work per quad, but less initially; if all quads are executed
    it is significantly more work overall though. But this might change with larger vector lengths.
    This method would also be needed if we'd do some kind of active quad merging when
    operating on multiple quads at once.
    This path contains some hack to force llvm to generate better code, it is still far
    from ideal though, still generates far too many unnecessary register spills/reloads.
    Both methods should work with different sized vectors.
    Not very well tested yet, still seems to work with four-wide vectors, need changes
    elsewhere to be able to test with wider vectors.

commit be5d3e82e2fe14ad0a46529ab79f65bf2276cd28
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Mar 16 20:59:37 2012 +0000

    draw: Cleanup.

commit f85bc12c7fbacb3de2a94e88c6cd2d5ee0ec0e8d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Mar 16 20:43:30 2012 +0000

    gallivm: More module compilation refactoring.

commit d76f093198f2a06a93b2204857e6fea5fd0b3ece
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Mar 15 21:29:11 2012 +0000

    llvmpipe: Use gallivm_compile/free_function() in linear code.

    Should have been done before.

commit 122e1adb613ce083ad739b153ced1cde61dfc8c0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 13 14:47:10 2012 +0100

    llvmpipe: generate partial pixel mask for multiple quads

    still works with one quad, cannot be tested yet with more
    At least for now always fixed order with multiple quads.

commit 4c4f15081d75ed585a01392cd2dcce0ad10e0ea8
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 8 22:09:24 2012 +0100

    llvmpipe: refactor state setup a bit

    Refactor to make it easier to emit (and potentially later fetch in fs)
    coefficients for multiple attributes at once.
    Need to think more about how to make this actually happen however, the
    problem is different attributes can have different interpolation modes,
    requiring different handling in both setup and fs (though linear and
    perspective handling is close).

commit 9363e49722ff47094d688a4be6f015a03fba9c79
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 8 19:23:23 2012 +0100

    llvmpipe: vectorize tri offset calc

    cuts number of instructions in quad-offset-factor from 107 to 75.
    This code actually duplicated the (scalar) code calculating the determinant
    except it used different vertex order (leading to different sign but it doesn't
    matter) hence llvm could not have figured out it's the same (of course with
    determinant vectorized in the other place that wouldn't have worked any longer
    either).
    Note this particular piece doesn't actually vectorize well, not many arithmetic
    instructions left but tons of shuffle instructions...
    Probably would need to work on n tris at a time for better vectorization.

commit 63169dcb9dd445c94605625bf86d85306e2b4297
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Mar 8 03:11:37 2012 +0100

    llvmpipe: vectorize some scalar code in setup

    reduces number of arithmetic instructions, and avoids loading
    vector x,y values twice (once as scalars once as vectors).
    Results in a reduction of instructions from 76 to 64 in fs setup for glxgears
    (16%) on a cpu with sse41.
    Since this code uses vec2 disguised as vec4, on old cpus which had physical
    64bit sse units (pre-Core2) it probably is less of a win in practice (and if
    you have no vectors you can only hope llvm eliminates the arithmetic for
    unneeded elements).

commit 732ecb877f951ab89bf503ac5e35ab8d838b58a1
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Mar 7 00:32:24 2012 +0100

    draw: fix clipping

    bug introduced by 4822fea3f0440b5205e957cd303838c3b128419c broke
    clipping pretty badly (verified with lineclip test)

commit ef5d90b86d624c152d200c7c4056f47c3c6d2688
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 6 23:38:59 2012 +0100

    draw: don't store vertex header per attribute

    storing the vertex header once per attribute is totally unnecessary.
    Some quick look at the generated assembly says llvm in fact cannot optimize
    away the additional stores (maybe due to potentially aliasing pointers
    somewhere).
    Plus, this makes the code cleaner and also allows using a vector "or"
    instead of scalar ones.

commit 6b3a5a57b0b9850854cfbd7b586e4e50102dda71
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Mar 6 19:11:01 2012 +0100

    draw: do the per-vertex "boolean" clipmask "or" with vectors

    no point extracting the values and doing it per component.
    Doesn't help that much since we still extract the values elsewhere anyway.

commit 36519caf1af40e4480251cc79a2d527350b7c61f
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Mar 2 22:27:01 2012 +0100

    gallivm: fix lp_build_extract_broadcast with different sized vectors

    Fix the obviously wrong argument, so it doesn't blow up.

commit 76d0ac3ad85066d6058486638013afd02b069c58
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Mar 2 12:16:23 2012 +0000

    draw: Compile per module and not per function (WIP).

    Enough to get gears w/ LLVM draw + softpipe to work on AVX doing:

      GALLIUM_DRIVER=softpipe SOFTPIPE_USE_LLVM=yes glxgears

    But still hackish -- will need to rethink and refactor this.

commit 78e32b247d2a7a771be9a1a07eb000d1e54ea8bd
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 29 12:01:05 2012 +0000

    llvmpipe: Remove lp_state_setup_fallback.

    Never used.

commit 6895d5e40d19b4972c361e8b83fdb7eecda3c225
Author: José Fonseca <jfonseca@vmware.com>
Date:   Mon Feb 27 19:14:27 2012 +0000

    llvmpipe: Don't emit EMMS on x86

    We already take precautions to ensure that LLVM never emits MMX code.

commit 4822fea3f0440b5205e957cd303838c3b128419c
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Feb 29 15:58:19 2012 +0100

    draw: modifications for larger vector sizes

    We want to be able to use larger vectors especially for running the vertex
    shader. With this patch we build soa vectors which might have a different
    length than 4.
    Note that aos structures really remain the same, only when aos structures
    are converted to soa potentially different sized vectors are used.
    Samplers probably don't work yet, didn't look at them.
    Testing done:
    glxgears works with both 128bit and 256bit vectors.

commit f4950fc1ea784680ab767d3dd0dce589f4e70603
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 29 15:51:57 2012 +0100

    gallivm: override native vector width with LP_NATIVE_VECTOR_WIDTH env var for debug

commit 6ad6dbf0c92f3bf68ae54e5f2aca035d19b76e53
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 29 15:51:24 2012 +0100

    draw: allocate storage with alignment according to native vector width

commit 7bf0e3e7c9bd2469ae7279cabf4c5229ae9880c1
Author: José Fonseca <jfonseca@vmware.com>
Date:   Fri Feb 24 19:06:08 2012 +0000

    gallivm: Fix comment grammar.

    Was missing several words. Spotted by Roland.

commit b20f1b28eb890b2fa2de44a0399b9b6a0d453c52
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 19:22:09 2012 +0000

    gallivm: Use MC-JIT on LLVM 3.1+ (i.e., SVN)

    MC-JIT

    Note: MC-JIT is still WIP. For this to work correctly it requires
    LLVM changes which are not yet upstream.

commit b1af4dfcadfc241fd4023f4c3f823a1286d452c0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Feb 23 20:03:15 2012 +0100

    llvmpipe: use new lp_type_width() helper in lp_test_blend

commit 04e0a37e888237d4db2298f31973af459ef9c95f
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Feb 23 19:50:34 2012 +0100

    llvmpipe: clean up lp_test_blend a little

    Using variables just sized and aligned right makes it a bit more obvious
    what's going on.
    The test still only tests vector length 4.
    For AoS anything else probably isn't going to work.
    For SoA other lengths should work (at least with floats).

commit e61c393d3ec392ddee0a3da170e985fda885a823
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 17:48:30 2012 +0000

    gallivm: Ensure vector width consistency.

    Instead of assuming that everything is the max native size.

commit 330081ac7bc41c5754a92825e51456d231bf84dd
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 17:44:14 2012 +0000

    draw: More simd vector width consistency fixes.

commit d90ca002753596269e37297e2e6c139b19f29f03
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 17:43:00 2012 +0000

    gallivm: Remove unused lp_build_int32_vec4_type() helper.

commit cae23417824d75869c202aaf897808d73a2c1db0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Feb 23 17:32:16 2012 +0100

    gallivm: use global variable for native vector width instead of define

    We do not know the simd extensions (and hence the simd width we should use)
    available at compile time.
    At least for now keep a define for maximum vector width, since a global
    variable obviously can't be used to adjust alignment of automatic stack
    variables.
    Leave the runtime-determined value at 128 for now in all cases.
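
The split between a compile-time maximum and a runtime width can be sketched as follows. This is a minimal illustration, not the gallivm code: every `example_*` / `EXAMPLE_*` name is hypothetical, and the 256-bit maximum is an assumed value.

```c
#include <assert.h>

/* Compile-time upper bound (assumed value): usable for array sizes and
 * alignment of automatic variables, which a global variable cannot be. */
#define EXAMPLE_MAX_VECTOR_WIDTH 256

/* Runtime width in bits, to be adjusted once the CPU's SIMD extensions are
 * known. Starts at 128, like the value the commit leaves in place. */
unsigned example_native_vector_width = 128;

/* Hypothetical setter; the width must be a multiple of 32 bits and must not
 * exceed the compile-time maximum. */
void example_set_native_vector_width(unsigned bits)
{
   assert(bits >= 32 && bits % 32 == 0 && bits <= EXAMPLE_MAX_VECTOR_WIDTH);
   example_native_vector_width = bits;
}

/* Broadcasts value into as many 32-bit lanes as the runtime width holds and
 * returns the sum of those lanes. */
float example_sum_broadcast(float value)
{
   /* Stack storage is sized and aligned by the compile-time maximum. */
   _Alignas(EXAMPLE_MAX_VECTOR_WIDTH / 8)
      float lanes[EXAMPLE_MAX_VECTOR_WIDTH / 32];
   unsigned length = example_native_vector_width / 32;
   float sum = 0.0f;
   unsigned i;

   assert(length <= EXAMPLE_MAX_VECTOR_WIDTH / 32);
   for (i = 0; i < length; i++)
      lanes[i] = value;
   for (i = 0; i < length; i++)
      sum += lanes[i];
   return sum;
}
```

With the default width of 128 bits, example_sum_broadcast(1.5f) sums four lanes; after example_set_native_vector_width(256) it sums eight, while the stack buffer keeps the same size and alignment.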

commit 51270ace6349acc2c294fc6f34c025c707be538a
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 15:41:02 2012 +0000

    gallivm: Add a hunk inadvertently lost when rebasing.

commit bf256df9cfdd0236637a455cbaece949b1253e98
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 14:24:23 2012 +0000

    llvmpipe: Use consistent vector width in depth/stencil test.

commit 5543b0901677146662c44be2cfba655fd55da94b
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 14:19:59 2012 +0000

    draw: Use a consistent vector register width.

    Instead of 4x32 sometimes, LP_NATIVE_VECTOR_WIDTH other times.

commit eada8bbd22a3a61f549f32fe2a7e408222e5c824
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 12:08:04 2012 +0000

    gallivm: Remove garbage collection.

    MC-JIT will require one compilation per module (as opposed to one
    compilation per function), therefore no state will be shared,
    eliminating the need to do garbage collection.

commit 556697ea0ed72e0641851e4fbbbb862c470fd7eb
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 10:33:41 2012 +0000

    gallivm: Move all native target initialization to lp_set_target_options().

commit c518e8f3f2649d5dc265403511fab4bcbe2cc5c8
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:52:32 2012 +0000

    llvmpipe: Create one gallivm instance for each test.

commit 90f10af8920ec6be6f2b1e7365cfc477a0cb111d
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:48:08 2012 +0000

    gallivm: Avoid LLVMAddGlobalMapping() in lp_bld_assert().

    Brittle, complex, and unnecessary. Just use function pointer constant.

commit 98fde550b33401e3fe006af59db4db628bcbf476
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:21:26 2012 +0000

    gallivm: Add a lp_build_const_func_pointer() helper.

    To be reused in all places where we want to call C code.

commit 6cfedadb62c2ce5af8d75969bc95a607f3ece118
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 09:44:41 2012 +0000

    gallivm: Cleanup/simplify lp_build_const_string_variable.

    - Move to lp_bld_const where it belongs
    - Rename to lp_build_const_string
    - take the length from the argument (and don't count the zero terminator twice)
    - bitcast the constant to generic i8 *

commit db1d4018c0f1fa682a9da93c032977659adfb68c
Author: José Fonseca <jfonseca@vmware.com>
Date:   Thu Feb 23 11:52:17 2012 +0000

    gallivm: Set NoFramePointerElimNonLeaf to true where supported.

commit 088614164aa915baaa5044fede728aa898483183
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Feb 22 19:38:47 2012 +0100

    llvmpipe: pass in/out pointers rather than scalar floats in lp_bld_arit

    we don't want llvm to potentially optimize away the vectors (though it doesn't
    seem to currently), plus we want to be able to handle in/out vectors of arbitrary
    length.

commit 3f5c4e04af8a7592fdffa54938a277c34ae76b51
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Feb 21 23:22:55 2012 +0100

    gallivm: fix lp_build_sqrt() for vector length 1

    since we optimize away vectors with length 1 need to emit intrinsic
    without vector type.

commit 79d94e5f93ed8ba6757b97e2026722ea31d32c06
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 22 17:00:46 2012 +0000

    llvmpipe: Remove lp_test_round.

commit 81f41b5aeb3f4126e06453cfc78990086b85b78d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Feb 21 23:56:24 2012 +0100

    llvmpipe: subsume lp_test_round into lp_test_arit

    Much simpler, and since the arguments aren't passed as 128bit values can run
    on any arch.
    This also uses the float instead of the double versions of the c functions
    (which probably was the intention anyway).
    In contrast to lp_test_round the output is much less verbose however.
    Tested vector width of 32 to 512 bits - all pass except 32 (length 1) which
    crashes in lp_build_sqrt() due to wrong type.

    Signed-off-by: José Fonseca <jfonseca@vmware.com>

commit 945b338b421defbd274481d8c4f7e0910fd0e7eb
Author: José Fonseca <jfonseca@vmware.com>
Date:   Wed Feb 22 09:55:03 2012 +0000

    gallivm: Centralize the function compilation logic.

    This simplifies a lot of code.

    Also doing this in a central place will make it easier to carry out the
    changes necessary to use MC-JIT in the future.

gallivm: Fix typo in explicit derivative shuffle.

Trivial.

draw: make DEBUG_STORE work again

adapt to lp_build_printf() interface changes

Reviewed-by: José Fonseca <jfonseca@vmware.com>

draw: get rid of vecnf_from_scalar()

just use lp_build_broadcast directly (cannot assign a name but don't really
need it, vecnf_from_scalar() was producing much uglier IR due to using
repeated insertelement instead of insertelement+shuffle).

Reviewed-by: José Fonseca <jfonseca@vmware.com>

llvmpipe: fix typo in complex interpolation code

Fixes position interpolation when using complex mode
(piglit fp-fragment-position and similar)

Reviewed-by: José Fonseca <jfonseca@vmware.com>

draw: fix clipvertex/position storing again

This appears to be the result of a bad merge.
Fixes piglit tests relying on clipping, like a lot of the interpolation tests.

Reviewed-by: José Fonseca <jfonseca@vmware.com>

gallivm: Fix explicit derivative manipulation.

Same counter variable was being used in two nested loops. Use more
meaningful variable names for the counter to fix and avoid this.
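
The class of bug described here can be sketched in isolation. The functions below are hypothetical and unrelated to the actual derivative code; they only show how a shared counter changes the iteration count.

```c
#include <assert.h>

/* Buggy pattern: the inner loop reuses the outer loop's counter, so the
 * outer loop resumes from wherever the inner loop stopped. Depending on the
 * bounds this ends early (as in the usage note) or never terminates. */
unsigned example_visits_shared_counter(unsigned outer, unsigned inner)
{
   unsigned visits = 0;
   unsigned i;

   assert(inner >= outer);   /* keeps this demonstration finite */
   for (i = 0; i < outer; i++)
      for (i = 0; i < inner; i++)   /* clobbers the outer counter */
         visits++;
   return visits;
}

/* Fixed pattern: one descriptively named counter per loop. */
unsigned example_visits_distinct_counters(unsigned rows, unsigned cols)
{
   unsigned visits = 0;
   unsigned row, col;

   for (row = 0; row < rows; row++)
      for (col = 0; col < cols; col++)
         visits++;
   return visits;
}
```

For 2 outer and 4 inner iterations, the shared-counter version visits 4 elements instead of the intended 8.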

gallivm: Prevent buffer overflow in repeat wrap mode for NPOT.

Based on Roland's patch, discussion, and review.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallivm: Fix dims for TGSI_TEXTURE_1D in emit_tex.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallivm: Fix explicit volume texture derivatives.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gallivm: fix 1d shadow texture sampling

The r coordinate is always used, hence we need three coords, not two
(the second one is unused).

Reviewed-by: José Fonseca <jfonseca@vmware.com>

gallivm: Enable AVX support without MCJIT, where available.

For now, this just enables AVX on Windows for testing.  If the code is
stable then we might consider preferring the old JIT wherever possible.

No change elsewhere.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-07-17 13:42:39 +01:00
José Fonseca ba9c1773d7 gallivm: Allow to force nearest filtering on a per-axis basis.
Experimental code, not really used yet.
2012-07-17 13:42:39 +01:00
José Fonseca 6dddd18480 draw,gallivm: Fix draw_get_shader_param.
- Use LLVM limits when LLVM is being used, instead of TGSI limits
- Provide draw_get_shader_param_no_llvm for when llvm is never used (softpipe)
- Eliminate several of the hacks around draw shader caps in several drivers

Unfortunately the hack for PIPE_MAX_VERTEX_SAMPLERS is still necessary.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2012-07-13 13:01:51 +01:00
José Fonseca 978807ef01 gallivm: Use %.9g to print floats.
So that we can see them in their full denormalized glory.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-07-12 21:14:35 +01:00
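Why nine significant digits matter for the float-printing entry above can be checked with a small round-trip sketch. The helper names are hypothetical; the only fact relied on is that 9 significant decimal digits are enough to recover any IEEE single-precision value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Prints value with %.9g and parses the text back into a float. */
float example_roundtrip_9g(float value)
{
   char buf[64];
   int len = snprintf(buf, sizeof buf, "%.9g", (double)value);

   assert(len > 0 && (size_t)len < sizeof buf);
   return strtof(buf, NULL);
}

/* Same round trip with plain %f, which keeps six digits after the decimal
 * point and therefore flushes tiny values to "0.000000". */
float example_roundtrip_f(float value)
{
   char buf[64];
   int len = snprintf(buf, sizeof buf, "%f", (double)value);

   assert(len > 0 && (size_t)len < sizeof buf);
   return strtof(buf, NULL);
}
```

For the denormal 1e-40f, the %f round trip yields 0 while the %.9g round trip returns the original value unchanged.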
José Fonseca e75fe7ba08 gallivm: Cleanup the 4 x float -> 16 ub special path in lp_build_conv.
No behaviour change intended.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-07-02 12:13:52 +01:00
José Fonseca 638779e445 gallivm: Refactor lp_build_broadcast(_scalar) to share code.
Doesn't really change the generated assembly, but produces more compact IR,
and of course, makes code more consistent.

Reviewed-by: Brian Paul <brianp@vmware.com>
2012-06-28 20:20:34 +01:00
Johannes Obermayr bf679ce1dc gallivm: Fix potential buffer overflowing in strncat.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-06-28 11:47:23 +01:00
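The strncat entry above does not say what the overflow was. A common cause, shown here as an assumption rather than as the actual bug, is passing the buffer size as the third argument, although strncat expects the maximum number of characters to append. A bounded append under that assumption might look like this (the helper name is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Appends src to the NUL-terminated string in dst without writing past
 * dst_size bytes in total. Returns the resulting string length. */
size_t example_append_bounded(char *dst, size_t dst_size, const char *src)
{
   size_t used = strlen(dst);

   assert(used < dst_size);
   /* The limit is the remaining space minus one byte for the terminator
    * that strncat always writes. */
   strncat(dst, src, dst_size - used - 1);
   return strlen(dst);
}
```

Appending "defghijk" to "abc" in an 8-byte buffer yields "abcdefg": the source is truncated instead of overrunning the buffer.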
James Benton 789436f1e0 gallivm: Added a generic lp_build_print_value which prints a LLVMValueRef.
Updated lp_build_printf to share common code.
Removed specific lp_build_print_vecX.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-06-27 11:16:18 +01:00
Olivier Galibert 27e94ba4ea u2f_emit: Fix type parameter in LLVM call.
The type is the destination type (i.e. float vector) and not the
source type.  Fixes piglit fs-{in,de}crement-uint.

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-06-26 16:55:40 +01:00
Olivier Galibert c790c2c759 llvmpipe: Add vertex id support.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-06-19 14:40:44 -06:00
Olivier Galibert 46931ecf48 llvmpipe: Simplify and fix system variables fetch.
The system array values concept doesn't really work because it expects the
system values to be fixed per call, which is wrong for gl_VertexID and
iffy for gl_SampleID.  So this patch does two things:

- kill the array, have emit_fetch_system_value directly pick the
  values it needs (only gl_InstanceID for now, as the previous code)

- correctly handle the expected type in emit_fetch_system_value

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-06-19 14:40:44 -06:00
Paul Berry 9d57d483cb gallium: Add TGSI_OPCODE_F2U to gallivm backend.
Note: for the moment TGSI_OPCODE_F2U is implemented using
lp_build_itrunc() (the same function used to implement
TGSI_OPCODE_F2I).  In the long run, we should create an
lp_build_utrunc() function to do the proper conversion.  But this
should allow us to limp along with mostly correct behaviour for now.
2012-06-15 08:58:55 -07:00
Roland Scheidegger dfbb18bdb5 gallivm: Fix calculating rho for 3d textures for the single-quad case
Discovered by accident, this looks like a very old typo bug.

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-06-08 17:46:57 +01:00
James Benton a3d4af0c00 gallivm: Fixed erroneous optimisation in lp_build_min/max.
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
Tested with lp_test_conv and lp_test_format, reduced errors.

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-21 20:24:47 +01:00
James Benton fdeb0394cb gallivm: Compensate for lp_const_offset in lp_build_conv.
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-21 20:24:46 +01:00
James Benton f89b1f4ba4 gallivm: Fixed overflow in lp_build_clamped_float_to_unsigned_norm.
Tested with lp_test_conv and lp_test_format, reduced errors.

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-21 20:24:44 +01:00
José Fonseca 00eb74b275 Fix fetching integer inputs. 2012-05-18 00:55:13 +01:00
Olivier Galibert 5d10d75727 llvmpipe: Implement TXQ.
Piglit tests for fragment shaders pass, vertex shaders fail.  The
actual failure seems to be in the interpolators, and not the
textureSize query.

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jose.r.fonseca@gmail.com>
2012-05-18 00:27:28 +01:00
José Fonseca 563489e5c9 gallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.
Trivial.
2012-05-15 23:48:24 +01:00
José Fonseca 9fb4eef6a1 gallivm: Fix lp_build_sgn for normalized/fixed-point integers.
These types got broken with the recent commit that fixed lp_build_sgn
for negative integers.
2012-05-15 22:39:24 +01:00
José Fonseca c95cea50a9 gallivm: Fix lp_build_const_xxx for negative integers.
Do proper rounding.

Thanks to Olivier Galibert for investigating this.
2012-05-15 22:39:24 +01:00
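The entry above only says "do proper rounding". One well-known way rounding goes wrong for negative values, shown here as an assumed illustration and not as the code that was replaced, is adding 0.5 before a truncating conversion:

```c
#include <assert.h>

/* Naive rounding: correct for non-negative inputs only, because the
 * conversion to integer truncates toward zero. */
long long example_round_naive(double value)
{
   return (long long)(value + 0.5);
}

/* Round half away from zero, handling both signs. */
long long example_round_nearest(double value)
{
   long long result = (long long)(value < 0.0 ? value - 0.5 : value + 0.5);

   /* Sanity check: rounding never changes the sign. */
   assert((value < 0.0) ? (result <= 0) : (result >= 0));
   return result;
}
```

For -2.6 the naive version returns -2, while the sign-aware version returns -3; both agree on 2.6.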
José Fonseca 23c0d469e5 gallivm: Fix copy'n'paste typo bug translating CEIL opcode.
Trivial.
2012-05-11 16:44:42 +01:00
Dave Airlie 729d914824 gallivm: implement iabs/issg opcode.
Reimplemented by Olivier Galibert <galibert@pobox.com>

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-05-09 08:26:55 +01:00
Vadim Girlin 95ed0e9b6b radeon/llvm: add support for TXQ/TXF/DDX/DDY instructions
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2012-05-08 01:18:22 +04:00
Brian Paul e039fd079b gallivm: fix comment typo 2012-05-04 08:07:58 -06:00
José Fonseca 494619ebac gallivm: Use debug_printf in lp_build_printf.
So that its output can be seen on GUI window apps.

Tested-by: James Benton <jbenton@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2012-05-02 10:24:34 +01:00
José Fonseca 9ad2cb1885 gallivm: Avoid LLVMAddGlobalMapping() in lp_bld_assert().
Brittle, complex, and unnecessary. Just use function pointer constant.
2012-05-02 10:24:34 +01:00
José Fonseca 6cd76b800b gallivm: Add a lp_build_const_func_pointer() helper.
To be reused in all places where we want to call C code.
2012-05-02 10:24:34 +01:00
José Fonseca 0005bd9da2 gallivm: Cleanup/simplify lp_build_const_string_variable.
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
2012-05-02 10:24:34 +01:00
James Benton c23fd547c0 gallivm: Added lp_build_const_mask_aos_swizzled
Allows the creation of const aos masks which have the mask swizzled
to match the correct format.

Updated existing mask creation code to use the swizzled version where
necessary (tgsi register masks and llvmpipe aos blending).

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-02 10:24:34 +01:00
José Fonseca 7d1f414103 gallivm: Move loop var declaration to beginning of scope. 2012-05-02 10:24:33 +01:00
James Benton f64fe7d333 gallivm: added a debug function which allows llvm to print vectors of 16 unsigned ints
This is useful for debugging the linear llvm path as it handles pixels in this format

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-02 10:24:33 +01:00
James Benton 9bc58d941a llvmpipe: Check when a shader does not satisfy 0 < imm < 1.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-05-02 10:23:21 +01:00
James Benton c426e63aa0 gallivm: fixed memory leak in lp_build_tgsi_aos
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-05-02 10:13:00 +01:00
James Benton 85d09d1c61 gallivm: added aligned pointer get/set 2012-05-02 10:12:48 +01:00
James Benton 630fa26886 gallivm: llvm c-style for loops, allows us to create loops with conditions on entry, rather than condition check on loop 2012-05-02 10:12:48 +01:00
José Fonseca 914244e59d gallivm: Use lp_build_alloca instead of LLVMBuildAlloca on the loop limiter.
To ensure that the alloca is at the top of the function body, otherwise
LLVM will not eliminate them, causing stack misalignment on 32bits.

Reviewed-by: James Benton <jbenton@vmware.com>
2012-04-25 18:09:38 +01:00
James Benton cf68959f99 gallivm: Updated lp_build_log2_approx to use a more accurate polynomial.
Tested with lp_test_arit with 100% passes and piglit tests with 100%
pass for log but some tests still fail for pow.

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-04-05 20:34:11 +01:00
James Benton 7c639feb2f gallivm: Updated lp_build_polynomial to compute odd and even terms separately to decrease data dependency for faster runtime.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-04-05 20:32:54 +01:00
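The odd/even split mentioned in the lp_build_polynomial entry above can be sketched in scalar C. This is an illustration of the general technique, p(x) = E(x^2) + x * O(x^2), with hypothetical function names; the real code emits LLVM IR rather than computing values directly.

```c
#include <assert.h>
#include <stddef.h>

/* Plain Horner evaluation of coeffs[0] + coeffs[1]*x + ...: every
 * multiply-add depends on the result of the previous one. */
float example_poly_horner(const float *coeffs, size_t n, float x)
{
   float acc = 0.0f;
   size_t i;

   assert(n > 0);
   for (i = n; i-- > 0; )
      acc = acc * x + coeffs[i];
   return acc;
}

/* Even/odd split: two shorter Horner chains in x*x that do not depend on
 * each other, combined at the end. */
float example_poly_even_odd(const float *coeffs, size_t n, float x)
{
   float x2 = x * x;
   float even = 0.0f;
   float odd = 0.0f;
   size_t i;

   assert(n > 0);
   for (i = n; i-- > 0; ) {
      if (i % 2 == 0)
         even = even * x2 + coeffs[i];
      else
         odd = odd * x2 + coeffs[i];
   }
   return even + x * odd;
}
```

Both functions evaluate the same polynomial; the second halves the length of the longest dependency chain, at the cost of one extra multiply for x*x and one for the final combination.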
Vinson Lee 4f513002f6 gallivm: Pass in a MCInstrInfo to createMCInstPrinter on llvm-3.1.
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-04-03 10:55:45 -07:00
James Benton 5db9d76a6a gallivm: Maximum loop iterations
Limits maximum loop iterations in a TGSI shader to prevent infinite
loops from occurring, any iteration in any loop counts towards this
limit

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-04-03 10:11:27 +01:00
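The shared iteration budget described in the entry above can be sketched as a single counter that every loop decrements. The names and structure below are hypothetical; the real limiter is emitted as LLVM IR inside the generated shader.

```c
#include <assert.h>

/* One budget shared by all loops of a shader invocation. */
struct example_loop_limiter {
   unsigned remaining;
};

/* Called once per iteration of any loop. Returns 1 while iterations remain
 * and 0 once the budget is exhausted. */
int example_loop_tick(struct example_loop_limiter *limiter)
{
   assert(limiter);
   if (limiter->remaining == 0)
      return 0;
   limiter->remaining--;
   return 1;
}

/* Two nested loops with no exit condition of their own: only the shared
 * budget terminates them. Returns the total number of iterations run. */
unsigned example_run_nested(unsigned budget)
{
   struct example_loop_limiter limiter = { budget };
   unsigned iterations = 0;

   while (example_loop_tick(&limiter)) {       /* outer loop */
      iterations++;
      while (example_loop_tick(&limiter))      /* inner loop */
         iterations++;
   }
   return iterations;
}
```

With a budget of 10, the outer loop runs once and the inner loop nine times, so exactly 10 iterations are executed in total before both loops exit.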
José Fonseca d312b224b6 gallivm: Simplify/reorder minimax helper. 2012-04-03 09:12:47 +01:00
Vinson Lee a7b8e16dc6 gallivm: Fix method overriding in raw_debug_ostream.
Use matching type qualifiers to avoid method hiding.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-03-28 22:46:17 -07:00
ojab 60b58822f0 gallivm: Use InitializeNativeTargetDisassembler().
To initialize only native LLVM Disassembler on LLVM >= 3.1.

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-03-27 16:28:30 +01:00
Vinson Lee fe34006908 gallivm: Fix createOProfileJITEventListener namespace with llvm-3.1.
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-03-14 09:30:40 -07:00
Christian König 461c34c0cb gallivm: add support for R8G8_R8B8 and G8R8_B8R8 formats
Just to keep lp_test_format happy.

Signed-off-by: Christian König <deathsimple@vodafone.de>
2012-03-09 12:43:27 +01:00
Vinson Lee 1633dcd890 gallivm: Pass in a MCRegisterInfo to MCInstPrinter on llvm-3.1.
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-03-06 10:31:12 -08:00
Roland Scheidegger da5e9fce47 gallivm: fix floating type in lp_build_mod helper
untested, but cannot have worked before.
2012-03-05 19:09:56 +01:00
Vinson Lee 834f515988 Revert "gallivm: Change getExtent and readByte to non-const with llvm-3.1."
This reverts commit d5a6c17254.

llvm-3.1svn r151687 makes MemoryObject accessor members const again.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2012-03-01 10:36:06 -08:00
José Fonseca 8c34a41278 gallivm: Update comments and prototype of vector-selects.
No runtime behavior change, as vector selects are still not very well
supported by LLVM.
2012-03-01 06:00:48 +00:00
Dave Airlie 579ccae73d gallivm: add major integer opcodes to the tgsi action handler
This adds support for all the opcodes needed for native integer
support with GLSL 1.20 enabled, and some of the ones for GLSL1.30
support.

I've split them between non-cpu and cpu along the same lines
Tom's code did for the other ones I think, but I'm open to review
on which ones should go where.

With instance ids fixed I get no regressions on my box here
with LLVM 2.8, will test with later LLVMs as well.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-28 15:56:18 +00:00
Dave Airlie e2a2b33544 gallivm: drop deprecated opcodes
These are integer opcodes not deprecated ones.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-28 15:53:16 +00:00
Dave Airlie 2a76609681 gallivm: only do rcp/mul for floating
rcp asserts on type.floating so don't go passing non-floating
things into it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-28 15:52:36 +00:00
Dave Airlie a46548e0ef gallivm: add frem support to the lp_build_mod helper.
for completeness.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-28 10:43:14 +00:00
Dave Airlie aec11e4daa gallivm: add bitarit xor and not ops.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-28 10:42:17 +00:00
Dave Airlie 4ffc8b9ae4 gallivm: add integer and unsigned mod arit functions. (v2)
use a single entry point, as per Jose's suggestion.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-28 10:31:49 +00:00
José Fonseca a206c4cd69 gallivm: Fix TGSI_OPCODE_ARR's translation.
Like TGSI_OPCODE_ARL, destination should be an integer.

This fixes invalid LLVM IR on an internal state tracker (currently Mesa
never emits this opcode).

In the future consider making ADDR register also an integer-as-float array,
like all other register kinds, or simply replace ADDR & ARR/ARL with
integer temp and instructions.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2012-02-21 08:23:20 +00:00
José Fonseca dbadd39508 llvmpipe: Don't assume vector is 4 wide in lp_build_sin()/lp_build_cos()
Reviewed-by: Dave Airlie <airlied@redhat.com>
2012-02-20 17:07:22 +00:00
Dave Airlie 7199b0b681 gallivm: fetch immediates to correct type (v2)
Fetch float/uint/int immediates.

v2: bitcast to uint/int to floats as per Jose's suggestions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-17 17:39:02 +00:00
Dave Airlie 117a0e91af gallivm: enable stores of integer types. (v2) + fix ARL
Infer from the operand the type of value to store.
MOV is untyped but we use the float store path.

v2: make MOV use float store path.

I've had to squash merge the ARL fix to be stored
as an integer in here to avoid regressions in a number
of piglit tests.

From now on ARL stores to an integer just like HW does.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-17 17:39:02 +00:00
Dave Airlie 141f2c2fc9 gallivm: enable fetch for integer opcodes. (v2)
This infers the type of data required using the opcode,
and casts the input to the appropriate type.

So far this only handles non-indirect constant and temporaries.

v2: as per Jose suggestion, fetch immediates via floats

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-17 17:39:02 +00:00
Dave Airlie 66461aa249 gallivm: add uint/int bld to the base builder. (v2)
These are used inside the action handlers for the integer opcodes.

v2: use uint_bld/int_bld, drop higher level uint_bld.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-17 17:39:02 +00:00
Dave Airlie f667a6f3ce gallivm: fix build gather to take a bld context
Then pass the correct build context to it.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-17 17:39:01 +00:00
Dave Airlie 639fbe2e75 gallivm: pass build context to exec_mask_store.
For now just pass the current context, but when we want to
store int or unsigned we need to pass those later.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-02-17 17:39:01 +00:00
José Fonseca 9be0f9b0e4 gallivm: Initialize x86 disassembler on x86_64 too. 2012-02-17 17:22:23 +00:00
Stéphane Marchesin d2c54fb522 gallivm: Replace architecture test with PIPE_ARCH_*
X86Target is a variable, and therefore isn't defined at compile time. So
 LLVM_NATIVE_ARCH == X86Target
is translated into
 0 == 0
and since X86 is first, we always pick it.

Therefore we replace the logic with PIPE_ARCH_*.

https://bugs.freedesktop.org/show_bug.cgi?id=45420
2012-02-12 16:32:15 -08:00
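The preprocessor behaviour behind the architecture-test entry above can be reproduced in a few lines. In an `#if` expression, any identifier that is not a macro is replaced by 0, so comparing two such identifiers is always true. The macro and target names below are hypothetical stand-ins, assuming (as the commit message describes) that the native-arch macro expands to a plain identifier.

```c
#include <assert.h>

/* Stand-in for a config macro that expands to a bare identifier rather
 * than to an integer constant. */
#define EXAMPLE_NATIVE_ARCH ExampleArmTarget

/* Neither ExampleArmTarget nor ExampleX86Target is a macro, so both become
 * 0 here and the test reads "0 == 0". */
#if EXAMPLE_NATIVE_ARCH == ExampleX86Target
#define EXAMPLE_PICKED_X86 1
#else
#define EXAMPLE_PICKED_X86 0
#endif

/* Returns 1 when the preprocessor test above selected the x86 branch. */
int example_picked_x86(void)
{
   /* Holds on every architecture, which is exactly the bug. */
   assert(EXAMPLE_PICKED_X86 == 1);
   return EXAMPLE_PICKED_X86;
}
```

Even though the stand-in macro names an ARM target, example_picked_x86() returns 1. Testing macros that are defined to integers, or using `#ifdef` / `defined()`, avoids the problem.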
Vinson Lee d5a6c17254 gallivm: Change getExtent and readByte to non-const with llvm-3.1.
Fix build with llvm-3.1svn.

llvm-3.1svn r149918 changed BufferMemoryObject::getExtent and
BufferMemoryObject::readByte from const member functions to non-const
member functions in include/llvm/Support/MemoryObject.h.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2012-02-10 23:24:48 -08:00
ojab db312b62f2 gallivm: Fix LLVM-2.7 build.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2012-02-02 09:04:10 +00:00
José Fonseca 54fd495c41 gallivm: Remove MSVC RT hack.
The hack never worked reliably, and docs/llvmpipe.html is quite clear on
the requirement of matching CRT when building LLVM and Mesa already.
2012-02-02 09:04:10 +00:00
ojab 97329efc5f Initialize only native LLVM Disassembler.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-01-31 08:11:24 +00:00
José Fonseca 647ca47cc3 gallivm: Don't use C99 member initializers. 2012-01-30 19:05:58 +00:00
José Fonseca dde807b9dc gallivm: Move declaration before code. 2012-01-30 18:59:29 +00:00
Tom Stellard bc2875aa48 gallivm: Add a new interface for doing TGSI->LLVM conversions
lp_bld_tgsi_soa.c has been adapted to use this new interface, but
lp_bld_tgsi_aos.c has only been partially adapted, since nothing in
gallium currently uses it.

v2:
- Rename lp_bld_tgsi_action.[ch] => lp_bld_tgsi_action.[ch]
- Initialize tgsi_info in lp_bld_tgsi_aos.c
- Fix copyright dates
2012-01-30 13:37:01 -05:00
Tom Stellard 82b71db03d gallium: Move duplicated helper macros to tgsi_exec.h 2012-01-30 13:37:00 -05:00
Tom Stellard 9ee1bcf7a5 gallium: Unify defines of CHAN_[XYZW] in tgsi_exec.h 2012-01-30 13:37:00 -05:00
Tom Stellard 5204974462 gallivm: Add function lp_bld_gather_values() 2012-01-30 13:37:00 -05:00
Brian Paul 8b3c99a5eb gallivm: Swizzle constants into the right AoS ordering.
Constants array is always assumed to be RGBA, which means we need to
swizzle the constant elements into place to match the AoS ordering
(e.g., BGRA) that was passed to lp_build_tgsi_aos().

Signed-off-by: José Fonseca <jfonseca@vmware.com>
2012-01-27 18:25:32 +00:00
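The swizzling step from the entry above amounts to a four-element permutation. A minimal sketch, assuming the swizzle is given as source-channel indices (the function name and parameter layout are hypothetical):

```c
#include <assert.h>

/* Reorders one RGBA constant into the requested AoS channel order:
 * out[i] = rgba[swizzle[i]]. */
void example_swizzle_constant(const float rgba[4],
                              const unsigned char swizzle[4],
                              float out[4])
{
   unsigned i;

   for (i = 0; i < 4; i++) {
      assert(swizzle[i] < 4);
      out[i] = rgba[swizzle[i]];
   }
}
```

For a BGRA layout the swizzle is {2, 1, 0, 3}, so the constant (1, 2, 3, 4) in RGBA order is stored as (3, 2, 1, 4).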
Tom Stellard 9611237051 gallivm: Allow target specific intrinsics in lp_declare_intrinsic()
Target specific intrinsics are also prefixed with llvm, so this assert
was preventing us from using them.
2012-01-13 11:45:49 -05:00
Brian Paul 85b5dac705 tgsi: consolidate TGSI string arrays in new tgsi_strings.h
There was some duplication between the tgsi_dump.c and tgsi_text.c
files.  Also use some static assertions to help catch errors when
adding new TGSI values.

v2: put strings in tgsi_strings.c file instead of the .h file.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2012-01-05 09:01:43 -07:00
Lauri Kasanen 2eafd07323 gallivm: Close a memory leak
This fixes a memory leak of 32 bytes on exit.

From 924f8fdccb41b011f372bc57252005bcdb096105 Mon Sep 17 00:00:00 2001
From: Lauri Kasanen <curaga@operamail.com>
Date: Thu, 22 Dec 2011 21:28:33 +0200
Subject: [PATCH] gallivm: Close a memory leak

As reported by "valgrind --leak-check=full glxgears".

Signed-off-by: Lauri Kasanen <curaga@operamail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
2011-12-22 23:03:18 +00:00
Vinson Lee 95aa0e5d84 gallivm: Fix build with llvm-3.1svn.
llvm-3.1svn r145714 moved global variables into a new TargetOptions
class. TargetMachine constructor now needs a TargetOptions object as
well.

Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2011-12-16 21:22:39 -08:00
José Fonseca 6cf7245f69 llvmpipe: Trim the fragment shader cache based on LLVM IR instruction count.
Number of fragment shader variants is not very representative of the
memory used by LLVM, neither is number of shader instructions, as often
texture sampling constitutes most of the generated code.

This change adds an additional trim criterion: least recently used
fragment shader variants will be freed until the total number of LLVM IR
instructions falls below a specified threshold.

Reviewed-by: Brian Paul <brianp@vmware.com>
2011-12-08 17:59:33 +00:00
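The trim criterion from the entry above can be sketched as a walk over variants in least-recently-used order. The struct, field, and function names are hypothetical, and the array stands in for whatever LRU list the driver keeps.

```c
#include <assert.h>
#include <stddef.h>

/* One cached shader variant. */
struct example_variant {
   unsigned num_instrs;   /* LLVM IR instructions generated for it */
   int alive;             /* 0 once it has been freed */
};

/* variants[] is ordered from least to most recently used. Frees variants
 * from the front until the total instruction count no longer exceeds
 * max_instrs. Returns the number of variants freed. */
size_t example_trim_variants(struct example_variant *variants, size_t count,
                             unsigned *total_instrs, unsigned max_instrs)
{
   size_t freed = 0;
   size_t i;

   for (i = 0; i < count && *total_instrs > max_instrs; i++) {
      if (!variants[i].alive)
         continue;
      assert(*total_instrs >= variants[i].num_instrs);
      *total_instrs -= variants[i].num_instrs;
      variants[i].alive = 0;
      freed++;
   }
   return freed;
}
```

With variants of 500, 300 and 200 instructions and a threshold of 400, the two oldest are freed and the most recently used one survives with a total of 200.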
José Fonseca e21c5157b6 gallivm: Remove duplicate statement.
ary_ge_arx_arz is already set earlier.

Reviewed-by: Brian Paul <brianp@vmware.com>
2011-11-14 10:06:00 +00:00
José Fonseca 34930facfe gallivm: Include stddef.h before the LLVM C++ headers.
Necessary when building against LLVM 2.6 with a recent gcc, as LLVM headers
depend on ptrdiff_t but don't properly include stddef.h
2011-11-14 10:06:00 +00:00
Dave Airlie 0d8deb5bc9 llvmpipe: fix typo in the depth sampling aos code.
Just found by reading llvmpipe code for no great reason.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-11-06 22:32:04 +00:00
Christian Inci 3031708e64 gallivm: change sys::getHostTriple to sys::getDefaultTargetTriple for LLVM >= 0x0301
LLVM change r143502

Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
2011-11-06 07:41:10 +00:00
José Fonseca 0f26c6ae3f llvmpipe: Remove unused variables. 2011-10-31 19:40:54 +00:00
Brian Paul 903a14ed91 gallivm: added lp_build_print_ivec4() function 2011-10-23 10:09:33 -06:00
José Fonseca e1e03ce492 gallivm: Eliminate tgsi_util_get_full_src_register_sign_mode call.
It complicates more than it simplifies, now that there's only one negate
bit on TGSI registers.
2011-10-16 14:18:42 +01:00
José Fonseca e9c1d87ce7 llvmpipe: Use lp_build_ifloor_fract for exp2 calculation.
Instead of separate ifloor / fract calls.

No change for SSE4.1 code, but fewer FP<->SI conversions on non-SSE4.1
systems.
2011-10-16 14:18:41 +01:00
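The idea in e9c1d87ce7 of obtaining the integer floor and the fractional part in one step, then using both for exp2, can be sketched in scalar C. This is only an illustration under stated assumptions: the helper names are hypothetical, the polynomial is a plain Taylor expansion of e^(f ln 2) rather than the minimax polynomial gallivm uses, and the exponent trick is only valid for a floor value between -126 and 127.

```c
#include <stdint.h>
#include <string.h>

/* Split x into floor(x) and x - floor(x) with a single float->int conversion. */
static void sketch_ifloor_fract(float x, int *ipart, float *fpart)
{
    int i = (int)x;          /* truncates toward zero */
    if ((float)i > x)
        i--;                 /* fix up negative non-integers */
    *ipart = i;
    *fpart = x - (float)i;   /* always in [0, 1) */
}

/* exp2(x) = 2^ipart * 2^fpart */
static float sketch_exp2(float x)
{
    int ipart;
    float fpart;
    sketch_ifloor_fract(x, &ipart, &fpart);

    /* 2^fpart = e^(fpart * ln 2), Taylor series in Horner form (assumption:
     * gallivm uses a minimax polynomial with different coefficients). */
    const float t = fpart * 0.69314718f;
    const float p = 1.0f + t * (1.0f + t * (1.0f / 2 + t * (1.0f / 6 +
                    t * (1.0f / 24 + t * (1.0f / 120 + t * (1.0f / 720))))));

    /* 2^ipart built directly from the IEEE-754 biased exponent field. */
    uint32_t bits = (uint32_t)(ipart + 127) << 23;
    float scale;
    memcpy(&scale, &bits, sizeof scale);

    return p * scale;
}
```

Sharing the conversion matters because the fractional part is just x minus the already computed floor, so a second float-to-int round trip is avoided.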
Brian Paul 51002968c9 gallivm: fix build with llvm 3.0svn
https://bugs.freedesktop.org/show_bug.cgi?id=41065
2011-09-21 07:24:03 -06:00
Tobias Droste 1795372fee gallivm: fix build with LLVM 3.0svn
LLVM 3.0svn added SubtargetInfo as additional parameter to
createMCDisassembler() and createMCInstPrinter().
See revision 139237 of LLVM.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
2011-09-15 15:53:04 -06:00
Brian Paul 0ebf83b731 gallivm: remove unused vars 2011-09-13 08:16:01 -06:00
Marek Olšák d8452a0be8 gallium: add shadow 1D and 2D array samplers to TGSI
And filling in all the switch statements in auxiliary. Mostly untested.
2011-09-10 08:53:29 +02:00
Tobias Droste 4a468de2d7 gallivm: fix build with LLVM 3.0svn
LLVM 3.0svn moved TargetRegistry.h and TargetSelect.h.
See revision 138450 of LLVM.

Signed-off-by: Tobias Droste <tdroste@gmx.de>
2011-09-05 18:49:11 +01:00
José Fonseca 5161aff48a gallivm: Add a note about log2 computation and denormalized numbers. 2011-07-22 18:52:09 -07:00
José Fonseca af82ff556c gallivm: Fix lp_build_exp2 order 4-5 polynomial coefficients and bump order.
Not sure how I computed these, but they were wrong (which explains why
bumping the polynomial order before never improved precision).

This allows passing the EXP test cases of the PSPrecision/VSPrecision DCTs.
2011-07-22 18:52:09 -07:00
José Fonseca 47d6d44a23 gallivm: Increase lp_build_rsqrt() precision.
Add an iteration step, which makes rsqrt precision go from 12 bits to
24, and fixes the RSQ/NRM test cases of the PSPrecision/VSPrecision DCTs.

There are no uses of this function outside shader translation.
2011-07-22 18:52:09 -07:00
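The extra iteration step mentioned in 47d6d44a23 is presumably a Newton-Raphson refinement; that is an assumption, since the commit message does not name the method. A scalar sketch of one such step, with a hypothetical function name:

```c
/*
 * One Newton-Raphson refinement of an estimate y ~= 1/sqrt(x).
 * Derived from f(y) = 1/y^2 - x, which gives y' = y * (3 - x*y*y) / 2.
 * Each step roughly doubles the number of correct bits, which matches
 * the 12 -> 24 bit improvement quoted in the commit message.
 */
static float sketch_rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}
```

For example, starting from the rough estimate 0.49 for 1/sqrt(4), a single step gives a value within about 0.0003 of the exact 0.5.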
José Fonseca ef1a2765a4 gallivm: Update minimax comments. 2011-07-22 18:52:09 -07:00
José Fonseca 1ac86e249e gallivm: Fix lp_build_exp/lp_build_log.
Never used until now -- we only used the base 2 variants -- which is why
the breakage went unnoticed.
2011-07-22 18:52:09 -07:00
Vinson Lee 9228bfb375 gallivm: Rename createAsmInfo to createMCAsmInfo with llvm-3.0.
llvm-3.0svn r135219 renamed createAsmInfo to createMCAsmInfo in
include/llvm/Target/TargetRegistry.h.
2011-07-16 00:17:46 -07:00
Vinson Lee 1844ae7e7e gallivm: Re-enable LLVMUnionTypeKind case for llvm-2.7 only.
LLVMUnionTypeKind is not in llvm-2.6, llvm-2.8, llvm-2.9, or llvm-3.0svn.
2011-07-11 14:08:24 -07:00
Vinson Lee e4189f2e2e gallivm: Remove LLVMOpaqueTypeKind case with llvm-3.0.
llvm-3.0svn r134829 removed LLVMOpaqueTypeKind from enum LLVMTypeKind in
include/llvm-c/Core.h.
2011-07-11 12:48:06 -07:00
Gustaw Smolarczyk fc98444bd5 gallivm: Fix build with llvm-3.0
LLVM 3.0svn changes pretty rapidly. The change in
Target->createMCInstPrinter() signature which inspired commits
40ae214067 and
92e29dc5b0 has been reverted.

Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
2011-07-08 07:57:27 -06:00
Vinson Lee f8fcaf0215 gallivm: Pass in CPU name to createTargetMachine when on llvm-3.0.
llvm-3.0svn revision 134127 changed createTargetMachine to take in
an additional argument of the CPU name.
2011-06-30 15:48:41 -07:00
Vinson Lee b61e56756c gallivm: Rename TargetInstrDesc to MCInstrDesc when using llvm-3.0.
llvm-3.0svn revision 134021 renamed TargetInstrDesc to MCInstrDesc.
2011-06-30 15:07:57 -07:00
Vinson Lee ad7387fe12 gallivm: Fix x86 build with llvm-3.0svn.
LLVM revision 133739 renamed StackAlignment to StackAlignmentOverride.
2011-06-23 20:48:05 -07:00
Brian Paul 5f2deba9f3 gallium: s/bool/boolean/ 2011-06-08 08:05:40 -06:00
José Fonseca a436b3b2d4 gallivm: Fix for dynamically linked LLVM 2.8 library.
This prevents the error

    prog: for the -disable-mmx option: may only occur zero or one times!

when creating a new context after XCloseDisplay with DRI drivers linked
with a shared LLVM 2.8 library.
2011-05-20 11:54:52 +01:00
José Fonseca 61c67eca7d gallivm: Tell LLVM to not assume a 16-byte aligned stack on x86.
Fixes fdo 36738.
2011-05-18 18:14:37 +01:00
Matt Turner c5ac8a8aa2 Remove redundant util_unsigned_logbase2
util_logbase2 is exactly the same function.

Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
2011-05-12 16:37:34 -06:00
Marek Olšák 31200d0688 gallivm: fix warning: ‘value’ may be used uninitialized in this function
The path where it's uninitialized is guarded by an assert.
2011-04-27 13:16:35 +02:00
Fabian Bieler 08070cead0 llvmpipe: Take the sampler view's first_level into account when sampling. 2011-04-08 04:47:04 +02:00
Vinson Lee 92e29dc5b0 gallivm: Fix build with llvm-2.9.
The build fix of commit 40ae214067 does
not apply to llvm-2.9 but rather to llvm-3.0svn.
2011-03-28 17:48:37 -07:00
Tobias Droste 40ae214067 gallivm: Fix build with llvm-2.9
In llvm-2.9 Target->createMCInstPrinter() takes different arguments

Signed-off-by: Tobias Droste <tdroste@gmx.de>
2011-03-28 17:23:45 +01:00
Vinson Lee eb0dd37094 gallivm: Fix build with llvm-2.9.
In llvm-2.9, the header file llvm/System/Host.h has been moved to
llvm/Support/Host.h.
2011-03-25 12:47:06 -07:00
José Fonseca c27e58c109 gallivm: Fix build with llvm 2.6 on 32bit platforms 2011-03-13 19:49:21 +00:00
José Fonseca e6314db0ac gallivm: Use LLVM MC disassembler, instead of udis86.
Included in LLVM 2.7+. Unlike udis86, it should support all instructions that
LLVM can emit.
2011-03-13 19:24:26 +00:00
Brian Paul 2c1ef65a04 llvmpipe: clamp texcoords in lp_build_sample_compare()
See previous commit for more info.

NOTE: This is a candidate for the 7.10 branch.
2011-03-07 18:59:42 -07:00
Jakob Bornecrantz 11f9ec5422 gallivm: Initialize stack values
Valgrind gives me a warning with llvmpipe in profile builds but not in
debug builds; this at least seems to fix the issue.
2011-02-26 20:13:08 +01:00
José Fonseca 57d4e922a6 gallivm: Use simple scaling plus casting in more unorm->float cases. 2011-02-19 10:56:05 +00:00
Jakob Bornecrantz 4c73030d47 draw: Init llvm if not provided 2011-01-24 03:26:59 +01:00
Brian Paul 652901e95b Merge branch 'draw-instanced'
Conflicts:
	src/gallium/auxiliary/draw/draw_llvm.c
	src/gallium/drivers/llvmpipe/lp_state_fs.c
	src/glsl/ir_set_program_inouts.cpp
	src/mesa/tnl/t_vb_program.c
2011-01-15 10:24:08 -07:00
Vinson Lee 492afbce18 gallivm: Disable MMX-disabling code on llvm-2.9.
The disable-mmx option was removed in llvm-2.9svn by revisions 122188
and 122189.

Fixes FDO bug 32564.
2010-12-22 19:56:10 -08:00
Vinson Lee adaa310e39 gallivm: Fix 'cast from pointer to integer of different size' warning.
Fixes this GCC warning.
lp_bld_const.h: In function 'lp_build_const_int_pointer':
lp_bld_const.h:137: warning: cast from pointer to integer of different size
2010-12-22 16:48:19 -08:00
José Fonseca 3f94d96fce gallivm: Cleanup util_format_xxx_fetch_xxx call generation.
No need to register function prototypes in the module now that we call
the C function pointer directly -- fewer LLVM objects lying around.

Limited testing with lp_test_format.
2010-12-17 20:14:31 +00:00
Brian Paul 3ecf47af12 gallivm: fix copy&paste error from previous commit
Fixes piglit regression, http://bugs.freedesktop.org/show_bug.cgi?id=32452

NOTE: This is a candidate for the 7.10 branch
2010-12-16 14:30:39 -07:00
Brian Paul ee16e97ed1 gallivm: work around LLVM 2.6 bug when calling C functions
Create a constant int pointer to the C function, then cast it to the
function's type.  This avoids using trampoline code which seems to be
inadvertently freed by LLVM in some situations (which leads to segfaults).
The root issue and work-around were found by José.

NOTE: This is a candidate for the 7.10 branch
2010-12-16 10:19:16 -07:00
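The work-around in ee16e97ed1 embeds the C function's address as an integer constant and casts it to the function's type before calling it. A plain C analogy of that round trip is sketched below; it does not use the LLVM API, and the function names and the fetch signature are hypothetical. Converting between function pointers and integers is implementation-defined in C, though it works on the common platforms Mesa targets.

```c
#include <stdint.h>

/* Hypothetical signature standing in for a C helper called from JIT code. */
typedef float (*sketch_fetch_fn)(const uint8_t *src);

static float sketch_fetch_first_unorm8(const uint8_t *src)
{
    return src[0] / 255.0f;
}

/* Step 1: the function address is carried around as a plain integer constant. */
static uintptr_t sketch_address_as_int(sketch_fetch_fn fn)
{
    return (uintptr_t)fn;
}

/* Step 2: the integer is cast back to the function's type and called directly,
 * so no intermediate trampoline is involved. */
static float sketch_call_through_int(uintptr_t addr, const uint8_t *src)
{
    sketch_fetch_fn fn = (sketch_fetch_fn)addr;
    return fn(src);
}
```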
Brian Paul dfbc20593e gallivm: do texture swizzle after shadow compare
We need to swizzle after the shadow comparison so that the GL_DEPTH_MODE
functionality is handled properly.

This fixes all the piglit glsl-fs-shadow2d*.shader_test cases, except
for glsl-fs-shadow2dproj-bias.shader_test which fails because of a
bug in the GLSL compiler (fd.o 32395).
2010-12-14 12:17:10 -07:00
Brian Paul b363dd43d6 gallivm: store callbacks in a linked list rather than fixed size array
Should fix http://bugs.freedesktop.org/show_bug.cgi?id=32308
2010-12-13 11:47:28 -07:00
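Storing callbacks in a linked list, as b363dd43d6 does, removes the hard limit a fixed-size array imposes. A minimal sketch of such a list follows; all names here are hypothetical and the actual gallivm data structure may differ.

```c
#include <stdlib.h>

typedef void (*sketch_callback)(void *data);

struct sketch_callback_node {
    sketch_callback func;
    void *data;
    struct sketch_callback_node *next;
};

/* Example callback: bumps an int counter passed as user data. */
static void sketch_count_call(void *data)
{
    ++*(int *)data;
}

/* Prepend a callback; returns 0 only if allocation fails, never because
 * a fixed capacity was exceeded. */
static int sketch_callback_register(struct sketch_callback_node **list,
                                    sketch_callback func, void *data)
{
    struct sketch_callback_node *node = malloc(sizeof *node);
    if (!node)
        return 0;
    node->func = func;
    node->data = data;
    node->next = *list;
    *list = node;
    return 1;
}

/* Invoke every registered callback and return how many were run. */
static unsigned sketch_callback_run_all(struct sketch_callback_node *list)
{
    unsigned n = 0;
    for (; list; list = list->next, n++)
        list->func(list->data);
    return n;
}

/* Release all nodes and reset the list head. */
static void sketch_callback_free_all(struct sketch_callback_node **list)
{
    while (*list) {
        struct sketch_callback_node *next = (*list)->next;
        free(*list);
        *list = next;
    }
}
```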
Brian Paul 1d6f3543a0 gallivm/llvmpipe: implement system values and instanceID 2010-12-08 19:04:11 -07:00
Brian Paul 14746b1d4f gallivm: fix null builder pointers 2010-12-03 07:38:02 -07:00
Brian Paul 6299f241e9 gallivm/llvmpipe: remove lp_build_context::builder
The field was redundant.  Use the gallivm->builder value instead.
2010-12-02 18:11:16 -07:00
Roland Scheidegger 4c70014626 gallium: support for array textures and related changes
Resources have an array_size parameter now.
get_tex_surface and tex_surface_destroy have been renamed to create_surface
and surface_destroy and moved to context, similar to sampler views (and
create_surface now uses a template just like create_sampler_view). Surfaces
now really should only be used for rendering. In particular they shouldn't be
used as some kind of 2d abstraction for sharing a texture. offset/layout fields
don't make sense any longer and have been removed, width/height should go too.
surfaces and sampler views now specify a layer range (for texture resources),
layer is either array slice, depth slice or cube face.
pipe_subresource is gone; array slices (or cube faces) are now treated the same
as depth slices in transfers etc. (that is, they use the z coord of the
respective functions).

Squashed commit of the following:

commit a45bd509014743d21a532194d7b658a1aeb00cb7
Merge: 1aeca28 32e1e59
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Dec 2 04:32:06 2010 +0100

    Merge remote branch 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/drivers/i915/i915_resource_texture.c
    	src/gallium/drivers/i915/i915_state_emit.c
    	src/gallium/drivers/i915/i915_surface.c

commit 1aeca287a827f29206078fa1204715a477072c08
Merge: 912f042 6f7c8c3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Dec 2 00:37:11 2010 +0100

    Merge remote branch 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/state_trackers/vega/api_filters.c
    	src/gallium/state_trackers/vega/api_images.c
    	src/gallium/state_trackers/vega/mask.c
    	src/gallium/state_trackers/vega/paint.c
    	src/gallium/state_trackers/vega/renderer.c
    	src/gallium/state_trackers/vega/st_inlines.h
    	src/gallium/state_trackers/vega/vg_context.c
    	src/gallium/state_trackers/vega/vg_manager.c

commit 912f042e1d439de17b36be9a740358c876fcd144
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Dec 1 03:01:55 2010 +0100

    gallium: even more compile fixes after merge

commit 6fc95a58866d2a291def333608ba9c10c3f07e82
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Dec 1 00:22:26 2010 +0100

    gallium: some fixes after merge

commit a8d5ffaeb5397ffaa12fb422e4e7efdf0494c3e2
Merge: f7a202f 2da02e7
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Nov 30 23:41:26 2010 +0100

    Merge remote branch 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/drivers/i915/i915_state_emit.c
    	src/gallium/state_trackers/vega/api_images.c
    	src/gallium/state_trackers/vega/vg_context.c

commit f7a202fde2aea2ec78ef58830f945a5e214e56ab
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Nov 24 19:19:32 2010 +0100

    gallium: even more fixes/cleanups after merge

commit 6895a7f969ed7f9fa8ceb788810df8dbcf04c4c9
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Nov 24 03:07:36 2010 +0100

    gallium: more compile fixes after merge

commit af0501a5103b9756bc4d79167bd81051ad6e8670
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Nov 23 19:24:45 2010 +0100

    gallium: lots of compile fixes after merge

commit 0332003c2feb60f2a20e9a40368180c4ecd33e6b
Merge: 26c6346 b6b91fa
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Tue Nov 23 17:02:26 2010 +0100

    Merge remote branch 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/auxiliary/gallivm/lp_bld_sample.c
    	src/gallium/auxiliary/util/u_blit.c
    	src/gallium/auxiliary/util/u_blitter.c
    	src/gallium/auxiliary/util/u_inlines.h
    	src/gallium/auxiliary/util/u_surface.c
    	src/gallium/auxiliary/util/u_surfaces.c
    	src/gallium/docs/source/context.rst
    	src/gallium/drivers/llvmpipe/lp_rast.c
    	src/gallium/drivers/nv50/nv50_state_validate.c
    	src/gallium/drivers/nvfx/nv04_surface_2d.c
    	src/gallium/drivers/nvfx/nv04_surface_2d.h
    	src/gallium/drivers/nvfx/nvfx_buffer.c
    	src/gallium/drivers/nvfx/nvfx_miptree.c
    	src/gallium/drivers/nvfx/nvfx_resource.c
    	src/gallium/drivers/nvfx/nvfx_resource.h
    	src/gallium/drivers/nvfx/nvfx_state_fb.c
    	src/gallium/drivers/nvfx/nvfx_surface.c
    	src/gallium/drivers/nvfx/nvfx_transfer.c
    	src/gallium/drivers/r300/r300_state_derived.c
    	src/gallium/drivers/r300/r300_texture.c
    	src/gallium/drivers/r600/r600_blit.c
    	src/gallium/drivers/r600/r600_buffer.c
    	src/gallium/drivers/r600/r600_context.h
    	src/gallium/drivers/r600/r600_screen.c
    	src/gallium/drivers/r600/r600_screen.h
    	src/gallium/drivers/r600/r600_state.c
    	src/gallium/drivers/r600/r600_texture.c
    	src/gallium/include/pipe/p_defines.h
    	src/gallium/state_trackers/egl/common/egl_g3d_api.c
    	src/gallium/state_trackers/glx/xlib/xm_st.c
    	src/gallium/targets/libgl-gdi/gdi_softpipe_winsys.c
    	src/gallium/targets/libgl-gdi/libgl_gdi.c
    	src/gallium/tests/graw/tri.c
    	src/mesa/state_tracker/st_cb_blit.c
    	src/mesa/state_tracker/st_cb_readpixels.c

commit 26c6346b385929fba94775f33838d0cceaaf1127
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Aug 2 19:37:21 2010 +0200

    fix more merge breakage

commit b30d87c6025eefe7f6979ffa8e369bbe755d5c1d
Merge: 9461bf3 1f1928d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Aug 2 19:15:38 2010 +0200

    Merge remote branch 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/drivers/llvmpipe/lp_rast.c
    	src/gallium/drivers/llvmpipe/lp_rast_priv.h
    	src/gallium/drivers/r300/r300_blit.c
    	src/gallium/drivers/r300/r300_screen_buffer.c
    	src/gallium/drivers/r300/r300_state_derived.c
    	src/gallium/drivers/r300/r300_texture.c
    	src/gallium/drivers/r300/r300_texture.h
    	src/gallium/drivers/r300/r300_transfer.c
    	src/gallium/drivers/r600/r600_screen.c
    	src/gallium/drivers/r600/r600_state.c
    	src/gallium/drivers/r600/r600_texture.c
    	src/gallium/drivers/r600/r600_texture.h
    	src/gallium/state_trackers/dri/common/dri1_helper.c
    	src/gallium/state_trackers/dri/sw/drisw.c
    	src/gallium/state_trackers/xorg/xorg_exa.c

commit 9461bf3cfb647d2301364ae29fc3084fff52862a
Merge: 17492d7 0eaccb3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jul 15 20:13:45 2010 +0200

    Merge commit 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/auxiliary/util/u_blitter.c
    	src/gallium/drivers/llvmpipe/lp_rast.c
    	src/gallium/drivers/llvmpipe/lp_surface.c
    	src/gallium/drivers/r300/r300_render.c
    	src/gallium/drivers/r300/r300_state.c
    	src/gallium/drivers/r300/r300_texture.c
    	src/gallium/drivers/r300/r300_transfer.c
    	src/gallium/tests/trivial/quad-tex.c

commit 17492d705e7b7f607b71db045c3bf344cb6842b3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Fri Jun 18 10:58:08 2010 +0100

    gallium: rename element_offset/width fields in views to first/last_element

    This is much more consistent with the other fields used there
    (first/last level, first/last layer).
    Actually thinking about removing the ugly union/structs again and
    rename first/last_layer to something even more generic which could also
    be used for buffers (like first/last_member) without inducing headaches.

commit 1b717a289299f942de834dcccafbab91361e20ab
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 17 14:46:09 2010 +0100

    gallium: remove PIPE_SURFACE_LAYOUT_LINEAR definition

    This was only used by the layout field of pipe_surface, but this
    driver internal stuff is gone so there's no need for this driver-independent
    layout definition either.

commit 10cb644b31b3ef47e6c7b55e514ad24bb891fac4
Merge: 5691db9 c85971d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 17 12:20:41 2010 +0100

    Merge commit 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/docs/source/glossary.rst
    	src/gallium/tests/graw/fs-test.c
    	src/gallium/tests/graw/gs-test.c

commit 5691db960ca3d525ce7d6c32d9c7a28f5e907f3b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 17 11:29:03 2010 +0100

    st/wgl: fix interface changes bugs

commit 2303ec32143d363b46e59e4b7c91b0ebd34a16b2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 16 19:42:32 2010 +0100

    gallium: adapt code to interface changes...

commit dcae4f586f0d0885b72674a355e5d56d47afe77d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 16 19:42:05 2010 +0100

    gallium: separate depth0 and array_size in the resource itself.

    These fields are still mutually exclusive (since no 3d array textures exist)
    but it ultimately seemed too error-prone to adapt all code to accept the new
    meaning of depth0 (drivers stick that into hardware regs, calculate mipmap
    sizes etc.). And it isn't really cleaner anyway.
    So, array textures will have depth0 of 1, but instead use array_size,
    3D textures will continue to use depth0 (and have array_size of 1). Cube
    maps also will use array_size to indicate their 6 faces, but since all drivers
    should just be fine by inferring this themselves from the fact it's a cube map
    as they always used to, nothing should break.

commit 621737a638d187d208712250fc19a91978fdea6b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 16 17:47:38 2010 +0100

    gallium: adapt code to interface changes

    There are still usages of pipe_surface where pipe_resource should be used,
    which should eventually be fixed.

commit 2d17f5efe166b2c3d51957c76294165ab30b8ae2
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Wed Jun 16 17:46:14 2010 +0100

    gallium: more interface changes

    In particular to enable usage of buffers in views, and ability to use a
    different pipe_format in pipe_surface.
    Get rid of layout and offset parameter in pipe_surface - the former was
    not used in any (public) code anyway, and the latter should either be computed
    on-demand or driver can use subclass of pipe_surface.
    Also make create_surface() use a template to be more consistent with
    other functions.

commit 71f885ee16aa5cf2742c44bfaf0dc5b8734b9901
Merge: 3232d11 8ad410d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 14 14:19:51 2010 +0100

    Merge commit 'origin/master' into gallium-array-textures

    Conflicts:
    	src/gallium/auxiliary/util/u_box.h
    	src/gallium/drivers/nv50/nv50_surface.c
    	src/gallium/drivers/nvfx/nvfx_surface.c
    	src/gallium/drivers/r300/r300_blit.c
    	src/gallium/drivers/r300/r300_texture.c
    	src/gallium/drivers/r300/r300_transfer.c
    	src/gallium/drivers/r600/r600_blit.c
    	src/gallium/drivers/r600/r600_screen.h
    	src/gallium/include/pipe/p_state.h

commit 3232d11fe3ebf7686286013c357b404714853984
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 14 11:40:04 2010 +0100

    mesa/st: adapt to interface changes

    still need to fix pipe_surface sharing
    (as that is now per-context).
    Also broken is depth0 handling - half the code assumes
    this is also used for array textures (and hence by extension
    of that cube maps would have depth 6), half the code does not...

commit f433b7f7f552720e5eade0b4078db94590ee85e1
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Mon Jun 14 11:35:52 2010 +0100

    gallium: fix a couple of bugs in interface change fixes

commit 818366b28ea18f514dc791646248ce6f08d9bbcf
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:42:11 2010 +0200

    targets: adapt to interface changes

    Yes even that needs adjustments...

commit 66c511ab1682c9918e0200902039247793acb41e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:41:13 2010 +0200

    tests: adapt to interface changes

    Everything needs to be fixed :-(.

commit 6b494635d9dbdaa7605bc87b1ebf682b138c5808
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:39:50 2010 +0200

    st: adapt non-rendering state trackers to interface changes

    might not be quite right in all places, but they really don't want
    to use pipe_surface.

commit 00c4289a35d86e4fe85919ec32aa9f5ffe69d16d
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:38:48 2010 +0200

    winsys: adapt to interface changes

commit 39d858554dc9ed5dbc795626fec3ef9deae552a0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:26:54 2010 +0200

    st/python: adapt to interface changes

    don't think that will work, sorry.

commit 6e9336bc49b32139cec4e683857d0958000e15e3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:26:07 2010 +0200

    st/vega: adapt to interface changes

commit e07f2ae9aaf8842757d5d50865f76f8276245e11
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:25:56 2010 +0200

    st/xorg: adapt to interface changes

commit 05531c10a74a4358103e30d3b38a5eceb25c947f
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:24:53 2010 +0200

    nv50: adapt to interface changes

commit 97704f388d7042121c6d496ba8c003afa3ea2bf3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:24:45 2010 +0200

    nvfx: adapt to interface changes

commit a8a9c93d703af6e8f5c12e1cea9ec665add1abe0
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:24:01 2010 +0200

    i965g: adapt to interface changes

commit 0dde209589872d20cc34ed0b237e3ed7ae0e2de3
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:22:38 2010 +0200

    i915g: adapt to interface changes

commit 5cac9beede69d12f5807ee1a247a4c864652799e
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:20:58 2010 +0200

    svga: adapt to interface changes

    resource_copy_region still looking fishy.
    Was not very suited to unified zslice/face approach...

commit 08b5a6af4b963a3e4c75fc336bf6c0772dce5150
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:20:01 2010 +0200

    rbug: adapt to interface changes

    Not sure if that won't need changes elsewhere?

commit c9fd24b1f586bcef2e0a6e76b68e40fca3408964
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:19:31 2010 +0200

    trace: adapt to interface changes

commit ed84e010afc5635a1a47390b32247a266f65b8d1
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:19:21 2010 +0200

    failover: adapt to interface changes

commit a1d4b4a293da933276908e3393435ec4b43cf201
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:19:12 2010 +0200

    identity: adapt to interface changes

commit a8dd73e2c56c7d95ffcf174408f38f4f35fd2f4c
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:18:55 2010 +0200

    softpipe: adapt to interface changes

commit a886085893e461e8473978e8206ec2312b7077ff
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:18:44 2010 +0200

    llvmpipe: adapt to interface changes

commit 70523f6d567d8b7cfda682157556370fd3c43460
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:18:14 2010 +0200

    r600g: adapt to interface changes

commit 3f4bc72bd80994865eb9f6b8dfd11e2b97060d19
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:18:05 2010 +0200

    r300g: adapt to interface changes

commit 5d353b55ee14db0ac0515b5a3cf9389430832c19
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:17:37 2010 +0200

    cell: adapt to interface changes

    not even compile tested

commit cf5d03601322c2dcb12d7a9c2f1745e2b2a35eb4
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:14:59 2010 +0200

    util: adapt to interface changes

    amazing how much code changes just due to some subtle interface changes?

commit dc98d713c6937c0e177fc2caf23020402cc7ea7b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Sat Jun 12 02:12:40 2010 +0200

    gallium: more interface fail, docs

    this also changes flush_frontbuffer to use a pipe_resource instead of
    a pipe_surface - pipe_surface is not meant to be (or at least no longer)
    an abstraction for standalone 2d images which get passed around.
    (This has also implications for the non-rendering state-trackers.)

commit 08436d27ddd59857c22827c609b692aa0c407b7b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 10 17:42:52 2010 +0200

    gallium: fix array texture interface changes bugs, docs

commit 4a4d927609b62b4d7fb9dffa35158afe282f277b
Author: Roland Scheidegger <sroland@vmware.com>
Date:   Thu Jun 3 22:02:44 2010 +0200

    gallium: interface changes for array textures and related cleanups

    This patch introduces array textures to gallium (note they are not immediately
    usable without the associated changes to the shader side).
    Also, this abandons pipe_subresource in favor of using level and layer
    parameters since the distinction between several faces (which was part of
    pipe_subresource for cube textures) and several z slices (which were not part
    of pipe_subresource but instead part of pipe_box where appropriate for 3d
    textures) is gone at the resource level.
    Textures, be it array, cube, or 3d, now use a "unified" set of parameters,
    there is no distinction between array members, cube faces, or 3d zslices.
    This is unlike d3d10, whose subresource index includes layer information for
    array textures, but which considers all z slices of a 3d texture to be part
    of the same subresource.
    In contrast to d3d10, OpenGL though reuses old 2d and 3d function entry points
    for 1d and 2d array textures, respectively, which also implies that for instance
    it is possible to specify all layers of a 2d array texture at once (note that
    this is not possible for cube maps, which use the 2d entry points, although
    it is possible for cube map arrays, which aren't supported yet in gallium).
    This should possibly make drivers a bit simpler, and also get rid of mutually
    exclusive parameters in some functions (as z and face were exclusive), one
    potential downside would be that 3d array textures could not easily be supported
    without reverting this, but those are nowhere to be seen.

    Also along with adjusting to new parameters, rename get_tex_surface /
    tex_surface_destroy to create_surface / surface_destroy and move them from
    screen to context, which reflects much better what those do (they are analogous
    to create_sampler_view / sampler_view_destroy).

    PIPE_CAP_ARRAY_TEXTURES is used to indicate if a driver supports all of this
    functionality (that is, both sampling from array texture as well as use a range
    of layers as a render target, with selecting the layer from the geometry shader).
2010-12-02 04:33:43 +01:00
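The split between depth0 and array_size described in 4c70014626 can be illustrated with a small helper that returns how many layers a resource has at its base level. The field names depth0 and array_size come from the commit message; the enum, the struct and the function name are hypothetical stand-ins, not the real gallium definitions.

```c
/* Hypothetical texture targets; the real gallium enum is not reproduced here. */
enum sketch_target {
    SKETCH_TEX_2D,
    SKETCH_TEX_3D,
    SKETCH_TEX_CUBE,
    SKETCH_TEX_2D_ARRAY
};

/* Only the fields relevant to layer selection. */
struct sketch_resource {
    enum sketch_target target;
    unsigned depth0;       /* > 1 only for 3D textures */
    unsigned array_size;   /* array slices, or 6 for cube maps, else 1 */
};

/*
 * Number of addressable layers at mip level 0. A "layer" is a depth slice
 * for 3D textures and an array slice or cube face otherwise, so depth0 and
 * array_size are never both greater than 1.
 */
static unsigned sketch_num_layers(const struct sketch_resource *res)
{
    return res->target == SKETCH_TEX_3D ? res->depth0 : res->array_size;
}
```

With this convention a surface or sampler view only needs a first and last layer, regardless of whether the underlying texture is an array, a cube map or a 3D texture.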
Brian Paul efc82aef35 gallivm/llvmpipe: squash merge of the llvm-context branch
This branch defines a gallivm_state structure which contains the
LLVMBuilderRef, LLVMContextRef, etc.  All data structures built with
this object can be periodically freed during a "garbage collection"
operation.

The gallivm_state object has to be passed to most of the builder
functions where LLVMBuilderRef used to be used.

Conflicts:
	src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
	src/gallium/drivers/llvmpipe/lp_state_setup.c
2010-11-30 16:35:12 -07:00
Zack Rusin 5572805423 gallivm: fix storing of the addr register
we store into the index specified by the register index, not an
indirect register.
2010-11-30 02:01:43 -05:00
Zack Rusin f623d0c1c2 gallivm: implement indirect addressing over inputs
Instead of messing with the callers, simply copy our inputs into an
alloca array at the beginning of the function and then use it.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2010-11-10 13:00:35 -05:00
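The approach in f623d0c1c2 relies on the fact that separate SSA values cannot be selected by a run-time index, whereas memory can. A scalar C analogy is sketched below: the inputs are first copied into a local array (standing in for the alloca array) and the indirect access then becomes an ordinary indexed load. The function name, the fixed input count and the index clamp are assumptions made for the example.

```c
#define SKETCH_NUM_INPUTS 4

/* Load inputs[base + addr], where addr is only known at run time. */
static float sketch_load_input_indirect(const float inputs[SKETCH_NUM_INPUTS],
                                        int base, int addr)
{
    float local[SKETCH_NUM_INPUTS];   /* stands in for the alloca array */
    int i, idx;

    /* Done once at the beginning of the function in the real code. */
    for (i = 0; i < SKETCH_NUM_INPUTS; i++)
        local[i] = inputs[i];

    /* Clamp so the sketch never reads out of bounds (an assumption,
     * not something the commit message states). */
    idx = base + addr;
    if (idx < 0)
        idx = 0;
    if (idx > SKETCH_NUM_INPUTS - 1)
        idx = SKETCH_NUM_INPUTS - 1;

    return local[idx];
}
```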
José Fonseca 10740acf46 gallivm: Allocate TEMP/OUT arrays only once. 2010-11-09 20:36:28 +00:00
Zack Rusin 528c3cd241 gallivm: implement indirect addressing of the output registers 2010-11-09 20:36:28 +00:00
Brian Paul 55c5408ad0 gallivm: add const qualifiers, fix comment string 2010-11-05 08:51:53 -06:00
Brian Paul e8d6b2793f gallivm: alloca() was called too often for temporary arrays
Need to increment the array index to point to the last value.
Before, we were calling lp_build_array_alloca() over and over for
no reason.
2010-11-05 08:49:57 -06:00
Brian Paul e7f5d19a11 gallivm: implement execution mask for scatter stores 2010-11-04 10:01:28 -06:00
Brian Paul fb94747b66 gallivm: added lp_elem_type() 2010-11-04 10:00:58 -06:00
Brian Paul ede232e989 gallivm: add pixel offsets in scatter stores
We want to do the scatter store to sequential locations in memory
for the vector of pixels we're processing in SOA format.
2010-11-04 09:31:59 -06:00
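A scalar sketch of a scatter store with per-pixel offsets (ede232e989) and an execution mask (e7f5d19a11) is given below. The memory layout is an assumption chosen for illustration: element (reg, lane) of the register file lives at base[reg * LANES + lane], so that the lanes of one register are sequential in memory. The macro and function names are hypothetical.

```c
#define SKETCH_LANES 4   /* pixels processed per SoA vector (assumed) */

/*
 * For every active lane, store value[lane] into the register selected by
 * index[lane]. The "+ lane" term is the pixel offset: each pixel writes to
 * its own slot within the selected register.
 */
static void sketch_scatter_store(float *base,
                                 const int index[SKETCH_LANES],
                                 const float value[SKETCH_LANES],
                                 const int exec_mask[SKETCH_LANES])
{
    int lane;

    for (lane = 0; lane < SKETCH_LANES; lane++) {
        if (!exec_mask[lane])
            continue;   /* lane is disabled by control flow: leave memory alone */
        base[index[lane] * SKETCH_LANES + lane] = value[lane];
    }
}
```

Without the pixel offset, all lanes that select the same register would write to the same address and overwrite each other.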
Brian Paul 5b294a5d17 gallivm: added debug code to dump temp registers 2010-11-04 09:28:06 -06:00
Brian Paul 3ded3e98ff gallivm: add some LLVM var labels 2010-11-03 17:34:07 -06:00
Brian Paul 2fefbc79ac gallivm: implement scatter stores into temp register file
Something is not quite right, however.  The piglit tests mentioned in
fd.o bug 31226 still don't pass.
2010-11-03 17:34:07 -06:00
José Fonseca 8d364221e9 gallivm: always enable LLVMAddInstructionCombiningPass() 2010-10-28 20:40:34 +01:00
Vinson Lee 50095ac87c gallivm: Silence uninitialized variable warning.
Fixes this GCC warning.
gallivm/lp_bld_tgsi_aos.c: In function 'lp_build_tgsi_aos':
gallivm/lp_bld_tgsi_aos.c:516: warning: 'dst0' may be used uninitialized in this function
gallivm/lp_bld_tgsi_aos.c:516: note: 'dst0' was declared here
2010-10-21 11:27:35 -07:00
Vinson Lee fc59790b87 gallivm: Silence uninitialized variable warnings.
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_nearest':
gallivm/lp_bld_sample_aos.c:271: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:271: warning: 'r_ipart' may be used uninitialized in this function
2010-10-21 11:21:03 -07:00
Vinson Lee 0a5666148b gallivm: Silence uninitialized variable warnings.
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_linear':
gallivm/lp_bld_sample_aos.c:439: warning: 'r_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_lo' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_lo' may be used uninitialized in this function
2010-10-21 11:10:15 -07:00
Brian Paul ec2824cd86 gallivm: fix incorrect type for zero vector in emit_kilp()
http://bugs.freedesktop.org/show_bug.cgi?id=30974
2010-10-19 09:14:19 -06:00
José Fonseca ac17c62ece gallivm: Add a note about SSE4.1's nearest mode rounding. 2010-10-18 09:32:35 -07:00
José Fonseca 4dfb43c6a6 gallivm: Comment lp_build_insert_new_block(). 2010-10-17 18:23:18 -07:00
José Fonseca dc5bdbe0f9 gallivm: Fix SoA cubemap derivative computation.
Derivatives are now scalar.

Broken since 17dbd41cf2.
2010-10-17 09:43:18 -07:00
Brian Paul 46c2ee4fad gallivm: use util_snprintf() 2010-10-15 17:32:23 -06:00
Brian Paul fb8f3d7711 gallivm: added lp_build_load_volatile()
There's no LLVM C LLVMBuildLoadVolatile() function so roll our own.
Not used anywhere at this time but can come in handy during debugging.
2010-10-15 15:40:33 -06:00
Brian Paul 991f0c2763 gallivm: added lp_build_print_vec4() 2010-10-15 15:40:33 -06:00
Brian Paul 62450b3c49 gallivm: add compile-time option to emit inst addrs and/or line numbers
Disabling address printing is helpful for diffing.
2010-10-14 17:28:24 -06:00
José Fonseca 60c5d4735d gallivm: More accurate float -> 24bit & 32bit unorm conversion. 2010-10-13 20:25:57 +01:00
Brian Paul e487b665aa gallivm: work-around trilinear mipmap filtering regression with LLVM 2.8
The bug only happens on the AOS / fixed-pt path.
2010-10-13 12:37:42 -06:00
Vinson Lee bee22ed6b9 gallivm: Remove unnecessary header. 2010-10-13 11:18:40 -07:00
Roland Scheidegger d838e4f66d gallivm: only use lp_build_conv 4x4f -> 1x16 ub fastpath with sse2
This is relying on lp_build_pack2 using the sse2 pack intrinsics which
handle clamping.
(Alternatively could have made it use lp_build_packs2 but it might
not even produce more efficient code than not using the fastpath
in the first place.)
2010-10-13 15:26:37 +02:00
Brian Paul 50f221a01b gallivm: remove newlines 2010-10-12 19:04:05 -06:00
Roland Scheidegger c1549729ce gallivm: fix different handling of [non]normalized coords in linear soa path
There seems to be no reason for it, so do same math for both
(except the scale mul, of course).
2010-10-13 02:35:05 +02:00
José Fonseca 6fbd4faf97 gallivm: Name anonymous union. 2010-10-12 16:08:09 +01:00
Keith Whitwell 22ec25e2bf gallivm: don't branch on KILLs near end of shader 2010-10-12 13:14:51 +01:00
José Fonseca 7c1b5772a8 gallivm: More detailed analysis of tgsi shaders.
To allow more optimizations, in particular for direct textures.
2010-10-11 13:05:32 +01:00
José Fonseca e1003336f0 gallivm: Eliminate unsigned integer arithmetic from texture coordinates.
SSE support for 32bit and 16bit unsigned arithmetic is not complete, and
can easily result in inefficient code.

In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.

So remove uint_coord_type and uint_coord_bld to keep inefficient
operations from sneaking in in the future.
2010-10-11 08:14:09 +01:00
José Fonseca 17dbd41cf2 gallivm: Pass texture coords derivatives as scalars.
We end up treating them as scalars in the end, and it saves some
instructions.
2010-10-10 19:51:35 +01:00
José Fonseca 693667bf88 gallivm: Use variables instead of Phis in loops.
With this commit all explicit Phi emission is now gone.
2010-10-10 19:05:05 +01:00
José Fonseca 48003f3567 gallivm: Allow to disable bri-linear filtering with GALLIVM_DEBUG=no_brilinear runtime option 2010-10-10 18:48:02 +01:00
José Fonseca 124adf253c gallivm: Fix a long standing bug with nested if-then-else emission.
We can't patch true-block at end-if time, as there is no guarantee that
the block at the beginning of the true stanza is the same at the end of
the true stanza -- other control flow elements may have been emitted
halfway through the true stanza.

Although this bug surfaced recently with the commit to skip mip filtering
when lod is an integer the bug was always there, although probably it
was avoided until now: e.g., cubemap selection nests if-then-else on the
else stanza, which does not suffer from the same problem.
2010-10-10 18:48:02 +01:00
José Fonseca 307df6a858 gallivm: Cleanup the rest of the flow module. 2010-10-09 21:39:14 +01:00
José Fonseca d0ea464159 gallivm: Simplify if/then/else implementation.
No need for a flow stack anymore.
2010-10-09 21:14:05 +01:00
José Fonseca 1949f8c315 gallivm: Factor out the SI->FP texture size conversion for SoA path too 2010-10-09 20:26:11 +01:00
José Fonseca d45c379027 gallivm: Remove support for Phi generation.
Simply rely on mem2reg pass. It's easier and more reliable.
2010-10-09 20:14:03 +01:00
José Fonseca ea7b49028b gallivm: Use variables instead of Phis for cubemap selection. 2010-10-09 19:53:21 +01:00
José Fonseca cc40abad51 gallivm: Don't generate Phis for execution mask. 2010-10-09 12:55:31 +01:00
José Fonseca 679dd26623 gallivm: Special bri-linear computation path for unmodified rho. 2010-10-09 12:13:00 +01:00
José Fonseca 81a09c8a97 gallivm: Less code duplication in log computation. 2010-10-09 12:12:59 +01:00
José Fonseca 53d7f5e107 gallivm: Handle code that has ret correctly.
Stop disassembling on unconditional backwards jumps.
2010-10-09 12:12:59 +01:00
Keith Whitwell aa4cb5e2d8 llvmpipe: try to be sensible about whether to branch after mask updates
Don't branch more than once in quick succession.  Don't branch at the
end of the shader.
2010-10-09 11:44:45 +01:00
Keith Whitwell 2ef6f75ab4 gallivm: simpler uint8->float conversions
LLVM seems to find it easier to reason about these than our
mantissa-manipulation code.
2010-10-09 11:44:45 +01:00
Keith Whitwell c79f162367 gallivm: prefer blendvb for integer arguments 2010-10-09 11:44:45 +01:00
Keith Whitwell 6da29f3611 llvmpipe: store zero into all alloca'd values
Fixes slowdown in isosurf with earlier versions of llvm.
2010-10-09 11:43:23 +01:00
José Fonseca 34c11c87e4 gallivm: Do size computations simultaneously for all dimensions (AoS).
Operate simultaneously on <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcasted scalars.

Also do the 24.8 fixed point scaling with integer shift of the texture size,
for unnormalized coordinates.

AoS path only for now -- the same thing can be done for SoA.
2010-10-09 09:34:31 +01:00
Roland Scheidegger ff72c79924 gallivm: make use of new iround code in lp_bld_conv.
Only requires sse2 now.
2010-10-09 00:36:38 +02:00
Roland Scheidegger 175cdfd491 gallivm: optimize soa linear clamp to edge wrap mode a bit
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
2010-10-09 00:36:38 +02:00
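The clamp-against-0 reasoning in the entry above can be sketched in scalar C. This is only an illustration under stated assumptions, not the gallivm code: the helper name `clamp_to_edge_linear`, the normalized coordinate `u`, and the upper clamp against the last texel are all hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not the gallivm code. Assumes a normalized
 * coordinate u and a texture that is `size` texels wide. */
static void
clamp_to_edge_linear(float u, int size, int *x0, int *x1, float *weight)
{
   assert(size > 0);

   float coord = u * (float)size - 0.5f;
   float max_coord = (float)(size - 1);

   /* Clamp against 0 (rather than -0.5) and against the last texel. */
   if (coord < 0.0f)
      coord = 0.0f;
   if (coord > max_coord)
      coord = max_coord;

   /* coord is non-negative here, so truncation equals floor. */
   *x0 = (int)coord;
   *weight = coord - (float)*x0;
   *x1 = (*x0 + 1 < size) ? *x0 + 1 : size - 1;
}
```

For a coordinate left of the first texel centre this yields texels 0 and 1 with weight 0, so a lerp returns texel 0, which is the equivalence the commit message describes.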
Roland Scheidegger 2cc6da85d6 gallivm: avoid unnecessary URem in linear wrap repeat case
Haven't looked at what code this exactly generates but URem can't be fast.
Instead of using two URem only use one and replace the second one with
select/add (this is what the corresponding aos code already does).
2010-10-09 00:36:38 +02:00
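A minimal scalar sketch of the "one URem plus select/add" idea from the entry above. The function name `wrap_repeat_linear` and the assumption that the integer coordinate is already floored and non-negative are hypothetical; the real code emits LLVM IR on vectors.

```c
#include <assert.h>

/* Hypothetical helper, not the gallivm code: computes the two texel
 * indices for linear filtering with REPEAT wrapping.
 * `icoord` is assumed to be an already-floored, non-negative index. */
static void
wrap_repeat_linear(unsigned icoord, unsigned size,
                   unsigned *coord0, unsigned *coord1)
{
   assert(size > 0);

   /* The only remainder operation. */
   *coord0 = icoord % size;

   /* Add one, then select 0 when running off the end,
    * instead of computing a second (icoord + 1) % size. */
   unsigned next = *coord0 + 1;
   *coord1 = (next == size) ? 0 : next;
}
```

Because `coord0` is already in `[0, size)`, `coord0 + 1` can only leave the range by being exactly `size`, so a compare and select is enough.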
Roland Scheidegger 318bb080b0 gallivm: more linear tex wrap mode calculation simplification
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
2010-10-09 00:36:38 +02:00
Roland Scheidegger 99ade19e6e gallivm: optimize some tex wrap mode calculations a bit
Sometimes coords are clamped to positive numbers before doing conversion
to int, or clamped to 0 afterwards, in this case can use itrunc
instead of ifloor which is easier. This is only the case for nearest
calculations unfortunately, except linear MIRROR_CLAMP_TO_EDGE which
for the same reason can use an unsigned float build context so the
ifloor_fract helper can reduce this to itrunc in the ifloor helper itself.
2010-10-09 00:36:38 +02:00
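Why itrunc can stand in for ifloor once a coordinate is known to be non-negative, as a scalar C sketch. The helper names are hypothetical and the coordinate is assumed to fit in an `int`; this is not the gallivm implementation.

```c
#include <assert.h>

/* Truncation toward zero: what a plain float -> int conversion does. */
static int
itrunc(float x)
{
   return (int)x;
}

/* Floor: needs an extra compare and adjust for negative fractions. */
static int
ifloor(float x)
{
   int i = (int)x;
   return ((float)i > x) ? i - 1 : i;
}

/* Hypothetical nearest-filter index: the coordinate is clamped to be
 * non-negative first, so itrunc gives the same result as ifloor. */
static int
nearest_index_clamped(float coord, int size)
{
   assert(size > 0);
   if (coord < 0.0f)
      coord = 0.0f;
   int i = itrunc(coord);
   return (i > size - 1) ? size - 1 : i;
}
```

The two conversions only differ for negative non-integers (for example -1.5 truncates to -1 but floors to -2), which the clamp rules out.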
Roland Scheidegger 1e17e0c4ff gallivm: replace sub/floor/ifloor combo with ifloor_fract 2010-10-09 00:36:37 +02:00
Roland Scheidegger cb3af2b434 gallivm: faster iround implementation for sse2
sse2 supports round to nearest directly (or rather, assuming default nearest
rounding mode in MXCSR). Use intrinsic to use this rather than round (sse41)
or bit manipulation whenever possible.
2010-10-09 00:36:37 +02:00
Roland Scheidegger 0ed8c56bfe gallivm: fix trunc/itrunc comment
trunc of -1.5 is -1.0 not 1.0...
2010-10-09 00:36:37 +02:00
Vinson Lee 5e90971475 gallivm: Remove unnecessary header. 2010-10-08 14:03:10 -07:00
José Fonseca 3fde8167a5 gallivm: Help for combined extraction and broadcasting.
Doesn't change generated code quality, but saves some typing.
2010-10-08 19:48:16 +01:00
José Fonseca 438390418d llvmpipe: First minify the texture size, then broadcast. 2010-10-08 19:11:52 +01:00
José Fonseca f5b5fb32d3 gallivm: Move into the as much of the second level code as possible.
Also, pass more stuff through the sample build context, instead of
arguments.
2010-10-08 19:11:52 +01:00
José Fonseca 6b0c79e058 gallivm: Warn when doing inefficient integer comparisons. 2010-10-08 17:43:15 +01:00
Keith Whitwell e191bf4a85 gallivm: round rather than truncate in new 4x4f->1x16ub conversion path 2010-10-08 17:30:08 +01:00
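The difference between truncating and rounding in a float to 8-bit unorm conversion, as a scalar C sketch. Both helper names are hypothetical, and rounding by adding 0.5 before truncation is an assumption made for illustration; the actual path works on vectors and may round differently on ties.

```c
#include <assert.h>

/* Clamp to [0, 1]; shared by both hypothetical converters below. */
static float
clamp01(float f)
{
   assert(f == f); /* NaN is not handled in this sketch */
   if (f < 0.0f)
      return 0.0f;
   if (f > 1.0f)
      return 1.0f;
   return f;
}

/* Truncating conversion: biased toward lower values. */
static unsigned char
float_to_ubyte_trunc(float f)
{
   return (unsigned char)(clamp01(f) * 255.0f);
}

/* Rounding conversion: add 0.5 before truncating (assumed scheme). */
static unsigned char
float_to_ubyte_round(float f)
{
   return (unsigned char)(clamp01(f) * 255.0f + 0.5f);
}
```

For an input of 0.5 the scaled value is 127.5, which the truncating version maps to 127 and the rounding version maps to 128.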
José Fonseca f91b4266c6 gallivm: Use the wrappers for SSE pack intrinsics.
Fixes assertion failures on LLVM 2.6.
2010-10-08 17:30:08 +01:00
Keith Whitwell 607e3c542c gallivm: special case conversion 4x4f to 1x16ub
Nice reduction in the number of operations required for final color
output in many shaders.
2010-10-08 17:30:08 +01:00
José Fonseca eb605701aa gallivm: Implement brilinear filtering. 2010-10-08 15:50:28 +01:00
José Fonseca c8179ef5e8 gallivm: Fix copy'n'paste typo in previous commit. 2010-10-08 14:09:22 +01:00
José Fonseca df7a2451b1 gallivm: Clamp mipmap level and zero mip weight simultaneously. 2010-10-08 14:06:38 +01:00
José Fonseca 0d84b64a4f gallivm: Use lp_build_ifloor_fract for lod computation.
Forgot this one before.
2010-10-08 14:06:38 +01:00
José Fonseca 4f2e2ca4e3 gallivm: Don't compute the second mipmap level when frac(lod) == 0 2010-10-08 14:06:37 +01:00
José Fonseca 05fe33b71c gallivm: Simplify lp_build_mipmap_level_sizes' interface. 2010-10-08 14:06:37 +01:00
José Fonseca 4eb222a3e6 gallivm: Do not do mipfiltering when magnifying.
If lod < 0, then it invariably follows that ilevel0 == ilevel1 == 0.
2010-10-08 14:06:37 +01:00
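One way to see why a negative lod leaves both mip levels at 0, sketched in scalar C. The function `select_mip_levels` and the choice to clamp `floor(lod)` and `floor(lod) + 1` independently are assumptions for illustration, not necessarily how gallivm computes the levels.

```c
#include <assert.h>

static int
ifloor(float x)
{
   int i = (int)x;
   return ((float)i > x) ? i - 1 : i;
}

static int
clamp_int(int v, int lo, int hi)
{
   return (v < lo) ? lo : (v > hi) ? hi : v;
}

/* Hypothetical level selection for linear mip filtering. */
static void
select_mip_levels(float lod, int last_level, int *ilevel0, int *ilevel1)
{
   assert(last_level >= 0);

   int level = ifloor(lod);

   /* For lod < 0, level <= -1, so both values clamp to 0. */
   *ilevel0 = clamp_int(level, 0, last_level);
   *ilevel1 = clamp_int(level + 1, 0, last_level);
}
```

When both levels are the same, interpolating between them is a no-op, which is why the mip filtering step can be skipped when magnifying.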