Commit Graph

939 Commits

Author SHA1 Message Date
Roland Scheidegger ca4f0baca2 gallivm: fix somewhat broken NaN behavior for exp2
I actually screwed that up in 754319490f,
mistakenly thinking the code actually wanted the non-nan result before.
So, introduce that missing nan behavior case and use that instead.
For sse, there's no actual change in the resulting code at all, the fallback
code wouldn't have done the right thing though.
Of course, the actual issue I saw with pow() was completely unrelated...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-08-30 01:34:41 +02:00
Roland Scheidegger 9da75f96bc gallivm: handle cube map arrays for texture sampling
Pretty easy, just make sure that all paths testing for PIPE_TEXTURE_CUBE
also recognize PIPE_TEXTURE_CUBE_ARRAY, and add the layer * 6 calculation
to the calculated face.
Also handle it for texture size query, looks like OpenGL wants the number
of cubes, not layers (so need division by 6).

No piglit regressions.

v2: fix up adding cube layer to face for seamless filtering (needs to happen
after calculating per-sample face). Undetected by piglit unfortunately.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
2014-08-30 01:33:02 +02:00
Vinson Lee c04a6d5c29 gallivm: Fix build with LLVM >= 3.6 r215967.
This LLVM 3.6 commit changed EngineBuilder constructor.

commit 3f4ed32b4398eaf4fe0080d8001ba01e6c2f43c8
Author: Rafael Espindola <rafael.espindola@gmail.com>
Date:   Tue Aug 19 04:04:25 2014 +0000

    Make it explicit that ExecutionEngine takes ownership of the modules.

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215967 91177308-0d34-0410-b5e6-96231b3b80d8

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2014-08-20 15:24:44 +09:00
Marek Olšák c10332bbb8 gallium: remove PIPE_SHADER_CAP_MAX_ADDRS
This limit is fixed in Mesa core and cannot be changed.
It only affects ARB_vertex_program and ARB_fragment_program.

The minimum value for ARB_vertex_program is 1 according to the spec.
The maximum value for ARB_vertex_program is limited to 1 by Mesa core.

The value should be zero for ARB_fragment_program, because it doesn't
support ARL.

Finally, drivers shouldn't mess with these values arbitrarily.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2014-08-11 21:53:57 +02:00
Darius Goad 5492296318 gallivm: Handle MSAA textures in emit_fetch_texels
This support is preliminary due to the fact that MSAA is not
actually implemented.

However, this patch does fix the piglit test:
spec/!OpenGL 3.2/glsl-resource-not-bound 2DMS (bug #79740).

(v2 RS: don't emit 4th coord as explicit lod)

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-08-08 18:54:08 +02:00
Roland Scheidegger 394ea139c7 draw: hack around weird primitive id input in gs
The distinction between system values and ordinary inputs is not very
obvious in gallium - further fueled by the fact that they use the same
semantic names.
Still, if there's any value which imho really is a system value, it's the
primitive id input into the gs (while earlier (tessleation) stages could read
it, it is _always_ generated by the system). For some odd reason though (which
I'd classify as a bug but seems too complicated to fix) the glsl compiler in
mesa treats this as an ordinary varying, and everything else after that
(including the state tracker and other drivers) just go along with that.
But input fetching in gs for llvm based draw was definitely limited to the
ordinary (2-dimensional) inputs so only worked with other state trackers,
the code was also additionally relying on tgsi_scan_shader filling
uses_primid correctly which did not happen neither (would set it only for
all stages if it was a system value, but only set it for the fragment shader
if it was an input value).
This fixes piglit glsl-1.50-geometry-primitive-id-restart and primitive-id-in
in llvmpipe.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-08-08 18:54:08 +02:00
Jan Vesely e28136343b gallivm: Fix build with latest LLVM
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2014-08-05 12:52:56 +09:00
Roland Scheidegger c3c33756ff gallivm: fix cube map array (and cube map shadow with bias) handling
In particular need to handle TEX2/TXB2/TXL2 opcodes.
cube map shadow with bias already used TXB2 which didn't work before
at all, despite that there's by default no piglit change (but using
no_quad_lod and no_rho_opt indeed passes some more tex-miplevel-selection
tests).
The actual sampling code still won't handle cube map arrays.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-08-05 04:13:17 +02:00
Roland Scheidegger 5a12155503 gallivm: fix up out-of-bounds level when using conformant out-of-bound behavior
When using (d3d10) conformant out-of-bound behavior for texel fetching
(currently always enabled) the level still needs to be set to a safe value
even though the offset in the end won't get used because the level is used
to look up the mip offset itself and the actual strides, which might otherwise
crash.
For simplicity, we'll use level 0 in this case (this ought to be safe, llvmpipe
does not actually fill in level 0 information if first_level is larger, but
some random strides / offsets shouldn't hurt as ultimately we always use
offset 0 in this case).
Fixes a crash in some in-house test where random huge levels appear in
lp_build_fetch_texel() (the test actually uses level 0 always but if the
fetching happens in a block with a execution mask random values may appear).

CC: <mesa-stable@lists.freedesktop.org>

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-07-31 01:31:06 +02:00
Marek Olšák 04f2c88f45 gallium: rename shader cap MAX_CONSTS to MAX_CONST_BUFFER_SIZE
This new name isn't so confusing.

I also changed the gallivm limit, because it looked wrong.

Reviewed-by: Brian Paul <brianp@vmware.com>

v2: use sizeof(float[4])
2014-07-28 23:57:08 +02:00
Tom Stellard fea996c2aa gallium: Add PIPE_SHADER_CAP_DOUBLES
This is for reporting whether or not double precision floating-point
operations are supported.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2014-07-02 15:31:52 -04:00
Aaron Watry 564821c917 gallivm: Fix build after LLVM commit 211259
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-06-20 19:49:18 -05:00
Roland Scheidegger cad60420d5 gallivm: set mcpu when initializing llvm execution engine
Previously llvm detected cpu features automatically when the execution engine
was created (based on host cpu). This is no longer the case, which meant llvm
was then not able to emit some of the intrinsics we used as we didn't specify
any sse attributes (only on avx supporting systems this was not a problem since
despite at least some llvm versions enabling it anyway we always set this
manually). So, instead of trying to figure out which MAttrs to set just set
MCPU.

This fixes https://bugs.freedesktop.org/show_bug.cgi?id=77493.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2014-06-19 16:58:00 +02:00
Marek Olšák 1df7199fc9 gallium: implement ARB_texture_query_levels
The extension is always supported if GLSL 1.30 is supported.

Softpipe and llvmpipe support is also added (trivial).
Radeon and nouveau support is already done.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-06-19 00:17:36 +02:00
Roland Scheidegger 56335b4441 gallivm: fix SCALED -> NORM conversions
Such conversions (which are most likely rather pointless in practice) were
resulting in shifts with negative shift counts and shifts with counts the same
as the bit width. This was always undefined in llvm, the code generated was
rather horrendous but happened to work.
So make sure such shifts are filtered out and replaced with something that
works (the generated code is still just as horrendous as before).

This fixes lp_test_format, https://bugs.freedesktop.org/show_bug.cgi?id=73846.

v2: prettify by using build context shift helpers.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-06-18 19:52:57 +02:00
José Fonseca 172ef0c5a5 gallivm: Disable workaround for PR12833 on LLVM 3.2+.
Fixed upstream.
2014-05-23 11:37:47 +01:00
José Fonseca 2c02f34fcc gallivm: Support MCJIT on Windows.
It works fine, though it requires using ELF objects.

With this change there is nothing preventing us to switch exclusively
to MCJIT, everywhere.  It's still off though.
2014-05-23 11:37:47 +01:00
Roland Scheidegger 1e9cbbb1c4 llvmpipe: do IR counting for shader cache management after optimization.
2ea923cf57 had the side effect of IR counting
now being done after IR optimization instead of before. Some quick analysis
shows that there's roughly 1.5 times more IR instructions before optimization
than after, hence the effective shader cache size got quite a bit smaller.
Could counter this with an increase of the instruction limit but it probably
makes more sense to count them after optimizations, so move that code.

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-05-19 17:07:41 +02:00
Roland Scheidegger 3bf2d86c09 gallivm: (trivial) fix compilation with llvm 3.1, 3.2
I actually checked the getModuleIdentifier() function exists with 3.1 but
missed that the file moved...
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=78803
2014-05-17 02:03:35 +02:00
Roland Scheidegger 3a1da0abee gallivm: print out how long it takes to optimize shader IR.
Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print
warnings for unoptimized code).

While some unexpectedly long shader compile times for some shaders were fixed
with 8a9f5ecdb1 this should help recognize such
problems in the future. For now though only available in debug builds (which
are not always suitable for such analysis). And since this uses system time,
it might not be all that accurate (even llvmpipe's own rasterization threads
might be running at the same time, or just other tasks).
(llvmpipe also has LP_DEBUG=counters but this only gives an average per shader
and the the total time for all shaders.)
This prints information like this:
optimizing module fs17_variant0 took 1 msec
optimizing module setup_variant_0 took 0 msec
optimizing module draw_llvm_vs_variant0 took 9 msec
optimizing module draw_llvm_vs_variant0 took 12 msec
optimizing module fs17_variant1 took 2 msec

v2: rebase for recent gallivm compilation changes, and print time for whole
modules instead of functions (otherwise it would be very spammy since it would
include all trivial inline sse2 functions), using the shiny new module names,
prying them off LLVM using new helper (not available through C bindings).
Per function timings, while possibly giving more information (if there'd be
a problem only in for instance the partial not the whole function), don't seem
all that useful for now.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 22:50:14 +02:00
Roland Scheidegger 26cac02c51 gallivm: give more verbose names to modules
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 22:50:14 +02:00
Roland Scheidegger b416645387 gallivm: remove optimization workaround when not having sse 4.1
This workaround doesn't list any llvm version, but it was introduced
2010-06-10 (e277d5c1f6). It is unlikely
this bug is still present in llvm versions we support (3.1+).
There's no specific test listed, but I ran lp_test_arit (which uses
the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and
this pass enabled without issues.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 01:09:34 +02:00
Roland Scheidegger 93731fbeec gallivm: remove workaround for reversing optimization pass order.
32bit code generation and llvm >= 2.7 used a different optimization pass
order - this code was initially introduced (2010-07-23) by
815e79e72c, apparently due to buggy code being
generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8
devel).
It seems very highly likely that whatever this bug was it has been fixed in
newer llvm versions, though there's no easy way to test this - the mentioned
piglit test has been removed years ago, and even if you'd build it I'm
sceptical the glsl compiler would still produce the required code to trigger
it.
I have no idea what a good order of passes is, but just remove the workaround
and use the same order everywhere.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-16 01:09:34 +02:00
Roland Scheidegger 8a9f5ecdb1 gallivm: only fetch pointers to constant buffers once
In 1d35f77228 support for multiple constant
buffers was introduced. This meant we had another indirection, and we did
resolve the indirection for each constant buffer access. This looks very
reasonable since llvm can figure out if it's the same pointer, however it
turns out that this can cause llvm compilation time to go through the roof
and beyond (I've seen cases in excess of factor 100, e.g. from 50 ms to more
than 10 seconds (!)), with all the additional time spent in IR optimization
passes (and in the end all of it in DominatorTree::dominate()).
I've been unable to narrow it down a bit more (only some shaders seem affected,
seemingly without much correlation to overall shader complexity or constant
usage) but it is easily avoidable by doing the buffer lookups themeselves just
once (at constant buffer declaration time).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-05-14 16:23:33 +02:00
Roland Scheidegger 18c6454ad1 gallivm: fix output stream flushing in error case for disassembly.
When there's an error, also need to flush the stream, otherwise an assertion
is hit (meaning you don't actually see the error neither).
2014-05-14 16:23:33 +02:00
José Fonseca b18b7781b2 gallivm: Remove lp_func_delete_body.
Not necessary, now that we will free the whole module (hence all
function bodies) immediately after compiling.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
José Fonseca a6f5cc66db gallivm: Remove gallivm_free_function.
Unused.  Deprecated by gallivm_free_ir().

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
Frank Henigman 865d0312c0 gallivm: Separate freeing LLVM intermediate data from freeing final code.
Split free_gallivm_state() into two steps.  First step is
gallivm_free_ir() which cleans up the LLVM scaffolding used to generate
code while preserving the code itself.  Second step is
gallivm_free_code() to free the memory occupied by the code.

v2: s/gallivm_teardown/gallivm_free_ir/ (Jose)

Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
Frank Henigman 2c73102dc3 gallivm: One code memory pool with deferred free.
Provide a JITMemoryManager derivative which puts all generated code into
one memory pool instead of creating a new one each time code is generated.
This saves significant memory per shader as the pool size is 512K and
a small shader occupies just several K.

This memory manager also defers freeing generated code until you tell
it to do so, making it possible to destroy the LLVM engine while keeping
the code, thus enabling future memory savings.

v2: Fix compilation errors with LLVM 3.4 (Jose)

Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
José Fonseca 2ea923cf57 gallivm: Run passes per module, not per function.
This is how it is meant to be done nowadays.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
José Fonseca 920933e09e gallivm: Use LLVM global context.
I saw that LLVM internally uses its global context for some things, even
when we use our own.  Given ours is also global, might as well use
LLVM's.

However, sepearate contexts can still be enabled with a simple source
code modification, for when the need/benefit arises.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
José Fonseca 69f0835ff1 gallivm: Stop using module providers.
Nowadays LLVMModuleProviderRef is just an alias for LLVMModuleRef, so
its use just causes unnecessary confusion.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:05:00 +01:00
José Fonseca 9cf67e51b0 gallivm,draw,llvmpipe: Remove support for versions of LLVM prior to 3.1.
Older versions haven't been tested probably don't work anyway.  But more
importantly, code supporting it is hindering further work.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-05-14 11:04:59 +01:00
Brian Paul baec25635d gallivm: add PIPE_SHADER_CAP_PREFERRED_IR switch case, remove default
Return PIPE_SHADER_IR_TGSI for the PIPE_SHADER_CAP_PREFERRED_IR query.
Remove default switch case so we learn of missing switch cases at
compile time.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2014-05-07 11:32:11 -06:00
Roland Scheidegger 64d6460a56 gallivm: fix 2 leaks in disassembly code
don't leak the MCSubtargetInfo (not really big, was already fixed with
llvm master) and TargetMachine (big). While this is only used for debugging
the leak is large enough to get you into trouble in some cases.
Tested with llvm 3.1 and master.
Before (llvm 3.1), GALLIVM_DEBUG=asm glxgears:
==14152== LEAK SUMMARY:
==14152==    definitely lost: 105,228 bytes in 20 blocks
==14152==    indirectly lost: 347,252 bytes in 261 blocks
==14152==      possibly lost: 866,625 bytes in 1,453 blocks
==14152==    still reachable: 7,344,677 bytes in 6,494 blocks
==14152==         suppressed: 0 bytes in 0 blocks

After:
==13799== LEAK SUMMARY:
==13799==    definitely lost: 3,108 bytes in 6 blocks
==13799==    indirectly lost: 0 bytes in 0 blocks
==13799==      possibly lost: 804,143 bytes in 1,429 blocks
==13799==    still reachable: 7,314,267 bytes in 6,473 blocks
==13799==         suppressed: 0 bytes in 0 blocks

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-05-01 16:13:38 +02:00
José Fonseca 1527a545a4 gallivm: Fix wrong operator in lp_exec_default.
Courtesy of MSVC static code analyser.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-04-24 14:49:53 +01:00
Roland Scheidegger f23d1160c2 gallivm: fix compilation with llvm 3.5 r206241+
Just adjust to the ever-changing API, pass in MCContext when creating the
MCDisassembler.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2014-04-16 19:57:47 +02:00
Zack Rusin b1909b260f draw/llvm: improve debugging output a bit
it's useful to know what the llvmbuildstore arguments are going to
be before executing it because it can crash and make sure to
print out the inputs only if we're not generating a gs because
it fetches inputs differently.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-03-26 15:58:59 -04:00
Roland Scheidegger 3b421daf32 gallivm: fix no-op n:n lp_build_resize()
This can get called in some circumstances if both src type and dst type
have same width (seen with float32->unorm32). While this particular case
was bogus anyway let's just fix that as it can work trivially (due to the
way it was called it actually worked anyway apart from the assert).

Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-03-26 01:44:23 +01:00
Roland Scheidegger 9477d8c862 llvmpipe: add support for b5g6r5_srgb
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2014-03-21 17:23:38 +01:00
Jeff Muizelaar ff1e850eec gallivm: optimize repeat linear npot code in the aos int path
Similar to the other cases, shift some weight/coord calculations to int
space. This should be slightly faster (on x86 sse it should actually safe one
instruction, and generally int instructions are cheaper).
2014-03-14 19:41:18 +01:00
Roland Scheidegger 9954f01497 gallivm: use correct rounding for nearest wrap mode (in the aos int path)
The previous code used coords which were calculated as
(int) (f_coord * tex_size * 256) >> 8.
This is not only unnecessarily complex but can give the wrong texel due to
rounding for negative coords (as an example, after denormalization coords
from -1.0 to 0.0 should give -1, but this will give -1 for numbers from
-1.0-1/256 - 0.0-1/256.
Instead, juse use ifloor, dropping the shift stuff.
Unfortunately, this will most likely be slower - with arch rounding available
it shouldn't be too bad (trades a int shift for a round but also saves an int
mul (which is shared by all coords) but otherwise it's a mess.
2014-03-14 19:41:18 +01:00
Jeff Muizelaar 88637e5764 gallivm: use correct rounding for linear wrap mode (in the aos int path)
The previous method for converting coords to ints was sligthly inaccurate
(effectively losing 1bit from the 8bit lerp weight). This is probably
especially noticeable when trying to draw a pixel-aligned texture.
As an example, for a 100x100 texture after dernormalization the texture
coords in this case would turn up as
0.5, 1.5, 2.5, 3.5, 4.5, ...
After the mul by 256, conversion to int and 128 subtraction, they end up as
0, 256, 512, 768, 1024, ...
which gets us the correct coords/weights of
0/0, 1/0, 2/0, 3/0, 4/0, ...
But even LSB errors (which are unavoidable) in the input coords may cause
these coords/weights to be wrong, e.g. for a coord of 3.49999 we'd get a
coord/weight of 2/255 instead.

Fix this by using round-to-nearest int instead of FPToSi (trunc). Should be
equally fast on x86 sse though other archs probably suffer a little.
2014-03-14 19:41:18 +01:00
Roland Scheidegger b2b2a2c06c gallivm: add smallfloat to float conversion not relying on cpu denorm handling
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)

Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
2014-02-20 18:41:42 +01:00
Zack Rusin efb152dd04 gallivm: make sure analysis works with large number of immediates
We need to handle a lot more immediates and in order to do that
we also switch from allocating this structure on the stack to
allocating it on the heap.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-02-05 19:40:53 -05:00
Zack Rusin 69ee3f431f gallivm: handle huge number of immediates
We only supported up to 256 immediates, which isn't enough. We had
code which was allocating immediates as an allocated array, but it
was always used along a statically backed array for performance
reasons. This commit adds code to skip that performance optimization
and always use just the dynamically allocated immediates if the
number of them is too great.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-02-05 19:40:53 -05:00
Zack Rusin 8507afc97f gallivm: allow large numbers of temporaries
The number of allowed temporaries increases almost with every
iteration of an api. We used to support 128, then we started
increasing and the newer api's support 4096+. So if we notice
that the number of temporaries is larger than our statically
allocated storage would allow we just treat them as indexable
temporaries and allocate them as an array from the start.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-02-05 19:40:53 -05:00
Roland Scheidegger 4a7da3bec5 gallivm: fix F2U opcode
Previously, we were really doing F2I. And also move it to generic section.
(Note that for llvmpipe the code generated is definitely bad, due to lack
of unsigned conversions with sse. I think though what llvm does (using scalar
conversions to 64bit signed either with x87 fpu (32bit) or sse (64bit)
including lots of domain changes is quite suboptimal, could do something like
is_large = arg >= 2^31
half_arg = 0.5 * arg
small_c = fptoint(arg)
large_c = fptoint(half_arg) << 1
res = select(is_large, large_c, small_c)
which should be much less instructions but that's something llvm should do
itself.)

This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs
GL 3.0 version override to run.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2014-02-05 17:45:31 +01:00
Zack Rusin 9bace99d77 gallivm: fix opcode and function nesting
gallivm soa code supported only a single level of nesting for
control flow opcodes (if, switch, loops...) but the d3d10 spec
clearly states that those are nested within functions. To support
nesting of conditionals inside functions we need to store the
nesting data inside function contexts and keep a stack of those.
Furthermore we make sure that if nesting for subroutines is deeper
than 32 then we simply ignore all subsequent 'call' invocations.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-02-03 13:29:14 -05:00
Brian Paul c20b48c48e gallivm: add a few const qualifiers
Trivial.
2014-02-02 06:52:36 -07:00
José Fonseca 2eddf91faf gallivm: Workaround http://llvm.org/PR18600
We have code generation paths that carry out swizzles of AoS vectors via
bitwise shifts, as these tend to generate more efficient code than
straightforward byte shuffles.  But when the input is a constant the
additional bitwise arithmetic operations somehow don't really get
constant propagated properly, evenutally causing assertion failure in
InstCombine pass.

Therefore avoid the bug by using the trivial shuffles for constant
inputs.

Although the sample LLVM IR can cause a crash with any LLVM version,
this was only seen in practice with LLVM 3.2.

Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-28 14:27:27 +00:00
José Fonseca 8771285054 s/Tungsten Graphics/VMware/
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008.  Leaving the
old copyright name is creating unnecessary confusion, hence this change.

This was the sed script I used:

    $ cat tg2vmw.sed
    # Run as:
    #
    #   git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
    #

    # Rename copyrights
    s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
    /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
    s/TUNGSTEN GRAPHICS/VMWARE/g

    # Rename emails
    s/alanh@tungstengraphics.com/alanh@vmware.com/
    s/jens@tungstengraphics.com/jowen@vmware.com/g
    s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
    s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
    s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
    s/michel@tungstengraphics.com/daenzer@vmware.com/g
    s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
    s/zack@tungstengraphics.com/zackr@vmware.com/

    # Remove dead links
    s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g

    # C string src/gallium/state_trackers/vega/api_misc.c
    s/"Tungsten Graphics, Inc"/"VMware, Inc"/

Reviewed-by: Brian Paul <brianp@vmware.com>
2014-01-17 20:00:32 +00:00
Zack Rusin 93b953d139 llvmpipe: do constant buffer bounds checking in shaders
It's possible to bind a smaller buffer as a constant buffer, than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and expected to return garbage but consistent
garbage (we follow the behavior which some wlk tests expect which
is to return the actual values from the bound buffer).

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2014-01-16 16:33:57 -05:00
Roland Scheidegger 27d47bd42f gallivm: fix pointer type for stmxcsr/ldmxcsr
The argument is a i8 pointer not a i32 pointer (even though the value actually
stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do
leading to crashes.

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-12-14 17:11:03 +01:00
Zack Rusin 155139059b llvmpipe: fix blending with half-float formats
The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patches enables denorms for
blending. It's a little tricky due to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
2013-12-10 16:39:48 -05:00
Roland Scheidegger 2983c039df gallium: new shader cap bit for the amount of sampler views
Ever since introducing separate sampler and sampler view max this was really
missing.
Every driver but llvmpipe reports the same number as number of samplers for
now, so nothing should break.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-28 04:02:18 +01:00
Vinson Lee 7f56780915 gallivm: Ignore unknown file type in non-debug builds.
Fixes "Uninitialized pointer read" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-20 22:35:36 -08:00
Si Chen e7a5905d8a gallivm: Fix mask calculation for emit_kill_if.
The exec_mask must be taken in consideration, just like emit_kill above.

The tgsi_exec module has the same bug and should be fixed in a future
change.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-11-19 19:16:18 +00:00
José Fonseca a29e40a423 gallivm: Compile flag to debug TGSI execution through printfs.
It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag.

I had prototyped this for a while while debugging an issue, but finally
cleaned this up and added a few more bells and whistles.

v2: Use '$' as marker; better output. Thanks to Brian, Zack and Roland
reviews.

Here is a sample output.

    CONST[0].x =  0.00625000009 0.00625000009 0.00625000009 0.00625000009
    CONST[0].y =  -0.00714285718 -0.00714285718 -0.00714285718 -0.00714285718
    CONST[0].z =  -1 -1 -1 -1
    CONST[0].w =  1 1 1 1
    IN[0].x =  143.5 175.5 175.5 143.5
    IN[0].y =  123.5 123.5 155.5 155.5
    IN[0].z =  0 0 0 0
    IN[0].w =  1 1 1 1
$   1: RCP TEMP[0].w, IN[0].wwww
    TEMP[0].w =  1 1 1 1
$   2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw
    TEMP[0].x =  -0.103124976 0.0968750715 0.0968750715 -0.103124976
    TEMP[0].y =  0.117857158 0.117857158 -0.110714316 -0.110714316
$   3: MUL OUT[0].xy, TEMP[0], TEMP[0].wwww
    OUT[0].x =  -0.103124976 0.0968750715 0.0968750715 -0.103124976
    OUT[0].y =  0.117857158 0.117857158 -0.110714316 -0.110714316
$   4: MUL OUT[0].z, IN[0].zzzz, TEMP[0].wwww
    OUT[0].z =  0 0 0 0
$   5: MOV OUT[0].w, TEMP[0]
    OUT[0].w =  1 1 1 1
$   6: END
    OUT[0].x =  -0.103124976 0.0968750715 0.0968750715 -0.103124976
    OUT[0].y =  0.117857158 0.117857158 -0.110714316 -0.110714316
    OUT[0].z =  0 0 0 0
    OUT[0].w =  1 1 1 1
2013-11-14 14:04:28 +00:00
Roland Scheidegger 754319490f gallivm,llvmpipe: fix float->srgb conversion to handle NaNs
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by luck for NaNs (float->int
conversion in hw gives integer indeterminate value, which just happens to be
-0x80000000 hence gets converted to zero in the end after pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending and clamping resulted in NaN behavior being undefined (and actually
got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp
with defined nan behavior as we can handle the NaN for free this way.
I suspect there's more bugs lurking in this area (e.g. converting floats to
snorm) as we don't really use defined NaN behavior everywhere but this seems
to be good enough.
While here respecify nan behavior modes a bit, in particular the return_second
mode didn't really do what we wanted. From the caller's perspective, we really
wanted to say we need the non-nan result, but we already know the second arg
isn't a NaN. So we use this now instead, which means that cpu architectures
which actually implement min/max by always returning non-nan (that is adhering
to ieee754-2008 rules) don't need to bend over backwards for nothing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-14 12:24:55 +00:00
Roland Scheidegger ea1f7d2894 gallivm: deduplicate some indirect register address code
There's only one minor functional change, for immediates the pixel offsets
are no longer added since the values are all the same for all elements in
any case (it might be better if those weren't stored as soa vectors in the
first place maybe).

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-11-08 03:38:32 +01:00
Roland Scheidegger b35ea09349 gallivm: fix indirect addressing of inputs
We weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first element.
(Copied straight from the same fix for temps.)
While here fix up a couple of broken comments in the fetch functions,
plus don't name a straight float type float4 which is just confusing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-11-06 18:20:54 +01:00
Roland Scheidegger 5ae31d7e1d gallivm: optimize lp_build_minify for sse
SSE can't handle true vector shifts (with variable shift count),
so llvm is turning them into a mess of extracts, scalar shifts and inserts.
It is however possible to emulate them in lp_build_minify with float muls,
which should be way faster (saves over 20 instructions per 8-wide
lp_build_minify). This wouldn't work for "generic" 32bit shifts though
since we've got only 24bits of mantissa (actually for left shifts it would
work by using sse41 int mul instead of float mul but not for right shifts).
Note that this has very limited scope for now, since this is only used with
per-pixel lod (otherwise we're avoiding the non-constant shift count by doing
per-quad shifts manually), and only 1d textures even then (though the latter
should change).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-11-05 23:32:24 +01:00
Vinson Lee 749cb89097 gallivm: Remove llvm::DisablePrettyStackTrace for LLVM >= 3.4.
LLVM 3.4 r193971 removed llvm::DisablePrettyStackTrace and made the
pretty stack trace opt-in rather than opt-out.

The default value of DisablePrettyStackTrace has changed to true in LLVM
3.4 and newer.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60929
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-11-04 18:22:04 -08:00
Roland Scheidegger 2b2fc03beb gallivm: implement fully accurate corner filtering for seamless cube maps
d3d10 requires that cube corners are filtered with accurate weights (that
is, the weight of the non-existing corner texel should be evenly distributed
to the other 3 texels). OpenGL does not require this (but recommends it).
This requires us to use different filtering code, since we need per-texel
weights which our 2d lerp doesn't (and can't) do. And of course the (now
per element) weights need to be adjusted too for it to work.
Invoke the new filtering code whenever there's an edge to keep things simpler,
as it will work for edges too not just corners but of course it's only needed
with corners.
More ugly code for not much gain but at least a hacked up cubemap demo
shows very nice corners now... Not sure yet if and how this should be
configurable...

v2: incorporate feedback from Jose, only use special corner filtering code
when there's a corner not when there's only an edge (as corner filtering code
is slower, though a perf difference was only measureable when always
forcing edge code). Plus some minor style fixes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-10-25 01:29:14 +02:00
Roland Scheidegger 3bdd1074e1 gallivm: implement seamless cube filtering
For seamless cube filtering it is necessary to determine new faces and new
coords per sample. The logic for this is _seriously_ complex (what needs
to happen is very "asymmetric" wrt face, x/y under/overflow), further
complicated by the fact that if the 4 samples are in a corner (meaning we
only have actually 3 samples, and all 3 are on different faces) then
falling off the edge is happening _both_ on x and y axis simultaneously.
There was a noticeable performance hit in mesa's cubemap demo when seamless
filtering was forced on (just below 10 percent or so in a debug build, when
disabling all filtering hacks, otherwise it would probably be a bit more) and
when always doing the logic, hence use a branch which it only does it if any
of the pixels in a quad (or in two quads) actually hit this. With that there
was no measurable performance hit in the cubemap demo (neither in a debug nor
release buidl), but this will vary (cubemap demo very rarely hits edges).
Might also be different on other cpus, as this forces SoA sampling path which
potentially can be quite a bit slower.
Note that as for corners, this code gets all the 3 samples which actually
exist right, and the 4th texel will simply be the same as one of the others,
meaning that filter weights will be a bit wrong. This however should be
enough for full OpenGL (but not d3d10) compliance.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-10-21 15:42:04 +02:00
Roland Scheidegger 9b3dbaf396 gallivm: kill old per-quad face selection code
Not used since ages, and it wouldn't work at all with explicit derivatives now
(not that it did before as it ignored them but now the code would just use
the derivs pre-projected which would be quite random numbers).

v2: also get rid of 3 helper functions no longer used.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-10-10 04:32:57 +02:00
Roland Scheidegger 47d0613eb7 gallivm: handle explicit derivatives for cubemaps
They need some special handling. Quite complicated.
Additionally, use the same code for implicit derivatives too if no_rho_approx
and no_quad_lod is set, because it seems while generally it should be ok
to use per quad lod for implicit derivatives there's at least some test which
insists that in case of cubemaps the shared lod value MUST come from a pixel
inside the primitive (due to the derivatives becoming different if a different
larger major axis is chosen).

v2: based on Brian's feedback, clean up code a bit.
And use sign bit of major axis instead of pre-select s/t/r sign for coord
mirroring (which should be the same in the end, saves 2 ands).
Also fix two bugs with select/mirror of derivatives, the minor axes need to
use major axis sign as well (instead of major derivative axis sign), and
don't mistakenly use absolute values of major derivative and inverse major
values.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-10-10 04:32:57 +02:00
Roland Scheidegger ce1d8634aa gallivm: ignore rho approximation for cube maps
There's two reasons for this:
1) even when ignoring rho approximation for cube maps, the result is still
not correct, but it's better as the max error at edges is now sqrt(2) instead
of 2 (which was a full mip level), same as it is for ordinary 2d maps when
doing rho approximations (so the error actually goes from factor 2 at edges and
sqrt(2) completely inside a face to sqrt(2) at edges and 0 inside a face).
2) I want to repurpose rho_no_approx for cubemaps for fully correct cubemap
derivatives (so don't need yet another debug var).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-10-10 04:32:57 +02:00
Zack Rusin 87fe4a33d3 llvmpipe: implement 64 bit mul opcodes in llvmpipe
Both the imul_hi and umul_hi are working with this patch.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-10-09 18:30:27 -04:00
Zack Rusin c01c6a95b4 gallivm: support printing of 64 bit integers
only 8 and 32 bit integers were supported before.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-10-09 18:29:05 -04:00
Roland Scheidegger 532dc8939f gallivm: adjust wrap mode to CLAMP_TO_EDGE always for cube maps.
Technically without seamless filtering enabled GL allows any wrap mode, which
made sense when supporting true borders (can get seamless effect with border
and CLAMP_TO_BORDER), but gallium doesn't support borders and d3d9 requires
wrap modes to be ignored and it's a pain to fix up the sampler state (as it
makes it texture dependent). It is difficult to imagine a situation where an
app really wants another behavior so just cheat here. (It looks like some
graphics hw (intel) actually requires this too hence it should be safe.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-09-19 17:14:36 +02:00
Roland Scheidegger 93b5f71179 gallivm: some bits of seamless cube filtering implementation
Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a
correct implementation for nearest filtering, and it's way better than
using repeat wrap for instance for linear filtering (though obviously this
doesn't actually do seamless filtering).

v2: fix s/t wrap not r/s...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-09-18 00:00:37 +02:00
José Fonseca 315f8f17d0 llvmpipe: Remove the special path for TGSI_OPCODE_EXP.
It was wrong for EXP.y, as we clamped the source before computing the
fractional part, and this opcode should be rarely used, so it's not
worth the hassle.
2013-09-12 11:24:24 +01:00
Zack Rusin e9f1f6ab42 gallivm: support indirect registers on both dimensions
We support indirect addressing only on the vertex index, but some
shaders also use indirect addressing on attributes. This patch
adds support for indirect addressing on both dimensions inside
gs arrays.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-09-06 15:05:27 -04:00
Roland Scheidegger f37edb5e20 gallivm: handle unbound textures in texture sampling / texture queries
Turns out we don't need to do much extra work for detecting this case,
since we are guaranteed to get a empty static texture state in this case,
hence just rely on format being 0 and return all zero then.
Previously needed dummy textures (would just have crashed on format being 0
otherwise) which cannot return the correct result for size queries and when
sampling textures with wrap modes using border.
As a bonus should hugely increase performance when sampling unbound textures -
too bad it isn't a useful feature :-).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-30 23:20:03 +02:00
Roland Scheidegger 289faa7e23 gallivm: (trivial) don't pass sampler_unit variable down to filtering funcs
The only reason this was needed was because the fetch texel function had to
get the (dynamic) border color, but this is now done much earlier.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 23:20:03 +02:00
Roland Scheidegger 61add3cc3c gallivm: don't use AoS path if min/mag filter are different with multiple lods
Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing
AoS path wouldn't be all that difficult (use all the same logic as SoA) but
considered not worth it for now.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 23:20:03 +02:00
Roland Scheidegger a479f34025 gallivm: support per-pixel min/mag filter in SoA path
Since we can have per-pixel lod we should also honor the filter per-pixel
(in fact we didn't honor it per quad neither in the multiple quad case).
Do this by running the linear path and simply beating the weights into shape
(the sample with the higher weight is the one which should have been chosen
with nearest filtering hence adjust filter weight to 1.0/0.0 based on that).
If all pixels use nearest filter (either min and mag) then still run just a
nearest filter as this is way cheaper (probably around 4 times faster for 2d,
more for 3d case) and it should be relatively rare that pixels really need
different filtering. OTOH if all pixels would require linear don't do anything
special since the linear path with filter adjustments shouldn't really be all
that much more expensive than ordinary linear, and we think it's rare that
min/mag filters are configured differently so there doesn't seem much value
in trying to optimize this further.
This does not yet fix the AoS path (though currently AoS is only used for
single quads hence it could be considered less broken, just never honoring
per-pixel filter decision but doing it per quad).

v2: simplify code a bit (unify min linear and min nearest cases)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 02:16:45 +02:00
Roland Scheidegger 81cfcdbd87 gallivm: don't calculate square root of rho if we use accurate rho method
While a sqrt here and there shouldn't hurt much (depending on the cpu) it is
possible to completely omit it since rho is only used for calculating lod and
there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for
calculating lod this means we get a simple mul instead of sqrt (in case of
nearest mip filter in fact we don't need to replace the sqrt with something
else at all), only in some not very useful path this doesn't work (combined
brilinear calculation of int level and fractional lod, accurate rho calc but
brilinear filtering seems odd).
Apart from being faster as an added bonus this should increase our crappy
fractional accuracy of lod, since fast_log2 is only good for ~3bits and this
should increase accuracy by one bit (though not used if dimension is just one
as we'd need an extra mul there as we never had the squared rho in the first
place).

v2: use separate ilog2_sqrt function if we have squared rho.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 02:16:45 +02:00
Roland Scheidegger 10e40ad11d gallivm: refactor num_lods handling
This is just preparation for per-pixel (or per-quad in case of multiple quads)
min/mag filter since some assumptions about number of miplevels being equal
to number of lods no longer holds true.
This change does not change behavior yet (though theoretically when forcing
per-element path it might be slower with different min/mag filter since the
code will respect this setting even when there's no mip maps now in this case,
so some lod calcs will be done per-element just ultimately still the same
filter used for all pixels).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-30 02:16:45 +02:00
Roland Scheidegger ad9b5b9ae9 gallivm: fix min/mag switchover point for nearest/none mip filter
Previously, the min/mag switchover point when using nearest/none mip
filter was effectively -0.5 which can't be right. Looks like new OpenGL
thinks it's ok if it's always 0.0 (older versions required 0.5 in some
cases), let's hope everybody else thinks that's fine too.
Refactor this slightly and get the per-quad/per-pixel min/mag decision
values further down to sampling, though still only the first component
is used yet.
While here also fix code trying to skip lod bias application etc. when
mipfilter is none, as this is still needed for determining min/mag filter.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-23 23:46:28 +02:00
Roland Scheidegger bd0b6c5180 gallivm: do per-element lod for lod bias and explicit derivs too
Except for explicit derivs with cube maps which are very bogus anyway.
Just like explicit lod this is only used if no_quad_lod is set in
GALLIVM_DEBUG env var.
Minification is terrible on cpus which don't support true vector shifts
(but should work correctly). Cannot do the min/mag filter decision (if
they are different) per pixel though, only selecting different mip levels
works.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-22 19:05:52 +02:00
Roland Scheidegger 33694a1800 gallivm: (trivial) fix int/uint border color clamping
Just a copy & paste error.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=68409.
Note that the test passing before probably simply means it doesn't verify
clamping of the border color itself as required by the OpenGL spec.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-22 19:05:52 +02:00
Roland Scheidegger 6ff9008544 gallivm: (trivial) fix linear aos sampling of 3d compressed formats
block size depth is always 1 even for compressed formats (unless someone
invents true 3d compressed formats at least which we can't represent).
Nearest (and soa) path had it right.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-22 19:05:52 +02:00
José Fonseca fb62388d6a gallium: Support PIPE_FORMAT_R10G10B10A2_UINT.
Same as PIPE_FORMAT_B10G10R10A2_UINT but without the swizzling.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-08-22 12:14:15 +01:00
Roland Scheidegger e6013e4bee gallivm: unify sin and cos implementation
The (complicated!) math is all identical, there's just minimal differences how
sign bit is calculated plus there's an additional subtraction for the argument
going into the polynomial for cos.
The logic stays 100% the same (with a small exception, sign bit calculation for
sin is minimally simplified, applying sign mask after xoring the arguments
instead of applying it to each argument).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-21 22:05:53 +02:00
Roland Scheidegger 275d2efeed gallivm: add comment for bogus min/mag filter selection with nearest mip filter
Detected this hunting some other bug, not sure if it really needs fixing but
it is definitely wrong.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-21 22:05:52 +02:00
Roland Scheidegger 21d8fa2759 gallivm: fix rho calculation for 1d case
Was using wrong (undefined) vector element (the elements are at 0/2 position,
not 0/1).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-21 22:05:52 +02:00
Roland Scheidegger 4b45b61fef util: add avx2 and xop detection to cpu detection code
Going to need this soon (not going to bother with avx2 intrinsics at this time
but don't want to do workarounds for true vector shifts if llvm itself can use
them just fine and won't need the gazillion instruction emulation).
Not really tested other than my cpu returns 0 for these features...
(I have no idea if llvm actually would emit avx2/xop instructions neither...)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-20 23:00:24 +02:00
Roland Scheidegger 9299128bf2 gallivm: fix bogus aos path detection
Need to check the wrap mode of the actually used coords not a fixed 2.
While checking more than necessary would only potentially disable aos and
not cause any harm I'm pretty sure for 3d textures it could have caused
assertion failures (if s,t coords have simple filter and r not).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-20 23:00:24 +02:00
Roland Scheidegger fe92d7fab4 gallivm: do clamping of border color correctly for all formats
Turns out it is actually very complicated to figure out what a format really
is wrt range, as using channel information for determining unorm/snorm etc.
doesn't work for a bunch of cases - namely compressed, subsampled, other.
Also while here add clamping for uint/sint as well - d3d10 doesn't actually
need this (can only use ld with these formats hence no border) and we could
do this outside the shader for GL easily (due to the fixed texture/sampler
relation) do it here too just so I can forget about it.

v2: move border color clamping out of fetch texel. Also change it to clamp
the whole border vector at once (and use vectorized load of border color),
which saves a couple of instructions - needs some different handling of
mixed signed/unsigned formats so skip the per channel stuff and just derive
this from first channel except for special formats.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-20 23:00:24 +02:00
Roland Scheidegger ac1a2714c7 gallivm: implement better control of per-quad/per-element/scalar lod
There's a new debug value used to disable per-quad lod optimizations
in fragment shader (ignored for vs/gs as the results are just too wrong
typically). Also trying to detect if a supplied lod value is really a
scalar (if it's coming from immediate or constant file) in which case
sampler code can use this to stay on per-quad-lod path (in fact for
explicit lod could simplify even further and use same lod for both
quads in the avx case but this is not implemented yet).
Still need to actually implement per-element lod bias (and derivatives),
and need to handle per-element lod in size queries.

v2: fix comments, prettify.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-08-20 23:00:24 +02:00
Zack Rusin 7115bc3940 draw: handle nan clipdistance
If clipdistance for one of the vertices is nan (or inf) then the
entire primitive should be discarded.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-08-15 16:26:32 -04:00
Roland Scheidegger 6ca18e06ae gallivm: revert accidentally commited hunk
That magic wasn't meant to be commited, need to work on some proper fix.
2013-08-15 19:26:39 +02:00
Roland Scheidegger 5626a84a00 gallivm: do per-sample depth comparison instead of doing it post-filter
Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9)
and definitely required by d3d10.
This actually doesn't do it pre-filter but more "in-filter" as otherwise
need to push the comparisons even further down into fetch code and this
also trivially allows using a somewhat cheaper lerp.
Doing it pre-filter would actually have some performance advantage for UNORM
formats (because the comparisons should be done in texture format, we'd only
need to convert the shadow ref coord to texture format once, but in turn would
save converting the per-sample texture values to floats) but this gets a bit
messy as this has implications for border color handling as well (which needs
to be done prior to depth comparisons, hence would also need to convert border
color to texture format too or use some other tricks like doing separate border
color / shadow ref comparison and simply using that result directly when doing
border replacement).
Should make no difference for nearest filtering, and performance for linear
filtering should be mostly the same too (essentially have one more comparison
instruction per sample, and replace the sub/mul/add lerp with a sub/and/and/add
special "lerp" which all in all shouldn't be much of a difference).

v2: get rid of old code completely

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-15 18:42:20 +02:00
Roland Scheidegger e58c2310b8 gallivm: already pass coords in the right place in the sampler interface
This makes things a bit nicer, and more importantly it fixes an issue
where a "downgraded" array texture (due to view reduced to 1 layer and
addressed with (non-array) samplec instruction) would use the wrong
coord as shadow reference value. (This could also be fixed by passing
target through the sampler interface much the same way as is done for
size queries, might do this eventually anyway.)
And if we'd ever want to support (shadow) cube map arrays, we'd need
5 coords in any case.

v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
using wrong shadow coord for 2d...). Plus need to project the shadow
coord, and just for fun keep projecting the layer coord too.

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-15 00:40:14 +02:00
Roland Scheidegger d4b43cedb6 gallivm: change coordinate handling throughout functions
Instead of passing s,t,r coordinates pass a coord array - the reason is that
I need to pass more coords (in particular for shadow "coord", future will also
need another one for cube map arrays) so just pass them as an array.
Also, to simplify things, use fixed location for the shadow reference value I
want to get rid of the silly "where is the right coord value" game.
Keep old-style however for aos sampling (which is not going to need shadow
coord, though for cube map arrays it still would need fixing).
(Next patch will pass those through using the new arrangement directly from
sampler interface.)

v2: fix up soa split path (unreachable currently but still...)

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-15 00:40:14 +02:00
Roland Scheidegger c6c55ad3e9 gallivm: fix border color with normalized texture formats
We need to put border color into texture format color space which
essentially means clamping for non-float, normalized formats (not entirely
sure if we're also meant to quantize the float but it's probably ok not to
do it thankfully).
For OpenGL we could do this easily outside generated code due to the
1:1 sampler/texture correspondence but not for d3d10 which is terrible
(as we recalculate a constant over and over again per shader invocation).
Fortunately border color should be rare enough that we don't care THAT much.

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-15 00:40:14 +02:00
Roland Scheidegger 6991f86945 gallivm: implement new float comparison instructions returning integer masks
FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the
select.
And just for consistency use the same appropriate ordered/unordered comparisons
for the old opcodes as well.

Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-08-13 19:09:17 +02:00