Not sure how I computed these, but they were wrong (which explains why
bumping the polynomial order before never improved precision).
This allows to pass the EXP test cases of PSPrecision/VSPrecision DCTs.
Add an iteration step, which makes rqsqrt precision go from 12bits to
24, and fixes RSQ/NRM test case of PSPrecision/VSPrevision DCTs.
There are no uses of this function outside shader translation.
This branch defines a gallivm_state structure which contains the
LLVMBuilderRef, LLVMContextRef, etc. All data structures built with
this object can be periodically freed during a "garbage collection"
operation.
The gallivm_state object has to be passed to most of the builder
functions where LLVMBuilderRef used to be used.
Conflicts:
src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
src/gallium/drivers/llvmpipe/lp_state_setup.c
sse2 supports round to nearest directly (or rather, assuming default nearest
rounding mode in MXCSR). Use intrinsic to use this rather than round (sse41)
or bit manipulation whenever possible.
This fixes the assert added in LLVM 2.8:
assert(getType()->isIntOrIntVectorTy() &&
"Tried to create an integer operation on a non-integer type!")
But it also fixes some subtle bugs, since we should've been doing this
since LLVM 2.6 anyway.
Includes a modified patch from steckdenis@yahoo.fr for the
FNeg instructions in emit_fetch(); thanks for pointing those out.
http://bugs.freedesktop.org/29404http://bugs.freedesktop.org/29407
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Plus fix minor error in lp_build_iceil() by tweaking the offset value.
And add a bunch of comments for the round(), trunc(), floor(), ceil()
functions.
Also start axing the code duplication for scalar case. The olution is to
treat the scalar case specially in a few innermost functions, and leave
outer functions untouched.
The advantage of range[-0.5, 0.5] is that it doesn't require floor (for
which intrinsics are only available in SSE4.1).
But the EXP opcode pretty much forces us to use floor, and there is a
good floor approximation around truncation available anyway.
This fixes EXP failures in VShader DCT.
Runtime linking doesn't quite work.
Just comment then out for now to prevent crashes. These will go away in
the future because calling 4 times CRT's cosf()/sinf() is over-precise
and under-performing.
LLVMBuildFPTrunc() should be used for double->float conversion, not
float->int conversion.
There should be a better way to compute floor(), ceil(), etc that doesn't
involve float->int->float conversion.
The LOD is computed from texcoord partial derivatives and used to
select a mipmap level. Still some bugs in texel fetching. Lots of
rough edges and unfinished parts but the basics are in place.
Lots of changes to the lp_bld_arit.c code to support non-vector/scalar
datatypes.