We can now just get the data needed from the gl_shader_program_data
pointer in gl_program.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Having it here rather than in gl_linked_shader allows us to simplify
the code.
Also it is error prone to depend on the gl_linked_shader for programs
in current use because a failed linking attempt will free infomation
about the current program. In i965 we could be trying to recompile
a shader variant but may have lost some required fields.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Here we also remove the duplicate field in gl_linked_shader and always
get the value from shader_info instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will help allow us to store pointers to gl_program structs in the
CurrentProgram array resulting in a bunch of code simplifications.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This also removes the duplicate field in gl_linked_shader, and
gets num_ubos from shader_info instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Having it here rather than in gl_linked_shader allows us to simplify
the code.
Also it is error prone to depend on the gl_linked_shader for programs
in current use because a failed linking attempt will free infomation
about the current program. In i965 we could be trying to recompile
a shader variant but may have lost some required fields.
We drop the memset on ImageUnits because gl_program is already
created using rzalloc().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This removes another dependency on gl_shader_program from the codegen functions,
this will help allow us to use gl_program for the CurrentProgram array rather
than gl_shader_program.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rather then passing gl_shader_program.
The only field use was Name which is the same as the Id field in
gl_program.
For wm and vs we also make the functions static and move them before
the codegen functions.
This change reduces the codegen functions dependency on gl_shader_program.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fix typo using wrong (uninitialized) build context introduced by
4634cb5921. (This only affects very rare
small packed formats which have a PIPE_SWIZZLE_0 channel, such as
r4a4, which is never used by mesa/st. Nevertheless it broke lp_test_format.)
Using bit replication. This path now resembles something which might make
sense. (The logic was mostly copied from llvmpipe fs backend.)
I am not convinced though it is actually faster than SoA sampling (actually
I'm quite certain it's always a loss with AVX).
With SoA it's just shift/mask/cvt/mul for getting the colors, whereas
there's still roughly 3 shifts, 3 or/and per channel for AoS
(i.e. for SoA it's exactly the same as it would be for a rgba8 format,
whereas the extra effort for AoS is significant). The filtering
might still be faster (albeit with FMA the instruction count gets down
quite a bit there on the SoA float filtering path on new cpus). And those
small unorm formats often don't have an alpha channel (which makes things
worse relatively for AoS path).
(This also fixes a trivial bug in the llvmpipe fs code this was derived
from, albeit it was only relevant for 4-bit channels.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This code uses a vector shift which has to be emulated on x86 unless
there's AVX2. Luckily in some cases we can actually avoid the shift
altogether, so do that.
Also make sure we hit the fast lp_build_conv() path when applicable,
albeit that's quite the hack...
That said, this path is taken for AoS sampling for small unorm (smaller
than rgba8) formats, and it is completely hopeless even with those
changes, with or without AVX.
(Probably should have some code similar to the one in the llvmpipe fs
backend code, using bit replication to extend to rgba8888 - rounding
is not quite 100% accurate but if it's good enough there it should be
here as well.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If we only feed one source vector at a time, we cannot use pack intrinsics
(as we only have a 64bit destination dst vector). lp_bld_conv_auto is
specifically designed to alter the length and number of destination vectors,
so this works just fine (if we use single source vectors at a time, afterwards
we immediately reassemble the vectors).
For AVX though this isn't really possible, since we expect 128bit output
already for a single 256bit input. (One day we should handle AVX2 which again
would need multiple inputs, however there's the problem that we get different
ordered output there and we don't want to reorder, so would need to be able
to tell build_conv to handle upper and lower halfs independently.)
A similar strategy would probably work for 32->8bit too (if it doesn't hit
the special case) but I'm going to try something different for that...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
simd instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself - which also means
we don't need a all-zero as well as a all-one (for the pxor) reg.
Also add code to avoid scalar expansion of i1 values which we definitely
shouldn't do. There's problems with this though with llvm select
interaction, so it's disabled (basically using llvm select instead of
intrinsics may still produce atrocious code, even in cases where we
figured it should not, albeit I think this could probably be fixed
with some better selection of optimization passes, but I have zero
idea there really).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The specification section 9.4 says :
When an application attempts to create many pipelines in a single
command, it is possible that some subset may fail creation. In that
case, the corresponding entries in the pPipelines output array will
be filled with VK_NULL_HANDLE values. If any pipeline fails
creation (for example, due to out of memory errors), the
vkCreate*Pipelines commands will return an error code. The
implementation will attempt to create all pipelines, and only
return VK_NULL_HANDLE values for those that actually failed.
Fixes :
dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
v2: C is hard let's go shopping (Lionel)
v3: Remove unnecessary condition in for loops (Lionel)
v4: Document why we return on first failure (Eduardo)
Move i declaration inside for() (Eduardo)
v5: Move array cleanup out of loop (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In OpenGL 3.0 and later it is legal to make a context current without
a default framebuffer.
This has been broken since DRI3 support was introduced.
Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Draw calls no longer flush SDMA IBs. r600_need_dma_space is
responsible for synchronizing execution between both IBs.
Initial buffer clears and fast clears will stay unflushed in the SDMA IB
(up to 64 MB) as long as the GFX IB isn't flushed either.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Call r600_dma_emit_wait_idle only when there is a possibility of
a read-after-write hazard. Buffers not yet used by the SDMA IB don't
have to wait.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
r600_dma_emit_wait_idle is going away in its current form.
The only difference is that the moved code is executed before DMA calls
instead of after them.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes the mistake introduced in commit
b6737a8bcd
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Old test caused breakage with llvm-svn (4.0.0svn), and not needed as
the minimum required llvm version for swr is 3.6.
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
wrap lp_bld_type.h around extern "C".
Windows decorates global variables, so when used from .cpp files, need
to use an undecorated version.
Also, removed related and unneeded code from swr_screen.cpp
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>