345 Commits

Author SHA1 Message Date
Dave Airlie 67ec1760ee llvmpipe: add multisample bit to fragment shader key.
The fragment shader needs to be regenerated when multisample changes.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
2020-05-06 06:20:37 +00:00
Dave Airlie a30db60ede llvmpipe: pass color and depth sample strides into fragment shader.
This just adds the interface and passes the depth and sample strides
into the fragment shader, nothing uses them yet.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
2020-05-06 06:20:37 +00:00
Dave Airlie f6383673c9 llvmpipe: fix race between draw and setting fragment shader.
There is a race with u_blitter shaders + pipeline shaders (aaline/aapoint)
where the draw bind can cause a pipeline flush, which can cause bind_fs_state
to be re-entered so that llvmpipe->fs gets the wrong value. Fix this by only
setting the llvmpipe->fs value after the draw binding is complete.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
2020-05-06 06:20:37 +00:00
Dave Airlie 565df65651 llvmpipe: clamp color storage for integer types.
If storing to an integer of lower bit size (e.g. 16-bit uint to
10-bit uint), we need to clamp to the maximum value, not truncate.

Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.r16_uint.a2b10g10r10_uint_pack32.optimal_optimal_nearest

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4574>
2020-04-27 12:35:24 +10:00
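A minimal sketch of the clamp-versus-truncate difference described in 565df65651, written as plain scalar C rather than the LLVM IR llvmpipe actually emits. The helper names (`example_uint_max_for_bits`, `example_store_truncate`, `example_store_clamp`) are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in an unsigned field of `bits` bits (1..32). */
static uint32_t example_uint_max_for_bits(unsigned bits)
{
   assert(bits >= 1 && bits <= 32);
   return bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
}

/* Truncation keeps only the low bits, so out-of-range values wrap around. */
static uint32_t example_store_truncate(uint32_t value, unsigned dst_bits)
{
   return value & example_uint_max_for_bits(dst_bits);
}

/* Clamping saturates at the destination maximum instead of wrapping. */
static uint32_t example_store_clamp(uint32_t value, unsigned dst_bits)
{
   uint32_t max = example_uint_max_for_bits(dst_bits);
   return value > max ? max : value;
}
```

With a 10-bit destination, the 16-bit input 0x0400 truncates to 0 but clamps to 1023, which is the difference the r16_uint to a2b10g10r10_uint blit exercises.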
Dave Airlie befe2ff3a6 llvmpipe/nir: free the nir shader
Fixes: 18f896e55d (llvmpipe: add initial nir support)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4563>
2020-04-16 06:25:46 +10:00
Dave Airlie eb5227173f llvmpipe: add support for tessellation shaders
This adds the hooks between llvmpipe and draw to enable tessellation shaders.

It also updates the CI results and docs.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3841>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3841>
2020-02-28 18:33:34 +10:00
Dave Airlie e35b2c37cd llvmpipe/nir: handle texcoord requirements
Switch to using texcoord intrinsic support.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-12-12 09:16:24 +10:00
Dave Airlie f137672197 llvmpipe: disable occlusion queries when requested by state tracker
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-12-06 06:48:30 +10:00
Dave Airlie 502548a09c gallivm/llvmpipe: add support for front facing in sysval.
This wires up the front facing value as a sysval, I'd like to
remove the other facing code but I'd need to confirm VMware
don't use it first.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-03 15:29:04 +10:00
Dave Airlie 18f896e55d llvmpipe: add initial nir support
This adds the hooks between llvmpipe and the gallivm NIR
code, for compute and fragment shaders.

NIR support is hidden behind LP_DEBUG=nir for now until
all the integration issues are solved.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-11-28 14:49:23 +10:00
Eric Anholt 882ca6dfb0 util: Move gallium's PIPE_FORMAT utils to /util/format/
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium.  Since u_format used
util_copy_rect(), I moved that in there, too.

I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.

Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-11-14 10:47:20 -08:00
Dylan Baker ee4f1bc187 util: rename PIPE_ARCH_*_ENDIAN to UTIL_ARCH_*_ENDIAN
As requested by Tim.

This was generated with:
grep 'PIPE_ARCH_.*_ENDIAN' -rIl | xargs sed -ie 's@PIPE_ARCH_\(.*\)_ENDIAN@UTIL_ARCH_\1_ENDIAN@'g

v2: - add this patch

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-11-05 16:39:55 +00:00
Dylan Baker f9f60da813 util/u_endian: set PIPE_ARCH_*_ENDIAN to 1
This will allow it to be used as a drop in replacement for
_mesa_little_endian in a number of cases.

v2: - Always define PIPE_ARCH_LITTLE_ENDIAN and PIPE_ARCH_BIG_ENDIAN,
      define the one that reflects the host system to 1 and the other to 0
    - replace all uses of #ifdef, #ifndef, and #if defined() with #if
      and #if ! with PIPE_ARCH_*_ENDIAN

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-11-05 16:39:55 +00:00
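The "always define both, to 0 or 1" convention from f9f60da813 can be sketched as follows. The `EXAMPLE_ARCH_*_ENDIAN` macros are hypothetical stand-ins, not the real Mesa definitions, and the detection assumes a GCC/Clang-style `__BYTE_ORDER__` predefine, falling back to little endian when it is absent.

```c
#include <stdint.h>

/* Hypothetical macros: both are always defined, exactly one of them to 1. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define EXAMPLE_ARCH_LITTLE_ENDIAN 0
#define EXAMPLE_ARCH_BIG_ENDIAN 1
#else
/* Assumption: treat an unknown host as little endian. */
#define EXAMPLE_ARCH_LITTLE_ENDIAN 1
#define EXAMPLE_ARCH_BIG_ENDIAN 0
#endif

/* Because the macro is always defined, plain #if works; #ifdef would be
 * true on every host and silently pick the wrong branch. */
static uint32_t example_cpu_to_le32(uint32_t v)
{
#if EXAMPLE_ARCH_BIG_ENDIAN
   return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
          ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
#else
   return v;
#endif
}

/* The same macro can also be used as an ordinary expression, which is what
 * makes it a drop-in replacement for a runtime endianness query. */
static int example_is_little_endian(void)
{
   return EXAMPLE_ARCH_LITTLE_ENDIAN;
}
```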
Dave Airlie 0b51e73de2 llvmpipe: add compute shader images support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 45a8cf95f2 llvmpipe: add ssbo support to compute shaders
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 4ca40cc3dc llvmpipe: add support for compute constant buffers.
This is mostly ported from the fragment shader code.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-09-04 15:22:20 +10:00
Dave Airlie 2909c654b0 llvmpipe: add fragment shader image support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-08-27 12:30:04 +10:00
Dave Airlie 3c2c232059 llvmpipe: move the fragment shader variant key to dynamic length.
This mirrors the vs/gs keys, and will be needed when adding images
support.

The const changes also mirror how the draw code works (as is needed
when we add images)

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-08-27 12:29:42 +10:00
Dave Airlie cf84b46a1c llvmpipe: handle early test property.
Also handle setting late for shaders that use stores

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-08-27 12:29:33 +10:00
Dave Airlie 16fcbb2eba gallium: fix windows build from params change.
This is why we can't have nice things. I'm sure there's some way
to do this with {0} but I really don't have time for that.

Fixes: 2631fd3b0b ("gallivm: rework lp_build_tgsi_soa to take a struct")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-25 10:02:22 +10:00
Dave Airlie 2631fd3b0b gallivm: rework lp_build_tgsi_soa to take a struct
The parameters were getting messy and I have to add a few more
for compute shaders, so clean it up before proceeding.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-24 09:20:09 +10:00
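As a rough illustration of the "take a struct" refactor in 2631fd3b0b: bundling arguments into a parameter struct lets new fields be added without touching every call site. The struct and function below are hypothetical and far smaller than the real lp_build_tgsi_soa parameters.

```c
#include <stddef.h>

/* Hypothetical parameter bundle; the real struct carries many more fields. */
struct example_soa_params {
   const void *consts_ptr;   /* constant buffer pointers */
   const void *ssbo_ptr;     /* a later addition; old callers leave it NULL */
   unsigned num_outputs;
};

/* Callers pass one pointer instead of a growing positional argument list. */
static unsigned example_count_bound_buffers(const struct example_soa_params *p)
{
   return (p->consts_ptr != NULL) + (p->ssbo_ptr != NULL);
}
```

A caller would typically zero the struct first and then fill in only the fields it needs, so fields added later default to zero/NULL.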
Eric Engestrom dffeaa55dd util: use standard name for snprintf()
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-19 22:39:38 +01:00
Dave Airlie df46b3d196 llvmpipe: add support for shader buffer binding.
This adds support for setting shader buffers and passing them
to draw or binding them to the fragment shader jit.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:24:12 +10:00
Dave Airlie e21007f426 llvmpipe: add support for ssbo to the fragment shader jit.
This just adds the ssbo ptrs to the jit fragment shader api.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:51 +10:00
Dave Airlie 5ff697aa65 gallivm: add ssbo pointers to the soa build api.
Need to pass ssbo + ssbo size pointers just like constants.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-07-07 16:23:36 +10:00
Dave Airlie 00a56acc23 llvmpipe: make remove_shader_variant static.
this isn't used outside this file.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-06-21 10:27:57 +10:00
Marek Olšák daa19363de gallium: split depth_clip into depth_clip_near & depth_clip_far
for AMD_depth_clamp_separate.
2018-09-06 21:53:00 -04:00
Roland Scheidegger 7b89fcec41 llvmpipe: improve rasterization discard logic
This unifies the explicit rasterization discard as well as the implicit
rasterization disabled logic (which we need for another state tracker),
which really should do the exact same thing.
We'll now toss out the prims early on in setup with (implicit or
explicit) discard, rather than do setup and binning with them, which
was entirely pointless.
(We should eventually get rid of implicit discard, which should also
enable us to discard stuff already in draw, hence draw would be
able to skip the pointless clip and fallback stages in this case.)
We still need separate logic for only null ps - this is not the same
as rasterization discard. But simplify the logic there and don't count
primitives simply when there's an empty fs, regardless of depth/stencil
tests, which seems perfectly acceptable by d3d10.
While here, also fix statistics for primitives if face culling is
enabled.
No piglit changes.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2018-05-23 04:23:32 +02:00
Brian Paul 42aee8f4f6 llvmpipe: fix check for a no-op shader
The tgsi_info.num_tokens fix broke llvmpipe's detection of no-op shaders.
Fix the code to check for num_instructions <= 1 instead.

Fixes: 8fde9429c3 ("tgsi: fix incorrect tgsi_shader_info::num_tokens
computation")
Tested-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-05-18 09:09:41 -06:00
Ian Romanick d76c204d05 util: Move util_is_power_of_two to bitscan.h and rename to util_is_power_of_two_or_zero
The new name makes the zero-input behavior more obvious.  The next
patch adds a new function with different zero-input behavior.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2018-03-29 14:09:23 -07:00
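The zero-input distinction behind the rename in d76c204d05 can be sketched like this. The `example_` names are placeholders chosen for illustration; the bit trick itself is the standard `v & (v - 1)` test.

```c
#include <stdbool.h>

/* v & (v - 1) clears the lowest set bit. The result is 0 for powers of two,
 * but also for v == 0, which is why the "or_zero" suffix is more honest. */
static bool example_is_power_of_two_or_zero(unsigned v)
{
   return (v & (v - 1)) == 0;
}

/* Variant with the other zero-input behavior: 0 is rejected explicitly. */
static bool example_is_power_of_two_nonzero(unsigned v)
{
   return v != 0 && (v & (v - 1)) == 0;
}
```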
Roland Scheidegger 06e724c7b4 tgsi/scan: use wrap-around shift behavior explicitly for file_mask
The comment said it will only represent the lowest 32 regs. This was
not entirely true in practice, since at least on x86 you'll get
masked shifts (unless the compiler could recognize it already and toss
it out). It turns out this actually works out alright (presumably
no one uses it for temp regs) when increasing max sampler views, so
make that behavior explicit.
Albeit it feels a bit hacky (but in any case, explicit behavior there
is better than undefined behavior).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2018-03-06 05:18:17 +01:00
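A small sketch of the explicit wrap-around shift that 06e724c7b4 describes. In C, shifting a 32-bit value by 32 or more is undefined, so masking the shift count makes the x86-like behavior explicit. The function name is hypothetical and the real tgsi_scan code may be structured differently.

```c
#include <stdint.h>

/* Bit for register `index` in a 32-bit file mask. Indices >= 32 wrap around
 * (index 32 aliases index 0) instead of invoking undefined behavior. */
static uint32_t example_file_mask_bit(unsigned index)
{
   return UINT32_C(1) << (index & 31);
}
```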
Brian Paul 7a044ef68b gallivm/llvmpipe: add const qualifiers on sampler variables
Once a lp_build_sampler_soa or lp_build_sampler_aos object is created,
it should never be modified.  Found by inspection.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2018-02-01 14:19:58 -07:00
Nicolai Hähnle 222a2fb998 util: move os_time.[ch] to src/util
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:57:21 +01:00
Roland Scheidegger 3d0deed12a llvmpipe: handle shader sample mask output
This probably isn't all that useful for GL, but there are apis where
sample_mask is a valid output even without msaa.
Just discard the pixel if the sample_mask doesn't include the bit for
sample 0.

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-10-18 18:16:44 +02:00
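The rule stated in 3d0deed12a (without msaa, only sample 0 exists, so a pixel survives only if the shader's sample_mask output includes bit 0) reduces to a single bit test. This scalar sketch uses a hypothetical helper name; llvmpipe applies the equivalent test on vectors in generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Without multisampling, the pixel is kept only if bit 0 of the shader's
 * sample_mask output is set; all other bits are ignored. */
static bool example_pixel_survives(uint32_t sample_mask)
{
   return (sample_mask & 1u) != 0;
}
```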
Brian Paul 33122e8a3d llvmpipe: silence 'variable may be used uninitialized' warnings
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2017-10-03 14:33:00 -06:00
Roland Scheidegger 57a341b0a9 llvmpipe, draw: improve shader cache debugging
With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage
whenever we have to evict shader variants.
Also add some output when shaders are deleted (but not with the perf setting
to keep this one less noisy).
While here, also don't delete that many shaders when we have to evict. For fs,
there's potentially some cost if we have to evict due to the required flush,
however certainly shader recompiles have a high cost too so I don't think
evicting one quarter of the cache size makes sense (and, if we're evicting
based on IR count, we probably typically evict only very few or just one
shader too). For vs, I'm not sure it even makes sense to evict more than
one shader at a time, but keep the logic the same for now.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2017-09-09 03:06:10 +02:00
Nicolai Hähnle 16923e42a4 gallium: rename util_dump_* to util_str_* for enum-to-string conversion
This is mostly mechanical search-and-replace, plus touching up the
macros in u_dump_defines.c manually a bit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-02 09:46:24 +02:00
Brian Paul 2b9ab605aa gallium: s/uint/enum pipe_shader_type/ for set_constant_buffer()
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Roland Scheidegger f4821daed1 llvmpipe: do transpose/untwiddle after conversion for 8bit formats
Generally we should do transpose after conversion, if the format has less than
32 bits per channel (if it has 32 bits, conversion is going to be a no-op
anyway...). This is obviously because there's less vectors to deal with.
Though the advantage for 16 bit formats isn't that big, and in fact with AVX
there isn't really any (as the 32bit unpacks can be done with 256bit, but
the smaller ones cannot, although that would change again with proper AVX2
support).
Only makes sense for 2d and not 1d cases. And to keep things easy, only handle
1,2 and 4 channels (rgbx is just fine).
For rgba unorm8 format the backend conversion sums up to these instruction
totals (not counting the movs for SSE2 due to 2-op syntax - generally every 2
unpacks need an additional mov).
                     SSE2                    AVX
transpose:           32 unpack               16 unpack
untwiddle:           0                       8 (128bit low/high permutes)
convert:             16 mul + 16 cvt         8 mul + 8 cvt
32->8bit:            12 pack                 8 (128bit extract) + 12 pack

When doing transpose/untwiddle afterwards we get:
convert:             16 mul + 16 cvt         8 mul + 8 cvt
32->8bit:            12 pack                 8 (128bit extract) + 12 pack
transpose/untwiddle: 12 unpack               12 unpack

So for SSE2, this drops 20 unpacks (total instruction count 76->56)
whereas for AVX it replaces the 16 256bit unpacks with 8 128bit ones
and drops the 8 lo/hi permutes (in total 60->48). (Albeit to be fair,
the permutes could be dropped even when doing the transpose first,
they are extremely pointless but we'd need to be able to tell
lp_build_conv to reorder the vectors, for AVX2 we're going to need to
be able to tell lp_build_conv about ordering in any case.)

(With different ordering going into conversion, it would be possible
to do 4 unpacks + 4 pshufbs instead of 12 unpacks, but that might not
be better, and not all cpus can do it. Proper AVX2 support should eliminate
the 8 128bit extracts, reduce these 12 packs to 6 and the 12 unpacks to 2
pshufb + 2 permq ideally (+ 2 final 128bit extracts).)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-06 23:13:34 +01:00
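The instruction totals quoted in f4821daed1 can be rechecked by summing the rows of the two tables in that message. The constants below are copied from those rows; the enum and function names are made up for this tally.

```c
/* Per-stage counts taken from the tables in the commit message:
 * transpose + untwiddle + convert (mul + cvt) + 32->8bit packing. */
enum {
   EXAMPLE_SSE2_TRANSPOSE_FIRST = 32 + 0 + (16 + 16) + 12,
   EXAMPLE_SSE2_CONVERT_FIRST   = (16 + 16) + 12 + 12,
   EXAMPLE_AVX_TRANSPOSE_FIRST  = 16 + 8 + (8 + 8) + (8 + 12),
   EXAMPLE_AVX_CONVERT_FIRST    = (8 + 8) + (8 + 12) + 12
};

/* Instructions saved by converting first and transposing afterwards. */
static int example_instructions_saved(int transpose_first, int convert_first)
{
   return transpose_first - convert_first;
}
```

The sums reproduce the totals stated in the message: 76 and 56 for SSE2, 60 and 48 for AVX.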
Roland Scheidegger 04480a04b1 llvmpipe: use alpha from already converted color if possible
For rgbx formats, there is no point in doing alpha conversion again (and
with different transpose even, so llvm can't eliminate it).
Albeit it looks like there's some minimal changes needed in the blend code
(found by code inspection, no test seemed to complain) if we do this -
the blend factors are already sanitized if we have no destination alpha,
however for src_alpha_saturate it looks like it still might make a
difference (note that we forced has_alpha to true before for some formats
and nothing complained, but this seems safer).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-06 23:13:34 +01:00
Roland Scheidegger 53c2d24a24 llvmpipe: use scalar load instead of vectors for small vectors in fs backend
llvm has _huge_ problems trying to load things like <4 x i8> vectors and
stitching such loads together to form 128bit vectors. My understanding
of the problem is that the type legalizer tries to extend that to
really a <4 x i32> vector and not a <16 x i8> vector with the 4 elements
first then followed by padding, so the shuffles for then combining things
together are more or less impossible - you can in fact see the pmovzxd
llvm generates. Pre-4.0 llvm just gives up on it completely and does a 30+
pextrb/pinsrb sequence instead.
It looks like current llvm has fixed this behavior (my guess would be
due to better shuffle combination and load/shuffle folds), but we can
avoid this by just loading as <1 x i32> values, combine that and only
cast at the end. (I suspect it might also work if we'd pad the loaded
vectors immediately before shuffling them together, instead of directly
stitching 2 such vectors together pairwise before combining the pair.
But this _might_ lose the ability to load the values directly into
their right place in the vector with pinsrd.). But using 32bit values
is probably easier for llvm as it will never give it funny ideas about what
the vector should look like.
(This is possibly only a problem for 1x8bit formats, since 2x8bit will
end up fetching 64bit hence only two vectors are stitched together,
not 4, but we use the same strategy anyway.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-06 23:13:34 +01:00
Roland Scheidegger 4634cb5921 gallivm: implement aos unpack (to unorm8) for small unorm formats
Using bit replication. This path now resembles something which might make
sense. (The logic was mostly copied from llvmpipe fs backend.)
I am not convinced though it is actually faster than SoA sampling (actually
I'm quite certain it's always a loss with AVX).
With SoA it's just shift/mask/cvt/mul for getting the colors, whereas
there's still roughly 3 shifts, 3 or/and per channel for AoS
(i.e. for SoA it's exactly the same as it would be for a rgba8 format,
whereas the extra effort for AoS is significant). The filtering
might still be faster (albeit with FMA the instruction count gets down
quite a bit there on the SoA float filtering path on new cpus). And those
small unorm formats often don't have an alpha channel (which makes things
worse relatively for AoS path).
(This also fixes a trivial bug in the llvmpipe fs code this was derived
from, albeit it was only relevant for 4-bit channels.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
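Bit replication, as mentioned in 4634cb5921, widens a small unorm channel to 8 bits by repeating its bit pattern from the top down. The scalar sketch below shows the idea for one channel. The function name is hypothetical, and the gallivm code works on packed vectors with shifts and ors rather than a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an unorm value of `src_bits` bits (1..8) to unorm8 by replicating
 * its bits: e.g. 5-bit abcde becomes abcdeabc. */
static uint8_t example_unorm_to_unorm8(uint32_t value, unsigned src_bits)
{
   uint32_t result = 0;
   int shift = 8 - (int)src_bits;

   assert(src_bits >= 1 && src_bits <= 8);
   assert(value < (UINT32_C(1) << src_bits));

   /* Place full copies of the value, starting at the most significant bits. */
   while (shift > 0) {
      result |= value << shift;
      shift -= (int)src_bits;
   }
   /* Final copy: either aligned exactly (shift == 0) or cut off at bit 0. */
   result |= value >> (-shift);

   return (uint8_t)result;
}
```

Replication maps the maximum input to 255 and zero to 0 without a multiply, which is why it is a common choice for unorm widening.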
Roland Scheidegger db7e786a25 llvmpipe: (trivial) minimally simplify mask construction
simd instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself - which also means
we don't need an all-zero as well as an all-one (for the pxor) reg.

Also add code to avoid scalar expansion of i1 values which we definitely
shouldn't do. There's problems with this though with llvm select
interaction, so it's disabled (basically using llvm select instead of
intrinsics may still produce atrocious code, even in cases where we
figured it should not, albeit I think this could probably be fixed
with some better selection of optimization passes, but I have zero
idea there really).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Aaron Watry 1492633070 llvmpipe: Fix build after removal of deprecated attribute API v2
Applies on top of v3 of Tom's gallivm change.

v2:
  - Tom Stellard: Use enums instead of strings.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
CC: Tom Stellard <thomas.stellard@amd.com>
CC: Jan Vesely <jan.vesely@rutgers.edu>
2016-11-09 20:13:27 +00:00
Roland Scheidegger 0849621891 llvmpipe: fix issues with depth clamp
We only did depth clamp when the value was written from the fs.
This is very wrong both for d3d10 and GL, and only passed the
corresponding piglit test due to pure luck (it no longer does
with the enhanced test).
Also, interpolation clamped values to 1.0 always, which can legitimately
happen if depth clip is disabled, so fix that as well (untested).
There is one unresolved issue left, d3d10 always does depth clamping,
whereas GL does not (but does [0,1] clamp instead for fs depth outputs)
- this information isn't in any gallium state object, leave it as-is
for now (though it looks like llvmpipe misses the [0,1] clamp as well).
This (with the previous patch) fixes piglit depth-clamp-range test.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-08-20 04:05:33 +02:00
Rob Clark ef534b9389 gallium: make constant_buffer const
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-20 12:36:20 -04:00
Roland Scheidegger f4184d5450 llvmpipe: hack-fix bugs due to bogus bind flags
The gallium contract would be that bind flags must indicate all possible
bindings a resource might get used, but fact is the mesa state tracker does
not set bind flags correctly, and this is more or less unfixable due to GL.

This caused a bug with piglit arb_uniform_buffer_object-rendering-dsa
since 6e6fd911da - the commit is correct,
but it caused us to miss updates to fs UBOs completely, since the
corresponding buffer didn't have the appropriate bind flag set (thus we
wouldn't check if it is indeed currently bound).
See the discussion about this starting here:
https://lists.freedesktop.org/archives/mesa-dev/2016-June/119829.html

So, update the bind flags when we detect such usage.
Note we update this value for now only in places which matter for us - that
is creating sampler/surface view, or binding constant buffer. There's plenty
more places (setting streamout buffers, vertex/index buffers, ...) where
things can be set with the wrong bind flags, but the bind flags there never
matter.

While here also make sure we only set dirty constant bit when it's a fs
constant buffer - totally doesn't matter if it's vs/gs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-14 17:03:34 +02:00
Brian Paul 1d242b6882 llvmpipe: s/Elements/ARRAY_SIZE/
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-27 10:23:19 -06:00
Marek Olšák fb523cb6ad gallium: merge PIPE_SWIZZLE_* and UTIL_FORMAT_SWIZZLE_*
Use PIPE_SWIZZLE_* everywhere.
Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE.
The new enum is called pipe_swizzle.

Acked-by: Jose Fonseca <jfonseca@vmware.com>
2016-04-22 01:30:39 +02:00
Roland Scheidegger 64d3ae09b7 llvmpipe: (trivial) initialize src1_alpha var to NULL
The blend code would do a conditional assignment based on it, causing valgrind
to complain. Since that variable was actually unused in this case, this
doesn't fix anything but the warning.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94955
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-04-15 22:51:28 +02:00