Commit Graph

6560 Commits

Author SHA1 Message Date
Roland Scheidegger e442db8e98 draw: drop some overflow computations
It turns out that no one actually cares if the address computations overflow,
be it the stride mul or the offset adds.
Wrap around seems to be explicitly permitted even by some other API (which
is a _very_ surprising result, as these overflow computations were added just
for that and made some tests pass at that time - I suspect some later fixes
fixed the actual root cause...). So the requirements in that other api were
actually sane there all along after all...
Still need to make sure the computed buffer size needed is valid, of course.
This ditches the shiny new widening mul from these codepaths, ah well...

And now that I really understand this, change the fishy min limiting
indices to what it really should have done. Which is simply to prevent
fetching more values than valid for the last loop iteration. (This makes
the code path in the loop minimally more complex for the non-indexed case
as we have to skip the optimization combining two adds. I think it should
be safe to skip this actually there, but I don't care much about this
especially since skipping that optimization actually makes the code easier
to read elsewhere.)
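A minimal scalar C sketch of the two points above. It assumes a simplified model of the jit fetch loop (the real code is generated LLVM IR), and `fetch_offset` / `lanes_this_iteration` are hypothetical names, not Mesa functions. The byte offset is computed with plain unsigned arithmetic that is allowed to wrap, and the last loop iteration is limited to the vertices that are still valid.

```c
#include <stdint.h>

/* Byte offset of vertex `index` in a vertex buffer. Unsigned 32-bit
 * arithmetic wraps modulo 2^32, which is the behavior the commit accepts:
 * no widening multiply and no overflow bit. */
static uint32_t fetch_offset(uint32_t buffer_offset, uint32_t src_offset,
                             uint32_t stride, uint32_t index)
{
   return buffer_offset + src_offset + stride * index;
}

/* Lanes to fetch in the iteration starting at vertex `i`: a full vector,
 * except in the last iteration, where only `count - i` vertices remain. */
static uint32_t lanes_this_iteration(uint32_t i, uint32_t count,
                                     uint32_t vector_length)
{
   uint32_t remaining = count - i;
   return remaining < vector_length ? remaining : vector_length;
}
```

The buffer size check mentioned above still has to happen separately; wraparound only makes the offset math itself well-defined.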

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger 2471aaa02f draw: simplify fetch some more
Don't keep the ofbit. This is just a minor simplification, just adjust
the buffer size so that there will always be an overflow if buffers aren't
valid to fetch from.
Also, get rid of control flow from the instanced path too. Not worried about
performance, but it's simpler and keeps the code more similar to ordinary
fetch.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger 4e1be31f01 draw: unify linear and elts draw jit functions
The code for elts and linear paths was nearly 100% identical by now - with
the elts path simply having some additional gather for the elements in the
main loop (with some additional small differences before the main loop).

Hence nuke the separate functions and decide this at jit shader execution
time (simply based on the presence of the elts pointer).
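A hypothetical scalar model of the unified path. It assumes `start` offsets into the index buffer in the indexed case; the function name is illustrative only. One function serves both draws, and the only difference is whether the vertex index is gathered from the elts buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* With an index (elts) buffer, gather the vertex index from it; without
 * one (elts == NULL), the linear path just counts up from `start`. */
static uint32_t vertex_index(const uint32_t *elts, uint32_t start, uint32_t i)
{
   return elts ? elts[start + i] : start + i;
}
```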

Some analysis shows that the generated vs jit functions seem to be just very
minimally more complex than the former elts functions, and almost none of the
additional complexity is in the main loop (basically just the branch logic
for the branch fetching the actual indices).
Compared to linear, the codesize of the function is of course a bit larger,
however the actual executed code in the main loop appears to be near 100%
identical (the additional code looking up indices is skipped as expected).

So, I would not expect a (meaningful) performance difference with the
generated code, neither with elts nor linear, this does however roughly
half the compilation time (the compiled shaders should also use only half
the memory of course).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger 8cf7edff7d draw: use same argument order for jit draw linear / elts functions
This is a bit simpler. Mostly to make it easier to unify the paths later...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger 78a997f728 draw: drop unnecessary index overflow handling from vsplit code
This was kind of strange, since it replaced indices which were only
overflowing due to bias with MAX_UINT. This would cause an overflow later
in the shader, except if stride was 0, however the vertex id would be
essentially random then (-1 + eltBias). No test cared about it, though.
So, drop this and just use ordinary int arithmetic wraparound as usual.
This is much simpler to understand and the results are "more correct" or
at least more consistent (vertex id as well as actual fetch results just
correspond to wrapped around arithmetic).
There's only one catch, it is now possible to hit the cache initialization
value also with ushort and ubyte elts path (this wouldn't be an issue if
we'd simply handle the eltBias itself later in the shader). Hence, we need
to make sure the cache logic doesn't think this element has already been
emitted when it has not (I believe some seriously bad things could happen
otherwise). So, borrow the logic which handled this from the uint case, but
not before fixing it up...
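A hypothetical sketch of the hazard and one possible guard. The cache layout, `CACHE_EMPTY` and the `sentinel_seen` flag are illustrative assumptions, not the actual vsplit code. Slots start out as 0xffffffff, and since a ubyte/ushort element plus a bias can now wrap to exactly that value, the first such lookup must not count as a hit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE 16u            /* hypothetical slot count */
#define CACHE_EMPTY 0xffffffffu   /* initialization value of every slot */

struct elt_cache {
   uint32_t fetch[CACHE_SIZE];    /* fetch index stored in each slot */
   bool sentinel_seen;            /* has CACHE_EMPTY been inserted yet? */
};

static void elt_cache_init(struct elt_cache *c)
{
   memset(c->fetch, 0xff, sizeof(c->fetch));  /* every slot = CACHE_EMPTY */
   c->sentinel_seen = false;
}

/* Returns true if `fetch` was already emitted; otherwise records it and
 * returns false. */
static bool elt_cache_lookup_insert(struct elt_cache *c, uint32_t fetch)
{
   uint32_t slot = fetch % CACHE_SIZE;

   /* A wrapped index can equal the initialization value. The first time,
    * its slot only *looks* filled, so overwrite it with 0, which never
    * hashes to this slot and so cannot cause a false hit later. */
   if (fetch == CACHE_EMPTY && !c->sentinel_seen) {
      c->fetch[slot] = 0;
      c->sentinel_seen = true;
   }

   if (c->fetch[slot] == fetch)
      return true;
   c->fetch[slot] = fetch;
   return false;
}
```

Without the guard, the very first lookup of 0xffffffff would report a hit, and the element would never actually be fetched.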

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger 7a55c436c6 draw: simplify vsplit elts code a bit
vsplit_get_base_idx explicitly returned idx 0 and set the ofbit
in case of overflow. We'd then check the ofbit and use idx 0 instead of
looking it up. This was necessary because DRAW_GET_IDX used to return
DRAW_MAX_FETCH_IDX and not 0 in case of overflows.
However, this is all unnecessary, we can just let DRAW_GET_IDX return 0
in case of overflow. In fact before bbd1e60198
the code already did that, not sure why this particular bit was changed
(might have been one half of an attempt to get these indices to actual draw
shader execution - in fact I think this would make things less awkward, it
would require moving the eltBias handling to the shader as well).
Note there's other callers of DRAW_GET_IDX - those code paths however
explicitly do not handle index buffer overflows, therefore the overflow
value doesn't matter for them.

Also do some trivial simplification - for (unsigned) a + b, checking res < a
is sufficient for overflow detection, we don't need to check for res < b too
(similar for signed).
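A small C illustration of why one comparison suffices for the unsigned case (`add_overflows_u32` is a hypothetical name). If a + b wraps, the stored result is a + b - 2^32, which is below `a` because b < 2^32. Without wrap the result is at least `a`. The signed variant is not shown, since signed overflow is undefined behavior in C source (the commit refers to generated code).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a + b overflowed; *res receives the wrapped sum. */
static bool add_overflows_u32(uint32_t a, uint32_t b, uint32_t *res)
{
   *res = a + b;          /* wraps modulo 2^32 */
   return *res < a;       /* res < b would give the same answer */
}
```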

And an index buffer overflow check looked bogus - eltMax is the number of
elements in the index buffer, not the maximum element which can be fetched.
(Drop the start check against the idx buffer though, this is already covered
by end check and end < start).
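A sketch of the corrected bounds check, assuming a draw reads `count` consecutive indices starting at `start`; `elts_in_bounds` is an illustrative name. Since `elt_max` is an element count, valid positions are 0 .. elt_max - 1, so the exclusive end may equal `elt_max`.

```c
#include <stdbool.h>
#include <stdint.h>

static bool elts_in_bounds(uint32_t start, uint32_t count, uint32_t elt_max)
{
   uint32_t end = start + count;
   if (end < start)        /* start + count wrapped around */
      return false;
   /* No separate start check: start <= end <= elt_max already implies it. */
   return end <= elt_max;
}
```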

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-21 20:02:53 +01:00
Roland Scheidegger 5ec3a7333f draw: finally optimize bool clip mask generation
lp_build_any_true_range is just what we need, though it will only produce
optimal code with sse41 (ptest + set) - but even without it on 64bit x86
the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's
going to be roughly the same as before.
While here, also make it a "real" 8bit boolean - this cuts one instruction, but
more importantly makes it similar to ordinary booleans.
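A scalar C model of what an "any true" reduction computes. It assumes "real" boolean means 0 or 1, and `any_true` is a hypothetical name. gallivm emits vector code instead (ptest + set with SSE4.1). All per-vertex clip mask lanes are ORed together, and the result collapses to a single 8-bit value.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t any_true(const uint32_t *mask, size_t n)
{
   uint32_t acc = 0;
   for (size_t i = 0; i < n; i++)
      acc |= mask[i];        /* any nonzero lane makes acc nonzero */
   return acc != 0;          /* 0 or 1, like an ordinary boolean */
}
```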

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-18 01:25:21 +01:00
Roland Scheidegger b16f06fd05 draw: use vectorized calculations for fetch (v2)
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar.
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).

Instanced fetch however stays roughly the same as before, except that
the same element is no longer fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size, since llvm didn't recognize it's all
the same fetch: it would have been possible for some of the fetches to get
replaced with zeros in case the vector size exceeds the remaining fetch
count - the values of such fetches don't matter at all though).

Also, for elts gathering, use vectorized code as well.

The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there are only single vertices
to handle I would expect it to be faster - there are more opportunities
for future improvements by using soa fetch).

v3: skip the fake index buffer, not needed due to the jit code never seeing
the real index buffer in the first place.
Fix a bug with mask expansion (needs SExt, not ZExt).
Also, be really really careful to keep the behavior the same, even in cases
where it looks wrong, and add comments why the code is doing the seemingly
wrong stuff... Fortunately it's not actually more complex in the end...
Also change function order slightly just to make the diff more readable.

No piglit change. Passes some internal testing with another api too...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-18 01:25:21 +01:00
Nicolai Hähnle fb17b7f99d u_simple_shaders: try to un-break the Windows build
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 13:25:35 +01:00
Nicolai Hähnle 3817a7a1d7 util/blitter: add clamping during SINT <-> UINT blits
Even though glBlitFramebuffer cannot be used for SINT <-> UINT blits, we
still need to handle this type of blit here because it can happen as part
of texture uploads / downloads, e.g. uploading a GL_RGBA8I texture from
GL_UNSIGNED_INT data.
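A per-channel C sketch of the clamping. It assumes 32-bit integer channels, and the helper names are hypothetical. The blitter does this in a generated shader, not in C. Negative signed values cannot be represented in an unsigned destination, and unsigned values above INT32_MAX cannot be represented in a signed one, so both saturate.

```c
#include <stdint.h>

/* SINT source -> UINT destination: negative values clamp to 0. */
static uint32_t sint_to_uint_clamped(int32_t v)
{
   return v < 0 ? 0u : (uint32_t)v;
}

/* UINT source -> SINT destination: values above INT32_MAX clamp to it. */
static int32_t uint_to_sint_clamped(uint32_t v)
{
   return v > (uint32_t)INT32_MAX ? INT32_MAX : (int32_t)v;
}
```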

Fixes parts of GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:21 +01:00
Nicolai Hähnle ab5fd10eaa util/blitter: index texfetch_col shaders by type
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-11-16 10:31:07 +01:00
Marek Olšák 72217d4335 gallium: add PIPE_SHADER_CAP_LOWER_IF_THRESHOLD
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 20:23:40 +01:00
Marek Olšák 5b8876609e gallivm: limit use of setFastMathFlags to LLVM 3.8 and later
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-11-15 20:22:28 +01:00
Marek Olšák 41d20d4920 gallivm: add lp_create_builder with an unsafe_fpmath option
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-15 19:17:56 +01:00
Tim Rowley b9578b683d gallium: detect avx512 cpu features
v3: fix check for xmm/ymm test
v2: style code, add avx512 to cpu dump

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-10 15:03:21 -06:00
Nicolai Hähnle b46a9c570f gallivm: fix [IU]MUL_HI regression harder
The fix in commit 88f791db75 was insufficient
for radeonsi because the vector case was not handled properly. It seems
piglit only covers the scalar case, unfortunately.

Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-10 13:17:10 +01:00
Tom Stellard 8bdd52c8f3 gallivm: Fix build after removal of deprecated attribute API v3
v2:
  Fix adding parameter attributes with LLVM < 4.0.

v3:
  Fix typo.
  Fix parameter index.
  Add a gallivm enum for function attributes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-09 20:13:27 +00:00
Roland Scheidegger 4d5346aaac Revert "draw: use vectorized calculations for fetch"
Trivial. There are some regressions internally, related to overflow
behavior. I'll have to look at it at another time, some interactions
with vsplit/vcache are actually mind-blowing.

This reverts commit 3fa10ffb49.
2016-11-09 05:53:16 +01:00
Marek Olšák bdd48e47c0 tgsi/scan: turn a huge if-else-if.. chain into a switch statement
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-08 17:56:42 +01:00
Marek Olšák f864547fa9 tgsi/scan: fix images_buffers regression
The first IF statement disabled the second one.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98599

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-08 17:56:42 +01:00
Nicolai Hähnle 88f791db75 gallivm: fix [IU]MUL_HI regression
This patch does two things:

1. It separates the host-CPU code generation from the generic code
   generation. This guards against accidentally breaking things for
   radeonsi in the future.

2. It makes sure we actually use both arguments and don't just compute
   a square :-p
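Plain C reference versions of what UMUL_HI / IMUL_HI compute, useful for checking generated code (the function names are illustrative). Each returns the upper 32 bits of the full 64-bit product of *both* operands. Multiplying `a` by itself would be the "square" bug.

```c
#include <stdint.h>

static uint32_t umul_hi(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

static int32_t imul_hi(int32_t a, int32_t b)
{
   /* The product of two int32 values always fits in int64_t. Right shifting
    * a negative value is implementation-defined in C, but is an arithmetic
    * shift on mainstream compilers. */
   return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}
```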

Fixes a regression introduced by commit 29279f44b3

Cc: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-11-08 16:25:54 +01:00
Roland Scheidegger 3fa10ffb49 draw: use vectorized calculations for fetch
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar. Because llvm completely fails at
the zero-extend widening mul, we even roll our own...
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).

Instanced fetch however stays roughly the same as before, except that
the same element is no longer fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size, apparently due to llvm not being able
to deduce it's really all the same with a couple instanced elements).

Also, for elts gathering, use vectorized code as well - provide a fake
elt buffer if there's no valid one bound.

The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there are only single vertices
to handle I would expect it to be faster - there are more opportunities
for future improvements by using soa fetch).

No piglit change.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-08 03:41:26 +01:00
Roland Scheidegger 29279f44b3 gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function
This is used by shader umul_hi/imul_hi functions (and soon by draw).
It's actually useful separating this out on its own, however the real
reason for doing it is because we're using an optimized sse2 version,
since the code llvm generates is atrocious (since there's no widening
mul in llvm, and it does not recognize the widening mul pattern, so
it generates code for real 64x64->64bit mul, which the cpu can't do
natively, in contrast to 32x32->64bit mul which it could do).
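A hypothetical scalar equivalent of a 32x32->64bit unsigned multiply that returns the two halves separately. This is the operation SSE2's pmuludq performs on each even 32-bit lane. The name and signature are illustrative, not the gallivm API. Widening the operands *before* multiplying is what makes it a true widening multiply.

```c
#include <stdint.h>

static void mul_32_lohi(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
   uint64_t product = (uint64_t)a * b;   /* full 64-bit product */
   *lo = (uint32_t)product;              /* low 32 bits */
   *hi = (uint32_t)(product >> 32);      /* high 32 bits (umul_hi) */
}
```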

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-11-08 03:41:26 +01:00
Steven Toth 381edca826 gallium/hud: protect against an initialization race
In the event that multiple threads attempt to install a graph
concurrently, protect the shared list.
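A minimal sketch of the pattern, assuming a singly linked list of graphs; the struct and function names are hypothetical. POSIX threads are used only for illustration, since Mesa has its own threading wrappers. Every install and every traversal takes the same lock, so concurrent installs cannot interleave their list updates.

```c
#include <pthread.h>
#include <stddef.h>

struct graph {
   struct graph *next;
   const char *name;
};

static struct graph *graph_list = NULL;
static pthread_mutex_t graph_list_mutex = PTHREAD_MUTEX_INITIALIZER;

static void install_graph(struct graph *g)
{
   pthread_mutex_lock(&graph_list_mutex);
   g->next = graph_list;      /* push onto the shared list */
   graph_list = g;
   pthread_mutex_unlock(&graph_list_mutex);
}

static size_t count_graphs(void)
{
   size_t n = 0;
   pthread_mutex_lock(&graph_list_mutex);
   for (const struct graph *g = graph_list; g; g = g->next)
      n++;
   pthread_mutex_unlock(&graph_list_mutex);
   return n;
}
```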

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-07 18:31:52 +01:00
Steven Toth 5a58323064 gallium/hud: close a previously opened handle
We're missing the closedir() to the matching opendir().

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-07 18:31:52 +01:00
Steven Toth 6ffed08679 gallium/hud: fix a problem where objects are freed while in use.
Instead of trying to maintain a reference counted list of valid HUD
objects, and freeing them accordingly, creating race conditions
between unanticipated multiple threads, simply accept they're
allocated once and never released until the process terminates.

They're a shared resource between multiple threads, so accept
they're always available for use.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-07 18:31:52 +01:00
Roland Scheidegger 572a952126 draw: fix undefined input handling some more...
Previous fixes were incomplete - some code still iterated through the number
of elements provided by velem layout instead of the number stored in the key
(which is the same as the number defined by the vs). And also actually
accessed the elements from the layout directly instead of those in the key.
This mismatch could still cause crashes.
(Besides, it is a very good idea to only use data stored in the key anyway.)
v2: move null format check, remove now unnecessary function parameter,
some minor prettify

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-11-04 01:48:22 +01:00
Brian Paul f4dd3bde37 gallium/hud: call fflush() after printing error messages
For Windows.  Otherwise, we don't see the message until the program exits.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-11-03 14:29:23 -06:00
Timothy Arceri e1af20f18a nir/i965/anv/radv/gallium: make shader info a pointer
When restoring something from the shader cache, we won't have (and don't
want to create) a nir_shader, so this change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Brian Paul 88a618ce86 tgsi: trivial build fix for MSVC
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-24 14:16:07 -07:00
Axel Davy 54010cf8b6 gallium/util: Add align_calloc
Add implementation for align_calloc,
which is align_malloc + memset.

v2: add if (ptr) before memset.
Fix indentation.
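A minimal sketch of the align_malloc + memset idea. C11 aligned_alloc() stands in for Mesa's align_malloc() here, so this is not the actual implementation, and the result must be released with free() rather than align_free(). `my_align_calloc` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* `alignment` must be a power of two supported by aligned_alloc(). */
static void *my_align_calloc(size_t size, size_t alignment)
{
   /* C11 wants the size to be a multiple of the alignment, so round up. */
   size_t rounded = (size + alignment - 1) / alignment * alignment;
   void *ptr = aligned_alloc(alignment, rounded);

   /* Only clear the memory if the allocation succeeded (the v2 fix). */
   if (ptr)
      memset(ptr, 0, size);
   return ptr;
}
```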

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:56:44 +02:00
Marek Olšák f35b1d156b tgsi/scan: scan texture offset operands
This seems important considering how much we depend on some of the flags.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:38 +02:00
Marek Olšák a2f98dff14 tgsi/scan: move src operand processing into a separate function
the next commit will need this

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:36 +02:00
Marek Olšák 72267a25db tgsi/scan: get information about shader buffer usage
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:35 +02:00
Marek Olšák d89890d000 tgsi/scan: handle indirect image indexing correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:33 +02:00
Marek Olšák ac37720f51 tgsi/scan: don't treat RESQ etc. as memory instructions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:30 +02:00
Marek Olšák f095a4eb17 tgsi/scan: get information about indirect 2D file access
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:28 +02:00
Marek Olšák 965a5f1810 tgsi/scan: get information about indirect CONST access
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:26 +02:00
Marek Olšák c2a602d21a gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2016-10-20 17:45:23 +02:00
Marek Olšák 2db56434d4 gallivm: add wrappers for missing functions in LLVM <= 3.8
radeonsi needs these.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-20 11:07:50 +02:00
Roland Scheidegger aeceec54a8 draw: improve vertex fetch (v2)
The per-element fetch has quite a few calculations which are constant,
these can be moved outside both the per-element as well as the main
shader loop (llvm can figure out it's constant mostly on its own, however
this can have a significant compile time cost).
Similarly, it looks easier swapping the fetch loops (outer loop per attrib,
inner loop filling up the per vertex elements - this way the aos->soa
conversion also can be done per attrib and not just at the end though again
this doesn't really make much of a difference in the generated code). (This
would also make it possible to vectorize the calculations leading to the
fetches.)
There's also some minimal change simplifying the overflow math slightly.
All in all, the generated code seems to look slightly simpler (depending
on the actual vs), but more importantly I've seen a significant reduction
in compile times for some vs (albeit with old (3.3) llvm version, and the
time reduction is only really for the optimizations run on the IR).
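A scalar C model of the swapped loop order. It assumes one float per attribute, and the `attrib` struct and `fetch_soa` are hypothetical; the real code generates LLVM IR and does an aos->soa conversion. The outer loop walks attributes and the inner loop walks vertices. Everything that depends only on the attribute is computed once per attribute rather than once per vertex.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct attrib {
   const uint8_t *buffer;  /* start of the vertex buffer */
   uint32_t offset;        /* buffer offset + element src offset, in bytes */
   uint32_t stride;        /* bytes between consecutive vertices */
};

/* Writes SoA output: out[a * n + v] is attribute a of vertex v. */
static void fetch_soa(const struct attrib *attribs, size_t num_attribs,
                      const uint32_t *indices, size_t n, float *out)
{
   for (size_t a = 0; a < num_attribs; a++) {
      /* Loop-invariant for the inner loop: hoisted out of it. */
      const uint8_t *base = attribs[a].buffer + attribs[a].offset;
      uint32_t stride = attribs[a].stride;

      for (size_t v = 0; v < n; v++)
         memcpy(&out[a * n + v], base + (size_t)indices[v] * stride,
                sizeof(float));
   }
}
```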
v2: adapt to other draw change.

No changes with piglit.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger 0942fe548e draw: improved handling of undefined inputs
Previous attempts to zero initialize all inputs were not really optimal
(though no performance impact was measurable). In fact this is not really
necessary, since we know the max number of inputs used.
Instead, just generate fetch for up to max inputs used by the shader,
directly replacing inputs for which there was no vertex element by zero.
This also cleans up key generation, which previously would have stored
some garbage for these elements.
And also replace the assertion which indicated such bogus usage with a
debug_printf (the whole point of initializing the undefined inputs was to
make this case safe to handle).
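A hypothetical scalar model of the new input handling; the parameters and function name are illustrative. The shader reads `max_inputs` inputs, but only the first `num_elements` have a vertex element. Inputs without a source are written as zero instead of being fetched.

```c
#include <stddef.h>

static void fetch_inputs(const float *const *sources, size_t num_elements,
                         size_t max_inputs, float out[][4])
{
   for (size_t i = 0; i < max_inputs; i++) {
      const float *src = i < num_elements ? sources[i] : NULL;
      for (int c = 0; c < 4; c++)
         out[i][c] = src ? src[c] : 0.0f;   /* undefined input -> zero */
   }
}
```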

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger d1b4a3451e gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf
Compilation to actual machine code can easily take as much time as the
optimization passes on the IR if not more, so print this out too.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger 6f2f0daeb4 gallivm: Use native packs and unpacks for the lerps
For the texturing packs, things looked pretty terrible. For every
lerp, we were repacking the values, and while those look sort of cheap
with 128bit, with 256bit we end up with 2 of them instead of just 1 but
worse, plus 2 extracts too (the unpack, however, works fine with a
single instruction, albeit only with llvm 3.8 - the vpmovzxbw).

Ideally we'd use more clever pack for llvmpipe backend conversion too
since we actually use the "wrong" shuffle (which is more work) when doing
the fs twiddle just so we end up with the wrong order for being able to
do native pack when converting from 2x8f -> 1x16b. But this requires some
refactoring, since the untwiddle is separate from conversion.

This is only used for avx2 256bit pack/unpack for now.

Improves openarena scores by 8% or so, though overall it's still pretty
disappointing how much faster 256bit vectors are even with avx2 (or
rather, aren't...). And, of course, eliminating the needless
packs/unpacks in the first place would eliminate most of that advantage
(not quite all) from this patch.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Emil Velikov af7abc512c loader: remove loader_get_driver_for_fd() driver_type
A remnant of the pre-loader days, where we had multiple instances of
the loader logic in separate places and one could build a "GALLIUM_ONLY"
version.

Since that is no longer the case and the loaders (glx/egl/gbm) do not
(and should not) need to know anything classic/gallium specific, we can
drop the argument and the related code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:29 +01:00
Marek Olšák 34099894c3 gallium/tgsi: add missing #include
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 11:20:57 +02:00
Jose Fonseca c6d17701c8 pipe_loader_sw: Don't invoke Unix close() on Windows.
Trivial.
2016-10-14 16:29:04 +01:00
Emil Velikov c079a206ad gallium: rename drm_driver_descriptor::{, driver_}name
Historically we use "device name" for the name of the kernel module and
"driver name" for the dri/other driver.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov 9837cf13b1 gallium: remove unused drm_driver_descriptor::driver_name
Likely unused since day 1, although I've only checked back until the
st/dri unification with commit 29ca7d2c94 ("st/dri: merge dri/drm and
dri/sw backends")

Based on the comment referencing drmOpenByName, it's not something we
want to bring back.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Brian Paul b81546d43c tgsi: fix comment typo in tgsi_ureg.c
Trivial.
2016-10-13 17:38:49 -06:00
Axel Davy 197cdd1bbd gallium/os: Use unsigned integers for size computation
Use uint64_t instead of int64_t in the calculation,
as the result is uint64_t.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 21:16:35 +02:00
Nicolai Hähnle 2b460c750a tgsi/ureg: add ureg_DECL_output_layout
For specifying an exact location/component.

v2: change the order of parameters (Dave)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle 047a7c7a0b tgsi/ureg: add layout/component input declarations
v2: change the order of parameters (Dave)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle f9a01f3872 tgsi/scan: fix num_inputs/num_outputs for shaders with overlapping arrays
v2: remove a tautological left-over assert (Marek)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Roland Scheidegger 7e86b2ddae draw: initialize shader inputs
This should make the code more robust if a shader tries to use inputs which
aren't defined by the vertex element layout (which usually shouldn't happen).

No piglit change.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-10-12 15:05:44 +02:00
Axel Davy 2290eac84e gallium/util: Really allow aliasing of dst for u_box_union_*
Gallium nine relies on aliasing to work with this function.
Without this patch, dirty region tracking was incorrect, which
could lead to incorrect textures or vertex buffers.
Fixes several game bugs with nine.
Fixes https://github.com/iXit/Mesa-3D/issues/234
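A C sketch of an alias-safe union, using a hypothetical 2D box (Mesa's pipe_box also has z and depth). All inputs are read into locals before `dst` is written, so `box2d_union(&a, &a, &b)` works. Writing `dst->x` first and then computing the width from `dst` would corrupt the result when `dst` aliases an input.

```c
struct box2d {
   int x, y, width, height;
};

/* dst = smallest box containing a and b; dst may alias a or b. */
static void box2d_union(struct box2d *dst, const struct box2d *a,
                        const struct box2d *b)
{
   int x0 = a->x < b->x ? a->x : b->x;
   int y0 = a->y < b->y ? a->y : b->y;
   int x1a = a->x + a->width, x1b = b->x + b->width;
   int y1a = a->y + a->height, y1b = b->y + b->height;
   int x1 = x1a > x1b ? x1a : x1b;
   int y1 = y1a > y1b ? y1a : y1b;

   /* Only now touch dst. */
   dst->x = x0;
   dst->y = y0;
   dst->width = x1 - x0;
   dst->height = y1 - y0;
}
```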

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-10 23:43:48 +02:00
Axel Davy 218459771a gallium/os: Fix overflow on 32 bits
On systems with more than 4GB of ram,
os_get_total_physical_memory was triggering an integer
overflow for the linux and haiku path, when on
32 bits.
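A C sketch of the overflow and the fix. `total_memory_bytes` is a hypothetical helper taking a page count and page size. The real function queries the OS. On 32-bit systems both inputs can be 32-bit, so multiplying them directly wraps for 4GB or more. Widening to an unsigned 64-bit type *before* the multiply avoids this.

```c
#include <stdint.h>

static uint64_t total_memory_bytes(uint32_t num_pages, uint32_t page_size)
{
   return (uint64_t)num_pages * page_size;   /* 64-bit multiply */
}
```

For example, 2097152 pages of 4096 bytes is 8GiB, while the same product in 32-bit arithmetic wraps to 0.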

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94561

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 23:43:48 +02:00
Steven Toth e00fdd643b gallium/hud: Remove superfluous debug
No longer required.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:37:06 +01:00
Marek Olšák faee2d6dda tgsi/scan: don't set interp flags for inputs only used by INTERP (v2)
(v1 pushed, then reverted)

This fixes 9 randomly failing tests on radeonsi:
  GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*

v2: use input_interpolate[input] (correct) instead of
    input_interpolate[index] (incorrect)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Jose Fonseca 437d7e1baf gallivm: Use AVX2 gather intrinsics.
v2: Use AVX2 gather for non aligned loads too.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-04 23:36:20 +01:00
Roland Scheidegger bc80741d7a gallivm: Use 8 wide AoS sampling on AVX2.
v2: Make sure that with num_lods > 1 and min_filter != mag_filter we
still enter the splitting path. So this case would still use 4-wide aos
path (as a side note, the 4-wide aos sampling path could actually be
improved quite a bit if we have avx2, by just doing the filtering with
256bit vectors).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-04 23:36:20 +01:00
José Fonseca e088390c7d gallivm: Basic AVX2 support.
v2: pblendb -> pblendvb

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-04 23:36:20 +01:00
Matt Whitlock 5d0069eca2 gallium/auxiliary: replace call to dup(2) with fcntl(F_DUPFD_CLOEXEC)
Without this fix, duplicated file descriptors leak into child processes.
See commit aaac913e90 for one instance
where the same fix was employed.
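A POSIX C sketch of the replacement (`dup_cloexec` is a hypothetical name). fcntl() with F_DUPFD_CLOEXEC duplicates the descriptor like dup(2) does, returning the lowest free descriptor >= 0. It also sets FD_CLOEXEC atomically, so the copy is closed on exec() instead of leaking into child processes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>

static int dup_cloexec(int fd)
{
   return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}
```

Setting the flag with a separate fcntl(F_SETFD) after dup() would leave a window in which another thread could fork and exec.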

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Matt Whitlock <freedesktop@mattwhitlock.name>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-04 11:08:55 +02:00
Nayan Deshmukh b7a0f2e1f7 vl/dri3: fix warning about incompatible pointer type
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-10-03 12:51:30 -04:00
Steven Toth e99b9395be gallium/hud: Add support for CPU frequency monitoring
Detect all of the CPUs in the system. Expose metrics
for min, max and current frequency in Hz.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-30 15:18:46 -06:00
Marek Olšák 7b87190d2b Revert "gallium/hud: automatically print % if max_value == 100"
This reverts commit dbfeb0ec12.

With max_value being rounded to 100, it's often wrong.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-30 22:07:12 +02:00
Steven Toth 1d466b9b04 gallium/hud: Add power sensor support
Implement support for power based sensors, reporting units in
milliwatts and watts.

Also, minor cleanup - change the related if block to a switch.

Tested with two different power sensors, including the nouveau
'power1' sensors on a GTX950 card.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-29 17:51:15 -06:00
Steven Toth 8c60bcb4c3 gallium/hud: Add support for block I/O, network I/O and lmsensor stats
V8: Feedback based on peer review
    convert if block into a switch
    Constify some func args

V7: Increase precision when measuring lmsensors volts
    Flatten patch series.

V6: Feedback based on peer review
    Simplify sensor initialization (arg passing).
    Constify some func args

V5: Feedback based on peer review
    Convert sprintf to snprintf
    Convert char * to const char *
    int arg converted to bool
    Func changes to take a filename vs a larger struct.
    Omit the space between '*' and the param name.

V4: Merged with master as of 2016/9/27 6pm

V3: Flatten the entire patchset ready for the ML

V2: Additional separate patches based on feedback
a) configure.ac: Add a comment related to libsensors

b) HUD: Disable Block/NIC I/O stats by default.
Implement configuration option --enable-gallium-extra-hud=yes
and enable both statistics when this option is enabled.

c) Configure.ac: Minor cleanup to user visible configuration settings

d) Configure.ac: HUD stats - build system improvements
Move the -lsensors out of a deeper Makefile, bring it into the configure.ac.
Also, rename a compiler directive to more closely follow the standard.

V1: Initial release to the ML
Three new features:
1. Disk/block I/O device read/write stats in MB/s.
2. Network Interface RX/TX transfer statistics as a percentage
   of the overall NIC speed.
3. lmsensor power, voltage and temperature sensors.

The lmsensor changes add a dependency on libsensors, so support
for them is disabled by default.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-28 16:18:05 -06:00
Nicolai Hähnle 4421c0fb0d gallium/radeon/winsyses: reduce the number of pb_cache buckets
Small buffers are now handled via the slabs code, so separate buckets in
pb_cache have become redundant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:41 +02:00
Nicolai Hähnle 84f156c0cb gallium/pipebuffer: add pb_slab utility
This is a simple framework for slab allocation from buffers that fits into
the buffer management scheme of the radeon and amdgpu winsyses where bufmgrs
aren't used.

The utility knows about different sized allocations and explicitly manages
reclaim of allocations that have pending fences. It manages all the free lists
but does not actually touch buffer objects directly, relying on callbacks for
that.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:44:42 +02:00
Nicolai Hähnle b3ebc229dc gallium/u_math: add util_logbase2_ceil
For finding the exponent of the next power of two.
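
A minimal sketch of what "exponent of the next power of two" means, i.e. ceil(log2(n)). The function names are hypothetical and this is not the util_logbase2_ceil body; returning 0 for n <= 1 is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical helper: floor(log2(n)) for n >= 1. */
static unsigned logbase2_floor_sketch(unsigned n)
{
   unsigned log2 = 0;
   assert(n >= 1);
   while (n >>= 1)
      log2++;
   return log2;
}

/* ceil(log2(n)): the exponent of the smallest power of two >= n.
 * n == 0 and n == 1 both map to 0 here (an assumption of this sketch). */
static unsigned logbase2_ceil_sketch(unsigned n)
{
   if (n <= 1)
      return 0;
   /* For n > 1, ceil(log2(n)) == floor(log2(n - 1)) + 1. */
   return logbase2_floor_sketch(n - 1) + 1;
}
```

With this, `1u << logbase2_ceil_sketch(n)` yields the next power of two, e.g. 5 rounds up to 8.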

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:44:38 +02:00
Rob Clark ecd6fce261 mesa/st: support lowering multi-planar YUV
Support multi-planar YUV for external EGLImages (currently just in the
dma-buf import path) by lowering to one texture fetch per plane,
followed by CSC in the shader.

There was some discussion of alternative approaches for tracking the
additional UV or U/V planes:

  https://lists.freedesktop.org/archives/mesa-dev/2016-September/127832.html

They all seemed worse than pipe_resource::next.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-26 15:29:17 -04:00
Samuel Pitoiset be0535b8c7 gallium/util: make use of strtol() in debug_get_num_option()
This allows using hexadecimal numbers, which strtol() detects
automatically when the base is 0.
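
A small sketch of the base-0 behaviour, assuming a fallback to the default when no digits are parsed (the helper name and fallback rule are illustrative, not debug_get_num_option itself). Note that base 0 also treats a leading 0 as octal, so "010" parses as 8.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse like strtol(str, &end, 0), which accepts
 * decimal, "0x"/"0X" hexadecimal and leading-0 octal, and fall back to
 * dfault when no digits were consumed. */
static long parse_num_option_sketch(const char *str, long dfault)
{
   char *end;
   long result;

   assert(str != NULL);
   result = strtol(str, &end, 0);
   if (end == str)
      return dfault;   /* nothing parsed */
   return result;
}
```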

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Brian Paul <brianp@vmware.com>
2016-09-26 19:39:04 +02:00
Brian Paul b35684543e gallium/util: add comment on util_is_format_compatible()
From reading the code, it's not obvious which formats are src/dest compatible.
The list of a->b copy-compatible formats comes from Jose's original
check-in message, with some format name updates.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-09-21 12:26:17 -06:00
Nicolai Hähnle 1f291369e4 gallivm: support negation on 64-bit integers
This should be analogous to 32-bit integers.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:50 +02:00
Dave Airlie 5561a37710 gallivm/llvmpipe: prepare support for ARB_gpu_shader_int64.
This enables 64-bit integer support in gallivm and
llvmpipe.

v2: add conversion opcodes.
v3:
- PIPE_CAP_INT64 is not there yet
- restrict DIV/MOD defaults to the CPU, as for 32 bits
- TGSI_OPCODE_I2U64 becomes TGSI_OPCODE_U2I64

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:30 +02:00
Dave Airlie 6b26039da3 tgsi/softpipe: prepare ARB_gpu_shader_int64 support. (v3)
This adds all the opcodes to tgsi_exec for softpipe to use.

v2: add conversion opcodes.
v3:
- no PIPE_CAP_INT64 yet
- change TGSI_OPCODE_I2U64 to TGSI_OPCODE_U2I64

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:24:11 +02:00
Dave Airlie 3985e6c044 gallium/tgsi: add support for 64-bit integer immediates.
This adds support to TGSI for 64-bit integer immediates.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-21 10:23:55 +02:00
Dave Airlie 6e1a34d545 gallium: add opcode and types for 64-bit integers. (v3)
This just adds the basic support for 64-bit opcodes,
and the new types.

v2: add conversion opcodes.
add documentation.
v3:
- make docs more consistent
- change TGSI_OPCODE_I2U64 to TGSI_OPCODE_U2I64

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-21 10:23:05 +02:00
Nayan Deshmukh 853e80f5a0 vl/dri3: handle the case of different GPU(v4.2)
In the PRIME case, when rendering is done on a GPU other than the
server GPU, use a separate linear buffer for each back buffer,
which will be displayed using the Present extension.

v2: Use a separate linear buffer for each back buffer (Michel)
v3: Change variable names and fix coding style (Leo and Emil)
v4: Use PIPE_BIND_SAMPLER_VIEW for back buffer in case when
    a separate linear buffer is used (Michel)
v4.1: remove empty line
v4.2: destroy the context and handle the case when
      create_context fails (Emil)

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-09-20 11:17:02 +02:00
Lars Hamre ddd6116e32 tgsi: Enable returns from within loops
Fixes the following piglit test (for softpipe):
/spec/glsl-1.10/execution/fs-loop-return

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-09-17 10:24:13 -06:00
Rob Clark ba8a50955d ttn: fix warning after 7bf76563e
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-16 11:55:26 -04:00
Marek Olšák f019255acf Revert "tgsi/scan: don't set interp flags for inputs only used by INTERP instructions"
This reverts commit 524fd55d2d.

Reason: https://bugs.freedesktop.org/show_bug.cgi?id=97808
2016-09-15 00:47:24 +02:00
Marek Olšák 524fd55d2d tgsi/scan: don't set interp flags for inputs only used by INTERP instructions
radeonsi depends on the interp flags a little bit too much.

This fixes 9 randomly failing tests:
  GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Marek Olšák c723acc03d ddebug: dump shader buffers and images
this was unimplemented

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-09-13 20:38:25 +02:00
Andy Furniss 304f70536a vl/util: Fix YV12/I420 convert to NV12 U/V reversal
Fix VAAPI YV12/I420 convert to NV12 U/V reversal.
Input order is YVU when this is called.

Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
2016-09-13 13:58:40 -04:00
Leo Liu 6a7f79af9b vl/rbsp: match initial escaped bits with valid in the buffer
Otherwise the check for the emulation prevention three byte will not make sense.

Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-09-12 10:09:27 -04:00
Marek Olšák 5981ab5445 gallium: remove PIPE_BIND_TRANSFER_READ/WRITE
not used in any useful way

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-09-08 22:51:33 +02:00
Marek Olšák e7a73b75a0 gallium: switch drivers to the slab allocator in src/util 2016-09-06 14:24:04 +02:00
Thomas Hellstrom fc6be40011 gallium/postprocess: Fix resource freeing
The code was triggering asserts in DEBUG builds of the SVGA driver since
the reference count of the resource was never decremented before destroy.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2016-09-01 07:59:49 +02:00
Kai Wasserbäch 4c53267b8f gallium: Use enum pipe_shader_type in set_shader_images()
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 09:07:37 -06:00
Kai Wasserbäch 532db3b788 gallium: Use enum pipe_shader_type in set_sampler_views()
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 09:07:25 -06:00
Kai Wasserbäch 7413625ad3 gallium: Use enum pipe_shader_type in bind_sampler_states() (v2)
v1 → v2:
 - Fixed indentation (noted by Brian Paul)
 - Removed second assert from nouveau's switch statements (suggested by
   Brian Paul)

Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 08:45:48 -06:00
Marek Olšák d301efb400 tgsi/scan: remember sampler view types
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-29 14:16:57 +02:00
Brian Paul d221a6545c gallium/hud: move signo declaration inside PIPE_OS_UNIX block
To silence unused var warning with MSVC, MinGW.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-26 06:19:51 -06:00
Marek Olšák 9daaa6f5a6 gallium: add a pipe_context parameter to resource_get_handle
radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-25 14:09:48 +02:00
Rhys Kidd c9c989763a gallium/ttn: Remove duplicated TGSI_OPCODE_DP2A initialization
Duplicate line is currently on 1535.

Identified by Clang, when run through Eric Anholt's Travis harness.

Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-08-24 11:54:50 -07:00
Leo Liu 5277f25480 vl/rbsp: fix another three byte not detected
This happens when the three-byte sequence "00 00 03" is only partly loaded
into vlc->buffer: the valid bits at the bottom of the buffer hold "00" or
"00 00", while the remainder ("00 03" or "03") is still in the data, so
the three-byte emulation check does not detect it. The reason is that the
escaped bits were set to 0 by the rbsp init.
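
To illustrate why state must survive chunk boundaries, here is a byte-level sketch of removing emulation prevention bytes (inside a NAL unit, "00 00 03" encodes "00 00", so the 03 is dropped). It is a hypothetical helper, not the bit-level vl/rbsp code; the count of trailing zero bytes is carried in *zeros between calls, so a sequence split across two chunks is still caught.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy in[] to out[], dropping each 0x03 that follows two zero bytes.
 * *zeros holds the number of consecutive 0x00 bytes seen so far and must
 * be preserved across calls for split sequences to be detected.
 * Returns the number of bytes written to out. */
static size_t rbsp_unescape_sketch(const uint8_t *in, size_t len,
                                   uint8_t *out, unsigned *zeros)
{
   size_t n = 0;

   assert(zeros != NULL);
   for (size_t i = 0; i < len; i++) {
      if (*zeros >= 2 && in[i] == 0x03) {
         *zeros = 0;   /* emulation prevention byte: skip it */
         continue;
      }
      *zeros = (in[i] == 0x00) ? *zeros + 1 : 0;
      out[n++] = in[i];
   }
   return n;
}
```

If *zeros were reset to 0 before the second chunk, the 03 in a "00 00 | 03" split would be kept, which is the kind of miss described above.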

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2016-08-24 11:17:16 -04:00
Eric Engestrom 9411eb67ec gallium/cso: avoid unnecessary null dereference
The label `out:` calls `destroy()` which dereferences `ctx`.
This is unnecessary as there is nothing to destroy.
Immediately return instead.

CovID: 1258255
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-24 11:35:05 +01:00
Marek Olšák 0328b20050 gallium/hud: round max_value to print nicely rounded numbers next to graphs
This improves readability a lot.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák 0f1befe926 gallium/hud: generalize code for drawing numbers next to graphs
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák a33eb48d61 gallium/hud: draw numbers with 3 decimal places if those aren't 0
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák b9c9551c09 gallium/hud: use sRGB for nicer AA lines
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák 6ffde82083 gallium/hud: use AA lines for graphs
this looks a lot better (with the next patch)

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Marek Olšák 6902f9e82a gallium/hud: don't enable blending for all objects
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-22 16:01:35 +02:00
Eric Anholt c078c41520 ttn: Use nir_load_front_face instead of the TGSI-style input.
This reduces the diff between GLSL-to-NIR and TGSI-to-NIR, and gives NIR
more optimization to work on.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 13:11:36 -07:00
Eric Anholt ed92241d78 ttn: Make FRAG_RESULT_DEPTH be a float variable to match gtn and ptn.
This lets TTN-using drivers handle FRAG_RESULT_DEPTH the same between all
their source paths.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-08-19 13:11:36 -07:00
Marek Olšák 325379096f gallium: change pipe_image_view::first_element/last_element -> offset/size
This is required by OpenGL. Our hardware supports this.

Example: Bind RGBA32F with offset = 4 bytes.
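
A sketch of why the example cannot be expressed with element indices: an RGBA32F element is 16 bytes, so a 4-byte offset is not a whole number of elements. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: a byte offset fits the old first_element interface
 * only if it is a multiple of the element size. */
static bool offset_fits_element_interface(unsigned offset_bytes,
                                          unsigned elem_size_bytes,
                                          unsigned *first_element)
{
   assert(elem_size_bytes > 0);
   if (offset_bytes % elem_size_bytes != 0)
      return false;   /* e.g. 4 bytes into 16-byte RGBA32F elements */
   *first_element = offset_bytes / elem_size_bytes;
   return true;
}
```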

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 14:15:33 +02:00
Marek Olšák 7cd256ce7e gallium: change pipe_sampler_view::first_element/last_element -> offset/size
This is required by OpenGL. Our hardware supports this.

Example: Bind RGBA32F with offset = 4 bytes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97305

Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-17 14:15:33 +02:00
Nicolai Hähnle 41001ca4bd gallivm: add lp_build_alloca_undef
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle 17e88e276c gallivm: add create_builder_at_entry helper function
Reduces code duplication.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:24 +02:00
Nicolai Hähnle 67c0f077a2 tgsi/scan: add tgsi_scan_arrays
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-17 12:11:21 +02:00
Brian Paul 038b1b11fe gallium: remove unused u_clear.h file
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 08:28:33 -06:00
Brian Paul 66debeae9d gallium/util: minor reformatting in u_box.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 08:28:32 -06:00
Rob Clark 142dd7b9c0 gallium/u_blitter: split out a helper for common clear state
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 09:21:13 -04:00
Rob Clark 2b2f436c69 gallium/u_blitter: add helper to save FS const buffer state
This is not (currently) state that u_blitter itself overrides, but
drivers with custom blit/clear that reuse part of the u_blitter
infrastructure will use it.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 09:21:13 -04:00
Rob Clark 433e12fea8 gallium/u_blitter: export some functions
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-16 09:21:13 -04:00
Ilia Mirkin c85b7f0e87 gallium/util: add helper to compute zmin/zmax for a viewport state
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
2016-08-14 17:41:33 -04:00
Leo Liu 6575ebdc45 vl/rbsp: add a check for emulation prevention three byte
This is the case when "00 00 03" is very close to the beginning of the
NAL unit header.

v2: move the check to rbsp init

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-08-10 09:52:44 -04:00
Marek Olšák a909210131 gallium: add render_condition_enable param to clear_render_target/depth_stencil
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:10:21 +02:00
Mathias Fröhlich aa920736fe gallium: Add c99_compat.h to u_bitcast.h
We need this for 'inline'.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-09 21:20:56 +02:00
Mathias Fröhlich 027cbf00f2 util: Move _mesa_fsl/util_last_bit into util/bitscan.h
As requested with the initial creation of util/bitscan.h
now move other bitscan related functions into util.
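
For reference, a portable sketch of the "find last bit set" semantics of util_last_bit(): the 1-based position of the most significant set bit, or 0 if no bit is set. This loop is illustrative only; real implementations typically use a compiler builtin, and the test assumes a 32-bit unsigned.

```c
#include <assert.h>

/* Hypothetical portable version: 1 -> 1, 6 -> 3, 0x80000000 -> 32, 0 -> 0. */
static unsigned last_bit_sketch(unsigned u)
{
   unsigned pos = 0;
   while (u) {
      pos++;
      u >>= 1;
   }
   return pos;
}
```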

v2: Split into two patches.

Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-08-09 21:20:46 +02:00
Jason Ekstrand f29fd7897a util: Move format_r11g11b10f.h to src/util
It's used from both mesa main and gallium.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:06:57 -07:00
Jason Ekstrand 6c665cdfc5 util: Move format_rgb9e5.h to src/util
It's used from both mesa main and gallium.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-08-05 09:06:31 -07:00
Michel Dänzer 67c5e843b9 vl/dri3: Destroy Present event context when destroying drawable v2
Without this, the X server may accumulate stale Present event contexts
if a client performs several video decoding sessions using the same
window.

v2: Based on Chris Wilson's review:
* Use xcb_discard_reply() instead of free(xcb_request_check())

Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
2016-08-04 15:45:43 +09:00
Marek Olšák 6db93cd167 gallium/util: fix align64
it cut off the upper 32 bits
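
One plausible way to hit this bug, assuming the pre-fix helper had the shape of align64_buggy below (the commit only states the symptom): complementing a 32-bit `unsigned` yields a 32-bit mask, which is zero-extended and clears bits 32..63. Widening before the complement fixes it.

```c
#include <assert.h>
#include <stdint.h>

/* ~(alignment - 1) is computed in 32-bit unsigned arithmetic and then
 * zero-extended, so the mask wipes the upper 32 bits of value. */
static uint64_t align64_buggy(uint64_t value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Widen to 64 bits before complementing so the mask keeps the high bits. */
static uint64_t align64_fixed(uint64_t value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0); /* power of two */
   return (value + alignment - 1) & ~((uint64_t)alignment - 1);
}
```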

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-08-01 23:28:14 +02:00
Matt Turner be35c6ba92 draw: Avoid aliasing violations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Matt Turner 16ff8f9ae8 gallium/auxiliary: Add u_bitcast.h header.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:09:17 -07:00
Brian Paul 13fa051356 auxiliary/os: add new os_get_command_line() function
This can be used by the driver to get the command line which started
the process.  Will be used by the VMware driver for extra logging.

For now, this is only implemented for Linux via /proc/self/cmdline
and Windows via GetCommandLine().
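
A Linux-only sketch of the /proc/self/cmdline approach. That file stores argv as NUL-separated strings, so the separators are replaced with spaces. Both function names and signatures are hypothetical, not os_get_command_line() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Turn NUL-separated argv strings into one space-separated line. The last
 * byte (the terminator of the final argument) is left as NUL. */
static void join_cmdline_args(char *buf, size_t len)
{
   assert(buf != NULL);
   for (size_t i = 0; i + 1 < len; i++)
      if (buf[i] == '\0')
         buf[i] = ' ';
}

/* Hypothetical reader: fills cmdline (size bytes) and returns true on success. */
static bool read_cmdline_sketch(char *cmdline, size_t size)
{
   FILE *f;
   size_t n;

   if (size == 0)
      return false;
   f = fopen("/proc/self/cmdline", "rb");
   if (!f)
      return false;
   n = fread(cmdline, 1, size - 1, f);
   fclose(f);
   if (n == 0)
      return false;
   cmdline[n] = '\0';   /* terminate even if the read was truncated */
   join_cmdline_args(cmdline, n);
   return true;
}
```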

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 12:20:19 -06:00
Rob Clark 53b2b8bf6f u_vbuf: fix potentially bogus assert
There are cases where we hit the u_vbuf path due to alignment or pitch-
alignment restrictions, for an output format that u_vbuf does not
support translating (but the driver does support natively). In that
case we hit the memcpy() path and don't care that u_vbuf doesn't
understand it.

Fixes crash with debug build of mesa in:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.user_ptr_stride17_components2_quads1

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95000
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-01 13:42:11 -04:00
Roland Scheidegger 99a47391e4 Revert "gallium/util: fix resource leak"
This reverts commit d1fe26a628.

Replacing a resource leak with a segfault isn't the solution.
2016-07-30 18:18:09 +02:00
Eric Engestrom d1fe26a628 gallium/util: fix resource leak
CovID: 401540
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-30 17:27:42 +02:00
Rob Clark 010e4b2d52 os: add pipe_mutex_assert_locked()
Would be nice if we could also have lockdep, like in the linux kernel.
But this is better than nothing.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-30 09:23:42 -04:00
Eric Anholt 4d0b2c7aaa ttn: Update shader->info as we generate code.
We could use the nir_shader_gather_info() pass to update it after the
fact, but this is what glsl_to_nir and prog_to_nir do.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
2016-07-26 13:47:50 -07:00
Boyuan Zhang 23b4ab1738 vl/util: add copy func for yv12image to nv12surface v2
Add function to copy from yv12 image to nv12 surface for VAAPI putimage call.
We need this function in the VaPutImage call, where we copy from a yv12 image to an nv12
surface for encoding. The existing function can't be used because it only works for
copying from a yv12 surface to an nv12 image in VA-API.

v2: cleanup variable types and commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2016-07-25 13:39:18 +02:00
Marek Olšák 8e3e9d2839 gallium/util: don't modify usage in pipe_buffer_write
All drivers were already doing it except virgl.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-23 13:33:42 +02:00
Marek Olšák 1ffe77e7bb gallium: split transfer_inline_write into buffer and texture callbacks
to reduce the call indirections with u_resource_vtbl.

The worst call tree you could get was:
  - u_transfer_inline_write_vtbl
    - u_default_transfer_inline_write
      - u_transfer_map_vtbl
        - driver_transfer_map
      - u_transfer_unmap_vtbl
        - driver_transfer_unmap

That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.

The new interface is:
  pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
  pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
                                stride, layer_stride)

v2: fix whitespace, correct ilo's behavior

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
2016-07-23 13:33:42 +02:00
Marek Olšák 4cdc482283 gallium/os: use CLOCK_MONOTONIC for sleeps (v2)
v2: handle EINTR, remove backslashes

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2016-07-22 22:34:49 +02:00
Marek Olšák 8d5944199d gallium/pb_cache: reduce the number of pointer dereferences
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák 3cdc0e133f gallium/pb_cache: divide the cache into buckets for reducing cache misses
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Marek Olšák fec7f74129 gallium/pb_cache: check parameters that are more likely to fail first
This makes Bioshock Infinite with deferred flushing 2% faster.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-19 23:45:06 +02:00
Eric Engestrom 8ba46fbd9e vl: fix memory leak
CovID: 1363008
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-19 12:41:00 +02:00
Leo Liu 134d6e4e4f vl/dri3: fix a memory leak from front buffer
Inspired by the fix for the memory leak in VDPAU interop: resource_from_handle
sets the texture reference count, which needs to be decremented so the
texture can be released. There is a similar case for DRI3 with the VA-API GLX
extension, where a temporary TFP (texture from pixmap) is targeted
through dma-buf; the leak happens when the reference is not dropped.

With mpv (vo=opengl) there is only one static TFP, so the leak happens
once, but totem using GStreamer VA-API GLX creates a dynamic TFP for each
frame, so it leaks quite a bit.

This fixes the memory leak for mpv and totem.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-07-18 09:20:40 -04:00
Kenneth Graunke ac1181ffbe compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.
Likewise, rename the enum type to glsl_interp_mode.

Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input.  Also, SPIR-V calls these "decorations" rather than "qualifiers".

Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
  -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
  -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-07-17 19:26:48 -07:00
Rob Clark 44bbfedbd9 gallium/u_queue: add optional cleanup callback
Adds a second optional cleanup callback, called after the fence is
signaled.  This is needed if, for example, the queue has the last
reference to the object that embeds the util_queue_fence.  In this
case we cannot drop the ref in the main callback, since that would
result in the fence being destroyed before it is signaled.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-16 10:00:04 -04:00
Yaakov Selkowitz 5d303867f5 Use correct names for dlopen()ed files on Cygwin
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
2016-07-15 19:46:54 +01:00
Marek Olšák 6596ecf8c5 gallivm: add helper lp_add_attr_dereferenceable
Not sure if this is the right way to do it, but it seems to work.

v2: make it a no-op on LLVM <= 3.5

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-07-13 19:46:16 +02:00
Leo Liu 82f875f4d8 vl/compositor: set layer of y or uv to render
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-07-12 09:27:53 -04:00
Leo Liu 14761da9f9 vl/compositor: add weave to yuv shader
This shader converts interlaced YUV to progressive YUV.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-07-12 09:27:53 -04:00
Leo Liu 2e18c2c6f8 vl/compositor: move weave shader out from rgb weaving
We'll use weave shader in the later patch.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
2016-07-12 09:27:53 -04:00
Marek Olšák d7b6f90684 gallivm: set LLVMNoUnwindAttribute on all intrinsics
RadeonSI stats: Mostly 0% difference, but Valley shows a small improvement:

 Application            Files    SGPRs     VGPRs   SpillSGPR SpillVGPR Code Size    LDS    Max Waves   Waits
 unigine_valley           278    0.00 %   -0.29 %    0.00 %    0.00 %    0.01 %    0.00 %    0.17 %    0.00 %

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-07-11 19:06:05 +02:00
Nicolai Hähnle 374aa2bb27 gallium/u_queue: assert that users must wait on fences before destroying them
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-11 11:04:44 +02:00
Nicolai Hähnle a0a616720a gallium/u_queue: guard fence->signalled checks with fence->mutex
I have seen a hang during application shutdown that could be explained by the
following race condition which this patch fixes:

1. Worker thread enters util_queue_fence_signal, sets fence->signalled = true.
2. Main thread calls util_queue_job_wait, which returns immediately.
3. Main thread deletes the job and fence structures, leaving garbage behind.
4. Worker thread calls pipe_condvar_broadcast, which gets stuck forever because
   it is accessing garbage.
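
A sketch of the fixed pattern, with POSIX threads standing in for the pipe_mutex/pipe_condvar wrappers (an assumption; the struct and function names are hypothetical). Because "signalled" is only touched under the mutex and the broadcast happens before the signaller unlocks, a waiter cannot return from wait, and so cannot free the fence, until the broadcast has completed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct fence_sketch {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
   bool signalled;          /* only read or written with mutex held */
};

static void fence_init(struct fence_sketch *f)
{
   pthread_mutex_init(&f->mutex, NULL);
   pthread_cond_init(&f->cond, NULL);
   f->signalled = false;
}

static void fence_signal(struct fence_sketch *f)
{
   pthread_mutex_lock(&f->mutex);
   f->signalled = true;
   pthread_cond_broadcast(&f->cond);   /* still holding the mutex */
   pthread_mutex_unlock(&f->mutex);
}

static void fence_wait(struct fence_sketch *f)
{
   /* Taking the mutex even when already signalled ensures the signaller
    * has released it, i.e. finished its broadcast. */
   pthread_mutex_lock(&f->mutex);
   while (!f->signalled)
      pthread_cond_wait(&f->cond, &f->mutex);
   pthread_mutex_unlock(&f->mutex);
}

static void fence_destroy(struct fence_sketch *f)
{
   pthread_cond_destroy(&f->cond);
   pthread_mutex_destroy(&f->mutex);
}
```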

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-07-11 11:03:59 +02:00
Nayan Deshmukh af18a04755 vl: add half pixel to v_tex before adding offsets
Since the pixel center lies at 0.5, add half_pixel to vtex
before adding offsets to it.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-08 20:51:12 +02:00
Rob Clark def044376a gallium/util: make util_copy_framebuffer_state(src=NULL) work
Be more consistent with the other u_inlines util_copy_xyz_state()
helpers and support NULL src.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-07-06 10:17:30 -04:00
Hans de Goede d386cef246 tgsi: Add WORK_DIM System Value
Add a new WORK_DIM SV type; this will return the grid dimensions
(1-4) for compute (opencl) kernels.

This is necessary to implement the opencl get_work_dim() function.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-07-02 12:21:28 +02:00
Nayan Deshmukh 872dd9ad15 vl: add a bicubic interpolation filter(v5)
This is a shader-based bicubic interpolator which uses the cubic
Hermite spline algorithm.

v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: initialize offsets while initializing shaders
    use a constant buffer to send dst_size to frag shader
    small changes to reduce calculation in shader
v5: send half pixel offset instead of sending dst_size
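
A scalar sketch of cubic Hermite interpolation between samples p1 and p2. The commit only says "cubic Hermite spline"; the Catmull-Rom tangents (p2 - p0) / 2 and (p3 - p1) / 2 used here are an assumption, and this is C, not the actual shader. A bicubic filter applies it along x for four rows, then once along y.

```c
#include <assert.h>

/* Interpolate between p1 (t = 0) and p2 (t = 1) using the neighbours
 * p0 and p3 to form Catmull-Rom tangents (assumed tangent choice). */
static float hermite_catmull_rom(float p0, float p1, float p2, float p3, float t)
{
   float a = -0.5f * p0 + 1.5f * p1 - 1.5f * p2 + 0.5f * p3;
   float b = p0 - 2.5f * p1 + 2.0f * p2 - 0.5f * p3;
   float c = -0.5f * p0 + 0.5f * p2;
   float d = p1;

   assert(t >= 0.0f && t <= 1.0f);
   return ((a * t + b) * t + c) * t + d;   /* cubic in Horner form */
}
```

On linearly spaced samples (0, 1, 2, 3) the cubic terms vanish and t = 0.5 gives 1.5, matching linear interpolation.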

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-07-01 12:54:33 +02:00
Brian Paul c823ff8dfb gallium/util: check for window cliprects in util_can_blit_via_copy_region()
We can't blit with resource_copy_region() if there are window clip rects.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-30 18:19:09 -06:00
Brian Paul 5f1335878e gallium/util: add tight_format_check param to util_can_blit_via_copy_region()
The VMware driver will use this for implementing GL_ARB_copy_image.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Brian Paul a029d9f074 gallium/util: simplify a few things in util_can_blit_via_copy_region()
Since only the src box can have negative dims for flipping, just
comparing the src/dst box sizes is enough to detect flips.
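
A sketch of the size comparison, using a hypothetical box type loosely modelled on struct pipe_box. Since only the source box may have negative dimensions, equal src/dst sizes rule out both flips and scaling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical box type; a negative src dimension denotes a flip. */
struct box_sketch {
   int x, y, z;
   int width, height, depth;
};

static bool same_size_no_flip(const struct box_sketch *src,
                              const struct box_sketch *dst)
{
   /* By assumption, the destination box is never flipped. */
   assert(dst->width >= 0 && dst->height >= 0 && dst->depth >= 0);
   return src->width == dst->width &&
          src->height == dst->height &&
          src->depth == dst->depth;
}
```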

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Brian Paul 5d31ea4b8f gallium/util: new util_can_blit_via_copy_region() function
Pulled out of the util_try_blit_via_copy_region() function.  Subsequent
changes build on this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-30 14:32:06 -06:00
Hans de Goede 459cc94507 pipe_loader_sw: Fix fd leak when instantiated via pipe_loader_sw_probe_kms
Make pipe_loader_sw_probe_kms take ownership of the passed in fd,
like pipe_loader_drm_probe_fd does.

The only caller is dri_kms_init_screen which passes in a dupped fd,
just like dri2_init_screen passes in a dupped fd to
pipe_loader_drm_probe_fd.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-28 12:29:54 +02:00
Marek Olšák cbb5adb908 gallium/u_queue: allow the execute function to differ per job
so that independent types of jobs can use the same queue.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák 4a06786efd gallium/u_queue: reduce the number of mutexes by 2
by converting semaphores to condvars and using the main mutex

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák 2fba0aaa70 gallium/u_queue: add an option to name threads
for debugging

v2: correct the snprintf use

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák 404d0d50d8 gallium/u_queue: add an option to have multiple worker threads
independent jobs don't have to be stuck on only one thread

v2: use CALLOC & FREE

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák 4358f6dd13 gallium/u_queue: rewrite util_queue_fence to allow multiple waiters
Checking "signalled" is first done without a mutex, then with a mutex.
Also, checking without waiting doesn't lock the mutex. This is racy, but
should be safe.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Marek Olšák d8367e91f2 gallium/u_queue: use a ring instead of a stack
and allow specifying its size in util_queue_init.
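
A minimal sketch of a fixed-size FIFO ring whose capacity is chosen at init time, in contrast to a stack that would run the newest job first. Names are hypothetical, jobs are plain ints, and the locking a real util_queue needs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct ring_sketch {
   int *jobs;
   unsigned max_jobs;
   unsigned head;       /* index of the oldest job */
   unsigned num_jobs;
};

static bool ring_init(struct ring_sketch *r, unsigned max_jobs)
{
   assert(max_jobs > 0);
   r->jobs = calloc(max_jobs, sizeof(*r->jobs));
   r->max_jobs = max_jobs;
   r->head = 0;
   r->num_jobs = 0;
   return r->jobs != NULL;
}

static bool ring_push(struct ring_sketch *r, int job)
{
   if (r->num_jobs == r->max_jobs)
      return false;     /* full: a real queue would block the caller */
   r->jobs[(r->head + r->num_jobs) % r->max_jobs] = job;
   r->num_jobs++;
   return true;
}

static bool ring_pop(struct ring_sketch *r, int *job)
{
   if (r->num_jobs == 0)
      return false;
   *job = r->jobs[r->head];                 /* oldest job first */
   r->head = (r->head + 1) % r->max_jobs;
   r->num_jobs--;
   return true;
}

static void ring_fini(struct ring_sketch *r)
{
   free(r->jobs);
   r->jobs = NULL;
}
```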

v2: use CALLOC & FREE

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-24 12:24:40 +02:00
Brian Paul e0dc3c5f19 gallium/util: fix some 4-space indentation in blitter code
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-23 07:31:20 -06:00
Ilia Mirkin 5b0d64886d translate: fix start_instance parameter in sse version
The generic version gets this right already, but this was using an
incorrect formula in SSE.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-06-21 21:50:16 -04:00
Marek Olšák 5fed1122e8 gallium/u_blitter: implement mipmap generation
for pipe_context::generate_mipmap

First, move some of the blit code from util_blitter_blit_generic
to a separate function; then use it from util_blitter_generate_mipmap.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-21 13:52:05 +02:00
Roland Scheidegger b0cf99165a gallivm: don't use integer min/max sse intrinsics with llvm >= 3.9
Apparently, these are deprecated. There's an AutoUpgrade feature which
is supposed to promote these to cmp/select, but it apparently doesn't work
with JIT code. It is possible it's not actually even meant to work there (see
the bug filed against LLVM, which couldn't provide an answer either),
but in any case this is meant to be only temporary unless the intrinsics
are really illegal. So just use the fallback code (which should be cmp/select;
we're actually doing cmp/sext/trunc/select, but in any case LLVM 3.9 manages
to optimize this back to pmin/pmax in the end).

This addresses https://llvm.org/bugs/show_bug.cgi?id=28176

CC: <mesa-stable@lists.freedesktop.org>

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Aaron Watry <awatry@gmail.com>
2016-06-20 17:19:03 +02:00
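The fallback that this gallivm commit switches to can be modelled per lane in plain C. The steps are: compare, sign-extend the 1-bit result into an all-ones or all-zeros mask, then select. This is only a scalar model of what the emitted LLVM IR computes. `imin4` and `imax4` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane model of cmp/sext/select for a 4 x i32 vector minimum. */
static void imin4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
   for (int i = 0; i < 4; i++) {
      int32_t mask = -(int32_t)(a[i] < b[i]);   /* cmp, sign-extended: 0 or -1 */
      out[i] = (a[i] & mask) | (b[i] & ~mask);   /* select a where mask is set */
   }
}

/* Same pattern for the maximum. */
static void imax4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
   for (int i = 0; i < 4; i++) {
      int32_t mask = -(int32_t)(a[i] > b[i]);
      out[i] = (a[i] & mask) | (b[i] & ~mask);
   }
}
```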
Christian König bf89e672cf vl: support luma keying for interlaced surfaces as well
We had the CSC code twice in there, so factor it out into a separate function.

Signed-off-by: Christian König <christian.koenig@amd.com>
2016-06-16 09:41:12 +02:00
Brian Paul bb1292e226 auxiliary/os: allow appending to GALLIUM_LOG_FILE
If the log file specified by the GALLIUM_LOG_FILE environment variable
begins with '+', open the file in append mode.  This is useful to log all
gallium output for an entire piglit run, for example.

v2: put GALLIUM_LOG_FILE support inside an #ifdef DEBUG block.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-15 17:16:42 -06:00
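The following is a minimal sketch of the '+' prefix handling. `log_file_mode` and `open_log_file` are hypothetical helpers, not the gallium os code. The sketch takes the option string as a parameter and leaves out how that string is obtained.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: split a GALLIUM_LOG_FILE-style value into the
 * fopen() mode and the actual file name. A leading '+' selects append. */
static const char *log_file_mode(const char *spec, const char **filename)
{
   assert(spec != NULL);
   if (spec[0] == '+') {
      *filename = spec + 1;   /* the '+' is not part of the file name */
      return "a";
   }
   *filename = spec;
   return "w";                /* "w" truncates an existing file */
}

static FILE *open_log_file(const char *spec)
{
   const char *filename;
   const char *mode = log_file_mode(spec, &filename);
   return fopen(filename, mode);
}
```

Under this model, a value such as `+/tmp/piglit.log` keeps appending across processes, while a plain path is truncated each time it is opened.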
Marek Olšák 562cb03d76 gallium/util: import the multithreaded job queue from amdgpu winsys (v2)
v2: rename the event to util_queue_fence

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-15 21:07:34 +02:00
Roland Scheidegger afbf5888f5 gallium/util: don't use blocksize for minify for assertions
For texture sizes smaller than block_size, the previous assertions required
src_box.x + src_box.width to still be the block size
(e.g. for a texture with width 3 and src_box.x = 0, src_box.width would
have to be 4 to not assert).
This triggered assertions with some other state tracker.
It looks, though, like callers aren't expected to round widths up to block
sizes (for sizes larger than the block size, the assertion would still have
verified that they weren't rounded up), so we simply shouldn't use a minify
which rounds up to the block size.
(No piglit change with llvmpipe.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-14 17:03:34 +02:00
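To make the width-3 example concrete, this sketch contrasts plain per-level minification with a variant that rounds up to whole blocks. `minify` is modelled on gallium's `u_minify()`: it halves the size per level and clamps to 1. `minify_block_aligned` is a hypothetical stand-in for the block-rounding behaviour that this commit removes from the assertions.

```c
#include <assert.h>

/* Halve per mip level, never going below 1 (level must be < 32). */
static unsigned minify(unsigned size, unsigned level)
{
   unsigned s = size >> level;
   return s ? s : 1;
}

/* Hypothetical: round the minified size up to whole blocks, e.g. 4 for
 * BC/DXT formats. This is the behaviour the assertions should not use. */
static unsigned minify_block_aligned(unsigned size, unsigned level,
                                     unsigned block)
{
   assert(block > 0);
   unsigned s = minify(size, level);
   return (s + block - 1) / block * block;
}
```

With the plain version, a 3-texel-wide box at x = 0 satisfies `x + width <= minify(3, 0)`. The block-aligned version would instead demand a width of 4.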
Julien Isorce 1cdb4da1d6 st/va: ensure linear memory for dmabuf
In order to do zero-copy between two different devices
the memory should not be tiled.

Tested with GStreamer on a laptop that has 2 GPUs:
1- gstvaapidecode:
   HW decoding and dmabuf export with nouveau driver on Nvidia GPU.
2- glimagesink:
   EGLImage imports dmabuf on Intel GPU.

TEST: DRI_PRIME=1 gst-launch vaapidecodebin ! glimagesink

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-14 08:40:33 +01:00
Mathias Fröhlich c3b6656676 mesa/gallium: Move u_bit_scan{,64} from gallium to util.
The functions are also useful for mesa.
Introduce src/util/bitscan.{h,c}. Move ffs function
implementations from src/mesa/main/imports.{h,c}.
Move bit scan related functions from
src/gallium/auxiliary/util/u_math.h. Merge platform
handling with what is available from within mesa.

v2: Try to fix MSVC compile.

Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2016-06-14 05:19:10 +02:00
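The bit-scan helpers moved by this commit are typically used to iterate over the set bits of a mask. Here is a sketch of that pattern. It is modelled on `u_bit_scan()` but uses the GCC/Clang builtin `__builtin_ctz` in place of `ffs()`. `bit_scan` is a hypothetical name.

```c
#include <assert.h>

/* Return the index of the lowest set bit and clear it from *mask.
 * *mask must be non-zero (__builtin_ctz(0) is undefined). */
static int bit_scan(unsigned *mask)
{
   assert(*mask != 0);
   const int i = __builtin_ctz(*mask);
   *mask &= *mask - 1;   /* clear the lowest set bit */
   return i;
}
```

Typical use is `while (mask) { int i = bit_scan(&mask); ... }`, which visits each set bit once, from lowest to highest.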
Brian Paul cf9bb9acac util: update some assertions in util_resource_copy_region()
To cope with copies of compressed images which are not multiples of
the block size.  Suggested by Jose.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-13 13:30:19 -06:00
Jan Vesely 1fb4179f92 vl: Fix trivial sign compare warnings
v2: add whitespace fixes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
[Emil Velikov: squash a few more whitespace issues]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Rob Herring 112e988329 Android: move libdrm settings to top-level Android.common.mk
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:

external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;

HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.

Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2016-06-13 15:31:29 +01:00
Jan Vesely ace70aedcf gallivm: Fix trivial sign warnings
v2: include whitespace fixes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-06-13 09:23:09 -04:00
Brian Paul dd4be2e19a util: update util_resource_copy_region() for GL_ARB_copy_image
This primarily means adding support for copying between compressed
and uncompressed formats.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-06-10 15:50:04 -06:00
Anuj Phogat 466b320163 gallium: Fix region overlap conditions for rectangles with a shared edge
From the OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY0) and (srcX1, srcY1)
to the destination rectangle bounded by the locations (dstX0, dstY0)
and (dstX1, dstY1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."

So rectangles that share only an edge shouldn't be treated as overlapping:
 -----------
|           |
 ------- ---
|       |   |
|       |   |
 ------- ---

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-06-10 14:35:21 -07:00
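The half-open rule quoted in this commit reduces to strict comparisons. The following sketch works under that assumption. `struct rect` and `rects_overlap` are hypothetical and are not the gallium helper that was actually fixed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rectangle: lower bounds inclusive, upper bounds exclusive. */
struct rect {
   int x0, y0;   /* inclusive */
   int x1, y1;   /* exclusive */
};

static bool rects_overlap(const struct rect *a, const struct rect *b)
{
   /* Strict '<' because x1/y1 are exclusive: touching edges don't overlap. */
   return a->x0 < b->x1 && b->x0 < a->x1 &&
          a->y0 < b->y1 && b->y0 < a->y1;
}
```

The test mirrors the diagram. The top and bottom rectangles touch only along an edge, and so do the two bottom ones, so neither pair overlaps.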
Dave Airlie 1584918996 gallivm: more 64-bit integer prep work.
This converts one other place to using the new helper.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:30 +10:00
Dave Airlie e5c57824ec gallivm: make non-float return code bitcast consistent.
This just uses the same form across the fetches.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:17 +10:00
Dave Airlie 3b97e50b9a gallium/gallivm: use 64-bit test instead of doubles.
This just makes some generic code that currently emits double
suitable for emitting 64-bit values.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:44:13 +10:00
Dave Airlie 213ab8db87 gallium/tgsi: add 64-bitness type check function.
Currently this just checks for doubles, but we'll convert users to it,
making it easier to add 64-bit integers.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-11 06:43:45 +10:00
Leo Liu 2ad443e4cc vl/dri3: support receiving new pixmap for front buffer
With GLX in gstreamer-vaapi, the temporary pixmap for the front buffer is
renewed every frame, so when we receive a new pixmap, we should get a new
front buffer for it.

This also fixes Totem player playback corruption.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 11:24:24 -04:00
Leo Liu 0ef8500aab vl/dri3: get Makefile properly
In the original commit, the conditional "if HAVE_DRI3" was in Makefile.sources.
That file is shared with SCons, which is not able to parse this conditional,
so the SCons build failed. Jose quickly suggested two approaches and a quick
fix using the second one; thanks, Jose, for the solutions and fixes.

This patch is Jose's first approach, which is more proper, because the
DRI3 C file should not be built at all when DRI3 is not enabled.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-06-10 11:24:19 -04:00
Jose Fonseca 2b4cee0571 gallivm: Never emit llvm.fmuladd on LLVM 3.3.
Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't
handle llvm.fmuladd and instead falls back to a C function, which in
turn causes a segfault on Windows.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 16:17:04 +01:00
Jose Fonseca 320d1191c6 gallivm: Use llvm.fmuladd.*.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 13:47:35 +01:00
Jose Fonseca 9e8edfa190 util,gallivm: Explicitly enable/disable fma attribute.
As suggested by Roland Scheidegger.

Use the same logic as f16c, since fma requires VEX encoding.

But disable FMA on LLVM 3.3 without MCJIT.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-10 13:47:35 +01:00
Nayan Deshmukh f24eb5a178 vl: Apply luma key filter before CSC conversion
Apply the luma key filter to the YCbCr values during the CSC conversion
in the video buffer shader. The initial values of max and min luma are set
to opposite values to disable the filter initially, and will be set when
enabling it.

Add extra parameters min and max luma for the luma key filter to
vl_compositor_set_csc_matrix in va and xvmc. Setting them
to the opposite values 1.0f and 0.0f respectively won't affect the CSC
conversion.

v2: -Squash 1, 2 and 3 into one patch to avoid breaking the build of
     other components. (Christian)
    -use ureg_swizzle. (Christian)
    -change name of the variables. (Christian)

v3: -Squash all patches into one to avoid breaking the build. (Emil)
    -wrap functions properly. (Emil)
    -use 0.0f and 1.0f instead of 0.f and 1.f respectively. (Emil)

v4: -Divide it into two patches: one which introduces the functionality
     and assigns dummy values to the changed functions, and a second which
     implements the lumakey filter. (Christian)
    -use ureg_scalar instead of ureg_swizzle. (Christian)

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-06-09 14:23:07 +02:00
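Here is a scalar sketch of the luma key behaviour described in the commit above. It assumes that pixels whose luma lies inside [min, max] become transparent and everything else stays opaque. The inclusive bounds and the name `luma_key_alpha` are assumptions. The actual filter is implemented in the TGSI video buffer shader.

```c
#include <assert.h>

/* Hypothetical model: luma inside [min_luma, max_luma] is keyed out
 * (alpha 0.0f); everything else stays opaque (alpha 1.0f). */
static float luma_key_alpha(float luma, float min_luma, float max_luma)
{
   return (luma >= min_luma && luma <= max_luma) ? 0.0f : 1.0f;
}
```

This model also shows why min = 1.0f and max = 0.0f disables the filter: the range is empty, so no pixel is keyed.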
Nicolai Hähnle d3a584defe tgsi/scan: add uses_derivatives (v2)
v2:
- TG4 does not calculate derivatives (Ilia)
- also handle SAMPLE* instructions (Roland)

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-07 23:45:17 +02:00
Ilia Mirkin 30684b50d7 gallium: add VOTE_* opcodes to implement GL_ARB_shader_group_vote
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-06-06 20:49:28 -04:00
Charmaine Lee 627e975896 tgsi: fix mixed data type comparison in tgsi_point_sprite.c
Cast the unsigned semantic index to int before comparing it
to max_generic. Otherwise max_generic, which is initialized to -1,
is converted to unsigned int before the comparison, causing a wrong
semantic index to be assigned to a shader output.

Fixes the assert running TurboCAD_gl.trace. (VMware bug 1667265)

Also tested with glretrace, mesa demos pointblast, spriteblast and pointcoord.

v2: use the original max_generic variable but add the (int) cast
    to the semantic index, as suggested by Brian.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-06-06 10:20:45 -06:00
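This sketch reproduces the pitfall fixed in tgsi_point_sprite.c on a small scale. When an `unsigned` is compared with an `int`, the `int` is converted to `unsigned`, so -1 becomes `UINT_MAX`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy form: max_generic is converted to unsigned, so -1 becomes
 * UINT_MAX and no index ever compares greater (-Wsign-compare warns). */
static bool greater_than_max_buggy(unsigned semantic_index, int max_generic)
{
   return semantic_index > max_generic;
}

/* Fixed form: cast the index so both sides are compared as int. */
static bool greater_than_max_fixed(unsigned semantic_index, int max_generic)
{
   return (int)semantic_index > max_generic;
}
```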
Lars Hamre 4163c71010 tgsi: use truncf in micro_trunc
Switches to using truncf in micro_trunc.

Fixes the following piglit tests (for softpipe):

/spec/glsl-1.30/execution/built-in-functions/...
fs-trunc-float
fs-trunc-vec2
fs-trunc-vec3
fs-trunc-vec4
vs-trunc-float
vs-trunc-vec2
vs-trunc-vec3
vs-trunc-vec4

/spec/glsl-1.50/execution/built-in-functions/...
gs-trunc-float
gs-trunc-vec2
gs-trunc-vec3
gs-trunc-vec4

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-06-06 15:56:28 +02:00
Marek Olšák ada3d8f31e gallium/u_suballoc: allow different alignment for each allocation
Just move the alignment parameter from u_suballocator_create
to u_suballocator_alloc.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2016-06-04 15:42:33 +02:00
Rob Clark 228b2b36f4 gallium/util: remove u_staging
It's unused, and removing it fixes a couple of Coverity warnings: CID 1362171, 1362170

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2016-06-02 15:44:07 -04:00
Nicolai Hähnle d9893feb2c gallium/cso: allow saving the first fragment shader image slot
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:15 +02:00
Nicolai Hähnle fc0352ff9c gallium/u_inlines: allow NULL src in util_copy_image_view
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-06-01 22:37:12 +02:00
Marek Olšák 9d881cc0ac gallium/util: add util_texrange_covers_whole_level from radeon
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-06-01 17:35:30 +02:00
Marek Olšák 921ab0028e gallium/u_blitter: do GL-compliant integer resolves
The GL spec has been clarified and the new rule says we should just
copy 1 sample.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-31 16:48:53 +02:00
Frederic Devernay cee459d84d gallivm: initialize init_native_targets_once_flag correctly
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2016-05-30 16:13:52 +02:00
Brian Paul 747754f027 gallium/util: another s/unsigned/enum pipe_prim_type/ for clang
Trivial.
2016-05-27 18:42:21 -06:00
Brian Paul 8beb6f3c9c gallium/util: another unsigned -> enum pipe_prim_type change
gcc didn't warn about the unsigned / enum pipe_prim_type mismatch
between the .c and .h file.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-27 17:55:05 -06:00
Roland Scheidegger 9247570d42 gallivm: eliminate an unnecessary AND with unorm lerps
Instead of doing an add and then masking out the upper bits, we can
simply do an add with a half-wide type (this, of course, assumes
the hw can actually do it...), so we'll get the required zero
in the upper bits automatically.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-27 19:11:28 +02:00
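Here is a scalar model of the trick in this gallivm commit. It assumes 8-bit unorm values held in 16-bit lanes. Adding in the 8-bit type wraps modulo 256, and widening that result back leaves the upper byte zero, which is exactly what the mask used to produce. The function names are hypothetical, and the real code works on LLVM vector types.

```c
#include <assert.h>
#include <stdint.h>

/* Old approach: add in the 16-bit lane, then mask off the upper byte. */
static uint16_t add_then_mask(uint16_t a, uint16_t b)
{
   return (uint16_t)((a + b) & 0x00ffu);
}

/* New approach: add in the half-wide (8-bit) type; zero-extending the
 * result back to 16 bits leaves the upper byte zero, so no AND is needed. */
static uint16_t add_half_wide(uint16_t a, uint16_t b)
{
   uint8_t sum = (uint8_t)((uint8_t)a + (uint8_t)b);   /* wraps mod 256 */
   return sum;
}
```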
Roland Scheidegger 17d685c426 gallium/util: use enum pipe_prim_type instead of unsigned some more
There were complaints from a mingw build:
u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’
to ‘pipe_prim_type’ [-fpermissive]

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-27 19:11:28 +02:00
Rob Clark 4f98c94be7 gallium/util: fix build break
Missing #include caused build breaks after 21a3fb9cd.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2016-05-26 20:59:08 -04:00
Brian Paul 1ec45a1948 gallium/util: use enum pipe_prim_type in u_prim.h functions
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul 7a49b41436 util/indices: move duplicated assignments out of switch cases
Spotted by Roland.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul a25ae485a6 util/indices,svga: s/unsigned/enum pipe_prim_type/
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul 21a3fb9cd8 util: s/unsigned/enum pipe_resource_usage/ for buffer usage variables
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:18 -06:00
Brian Paul 0f983e1793 util/indices: implement unfilled (tri->line) conversion for adjacency prims
Tested with new piglit gl-3.2-adj-prims test.

v2: re-order trisadj and tristripadj code, per Roland.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul d6c2c7d710 util/indices: implement provoking vertex conversion for adjacency primitives
Tested with new piglit gl-3.2-adj-prims test.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul 479d364c39 util/indices: assert that the incoming primitive is a triangle type
The unfilled index translator/generator functions should only be
called when the primitive mode is one of the triangle types.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul 26de558072 util/indices: formatting, whitespace fixes in u_unfilled_indices.c
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Brian Paul 24eadb4810 util/indices: improve comments in u_indices.h
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-05-26 17:44:17 -06:00
Rob Clark 6e51fe75a4 tgsi: fix coverity out-of-bounds warning
CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN)34. overrun-local:
Overrunning array of 2 16-byte elements at element index 2 (byte offset
32) by dereferencing pointer &inst.Dst[i].

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 15:17:49 -04:00
Rob Clark 3d66ba971e tgsi: fix out of bounds access
Not sure why coverity calls this an out-of-bounds read vs out-of-bounds
write.

CID 1358920 (#1 of 1): Out-of-bounds read (OVERRUN)9. overrun-local:
Overrunning array r of 3 16-byte elements at element index 3 (byte
offset 48) using index chan (which evaluates to 3).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 15:17:49 -04:00
Lars Hamre c626a86586 gallium/tgsi: use _mesa_roundevenf in micro_rnd
Fixes the following piglit tests (for softpipe):

/spec/glsl-1.30/execution/built-in-functions/...
fs-roundeven-float
fs-roundeven-vec2
fs-roundeven-vec3
fs-roundeven-vec4
vs-roundeven-float
vs-roundeven-vec2
vs-roundeven-vec3
vs-roundeven-vec4

/spec/glsl-1.50/execution/built-in-functions/...
gs-roundeven-float
gs-roundeven-vec2
gs-roundeven-vec3
gs-roundeven-vec4

Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-26 07:59:15 -06:00
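Round-half-to-even differs from C's `roundf()`, which rounds halfway cases away from zero (2.5f becomes 3.0f). The following libm-free sketch implements the ties-to-even rule that the roundeven piglit tests exercise. `round_half_even` is a hypothetical name, and this is not `_mesa_roundevenf()` itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round to nearest, ties to even. Results in (-0.5, 0) come back as
 * +0.0f rather than -0.0f in this simplified model. */
static float round_half_even(float x)
{
   if (x != x)                                  /* NaN passes through */
      return x;
   if (x >= 8388608.0f || x <= -8388608.0f)     /* |x| >= 2^23: already integral */
      return x;
   int32_t i = (int32_t)x;                      /* truncates toward zero */
   float frac = x - (float)i;                   /* exact, in (-1, 1) */
   if (frac > 0.5f || (frac == 0.5f && (i & 1)))
      i += 1;
   else if (frac < -0.5f || (frac == -0.5f && (i & 1)))
      i -= 1;
   return (float)i;
}
```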
Giuseppe Bilotta 8c00fe3970 scons: whitespace cleanup
This text transformation was done automatically via the following shell
command:

$ find -name SCons\* -exec sed -i s/\\s\\+$// '{}' \;

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-25 12:23:12 -06:00
Brian Paul 9690ab0cdf tgsi: print TGSI_PROPERTY_NEXT_SHADER value as string, not an integer
Print "GEOM" instead of "2", for example.

v2: also update the text parsing code, per Ilia.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-25 07:21:23 -06:00
Brian Paul 2b773fcf00 tgsi: s/6/PIPE_SHADER_TYPES/ for tgsi_processor_type_names array size
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-25 07:21:23 -06:00
Emil Velikov a155cdaace vl/drm: don't call close(-1) in vl_drm_screen_create error path
Analogous to previous commits.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2016-05-23 12:07:47 +01:00
Dave Airlie e6d9389366 tgsi: remove culldist semantic.
This isn't used anymore in the tree; cull distances
are part of the clipdist semantic. We could in theory
rename it, but I'm not sure there is much point, and
I'd have to be careful with virgl.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 11:03:44 +10:00
Dave Airlie d17062a40e draw: stop using CULLDIST semantic.
The way the HW works doesn't really fit with having
two semantics for this.

The GLSL compiler emits two vec4s and two properties;
this makes draw use those instead of the CULLDIST semantic.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-05-23 11:03:40 +10:00
Axel Davy 52cb8e33c3 gallium/util: Implement util_format_translate_3d
This is the equivalent of util_format_translate, but for volumes.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-05-18 23:37:14 +02:00
Brian Paul 5888c47cc9 cso: remove / add some comments
Signed-off-by: Brian Paul <brianp@vmware.com>
2016-05-17 19:20:36 -06:00
Jan Vesely 47b390fe45 Treewide: Remove Elements() macro
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2016-05-17 15:28:04 -04:00
Jose Fonseca cf010de6ee vl/dri: Move the DRI3 check out of sources include into C.
Fixes SCons build.

Trivial.  Built locally with SCons and autotools.
2016-05-16 21:50:43 +01:00
Leo Liu c122c74dca vl/dri3: implement functions for get and set timestamp
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 9f50a79b8f vl/dri3: handle PresentCompleteNotify event
and get timestamp calculated based on the event's reply

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 8d7ac0a4e4 vl/dri3: implement DRI3 BufferFromPixmap
We also need to render to the front buffer of a temporary X pixmap;
this is the case when using OpenGL as the video output for VA-API.
The basic implementation is to pass the pixmap ID to the X server,
which then returns a dma-buf fd; we get the buffer object
through this dma-buf fd.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 858b329c2c vl/dri3: add support for resizing
When the drawable size changes, a PresentConfigureNotify event is
emitted; handle that event by re-allocating the resized buffer.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 96580ad593 vl/dri3: implement function for getting the dirty area
This will clear the presentation area not covered by video content.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu b0bd908284 vl/dri3: implement function for flush frontbuffer
Request drawable content in pixmap by calling DRI3 PresentPixmap,
and handle PresentIdleNotify event.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu e1223282db vl/dri3: add back buffers support
This implements DRI3 PixmapFromBuffer. Create buffer objects, associate
each with a dma-buf fd, and then pass this fd with a pixmap ID to the
X server to create a pixmap object; also add a function for waiting
on events.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 69ba9be4d2 vl/dri3: implement flushing for queued events
also add a placeholder for Present event handling

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 758b1bbaa7 vl/dri3: register present events
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 672e8d5e7e vl/dri3: set drawable geometry
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu 12e5220e34 vl/dri3: add DRI3 support and implement create and destroy
Put the required functions in place for the implementation, create the
screen with the device fd returned from the X server, and fall back to
DRI2 under certain conditions.

v2: -organize the error out path (Axel)
    -squash previous patch 1 and 2 into one (Emil)

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-05-16 16:28:51 -04:00
Leo Liu bd9ae72459 vl/dri: fix close fd error out
fd should be set to -1 only if it got closed by pipe_loader_release.

Signed-off-by: Leo Liu <leo.liu@amd.com>
2016-05-12 18:26:48 -04:00
Tim Rowley 2785f2f2d7 swr: properly expose compressed format support
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-05-12 14:12:18 -05:00
Rob Clark 425dc4c4b3 gallium: refactor pipe_shader_state to support multiple IR's
The goal is to allow the pipe driver to request something other than
TGSI, but detect whether what it is getting is TGSI or what it requested.
The pipe drivers will always have to support TGSI (and convert that into
whatever it is that they prefer), but in some cases we should be able to
skip the TGSI intermediate step (such as glsl->nir vs glsl->tgsi->nir).

I think pipe_compute_state should get similar treatment.  Currently,
afaict, it has one user and one consumer, which has allowed it to be
sloppy wrt. supporting alternative IR's.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-05-11 12:20:11 -04:00
Roland Scheidegger 430797843a gallivm: improve dumping of bitcode
Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode.
Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple
modules can be dumped (although it might still overwrite previous modules;
in particular, the modules from draw tend to always have the same name).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-11 04:43:35 +02:00
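This is a sketch of the file naming described in the dumpbc commit. It assumes the module name is available as a C string. `bitcode_filename` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "ir_<modulename>.bc" into buf. Returns 1 if the name fit and
 * 0 if it was truncated. Modules sharing a name map to the same file. */
static int bitcode_filename(char *buf, size_t size, const char *module_name)
{
   int n = snprintf(buf, size, "ir_%s.bc", module_name);
   return n >= 0 && (size_t)n < size;
}
```

Because the file name depends only on the module name, two modules with the same name still overwrite each other. This is the caveat the commit mentions for the draw modules.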
Roland Scheidegger e4cf8717de gallivm: print declarations of intrinsics with GALLIVM_DEBUG=ir
Those aren't really interesting; however, outputting them is helpful when
trying to feed the IR to llvm llc (or opt) for debugging.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-10 17:08:16 +02:00
Roland Scheidegger 5c200894c8 gallivm: use InternalLinkage instead of PrivateLinkage for texture functions
At least with MCJIT the disassembler will crash otherwise when trying to
disassemble such functions.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-05-10 17:08:16 +02:00
Roland Scheidegger 8b66e2647d gallivm: disable avx512 features
We don't target this yet, and some llvm versions incorrectly enable it based
on cpu string, causing crashes.
(This is a losing battle, though: it is pretty much guaranteed that when the
next new feature comes along, LLVM will mistakenly enable it on some future
CPU, so we would have to proactively disable all new features as LLVM adds them.)

This should fix https://bugs.freedesktop.org/show_bug.cgi?id=94291 (untested)

Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

CC: <mesa-stable@lists.freedesktop.org>
2016-05-10 17:08:16 +02:00