Commit Graph

29279 Commits

Author SHA1 Message Date
Brian Paul 88a618ce86 tgsi: trivial build fix for MSVC
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-24 14:16:07 -07:00
Samuel Pitoiset 6dbb8d12a8 nv50/ir: do not perform global membar for shared memory
Shared memory is local to CTA, thus we should only wait for
prior memory writes which are visible to other threads in
the same CTA, and not at global level. This should speedup
compute shaders which use shared memory.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-24 22:51:54 +02:00
Axel Davy eed605a473 st/nine: Fix locking CubeTexture surfaces.
Only one face of Cubetextures was locked when in DEFAULT Pool.
Fixes:
https://github.com/iXit/Mesa-3D/issues/129

CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-24 21:56:44 +02:00
Axel Davy fe7bb46134 st/nine: Fix mistake in Volume9 UnlockBox
In the format fallback path,
the height was used instead of the depth.

CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org>

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-24 21:56:44 +02:00
Axel Davy 942778099e st/nine: Use align_calloc instead of align_malloc
We are not sure exactly what needs to be 0 initialized,
but we are missing some cases. 0 initialize all our current
aligned allocation.

Fixes Tree of Savior visual issues.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-24 21:56:44 +02:00
Axel Davy 54010cf8b6 gallium/util: Add align_calloc
Add implementation for align_calloc,
which is align_malloc + memset.

v2: add if (ptr) before memset.
Fix indentation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:56:44 +02:00
Axel Davy 25beccb379 st/nine: Fix leak with integer and boolean constants
Leak introduced by:
a83dce0128

The patch also moves the part to
release changed.vs_const_i and changed.vs_const_b
before the if (!cb.buffer_size) check,
to avoid reuploading every draw call if
integer or boolean constants are dirty, but the shaders
use no constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
CC: "13.0" <mesa-stable@lists.freedesktop.org>
2016-10-24 21:56:44 +02:00
Marek Olšák f35b1d156b tgsi/scan: scan texture offset operands
This seems important considering how much we depend on some of the flags.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:38 +02:00
Marek Olšák a2f98dff14 tgsi/scan: move src operand processing into a separate function
the next commit will need this

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:36 +02:00
Marek Olšák 72267a25db tgsi/scan: get information about shader buffer usage
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:35 +02:00
Marek Olšák d89890d000 tgsi/scan: handle indirect image indexing correctly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:33 +02:00
Marek Olšák ac37720f51 tgsi/scan: don't treat RESQ etc. as memory instructions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:30 +02:00
Marek Olšák f095a4eb17 tgsi/scan: get information about indirect 2D file access
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:28 +02:00
Marek Olšák 965a5f1810 tgsi/scan: get information about indirect CONST access
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-24 21:41:26 +02:00
Samuel Pitoiset d588e4f192 nv50/ir: display OP_BAR subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-24 18:53:45 +02:00
Ilia Mirkin 7b7eb7170d nv50/ir: it appears that OP_DISCARD can't take a join modifier
nvdisasm does not print a .S even though the bit is set.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-22 12:02:35 -04:00
Ilia Mirkin adad576bfc nv50/ir: use levelZero for non-frag tex/txp ops
radeonsi also does the same thing. I suspect that this is likely to be a
no-op in reality, but it brings nouveau code closer to what the blob
produces. Plus it makes sense to not try to do auto-derivatives on this.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-22 12:02:35 -04:00
Ilia Mirkin 3fdeb7c983 gallium: add PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS
This allows the driver to signal that it can't handle random
interleaving of attributes across buffers. This is required for
ARB_transform_feedback3, and it's initialized to whatever the previous
value of PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME was except for nv50 where
it is disabled. Note that the proprietary drivers never expose
ARB_transform_feedback3 on any GT21x's (where nouveau previously did),
and after some effort I was unable to get it to work.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-22 12:02:35 -04:00
Samuel Pitoiset 6e08f3e96c nvc0/ir: remove outdated comment about SHLADD
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-22 14:50:17 +02:00
Eric Anholt 8ff4182876 vc4: Avoid making temporaries for assignments to NIR registers.
Getting stores to NIR regs to not generate new MOVs is tricky, since the
result we're trying to store into the NIR reg may have been from a
conditional update of a temp, or a series of packed writes.  The easiest
solution seems to be to require that nir_store_dest()'s arg comes from an
SSA temp.

This causes us to put in a few more temporary MOVs in the NIR SSA dest
case, but copy propagation successfully cleans those up.

The shader-db change is modest:

total instructions in shared programs: 93774 -> 93598 (-0.19%)
instructions in affected programs:     14760 -> 14584 (-1.19%)
total estimated cycles in shared programs: 212135 -> 211946 (-0.09%)
estimated cycles in affected programs:     27005 -> 26816 (-0.70%)

but I was seeing patterns in some register-allocation failures in DEQP
tests that looked like the extra MOVs would increase maximum register
pressure in loops.  Some debug code indicates that that's not the case,
though I'm still a bit confused by that result.
2016-10-21 14:12:22 -07:00
Eric Anholt a689b8b9df vc4: Add a comment with discussion of how simulation works. 2016-10-21 14:12:22 -07:00
Eric Anholt 83ffb607b7 vc4: Move simulator winsys mapping and tracking to the simulator.
One tiny hack is left in vc4_bufmgr.c for what kind of mapping we got so
that we can free it.
2016-10-21 14:12:22 -07:00
Eric Anholt 1c38ee380d vc4: Move simulator memory management to a u_mm.h heap.
Now we aren't limited to 256MB total allocated across a driver instance,
just 256MB at one time.  We're still copying in and out, which should get
fixed.
2016-10-21 14:12:22 -07:00
Eric Anholt 9f75522382 vc4: Move simulator globals into a struct.
I would like to put a couple more things in here, so it's time to package
it up.
2016-10-21 14:12:22 -07:00
Eric Anholt 78087676c9 vc4: Restructure the simulator mode.
Rather than having simulator mode changes scattered around vc4_bufmgr.c
and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which
then dispatches to a corresponding implementation.

This will give the simulator support a centralized place to do tricks like
storing most BOs directly in simulator memory rather than copying in and
out.

This leaves special casing of mmaping BOs and execution, because of the
winsys mapping.
2016-10-21 14:12:22 -07:00
Eric Anholt 1d7874fa7b vc4: Fix termination of the initial scan for branch targets.
The loop is scanning until the original max_ip (size of the BO), but we
want to not examine any code after the PROG_END's delay slots.  There was
a block trying to do that, except that we had some early continue
statements if the signal wasn't a PROG_END or a BRANCH.

The failure mode would be that a valid shader is rejected because some
undefined memory after the PROG_END slots is parsed as a branch and the
rest of its setup is illegal.  I haven't seen this in the wild, but
valgrind was complaining and the new userland simulator code started
triggering it.
2016-10-21 14:12:06 -07:00
Nicolai Hähnle 17353ef043 radeonsi: fix a regression in si_eliminate_const_output
A constant value of float type is not necessarily a ConstantFP: it could also
be a constant expression that for some reason hasn't been folded.

This fixes a regression in GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2
that was introduced by commit 3ec9975555.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-21 09:59:26 +02:00
Ilia Mirkin 8cf0f05713 nv50,nvc0: don't keep track of whether fb rt0 is integer-only
This reverts commits 1af0641db3 and
a6ad49cbbd.

st/mesa adjusts the rasterizer state for us now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-21 02:28:26 -04:00
Samuel Pitoiset 42273edf79 nvc0: do not break 3D state by pushing MS coordinates on Fermi
Long story short, 3D and CP are aliased on Fermi and initializing
compute after pushing the MS sample coordinate offsets seems to
corrupt 3D state for weird reasons.

I still don't have the faintest clue what is going on, but
this seems to only affect Fermi generation. A possible fix
could be to use two different channels, one for 3D and one
for CP.

This fixes a bunch of regressions pinpointed by piglit.

Fixes: "nvc0: fix up image support for allowing multiple samples"
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-20 19:59:08 +02:00
Samuel Pitoiset 24e15aa198 nvc0: translate compute shaders at program creation
This makes shader-db reports results for compute shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-20 19:46:18 +02:00
Marek Olšák c2a602d21a gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2016-10-20 17:45:23 +02:00
Marek Olšák f19f71830b radeonsi: fix build of si_eliminate_const_vs_outputs on LLVM <= 3.8
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-20 11:07:50 +02:00
Marek Olšák 2db56434d4 gallivm: add wrappers for missing functions in LLVM <= 3.8
radeonsi needs these.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-20 11:07:50 +02:00
Nicolai Hähnle 4a2dbfff05 radeonsi: fix 64-bit loads from LDS
Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among
others.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-20 10:37:07 +02:00
Ilia Mirkin cd45d758ff nv50/ir: process texture offset sources as regular sources
With ARB_gpu_shader5, texture offsets can be any source, including TEMPs
and IN's. Make sure to process them as regular sources so that we pick
up masks, etc.

This should fix some CTS tests that feed offsets directly to
textureGatherOffset, and we were not picking up the input use, thus not
advertising it in the shader header.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dave Airlie <airlied@redhat.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-19 21:02:01 -04:00
Ilia Mirkin 313fba5ee1 nv50,nvc0: avoid reading out of bounds when getting bogus so info
The state tracker tries to attach the info to the wrong shader. This is
easy enough to protect against.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
2016-10-19 21:02:01 -04:00
Samuel Pitoiset 2b6e04e91f nvc0/ir: simplify predicate logic for GK104 atomic operations
The predicate is always CC_NOT_P as defined in
processSurfaceCoordsNVE4(), so we only want to emit OR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-19 23:53:57 +02:00
Samuel Pitoiset 974ab614d3 nvc0/ir: remove useless NVC0LoweringPass::gMemBase
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-19 23:53:48 +02:00
Samuel Pitoiset 03dc87caab nv50/ir: print CCTL subops in debug mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-19 23:53:39 +02:00
Marek Olšák 3ec9975555 radeonsi: eliminate trivial constant VS outputs
These constant value VS PARAM exports:
- 0,0,0,0
- 0,0,0,1
- 1,1,1,0
- 1,1,1,1
can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports
can be removed from the IR to save export & parameter memory.

After LLVM optimizations, analyze the IR to see which exports are equal to
the ones listed above (or undef) and remove them if they are.

Targeted use cases:
- All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them
  are unused by PS (such as Witcher 2 below).
- VS output arrays with unused elements that the GLSL compiler can't
  eliminate (such as Batman below).

The shader-db deltas are quite interesting:
(not from upstream si-report.py, it won't be upstreamed)

PERCENTAGE DELTAS    Shaders PARAM exports (affected only)
batman_arkham_origins    589  -67.17 %
bioshock-infinite       1769   -0.47 %
dirt-showdown            548   -2.68 %
dota2                   1747   -3.36 %
f1-2015                  776   -4.94 %
left_4_dead_2           1762   -0.07 %
metro_2033_redux        2670   -0.43 %
portal                   474   -0.22 %
talos_principle          324   -3.63 %
warsow                   176   -2.20 %
witcher2                1040  -73.78 %
----------------------------------------
All affected             991  -65.37 %  ... 9681 -> 3353
----------------------------------------
Total                  26725  -10.82 %  ... 58490 -> 52162

v2: treat Undef as both 0 and 1

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
2016-10-19 22:21:46 +02:00
Samuel Pitoiset 041da0ae81 nv50/ir: silent TGSI_PROPERTY_FS_DEPTH_LAYOUT
Found that information message while replaying a trace from
Metro 2033 Redux. Mark that property as useless for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-19 21:02:50 +02:00
Marek Olšák a2ea653a49 radeonsi: remove cb0_is_integer handling
st/mesa does this for us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-19 19:26:30 +02:00
Roland Scheidegger aeceec54a8 draw: improve vertex fetch (v2)
The per-element fetch has quite some calculations which are constant,
these can be moved outside both the per-element as well as the main
shader loop (llvm can figure out it's constant mostly on its own, however
this can have a significant compile time cost).
Similarly, it looks easier swapping the fetch loops (outer loop per attrib,
inner loop filling up the per vertex elements - this way the aos->soa
conversion also can be done per attrib and not just at the end though again
this doesn't really make much of a difference in the generated code). (This
would also make it possible to vectorize the calculations leading to the
fetches.)
There's also some minimal change simplifying the overflow math slightly.
All in all, the generated code seems to look slightly simpler (depending
on the actual vs), but more importantly I've seen a significant reduction
in compile times for some vs (albeit with old (3.3) llvm version, and the
time reduction is only really for the optimizations run on the IR).
v2: adapt to other draw change.

No changes with piglit.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger 0942fe548e draw: improved handling of undefined inputs
Previous attempts to zero initialize all inputs were not really optimal
(though no performance impact was measurable). In fact this is not really
necessary, since we know the max number of inputs used.
Instead, just generate fetch for up to max inputs used by the shader,
directly replacing inputs for which there was no vertex element by zero.
This also cleans up key generation, which previously would have stored
some garbage for these elements.
And also drop the assertion which indicates such bogus usage by a
debug_printf (the whole point of initializing the undefined inputs was to
make this case safe to handle).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger d1b4a3451e gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf
Compilation to actual machine code can easily take as much time as the
optimization passes on the IR if not more, so print this out too.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Roland Scheidegger 6f2f0daeb4 gallivm: Use native packs and unpacks for the lerps
For the texturing packs, things looked pretty terrible. For every
lerp, we were repacking the values, and while those look sort of cheap
with 128bit, with 256bit we end up with 2 of them instead of just 1 but
worse, plus 2 extracts too (the unpack, however, works fine with a
single instruction, albeit only with llvm 3.8 - the vpmovzxbw).

Ideally we'd use more clever pack for llvmpipe backend conversion too
since we actually use the "wrong" shuffle (which is more work) when doing
the fs twiddle just so we end up with the wrong order for being able to
do native pack when converting from 2x8f -> 1x16b. But this requires some
refactoring, since the untwiddle is separate from conversion.

This is only used for avx2 256bit pack/unpack for now.

Improves openarena scores by 8% or so, though overall it's still pretty
disappointing how much faster 256bit vectors are even with avx2 (or
rather, aren't...). And, of course, eliminating the needless
packs/unpacks in the first place would eliminate most of that advantage
(not quite all) from this patch.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2016-10-19 01:44:59 +02:00
Brian Paul 8b731b8b03 svga: minor code improvements in svga_validate_pipe_sampler_view()
Use the 'texture' local var in more places.
Rename 'pFormat' to 'viewFormat'.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2016-10-18 16:16:26 -06:00
Boyuan Zhang 5567145d59 st/va: force to flush the last p frame in idr period
During dual instance encoding submission, if the second encode task and first
encode task have no reference dependency, e.g. p following with idr-frame,
there is a chance the second task will use for its reconstructed picture
buffer the same buffer used by first task for its reference/reconstructed
picture. In this case, buffer corruption may occur depending on encoding
speed. Fix is to force flush these two tasks separately to avoid race condition

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-18 15:16:34 -04:00
Marek Olšák 21af69e753 radeonsi: rename prefixes from radeon to si
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:08 +02:00
Marek Olšák 6e475fefa1 radeonsi: merge radeon_llvm_context and si_shader_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:06 +02:00
Marek Olšák 5ab25bb4ba radeonsi: import all TGSI->LLVM code from gallium/radeon
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:04 +02:00
Marek Olšák 4967cacdfa gallium/radeon: simplify initialization of 64-bit gallivm builders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:03 +02:00
Marek Olšák 502dad4dca gallium/radeon: remove unused radeon_llvm_reg_index_soa
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:41:01 +02:00
Marek Olšák 4e5d076fcf radeonsi: move LLVM ALU codegen into radeonsi
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-18 18:40:59 +02:00
Emil Velikov af7abc512c loader: remove loader_get_driver_for_fd() driver_type
Reminiscent from the pre-loader days, were we had multiple instances of
the loader logic in separate places and one could build a "GALLIUM_ONLY"
version.

Since that is no longer the case and the loaders (glx/egl/gbm) do not
(and should not) require to know any classic/gallium specific we can
drop the argument and the related code.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 17:06:29 +01:00
Ilia Mirkin 8c78fdb328 gm107/ir: fix bit offset of tex lod setting for indirect texturing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-18 09:56:14 -04:00
Ilia Mirkin ecea2f69ef gm107/ir: fix texturing with indirect samplers
The indirect handle has to come right after the coordinates, so if there
was a sample/bias/depth compare/offset, everything would end up being
shifted by one argument position.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-18 09:56:14 -04:00
Marek Olšák 34099894c3 gallium/tgsi: add missing #include
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-18 11:20:57 +02:00
Julien Isorce dbc8e18116 st/va: set default rt formats when calling vaCreateConfig
As specified in va.h, default value should be set on attributes
not present in the input list.

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
2016-10-18 08:44:14 +01:00
Nicolai Hähnle 9160b4d981 radeonsi: unify the constant load paths
Remove the split between direct and indirect.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:08:45 +02:00
Nicolai Hähnle 51f9b38ce8 radeonsi: fix indirect loads of 64 bit constants
This fixes GL45-CTS.compute_shader.fp64-case3.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-17 19:08:36 +02:00
Marek Olšák 74d145f4a8 radeonsi: shorten "shader->selector" to "sel" in si_shader_create
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-17 12:13:00 +02:00
Marek Olšák 2e74e8ead9 radeonsi: clear DB_RENDER_OVERRIDE
Vulkan doesn't set these fields even though it doesn't use HiS.
HiS is disabled by programming DB_SRESULTS_COMPARE_STATEn to 0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-17 12:13:00 +02:00
Axel Davy 9baf4505fb st/nine: Fix multisample limit check
Fixes regression introduced by
b560305687

The regression prevents some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-17 00:02:52 +02:00
Eric Anholt c61eb3c91c vc4: Fix fast clear color packing for 565.
Piglit didn't manage to cover this because fbo-clear-formats uses
scissors, so we don't get fast clearing.
2016-10-16 11:22:50 -07:00
Tobias Klausmann b7d9677de8 nv50/ir: constant fold OP_SPLIT
Split the source immediate value into new values and move them into the
original defs set by the split. Since we can only have up to 64-bit
immediates, this is largely beneficial for F64 (and, in the future, U64)
operations.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: always use U32, set newi for foldCount tracking]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-14 23:23:57 -04:00
Jose Fonseca c6d17701c8 pipe_loader_sw: Don't invoke Unix close() on Windows.
Trivial.
2016-10-14 16:29:04 +01:00
Emil Velikov 48267b730c gallium: annotate sw_driver_descriptor instance as const data
Already treated and handled as such.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov 792148f16a gallium: annotate drm_driver_descriptor instance as const data
Already treated and handled as such.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov c079a206ad gallium: rename drm_driver_descriptor::{, driver_}name
Historically we use "device name" for the name of the kernel module and
"driver name" for the dri/other driver.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov 9837cf13b1 gallium: remove unused drm_driver_descriptor::driver_name
Likely unused since day 1, although I've only checked back until the
st/dri unification with commit 29ca7d2c94 ("st/dri: merge dri/drm and
dri/sw backends")

Based on the comment, referencing drmOpenByName it's not something we
want to bring back.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Emil Velikov 0f031dcf11 gallium: fix drm_driver_descriptor::name comment
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-14 11:09:00 +01:00
Mark Thompson 0b241b7717 st/va: Fix H.264 PicOrderCnt value
TopFieldPicOrderCnt is exactly the PicOrderCnt value for a frame - see
H.264 section 8.2.1.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:52 +02:00
Mark Thompson 1edaa33135 st/va: Baseline profile is not supported
Constrained baseline profile is supported, so use that instead.  This
matches what the encoder already does (constraint_set1_flag is always
set in the output bitstream).

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:48 +02:00
Mark Thompson e0604eed9f st/va: Return surface formats depending on config chroma format
This makes the supported format actually match the configuration, and
allows the user to observe that NV12 is supported for video processing
where previously they couldn't (though it did always work if they
blindly tried to use it anyway).

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:44 +02:00
Mark Thompson e7c7ef3625 st/va: Save surface chroma format in config
Both YUV420 and RGB32 configurations are supported, so we need to be
able to distinguish which is being used.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:40 +02:00
Mark Thompson 8a931c83ba st/va: Return more useful config attributes
The encoder attributes are needed for a user of the encoder to be
able to configure it sensibly without internal knowledge.

Reviewed-by: Christian König <christian.koenig@amd.com>
2016-10-14 11:57:25 +02:00
Tim Rowley a42c22fdbf swr: [rasterizer core] don't construct pArContext on non-ar builds
Stops debug directory being created on non-ar builds.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley 29d07480b8 swr: [rasterizer core] remove WorkerWaitForThreadEvent bucket
Cause of bucket stop capture hang, as threads get stuck in level 1.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley ada27b503e swr: [rasterizer core] move binner functionality to separate file
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley f0a66c1da2 swr: [rasterizer scripts] add DEBUG_OUTPUT_DIR knob
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley ffd0224303 swr: [rasterizer core] fix comment typo
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley 4889922210 swr: [rasterizer core/sim] 8x2 backend + 16-wide tile clear/load/store
Work in progress (disabled).

USE_8x2_TILE_BACKEND define in knobs.h enables AVX512 code paths
(emulated on non-AVX512 HW).

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:14 -05:00
Tim Rowley bf1f46216c swr: [rasterizer archrast] fix event file issue with saving data
Also, tagging stats with draw id to correlate these events with
draw/dispatch events.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 23:39:13 -05:00
Eric Engestrom 827e038062 swr: [rasterizer common] fix assert index
Fixes: b3bd8bb611 ("swr: [rasterizer core] add support
       for "RAW" surface format")
CovID: 1373647
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-13 21:37:20 -05:00
Ilia Mirkin afb6dc53bf nv50: enable ARB_enhanced_layouts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-13 21:45:21 -04:00
Ilia Mirkin a6d6eff2e6 nvc0/ir: be more careful about preserving modifiers in SHLADD creation
src2 was being given the wrong modifier, and we were not properly
managing the modifier on the SHL source either.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-13 21:44:03 -04:00
Brian Paul b81546d43c tgsi: fix comment typo in tgsi_ureg.c
Trivial.
2016-10-13 17:38:49 -06:00
Eric Anholt 99d790538d vc4: Avoid loading from the texture during non-utile-aligned glTexImage().
Previously, the plan was "if the width/height we have to load/store isn't
the size the user is planning on writing, then we need to load the old
contents out beforehand to prevent writing back undefined".

However, when we're doing glTexImage() we often end up aligning the
width/height into the padding of the texture, and we don't actually
need to read out that padding.

Improves x11perf -aatrapezoid100 performance from ~460/sec to
~700/sec.
2016-10-13 14:27:30 -07:00
Axel Davy 0717cd975d st/nine: Fix possible segfault in surface ctor
Regression introduced by
ba0274c7d6

Check the resource exists before assigning it
a flag (and use This->base.resource instead
of pResource, since the former may have a newly
allocate resource, while the latter would be
NULL).

This should reintroduce the behaviour of previous
code.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-13 21:16:35 +02:00
Axel Davy 98b8ad61c6 st/nine: Remove useless code in nine_shader
Since 1604efa6fd,
lconsti and lconstb don't need to be initialized.

Remove some leftovers from the previous code (which
has now invalid use of ARRAY_SIZE on a pointer instead
of an array).

Reported by Coverity.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-13 21:16:35 +02:00
Axel Davy 197cdd1bbd gallium/os: Use unsigned integers for size computation
Use uint64_t instead of int64_t in the calculation,
as the result is uint64_t.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 21:16:35 +02:00
Samuel Pitoiset 4527222169 nvc0: enable ARB_enhanced_layouts
All ARB_enhanced_layouts piglit tests pass without any changes
in our compiler.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-13 21:13:34 +02:00
Marek Olšák 7dddf0b7ab radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings
The table was copied from the Vulkan driver. The comment lines are as long
as the table for cosmetic reasons.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák e12c1cab5d radeonsi: disable ReZ
This is a serious performance fix. Discovered by luck.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák d4d9ec55c5 radeonsi: implement TC-compatible HTILE
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.

Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.

This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.

I don't see a measurable increase in performance though.

(I tested Talos Principle and DiRT: Showdown, the latter is improved by
 0.5%, which is almost noise, and it originally used layered Z16,
 so at least we know that Z16 promoted to Z32F isn't slower now)

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Marek Olšák a077185ea9 gallium: add PIPE_RESOURCE_FLAG_TEXTURING_MORE_LIKELY
For performance tuning in drivers. It filters out window system
framebuffers and OpenGL renderbuffers.

radeonsi will use this to guess whether a depth buffer will be read
by a shader. There is no guarantee about what will actually happen.

This is a departure from PIPE_BIND flags which are defined to be strict
but they are useless in practice.

Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-13 19:00:51 +02:00
Nicolai Hähnle 761388a0eb radeonsi: fix regression in image atomics
Caused by a bad rebase when pushing commit 76a940893.
2016-10-13 16:04:16 +02:00
Nicolai Hähnle 76a940893d radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.*
Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-13 10:17:42 +02:00
Emil Velikov a4622305e6 swr: automake: add ar_eventhandlerfile_h.template to the tarball
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-12 18:55:22 +01:00
Ilia Mirkin a48a343c29 nvc0/ir: fix textureGather with a single offset
Recent fix for non-const offsets broke the case of a single offset (vs 4
offsets). The later code relies on the offs array to contain null values
to tell whether they should be added onto the srcs list.

Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-12 13:18:14 -04:00
Ilia Mirkin 300b5ad023 nv50/ir: copy over value's register id when resolving merge of a phi
The offset needs to be properly copied over to the phi value, otherwise
it will get assigned to the base of the merge instead of the proper
location.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-12 13:18:14 -04:00
Nicolai Hähnle 789119d212 st/mesa: enable ARB_enhanced_layouts and turn the cap on
v2: mark llvmpipe & softpipe properly as well (Jason Wood)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Nicolai Hähnle 2b460c750a tgsi/ureg: add ureg_DECL_output_layout
For specifying an exact location/component.

v2: change the order of parameters (Dave)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle 047a7c7a0b tgsi/ureg: add layout/component input declarations
v2: change the order of parameters (Dave)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle f9a01f3872 tgsi/scan: fix num_inputs/num_outputs for shaders with overlapping arrays
v2: remove a tautological left-over assert (Marek)

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
2016-10-12 18:50:10 +02:00
Nicolai Hähnle 700a571f89 gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS
This is a screen cap because drivers are expected to support it either
for all shader types or for none of them.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2016-10-12 18:50:10 +02:00
Tom Stellard b33cb709fd radeonsi: Use the new image load/store intrinsic signatures
This patch requires LLVM r284024 or newer.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 16:42:43 +00:00
Tom Stellard ff0df66e10 radeonsi: Add function for converting LLVM type to intrinsic string
The existing function only worked for integer types.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 16:42:07 +00:00
Tom Stellard a96a7eae04 radeonsi: Refactor image store/load intrinsic name creation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 16:42:07 +00:00
Marek Olšák d7e74b52bb winsys/amdgpu: fix infinite loop w/ RADEON_NOOP=1 caused by unsubmitted fences
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák e4bbab9022 radeonsi: fix R600_DEBUG=precompile for shader-db
radeonsi no longer supports pixel shaders without interpolation optimizations,
which led to assertion failures in si_shader_ps when running shader-db.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák 40e1f7e09b radeonsi: use TC write-back instead of full cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák 8cdce30cc2 radeonsi: implement TC L2 write-back (flush) without cache invalidation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Marek Olšák 65a4d55a9f radeonsi: don't invalidate VMEM L1 for memory barriers for index buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-12 18:29:40 +02:00
Samuel Pitoiset 87b06cab14 nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)
total instructions in shared programs :2286901 -> 2284473 (-0.11%)
total gprs used in shared programs    :335256 -> 335273 (0.01%)
total local used in shared programs   :31968 -> 31968 (0.00%)

                local        gpr       inst      bytes
    helped           0          41         852         852
      hurt           0          44          23          23

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-12 17:46:03 +02:00
Roland Scheidegger 7e86b2ddae draw: initialize shader inputs
This should make the code more robust if a shader tries to use inputs which
aren't defined by the vertex element layout (which usually shouldn't happen).

No piglit change.

Reviewed-by: Brian Paul <brianp@vmware.com>
2016-10-12 15:05:44 +02:00
Ilia Mirkin 389d6dedbe trace: add invalidate_resource callback
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-11 20:47:54 -04:00
Marek Olšák b425b57d1e radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-11 20:04:57 +02:00
Tim Rowley 9db9c61d26 swr: [rasterizer archrast] update proto file
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:23 -05:00
Tim Rowley 3805e40f32 swr: [rasterizer archrast] add support for stats files
Only stat and counter events are saved to the event files.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:23 -05:00
Tim Rowley f4684cdb5f swr: [rasterizer jitter] remove architecture override
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:23 -05:00
Tim Rowley 185a531206 swr: [rasterizer jitter] adjust jitmanager assert
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:48:17 -05:00
Tim Rowley eaec263427 swr: [rasterizer] eliminate unused label warnings on gcc
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley 12e6f4c879 swr: [rasterizer core] implement depth bounds test
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley 1b86c050ad swr: [rasterizer core] update/add formats
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley a907b7a5f7 swr: [rasterizer core] SwrStoreTiles api change
SwrStoreTiles now takes a mask of surfaces to store.  Reduces
overhead when storing multiple render targets.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley 5d5179a6c2 swr: [rasterizer scripts] add ENABLE_ASSERT_DIALOGS knob for windows
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley 07326d4006 swr: [rasterizer archrast] add mako template
Add template for generating code to save events to a file.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley e845eeb0be swr: [rasterizer core] disable cull for rect_list
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley b3bd8bb611 swr: [rasterizer core] add support for "RAW" surface format
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley 2966d9c691 swr: [rasterizer core] align Macrotile FIFO memory to SIMD size
Align and use streaming store instructions for BE fifo queues.
Provides slightly faster enqueue and doesn't pollute the caches.
Add appropriate memory fences to ensure streaming writes are
globally visible.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley 6b3691c876 swr: [rasterizer common] remove threadviz code
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Tim Rowley 2550b04179 swr: [rasterizer memory] split load/store for compile speed
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-10-11 11:22:04 -05:00
Nicholas Bishop 64435fd888 i915g: fix incorrect gl_FragCoord value
On Intel Pineview M hardware, the i915 gallium driver doesn't output
the correct gl_FragCoord. It seems to always have an X coord of 0.0
and a Y coord of the window's height in pixels, e.g. 600.0f or such.

I believe this is a regression caused in part by this commit:
afa035031f

The old behavior used the output at index zero, while the new behavior
uses actual zeroes. In the case of gl_FragCoord the output at index
zero happened to be the correct one, so the behavior appeared correct
although the code already had a bug.

Fixed by checking for I915_SEMANTIC_POS when setting up texCoords. If
the generic_mapping is I915_SEMANTIC_POS, look for the
TGSI_SEMANTIC_POSITION instead of a TGSI_SEMANTIC_GENERIC output.

https://bugs.freedesktop.org/show_bug.cgi?id=97477

Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
2016-10-10 18:32:36 -07:00
Axel Davy eef0744d43 st/nine: More checks for GetRenderTargetData
Fixes a wine test crash

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:51 +02:00
Patrick Rudolph a52e700169 st/nine: Add debug output for lost devices
Add debug output to ease debugging.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 5d85253dc3 st/nine: Prevent crash in GetRenderTargetData
Return error instead of crashing on source surfaces
with format D3DFMT_NULL.

Fix for issue #236.

Tested on Windows 7.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 09edc0555f st/nine: Set CLAMP_TO_EDGE on cubetextures
Wine tests show that cubetextures always use
PIPE_TEX_WRAP_CLAMP_TO_EDGE regardless of set
sampler states.

Fixes failing d3d9 wine test test_cube_wrap.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph fa2574497b st/nine: handle possible failure of D3DWindowBuffer_create
Check for errors and pass them to the callers.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph f04fa0a62c st/nine: Assert on buffer creation failure
Add an assert to make sure buffer creation doesn't fail.
Add error handling in calling functions.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph f8c01e7a96 st/nine: Use NineDevice9_CreateDepthStencilSurface in swapchain9
Replace custom code with NineDevice9_CreateDepthStencilSurface.
All functionality is given now.
2016-10-10 23:43:51 +02:00
Axel Davy 63367e6c95 st/nine: Fix check and remove useless code in swapchain9
The removed code was there for two reasons:
1) Allow DF16, DF24, INTZ to be used as depth buffer
for swapchain, if the driver doesn't support
PIPE_BIND_SAMPLER_VIEW for the underlying format
2) Set PIPE_BIND_SAMPLER_VIEW if possible, such that
if StretchRect is called on the depth texture, it is happy.

1) The reason these formats needed a workaround is because
the check flags for them in CheckDeviceFormat were incorrect,
which led applications to think the formats were valid for
swapchains, even if they weren't supported.
2) StretchRect limitations for depth buffers force
the resource_copy_region path, which should be fine without
PIPE_BIND_SAMPLER_VIEW.

Thus fix the check for 1), and remove the code.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 60624be203 st/nine: Implement MSAA quality levels
Advertise quality levels:
Each supported multisample count matches to one quality level.
The application doesn't know how much samples each quality level has.
For that reason it's not possible to set the multisample mask.

Return errors on quality level missmatch.

Fixes several old games not having multisample support until now.

Fix for issue #73.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 8a50b1244f st/nine: Prepare update_framebuffer for MS quality levels
Compare resource's nr_samples instead of D3D multisample level.
Required for multisample quality levels to work correct.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph b560305687 st/nine: Add additional error handling in CheckDeviceMultiSampleType
Return one supported quality level in error cases.
Return error on invalid multisample count.

Fixes failing wine tests.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 7afab4ad39 st/nine: Fix compiler warning
Use strict aliasing in SetPrivateData and struct pheader.
Casting char[1] to IUnknown** isn't allowed in strict aliasing.
Compute pointer to body by adding size of header to header pointer.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph b9f31111ac st/nine: Remove resource9 {Set/Get/Free}PrivateData functions
Remove {Set/Get/Free}PrivateData in resource9.
Functionality has been implement in IUnknown interface.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 03888e8a46 st/nine: Remove volume9 {Set/Get/Free}PrivateData functions
Remove {Set/Get/Free}PrivateData in volume9.
Functionality has been implement in IUnknown interface.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 485cba7eb4 st/nine: Switch {Set/Get/Free}PrivateData functions
Switch {Set/Get/Free}PrivateData function to introduced IUnknown functions.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 4117f5e1ab st/nine: Implement {Set/Get/Free}PrivateData in iunknown
Implement {Set/Get/Free}PrivateData in iunknown to get rid
of duplicated code in resource9 and volume9.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph c1c8e852c1 st/nine: Return device in NineSurface9_GetContainer
According to MSDN the device is returned for surfaces that do
not have a regular container.

Such surfaces are:
OffscreenPlainSurface, DepthStencilSurface and RenderTarget

Tested and verified on Windows.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph ba0274c7d6 st/nine: Allocate surface resources in surface ctor
Allocate resources in surface ctor.
Allows to use statetracker internal memory accounting.

Fix for issue #231.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Axel Davy 1f65f67b21 st/nine: Fix D3DFMT_NULL size
D3DFMT_NULL is mapped to PIPE_FORMAT_NONE.
Instead of relying on PIPE_FORMAT_NONE to
return a size, pick one.
The one picked is the same than Wine.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 9dc792b95b st/nine: Add debugging output
Add DBG calls to NineTexture9_GetLevelDesc and
NineTexture9_GetSurfaceLevel to ease debugging.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph 8ceb2264c5 st/nine: Fix assert in NineUnknown_QueryInterface
Tests showed that is allowed to call this method on
object that have a zero refcount.
Required for issue #230.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:51 +02:00
Patrick Rudolph f2eacef33d st/nine: Print interface id in NineVolume9_GetContainer
To ease debugging print interface id.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Patrick Rudolph 489dbc51ae st/nine: Print interface id in NineSurface9_GetContainer
To ease debugging print interface id.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Patrick Rudolph e63a38832b st/nine: Print interface id in NineUnknown_QueryInterface
To ease debugging print interface id.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Patrick Rudolph 6a1cce20b6 st/nine: Move assert in NineSurface9_ctor
Move assert to function entry.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 851e4b8d8a st/nine: Properly declare sampler states for ff
Fixes a softpipe assertion failure with wine tests

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy 5ce23c1689 st/nine: Handle user clipping planes properly for ff
Found reading msdn and checking Wine.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy d2fd296648 st/nine: Fix the calculation of the number of vs inputs
Fixes hangs on radeonsi, and assert on llvmpipe.

Signed-off-by: Axel Davy <axel.davy@ens.fr>

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-10 23:43:50 +02:00
Axel Davy 71e7292a85 st/nine: Fix specular w coordinate
Found looking at Wine formulas.
Fixes a few visual issues.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 732cea09cd st/nine: Disable parts of lighting calculation if no normal provided
Behaviour found in Wine sources, and checked with some test apps.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy fc9bb19dce st/nine: Fix condition for specular lightning
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy c56c7c1fc8 st/nine: Do always accumulate diffuse
According to spec.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy c5bce80f50 st/nine: Initialize ps ff registers
Found with wine tests for the rTmp register.
Not sure for the other ones.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 4ed3d5ee57 st/nine: Do not pollute rTmp in ff ps
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy d9b8b3196e st/nine: Allocate temporaries on demand for ps ff
Same change than for vs ff.
This makes it easier to not introduce mistakes
reusing temporaries whose result shouldn't be
erased.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy f7dd27aed3 st/nine: Fix texbem
Error found with wine tests.
nine_shader was expecting another order
than the one device9 was using.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy 7afcbb49ba st/nine: Fix ff computation for inverse
Thanks to wine tests.
Apparently 4x4 inverse is to be used, and
if the inverse can't be calculated, the
input matrix is to be used.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 36399f9a7f st/nine: Used normed Vtx for reflectionvector
Fix deduced from the spec.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy eda1e6ece7 st/nine: Implement SPHEREMAP
Behaviour checked with a test app.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy a3ddc80ec8 st/nine: Enable passthrough only if positiont is used
Wine tests for the passthrough feature are for positiont.

Nothing seems to indicate passthrough happens when positiont
it not used. However having passthrough with positiont makes
sense (to be used with ProcessVertices outputs).

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 0b5bed774b st/nine: Fix wrong mask in ff vs
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy 028dab95f6 st/nine: Fix tweening factor computation
The computation was reversed.
Deduced by tests on windows.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 1fe055338d st/nine: Disable ff vertex blending if required inputs are missing
This behaviour has been partially tested on windows.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy aa69bb6848 st/nine: Use materials if source is not given.
Deduced by test on windows.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy ab068a78d3 st/nine: Fix ff SPECULARENABLE
We were (wrongly) adding specular to diffuse
in vertex shaders when SPECULARENABLE was set.

However the spec says specular has to be added
after texture processing (which is in ps).
Besides SPECULARENABLE is flagged as a pixel state.

There was unused support for SPECULARENABLE
in the ps ff code.
Remove the vs code, and use the ps code.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 1d7890a441 st/nine: Undefined specular should be full of zeros
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy d9330f9348 st/nine: Implement normal transformation with vertex blending
The formula is different from the one of the spec,
but otherwise nothing particular.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 305e8106ab st/nine: Increase MaxVertexBlendMatrixIndex
Modern cards do advertise 8.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 567be40de9 st/nine: Compact ff vs constants a bit
There are several holes. This patch reduces
the holes a bit, which reduces the size of
the constant buffer uploaded.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 07d1f32e0f st/nine: Fix vertex blending aVtx computation
There was an multiplication by the world matrix 0
which had nothing to do there.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy d9d8cb9f19 st/nine: Reorganize ff vtx processing
The new order simplified the code a bit for
next patches.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy cde74cba71 st/nine: Small simplification for position_t and fog
position_t disables fog computation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 5d2a8e8a36 st/nine: Cleaning code for vs temporaries
This has been a real mess up to now: the temporaries
were allocated once, and shared after that between
the different parts of the code.

To help maintaining the code, the temporaries are now
allocated and released on need.

As surprising as it could be, this patch, which was
supposed to introduce no behaviour change, actually
solved a visual bug observed on a sample program.
This was due to ureg_normalize3 polluting a temporary
variable.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 1f18b6f351 st/nine: No need for the local flag for temporaries in ff
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy eb9ad8f969 st/nine: Handle D3DRS_NORMALIZENORMALS
When this state is set, the normals computed
in the vs ff shader should be normalized.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:50 +02:00
Axel Davy b9639c661f st/nine: Initial ProcessVertices support
For now only VS 3 support is implemented.

This enables The Sims 2 to work.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:50 +02:00
Axel Davy 3bf02d383f st/nine: Partial software vertex processing support
Software Vertex Processing allows:
. Less limitations for shaders (more loops, etc)
. Less limitations for ff (more enabled lights, 255
matrices for VertexBlend)

In particular shaders can get more constants.
This patch implements support for this (not using software
rendering, but hardware rendering, as llvmpipe and dx10+ hw
have the same limits...)

This is considered a second class path. Even apps asking for
"Mixed Vertex processing" (ie the ability to switch to swvp
on demand) do not use the feature much. Some just initialize
more constants than the normal limit at the start of the
application, but never use more than the normal limit.
When the apps do not need the software vertex processing
features, they do not seem to turn it on. This means it is
ok if that path is slow.
Thus no care has been made to make the path optimized.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy f8c8f44244 st/nine: Rework vs int and bool constants buffer
This will help to support swvp constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy a83dce0128 st/nine: Change dirty tracking for vs int and bool constants
This change makes easier to introduce tracking for
swvp constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy f78089b962 st/nine: Drop unused constant upload path
This path has been disabled for some time because
of some bugs with it. It hasn't been updated to the
new features, and is not faster.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:49 +02:00
Axel Davy 1604efa6fd st/nine: Add support for swvp constants in shaders
swvp has relaxed limits (more nested loops, etc).
In particular it enables more constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy 56ea3df7d4 st/nine: Initial mixed vertex processing support
In mixed vertex processing, the user can enable or disable
software vertex processing. It is on hardware by default.

This feature is not a state, and thus the setting doesn't
need to be recorded by stateblocks.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy 747f1ef8b6 st/nine: Implement SetNPatchMode
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy ded7a73eb3 st/nine: Implement D3DUSAGE_SOFTWAREPROCESSING
Buffers with this flag must be usable with both software
and hardware vertex processing. Use Staging for fast cpu access.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
2016-10-10 23:43:49 +02:00
Patrick Rudolph 19703f2a36 st/nine: Allocate more space for ATI1
ATIx are "unknown" formats that do not follow block format conventions.
Tests showed that pitch*height bytes are allocated.
apitrace used to depend on this behaviour.
It used to copy more bytes than it has to for the ATI1 block format,
but it didn't crash on Windows.

Increase buffersize for ATI1 to fix this crash.
The same issue was present in WINE but a patch has been sent by me.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Patrick Rudolph ec6c636722 st/nine: Add missing break
Add missing break instruction.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy 03f60a3357 st/nine: Implement relative addressing for ps inputs
To implement the feature we copy the ps inputs to a temp array.
This is not optimal for performance, but it is the simplest solution.

This is a feature that is very very rarely used.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy a5d308e51a st/nine: Wait for pending tasks to execute in swapchain
Fixes crash after Reset() when using thread_submit=true

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy f090705075 st/nine: Use fixed size arrays for swapchain buffers
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Patrick Rudolph a719800cb8 st/nine: Fix buffer count check for Ex devices
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy 9ff0dc3129 st/nine: Disable seamless cubemap for d3d
d3d9 doesn't have seamless cubemap.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy f0ec54ee32 st/nine: Fix some check flags
Uses the new defines introduced in previous commit.
See comment in the commit for more explanation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:49 +02:00
Axel Davy 39e98d351f st/nine: Unify some check flags
The new defines will be reused in a later patch.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
2016-10-10 23:43:48 +02:00
Axel Davy 2290eac84e gallium/util: Really allow aliasing of dst for u_box_union_*
Gallium nine relies on aliasing to work with this function.
Without this patch, dirty region tracking was incorrect, which
could lead to incorrect textures or vertex buffers.
Fixes several game bugs with nine.
Fixes https://github.com/iXit/Mesa-3D/issues/234

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
2016-10-10 23:43:48 +02:00
Axel Davy 5e7f0ebe29 softpipe: Cap to 2 GB on 32 bits
On 32 bits system, application memory is quite limited.
softpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-10 23:43:48 +02:00
Axel Davy 814ca96d0d llvmpipe: Cap to 2 GB on 32 bits
On 32 bits system, application memory is quite limited.
llvmpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2016-10-10 23:43:48 +02:00
Axel Davy 218459771a gallium/os: Fix overflow on 32 bits
On systems with more than 4GB of ram,
os_get_total_physical_memory was triggering an integer
overflow for the linux and haiku path, when on
32 bits.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94561

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 23:43:48 +02:00
Axel Davy 9904581dc6 st/nine: Memset pipe_resource templates
Fixes regression introduced by
ecd6fce261
and is more future proof than just clearing the next
field.

Other nine usages did already zero out the templates.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2016-10-10 23:43:48 +02:00
Samuel Pitoiset d43151318a nvc0: fix valid range for shader buffers
When offset != 0, the valid range was wrong because the second
argument of util_range_add() is end, not size.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-10 21:32:16 +02:00
Ilia Mirkin 5239bd5920 nvc0/ir: fix overwriting of value backing non-constant gather offset
Normally the value is an immediate, which is moved to some temporary, so
there's no problem. In the case of a non-constant offset (as allowed by
ARB_gpu_shader5), we have to take care to copy it first before using it
to build up the bits.

This fixes a compilation error observed in F1 2015.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2016-10-10 14:28:32 -04:00
Ilia Mirkin ec05331a7b nv50/ir: only stick one preret per function
A function with multiple returns would have had multiple preret settings
at the top of the function. While this is unlikely to have caused issues
since we don't use functions in earnest, it could have in some cases
overflowed the call stack, in case a function had a lot of early
returns.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-10 10:45:06 -04:00
Nicolai Hähnle 1f95121626 radeonsi: make more use of si_have_tgsi_compute
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:38:33 +02:00
Nicolai Hähnle 38cfd5160a gallium/radeon: assign a name to LLVM output variables in debug builds
This can be helpful with R600_DEBUG=preoptir.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:38:30 +02:00
Nicolai Hähnle 39a29c2431 gallium/radeon: avoid redundant work with overlapping in/out arrays
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:37:50 +02:00
Nicolai Hähnle 77c81164bc radeonsi: support ARB_compute_variable_group_size
Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-10-10 10:36:42 +02:00
Rob Clark 495ba8884a gallium: add missing zero-init for resource templates
Mostly test code, plus one spot I noticed in r600.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 15:50:46 -04:00
Rob Clark 3ebfc44b42 freedreno: don't try to shadow layered textures
We will only hit this with multi-planar YUV external images, so we would
probably never hit this code path in the first place.  But if we did, it
wouldn't do the right thing so just bail.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-10-07 15:50:46 -04:00
Rob Clark f88f025e8c freedreno/a3xx+a4xx: fix clip-plane lowering state
If enabled clip-planes have changed, we need to mark program state
dirty.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-10-07 15:50:46 -04:00
Eric Anholt 20d91e5ce9 vc4: Don't worry about partial Z/S clear if the other is already cleared.
We have to be careful to not smash the value they're clearing to, but
other than that we're fine.  Avoids quad clears in Processing, which likes
to do glClear(Z|S); glClear(Z).

Improves performance of Processing's QuadRendering demo at 5000 quads by
5.46507% +/- 1.35576% (n=15 before, 32 after)
2016-10-06 18:29:16 -07:00
Eric Anholt cb328123fe vc4: Try to fix the HW-2116 workaround.
We were incrementing the count at the end of vc4_start_draw(), except that
that function returns immediately if we've already started drawing on this
batch.  It also failed to count the statechanges from the GFXH-515
workaround.

This incidentally allows repeated glClear() to be coalesced, because the
fast clears aren't counted in draw_calls_queued any more.  Fixes most of
the extra flushes in Processing, which emits glClear(Z|S); glClear(Z);
glClear(C) during its frame setup.

Improves performance of Processing's QuadRendering demo at 5000 quads by
3.33538% +/- 2.05846% (n=21 before, 15 after)
2016-10-06 18:29:12 -07:00
Eric Anholt bca9a58d04 vc4: Drop dead argument from vc4_start_draw(). 2016-10-06 18:09:24 -07:00
Eric Anholt 9421a6065c vc4: Fix fallback to quad clears of depth in GLX.
The fix in the vc4-jobs series ended up triggering the fallback path on
GLX apps that use depth but not stencil.
2016-10-06 18:09:24 -07:00
Eric Anholt 8810270d06 vc4: Add the format name in miptree_debug.
I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format
of "0" didn't tell me much.
2016-10-06 18:09:24 -07:00
Eric Anholt ee577e7fa7 vc4: Fix perf debug formatting on partial Z/S clear. 2016-10-06 18:09:24 -07:00
Eric Anholt 7c7bcbbc7d vc4: Drop destination register when it's unused.
This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should have always
trivially colored these nodes, before.  This commit is just to make QIR
code failing more intelligible when register allocation fails.
2016-10-06 18:09:24 -07:00
Eric Anholt d4ae5ca823 vc4: Fix live intervals analysis for screening defs in if statements.
If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexcuted channels, we don't care what's in there).

Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
2016-10-06 18:09:24 -07:00
Eric Anholt 06cc3dfda4 vc4: Fix simulator when more than one vc4_screen is opened.
We would assertion fail in setting up the simulator the second time
around.  This at least postpones the assertion failure until we've closed
all of the first set of screens and started opening a new set.
2016-10-06 18:09:24 -07:00
Eric Anholt b30205b112 vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.
Fixes 100 piglit tests since the assertions were added to nir.h.  What's
amazing is that these tests used to pass, even when casting garbage.
2016-10-06 18:09:24 -07:00
Samuel Pitoiset 28ecd3eac2 nv50/ir: fix wrong check when optimizing MAD to SHLADD
Checking if MAD is supported is definitely wrong, and it's
more likely a typo I introduced few days ago which breaks
NV50 because SHLADD is not supported there.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-10-07 01:13:06 +02:00
Samuel Pitoiset a198883bf7 nvc0: dump program binary only when NV50_PROG_DEBUG is set
When the chipset is forced with NV50_PROG_CHIPSET, we actually
only want to output the binary if NV50_PROG_DEBUG is also
enabled. Otherwise, this pollutes the shader-db output.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 01:01:17 +02:00
Samuel Pitoiset 56a0bed2c1 nvc0: expose ARB_compute_variable_group_size
Only expose 512 threads/block on Fermi to not be limited by
32 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset 11e75fffeb nv50/ir: set number of threads/block for variable local size
When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.

This allows to use 64 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-07 00:18:57 +02:00
Samuel Pitoiset 07bb4513c6 gallium: add PIPE_COMPUTE_CAP_MAX_VARIABLE_THREADS_PER_BLOCK
v3: - use a new case statement in r600_pipe_common.c
    - fix compilation of softpipe...

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-07 00:18:57 +02:00
Karol Herbst f96945c5b5 nv50/ir: optimize sub(a, 0) to a
helped some ue4 demos and divinity OS shaders

total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
total gprs used in shared programs    : 379273 -> 379273 (0.00%)
total local used in shared programs   : 9505 -> 9505 (0.00%)
total bytes used in shared programs   : 25837792 -> 25837192 (-0.00%)

                local        gpr       inst      bytes
    helped           0           0          33          33
      hurt           0           0           0           0

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2016-10-06 19:39:51 +02:00
Jason Ekstrand 2ed17d46de nir: Make nir_foo_first/last_cf_node return a block instead
One of NIR's invariants is that control flow lists always start and end
with blocks.  There's no good reason why we should return a cf_node from
these functions since we know that it's always a block.  Making it a block
lets us remove a bunch of code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:37 -07:00
Steven Toth e00fdd643b gallium/hud: Remove superfluous debug
No longer required.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:37:06 +01:00
Emil Velikov b634be0e69 svga: add svga_mksstats.h to the sources list
Otherwise it won't be picked in the tarball and the build will fail.

Fixes: 0035f7f136 ("svga: add guest statistic gathering interface")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 16:17:09 +01:00
Emil Velikov 9b7fd4080a st/xvmc/tests: force enable assertions
Similar to the other 'tests', enable assertions in xvmc_bench.

This silences the GCC warnings about unused-variable(s), makes the
program actually useful, as the XvMC API called. Atm the function
calls are omitted, since they're called within the assert.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-10-06 15:03:46 +01:00
Samuel Pitoiset a41cfbbf2b nvc0: dump program binary when chipset has been forced
Currently, program binaries are only dumped at upload time, but
when the chipset has been forced via NV50_PROG_CHIPSET we might
want to show the generated code, especially with shaderdb.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2016-10-05 21:15:44 +02:00
Marek Olšák cc4a19c4ad radeonsi: fix texture border colors for compute shaders
There are VM faults without this.

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:54 +02:00
Marek Olšák 844f8268e1 gallium/radeon/winsyses: set reasonable max_alloc_size
which is returned for GL_MAX_TEXTURE_BUFFER_SIZE.
It doesn't have any other use at the moment.
Bigger allocations are not rejected.

This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:54 +02:00
Marek Olšák 1b37e5541c radeonsi: fix interpolateAt opcodes for .zw components
Not returning garbage in .zw seems pretty important.

This fixes:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_*_check.*

Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák 300a8221e9 radeonsi: add assertions to validate interpolation flags
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák d4a8bf89ce radeonsi: interpolate colors after interpolation weight shuffling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00
Marek Olšák faee2d6dda tgsi/scan: don't set interp flags for inputs only used by INTERP (v2)
(v1 pushed, then reverted)

This fixes 9 randomly failing tests on radeonsi:
  GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*

v2: use input_interpolate[input] (correct) instead of
    input_interpolate[index] (incorrect)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-05 21:03:23 +02:00