Commit Graph

18425 Commits

Author SHA1 Message Date
José Fonseca 542c5b3703 gallivm: Fix trivial out-of-bounds indirection in lp_build_cube_lookup().
Courtesy of clang:

  src/gallium/auxiliary/gallivm/lp_bld_sample.c:1483:10: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
           tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);
           ^   ~
  src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
           LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
           ^
  src/gallium/auxiliary/gallivm/lp_bld_sample.c:1487:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
              rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
                                                       ^   ~
  src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
           LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
           ^
  src/gallium/auxiliary/gallivm/lp_bld_sample.c:1491:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
              rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
                                                       ^   ~
  src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
           LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
           ^
2013-04-26 08:44:37 +01:00
Jerome Glisse abb96fdea7 winsys/radeon: consolidate tracing into winsys v2
This move the tracing timeout and printing into winsys and add
an debug environement variable for it (R600_DEBUG=trace_cs).

Lot of file touched because of winsys API changes.

v2: Do not write lockup file if ib uniq id does not match last one

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-25 18:36:31 -04:00
Tom Stellard 53fbae7eac r600g/compute: Removed unused and untested code
There was a lot of code in evergreen_compute_internal.c that was not
being used at all and most of it was duplicating code from other parts
of the driver.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-04-25 13:32:22 -07:00
Tom Stellard f986087d5c r600g/compute: Use a constant buffer to store kernel parameters v2
v2:
  - Fix usage of set_constant_buffer()
  - Fix typo in comment

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-25 13:32:17 -07:00
Tom Stellard ffadc71afb r600g: Add evergreen_emit_cs_constant_buffers() v2
v2:
  - Bump R600_NUM_ATOMS

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-25 13:25:00 -07:00
Tom Stellard 83a00a1de8 r600g/compute: Don't use radeon_winsys::buffer_wait() after dispatching a kernel
The state tracker should be responsible for waiting for the kernel to
finish.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-04-25 13:24:51 -07:00
Tom Stellard 09e47f7a25 r600g/compute: Fix input buffer size calculation
Buffer size should be in bytes not dwords.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-04-25 13:24:24 -07:00
Rob Clark 73de07cbbc freedreno: use writecombine buffers
Better than uncached for writes, which are common for vertex buffer
upload, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-25 15:10:56 -04:00
Rob Clark f706d4d340 freedreno: don't patch and re-emit same shader as much
New textures or vertex buffers don't always require patching and
re-emitting the shaders.  So do a better job of figuring out when we
actually have to patch the shader.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-25 15:10:56 -04:00
José Fonseca 12096f334b draw: Yield zeros for LLVM fetches of non-existing vertex elements.
If a bug in an app/stater-tacker causes vertex buffer to fetch vertex
elements that are not bound, simply return zeros instead of crashing.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-25 16:16:21 +01:00
José Fonseca 28e6a272fc trace: Only close trace files on exit.
Many applications don't exit cleanly, others may create and destroy a
screen multiple times, so we only write </trace> tag and close at exit
time.
2013-04-25 14:18:33 +01:00
José Fonseca 74d1153c9c graw: Set the vertex shader constant buffer.
We were setting the fragment shader, which wasn't needed.
2013-04-25 14:06:50 +01:00
José Fonseca e88a1dba09 graw: Simple utilities to dump and disassemble TGSI tokens.
Useful for core dumps, where calling tgsi_dump() from gdb is not an
alternative.
2013-04-25 13:03:06 +01:00
José Fonseca 1687932d2b scons: Support clang.
clang is supports most gcc options / extensions, with a some exceptions.

The biggest advantage of using clang is that compilation times are much
short.

One can tell scons to use clang when building by invoking it as

   CC=clang CXX=clang++ scons libgl-xlib
2013-04-25 11:59:01 +01:00
José Fonseca f0c296773d util/u_sse: Fix _mm_shuffle_epi8 prototype for clang.
Clang does not support __artificial__. Instead match precisely what's
in the clang headers.
2013-04-25 11:59:01 +01:00
José Fonseca 45a60e2e7a scons: Remove redundant code.
-fvisibility=hidden is already elsewhere for the whole tree.
2013-04-25 11:59:01 +01:00
Rob Clark 49a7624973 freedreno: fix bogus IMM const reg index
We were assigning incorrect const register for immediates, and
potentially writing immediate const to the wrong location.  This fixes
an incorrect-rendering bug with xonotic.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Rob Clark 9495ee12c6 freedreno: clear fixes and debugging
Set a few extra registers to make sure we are in proper state for
clearing.  And also add some debug options to mark all state dirty in
clear and gmem operations to aid in debugging.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Rob Clark d5d6ec8843 freedreno: fix texture fetch type
There is a bit we need to set for 2D vs 3D fetch, to tell the hw whether
there are two or there valid input components.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Rob Clark d086bb22bc freedreno: fix temp register usage
The previous approach of using the dst register as an intermediate
temporary doesn't work in a lot of cases.  For example, if the dst
register is the same as one of the src registers.

For now, just simplify it and always allocate a new register to use as
an intermediate.  In some cases this will result in more registers used
than required.  I think the best solution would be to implement an
optimization pass to reduce the number of registers used, which would
also solve the problem we have now of not being able to use GPRs that
are assigned for TGSI_FILE_INPUT.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Rob Clark 7a837da556 freedreno: add noop driver
It is useful for debugging.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Rob Clark eec37f1cdc freedreno: use u_math macros/helpers more
Get rid of a few self-defined macros:
  ALIGN() -> align()
  min() -> MIN2()
  max() -> MAX2()

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Rob Clark 38d8b02eba freedreno: implement fd_screen_destroy()
Opps, didn't notice that I had left it stubbed out.

Also, make things fail a bit more gracefully when things go wrong.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Rob Clark a64e2d9d9f freedreno: set SWAP bit based on format
Really this should be set based on buffer format, not on color vs
depth/stencil.  Probably there should be more formats that set the bit
as we add support for more render target formats.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2013-04-24 21:09:46 -04:00
Tom Stellard d9a32b84e3 radeon/llvm: Fix segfault with a specifc libelf implementation
The libelf implementation that is distributed here:
http://www.mr511.de/software/english.html
requires calling elf_version() prior to calling elf_memory()

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-24 16:51:25 -07:00
Alex Deucher 5bbeae7a3d r600g: use CP DMA for buffer clears on evergreen+
Lighter weight then using streamout.  Only evergreen
and newer asics support embedded data as src with
CP DMA.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-04-24 18:54:31 -04:00
Tom Stellard f64058803a r600g/llvm: Pass struct r600_bytecode to r600_llvm_compile
This way we don't need to update the function signature everytime we
emit a new config value.  This also fixes the build with
--enable-opencl.
2013-04-24 12:42:41 -04:00
José Fonseca e29525f79f winsys/sw/xlib: Prevent shared memory segment leakage.
Running piglit with this was causing all sort of weird stuff happening
to my desktop (Chromium webpages become blank, Qt Creator flickered,
etc).  I tracked this down to shared memory segment leakage when GL is
not shutdown properly. The segments can be seen running `ipcs` and
looking for nattch==0.

This changes fixes this by calling shmctl(IPC_RMID) soon after creation
(which does not remove the segment immediately, but simply marks it for
removal when no more processes are attached).

This matches src/mesa/drivers/x11/xm_buffer.c behaviour.

v2:
- move shmctl(IPC_RMID) after XShmAttach() for *BSD, per Chris Wilson
- remove stray debug printfs, spotted by Ian Romanick

NOTE: This is a candidate for stable branches.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-24 16:54:58 +01:00
Zack Rusin 1a87473998 draw/gs: preserve leading vertex info for gs
We need to handle the leading vertex information when
assembling primitives for the geometry shader otherwise
the resulting triangles will have vertices at incorrect
input locations.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-23 06:17:59 -04:00
Christian König c5c754d184 radeonsi: cleanup disabling tiling for UVD v3
Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=63702

v2: add a comment that this is just a workaround
v3: fix typo in comment

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-24 11:07:26 +02:00
Kenneth Graunke f0cb66b699 mesa: Restore 78-column wrapping of license text in C++-style comments.
The previous commit introduced extra words, breaking the formatting.

This text transformation was done automatically via the following shell
command:
$ git grep 'THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY' | sed 's/:.*$//' | xargs -I {} sh -c 'vim -e -s {} < vimscript2

where 'vimscript2' is a file containing:
/THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY/;/^ *$/ !fmt -w 78 -p '// '
:wq

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-23 22:07:12 -07:00
Kenneth Graunke 3d8d5b298a mesa: Restore 78-column wrapping of license text in C-style comments.
The previous commit introduced extra words, breaking the formatting.

This text transformation was done automatically via the following shell
command:
$ git grep 'THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY' | sed 's/:.*$//' | xargs -I {} sh -c 'vim -e -s {} < vimscript

where 'vimscript' is a file containing:
/THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY/;/\*\// !fmt -w 78 -p ' * '
:wq

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-23 22:07:09 -07:00
Kenneth Graunke 96ff2edc73 mesa: Add "OR COPYRIGHT HOLDERS" to license text disclaiming liability.
This brings the license text in line with the MIT License as published
on the Open Source Initiative website:

http://opensource.org/licenses/mit-license.php

Generated automatically be the following shell command:
$ git grep 'THE AUTHORS BE LIABLE' | sed 's/:.*$//g' | xargs -I '{}' \
  sed -i 's/THE AUTHORS/THE AUTHORS OR COPYRIGHT HOLDERS/' {}

This introduces some wrapping issues, to be fixed in the next commit.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-23 22:07:06 -07:00
Kenneth Graunke dd404bc94f mesa: Change "BRIAN PAUL" to "THE AUTHORS" in license text.
Generated automatically be the following shell command:
$ git grep 'BRIAN PAUL BE LIABLE' | sed 's/:.*$//g' | xargs -I '{}' \
  sed -i 's/BRIAN PAUL/THE AUTHORS/' {}

The intention here is to protect all authors, not just Brian Paul.  I
believe that was already the sensible interpretation, but spelling it
out is probably better.

More practically, it also prevents people from accidentally copy &
pasting the license into a new file which says Brian is not liable when
he isn't even one of the authors.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-23 22:06:38 -07:00
José Fonseca 2737abb44e gallium: Replace gl_rasterization_rules with lower_left_origin and half_pixel_center.
Squashed commit of the following:

commit 04c5fa2cbb8e89d6f2fa5a75af1cca03b1f6b852
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue Apr 23 17:37:18 2013 +0100

    gallium: s/lower_left_origin/bottom_edge_rule/

commit 4dff4f64fa83b9737def136fffd161d55e4f1722
Author: José Fonseca <jfonseca@vmware.com>
Date:   Tue Apr 23 17:35:04 2013 +0100

    gallium: Move diagram to docs.

commit 442a63012c8c3c3797f45e03f2ca20ad5f399832
Author: James Benton <jbenton@vmware.com>
Date:   Fri May 11 17:50:55 2012 +0100

    gallium: Replace gl_rasterization_rules with lower_left_origin and half_pixel_center.

    This change is necessary to achieve correct results when using OpenGL
    FBOs.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-23 19:42:47 +01:00
Marek Olšák b692076420 r600g: initialize CMASK and HTILE with the GPU using streamout
This fixes a crash when a resource cannot be mapped to the CPU's address space
because it's too big.

This puts a global pipe_context in r600_screen, which is guarded by a mutex,
so that we can use pipe_context when there isn't one around.
Hopefully our multi-context support is solid.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

NOTE: This is a candidate for the 9.1 branch.
2013-04-23 20:26:20 +02:00
Marek Olšák 1ba46bbb4c gallium/u_blitter: implement buffer clearing
Although this might be useful for ARB_clear_buffer_object,
I need it for initializating resources in r600g.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

v2: comment cleanups

NOTE: This is a candidate for the 9.1 branch.
2013-04-23 20:26:20 +02:00
Vincent Lejeune edd90a19ca r600/llvm: Read stacksize from config header 2013-04-23 19:52:29 +02:00
Vincent Lejeune a7f73f5155 /bin/bash: q : commande introuvable 2013-04-23 19:52:02 +02:00
Tom Stellard a0c8942bb4 radeon/llvm: Fix build with LLVM >= r180063 2013-04-23 11:53:05 -04:00
Tom Stellard ead4db420e gallivm: Fix build with LLVM >= r180063 2013-04-23 11:53:05 -04:00
Zack Rusin 1fb8c3ce55 draw: use the prim count for ia primitives
Number of vertices to fetch doesn't always equal the number of input
vertices. To correctly compute the number if IA primitives we need
to use the total number of input vertices, not only those that
need to be fetched.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-22 20:36:07 -04:00
Zack Rusin 76587d2e5e tgsi/scan: set correct input limits for geometry shader
TGSI geometry shader input declerations are of the IN[][2] format
and the dimensions of the array have to be deduced from the input
primitive property.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-22 20:36:07 -04:00
Zack Rusin 913ed25f18 draw: add code to reset instance dependent data
We want to be able to reset certain parts of the pipeline,
in particular the input primitive index, but only either with
seperate invocations of the draw_vbo or new instances. In all
other cases (e.g. new invocations due to primitive restart)
that data needs to be preserved. Add a function through which
we can reset instance dependent data.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-22 20:36:07 -04:00
Zack Rusin 2aad06844f softpipe: fix streamout with an emptry geometry shader
Same approach as in the llvmpipe, if the geometry shader is
null and we have stream output then attach it to the vertex
shader right before executing the draw pipeline.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-22 20:36:07 -04:00
José Fonseca 7c1bf8e381 gallium: Add a new clip_halfz rasterizer state.
gl_rasterization_rules lumps too many different flags.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-22 18:39:06 +01:00
José Fonseca c0538860bf gallivm: Fix assignment of unsigned values to OUT register.
TEMP is not the only register file that accept unsigned. OUT too.

Actually, what determines the appropriate type of the destination value is
not the opcode, but rather the register.

Also cleanup/simplify code.  Add a few more asserts, but also make
code more robust by handling graceful if assert fails.

This fixes segfault / assertion in the included vert-uadd.sh graw shader.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-22 18:23:42 +01:00
José Fonseca 9fb5b2f45c Revert "gallivm: Emit vector selects."
It caused inumerous regressions (LLVM 3.1) in blending. In particular:

 - lp_test_blend

    type=u8nx16 rgb_func=sub rgb_src_factor=zero rgb_dst_factor=inv_src_color alpha_func=rev_sub alpha_src_factor=one alpha_dst_factor=const_color ...  MISMATCH
     Src:  0  0  0 b5 49 29  0 a2  0 21 de  0 c3 1b ec  0
     Src1: 2d 85 14  0 f8  0 79 a1 99  0 d8  0 59 16  0  0
     Dst:  0 a9 97  0 c0  0 78  0  0 8b aa f0 bd  0 78 f6
     Con: 7d  0 c0  0  0 bb 77  0  0  0 50  0 40 51  0  0
     Res:  0  0  0  0  0 29  0  0  0  0 c8  0 97 1b e3  0
     Ref:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    type=u8nx16 rgb_func=max rgb_src_factor=one rgb_dst_factor=inv_const_color alpha_func=min alpha_src_factor=zero alpha_dst_factor=inv_src1_alpha ...  MISMATCH
     Src:  d  0  0 e9  0 37 35 f0 62  0  0 b2 e9 f7  0 5c
     Src1: 8f  0 bf  0 a8  5  0  0 c4  0 d7  7 92  a  0 17
     Dst: cb  0 1e  0  0  0 19 8e  0 4d  0  0  0  0  3 46
     Con: aa 5a 5f 8f  0  0 bc 92  0 88  0  0 b7 8a c0 88
     Res: 44  0 13  0  0  0  7 8e  0 24  0  0  0  0  1 40
     Ref: 44  0 13  0  0 37 35  0 62 24  0  0 e9 f7  1  0

This reverts commit 1e266c7ef0.
2013-04-21 09:07:19 +01:00
José Fonseca d8a4c4c524 llvmpipe: verify function on blend test. 2013-04-21 08:53:31 +01:00
José Fonseca a79990bec0 llvmpipe: Don't support Z32_FLOAT_S8X24_UINT texture sampling support either.
Because we don't support, and the u_format fallback doesn't work for
zs formats.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-20 23:25:36 +01:00
José Fonseca c08b04992a llvmpipe: Ignore depth-stencil state if format has no depth/stencil.
Prevents assertion failures inside the driver for such state combinations.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-20 23:25:36 +01:00
José Fonseca f701a5a0fe gallivm: Disable LLVM 2.7 workaround on other versions.
2.7 was a particularly trouble ridden release.

Furthermore, the bug no longer can be reproduced ever since the
first_level state was taken in account.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-20 23:25:36 +01:00
José Fonseca 1e266c7ef0 gallivm: Emit vector selects.
They are supported on LLVM 3.1, at least on x86. (I haven't tested on PPC
though.)

Actually lp_build_linear_mip_levels() already has been emitting them for
some time.

This avoids intrinsics, which tend to be an obstacle for certain
optimization passes.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-20 23:25:36 +01:00
Rob Clark 26b39df08f freedreno: move ir -> ir2
There will be a new IR for a3xx, which has a very different shader ISA
(more scalar oriented).  So rename to avoid conflicts later when I start
adding a3xx support to the gallium driver.

Signed-off-by: Rob Clark <Rob Clark robdclark@freedesktop.org>
2013-04-20 17:59:41 -04:00
Rob Clark d8134792ae freedreno: cleanup some cruft left over from fdre
The standalone shader assembler needed some meta-data to know about
attributes/varyings/etc, to do the shader linkage.  We don't need these
parts with gallium/tgsi, so just get rid of it.

Signed-off-by: Rob Clark <Rob Clark robdclark@freedesktop.org>
2013-04-20 17:31:47 -04:00
Roland Scheidegger 85974e5fee gallivm: implement switch opcode
Should be able to handle all things which make this tricky to implement.
Fallthroughs, including most notably into/out of default, should be handled
correctly but are quite a mess.
If we see largely unoptimized switches in the wild should probably think
about some "real" switch optimization pass, e.g. things like this:

switch
case1
someinst
brk
case2
default
case3
someinst
brk
case4
someinst
endswitch

are legal, but the pointless case2/case3 statements not only cause condition
evaluation but will turn this into a "fake" fallthrough case (because
mask and defaultmask are already updated for case2 when default is
encountered) requiring executing code twice.
If default is at the end though, there's never any code re-execution, and
if that's not the case if there's no fallthrough in (not even a fake one)
and out of default there's no code re-execution neither.

v2: add comments, and use enum for break type instead of magic boolean.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-20 02:27:53 +02:00
Roland Scheidegger 8f5d4283c0 gallivm: use uint build context for mask instead of float
Unsurprisingly noone was using it except for grabbing builder.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-20 02:27:53 +02:00
Roland Scheidegger 107550e71a gallivm/tgsi: fix up breakc
It seems there was a typo in gallivm breakc handling (I am actually still
not sure it is really needed but otherwise that statement really should go
away). Also fix the wrong src argument type, even though they weren't really
used.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-20 02:27:53 +02:00
Roland Scheidegger e8d1b26a82 svga: remove TGSI_OPCODE_BREAKC instruction translation
While initially that opcode probably was meant for something along the
lines of sm3 break_comp it has never worked that way (not even the
argument count was right) and now the opcode has quite different
semantics so just remove it. (Discovered by Jose Fonseca)
2013-04-20 02:27:53 +02:00
Roland Scheidegger 794579105a gallium: document breakc and switch/case/default/endswitch
docs were missing, especially the opcode-from-hell switch however is anything
but obvious.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-20 02:27:53 +02:00
Roland Scheidegger 443950c6aa gallivm: increase nesting limit to 66
This is still not really correct, since at least for sm 4.0
the nesting limit is 64 per subroutine, and subroutine nesting itself
has a limit of 32, so since we have a flat stack we'd need 32*64.
But this should probably be better fixed with per-subroutine stacks,
since otherwise these structures get really big (like 100kB for the
lp_exec_mask).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-20 02:27:53 +02:00
Zack Rusin 12eab7cc56 draw: implement primitive assembler
Input assembler needs to be able to decompose adjacency primitives
into something that can be understood by the rest of the pipeline.
The specs say that the adjacency primitives are *only* visible
in the geometry shader, for everything else they need to be
decomposed. Which in most of the cases is not an issue, because
the geometry shader always decomposes them for us, but without
geometry shader we were passing unchanged adjacency primitives
to the rest of the pipeline and causing crashes everywhere. This
commit introduces a primitive assembler which, if geometry
shader is missing and the input primitive is one of the
adjacency primitives, decomposes them into something
that the rest of the pipeline can understand.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-18 11:51:22 -07:00
Zack Rusin e4752d0f56 util/prim: fix decomposed counts for adjacency primitives
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-18 11:37:37 -07:00
Zack Rusin c1299204ad draw/so: uses the correct index with the pre clipped coordinates
pre_clip_pos is a float[4] we just used (*float)[4] to be able to
jump within the array of vertex_headers with it. So if the idx
happened to be anything but 0, we'd actually read from some garbage
in memory. Change it to just be a simple pointer instead of casting
it to something that it's not. As suggested by Jose.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-18 11:36:38 -07:00
Eric Anholt ea6cf2b686 mesa: Use quotes on bool driconf options to prevent stdbool.h breakage.
Since stdbool.h's "true" and "false" are #defines, they got expanded when
used as macro arguments, and that expanded value was stored in the
XML string, producing XML that driconf would then fail to parse.

Currently no drivers included stdbool along with driconf, but I keep
accidentally doing so on intel as we move towards using normal C.

v2: rebase on master.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
2013-04-19 10:10:22 -07:00
Brian Paul cecbfce5eb svga: whitespace, comment fixes in svga_pipe_query.c 2013-04-19 10:04:11 -06:00
Brian Paul ef1b2b8da7 svga: whitespace, comment fixes in svga_pipe_fs/vs.c 2013-04-19 10:03:56 -06:00
José Fonseca dbb690872e gallivm: Fix half floats with MCJIT.
Prevents:

  LLVM ERROR: Cannot select: intrinsic %llvm.x86.vcvtph2ps.128
2013-04-19 10:13:19 +01:00
Jerome Glisse d0e9aaa31c radeonsi: add support for compressed texture v2
Most test pass, issue are with border color and swizzle.

Based on ircnick<maelcum> patch.

v2: Restaged commit hunk

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-04-18 17:25:38 -04:00
Jerome Glisse dc21e30a62 radeonsi: add 2d tiling support for texture v3
v2: Remove left over code
v3: Restage properly the commit so hunk of first one are not in
    second one.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-04-18 17:25:38 -04:00
Vadim Girlin f732036f12 gallium: handle drirc disable_glsl_line_continuations option
NOTE: This is a candidate for the 9.1 branch

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-04-19 01:05:03 +04:00
José Fonseca b72ff373fb llvmpipe: Take in consideration all current constant buffers when mapping.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
2013-04-18 20:48:12 +01:00
Christoph Bumiller 78eaaff696 nv50: add remaining RGBX formats
Not all are supported as render targets.

The state tracker fallback of using RGBA instead of RGBX currently
fails for blending, we could work around this by clearing their alpha
to 1 and modifying the color mask to disable writing alpha.
2013-04-18 21:04:22 +02:00
Christoph Bumiller 729abfd0f5 st/mesa: optionally apply texture swizzle to border color v2
This is the only sane solution for nv50 and nvc0 (really, trust me),
but since on other hardware the border colour is tightly coupled with
texture state they'd have to undo the swizzle, so I've added a cap.

The dependency of update_sampler on the texture updates was
introduced to avoid doing the apply_depthmode to the swizzle twice.

v2: Moved swizzling helper to u_format.c, extended the CAP to
provide more accurate information.
2013-04-18 20:35:40 +02:00
Christoph Bumiller 246ff8f887 nv50: set BORDER_COLOR_SRGB in sampler objects 2013-04-18 20:35:40 +02:00
Christoph Bumiller 2d5d054752 nv50: fix 4th component of Lx_SINT/UINT formats 2013-04-18 20:35:40 +02:00
Tom Stellard 3b20170b2f r600g: Fix build with --enable-opencl 2013-04-18 11:24:48 -07:00
Roland Scheidegger 50cbcf0c46 gallivm: change cubemaps / derivatives handling, take 55
Turns out the previous "fix" for handling per-pixel face selection and
derivatives didn't work out that well - the derivatives were wrong by
quite a bit, in theory transformation of the derivatives into cube space
should work, but would be _a lot_ more work than the "simplified" transform
used.
So, for explicit derivatives, I'm just giving up and go back to not honoring
them.
For implicit derivatives (and the fake explicit ones) however we try
something a little different, we just calculate rho as we would for a 3d
texture, that is after scaling the coords by the inverse major axis.
This gives the same results as calculating the derivs after projection of
the coords to the same face as long as all pixels hit the same face (and
only without rho_no_opt, otherwise it should be a bit worse). And when
not all pixels are hitting the same face, the results aren't so hot but
not catastrophically bad (I believe not off by more than a factor of 2 without
no_rho_approx and not more than sqrt(2) with no_rho_approx). I think this is
better than just picking the wrong face but who knows...

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-18 17:06:43 +02:00
Roland Scheidegger 0d07f05ee8 gallivm: Add no_rho_approx debug option
This will calculate rho correctly as
sqrt(max((ds/dx)^2 + (dt/dx)^2 + (dr/dx)^2), (ds/dx)^2 + (dt/dx)^2 + (dr/dx)^2))
instead of max(|ds/dx|,|dt/dx|,|dr/dx|,|ds/dy|,|dt/dy,|dr/dy|)
(for 3 coords - 2 coords work analogous, for 1 coord there's no point doing
the exact version), for both implicit and explicit derivatives.
While such approximation seems to be allowed in OpenGL some APIs may be less
forgiving, and the error can be quite large (sqrt(2) for 2 coords, sqrt(3) for
3 coords so wrong by nearly one mip level in the latter case).
This also helps to single out "real" bugs from "expected" ones, so it is debug
only (though at least combined with no_brilinear I didn't really see much of a
performance difference but only tested with a debug build - at least with
implicit mipmaps the instruction count is almost exactly the same though the
instructions are more complex (1 sqrt and mul/adds instead of and/max mostly).
The code when the option isn't set stays exactly the same.

v2: rename no_rho_opt to no_rho_approx.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-18 17:04:01 +02:00
José Fonseca a930136977 llvmpipe: Support half integer pixel center fs coord.
Tested with graw/fs-fragcoord 2/3, and piglit
glsl-arb-fragment-coord-conventions.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-18 14:18:25 +01:00
José Fonseca b191be52f2 llvmpipe: Remove the static interpolation.
No longer used.

If we ever want the old behavior we can run a loop unroller pass.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-18 14:18:22 +01:00
José Fonseca 6e833d4d09 gallivm: Drop pos arg from lp_build_tgsi_soa.
Never used.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-18 14:18:13 +01:00
Stuart Abercrombie 1a59cc777f i915g: Release old fragment shader sampler views with current pipe
We were trying to use a destroy method from a deleted context.
This fix is based on what's in the svga driver.

Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
2013-04-17 18:15:12 -07:00
Zack Rusin 8e7f7e9693 draw/so: respect leading/provoking vertex info
we were ignoring leading/provoking vertex settings which was
breaking decomposition of some strips.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-17 15:43:50 -07:00
Zack Rusin 6bb217a489 softpipe/so: use the correct variable for reporting stream out
we were using the wrong vars, reporting incorrect stream output
statistics.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-17 15:28:54 -07:00
Zack Rusin cb58c79efb gallivm/gs: fix indirect addressing in geometry shaders
We were always treating the vertex index as a scalar but when the
shader is using indirect addressing it will be a vector of indices
for each channel. This was causing some nasty crashes insides
LLVM.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-17 15:28:54 -07:00
Brian Paul 02039066a8 st/wgl: fix issue with SwapBuffers of minimized windows
If a window's minimized we get a zero-size window.  Skip the SwapBuffers
in that case to avoid some warning messages with the VMware svga driver.
Internal bug #996695

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-17 16:23:19 -06:00
Zack Rusin f01f754ca1 draw/gs: make sure geometry shaders don't overflow
The specification says that the geometry shader should exit if the
number of emitted vertices is bigger or equal to max_output_vertices and
we can't do that because we're running in the SoA mode, which means that
our storing routines will keep getting called on channels that have
overflown (even though they will be masked out, but we just can't skip
them).
So we need some scratch area where we can keep writing the overflown
vertices without overwriting anything important or crashing.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-16 23:38:47 -07:00
Zack Rusin be497ac9d3 draw/gs: Return early if the passed geometry shader is null
Can happen if we were using stream output without geometry
shader, by returning early we avoid a crash.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-16 23:38:47 -07:00
Zack Rusin 80ee4a407a draw: implement pipeline statistics in the draw module
This is a basic implementation of the pipeline statistics in the
draw module. The interface is similar to the stream output statistics
and also requires that the callers explicitly enable it.
Included is the implementation of the interface in llvmpipe and
softpipe. Only softpipe enables the pipeline statistics capability
though because llvmpipe is lacking gathering of the fragment shading
and rasterization statistics.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-16 23:38:47 -07:00
Zack Rusin b739376cff gallivm/gs: fix the end primitive calls
The issue with SOA execution and end_primitive opcode is that it
can be executed both when we haven't emitted any vertices, in
which case we don't want to emit an empty primitive, and when
the execution mask is zero and the execution should be skipped. We
handled only the latter of those conditions. Now we're combining the
execution mask with a mask created from emitted vertices to handle
both cases. As a result we don't need the pending_end_primitive
flag which was broken because it was static and could be affected
by both above mentioned conditions at run-time.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-16 23:38:46 -07:00
Zack Rusin 93627e33cc tgsi/exec: geometry shaders are executed on a single primitive
which means that our execution mask in GS is equal to 1 not 0xf.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-16 23:38:46 -07:00
Zack Rusin 88db6f0a73 tgsi/exec: fix the udiv and umod instructions
Same as with llvmpipe: we can't be divind/moding by zero and we
need to make sure that dividing/moding by zero produces 0xffffffff.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-16 23:38:46 -07:00
José Fonseca b8f6858fcb gallivm: JIT symbol resolution with linux perf.
Details on docs/llvmpipe.html

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-17 16:50:52 +01:00
José Fonseca 35ef27d485 draw: Silence uninitialized var warnings.
Trivial.
2013-04-17 16:50:52 +01:00
Vincent Lejeune 2b9ed257c0 r600g/llvm: Use gprcount from llvm 2013-04-17 17:24:29 +02:00
José Fonseca 50b3fc6204 gallium: Disambiguate TGSI_OPCODE_IF.
TGSI_OPCODE_IF condition had two possible interpretations:

- src.x != 0.0f

  - Mesa statetracker when PIPE_SHADER_CAP_INTEGERS was false either for
    vertex and fragment shaders
  - gallivm/llvmpipe
  - postprocess
  - vl state tracker
  - vega state tracker
  - most old drivers
  - old internal state trackers
  - many graw examples

- src.x != 0U

  - Mesa statetracker when PIPE_SHADER_CAP_INTEGERS was true for both
    vertex and fragment shaders
  - tgsi_exec/softpipe
  - r600
  - radeonsi
  - nv50

And drivers that use draw module also were a mess (because Mesa would
emit float IFs, but draw module supports native integers so it would
interpret IF arg as integers...)

This sort of works if the source argument is limited to float +0.0f or
+1.0f, integer 0, but would fail if source is float -0.0f, or integer in
the float NaN range.  It could also fail if source is integer 1, and
hardware flushes denormalized numbers to zero.

But with this change there are now two opcodes, IF and UIF, with clear
meaning.

Drivers that do not support native integers do not need to worry about
UIF.  However, for backwards compatibility with old state trackers and
examples, it is advisable that native integer capable drivers also
support the float IF opcode.

I tried to implement this for r600 and radeonsi based on the surrounding
code.  I couldn't do this for nouveau, so I just shunted IF/UIF
together, which matches the current behavior.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>

v2:
- Incorporate Roland's feedback.
- Fix r600_shader.c merge conflict.
- Fix typo in radeon, spotted by Michel Dänzer.
- Incorporte  Christoph Bumiller's patch to handle TGSI_OPCODE_IF(float)
  properly in nv50/ir.
2013-04-17 10:54:08 +01:00
José Fonseca f61b7da80e gallium: Eliminate TGSI_OPCODE_IFC.
Never used or implemented.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-17 10:54:08 +01:00
Christian König 13ddf9baf2 r600/uvd: cleanup disabling tiling on pre EG asics
Set transfer flag instead of fiddling with the tilling params directly.

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-04-16 22:36:51 +02:00
Martin Andersson 4c3ed79566 r600g: Workaround for a harware bug with nested loops on Cayman
There is a hardware bug on Cayman where a BREAK/CONTINUE followed by
LOOP_STARTxxx for nested loops may put the branch stack into a state
such that ALU_PUSH_BEFORE doesn't work as expected. Workaround this
by replacing the ALU_PUSH_BEFORE with a PUSH + ALU

Fixes piglit tests EXT_transform_feedback/order*

v2: Use existing loop count and improve comment
v3: [Vadim Girlin] Set jump address for PUSH instructions

NOTE: This is a candidate for the 9.1 branch

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-04-16 18:02:11 +04:00
Marek Olšák 8616b224bf gallium/hud: fix FPS computation for framerate > 4.2k 2013-04-16 13:56:47 +02:00
Marek Olšák 332af88c39 gallium/hud: increase vertex buffer size for background black rectangles
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-16 13:56:47 +02:00
Marek Olšák 0108114619 gallium/hud: update the contents of GALLIUM_HUD=help
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-16 13:56:47 +02:00
Marek Olšák 30284f8892 gallium/hud: remove pipeline-statistics- prefix in query names
for the env var string not to be awfully long

v2: fix bug in indexing of "name"

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-16 13:56:47 +02:00
Marek Olšák dfe5367f0f r600g: implement pipeline statistics query 2013-04-16 13:56:47 +02:00
Marek Olšák 817723baf8 winsys/radeon: use query_value for timestamp, remove query_timestamp 2013-04-16 13:56:47 +02:00
Marek Olšák 413ca78af3 r600g: add a debug flag for printing virtual addresses of resources 2013-04-16 13:56:47 +02:00
Marek Olšák 05fa3595e0 r600g: add a query returning the amount of time spent during bo_map sync. 2013-04-16 13:56:47 +02:00
Matt Turner b3f1f665b0 build: Get rid of GALLIUM_WINSYS_DIRS
configure still uses it to print the enabled winsys.

Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-04-15 12:05:55 -07:00
Matt Turner 3a6e548a85 build: Get rid of GALLIUM_TARGET_DIRS
configure still uses it to print the enabled targets.

Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-04-15 12:05:55 -07:00
Matt Turner 2f7a37d858 build: Build pipe-loader before gallium tests
And don't build it from other Makefiles. That's awful, and breaks
distclean.

Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-04-15 12:05:55 -07:00
Matt Turner 0d3b1b0e2e build: Get rid of GALLIUM_MAKE_DIRS
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-04-15 12:05:55 -07:00
Matt Turner 69b69b1a0b build: Stop using GALLIUM_STATE_TRACKERS_DIRS for SUBDIRS
configure still uses it to print the enabled state trackers.

Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-04-15 12:04:26 -07:00
Matt Turner 70531b4a25 build: Remove GALLIUM_DIRS
It's always constant anyway.

Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-04-15 12:04:26 -07:00
Matt Turner d5e9426b96 build: Move src/mapi/mapi/* to src/mapi/
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-04-15 12:04:25 -07:00
Tom Stellard d50343dff1 radeonsi: Read config values from the .AMDGPU.config ELF section
Instead of emitting configuration values (e.g. number of gprs used) in a
predefined order, the LLVM backend now emits these values in
register/value pairs.  The first dword contains the register address and
the second dword contians the value to write.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-15 10:54:30 -07:00
Tom Stellard 9277b04c02 radeon/llvm: Handle ELF formatted binary output from the LLVM backend 2013-04-15 10:54:29 -07:00
Tom Stellard 7782d19cdc radeon/llvm: Use a struct for storing compiled code 2013-04-15 10:13:10 -07:00
Roland Scheidegger 1d6eb23f2d gallivm: fix small but severe bug in handling multiple lod level strides
Inserting the value for the second quad in the wrong place for the
following shuffle. This meant the row or image stride was undefined which is
quite catastrophic, can lead to bogus texels fetched or just segfault.
This code is only hit for SoA path currently, still surprising it
didn't crash more or caused more visible issues (I think llvm used a
broadcast shuffle for the undefined parts of the vector, hence the undefined
value for the second quad was just the same as that from the first quad,
so as long as both quads hit the same mip level everything was fine, and since
lower mips always have the same large stride it made it less likely to
hit out-of-bound memory in case of differing lods).

Note: this is a candidate for stable branches.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-15 15:23:40 +02:00
Francisco Jerez 02b808b08a clover: Fix usage of incorrect object as destination in clEnqueueCopyBufferToImage.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
2013-04-13 14:24:10 +02:00
Francisco Jerez 1a8ad6c2e3 clover: Define platform class and merge with device_registry.
Null platform IDs are OK according to the spec, but some applications have
been reported to get paranoid and assume that our NULL platform is unusable.

As it doesn't hurt to have device enumeration separate from the rest of the
device code (quite the opposite, it makes the code cleaner), make the API use
an actual platform object that keeps track of the available devices instead of
the former NULL pointer.

Reported-and-reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
2013-04-13 14:20:16 +02:00
Francisco Jerez 6ace452055 clover: Add missing fields to the module serializer.
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
2013-04-13 14:12:49 +02:00
Tom Stellard c6a86fb563 r300g: Fix bug in OMOD optimization
https://bugs.freedesktop.org/show_bug.cgi?id=60503

NOTE: This is a candidate for the stable branches.
2013-04-12 08:33:31 -07:00
Emil Velikov ac1118d53c nvc0: set ret variable if launch desc allocation failed
Pointed out by gcc

nve4_compute.c: In function 'nve4_launch_grid':
nve4_compute.c:511:7: warning: 'ret' may be used uninitialized in
 this function [-Wmaybe-uninitialized]
    if (ret)
       ^

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

Edit by Christoph Bumiller:
Set it to -1 to indicate failure and only when it's actually required.
2013-04-12 17:15:14 +02:00
Emil Velikov 48bcb94dc3 nvc0: bail out early during nve4_compute_setup()
Exit gracefully rather than trying to create a random object, whenever the
chipset is unknown

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2013-04-12 17:10:11 +02:00
Emil Velikov e28c266682 nvc0: compile nve4_cache_split_name() only in debug build
As otherwise it is unused - pointed out by gcc

nve4_compute.c:586:20: warning: 'nve4_cache_split_name' defined but not used [-Wunused-function]
 static const char *nve4_cache_split_name(unsigned value)
                    ^

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2013-04-12 17:09:03 +02:00
Emil Velikov 249f3d73cf nv50/codegen: do not emitATOM() if the subOp is unknown
For debug build we'll hit the assert, for release we are going to emit random data
as subOp is used uninitilised. Spotted by gcc

codegen/nv50_ir_emit_nv50.cpp: In member function 'void nv50_ir::CodeEmitterNV50::emitATOM(const nv50_ir::Instruction*)':
codegen/nv50_ir_emit_nv50.cpp:1554:12: warning: 'subOp' may be used uninitialized in this function [-Wmaybe-uninitialized]
    uint8_t subOp;
            ^

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2013-04-12 17:08:26 +02:00
Christoph Bumiller 4da54c91d2 nvc0: implement multisample textures 2013-04-12 13:02:18 +02:00
Christoph Bumiller 71c1c8a9b8 nvc0: patch up TEX cases with 5 or 6 sources on nve4
Hackishly fixes alignment requirement of 2nd tuple for now.
2013-04-12 11:41:35 +02:00
Christoph Bumiller 2b62ba7cb0 nvc0: fix 2D engine MS2 resolve 2013-04-12 11:41:35 +02:00
Christoph Bumiller 69804c2ab8 nv50,nvc0: add RGBX16/32_FLOAT formats 2013-04-12 11:41:35 +02:00
Dave Airlie f024c72476 r600g: add get_sample_position support (v3)
v2: I rewrote this to use the sample positions properly.
v3: rewrite properly to use bitfield to cast back to signed ints

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-04-11 21:09:29 +01:00
Dave Airlie cc906396c7 gallium: add get_sample_position interface
This is to be used to implement glGet GL_SAMPLE_POSITION.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-04-11 21:09:28 +01:00
Dave Airlie 184278a804 r600g: fix two issues in compressed msaa reading code
I've no idea when sample_chan would ever be 4 here, but 4 is most
definitely wrong, array textures have it as 3 as well.

Also the cayman code though unused is obviously wrong.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-04-11 21:09:27 +01:00
Christian König 5b2855bfe7 radeon/uvd: add UVD implementation v5
Just everything you need for UVD with r600g and radeonsi.

v2: move UVD code to radeon subdir, clean up build system additions,
    remove an unused SI function, disable tiling on SI for now.
v3: some minor indentation fix and rebased
v4: dpb size calculation fixed
v5: implement proper fall-back in case the kernel doesn't support UVD,
    based on patches from Andreas Boll but cleaned up a bit more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2013-04-11 17:10:28 +02:00
Christian König f91e4d2c9d radeon/winsys: add uvd ring support to winsys v3
Separated from UVD patch for clarity.

v2: sync with next tree for 3.10
v3: as pointed out by Andreas Bool check for drm minor >= 32

http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.10-wip

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
2013-04-11 17:10:01 +02:00
Fredrik Höglund fb69dbb0d1 r600g: Add support for GL_ARB_texture_buffer_range
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-11 00:10:45 +02:00
Marek Olšák 34c3f98641 r600g: fix valgrind warning on Cayman
Warning: "Conditional jump or move depends on uninitialised value(s)".
2013-04-10 21:56:51 +02:00
Zack Rusin fe29f99293 gallivm/tgsi: handle untyped moves
both mov and ucmp can be used to move variables of any type.
correctly note that about ucmp in the tgsi_info and make
sure gallivm can handle that by correctly casting the untyped
moves.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-10 12:37:17 -07:00
Zack Rusin d56f2d5267 gallivm: fix loops and conditionals within GS
We were using simple temporaries, without using alloca or phi
nodes which meant that on every iteration of the loop our
temporaries, which were holding the number of vertices and
primitives which were emitted, were being reset to zero. Now
we're using alloca to allocate those variables to preserve
them across conditionals.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-10 12:33:59 -07:00
Zack Rusin c1cd19c3b8 llvmpipe: implement PIPE_QUERY_SO_STATISTICS
We were missing the implementation of PIPE_QUERY_SO_STATISTICS
query, this change implements it on top of the existing
facilities.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-10 12:32:56 -07:00
Zack Rusin 7466e0b6c8 gallivm: fix unsigned divide and remainder opcodes
We want to both make sure we never divide by zero to not generate
sigfpe and that divide by zero is guaranteed to return 0xffffffff.
Based on José idea.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-10 12:31:22 -07:00
Zack Rusin 1ad4a4eeb3 gallivm: fix breakc
we break when the mask values are 0 not, 1, plus it's bit comparison
not a floating point comparison. This fixes both.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2013-04-10 12:25:34 -07:00
Christian König ccf3e8fc9b radeonsi: remove sampler writemask v3
v2: fix instrinsic name as well
v3: LLVM revision incremented as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-10 10:41:29 +02:00
Niels Ole Salscheider 31f14f3def pipe-loader: Fix out of source build
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
2013-04-10 09:45:04 +02:00
Brian Paul acd4fb8b5a st/osmesa: re-use buffers in OSMesaMakeCurrent()
Rather than creating a new buffer each time.  Fixes problems found
with vtk.

Tested-by: Kevin H. Hobbs <hobbsk@ohio.edu>
2013-04-09 18:30:23 -06:00
Christian König 462647453c st/vdpau: fix subtitle related bug v2
Drawing subtitles didn't increased the dirty area of the surface.

Reported and tested by freeedrich on irc.

v2: don't clear the surface

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-04-09 21:11:32 +02:00
Brian Paul 4ad360133c softpipe: misc updates to image dumping in softpipe_flush() 2013-04-09 08:27:53 -06:00
Vinson Lee 04ffce3004 tgsi: Ensure struct tgsi_ind_register field Index is initialized.
Fixes uninitialized scalar variable defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-08 18:59:34 -07:00
Martin Andersson a8246927e3 r600g: Fix UMAD on Cayman
The multiplication part of tgsi_umad did not work on Cayman, because it did
not populate the correct vector slots.

This fixed hardlocks in the EXT_transform_feedback/order tests.

NOTE: This is a candidate for the stable branches.
(might not be easy to cherry-pick though)

Signed-off-by: Marek Olšák <maraeo@gmail.com>
2013-04-09 03:09:37 +02:00
Vincent Lejeune 5019af2145 r600g/llvm: Add support for native isa for pre EG
This fixes bug 62756 :
https://bugs.freedesktop.org/show_bug.cgi?id=62756#c12
2013-04-08 15:11:59 +02:00
Marek Olšák eff66bc9f8 gallium/util: add const to a parameter of util_max_layer 2013-04-06 23:57:15 +02:00
Tom Stellard 302f53dc20 radeonsi: Add compute support v3
v2:
  - Only dump shaders when env variable is set.

v3:
  - Don't emit VGT registers

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com
2013-04-05 18:43:34 -04:00
Tom Stellard 4f7fe2cf2c radeonsi: Set TCL1_ACTION_ENA when invalidating the texture cache
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com
2013-04-05 18:43:34 -04:00
Tom Stellard 0ccf82c557 radeonsi: Remove si_pm4_inval_vertex_cache()
This function is a holdover from r600g and is identical to
si_pm4_inval_texture_cache(), so it is not needed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com
2013-04-05 18:43:34 -04:00
Tom Stellard c5e5b3401c gallium: PIPE_COMPUTE_CAP_IR_TARGET - allow drivers to specify a processor v2
This target string now contains four values instead of three.  The old
processor field (which was really being interpreted as arch) has been split
into two fields: processor and arch.  This allows drivers to pass a
more a more detailed description of the hardware to compiler frontends.

v2:
  - Adapt to libclc changes

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2013-04-05 18:43:34 -04:00
Wladimir 1a868acbec util: add ETC as compressed format
Add UTIL_FORMAT_LAYOUT_ETC to util_format_is_compressed. It was missing.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-05 16:14:51 -06:00
Brian Paul de99b6d117 gallium/u_blitter: fix is_blit_generic_supported() stencil checking
Don't check if there's sampler support for stencil if we're not
going to actually blit/copy stencil values.  Fixes the case where
we mistakenly said we can't support a blit of depth values from
S8Z24 to X8Z24.

Also, rename the is_stencil variable to dst_has_stencil to improve
readability.

NOTE: This is a candidate for the stable branches.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-05 16:14:51 -06:00
Rob Clark aac7f06ad8 freedreno: use autogenerated register defs
Switch to use the envytools generated headers for register/bitfield
definitions.  This is the first step in preparing to add a3xx support,
since it avoids having conflicting names for a3xx and a2xx registers.
And since I'm using envytools for a3xx it is simpler to just use it for
everything.

This shouldn't cause any functional change, it is really just a lot of
renaming.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-04-05 14:33:16 -04:00
José Fonseca 1fefc65d20 st/wgl: Install our windows message hook to threads created before the ICD is loaded.
Otherwise we will not receive destroy windows events, causing framebuffers
to leak.

This happens particularly with java and jogl.

Tested with java + jogl, MATLAB.

VMware Internal Bug Number: 1013086.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-05 18:27:54 +01:00
Adam Jackson ca70de9bd2 llvmpipe: Work without sse2 if llvm is new enough
At least on llvm 3.2 this appears to work fine.  Tested on an Athlon XP
2600+, which has sse and 3dnow but not sse2.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-04-05 11:32:53 -04:00
Jerome Glisse b8998f976e winsys/radeon: add command stream replay dump for faulty lockup v3
Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
enable it.

When enabled after each cs submission the code will try to detect lockup by
waiting on one of the buffer of the cs to become idle, after a timeout it
will consider that the cs triggered a lockup and will write a radeon_lockup.c
file in current directory that have all information for replaying the cs.

To build this file :
gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm

v2: Add radeon_ctx.h file to mesa git tree
v3: Slightly improve dumped file for easier editing, only dump first faulty cs

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
2013-04-05 10:22:05 -04:00
Brian Paul 5192262833 st/xlib: add HUD support for xlib/GLX
For the softpipe and llvmpipe drivers.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 17:00:42 -06:00
Brian Paul f5071783c1 gallium/hud: add GALLIUM_HUD_PERIOD env var
To set the graph update rate, in seconds.  The default update rate
has also been changed to 1/2 second.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-04 17:00:42 -06:00
Brian Paul 6211c45186 gallium/hud: initialize sampler state
The default wrap mode (PIPE_TEX_WRAP_REPEAT) is incompatible with
unnormalized texcoords (at least for softpipe).

v2: use PIPE_TEX_WRAP_CLAMP_TO_EDGE

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-04 17:00:42 -06:00
Roland Scheidegger 9eef86bb55 gallivm: some minor cube map cleanup
The ar_ge_as_at variable was just very very confusing since the condition
was actually the other way around (as_at_ge_ar). So change the condition
(and the selects depending on it) to match the variable name.
And also change the chosen major axis in case the coord values are the
same. OpenGL doesn't care one bit which one is chosen in this case but
it looks like dx10 would require z chosen over y, and y chosen over x
(previously did x chosen over y, y chosen over z). Since it's all the
same effort just honor dx10's wishes. (Though actually, for some prefered
orderings, we could save one (or two with derivatives) selects since the
tnewx and tnewz (and the corresponding dmax values) are the same.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 23:22:10 +02:00
Zack Rusin be9a42e980 llvmpipe: implement ucmp
and add a test for it

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 12:09:55 -07:00
Paul Berry 5db2249493 Avoid spurious GCC warnings in STATIC_ASSERT() macro.
GCC 4.8 now warns about typedefs that are local to a scope and not
used anywhere within that scope.  This produced spurious warnings with
the STATIC_ASSERT() macro (which used a typedef to provoke a compile
error in the event of an assertion failure).

This patch switches to a simpler technique that avoids the warning.

v2: Avoid GCC-specific syntax.  Also update p_compiler.h.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-04-04 09:52:18 -07:00
Erik Faye-Lund 456f40e18d freedreno: document debug flag
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
2013-04-04 10:41:50 -06:00
Brian Paul e95514c0ea st/wgl: add HUD support
v2: fix a few minor issues spotted by Jose.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 10:41:35 -06:00
Brian Paul 0c1dcf906d st/wgl: make stw_current_context() non-static
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 08:50:16 -06:00
Brian Paul 92e5e45ff1 util: add debug_memory_check_block(), debug_memory_tag()
The former just checks that the given block is valid by checking
the header and footer.

The later sets the memory block's tag.  With extra debug code, we
can use that for monitoring/checking particular allocations.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-04 08:50:15 -06:00
Brian Paul a408ea9692 gallium/hud: replace malloc w/ MALLOC
To match the FREE() called used later.  Fixes things on Windows.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-04 08:50:15 -06:00
Vincent Lejeune 9276961223 r600g/llvm: Workaround for wrong tex.offset_* 2013-04-04 16:03:04 +02:00
Roland Scheidegger ce5096a0a9 gallivm: honor explicit derivatives values for cube maps.
This is trivial now, though need to make sure we pass all the necessary
derivative values (which is 3 each for ddx/ddy not 2).
Passes piglit arb_shader_texture_lod-texgradcube test.

v2: add the forgotten abs() for all incoming derivatives (discovered
by new piglit arb_shader_texture_lod-texgradcube test, though more by
luck as it was failing only for exactly one pixel...).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger f621015cb5 gallivm: do per-pixel cube face selection (finally!!!)
This proved to be tricky, the problem is that after selection/mirroring
we cannot calculate reasonable derivatives (if not all pixels in a quad
end up on the same face the derivatives could get "randomly" exceedingly
large).
However, it is actually quite easy to simply calculate the derivatives
before selection/mirroring and then transform them similar to
the cube coordinates (they only need selection/projection, but not
mirroring as we're not interested in the sign bit, of course). While
there is a tiny bit more work to do (need to calculate derivs for 3
coords instead of 2, and additional selects) it also simplifies things
somewhat for the coord selection itself (as we save some broadcast aos
shuffles, and we don't need to calculate the average vector) - hence if
derivatives aren't needed this should actually be faster.
Also, this has the benefit that this will (trivially) work for explicit
derivatives too, which we completely ignored before that (will be in a
separate commit for better trackability).
Note that while the way for getting rho looks very different, it should
result in "nearly" the same values as before (the "nearly" is only because
before the code would choose the face based on an "average" vector and hence
the derivatives calculated according to this face, where now (for implicit
derivatives) the derivatives are projected on the face selected for the
first (top-left) pixel in a quad, so not necessarly the same face).
The transformation done might not quite be state-of-the-art, calculating
length(dx,dy) as max(dx,dy) certainly isn't neither but this stays the
same as before (that is I think a better transform would _somehow_ take
the "derivative major axis" into account so that derivative changes in
the major axis wouldn't get ignored).
Should solve some accuracy problems with cubemaps (can easily be seen with
the cubemap demo when switching wrapping/filtering), though we still don't
do seamless filtering to fix it completely (so not per-sample but per-pixel
is certainly better than per-quad and already sufficient for accurate
results with nearest tex filter).

As for performance, it seems to be a tiny bit faster too (maybe 3% or so
with cubemap demo). Which I'd have expected with nearest/nearest filtering
where this will be less instructions, but the difference seems to actually
be larger with linear/linear_mipmap_linear where it is slightly more
instructions, probably the code appears less serialized allowing better
scheduling (on a sandy bridge cpu). It actually seems to be now at least
as fast as the old path using a conditional when using 128bit vectors too
(that is probably more a result of testing with a newer cpu though), for now
that old path is still there but unused.
No piglit regressions.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger bdfbeb9633 gallivm: minor rho calculation optimization for 1 or 3 coords
Using a different packing for the single coord case should save a shuffle.
Plus some minor style fixes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-04 01:03:42 +02:00
Roland Scheidegger 067a0ae420 gallivm: use f16c hw support for float->half and half->float conversion
Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-04-04 01:03:42 +02:00
Zack Rusin 302df7cc85 draw/llvmpipe: allow independent so attachments to the vs
When geometry shaders are present, one needs to be able to create
an empty geometry shader with stream output that needs to be
resolved later and attached to the currently bound vertex shader.
Lets add support for it to llvmpipe and draw. draw allows attaching
independent stream output info to any vertex shader and llvmpipe
resolves at draw time which vertex shader the given empty geometry
shader should be linked to.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 246e68735f llvmpipe: reset so buffers when not appending
We need to reset the internal state of the so buffers or we'll
keep appending even though we're not supposed to.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 7ca65a68e1 draw: remove unused function
we use draw_set_mapped_so_targets nowadays

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin b16ae0f792 draw/llvm: use an enum instead of magic numbers
I think this was there before and got accidently
removed during a merge. Same code as for the GS
context, which is also using an enum instead of
hardcoded numbers.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 49b7d933f8 draw/gs: cleanup some debugging code
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin 822c21c776 draw/so: maintain an exact number of written vertices
It's quite helpful during the rendering when we know
exactly the count of the vertices available in the
buffer.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin d8543bd752 draw: Implement support for primitive id
We were largely ignoring primitive id.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin f6bfb62c50 draw/so: Fix bogus assert
We do support so with multiple primitives.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin e6fc635351 draw/gs: Fix memory corruption with multiple primitives
We were flushing with incorrect number of primitives. TGSI exec
can only work with a single primitive at a time. Plus the fetching
with multiple primitives on llvm paths wasn't copying the last
element.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Zack Rusin f313b0c850 gallivm: cleanup the gs interface
Instead of void pointers use a base interface.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-03 10:16:25 -07:00
Brian Paul ac114c6824 svga: add new memory-used HUD query
To track the amount of memory used by all pipe_resources (textures
and buffers).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Brian Paul a69efa9482 util: add new util_resource_size() function in u_resource.[ch]
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Brian Paul a3cccdec90 util: move functions from u_resource.c to u_transfer.c
The functions are prototyped in u_transfer.h and are related to the
other functions in u_transfer.c.

The next patch will re-use the u_resource.c file for new code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 11:02:47 -06:00
Vincent Lejeune 159d934066 r600g/llvm: Do not override llvm provided stack_size 2013-04-03 18:39:49 +02:00
Vincent Lejeune 097a6ecdfe r600g/llvm: Do not change cf_alu inst when adding alus 2013-04-03 18:22:40 +02:00
Marek Olšák ff01e0db0e radeonsi: add more cases for copying unsupported formats to resource_copy_region
Ported from r600g commit:

8891b2f9c9

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

NOTE: This is a candidate for the 9.1 branch.
2013-04-03 10:58:33 -04:00
Brian Paul 3838edaf5d svga: add HUD queries for number of draw calls, number of fallbacks
The fallbacks count is the number of drawing calls that use a "draw"
module fallback, such as polygon stipple.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 09:56:08 -06:00
Brian Paul 49ed1f3cb3 svga: refactor occlusion query code
This is in preparation for adding new query types for the HUD.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-03 09:56:07 -06:00
Brian Paul a9ae7e9c28 gallium/hud: try L8 texture for font if I8 format isn't supported 2013-04-03 09:44:57 -06:00
Brian Paul 0289ebaa0f svga: add case for PIPE_CAP_QUERY_PIPELINE_STATISTICS 2013-04-03 08:19:44 -06:00
Christoph Bumiller 80eef069f0 nv50,nvc0: remove MS resolve formats hack
Mesa now allows BlitFramebuffer resolve between RGBA and BGRA.
2013-04-03 13:19:15 +02:00
Christoph Bumiller 4de70bf43c nvc0: fix 128 bit compressed storage type selection 2013-04-03 12:54:44 +02:00
Christoph Bumiller 8e1dd58a7e nvc0: place staging textures in GART and map them directly 2013-04-03 12:54:44 +02:00
Christoph Bumiller ba9b0b682f nv50: account for pesky prefetch in size calculation of linear textures 2013-04-03 12:54:44 +02:00
Christoph Bumiller f0a0d59f0f nvc0: honour scaled coordiantes setting for linear textures 2013-04-03 12:54:44 +02:00
Christoph Bumiller d801545964 nvc0: fix for 2d engine R source formats writing RRR1 and not R001 2013-04-03 12:54:43 +02:00
Christoph Bumiller 6417d56c19 nv50,nvc0: disable DEPTH_RANGE_NEAR/FAR clipping during blit
We send position.z == 0, DEPTH_RANGE may be some arbitrary range
not including 0 (for exmaple in piglit's hiz tests).
2013-04-03 12:54:43 +02:00
Christoph Bumiller 2a8145d36b nouveau: accelerate buffer copies in resource_copy_region 2013-04-03 12:54:43 +02:00
Christoph Bumiller 3ed4bbd769 nvc0: demagic some of the NVE4_COMPUTE_UPLOAD methods
It's actually the same as P2MF.
2013-04-03 12:54:43 +02:00
Christoph Bumiller fb0334adb3 nvc0: read PM counters for each warp scheduler separately 2013-04-03 12:54:43 +02:00
Christoph Bumiller 7bac075f25 nvc0: add some metrics to driver specific queries 2013-04-03 12:54:43 +02:00
Christoph Bumiller 198f514aa6 nvc0: add some driver statistics queries 2013-04-03 12:54:43 +02:00
Christoph Bumiller 7628cc247f nvc0: disable compressed storage type 0xdb for now
Single-sample color compression doesn't seem that useful anyway.
2013-04-03 12:54:43 +02:00
Christoph Bumiller ea12fc3f6c nvc0: use correct hw query for PRIMITIVES_GENERATED
It was the same as SO_STATISTICS[1] before.
2013-04-03 12:54:43 +02:00
Christoph Bumiller 6bca4e7085 nvc0: use fence to check state of queries that don't write sequence
This still isn't optimal, since the fence will signal a bit late,
but better than checking on the bo, which may never be ready if it
is shared (which is likely).
2013-04-03 12:54:43 +02:00
Christoph Bumiller 3d2790cead gallium/hud: add support for PIPE_QUERY_PIPELINE_STATISTICS
Also, renamed "pixels-rendered" to "samples-passed" because the
occlusion counter increments even if colour and depth writes are
disabled, or (on some implementations) for killed fragments that
passed the depth test when PS early_fragment_tests is set.
2013-04-03 12:54:43 +02:00
Christoph Bumiller c620aad71c gallium/docs: fix definition of PIPE_QUERY_SO_STATISTICS
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-03 12:54:43 +02:00
Christoph Bumiller f35e96d973 gallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICS
Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-03 12:54:43 +02:00
Roland Scheidegger 450950c57a gallivm: bring back optimized but incorrect float to smallfloat optimizations
Conceptually the same as previously done in float_to_half.
Should cut down number of instructions from 14 to 10 or so, but
will promote some NaNs to Infs, so it's disabled.
It gets a bit tricky though handling all the cases correctly...
Passes basic tests either way (though there are no tests testing special
cases, but some manual tests injecting them seemed promising).

v2: style and comment fixes suggested by Jose

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-02 18:24:31 +02:00
Roland Scheidegger 3febc4a1cd gallivm: consolidate code for float-to-half and float-to-packed conversion.
This replaces the existing float-to-half implementation.
There are definitely a couple of differences - the old implementation
had unspecified(?) rounding behavior, and could at least in theory
construct Inf values out of NaNs. NaNs and Infs should now always be
properly propagated, and rounding behavior is now towards zero
(note this means too large but non-Infinity values get propagated to max
representable value, not Infinity).
The implementation will definitely not match util code, however (which
does nearest rounding, which also means too large values will get
propagated to Infinity).

Also fix a bogus round mask probably leading to rounding bugs...
v2: fix a logic bug in handling infs/nans.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-04-02 18:24:31 +02:00
Vadim Girlin 9be624b3ef r600g: don't reserve more stack space than required v5
Reduced stack size allows to run more threads in some cases,
improving performance for the shaders that use stack (that is, for the
shaders with control flow instructions). E.g. with unigine-based apps.

v4: implement exact computation taking into account wavefront size
v5: add cases for RV620, RS880

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-04-02 19:34:14 +04:00
Vadim Girlin 7e04227f39 r600g: fix range handling for tgsi input declarations v2
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
2013-04-02 19:34:14 +04:00
Marek Olšák f8502b7e71 gallium/hud: do .xxxx swizzling for the font texture in the fragment shader
This allows using L8 and R8 for the font if I8 isn't supported.

Tested-by: Brian Paul <brianp@vmware.com>
2013-04-02 16:57:57 +02:00
Brian Paul 98b64cc20f hud: flush/unmap the vertex buffer before drawing
The VMware svga driver is picky about making sure the VBO is unmapped
before drawing.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
2013-04-02 08:17:28 -06:00
Brian Paul bdd3770b78 draw: use pipe_transfer_unmap() to match pipe_transfer_map() 2013-04-02 08:17:28 -06:00
Roland Scheidegger 9b329f4c09 gallivm: fix signed small float to float conversion
Introduced by 5f41e08cf3,
just a silly typo.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=62921.
2013-04-02 13:21:07 +02:00
Christian König a0dca4409a radeonsi: add instance divisor support v3
v2: reduce key size, don't copy key around to much.
v3: remove key size reduction

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-02 13:01:43 +02:00
Christian König cf9b31f78a radeonsi: add start instance support
This works different than on R600, we need to add the start instance manually.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-02 13:01:43 +02:00
Christian König e4ed58763a radeonsi: add instanceid support
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-02 13:01:43 +02:00
Christian König 83df955ca9 radeon/llvm: move system value fetching to common code
This should be used by both SI and R600.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2013-04-02 13:01:42 +02:00
Michel Dänzer c6efb4870b radeonsi: Handle arbitrary 2-byte formats in resource_copy_region
Fixes mplayer -vo vdpau OSD.

NOTE: This is a candidate for the 9.1 branch.

Reported-by: Igor Vagulin <igor.vagulin@gmail.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Christian König <christian.koenig@amd.com>
2013-04-02 11:42:35 +02:00
Maarten Lankhorst 6d20c646d6 nvc0: Fix fd leak in nvc0_create_decoder
NOTE: This is a candidate for the 9.0 and 9.1 branches.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
2013-04-02 10:25:26 +02:00
Vincent Lejeune 50fd9c4544 r600g/llvm: Update LLVM_REVISION.txt 2013-04-01 23:50:20 +02:00
Vincent Lejeune 8c8c4e3977 r600g/llvm: Use stack_size provided from llvm. 2013-04-01 23:43:57 +02:00
Vincent Lejeune 4ac0d85ca6 r600g/llvm: uses function attribute to pass shader type 2013-04-01 23:43:42 +02:00
Vincent Lejeune af38695f51 r600g/llvm: Add support for cf_alu native encode 2013-04-01 23:43:27 +02:00
Mike Lothian 777a7f2003 clover: Fix build with LLVM 3.3 2013-04-01 10:50:23 -07:00
Brian Paul 1165ff1af1 llvmpipe: use triangle subdivision to avoid fixed-point overflow issues
If we're drawing to a surface that's 2048 x 2048 pixels or larger there's
danger of fixed-point overflow in the triangle rasterization code.  That
leads to various rendering glitches.

Rather than implement some intricate changes to the rasterization code,
simply subdivide triangles into smaller subtriangles to avoid the issue.
Only do this when the drawing surface is larger than 2048 by 2048.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-04-01 08:40:35 -06:00
Adam Jackson e26d5940ff gallivm: Minor comment cleanup
Signed-off-by: Adam Jackson <ajax@redhat.com>
2013-04-01 09:45:38 -04:00
Vincent Lejeune c3fb34ee8d r600g/llvm: Update LLVM_REVISION 2013-03-31 21:37:20 +02:00
Vincent Lejeune 67a8ee7aaa r600g/llvm: use native encode for tex 2013-03-31 21:35:47 +02:00
Roland Scheidegger 5f41e08cf3 gallivm: consolidate some half-to-float and r11g11b10-to-float code
Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com
2013-03-29 16:39:40 +01:00
Christoph Bumiller ee624ced36 nvc0: implement MP performance counters
There's more, but this only adds (most) of the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.
2013-03-29 00:33:01 +01:00
Christoph Bumiller 480359bcf6 nvc0: enable compression when supported 2013-03-29 00:33:01 +01:00
Christoph Bumiller 25722e3454 nvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count 2013-03-29 00:33:00 +01:00
Christoph Bumiller 443b247878 nv50,nvc0: fix 3d blits, restore viewport after blit 2013-03-29 00:33:00 +01:00
Christoph Bumiller 090e73fc46 nv50: fix 3D render target setup 2013-03-29 00:33:00 +01:00
Brian Paul b54ce3738a llvmpipe: put .bmp extension on dumped image files 2013-03-28 17:17:26 -06:00
Brian Paul e90c56bc4e llvmpipe: add 'f' suffix to 1.0 in fixed_to_float() 2013-03-28 17:17:26 -06:00
Brian Paul 499aa3ddb4 draw: fix some build breakage when LLVM is not used
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883
Tested-by: Vinson Lee <vlee@freedesktop.org>
2013-03-28 17:15:58 -06:00
Marek Olšák a19f6e880a st/dri: fix crash with HUD and single buffering 2013-03-28 18:17:21 +01:00
Zack Rusin d066133a76 llvmpipe/draw: Fix texture sampling in geometry shaders
We weren't correctly propagating the samplers and sampler views
when they were related to geometry shaders.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-27 03:53:02 -07:00