13958 Commits

Author SHA1 Message Date
Chia-I Wu 4b5c0a8341 ilo: replace ilo_sampler_cso with ilo_state_sampler 2015-06-15 01:06:45 +08:00
Chia-I Wu 745ef2c07b ilo: replace ilo_view_surface with ilo_state_surface 2015-06-15 01:06:45 +08:00
Chia-I Wu c10c1ac0cf ilo: replace ilo_zs_surface with ilo_state_zs 2015-06-15 01:06:44 +08:00
Chia-I Wu 6dad848d1a ilo: add ilo_state_ps
We want to make ilo_shader_cso a union of ilo_state_{vs,hs,ds,gs,ps}.
2015-06-15 01:06:44 +08:00
Chia-I Wu df9f846ac6 ilo: add ilo_state_{vs,hs,ds,gs}
We want to make ilo_shader_cso a union of ilo_state_{vs,hs,ds,gs} and ps
payload.
2015-06-15 01:06:44 +08:00
Chia-I Wu a0bb1c2d17 ilo: add ilo_state_sbe
We want to replace ilo_kernel_routing with ilo_state_sbe.
2015-06-15 01:06:44 +08:00
Chia-I Wu 1ccab943b6 ilo: add ilo_state_vf
We want to replace ilo_ve_state with ilo_state_vf.
2015-06-15 01:06:44 +08:00
Chia-I Wu 9c77ebef24 ilo: add ilo_state_urb 2015-06-15 01:06:44 +08:00
Chia-I Wu 3ff40be0ee ilo: add ilo_state_sol 2015-06-15 01:06:44 +08:00
Chia-I Wu 62bb643718 ilo: add ilo_state_cc
We want to replace ilo_dsa_state and ilo_blend_state with ilo_state_cc.
2015-06-15 01:06:44 +08:00
Chia-I Wu 6be8b6053d ilo: add ilo_state_raster
We want to replace ilo_rasterizer_state with ilo_state_raster.
2015-06-15 01:06:44 +08:00
Chia-I Wu 4fa7ed99a1 ilo: add ilo_state_viewport
We want to replace ilo_viewport_cso and ilo_scissor_state with
ilo_state_viewport.
2015-06-14 23:00:04 +08:00
Chia-I Wu 61fea171af ilo: add ilo_state_sampler
We want to replace ilo_sampler_cso with ilo_state_sampler.
2015-06-14 23:00:04 +08:00
Chia-I Wu f5f2007322 ilo: add ilo_state_surface
We want to replace ilo_view_surface with ilo_state_surface.
2015-06-14 23:00:04 +08:00
Chia-I Wu b91250a56b ilo: add ilo_state_zs
We want to replace ilo_zs_surface with ilo_state_zs.  One noteworthy
difference is that ilo_state_zs always aligns level 0 to 8x4 when HiZ is
enabled.  HiZ will not be enabled for 1D surfaces as a result.
2015-06-14 23:00:03 +08:00
Chia-I Wu 9af1fc590d ilo: update genhw headers
Generate these new enums

  enum gen_reorder_mode;
  enum gen_clip_mode;
  enum gen_front_winding;
  enum gen_fill_mode;
  enum gen_cull_mode;
  enum gen_pixel_location;
  enum gen_sample_count;
  enum gen_inputattr_select;
  enum gen_msrast_mode;
  enum gen_prefilter_op;

Correct the type of GEN6_SAMPLER_DW0_BASE_LOD.  Rename gen_logicop_function,
gen_sampler_mip_filter, gen_sampler_map_filter, gen_sampler_aniso_ratio, and
others.
2015-06-14 15:43:20 +08:00
Chia-I Wu 9cb0df4b50 ilo: add ilo_image_disable_aux()
When aux bo allocation fails, ilo_image_disable_aux() should be called to
disable aux buffer.
2015-06-14 15:43:20 +08:00
Chia-I Wu f0de65cbc2 ilo: add array_size and level_count to ilo_image
We will use them for bounds checking.
2015-06-14 15:43:20 +08:00
Chia-I Wu f9d2bbe967 ilo: add pipe_texture_target to ilo_image
Save the target in ilo_image instead of passing it around.
2015-06-14 15:43:20 +08:00
Chia-I Wu 9da9cf729f ilo: fix "Render Cache Read Write Mode"
It needs to be set to R/W only when using certain messages via DP render cache.
Since we only use RT writes with the render cache, we never need to set it.
2015-06-14 15:43:20 +08:00
Chia-I Wu 1885ac4908 ilo: avoid resource owning in core
It is up to the users whether to reference count the BOs or not.
2015-06-14 15:43:20 +08:00
Chia-I Wu ab7229b9b6 ilo: assert core objects are zero-initialized
Core objects are usually embedded inside calloc()'ed objects and we expect
them to be zero-initialized.
2015-06-14 15:43:20 +08:00
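
A minimal sketch of the kind of check described in ab7229b9b6 above. The names `struct core_obj`, `is_zeroed()` and `core_obj_init()` are hypothetical and chosen for illustration; the helpers ilo actually uses may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical core object, for illustration only. */
struct core_obj {
    int width;
    int height;
};

/* Return true when every byte of the object is zero. */
static bool is_zeroed(const void *ptr, size_t size)
{
    const unsigned char *bytes = ptr;

    for (size_t i = 0; i < size; i++) {
        if (bytes[i])
            return false;
    }
    return true;
}

/* An init function can assert that the caller handed in zeroed memory,
 * which holds when the object is embedded in a calloc()'ed parent. */
static void core_obj_init(struct core_obj *obj, int width, int height)
{
    assert(is_zeroed(obj, sizeof(*obj)));
    obj->width = width;
    obj->height = height;
}
```

The assertion only costs anything in debug builds, since `assert()` compiles away when `NDEBUG` is defined.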
Tom Stellard 4d35eef326 radeon/llvm: Handle LLVM backend rename from R600 to AMDGPU
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-06-12 21:02:00 -07:00
Emil Velikov d15c06b514 vc4: automake: enable subdir-objects
Silence the warnings about the future incompatibility with automake 2.0

Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-12 15:42:22 +01:00
Emil Velikov 1df5a6c71e mesa: add a dummy _mesa_error_no_memory() symbol to libglsl_util
Rather than forcing everyone to provide their own definition of the symbol
provide a common (dummy) one.

This helps us resolve the build of the standalone pipe-drivers (amongst
others), which are missing the symbol.

Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-12 15:32:18 +01:00
Emil Velikov 4722743f4b gallium: use $(top_builddir) when referencing static archives
Just like every other place in gallium.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-12 15:32:17 +01:00
Emil Velikov 3f5dc9b94f freedreno: use CXX linker rather than explicit link against libstdc++
Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-06-12 15:32:17 +01:00
Jose Fonseca 0dde821bcc trace: Add missing p_compiler.h include.
For boolean.

Trivial.
2015-06-12 12:14:11 +01:00
Brian Paul 7217faf39f llvmpipe: simplify lp_resource_copy()
Just implement it in terms of util_resource_copy_region().  Both the
original code and util_resource_copy_region() boil down to mapping,
calling util_copy_box() and unmapping.

No piglit regressions.  This will also help to implement GL_ARB_copy_image.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-06-10 08:20:58 -06:00
Dave Airlie c6877c9e59 nouveau: set imported buffers to what the kernel gives us
When we import a dma-buf fd from another driver the kernel
gives us the right info, and this trashes it.

Convert the kernel bo flags into the domain flags.

This helps getting reverse prime and glamor working.

Cc: mesa-stable@lists.freedesktop.org
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-10 14:10:01 +10:00
Eric Anholt 9dca3beb62 vc4: Drop qir include from vc4_screen.h
We didn't need any of it except for the list header, and qir.h pulls in
nir.h, which is not really interesting to winsys.
2015-06-09 12:25:50 -07:00
Eric Anholt 8d10b2a046 vc4: Drop subdirectory in vc4 build.
Just because we put the source in a subdir, doesn't mean we need helper
libraries in the build.  This will also simplify the Android build setup.
2015-06-09 12:25:50 -07:00
Eric Anholt e67b12eaf8 vc4: Update to current kernel validation code.
After profiling on real hardware, I found a few ways to cut down the
kernel overhead.
2015-06-09 12:25:50 -07:00
Chih-Wei Huang c5e11e5f7f android: build with libcxx on android lollipop
On Lollipop, apparently stlport is gone and libcxx must be used instead.
We still support stlport when building on earlier android releases.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-06-09 12:25:50 -07:00
Chih-Wei Huang 1842832660 android: enable the radeonsi driver
Based on the nice work of Paulo Sergio Travaglia <pstglia@gmail.com>.

The main modifications are:

- Include paths for LLVM header files and shared/static libraries
- Set C++ flag "c++11" to avoid compiling errors on LLVM header files
- Set defines for LLVM
- Add GALLIVM source files
- Changes path of libelf library for lollipop

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Anholt <eric@anholt.net>
2015-06-09 12:25:50 -07:00
Martin Peres 8614b9e489 softpipe/query: force parenthesis around a logical not
This makes GCC5 happy.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
2015-06-08 12:38:08 +03:00
Michel Dänzer 56e38edc96 radeonsi: Add CIK SDMA support
Based on the corresponding SI support. Same as that, this is currently
only enabled for one-dimensional buffer copies due to issues with
multi-dimensional SDMA copies.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-06-08 18:13:22 +09:00
Michel Dänzer 79f2acb8f8 r600g,radeonsi: Assert that there's enough space after flushing
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-06-08 18:10:35 +09:00
Marek Olšák 6aff87bb01 r600g: fix a coverity defect in streamout code
Reported by Ilia Mirkin.
2015-06-05 19:44:32 +02:00
Marek Olšák b6ebe7eabf tgsi/ureg: don't emit in/out arrays if drivers don't support ranged declarations
Softpipe, llvmpipe, r300g, and radeonsi pass tests. Other drivers need testing.

Freedreno and nv30 are definitely broken. Other drivers seem to be alright.
2015-06-05 19:44:32 +02:00
Roland Scheidegger 4fd42a7c27 llvmpipe: Implement stencil export
Pretty trivial, fixes the issue that we're expected to be able to blit
stencil surfaces (as the blit just relies on util blitter code which needs
stencil export to do it).
2 piglits skip->pass, 11 fail->pass

v2: prettify, keep different stencil ref value handling out of depth/stencil
test itself.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-06-04 03:56:19 +02:00
Eric Anholt ec1c72d38e vc4: Don't bother with safe list traversal in CSE.
We don't remove or move instructions.
2015-05-29 22:09:53 -07:00
Eric Anholt 78c773bb36 vc4: Convert from simple_list.h to list.h
list.h is a nicer and more familiar set of list functions/macros.
2015-05-29 22:09:53 -07:00
Eric Anholt 21a22a61c0 vc4: Make sure we allocate idle BOs from the cache.
We were returning the most recently freed BO, without checking if it
was idle yet.  This meant that we generally stalled immediately on the
previous frame when generating a new one.  Instead, allocate new BOs
when the *oldest* BO is still busy, so that the cache scales with how
much is needed to keep some frames outstanding, as originally
intended.

Note that if you don't have some throttling happening, this means that
you can accidentally run the system out of memory.  The kernel is now
applying some throttling on all execs, to hopefully avoid this.
2015-05-29 18:15:00 -07:00
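
The allocation policy described in 21a22a61c0 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the vc4 code: `struct bo`, `struct bo_cache`, `cache_put()` and `cache_get()` are hypothetical, and the `busy` flag stands in for whatever idle query the kernel provides.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_MAX 8

/* Hypothetical buffer object; "busy" stands in for a kernel idle query. */
struct bo {
    int id;
    bool busy;
};

/* Freed BOs in FIFO order: entries[0] was freed first (the oldest). */
struct bo_cache {
    struct bo *entries[CACHE_MAX];
    size_t count;
};

/* Append a freed BO; returns false when the cache is full. */
static bool cache_put(struct bo_cache *cache, struct bo *bo)
{
    if (cache->count == CACHE_MAX)
        return false;
    cache->entries[cache->count++] = bo;
    return true;
}

/* Reuse the oldest BO only if it is idle.  NULL tells the caller to
 * allocate a fresh BO instead of stalling on a busy one. */
static struct bo *cache_get(struct bo_cache *cache)
{
    if (cache->count == 0 || cache->entries[0]->busy)
        return NULL;

    struct bo *bo = cache->entries[0];
    cache->count--;
    for (size_t i = 0; i < cache->count; i++)
        cache->entries[i] = cache->entries[i + 1];
    return bo;
}
```

Checking only the oldest entry means the cache grows while frames are still in flight and stops growing once the oldest BO is idle again, which matches the scaling behaviour the commit message describes.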
Eric Anholt c821ccf0e3 vc4: Fix return value handling for BO waits.
If the wait ever returned -ETIME, we'd abort because the errno was
stored in errno and not drmIoctl()'s return value.
2015-05-29 18:15:00 -07:00
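
The errno pitfall fixed in c821ccf0e3 above can be illustrated with a small sketch. `fake_wait_ioctl()` is a hypothetical stand-in that follows the ioctl convention of returning -1 and storing the error code in errno; `bo_wait()` is likewise an assumed name, not the vc4 function.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for an ioctl-style call (hypothetical): on failure it returns
 * -1 and stores the error code in errno. */
static int fake_wait_ioctl(bool timed_out)
{
    if (timed_out) {
        errno = ETIME;
        return -1;
    }
    return 0;
}

/* Return true if the BO became idle, false on timeout.  Any other failure
 * is reported through *err as a negative errno value. */
static bool bo_wait(bool timed_out, int *err)
{
    *err = 0;
    if (fake_wait_ioctl(timed_out) == 0)
        return true;

    /* Check errno, not the return value, which is only ever -1. */
    if (errno != ETIME)
        *err = -errno;
    return false;
}
```

Comparing the return value itself against `-ETIME` would never match under this convention, so a timeout would be misread as a fatal error.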
Marek Olšák 7116250b7a radeon/llvm: reset temps_count on deallocation
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-29 11:52:44 +02:00
Marek Olšák 7afc992c20 radeon/llvm: don't use a static array size for radeon_llvm_context::arrays (v2)
v2: - don't use realloc (tgsi_shader_info provides the size)

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-29 11:52:44 +02:00
Dave Airlie 065978d36b softpipe: fix offset wrapping calculations (v2)
Roland pointed out my previous attempt was lacking, so I enhanced the
texwrap piglit test, and tested them. This fixes the offset calculations
in a number of areas by adding the offset first. It also fixes the fastpaths,
which I forgot to address in the previous commit.

v2: try and avoid divides in most paths, the repeat mirror path
really was ugly no matter which way I went, so I left it having
the divide.
Also fix the gather lod calculation bug.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-29 13:15:47 +10:00
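
As a rough illustration of "adding the offset first" from 065978d36b above, here is a sketch for the repeat wrap mode only. The function names are hypothetical, and the real softpipe paths work on float coordinates and avoid the division where they can; this version keeps the modulo for clarity.

```c
/* Wrap a texel index into [0, size) for repeat mode; handles negatives. */
static int wrap_repeat(int texel, int size)
{
    int m = texel % size;

    return m < 0 ? m + size : m;
}

/* Apply the texel offset before wrapping, so the result stays in range.
 * Wrapping first and adding the offset afterwards could step outside
 * the texture. */
static int texel_repeat_with_offset(int texel, int offset, int size)
{
    return wrap_repeat(texel + offset, size);
}
```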
Eric Anholt 10aacf5ae8 vc4: Just stream out fallback IB contents.
The idea I had when I wrote the original shadow code was that you'd see a
set_index_buffer to the IB, then a bunch of draws out of it.  What's
actually happening in openarena is that set_index_buffer occurs at every
draw, so we end up making a new shadow BO every time, and converting more
of the BO than is actually used in the draw.

While I could maybe come up with a better caching scheme, for now just
do the simple thing that doesn't result in a new shadow IB allocation
per draw.

Improves performance of isosurf in drawelements mode by 58.7967% +/-
3.86152% (n=8).
2015-05-27 17:29:11 -07:00
Eric Anholt f8de6277bf vc4: Don't try to put our dmabuf-exported BOs into the BO cache.
We'd sometimes try to reallocate something that X was using as a new
pipe_resource, and potentially conflict in our rendering.  But even
worse, if we reallocated the BO as a shader, the kernel would reject
rendering using the shader.
2015-05-27 17:29:11 -07:00
Eric Anholt b0edc19a52 vc4: Don't forget to make our raster shadow textures non-raster.
Not sure what happened in my testing that made the previous shadow
code fix glxgears swapbuffering, but this also fixes lots of CopyArea
in X (like dragging xlogo around in metacity).
2015-05-27 17:29:11 -07:00
Samuel Pitoiset 41630c0653 vc4: make vc4_begin_query() return a boolean
I forgot to make the change in 96f164f6f0.
This fixes a warning with GCC and probably an error with Clang.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-05-27 17:29:03 -07:00
Marek Olšák 224a77cc60 radeonsi: use a switch statement in si_delete_shader_selector
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-26 12:42:37 +02:00
Marek Olšák 0c5a309cee radeonsi: use a switch statement in si_shader_selector_key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-26 12:42:37 +02:00
Marek Olšák fa7f606e89 radeonsi: fix scratch buffer setup for geometry shaders
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-26 12:42:37 +02:00
Marek Olšák f41517242a radeonsi: remove unused cases from si_shader_io_get_unique_index
These can't occur between VS and GS, because GS is only supported
in the core profile.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-26 12:42:37 +02:00
Marek Olšák af4b9c7c2e radeonsi: don't count special outputs for the VS export count
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-26 12:42:36 +02:00
Marek Olšák e4339bc988 radeonsi: add support for PIPE_CAP_TGSI_TEXCOORD
Without it, texcoords are mapped to GENERIC[0..7], PointCoord is mapped to
GENERIC[8], and user-defined varyings start from GENERIC[9]. Since texcoords
can only be used between VS and PS, and PointCoord is PS-only, it's silly to
always start from GENERIC[9] in all other shaders (such as LS, HS, ES, GS).

This adds support for TEXCOORD and PCOORD semantics. As a result, st/mesa
will use GENERIC[0] as a base for user-defined varyings, which should make
linking ES and GS as well as tessellation shaders at runtime easier.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-05-26 12:42:31 +02:00
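
The slot numbering described in e4339bc988 above can be sketched as a pair of mapping functions. The enum and function names are hypothetical and only mirror the numbers quoted in the commit message.

```c
/* Hypothetical varying kinds, for illustration only. */
enum varying_kind {
    VARYING_TEXCOORD,  /* texture coordinate 0..7 */
    VARYING_PCOORD,    /* point coordinate */
    VARYING_USER       /* user-defined varying */
};

/* GENERIC slot used when the driver lacks TEXCOORD/PCOORD semantics. */
static unsigned generic_slot_without_cap(enum varying_kind kind,
                                         unsigned index)
{
    switch (kind) {
    case VARYING_TEXCOORD:
        return index;      /* GENERIC[0..7] */
    case VARYING_PCOORD:
        return 8;          /* GENERIC[8] */
    default:
        return 9 + index;  /* user varyings start at GENERIC[9] */
    }
}

/* With the cap, texcoords and PointCoord get their own semantics, so
 * user-defined varyings start at GENERIC[0]. */
static unsigned generic_slot_with_cap(unsigned user_index)
{
    return user_index;
}
```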
Marek Olšák 92c31bb0dd gallium: use const in set_tess_state
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-26 11:46:28 +02:00
Ilia Mirkin 3ec1815285 nv30: falling back to draw path for edgeflag does no good
The problem is that the EDGEFLAG has to be toggled at vertex submission
time. This can be done from either the draw or the regular paths. Avoid
falling back to draw just because there's an edgeflag.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-25 21:45:31 -04:00
Ilia Mirkin 25be70462d nv30/draw: switch varying hookup logic to know about texcoords
Commit 8acaf862df switched things over to use TEXCOORD instead of
GENERIC, but did not update the nv30 swtnl draw paths. This teaches the
draw logic about TEXCOORD.

Among other things, this fixes a crash in demos/arbocclude when using
swtnl. Curiously enough, the point-sprite piglit works without this.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-25 21:45:31 -04:00
Ilia Mirkin c3d36a2e1a nv30/draw: allocate vertex buffers in gart
These are only used once per draw, so it makes sense to keep them in
GART. Also take this opportunity to modernize the buffer mapping API
usage.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-25 21:45:22 -04:00
Ilia Mirkin fdad7dfbda nv30/draw: only use the DMA1 object (GART) if the bo is not in VRAM
Instead of always having it in the data, let the bo placement decide it.
This fixes glxgears with swtnl forced on.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-25 21:45:08 -04:00
Ilia Mirkin 3600439897 nv30/draw: fix indexed draws with swtnl path and a resource index buffer
The map = assignment was missing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-25 20:16:51 -04:00
Roland Scheidegger 6a111e54d7 llvmpipe: (trivial) add parentheses in (!x == y) expression
Apparently some compilers think we probably wanted to do !(x == y) instead
and issue a warning, so just shut it up... No functional change, obviously.

Cc: <mesa-stable@lists.freedesktop.org>
2015-05-25 22:24:42 +02:00
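
The two readings that the warning in 6a111e54d7 above is about can be spelled out in a short sketch. The function names are illustrative only.

```c
#include <stdbool.h>

/* "!x == y" parses as "(!x) == y" because unary ! binds tighter than ==. */
static bool not_then_compare(int x, int y)
{
    return (!x) == y;
}

/* The other reading, which compilers suspect may have been intended. */
static bool compare_then_not(int x, int y)
{
    return !(x == y);
}
```

For x = 2 and y = 1 the first form is false and the second is true, so writing the parentheses explicitly documents which one is meant without changing behaviour.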
Ilia Mirkin 147816375d nv30/draw: draw expects constbuf size in bytes, not vec4 units
This fixes glxgears with NV30_SWTNL=1 forced on. Probably fixes a bunch
of other situations where we fall back to the swtnl path.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-25 14:11:16 -04:00
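
The unit mismatch fixed in 147816375d above comes down to a factor of 16. A minimal sketch, assuming a vec4 constant is four 32-bit floats; the macro and function names are hypothetical.

```c
#include <stddef.h>

/* One vec4 constant is four 32-bit floats. */
#define VEC4_SIZE_BYTES (4 * sizeof(float))

/* Convert a constant-buffer size from vec4 units to bytes. */
static size_t constbuf_size_bytes(size_t num_vec4)
{
    return num_vec4 * VEC4_SIZE_BYTES;
}
```

Passing the vec4 count where bytes are expected would make the consumer see only one sixteenth of the buffer.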
Ilia Mirkin 89585edf3c nv30/draw: avoid leaving stale pointers in draw state
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-25 14:11:16 -04:00
Ilia Mirkin 7518fc3c66 nv30: fix clip plane uploads and enable changes
nv30_validate_clip depends on the rasterizer state. Also we should
upload all the new clip planes on change since next time the plane data
won't have changed, but the enables might.

This fixes fixed-clip-enables and vs-clip-vertex-enables shader tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-24 12:00:03 -04:00
Ilia Mirkin aba3392541 nv30: avoid doing extra work on clear and hitting unexpected states
Clearing can happen at a time when various state objects are incoherent
and not ready for a draw. Some of the validation functions don't handle
this well, so only flush the framebuffer state. This has the advantage
of also not doing extra work.

This works around some crashes that can happen when clearing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2015-05-24 12:00:03 -04:00
Ilia Mirkin 9870ed05dd nv30: avoid leaking render state and draw shaders
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-24 02:26:29 -04:00
Ilia Mirkin 605ce36d7f nv30: don't leak fragprog consts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-24 01:33:06 -04:00
Ilia Mirkin fa7f9f123b nv50/ir: avoid messing up arg1 of PFETCH
There can be scenarios where the "indirect" arg of a PFETCH becomes
known, and so the code will attempt to propagate it. Use this
opportunity to just fold it into the first argument, and prevent the
load propagation pass from touching PFETCH further.

This fixes gs-input-array-vec4-index-rd.shader_test and
vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-23 22:15:15 -04:00
Ilia Mirkin c922758685 nv30: check nouveau_bo_map output of notify bo
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-23 19:10:07 -04:00
Ilia Mirkin 921917c8d8 nvc0: a geometry shader can have up to 1024 vertices output
The 1024 is already reported everywhere, not sure where this 0x1ff came
from.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-23 17:55:21 -04:00
Samuel Pitoiset c783fd476c nv50: fix PIPE_QUERY_TIMESTAMP_DISJOINT, based on nvc0
PIPE_QUERY_TIMESTAMP_DISJOINT could not work because q->ready was always
set to FALSE. To fix this issue, add more different states for queries
according to nvc0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-23 19:00:55 +02:00
Ilia Mirkin 217301843a nvc0/ir: LOAD's can't be used for shader inputs
We forgot to convert to VFETCH in case of indirect access. Fix that.

This avoids crashes on the new gs-input-array-vec4-index-rd and
vs-output-array-vec4-index-wr-before-gs but they still fail.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-22 19:08:24 -04:00
Ilia Mirkin 0bab3962f5 nv50/ir: guess that the constant offset is the starting slot of array
When we get something like IN[ADDR[0].x+5], we will now guess that we
should look at IN[5] for the "base" information.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-22 19:08:14 -04:00
Ilia Mirkin d1eea18a59 nvc0/ir: set ftz when sources are floats, not just destinations
In the case of a compare, the destination might be a predicate, but we
still want to flush denorms.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
2015-05-22 16:51:05 -04:00
Ilia Mirkin a85aba190d nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg
This covers the pattern where a KILL_IF is used, which triggers a
comparison of -x to 0. This can usually be folded into the comparison whose
result is being compared to 0, however it may, itself, have already been
combined with another comparison. That shouldn't impact the logic of
this pass however. With this and the & 1.0 change, code like

00000020: 001c0001 80081df4     set b32 $r0 lt f32 $r0 0x3e800000
00000028: 001c0000 201fc000     and b32 $r0 $r0 0x3f800000
00000030: 7f9c001e dd885c00     set $p0 0x1 lt f32 neg $r0 0x0
00000038: 0000003c 19800000     $p0 discard

becomes

00000020: 001c001d b5881df4     set $p0 0x1 lt f32 $r0 0x3e800000
00000028: 0000003c 19800000     $p0 discard

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-22 16:51:05 -04:00
Ilia Mirkin d2a474e8d4 nvc0/ir: optimize set & 1.0 to produce boolean-float sets
This has started to happen more now that the backend is producing
KILL_IF more often.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
2015-05-22 16:51:05 -04:00
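
The "set & 1.0" pattern from d2a474e8d4 above relies on 0x3f800000 being the IEEE 754 bit pattern of 1.0f. The sketch below emulates it on the CPU; the function names are hypothetical and this is not the nv50/nvc0 codegen.

```c
#include <stdint.h>
#include <string.h>

/* Integer-boolean compare: all ones when a < b, else zero. */
static uint32_t set_lt_b32(float a, float b)
{
    return a < b ? 0xffffffffu : 0u;
}

/* Reinterpret 32 bits as a float without violating aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* "set & 1.0": masking with the bit pattern of 1.0f yields 1.0f or 0.0f,
 * which is what a boolean-float set would produce directly. */
static float set_lt_f32(float a, float b)
{
    return bits_to_float(set_lt_b32(a, b) & 0x3f800000u);
}
```

Since the masked result equals what a float-producing compare returns, the optimization can emit a single instruction instead of a set followed by an and.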
Ilia Mirkin e5ad19a46e nvc0/ir: allow iset to produce a boolean float
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-22 16:51:05 -04:00
Ilia Mirkin 0ec6b8ea8c nvc0/ir: avoid jumping to a sched instruction
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-22 16:51:05 -04:00
Samuel Pitoiset a21d23e191 nv50: fix PIPELINE_STATISTICS with HUD, based on nvc0
Tested on NVA8. No regression for ARB_pipeline_statistics piglit tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-22 11:39:23 +02:00
Samuel Pitoiset 867fd2b5f5 nv50: fix 64-bit queries with HUD, based on nvc0
A sequence number is written for 32-bit queries to make sure they are
ready, but not for 64-bit queries. Instead, we have to use a fence in
order to fix the HUD because it doesn't wait until the result is ready.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-22 11:39:23 +02:00
Christian König 6921ea42a1 radeon/vce: adapt new firmware interface changes
v2: make this also compatible with original released firmware
v3 (chk): switch to original idea of separate files for fw versions

Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v2)
2015-05-22 10:17:24 +02:00
Christian König 2b40c306d2 radeon/vce: move CPB handling function into common code
They are not firmware version dependent.

Signed-off-by: Christian König <christian.koenig@amd.com>
2015-05-22 10:17:24 +02:00
Ilia Mirkin 6cdb29d52f freedreno/a3xx: set .zw of sprite coords to .01
Fixes non-determinism in bin/point-sprite rendering, and the stars on
the intro screen to neverball.

Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-20 21:54:00 -04:00
Ilia Mirkin 3e7bc67285 freedreno/ir3: fix immediate usage in tgsi tex fe
get_immediate will return a const reference, the requested immediate
isn't necessarily in the x slot. Make sure to use the swizzle.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-20 21:53:59 -04:00
Jason Ekstrand 2126c68e5c nir: Get rid of the array elements parameter on load/store intrinsics
Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics.  However, this is set to 1
by every pass that ever creates a load/store intrinsic.  Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1.  Let's just delete it.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-05-20 09:28:06 -07:00
Marek Olšák e1c4e8aaaa gallium: remove TGSI_SAT_MINUS_PLUS_ONE
It's a remnant of some old NV extension. Unused.

I also have a patch that removes predicates if anyone is interested.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2015-05-20 15:40:46 +02:00
Dave Airlie 55a7b5165d softpipe: start adding gather support (v2)
This adds both ARB_texture_gather and the enhanced gather
for ARB_gpu_shader5.

This passes all the piglit tests, it relies on the GLSL
lowering pass to make textureGatherOffsets work.

v2: use inline to get gather component (Brian)
fix function name, add asserts (Brian)

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-20 12:32:59 +10:00
Dave Airlie 0108eae291 softpipe: use arrays to make gather easier
This is a prep change for gather, and it makes more sense
to use an array in these cases.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-20 12:32:55 +10:00
Dave Airlie 3f5c67d651 softpipe: add textureOffset support.
This was an oversight when GLSL 1.30 was enabled, due to a
misunderstanding on my part, I think.

This fixes a bunch of tex-miplevel-selection tests under softpipe,
and is required for textureGather support.

I'm not sure this won't make sampling slower, but it's softpipe,
correctness first and all that.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-20 12:32:47 +10:00
Dave Airlie 8bec83a307 softpipe: move control into a filter args struct
more stuff for offsets and gather will go in here later.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-20 12:32:44 +10:00
Dave Airlie 99e583120c softpipe: move some image filter parameters into a struct
This moves some of the image filter args into a struct,
and passes that instead, this is prep work for adding texture
gather support which needs new arguments.

review: make filter args const.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-20 12:32:27 +10:00
Rob Clark e6f912f07e freedreno: fence fix
A fence can outlive the ctx, so we shouldn't deref the ctx to get at the
screen.  We need some updates in libdrm_freedreno API to completely
handle fences properly, but this is at least an improvement.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-05-18 17:47:54 -04:00
Ilia Mirkin ae405d429f gk110/ir: switch to gk104-style sched codes rather than all-in-one
Matches change to envydis/envyas tools.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-18 12:59:52 -04:00
Marek Olšák 369aca1b4a trace: implement new tessellation functions
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-05-16 14:51:22 +02:00
Alexander von Gluck IV 624b38add9 gallium/drivers: Add extern "C" wrappers to public entry
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-05-15 13:55:59 -04:00
Rob Clark 4925c35660 freedreno: fix bug in tile/slot calculation
This was causing corruption with hw binning on a306.  Unlikely that it
is a306 specific, but rather the smaller gmem size resulted in different
tile configuration which was triggering the bug at certain resolutions.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4" and "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
2015-05-14 14:46:14 -04:00
Rob Clark fcc7d6323b freedreno: enable a306
Whitelist adreno 306 (as found in msm8916/apq8016).  Works pretty much
out of the box, although the smaller GMEM size requires more tiles to
fit 1920x1080, so bump up the max # of tiles as well.

Since it is just whitelist + trivial change, it makes sense to land on
all the active release branches.

Note that a305c ends up with gpu-id "306", hence a306 ends up with
gpu-id of "307".  Apparently that is what happens when you let the
marketing dept name things.

Cc: "10.4" and "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-05-14 14:46:14 -04:00
Samuel Pitoiset 175cbb447a nvc0: remove unused nv50_tsc_wrap_mode() function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:44 -04:00
Samuel Pitoiset ac1ac94b38 nv50/ir: silence compiler warnings about mismatched tags
These warnings have been detected by Clang 3.6.

codegen/nv50_ir_from_tgsi.cpp:1319:10: warning: struct 'Source' was
previously declared as a class [-Wmismatched-tags] const struct tgsi::Source *code;

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:44 -04:00
Samuel Pitoiset 70651b7041 nv50/ir: remove unused private field cycle from SchedDataCalculator
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:43 -04:00
Samuel Pitoiset 7469f2fd23 nv30: remove unused nvfx_fp_memcpy() function and comment nv40_fp_bra()
The nv40_fp_bra() function in the same file is also unused but this is
the only place where the nv30/nv40 isa is documented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:43 -04:00
Samuel Pitoiset 48c84a36dd nvc0: do not expose MP counters for nvf0 (GK110+)
This fixes a crash when trying to monitor MP counters because compute
support is not implemented for nvf0.

Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:43 -04:00
Roland Scheidegger adcf8f8a13 softpipe: enable ARB_texture_view
Some bits were already there for texture views but some were missing.
In particular for cube map views things needed to change a bit.
For simplicity I ended up removing the separate face addr bit (just use
the z bit) - cube arrays didn't use it already, so just follow the same
logic there. (In theory using separate bits could allow for better hash
function but I don't think anyone ever did some measurements of that so
probably not worth the trouble; if we reintroduced it we'd certainly
want to use the same logic for cube arrays and cube maps.)
Also extend the seamless cube sampling to cube arrays - as there were no
piglit failures before this is apparently untested, but things now generally
work quite the same for cube textures and cube array textures so there
hopefully shouldn't be any trouble...

49 new piglits, 47 pass, 2 fail (both due to fake multisampling).

v2: incorporate Brian's feedback, add sampler view validation,
function rename, formatting fixes.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-05-13 22:57:50 +02:00
Roland Scheidegger e6c66f4fb0 llvmpipe: enable ARB_texture_view
All the functionality was pretty much there, just not tested.
Trivially fix up the missing pieces (take target info from view not
resource), and add some missing bits for cubes.
Also add some minimal debug validation to detect uninitialized target values
in the view...

49 new piglits, 47 pass, 2 fail (both related to fake multisampling,
not texture_view itself). No other piglit changes.

v2: move sampler view validation to sampler view creation, update docs.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-05-13 22:57:50 +02:00
Ilia Mirkin c696a318ef nouveau: document nouveau_heap
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-12 18:58:49 -04:00
Ilia Mirkin d06ce2f1df nvc0: switch mechanism for shader eviction to be a while loop
This aligns it to work similarly to nv50. However there's no library
code there, so the whole thing can be freed. Here we end up with an
allocated node that's not attached to a specific program.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86792
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-12 18:47:17 -04:00
Marek Olšák 79ffc08ae8 gallium: add PIPE_CAP_DEVICE_RESET_STATUS_QUERY
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 19:38:31 +02:00
Dave Airlie 9ab90c058f r600: use pipe->hw prim convert from radeonsi
This prevents future additions to PIPE_PRIM_ from causing regressions
on r600g.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-11 06:43:18 +10:00
Rob Clark 1cbdafc47a freedreno/ir3/nir: fix build break after f752effa
Our lower if/else pass was missed when converting NIR to use linked
lists rather than hashsets to track use/def sets.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-05-10 06:03:53 -04:00
Ilia Mirkin da136dc07d nv50/ir: only enable mul saturate on G200+
Commit 44673512a8 enabled support for saturating fmul. However
experimentally this does not seem to work on the older chips. Restrict
the feature to G200 (NVA0) and later.

Reported-by: Pierre Moreau <pierre.morrow@free.fr>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90350
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:41:51 -04:00
Ilia Mirkin 7892210400 nvc0: reset the instanced elements state when doing blit using 3d engine
Since we update num_vtxelts here, we could otherwise end up with stale
instancing information in the upper bits which wouldn't otherwise get
reset. (Also we run the risk of the previous draw having set the first
element as instanced.)

This appears as one of the causes for the test pointed out in fdo#90363
to fail on nvc0.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90363
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:36:23 -04:00
Ilia Mirkin e9b1ea29bf nvc0: keep track of PGRAPH state in nvc0_screen
See identical commit for nv50. Destroying the current context and then
creating a new one or switching to another existing context would cause
the "current" state to not be properly initialized, so we save it off in
the screen.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:36:23 -04:00
Ilia Mirkin f617029db3 nv50: keep track of PGRAPH state in nv50_screen
Normally this is kept in nv50_context, and on switching the active
context, the state is copied from the previous context. However when the
last context is destroyed, this is lost, and a new context might later
be created. When the currently-active context is destroyed, save its
state in the screen, and restore it when setting the current context.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90363
Reported-by: Matteo Bruni <matteo.mystral@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Matteo Bruni <matteo.mystral@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:36:23 -04:00
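The save-and-restore scheme described in f617029db3 above can be modelled in a few lines of C. This is only a sketch under stated assumptions: `struct hw_state`, `set_current()` and `destroy()` are hypothetical stand-ins, not the nv50 driver's types or functions, and the tracked state is reduced to a single integer.

```c
#include <assert.h>
#include <stddef.h>

struct hw_state { int dummy_reg; };     /* stand-in for the tracked PGRAPH state */

struct context;
struct screen {
   struct context *cur_ctx;             /* currently active context, if any */
   struct hw_state save_state;          /* state kept after cur_ctx is destroyed */
};
struct context {
   struct screen *screen;
   struct hw_state state;
};

/* Hypothetical: make ctx the active context, inheriting the tracked state. */
static void set_current(struct context *ctx)
{
   assert(ctx && ctx->screen);
   struct screen *s = ctx->screen;
   if (s->cur_ctx && s->cur_ctx != ctx)
      ctx->state = s->cur_ctx->state;   /* copy from the previous context */
   else if (!s->cur_ctx)
      ctx->state = s->save_state;       /* restore what the screen saved */
   s->cur_ctx = ctx;
}

/* Hypothetical: on destroying the active context, park its state in the screen. */
static void destroy(struct context *ctx)
{
   assert(ctx && ctx->screen);
   struct screen *s = ctx->screen;
   if (s->cur_ctx == ctx) {
      s->save_state = ctx->state;
      s->cur_ctx = NULL;
   }
}
```

In this model, destroying the last context and then creating a new one no longer loses the state, which is the failure mode the commit message describes.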
Jason Ekstrand 7a30668ad6 util: Move gallium's linked list to util
The linked list in gallium is pretty much the kernel list and we would like
to have a C-based linked list for all of mesa.  Let's not duplicate and
just steal the gallium one.

Acked-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-05-08 17:16:13 -07:00
Ilia Mirkin c4ac09e30e nv50/ir: only propagate saturate up if some actual folding took place
The former logic would copy the saturate up to any mul with an immediate
if there was a subsequent mul with a saturate. However we only want to
do that if we collapsed 2 muls by multiplying their immediates (or were
able to put the immediate in as a post-multiplier).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-08 18:56:56 -04:00
Ilia Mirkin 55b66dc4de nv50/ir: add SHL to the list of U32 opcodes
Having the wrong inferred type prevents a number of optimizations,
including constant propagation (since float immediates work differently
than integer immediates).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-06 20:50:03 -04:00
Vinson Lee 382b1a36e3 r600g: Fix Clang return-type build error.
Fix Clang return-type error introduced with commit
96f164f6f0 "gallium: make
pipe_context::begin_query return a boolean".

  CC       r600_query.lo
r600_query.c:443:3: error: non-void function 'r600_begin_query' should return a value [-Wreturn-type]
                return;
                ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-05-06 12:21:34 -07:00
Chia-I Wu ef5d4bcc3a ilo: silence a compiler warning
Silence

  ilo_query.c:120:7: warning: 'return' with no value, in function returning non-void

since commit 96f164f6.
2015-05-06 16:35:30 +08:00
Samuel Pitoiset cea910bc28 nvc0: all queries use an unsigned 64-bits integer by default
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
Samuel Pitoiset 35a9286be6 nvc0: make begin_query return false when all MP counters are used
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
Samuel Pitoiset ed7d3886cc nvc0: define driver-specific query groups
This patch defines "Driver statistics" and "MP counters" groups, but
only the latter will be exposed through GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
Samuel Pitoiset 96f164f6f0 gallium: make pipe_context::begin_query return a boolean
GL_AMD_performance_monitor must return an error when a monitoring
session cannot be started.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
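Why a boolean return is useful for begin_query (96f164f6f0 above, used by 35a9286be6 when all MP counters are taken) can be shown with a small C sketch. All names here (`counter_pool`, `pool_begin_query`, `MAX_COUNTERS`) are hypothetical and the limit of four counters is an assumption for illustration; this is not the gallium interface.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COUNTERS 4        /* hypothetical hardware limit */

struct counter_pool {
   unsigned in_use;           /* number of counters currently monitoring */
};

/* Hypothetical begin hook: reports failure instead of silently doing nothing. */
static bool pool_begin_query(struct counter_pool *pool)
{
   assert(pool);
   if (pool->in_use >= MAX_COUNTERS)
      return false;           /* caller can now raise an error */
   pool->in_use++;
   return true;
}

/* Hypothetical end hook: releases one counter. */
static void pool_end_query(struct counter_pool *pool)
{
   assert(pool);
   if (pool->in_use > 0)
      pool->in_use--;
}
```

With a void return, the caller has no way to tell that the fifth request was dropped; with a boolean it can surface the error that GL_AMD_performance_monitor requires.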
Samuel Pitoiset 546ec980f8 gallium: replace pipe_driver_query_info::max_value by a union
This allows queries to return different numeric types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:35 +03:00
Samuel Pitoiset b620829b5e gallium: add new fields to pipe_driver_query_info
According to the spec of GL_AMD_performance_monitor, valid type values
returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT.
This also introduces the new field group_id in order to categorize
queries into groups.

v2: add PIPE_DRIVER_QUERY_TYPE_BYTES

v3: fix incorrect query type for radeon and svga drivers

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:35 +03:00
Chia-I Wu 4348046a2f ilo: use ilo_image exclusively in core
Initialize ilo_view_surface and ilo_zs_surface from ilo_image instead of
ilo_texture.
2015-05-02 22:28:31 +08:00
Chia-I Wu 9b705ec32d ilo: add ilo_image_can_enable_aux()
It replaces ilo_texture_can_enable_hiz().
2015-05-02 22:14:07 +08:00
Chia-I Wu 430594c34f ilo: make ilo_image more self-contained
Add depth0, sample_count, and scanout to ilo_image.
2015-05-02 22:14:06 +08:00
Chia-I Wu f6ca4084c7 ilo: add ilo_image_init_for_imported()
It replaces ilo_image_update_for_imported_bo() and enables more error
checking for imported textures.
2015-05-02 22:14:06 +08:00
Chia-I Wu 938c9b8cea ilo: prepare for image init for imported bo
Refactoring in preparation for ilo_image_init_for_imported().
2015-05-02 22:14:06 +08:00
Chia-I Wu 3f9415077b ilo: constify ilo_image_params
Make ilo_image_params const in functions that do not modify it.
2015-05-02 22:14:06 +08:00
Chia-I Wu c209aa7a8f ilo: improve readability of ilo_image
Improve docs, rename struct fields, and reorder walk types.  No real changes.
2015-05-02 22:14:06 +08:00
Chia-I Wu 9b72bf5bd2 ilo: move command builder to core 2015-05-02 22:14:06 +08:00
Chia-I Wu 9e24c49e64 ilo: move ilo_state_3d* to core
ilo state structs (struct ilo_xxx_state) are moved as well.
2015-05-02 22:14:06 +08:00
Chia-I Wu 8ab18262c5 ilo: add ilo_buffer.h to core
Rename the original ilo_buffer to ilo_buffer_resource to avoid name conflict.
2015-05-02 22:14:06 +08:00
Chia-I Wu 3afbeb115a ilo: move BOs from ilo_texture to ilo_image
We want to work with ilo_image instead of ilo_texture in core.
2015-05-02 22:14:06 +08:00
Chia-I Wu ac47563cb4 ilo: move ilo_layout.[ch] to core as ilo_image.[ch]
Move files and s/layout/image/.
2015-05-02 22:14:06 +08:00
Chia-I Wu 8252765532 ilo: add ilo_format.[ch] to core
The original ilo_format.[ch] are removed.
2015-05-02 22:14:06 +08:00
Chia-I Wu 9b7080c8b3 ilo: add ilo_fence.h to core
Implement pipe_fence_handle on top of ilo_fence.
2015-05-02 22:14:06 +08:00
Chia-I Wu 2182beb431 ilo: add ilo_dev_init() to core
Move init_dev() from ilo_screen.c to core.
2015-05-02 22:14:06 +08:00
Chia-I Wu 7562f9e907 ilo: rename ilo_dev_info to ilo_dev
With intel_winsys being embedded in it, drop the "_info" suffix.
2015-05-02 22:14:06 +08:00
Chia-I Wu 19351af53d ilo: move intel_winsys to ilo_dev_info
We want to use ilo_dev_info instead of ilo_screen in core.
2015-05-02 22:14:06 +08:00
Chia-I Wu b3197fe5f4 ilo: add ilo_dev.h to core
Move what remains in ilo_common.h (that is, ilo_dev_*) to ilo_dev.h.
2015-05-02 22:14:06 +08:00
Chia-I Wu 7bb4fa72c0 ilo: add ilo_debug.[ch] to core
They consist of the debug helpers that used to live in ilo_common.h and
ilo_screen.c.
2015-05-02 22:14:06 +08:00
Chia-I Wu a5797873d0 ilo: add ilo_core.h to core
ilo_core.h includes the common gallium headers that were included in
ilo_common.h.
2015-05-02 22:14:05 +08:00
Chia-I Wu bbe91576b7 ilo: move intel_winsys.h to core
Add a new subdirectory and start moving files that do not depend on
ilo_screen/ilo_context to it.
2015-05-02 22:14:05 +08:00
Ilia Mirkin 33f0d1138d nvc0/ir: fix predicated PFETCH for real
Commit a9d08a250 accidentally didn't make use of the new src1 variable.
Use it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-30 02:02:47 -04:00
Ilia Mirkin db269ae495 nv50/ir: fix asFlow() const helper for OP_JOIN
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-29 23:34:30 -04:00
Ilia Mirkin a9d08a250a nvc0/ir: fix predicated PFETCH emission
src1 would contain the predicate, which would get emitted as a register
source by an undiscerning srcId helper. Work around this in the same way
as in emitTEX.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-29 23:34:22 -04:00
Ilia Mirkin 515ac907e6 gk110/ir: fix set with a register dest to not auto-set the abs flag
This was causing src0 to always have the absolute value flag set.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-29 18:03:19 -04:00
Marek Olšák a582b22c63 winsys/radeon: add a private interface for radeon_surface 2015-04-29 21:51:40 +02:00
Marek Olšák dcfbc006b6 winsys/radeon: move radeon_winsys.h to drivers/radeon 2015-04-29 21:51:40 +02:00
Emil Velikov b124dc2b70 r300: do not link against libdrm_intel
Accidentally added since the introduction of the file.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-29 15:15:19 +01:00
Axel Davy 559342d01d gallium/svga: Remove useless ARRAY_SIZE declaration
This is already declared in util/macros.h

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2015-04-29 08:28:10 +02:00
Axel Davy 64880d073a util/macros: Move DIV_ROUND_UP to util/macros.h
Move DIV_ROUND_UP to a shared location accessible everywhere

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2015-04-29 08:28:10 +02:00
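For readers unfamiliar with the helper moved in 64880d073a above, the rounding it names can be sketched as a small C function. This is an illustrative equivalent only: `div_round_up()` is a hypothetical name, and the actual macro text in util/macros.h is not reproduced here.

```c
#include <assert.h>

/* Hypothetical helper: divide a by b, rounding any remainder up. */
static unsigned div_round_up(unsigned a, unsigned b)
{
   assert(b != 0);                 /* division by zero is undefined */
   return a / b + (a % b != 0);    /* add one when there is a remainder */
}
```

A typical use is sizing: 7 items in groups of 4 need 2 groups, while 8 items still need only 2.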
Ilia Mirkin 6fe0d4f035 nvc0/ir: flush denorms to zero in non-compute shaders
This will set the FTZ flag (flush denorms to zero) on all opcodes that
can take it.

This resolves issues in Unigine Heaven 4.0 where there were solid-filled
boxes popping up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89455
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-28 20:17:03 -04:00
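The effect of the FTZ flag mentioned in 6fe0d4f035 above can be modelled on a single value in C. This is a scalar sketch, not driver or hardware code: `flush_denorm_to_zero()` is a hypothetical name, and preserving the sign of the flushed value is an assumption of the sketch.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical scalar model of flush-to-zero for single precision. */
static float flush_denorm_to_zero(float x)
{
   /* A denormal is nonzero but smaller in magnitude than FLT_MIN. */
   if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
      return x < 0.0f ? -0.0f : 0.0f;
   return x;                       /* normals, zeros, infinities, NaN pass through */
}
```

Normal values, including FLT_MIN itself, are left untouched; only the denormal range collapses to zero.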
Ilia Mirkin e312a69958 nvc0: expose GLSL version 410
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-28 12:48:22 -04:00
Marek Olšák 6d05396b00 r600g,radeonsi: add a driver query returning GPU load
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-04-28 16:05:45 +02:00
Marek Olšák 0b8e73a6ae r600g,radeonsi: add driver queries for GPU temperature and shader+memory clocks
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-04-28 16:05:45 +02:00
Ilia Mirkin 9143940da2 gm107/ir: add lane/vertex count sysvals
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 21:25:29 -04:00
Ilia Mirkin 89e0b08794 gk110/ir: add support for writing per-patch and shader outputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 21:25:28 -04:00
Ilia Mirkin 52614f59b7 freedreno/a3xx: color masking works like a blend for some formats
When there is a colormask active that does not cover all the channels,
enable reading in the destination like with a combining blend
operation. This fixes fbo-blending-formats on a3xx.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin 9fc3f47278 freedreno/a3xx: add support for S8 and Z32F_S8
Enables ARB_depth_buffer_float. There is no sampling support for
interleaved Z32F_S8, so we store the two textures separately, one as
Z32F, the other as S8. As a result, we need a lot of additional logic
for restores and transfers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin 1571da6ac3 freedreno/a3xx: add Z32F support
32-bit depth buffers are stored as unorm, and thus need special handling
when moving to and from gmem. They are copied into gmem by writing
depth, and resolved from gmem using a special resolve bit which
apparently float-ifies the data.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin 0a4cb00c77 freedreno: add fd_transfer to wrap around pipe_transfer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin f5c1101996 freedreno/a3xx: add support for disabling depth clipping
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Zoë Blade 05e7f7f438 Fix a few typos
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-04-27 17:28:29 +03:00
Marek Olšák db2415189a radeonsi: set an optimal value for DB_Z_INFO.ZRANGE_PRECISION
Required because of a VI hw bug.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:57:07 +02:00
Marek Olšák bed98eef9a radeonsi: remove deprecated and useless registers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák 393b0e0531 radeonsi: remove useless includes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák d8269be1ce gallium/radeon: print winsys info with R600_DEBUG=info
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák ecc7f2ed91 gallium/radeon: don't crash when getting out-of-bounds TEMP references
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-23 16:14:39 +02:00
Dave Airlie 8a41cd2407 softpipe: fix stencil write to use an integer value
This fixes a number of regressions since
61393bdcdc
u_tile: fix stencil texturing tests under softpipe

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89960
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-23 08:32:30 +10:00
Rob Clark cb24d3b7ad freedreno: misc minor cleanups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 1b58d8c2bf freedreno/a4xx: (partial) gl_FragCoord.zw
The bit to enable .z is still commented out, as it is triggering gpu
hangs in 0ad.  But at least gl_FragCoord.w works now, and we know what
bits we are *supposed* to set for .z (with that uncommented all piglit
fragcoord tests are passing).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark a869183123 freedreno/a4xx: primitive-restart
This was the missing bit to get dolphin-emu working on a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 632ea2a113 freedreno/nir: sysval fixes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 13527df143 freedreno/a4xx: wire up integer texture sampling
Similar to a3xx, the compiler needs to know the return type of the sam,
etc, instructions.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 48a651e98c freedreno/a4xx: formats updates/fixes
Update formats table with new formats that Ilia has figured out, and fix
sampling from srgb texture and integer vbo's.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 21ceedfd8b freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:27 -04:00
Emil Velikov 86919352e3 android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS
... to manage the LIBDRM*_CFLAGS. The former is the recommended approach
by the Android build system developers while the latter has been
deprecated for quite some time.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-04-22 14:23:28 +01:00
Emil Velikov 413bc0a618 ilo: remove unused include from Android.mk
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
2015-04-22 14:18:47 +01:00
Ilia Mirkin 0904774af1 freedreno/a3xx: enable polymode setting with non-fill modes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin 6357601628 freedreno/a3xx: fix integer and 32-bit float border colors
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin 6895c3554e freedreno/a3xx: add support for float R/RG render targets
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Rob Clark 95e68adcd9 freedreno/ir3/nir: few little fixes
isaml needs to scale up coords based on LoD.  Also fix bogus bary.f
varying # when there are non-bary frag shader inputs.  And use sub.s of
a positive immediate rather than add.s of negative (since CP is better
about figuring out that those can be collapsed into the cat2 instr).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
Rob Clark efbf14e893 freedreno/ir3/nir: lower if/else
For now, completely flatten if/else blocks.  That will almost certainly
change once we have flow control.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
Rob Clark e5e11b5baf freedreno/a4xx: support for large shaders
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:50 -04:00
Rob Clark 20ea698c49 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:44 -04:00
Rob Clark 57f0d3b3c6 freedreno/ir3/nir: UBO support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:36 -04:00
Rob Clark 87807e5cc5 freedreno/ir3: move out helper
We'll also want it in NIR f/e for implementing UBO support.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:28 -04:00
Rob Clark 70b2f872ea freedreno/a4xx: sysvals and UBOs
Basically just sync up the cmdstream emit parts to match the changes
already done on a3xx.

Also, fix scheduling for mem instructions.  This is needed on a4xx, and
I am a bit surprised it isn't needed for a3xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:18 -04:00
Marek Olšák b79c620663 radeonsi: add a debug option to compile shaders when they're created
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-16 18:36:29 +02:00
Emil Velikov a7d018accf radeonsi: remove bogus r600-- triple
As mentioned by Michel Dänzer for LLVM >= 3.6 we create the
LLVMTargetMachine (with triple amdgcn--), as we set up the radeonsi
context. For older LLVM or hardware (r600) the triple is always r600--
and is created at a later stage - radeon_llvm_compile()

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-16 14:15:19 +01:00
Glenn Kennard 17d69862a9 r600g/sb: Skip empty ALU clause while scheduling
Fixes assert triggered by
ext_transform_feedback-intervening-read output use_gs
piglit test.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-16 12:43:20 +10:00
Eric Anholt b229e6c7de vc4: Don't try to use color load/stores to blit across format changes.
We could potentially support the right combination of 8888 to 565, but the
important thing for now is to not mix up our orderings of 8888.  Fixes
fbo-copyteximage regressions.
2015-04-15 16:50:23 -07:00
Eric Anholt cff2e08c4c vc4: Don't try to use color load/stores to do depth/stencil blits.
Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which
does blits to work around baselevel/lastlevel).
2015-04-15 16:50:23 -07:00
Eric Anholt 3a728d4dfb vc4: Update the shadow texture for public textures on every draw.
We don't know who else has written to it, so we'd better update it every
time.  This makes the gears spin in X again.
2015-04-15 16:50:23 -07:00
Eric Anholt bd957b1b79 vc4: Hook up VC4_DEBUG=perf to some useful printfs. 2015-04-15 16:50:22 -07:00
Tom Stellard e0994e0f97 radeon/llvm: Improve codegen for KILL_IF
Rather than emitting one kill instruction per component of KILL_IF's src
reg, we now or the components of the src register together and use the
result as a condition for just one kill instruction.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 34872 -> 34848 (-0.07 %)
VGPRS: 20696 -> 20676 (-0.10 %)
Code Size: 749032 -> 748452 (-0.08 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 1184 -> 1160 (-2.03 %)
VGPRS: 600 -> 580 (-3.33 %)
Code Size: 13200 -> 12620 (-4.39 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Increases:
SGPRS: 2 (0.00 %)
VGPRS: 0 (0.00 %)
Code Size: 0 (0.00 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 5 (0.01 %)
VGPRS: 5 (0.01 %)
Code Size: 25 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 40 (25.00 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 116 -> 96 (-17.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 64 -> 72 (12.50 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 424 -> 356 (-16.04 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:12 +00:00
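The KILL_IF change in e0994e0f97 above (one combined condition instead of one kill per component) can be illustrated with a scalar C model. It relies on the TGSI rule that KILL_IF discards when any source component is negative; `kill_if_condition()` is a hypothetical name, and the sketch does not show the LLVM IR the driver actually emits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: fold the four per-component tests into one condition. */
static bool kill_if_condition(const float src[4])
{
   bool cond = false;
   assert(src);
   for (int i = 0; i < 4; i++)
      cond |= (src[i] < 0.0f);     /* OR the per-component results together */
   return cond;                    /* a single kill is issued on this value */
}
```

The fragment is discarded exactly when the function returns true, so the result matches four separate kills while needing only one.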
Tom Stellard c6d79ed289 radeon/llvm: Run LLVM's instruction combining pass
This should improve code quality in general and will help with some
future changes to how we emit kill instructions.

shader-db shows a few regressions, but these don't seem to be the result
of deficiencies in instcombine.  They're mostly caused by the scheduler
making different decisions than before.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 35056 -> 34872 (-0.52 %)
VGPRS: 20624 -> 20696 (0.35 %)
Code Size: 764372 -> 749032 (-2.01 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 13264 -> 13072 (-1.45 %)
VGPRS: 8248 -> 8316 (0.82 %)
Code Size: 486320 -> 470992 (-3.15 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 11264 -> 11264 (0.00 %) bytes per wave

Increases:
SGPRS: 6 (0.01 %)
VGPRS: 20 (0.02 %)
Code Size: 14 (0.01 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 32 (0.03 %)
VGPRS: 8 (0.01 %)
Code Size: 244 (0.25 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 12 -> 20 (66.67 %)
Code Size: 216 -> 224 (3.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 40 -> 32 (-20.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 368 -> 280 (-23.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 28 -> 36 (28.57 %)
Code Size: 39320 -> 40132 (2.07 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 72 -> 64 (-11.11 %)
VGPRS: 48 -> 40 (-16.67 %)
Code Size: 6272 -> 5852 (-6.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:05 +00:00
Tom Stellard 2569c7109d radeonsi: Add header and footer to shader stat dump
This makes it easier to parse.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:36:59 +00:00
Eric Anholt 1be329e64c vc4: Add a blitter path using just the render thread.
This accelerates the path for generating the shadow tiled texture when
asked to sample from a raster texture (typical in glamor).
2015-04-13 23:20:46 -07:00
Eric Anholt 76d56752cc vc4: Allow submitting jobs with no bin CL in validation.
For blitting, we want to fire off an RCL-only job.  This takes a bit of
tweaking in our validation and the simulator support (and corresponding
new code in the kernel).
2015-04-13 23:20:45 -07:00
Eric Anholt 43b20795b7 vc4: Move the blit code to a separate file.
There will be other blit code showing up, and it seems like the place
you'd look.
2015-04-13 23:20:45 -07:00
Eric Anholt e214a59635 vc4: Separate out a bit of code for submitting jobs to the kernel.
I want to be able to have multiple jobs being set up at the same time (for
example, a render job to do a little fixup blit in the course of doing a
render to the main FBO).
2015-04-13 23:20:45 -07:00
Eric Anholt 44b63cf5c0 vc4: When asked to sample from a raster texture, make a shadow tiled copy.
So, it turns out my simulator doesn't *quite* match the hardware.  And the
errata about raster textures tells you most of what's wrong, but there's
still stuff wrong after that.  Instead, if we're asked to sample from
raster, we'll just blit it to a tiled temporary.

Raster textures should only be screen scanout, and word is that it's
faster to copy to tiled using the tiling engine first than to texture from
an entire raster texture, anyway.
2015-04-13 22:34:06 -07:00
Eric Anholt d04b07f8e2 vc4: Fix off-by-one in branch target validation. 2015-04-13 22:34:06 -07:00
Eric Anholt 7fa2f2e366 vc4: Use NIR-level lowering for idiv.
This fixes the idiv tests in piglit.
2015-04-13 21:36:40 -07:00
Eric Anholt 84ebaff1b7 vc4: Add a bunch of type conversions.
These are required to get piglit's idiv tests working.  The
unsigned<->float conversions are wrong, but are good enough to get
piglit's small ranges of values working.
2015-04-13 21:36:40 -07:00
Eric Anholt adae027260 vc4: Use the blit interface for updating shadow textures.
This lets us plug in a better blit implementation and have it impact the
shadow update, too.
2015-04-13 10:39:24 -07:00
Eric Anholt 39b6f7e76c vc4: Remove dead fields from vc4_surface. 2015-04-13 10:39:24 -07:00
Eric Anholt 5100221ff7 vc4: Skip sending down the clear colors if not clearing. 2015-04-13 10:39:24 -07:00
Eric Anholt 725620f21d vc4: Sync with kernel changes to relax BCL versus RCL validation.
There was no reason to tie the two packets' values together.
2015-04-13 10:39:23 -07:00
Eric Anholt cb88d2cfcb vc4: Fix another space allocation mistake.
We're over-allocating our BCL in vc4_draw.c, so this never mattered.
However, new RCL-only blit support might end up here without having set up
any BCL contents.
2015-04-13 10:39:02 -07:00
Eric Anholt 8eb9304ee7 vc4: Add missed accounting for the size of the semaphore.
This wouldn't have mattered except in the worst case scenario RCL setup.
2015-04-13 10:33:30 -07:00
Rob Clark b98c0262d1 freedreno/ir3/nir: couple little fixes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:41:03 -04:00
Rob Clark 1b936bb9f8 freedreno/ir3/nir: handle system values
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:57 -04:00
Rob Clark 715b2e0dbb freedreno/ir3/nir: handle txs and query_levels tex ops
These correspond to the tgsi TXQ opcode

(plus sneak in a fix for two-sided color)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:43 -04:00
Rob Clark 97e8fc3fdd freedreno/ir3/nir: split out tex helpers
We'll need these in one or two other spots.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:36 -04:00
Rob Clark 6e8160d6e3 freedreno/ir3/nir: simplify emit_tex()
Just build up arrays for src0/src1, and use create_collect()..

Also add back missing .3d flag for 3d/cube textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:28 -04:00
Rob Clark d5357c16cc freedreno/ir3/cp: handle indirect properly
I noticed some cases where we were trying to copy-propagate indirect
src's into places they cannot go, like 2nd src for cat3 (mad, etc).
Expand out valid_flags() to be aware of relativ flag, and fix up a few
related spots.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:21 -04:00
Rob Clark 49be76166b freedreno/ir3/sched: avoid getting stuck on addr conflicts
When we get in a scenario where we cannot schedule any more instructions
due to address register conflict, clone the instruction that writes the
address register, and switch the remaining unscheduled users for the
current address register over to the new clone.

This is simpler and more robust than the previous attempt (which tried,
and sometimes failed, to ensure all other dependencies of users of the
address register were scheduled first), since it would try to schedule
instructions that were not actually needed for any output value.

We probably need to do the same with predicate register, although so far
it isn't so heavily used so we aren't running into problems with it
(yet).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:15 -04:00
Rob Clark 4cf4006674 freedreno/ir3/nir: add variable-indexing support
A bit fugly.. try and make this cleaner..  note if we hoist all the
get_addr() out of the loop we can drop the hashtable and just use
create_addr()..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:09 -04:00
Rob Clark 972ce757d7 freedreno/ir3/asm: change assert to warning
It probably *should* be an assert, but for now TGSI f/e isn't very good
about dealing w/ CONST vs ABS/NEG.  So for debug builds, print a warning
instead of crashing with an assert for now.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:03 -04:00
Rob Clark 09cbd97a47 freedreno/ir3/nir: set first_driver_param
Without this, a3xx breaks.. a4xx would too if it had already implemented
support for passing driver params.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:39:56 -04:00
Rob Clark f0e9a632a1 freedreno/ir3/cp: support to swap mad src's
For a normal MAD (ie. not MADSH), if first source is gpr and second
source is const, we can swap the first two sources to avoid needing a
mov instruction.

This gives back the biggest advantage TGSI f/e had over NIR f/e for
common shaders, since TGSI f/e had this logic in the f/e.  Note that
doing this in copy-prop step has the advantage that it will also work
for cases like:

   MOV TEMP[b], CONST[x]
   MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c]

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:39:46 -04:00
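The source swap described in f0e9a632a1 can be illustrated with a minimal sketch. This is not the ir3 copy-propagation code: the `mad_instr` struct, the `src_kind` enum, and `mad_try_swap()` are hypothetical names invented here, and the only rule encoded is the one stated in the commit message (a non-MADSH mad with a gpr first source and a const second source gets its first two sources swapped).

```c
#include <stdbool.h>

/* Hypothetical source kinds; these are not the real ir3 register flags. */
enum src_kind { SRC_GPR, SRC_CONST };

/* Hypothetical stand-in for a mad instruction: src[0] * src[1] + src[2]. */
struct mad_instr {
    bool is_madsh;          /* the commit message excludes MADSH */
    enum src_kind src[3];
    int reg[3];
};

/* Swap the first two sources when src0 is a gpr and src1 is a const,
 * so that no extra mov is needed. Returns true if a swap happened. */
static bool mad_try_swap(struct mad_instr *mad)
{
    if (mad->is_madsh)
        return false;
    if (mad->src[0] != SRC_GPR || mad->src[1] != SRC_CONST)
        return false;

    enum src_kind kind = mad->src[0];
    int reg = mad->reg[0];

    mad->src[0] = mad->src[1];
    mad->reg[0] = mad->reg[1];
    mad->src[1] = kind;
    mad->reg[1] = reg;
    return true;
}
```

Since the multiply is commutative, the swap leaves the result unchanged; the third (addend) source is never touched.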
Roland Scheidegger 586536a4e1 gallivm: don't use control flow when doing indirect constant buffer lookups
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very much similar issue as fixed in
8a9f5ecdb1. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we are conforming to newer apis at least partially (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)

v2: use static vars for the fake bufs, minor code cleanups

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-04-09 01:32:30 +02:00
Glenn Kennard f2947807c8 r600g/sb: Enable SB for geometry shaders
Add SV_GEOMETRY_EMIT special variable type to track the
implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
instructions so GCM/scheduler doesn't reorder them.

Mark emit instructions as unkillable so DCE doesn't eat them.

Enable only for evergreen/cayman as there are a few
unexplained GS piglit regressions on R6xx/R7xx with SB
enabled otherwise.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-08 08:18:35 +10:00
Glenn Kennard 06bb68da4a r600g/sb: Update last_cf for loops
CF_END could end up emitted in the middle of a shader on cayman
when there was a loop at the very end.

Fixes glsl-1.50-geometry-end-primitive and
ext_transform_feedback-geometry-shaders-basic piglit tests.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-08 08:18:17 +10:00
Ilia Mirkin ae720c66cb nv50,nvc0: limit the y-tiling of 3d textures to the first level's tiling
We limit y-tiling to 0x20 when depth is involved. However the function is
run for each miplevel, and the hardware expects miplevel 0 to have the
highest tiling settings. Perform the y-tiling limit on all levels of a
3d texture, not just the ones that have depth.

Fixes:
  texelFetch fs sampler3D 98x129x1-98x129x9

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nick Tenney <nick.tenney@gmail.com> # GT216
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-04-06 23:06:55 -04:00
Dave Airlie ad84689f73 r600g: fix op3 abs issue
This code to handle absolute values on op3 srcs was a bit too simple,
it really needs a temp reg per src, not one per channel, make it
easier and let sb clean up the mess.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89831

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-07 11:40:16 +10:00
Rob Clark 8b0b81339b freedreno/ir3: add NIR compiler
The NIR compiler frontend is an alternative to the TGSI f/e, producing
the same ir3 IR and using the same backend passes for scheduling, etc.

It is not enabled by default yet, as there are still some regressions.
To enable, use 'FD_MESA_DEBUG=nir'.  It is good enough to run, for
example, xonotic or supertuxkart.

With the NIR f/e, scalarizing and a number of other lowering steps
happen in NIR, so we don't have to do them in ir3.  Which simplifies the
f/e and allows the lowered instructions to pass through other
optimization stages.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:40 -04:00
Ilia Mirkin 700d949ea1 freedreno/a3xx: don't decode srgb on mem2gmem
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin b060b56772 freedreno/a3xx: pass sprite coord mode through to program emit
Use the correct sprite replacement depending on the flip of the coord
mode, using either T or 1-T depending on whether we have an upper-left or
lower-left coordinate origin. This fixes all the point sprite piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
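The T versus 1-T choice from b060b56772 amounts to a conditional flip of the sprite's T coordinate. The sketch below is an illustration only: `sprite_coord_t()` and its `flip` parameter are hypothetical, and which coordinate origin (upper-left or lower-left) maps to the flipped case is deliberately left to the caller, since the commit message does not spell it out.

```c
#include <stdbool.h>

/* Return the T coordinate to use for a point sprite.
 * 'flip' is a hypothetical flag derived from the sprite coord mode:
 * when set, the replacement is 1-T instead of T. */
static float sprite_coord_t(float t, bool flip)
{
    return flip ? 1.0f - t : t;
}
```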
Ilia Mirkin 1de72dfc8a freedreno/a3xx: add UBO support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin c7811f56c2 freedreno/ir3: insert nop between sfu/mem operations
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin 14dfd8cc43 freedreno: dirty context when reallocating a bound bo
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin bde2045fa2 freedreno: keep track of buffer valid ranges
Copies nouveau_buffer and radeon_buffer. This allows a write to proceed
to an uninitialized part of a buffer even when the GPU is using the
previously-initialized portions.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
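The valid-range idea from bde2045fa2 can be sketched as follows. This is a simplified model rather than the freedreno implementation: `valid_range`, `range_add()`, and `write_needs_sync()` are hypothetical names, and a single half-open [start, end) interval is assumed to cover everything that has been initialized so far.

```c
#include <stdbool.h>
#include <limits.h>

/* Half-open byte range [start, end) holding initialized data.
 * The range is empty when start >= end. */
struct valid_range {
    unsigned start;
    unsigned end;
};

static void range_init(struct valid_range *r)
{
    r->start = UINT_MAX;
    r->end = 0;
}

/* Grow the valid range so that it also covers [start, end). */
static void range_add(struct valid_range *r, unsigned start, unsigned end)
{
    if (start < r->start)
        r->start = start;
    if (end > r->end)
        r->end = end;
}

static bool range_intersects(const struct valid_range *r,
                             unsigned start, unsigned end)
{
    return r->start < end && start < r->end;
}

/* A CPU write has to wait for the GPU only if the GPU is busy with the
 * buffer AND the write touches bytes that were already initialized. */
static bool write_needs_sync(const struct valid_range *r, bool gpu_busy,
                             unsigned start, unsigned end)
{
    return gpu_busy && range_intersects(r, start, end);
}
```

In this model a write that lands entirely outside the valid range can proceed immediately, after which the caller would record it with `range_add()`.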
Ilia Mirkin dacf22e0a3 freedreno: mark resources as being read so that writes flush the queue
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin 2e1445c8f3 freedreno: don't bother setting resource timestamps
Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to
keep separate timestamps around.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin 1fee3061d5 freedreno: add a reading flag to indicate gpu is reading rsc
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin ea0952a9db freedreno: fix resource flushing confusion
A resource flush is an upload of a hypothetically-staging texture to the
GPU. For a UMA system, this will largely be a no-op or
cache-maintenance. Move the render flush logic into transfer_map where
it belongs, and clear out the transfer_flush function.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin bfb0a8eb69 freedreno: remove tex_resource
pipe_sampler_view already contains a texture, remove the redundant
tex_resource member which pointed at the same thing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Rob Clark 6cd9c94ce4 freedreno/ir3: handle FRAG IN's without interpolation specified
Fallback to picking based on semantic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:34 -04:00
Rob Clark f513f006ce freedreno/ir3/cmdline: add @const headers for immediates
Since NIR f/e currently encodes immediates in instructions (rather than
passing via const), we need to ensure that when const's are used they get
initialized to the proper values.  Otherwise comparing NIR to TGSI
compiler, it will use proper immediate values in one case, and randomly
initialize values in the other.  Which confuses ir3test.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:34 -04:00
Rob Clark 6bc12bb5fd freedreno/ir3/cmdline: remove hack for old compiler
Since we dropped the old compiler, we don't need this hack anymore.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:34 -04:00