Commit Graph

13958 Commits

Author SHA1 Message Date
Rob Clark fcc7d6323b freedreno: enable a306
Whitelist adreno 306 (as found in msm8916/apq8016).  Works pretty much
out of the box, although the smaller GMEM size requires more tiles to
fit 1920x1080, so bump up the max # of tiles as well.

Since it is just whitelist + trivial change, it makes sense to land on
all the active release branches.

Note that a305c ends up with gpu-id "306", hence a306 ends up with
gpu-id of "307".  Apparently that is what happens when you let the
marketing dept name things.

Cc: "10.4" and "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-05-14 14:46:14 -04:00
Samuel Pitoiset 175cbb447a nvc0: remove unused nv50_tsc_wrap_mode() function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:44 -04:00
Samuel Pitoiset ac1ac94b38 nv50/ir: silence compiler warnings about mismatched tags
These warnings have been detected by Clang 3.6.

codegen/nv50_ir_from_tgsi.cpp:1319:10: warning: struct 'Source' was
previously declared as a class [-Wmismatched-tags] const struct tgsi::Source *code;

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:44 -04:00
Samuel Pitoiset 70651b7041 nv50/ir: remove unused private field cycle to SchedDataCalculator
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:43 -04:00
Samuel Pitoiset 7469f2fd23 nv30: remove unused nvfx_fp_memcpy() function and comment nv40_fp_bra()
The nv40_fp_bra() function in the same file is also unused but this is
the only place where the nv30/nv40 isa is documented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:43 -04:00
Samuel Pitoiset 48c84a36dd nvc0: do not expose MP counters for nvf0 (GK110+)
This fixes a crash when trying to monitor MP counters because compute
support is not implemented for nvf0.

Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-14 13:27:43 -04:00
Roland Scheidegger adcf8f8a13 softpipe: enable ARB_texture_view
Some bits were already there for texture views but some were missing.
In particular for cube map views things needed to change a bit.
For simplicity I ended up removing the separate face addr bit (just use
the z bit) - cube arrays didn't use it already, so just follow the same
logic there. (In theory using separate bits could allow for better hash
function but I don't think anyone ever did some measurements of that so
probably not worth the trouble, if we'd reintroduce it we'd certainly
wanted to use the same logic for cube arrays and cube maps.)
Also extend the seamless cube sampling to cube arrays - as there were no
piglit failures before this is apparently untested, but things now generally
work quite the same for cube textures and cube array textures so there
hopefully shouldn't be any trouble...

49 new piglits, 47 pass, 2 fail (both due to fake multisampling).

v2: incorporate Brian's feedback, add sampler view validation,
function rename, formatting fixes.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-05-13 22:57:50 +02:00
Roland Scheidegger e6c66f4fb0 llvmpipe: enable ARB_texture_view
All the functionality was pretty much there, just not tested.
Trivially fix up the missing pieces (take target info from view not
resource), and add some missing bits for cubes.
Also add some minimal debug validation to detect uninitialized target values
in the view...

49 new piglits, 47 pass, 2 fail (both related to fake multisampling,
not texture_view itself). No other piglit changes.

v2: move sampler view validation to sampler view creation, update docs.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-05-13 22:57:50 +02:00
Ilia Mirkin c696a318ef nouveau: document nouveau_heap
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-12 18:58:49 -04:00
Ilia Mirkin d06ce2f1df nvc0: switch mechanism for shader eviction to be a while loop
This aligns it to work similarly to nv50. However there's no library
code there, so the whole thing can be freed. Here we end up with an
allocated node that's not attached to a specific program.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86792
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-12 18:47:17 -04:00
Marek Olšák 79ffc08ae8 gallium: add PIPE_CAP_DEVICE_RESET_STATUS_QUERY
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-05-12 19:38:31 +02:00
Dave Airlie 9ab90c058f r600: use pipe->hw prim convert from radeonsi
This avoids future addition to PIPE_PRIM_ from causing regressions
on r600g.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-05-11 06:43:18 +10:00
Rob Clark 1cbdafc47a freedreno/ir3/nir: fix build break after f752effa
Our lower if/else pass was missed when converting NIR to use linked
lists rather than hashsets to track use/def sets.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-05-10 06:03:53 -04:00
Ilia Mirkin da136dc07d nv50/ir: only enable mul saturate on G200+
Commit 44673512a8 enabled support for saturating fmul. However
experimentally this does not seem to work on the older chips. Restrict
the feature to G200 (NVA0) and later.

Reported-by: Pierre Moreau <pierre.morrow@free.fr>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90350
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:41:51 -04:00
Ilia Mirkin 7892210400 nvc0: reset the instanced elements state when doing blit using 3d engine
Since we update num_vtxelts here, we could otherwise end up with stale
instancing information in the upper bits which wouldn't otherwise get
reset. (Also we run the risk of the previous draw having set the first
element as instanced.)

This appears as one of the causes for the test pointed out in fdo#90363
to fail on nvc0.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90363
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:36:23 -04:00
Ilia Mirkin e9b1ea29bf nvc0: keep track of PGRAPH state in nvc0_screen
See identical commit for nv50. Destroying the current context and then
creating a new one or switching to another existing context would cause
the "current" state to not be properly initialized, so we save it off in
the screen.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:36:23 -04:00
Ilia Mirkin f617029db3 nv50: keep track of PGRAPH state in nv50_screen
Normally this is kept in nv50_context, and on switching the active
context, the state is copied from the previous context. However when the
last context is destroyed, this is lost, and a new context might later
be created. When the currently-active context is destroyed, save its
state in the screen, and restore it when setting the current context.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90363
Reported-by: Matteo Bruni <matteo.mystral@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Matteo Bruni <matteo.mystral@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2015-05-09 13:36:23 -04:00
Jason Ekstrand 7a30668ad6 util: Move gallium's linked list to util
The linked list in gallium is pretty much the kernel list and we would like
to have a C-based linked list for all of mesa.  Let's not duplicate and
just steal the gallium one.

Acked-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-05-08 17:16:13 -07:00
Ilia Mirkin c4ac09e30e nv50/ir: only propagate saturate up if some actual folding took place
The former logic would copy the saturate up to any mul with an immediate
if there was a subsequent mul with a saturate. However we only want to
do that if we collapsed 2 muls by multiplying their immediates (or were
able to put the immediate in as a post-multiplier).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-05-08 18:56:56 -04:00
Ilia Mirkin 55b66dc4de nv50/ir: add SHL to the list of U32 opcodes
Having the wrong inferred type prevents a number of optimizations,
including constant propagation (since float immediates work differently
than integer immediates).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-05-06 20:50:03 -04:00
Vinson Lee 382b1a36e3 r600g: Fix Clang return-type build error.
Fix Clang return-type error introduced with commit
96f164f6f0 "gallium: make
pipe_context::begin_query return a boolean".

  CC       r600_query.lo
r600_query.c:443:3: error: non-void function 'r600_begin_query' should return a value [-Wreturn-type]
                return;
                ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-05-06 12:21:34 -07:00
Chia-I Wu ef5d4bcc3a ilo: silence a compiler warning
Silence

  ilo_query.c:120:7: warning: 'return' with no value, in function returning non-void

since commit 96f164f6.
2015-05-06 16:35:30 +08:00
Samuel Pitoiset cea910bc28 nvc0: all queries use an unsigned 64-bits integer by default
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
Samuel Pitoiset 35a9286be6 nvc0: make begin_query return false when all MP counters are used
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
Samuel Pitoiset ed7d3886cc nvc0: define driver-specific query groups
This patch defines "Driver statistics" and "MP counters" groups, but
only the latter will be exposed through GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
Samuel Pitoiset 96f164f6f0 gallium: make pipe_context::begin_query return a boolean
GL_AMD_performance_monitor must return an error when a monitoring
session cannot be started.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:36 +03:00
Samuel Pitoiset 546ec980f8 gallium: replace pipe_driver_query_info::max_value by a union
This allows queries to return different numeric types.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:35 +03:00
Samuel Pitoiset b620829b5e gallium: add new fields to pipe_driver_query_info
According to the spec of GL_AMD_performance_monitor, valid type values
returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT.
This also introduces the new field group_id in order to categorize
queries into groups.

v2: add PIPE_DRIVER_QUERY_TYPE_BYTES

v3: fix incorrect query type for radeon and svga drivers

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
2015-05-06 00:03:35 +03:00
Chia-I Wu 4348046a2f ilo: use ilo_image exclusively in core
Initialize ilo_view_surface and ilo_zs_surface from ilo_image instead of
ilo_texture.
2015-05-02 22:28:31 +08:00
Chia-I Wu 9b705ec32d ilo: add ilo_image_can_enable_aux()
It replaces ilo_texture_can_enable_hiz().
2015-05-02 22:14:07 +08:00
Chia-I Wu 430594c34f ilo: make ilo_image more self-contained
Add depth0, sample_count, and scanout to ilo_image.
2015-05-02 22:14:06 +08:00
Chia-I Wu f6ca4084c7 ilo: add ilo_image_init_for_imported()
It replaces ilo_image_update_for_imported_bo() and enables more error
checkings for imported textures.
2015-05-02 22:14:06 +08:00
Chia-I Wu 938c9b8cea ilo: prepare for image init for imported bo
Refactoring in prepraration for ilo_image_init_for_imported().
2015-05-02 22:14:06 +08:00
Chia-I Wu 3f9415077b ilo: constify ilo_image_params
Make ilo_image_params const in functions that do not modify it.
2015-05-02 22:14:06 +08:00
Chia-I Wu c209aa7a8f ilo: improve readability of ilo_image
Improve docs, rename struct fields, and reorder walk types.  No real changes.
2015-05-02 22:14:06 +08:00
Chia-I Wu 9b72bf5bd2 ilo: move command builder to core 2015-05-02 22:14:06 +08:00
Chia-I Wu 9e24c49e64 ilo: move ilo_state_3d* to core
ilo state structs (struct ilo_xxx_state) are moved as well.
2015-05-02 22:14:06 +08:00
Chia-I Wu 8ab18262c5 ilo: add ilo_buffer.h to core
Rename the original ilo_buffer to ilo_buffer_resource to avoid name conflict.
2015-05-02 22:14:06 +08:00
Chia-I Wu 3afbeb115a ilo: move BOs from ilo_texture to ilo_image
We want to work with ilo_image instead of ilo_texture in core.
2015-05-02 22:14:06 +08:00
Chia-I Wu ac47563cb4 ilo: move ilo_layout.[ch] to core as ilo_image.[ch]
Move files and s/layout/image/.
2015-05-02 22:14:06 +08:00
Chia-I Wu 8252765532 ilo: add ilo_format.[ch] to core
The original ilo_format.[ch] are removed.
2015-05-02 22:14:06 +08:00
Chia-I Wu 9b7080c8b3 ilo: add ilo_fence.h to core
Implement pipe_fence_handle on top of ilo_fence.
2015-05-02 22:14:06 +08:00
Chia-I Wu 2182beb431 ilo: add ilo_dev_init() to core
Move init_dev() from ilo_screen.c to core.
2015-05-02 22:14:06 +08:00
Chia-I Wu 7562f9e907 ilo: rename ilo_dev_info to ilo_dev
With intel_winsys being embedded in it, drop the "_info" suffix.
2015-05-02 22:14:06 +08:00
Chia-I Wu 19351af53d ilo: move intel_winsys to ilo_dev_info
We want to use ilo_dev_info instead of ilo_screen in core.
2015-05-02 22:14:06 +08:00
Chia-I Wu b3197fe5f4 ilo: add ilo_dev.h to core
Move what are remaining in ilo_common.h (that is, ilo_dev_*) to ilo_dev.h.
2015-05-02 22:14:06 +08:00
Chia-I Wu 7bb4fa72c0 ilo: add ilo_debug.[ch] to core
They consist of the debug helpers that used to live in ilo_common.h and
ilo_screen.c.
2015-05-02 22:14:06 +08:00
Chia-I Wu a5797873d0 ilo: add ilo_core.h to core
ilo_core.h includes the common gallium headers that were included in
ilo_common.h.
2015-05-02 22:14:05 +08:00
Chia-I Wu bbe91576b7 ilo: move intel_winsys.h to core
Add a new subdirectory and start moving files that do not depend on
ilo_screen/ilo_context to it.
2015-05-02 22:14:05 +08:00
Ilia Mirkin 33f0d1138d nvc0/ir: fix predicated PFETCH for real
Commit a9d08a250 accidentally didn't make use of the new src1 variable.
Use it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-30 02:02:47 -04:00
Ilia Mirkin db269ae495 nv50/ir: fix asFlow() const helper for OP_JOIN
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-29 23:34:30 -04:00
Ilia Mirkin a9d08a250a nvc0/ir: fix predicated PFETCH emission
src1 would contain the predicate, which would get emitted as a register
source by an undiscerning srcId helper. Work around this in the same way
as in emitTEX.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-29 23:34:22 -04:00
Ilia Mirkin 515ac907e6 gk110/ir: fix set with a register dest to not auto-set the abs flag
This was causing src0 to always have the absolute value flag set.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2015-04-29 18:03:19 -04:00
Marek Olšák a582b22c63 winsys/radeon: add a private interface for radeon_surface 2015-04-29 21:51:40 +02:00
Marek Olšák dcfbc006b6 winsys/radeon: move radeon_winsys.h to drivers/radeon 2015-04-29 21:51:40 +02:00
Emil Velikov b124dc2b70 r300: do not link against libdrm_intel
Accidentally added since the introduction of the file.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-29 15:15:19 +01:00
Axel Davy 559342d01d gallium/svga: Remove useless ARRAY_SIZE declaration
This is already declared in util/macros.h

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2015-04-29 08:28:10 +02:00
Axel Davy 64880d073a util/macros: Move DIV_ROUND_UP to util/macros.h
Move DIV_ROUND_UP to a shared location accessible everywhere

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
2015-04-29 08:28:10 +02:00
Ilia Mirkin 6fe0d4f035 nvc0/ir: flush denorms to zero in non-compute shaders
This will set the FTZ flag (flush denorms to zero) on all opcodes that
can take it.

This resolves issues in Unigine Heaven 4.0 where there were solid-filled
boxes popping up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89455
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-28 20:17:03 -04:00
Ilia Mirkin e312a69958 nvc0: expose GLSL version 410
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-28 12:48:22 -04:00
Marek Olšák 6d05396b00 r600g,radeonsi: add a driver query returning GPU load
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-04-28 16:05:45 +02:00
Marek Olšák 0b8e73a6ae r600g,radeonsi: add driver queries for GPU temperature and shader+memory clocks
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-04-28 16:05:45 +02:00
Ilia Mirkin 9143940da2 gm107/ir: add lane/vertex count sysvals
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 21:25:29 -04:00
Ilia Mirkin 89e0b08794 gk110/ir: add support for writing per-patch and shader outputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 21:25:28 -04:00
Ilia Mirkin 52614f59b7 freedreno/a3xx: color masking works like a blend for some formats
When there is a colormask active that does not cover all the channels,
enable reading in the destination like with a combining blend
operation. This fixes fbo-blending-formats on a3xx.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin 9fc3f47278 freedreno/a3xx: add support for S8 and Z32F_S8
Enables ARB_depth_buffer_float. There is no sampling support for
interleaved Z32F_S8, so we store the two textures separately, one as
Z32F, the other as S8. As a result, we need a lot of additional logic
for restores and transfers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin 1571da6ac3 freedreno/a3xx: add Z32F support
32-bit depth buffers are stored as unorm, and thus need special handling
when moving to and from gmem. They are copied into gmem by writing
depth, and resolved from gmem using a special resolve bit which
apparently float-ifies the data.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin 0a4cb00c77 freedreno: add fd_transfer to wrap around pipe_transfer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Ilia Mirkin f5c1101996 freedreno/a3xx: add support for disabling depth clipping
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-27 20:17:07 -04:00
Zoë Blade 05e7f7f438 Fix a few typos
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-04-27 17:28:29 +03:00
Marek Olšák db2415189a radeonsi: set an optimal value for DB_Z_INFO.ZRANGE_PRECISION
Required because of a VI hw bug.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:57:07 +02:00
Marek Olšák bed98eef9a radeonsi: remove deprecated and useless registers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák 393b0e0531 radeonsi: remove useless includes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák d8269be1ce gallium/radeon: print winsys info with R600_DEBUG=info
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák ecc7f2ed91 gallium/radeon: don't crash when getting out-of-bounds TEMP references
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-23 16:14:39 +02:00
Dave Airlie 8a41cd2407 softpipe: fix stencil write to use an integer value
This fixes a number of regressions since
61393bdcdc
u_tile: fix stencil texturing tests under softpipe

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89960
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-23 08:32:30 +10:00
Rob Clark cb24d3b7ad freedreno: misc minor cleanups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 1b58d8c2bf freedreno/a4xx: (partial) gl_FragCoord.zw
The bit to enable .z is still commented out, as it is triggering gpu
hangs in 0ad.  But at least gl_FragCoord.w works now, and we know what
bits we are *supposed* to set for .z (with that uncommented all piglit
fragcoord tests are passing).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark a869183123 freedreno/a4xx: primitive-restart
This was the missing bit to get dolphin-emu working on a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 632ea2a113 freedreno/nir: sysval fixes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 13527df143 freedreno/a4xx: wire up integer texture sampling
Similar to a3xx, the compiler needs to know the return type of the sam,
etc, instructions.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 48a651e98c freedreno/a4xx: formats updates/fixes
Update formats table with new formats that Ilia has figured out, and fix
sampling from srgb texture and integer vbo's.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark 21ceedfd8b freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:27 -04:00
Emil Velikov 86919352e3 android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS
... to manage the LIBDRM*_CFLAGS. The former is the recommended approach
by the Android build system developers while the latter has been
depreciated for quite some time.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-04-22 14:23:28 +01:00
Emil Velikov 413bc0a618 ilo: remove unused include from Android.mk
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
2015-04-22 14:18:47 +01:00
Ilia Mirkin 0904774af1 freedreno/a3xx: enable polymode setting with non-fill modes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin 6357601628 freedreno/a3xx: fix integer and 32-bit float border colors
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin 6895c3554e freedreno/a3xx: add support for float R/RG render targets
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Rob Clark 95e68adcd9 freedreno/ir3/nir: few little fixes
isaml needs to scale up coords based on LoD.  Also fix bogus bary.f
varying # when there are non-bary frag shader inputs.  And use sub.s of
a positive immediate rather than add.s of negative (since CP is better
about figuring out that those can be collapsed into the cat2 instr).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
Rob Clark efbf14e893 freedreno/ir3/nir: lower if/else
For now, completely flatten if/else blocks.  That will almost certainly
change once we have flow control.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
Rob Clark e5e11b5baf freedreno/a4xx: support for large shaders
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:50 -04:00
Rob Clark 20ea698c49 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:44 -04:00
Rob Clark 57f0d3b3c6 freedreno/ir3/nir: UBO support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:36 -04:00
Rob Clark 87807e5cc5 freedreno/ir3: move out helper
We'll also want it in NIR f/e for implementing UBO support.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:28 -04:00
Rob Clark 70b2f872ea freedreno/a4xx: sysvals and UBOs
Basically just sync up the cmdstream emit parts to match the changes
already done on a3xx.

Also, fix scheduling for mem instructions.  This is needed on a4xx, and
I am a bit surprised it isn't needed for a3xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:18 -04:00
Marek Olšák b79c620663 radeonsi: add a debug option to compile shaders when they're created
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-16 18:36:29 +02:00
Emil Velikov a7d018accf radeonsi: remove bogus r600-- triple
As mentioned by Michel Dänzer for LLVM >= 3.6 we create the
LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi
context. For older LLVM or hardware (r600) the triple is always r600--
and is created at a later stage - radeon_llvm_compile()

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-16 14:15:19 +01:00
Glenn Kennard 17d69862a9 r600g/sb: Skip empty ALU clause while scheduling
Fixes assert triggered by
ext_transform_feedback-intervening-read output use_gs
piglit test.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-16 12:43:20 +10:00
Eric Anholt b229e6c7de vc4: Don't try to use color load/stores to blit across format changes.
We could potentially support the right combination of 8888 to 565, but the
important thing for now is to not mix up our orderings of 8888.  Fixes
fbo-copyteximage regressions.
2015-04-15 16:50:23 -07:00
Eric Anholt cff2e08c4c vc4: Don't try to use color load/stores to do depth/stencil blits.
Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which
does blits to work around baselevel/lastlevel).
2015-04-15 16:50:23 -07:00
Eric Anholt 3a728d4dfb vc4: Update the shadow texture for public textures on every draw.
We don't know who else has written to it, so we'd better update it every
time.  This makes the gears spin in X again.
2015-04-15 16:50:23 -07:00
Eric Anholt bd957b1b79 vc4: Hook up VC4_DEBUG=perf to some useful printfs. 2015-04-15 16:50:22 -07:00
Tom Stellard e0994e0f97 radeon/llvm: Improve codegen for KILL_IF
Rather than emitting one kill instruction per component of KILL_IF's src
reg, we now or the components of the src register together and use the
result as a condition for just one kill instruction.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 34872 -> 34848 (-0.07 %)
VGPRS: 20696 -> 20676 (-0.10 %)
Code Size: 749032 -> 748452 (-0.08 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 1184 -> 1160 (-2.03 %)
VGPRS: 600 -> 580 (-3.33 %)
Code Size: 13200 -> 12620 (-4.39 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Increases:
SGPRS: 2 (0.00 %)
VGPRS: 0 (0.00 %)
Code Size: 0 (0.00 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 5 (0.01 %)
VGPRS: 5 (0.01 %)
Code Size: 25 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 40 (25.00 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 116 -> 96 (-17.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 64 -> 72 (12.50 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 424 -> 356 (-16.04 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:12 +00:00
Tom Stellard c6d79ed289 radeon/llvm: Run LLVM's instruction combining pass
This should improve code quality in general and will help with some
future changes to how we emit kill instructions.

shader-db shows a few regressions, but these don't seem to be the result
of deficiencies in instcombine.  They're mostly caused by the scheduler
making different decisions than before.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 35056 -> 34872 (-0.52 %)
VGPRS: 20624 -> 20696 (0.35 %)
Code Size: 764372 -> 749032 (-2.01 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 13264 -> 13072 (-1.45 %)
VGPRS: 8248 -> 8316 (0.82 %)
Code Size: 486320 -> 470992 (-3.15 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 11264 -> 11264 (0.00 %) bytes per wave

Increases:
SGPRS: 6 (0.01 %)
VGPRS: 20 (0.02 %)
Code Size: 14 (0.01 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 32 (0.03 %)
VGPRS: 8 (0.01 %)
Code Size: 244 (0.25 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 12 -> 20 (66.67 %)
Code Size: 216 -> 224 (3.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 40 -> 32 (-20.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 368 -> 280 (-23.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 28 -> 36 (28.57 %)
Code Size: 39320 -> 40132 (2.07 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 72 -> 64 (-11.11 %)
VGPRS: 48 -> 40 (-16.67 %)
Code Size: 6272 -> 5852 (-6.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:05 +00:00
Tom Stellard 2569c7109d radeonsi: Add header and footer to shader stat dump
This makes it easier to parse.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:36:59 +00:00
Eric Anholt 1be329e64c vc4: Add a blitter path using just the render thread.
This accelerates the path for generating the shadow tiled texture when
asked to sample from a raster texture (typical in glamor).
2015-04-13 23:20:46 -07:00
Eric Anholt 76d56752cc vc4: Allow submitting jobs with no bin CL in validation.
For blitting, we want to fire off an RCL-only job.  This takes a bit of
tweaking in our validation and the simulator support (and corresponding
new code in the kernel).
2015-04-13 23:20:45 -07:00
Eric Anholt 43b20795b7 vc4: Move the blit code to a separate file.
There will be other blit code showing up, and it seems like the place
you'd look.
2015-04-13 23:20:45 -07:00
Eric Anholt e214a59635 vc4: Separate out a bit of code for submitting jobs to the kernel.
I want to be able to have multiple jobs being set up at the same time (for
example, a render job to do a little fixup blit in the course of doing a
render to the main FBO).
2015-04-13 23:20:45 -07:00
Eric Anholt 44b63cf5c0 vc4: When asked to sample from a raster texture, make a shadow tiled copy.
So, it turns out my simulator doesn't *quite* match the hardware.  And the
errata about raster textures tells you most of what's wrong, but there's
still stuff wrong after that.  Instead, if we're asked to sample from
raster, we'll just blit it to a tiled temporary.

Raster textures should only be screen scanout, and word is that it's
faster to copy to tiled using the tiling engine first than to texture from
an entire raster texture, anyway.
2015-04-13 22:34:06 -07:00
Eric Anholt d04b07f8e2 vc4: Fix off-by-one in branch target validation. 2015-04-13 22:34:06 -07:00
Eric Anholt 7fa2f2e366 vc4: Use NIR-level lowering for idiv.
This fixes the idiv tests in piglit.
2015-04-13 21:36:40 -07:00
Eric Anholt 84ebaff1b7 vc4: Add a bunch of type conversions.
These are required to get piglit's idiv tests working.  The
unsigned<->float conversions are wrong, but are good enough to get
piglit's small ranges of values working.
2015-04-13 21:36:40 -07:00
Eric Anholt adae027260 vc4: Use the blit interface for updating shadow textures.
This lets us plug in a better blit implementation and have it impact the
shadow update, too.
2015-04-13 10:39:24 -07:00
Eric Anholt 39b6f7e76c vc4: Remove dead fields from vc4_surface. 2015-04-13 10:39:24 -07:00
Eric Anholt 5100221ff7 vc4: Skip sending down the clear colors if not clearing. 2015-04-13 10:39:24 -07:00
Eric Anholt 725620f21d vc4: Sync with kernel changes to relax BCL versus RCL validation.
There was no reason to tie the two packets' values together.
2015-04-13 10:39:23 -07:00
Eric Anholt cb88d2cfcb vc4: Fix another space allocation mistake.
We're over-allocating our BCL in vc4_draw.c, so this never mattered.
However, new RCL-only blit support might end up here without having set up
any BCL contents.
2015-04-13 10:39:02 -07:00
Eric Anholt 8eb9304ee7 vc4: Add missed accounting for the size of the semaphore.
This wouldn't have mattered except in the worst case scenario RCL setup.
2015-04-13 10:33:30 -07:00
Rob Clark b98c0262d1 freedreno/ir3/nir: couple little fixes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:41:03 -04:00
Rob Clark 1b936bb9f8 freedreno/ir3/nir: handle system values
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:57 -04:00
Rob Clark 715b2e0dbb freedreno/ir3/nir: handle txs and query_levels tex ops
These correspond to the tgsi TXQ opcode

(plus sneak in a fix for two-sided color)

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:43 -04:00
Rob Clark 97e8fc3fdd freedreno/ir3/nir: split out tex helpers
We'll need these in one or two other spots.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:36 -04:00
Rob Clark 6e8160d6e3 freedreno/ir3/nir: simplify emit_tex()
Just build up arrays for src0/src1, and use create_collect()..

Also add back missing .3d flag for 3d/cube textures.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:28 -04:00
Rob Clark d5357c16cc freedreno/ir3/cp: handle indirect properly
I noticed some cases where we where trying to copy-propagate indirect
src's into places they cannot go, like 2nd src for cat3 (mad, etc).
Expand out valid_flags() to be aware of relativ flag, and fix up a few
related spots.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:21 -04:00
Rob Clark 49be76166b freedreno/ir3/sched: avoid getting stuck on addr conflicts
When we get in a scenario where we cannot schedule any more instructions
due to address register conflict, clone the instruction that writes the
address register, and switch the remaining unscheduled users for the
current address register over to the new clone.

This is simpler and more robust than the previous attempt (which tried
and sometimes failed to ensure all other dependencies of users of the
address register were scheduled first).. hint it would try to schedule
instructions that were not actually needed for any output value.

We probably need to do the same with predicate register, although so far
it isn't so heavily used so we aren't running into problems with it
(yet).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:15 -04:00
Rob Clark 4cf4006674 freedreno/ir3/nir: add variable-indexing support
A bit fugly.. try and make this cleaner..  note if we hoist all the
get_addr() out of the loop we can drop the hashtable and just use
create_addr()..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:09 -04:00
Rob Clark 972ce757d7 freedreno/ir3/asm: change assert to warning
It probably *should* be an assert, but for now TGSI f/e isn't very good
about dealing w/ CONST vs ABS/NEG.  So for debug builds, print a warning
instead of crashing with an assert for now.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:40:03 -04:00
Rob Clark 09cbd97a47 freedreno/ir3/nir: set first_driver_param
Without this, a3xx breaks.. a4xx would too if it had already implemented
support for passing driver params.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:39:56 -04:00
Rob Clark f0e9a632a1 freedreno/ir3/cp: support to swap mad src's
For a normal MAD (ie. not MADSH), if first source is gpr and second
source is const, we can swap the first two sources to avoid needing a
mov instruction.

This gives back the biggest advantage TGSI f/e had over NIR f/e for
common shaders, since TGSI f/e had this logic in the f/e.  Note that
doing this in copy-prop step has the advantage that it will also work
for cases like:

   MOV TEMP[b], CONST[x]
   MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c]

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-11 11:39:46 -04:00
Roland Scheidegger 586536a4e1 gallivm: don't use control flow when doing indirect constant buffer lookups
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very much similar issue as fixed in
8a9f5ecdb1. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we are conforming to newer apis at least partially (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)

v2: use static vars for the fake bufs, minor code cleanups

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-04-09 01:32:30 +02:00
Glenn Kennard f2947807c8 r600g/sb: Enable SB for geometry shaders
Add SV_GEOMETRY_EMIT special variable type to track the
implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
instructions so GCM/scheduler doesn't reorder them.

Mark emit instructions as unkillable so DCE doesn't eat them.

Enable only for evergreen/cayman as there are a few
unexplained GS piglit regressions on R6xx/R7xx with SB
enabled otherwise.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-08 08:18:35 +10:00
Glenn Kennard 06bb68da4a r600g/sb: Update last_cf for loops
CF_END could end up emitted in the middle of a shader on cayman
when there was a loop at the very end.

Fixes glsl-1.50-geometry-end-primitive and
ext_transform_feedback-geometry-shaders-basic piglit tests.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-08 08:18:17 +10:00
Ilia Mirkin ae720c66cb nv50,nvc0: limit the y-tiling of 3d textures to the first level's tiling
We limit y-tiling to 0x20 when depth is involved. However the function is
run for each miplevel, and the hardware expects miplevel 0 to have the
highest tiling settings. Perform the y-tiling limit on all levels of a
3d texture, not just the ones that have depth.

Fixes:
  texelFetch fs sampler3D 98x129x1-98x129x9

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nick Tenney <nick.tenney@gmail.com> # GT216
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-04-06 23:06:55 -04:00
Dave Airlie ad84689f73 r600g: fix op3 abs issue
This code to handle absolute values on op3 srcs was a bit too simple,
it really needs a temp reg per src, not one per channel, make it
easier and let sb clean up the mess.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89831

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-07 11:40:16 +10:00
Rob Clark 8b0b81339b freedreno/ir3: add NIR compiler
The NIR compiler frontend is an alternative to the TGSI f/e, producing
the same ir3 IR and using the same backend passes for scheduling, etc.

It is not enabled by default yet, as there are still some regressions.
To enable, use 'FD_MESA_DEBUG=nir'.  It is enough to use with, for
example, xonotic or supertuxkart.

With the NIR f/e, scalarizing and a number of other lowering steps
happen in NIR, so we don't have to do them in ir3.  Which simplifies the
f/e and allows the lowered instructions to pass through other
optimization stages.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:40 -04:00
Ilia Mirkin 700d949ea1 freedreno/a3xx: don't decode srgb on mem2gmem
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin b060b56772 freedreno/a3xx: pass sprite coord mode through to program emit
Use the correct sprite replacement depending on the flip of the coord
mode, using either T or 1-T depending on whether we have an upper-left or
lower-left coordinate origin. This fixes all the point sprite piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin 1de72dfc8a freedreno/a3xx: add UBO support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin c7811f56c2 freedreno/ir3: insert nop between sfu/mem operations
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin 14dfd8cc43 freedreno: dirty context when reallocating a bound bo
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin bde2045fa2 freedreno: keep track of buffer valid ranges
Copies nouveau_buffer and radeon_buffer. This allows a write to proceed
to an uninitialized part of a buffer even when the GPU is using the
previously-initialized portions.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:35 -04:00
Ilia Mirkin dacf22e0a3 freedreno: mark resources as being read so that writes flush the queue
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin 2e1445c8f3 freedreno: don't bother setting resource timestamps
Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to
keep separate timestamps around.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin 1fee3061d5 freedreno: add a reading flag to indicate gpu is reading rsc
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin ea0952a9db freedreno: fix resource flushing confusion
A resource flush is an upload of a hypothetically-staging texture to the
GPU. For a UMA system, this will largely be a no-op or
cache-maintenance. Move the render flush logic into transfer_map where
it belongs, and clear out the transfer_flush function.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Ilia Mirkin bfb0a8eb69 freedreno: remove tex_resource
pipe_sampler_view already contains a texture, remove the redundant
tex_resource member which pointed at the same thing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-05 16:36:34 -04:00
Rob Clark 6cd9c94ce4 freedreno/ir3: handle FRAG IN's without interpolation specified
Fallback to picking based on semantic name.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:34 -04:00
Rob Clark f513f006ce freedreno/ir3/cmdline: add @const headers for immediates
Since NIR f/e currently encodes immediates in instructions (rather than
passing via const), we need to ensure that when const's are used the get
initialized to the proper values.  Otherwise comparing NIR to TGSI
compiler, it will use proper immediate values in one case, and randomly
initialize values in the other.  Which confuses ir3test.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:34 -04:00
Rob Clark 6bc12bb5fd freedreno/ir3/cmdline: remove hack for old compiler
Since we dropped the old compiler, we don't need this hack anymore.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:34 -04:00
Rob Clark f370e95421 freedreno/ir3: handle const/immed/abs/neg in cp
Be smarter about propagating copies from const or immed, or with abs/neg
modifiers.  Also, realize that absneg.s and absneg.f are really "fancy"
mov instructions.

This opens up the possibility to remove more copies.  It helps the TGSI
frontend a bit, but will be really needed for the NIR f/e which builds
everything up in SSA form (ie. will *always* insert a mov from const or
immediate).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 16:36:34 -04:00
Rob Clark 104713d9f2 freedreno/ir3: split float/int abs/neg
Even though in the end, they map to the same bits, the backend will need
to be able to differentiate float abs/neg vs integer abs/neg.  Rather
than making the backend figure it out based on instruction opcode (which
when combined with mov/absneg instructions, can be awkward), just split
out different flags for each so the frontend can signal it's intentions
more clearly.  Also, since (neg) for bitwise op's is actually a bitwise-
not, split it out into bnot flag.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 12:44:01 -04:00
Rob Clark 203f37540a freedreno/ir3: add ir3 builder helpers
Add helpers for constructing SSA forms of instructions.

Only partial cat5/cat6 coverage.. but we can add stuff as needed.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 12:44:01 -04:00
Rob Clark b1c9fb9fca freedreno/ir3: fix sam argument order comment
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 12:44:01 -04:00
Rob Clark 101142c401 xa: support for drivers which use NIR
We need to pull in libnir.la and it's dependency libglsl_util.la.  Also,
_mesa_error_no_memory() must be defined.

Fortunately with libnir.la (vs pulling in all of libglsl.la) we don't
also need libstdc++.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-05 09:24:17 -04:00
Ilia Mirkin ba353935a3 nv50: allocate more offset space for occlusion queries
Commit 1a170980a0 started writing to q->data[4]/[5] but kept the
per-query space at 16, which meant that in some cases we would write
past the end of the buffer. Rotate by 32, like nvc0 does. This ensures
that we always have 32 bytes in front of us, and the data writes will go
within the allocated space.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89679
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nick Tenney <nick.tenney@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-04-04 11:30:03 -04:00
Ilia Mirkin 01d3b750b3 nv50/ir: avoid folding immediates into imad operations
Commit 09ee907266 added logic to fold immediates into mad operations,
but the emission code is only there for fmad. Only allow it on float
types.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 18:42:31 -04:00
Ilia Mirkin 603d28f32c nv50/ir: fix imad emission when dst == src2
Commit fb63df2215 added 4-byte mad support, but only supported
emission for floats. Disable it for ints for now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 18:35:59 -04:00
Eric Anholt a9152376b4 vc4: Add support for nir_iabs.
Tested using the GLSL 1.30 tests for integer abs().  Not currently used,
but it was one of the new opcodes used by robclark's idiv lowering.
2015-04-02 10:32:35 -07:00
Ilia Mirkin 4a3c0e9950 freedreno/a3xx: add MRT support
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but doesn't seem worth the trouble.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin 6f4c1976f4 freedreno: convert blit program to array for each number of rts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin d9992ab35a freedreno: add support for laying out MRTs in gmem
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin 602bc6c88d freedreno: add core infrastructure support for MRTs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin d13803c76f freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS property
This will enable the driver to tell which regids to link up to which
MRT outputs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin f27ec59084 freedreno/a3xx: add independent blend function support
This is needed for MRT support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin 8efa3e340d freedreno: remove alpha key from ir3_shader
This complication is unnecessary and makes MRTs more complicated and
likely to generate tons of variants.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Stéphane Marchesin 70eed78cac i915g: Implement EGL_EXT_image_dma_buf_import
This adds all the plumbing to get EGL_EXT_image_dma_buf_import in
i915g.

Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
2015-04-01 20:13:37 -07:00
Eric Anholt 26261bca21 vc4: Add shader-db dumping of NIR instruction count.
I was previously using temporary disables of VC4 optimization to show the
benefits of improved NIR optimization, but this can get me quick and dirty
numbers for NIR-only improvements without having to add hacks to disable
VC4's code (disabling of which might hide ways that the NIR changes would
hurt actual VC4 codegen).
2015-04-01 10:57:01 -07:00
Eric Anholt 73e2d4837d vc4: Convert to consuming NIR.
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.

total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs:     62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs:     15494 -> 15240 (-1.64%)

v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in mater.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
    I'm not committing)
2015-04-01 10:57:01 -07:00
Eric Anholt 486dcfbbd9 vc4: Tell shader-db how big our UBOs are, if present.
I had regressed them for a while with the NIR work.
2015-04-01 10:57:01 -07:00
Marcin Ślusarz f9e2295560 nouveau: synchronize "scratch runout" destruction with the command stream
When nvc0_push_vbo calls nouveau_scratch_done it does not mean
scratch buffers can be freed immediately. It means "when hardware
advances to this place in the command stream the scratch buffers
can be freed".

To fix it, just postpone scratch runout destruction after current
fence is signalled.

The bug existed for a very long time. Nobody noticed, because
"scratch runout" code path is rarely executed.

Fixes hang at the very beginning of first mission in "Serious Sam 3"
on nve7/gk107. It manifested as:

nouveau E[   PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]]

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-31 22:04:31 +02:00
Tom Stellard 4c53d2acbb radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader types v2
v2:
  - Fix typo

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-31 15:40:51 +00:00
Leo Liu a714fbacf7 radeon/vce: implement video usability information support
This will help encoding VUI into the bitstream

v2: make backward compatible

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2015-03-31 12:31:58 -04:00
Roland Scheidegger 489866938f llvmpipe: enable ARB_texture_gather
Just announce support for 4 components.
While here also increase the max/min texel offsets (the limit is completely
artificial, was chosen because that's what other hardware did, however there's
other drivers using larger limits).
Over a thousand little piglits skip->pass.

v2: update docs/GL3.txt

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-31 17:23:51 +02:00
Roland Scheidegger 1863ed21ff gallivm: simplify sampler interface
This has got a bit out of control with more and more parameters added.
Worse, whenever something in there changes all callees have to be updated
for that, even though they don't really do much with any parameter in there
except pass it on to the actual sampling function.
Hence simply put almost everything into a struct. Also instead of relying
on some arguments being NULL, be explicit and set this in a key (which is
just reused for function generation for simplicity). (The code still relies
on them being NULL in the end for now.)
Technically there is a minimal functional change here for shadow sampling:
if shadow sampling is done is now determined explicitly by the texture
function (either sample_c or the gl-style tex func inherit this from target)
instead of the static texture state. These two should always match, however.
Otherwise, it should generate all the same code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-31 17:23:51 +02:00
Eric Anholt 1dcc1ee314 vc4: Drop integer multiplies with 0 to moves of 0.
This cleans up more instructions generated by uniform array indexing
multiplies.

total instructions in shared programs: 39989 -> 39961 (-0.07%)
instructions in affected programs:     896 -> 868 (-3.12%)
2015-03-30 12:57:45 -07:00
Eric Anholt 8c5dcdbccb vc4: Add a constant folding pass.
This cleans up some pointless operations generated by the in-driver mul24
lowering (commonly generated by making a vec4 index for a matrix in a
uniform array).

I could fill in other operations, but pretty much anything else ought to
be getting handled at the NIR level, I think.

total uniforms in shared programs: 13423 -> 13421 (-0.01%)
uniforms in affected programs:     346 -> 344 (-0.58%)
2015-03-30 12:57:45 -07:00
Eric Anholt c519c4d85e vc4: Don't bother masking out the low 24 bits for integer multiplies
The hardware just uses the low 24 lines, saving us an AND to drop the high
bits.

total uniforms in shared programs: 13433 -> 13423 (-0.07%)
uniforms in affected programs:     356 -> 346 (-2.81%)
total instructions in shared programs: 40003 -> 39989 (-0.03%)
instructions in affected programs:     910 -> 896 (-1.54%)
2015-03-30 09:23:39 -07:00
Eric Anholt 5df8bf86fe vc4: Make integer multiply use 24 bits for the low parts.
The hardware uses the low 24 bits in integer multiplies, so we can have
fewer high bits (and so probably drop them more frequently).
2015-03-30 09:23:39 -07:00
Michel Dänzer d64adc3a79 radeonsi: Cache LLVMTargetMachineRef in context instead of in screen
Fixes a crash in genymotion with several threads compiling shaders
concurrently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746

Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-30 15:15:10 +09:00
Ilia Mirkin ee670c9efa freedreno/a3xx: add support for point sprite coordinate replacement
This does not (yet) support different coordinate origins, so the tests
still fail due to fbo flipping.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:41 -04:00
Ilia Mirkin 995f55a6ce freedreno/a3xx: make vs-set point size work
This appears to need the A2XX version of the point list, so select it at
draw time if necessary.

Experimentally, always using the A2XX version causes hangs when PSIZE
isn't actually emitted.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:41 -04:00
Ilia Mirkin 7fc5da8b93 freedreno/a3xx: point size should not be divided by 2
The division is probably a holdover from the days when the fixed point
inline functions generated by headergen were broken.

Also reduce the maximum point size to 4092 (vs 4096), which is what the
blob does.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:41 -04:00
Ilia Mirkin 738c8319ac freedreno/a3xx: fix 3d texture layout
The SZ2 field contains the layer size of a lower miplevel. It only
contains 4 bits, which limits the maximum layer size it can describe. In
situations where the next miplevel would be too big, the hardware
appears to keep minifying the size until it hits one of that size.
Unfortunately the hardware's ideas about sizes can differ from
freedreno's which can still lead to issues. Minimize those by stopping
to minify as soon as possible.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-28 14:54:41 -04:00
Ilia Mirkin 3735643df3 freedreno/a3xx: LAYERSZ2 appears to have no effect on arrays
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-28 14:54:40 -04:00
Roland Scheidegger b2424fb030 llvmpipe: simplify address calculation for 4x4 blocks
These functions looked quite complicated, even though what they actually did
was trivial (ever since we dropped swizzled rendering). Also drop lookup of
format block per bytes done for each block, and do it once per scene instead.
This improves everybody's favorite "benchmark" by 3% or so, though
lp_rast_shade_quads_all() which calls this shows up still quite high for a
function which does little more than call the jit function.
(This would most likely be much better handled by the jit function itself,
the strides are passed through anyway already, though for being able to
handle layers it would definitely add some complexity.)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-28 02:59:42 +01:00
Ilia Mirkin 58030a8f99 nv50/ir/gk110: fix offset flag position for TXD opcode
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-27 19:02:19 -04:00
Ilia Mirkin 49b86007aa nv50/ir: take postFactor into account when doing peephole optimizations
Multiply operations can have a post-factor on them, which other ops
don't support. Only perform the peephole optimizations when there is no
post-factor involved.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-27 19:02:19 -04:00
Roland Scheidegger 8dad9455ff gallivm: pass jit_context pointer through to sampling
The callbacks used for getting the dynamic texture/sampler state were using
the jit_context from the generated jit function. This works just fine, however
that way it's impossible to generate separate functions for texture sampling,
as will be done in the next commit. Hence, pass this pointer through all
interfaces so it can be passed to a separate function (technically, it would
probably be possible to extract this pointer from the current function instead,
but this feels hacky and would probably require some more hacks if we'd use
real functions instead of inlining all shader functions at some point).
There should be no difference in the generated code for now.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-27 19:25:53 +01:00
Ilia Mirkin 1b87d73a9f gallium/util: remove u_linkage
Does not appear to be used in tree. Coverity spotted some errors in the
bitmask stuff, but the whole thing appears to be unused.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-03-26 21:02:09 -04:00
Eric Anholt 7bc39c8418 vc4: Add a dump-the-surface-contents routine.
This has been useful once again while trying to debug stride issues
between render targets and texturing.
2015-03-24 10:39:12 -07:00
Eric Anholt 7f797e3d17 vc4: Fix pitch alignment of linear textures.
Fixes some non-power-of-two texture rendering when I force ARGB8888 to
raster.
2015-03-24 10:39:12 -07:00
Eric Anholt b3ea377f86 vc4: Write the alignment of level width consistently in validation.
16 / cpp happens to be the same as utile_w on the only raster format
supported (4 bytes per pixel), but simulator/hw source code generally
talks in terms of utiles.
2015-03-24 10:39:12 -07:00
Eric Anholt 8975a09494 vc4: Fix use of a bool as an enum.
The enum compared to was 0, so it worked out, but it sure looked wrong.
2015-03-24 10:39:12 -07:00
Eric Anholt 04605c21f6 vc4: Decide the HW's format before laying out the miptree.
I'm experimenting with a workaround for raster texture misrendering on
hardware, and this lets me look at the format chosen when computing
strides.
2015-03-24 10:39:12 -07:00
Eric Anholt 25d60763d9 vc4: Use our device-specific ioctls for create/mmap.
They don't do anything special for us, but I've been told by kernel
maintainers that relying on dumb for my acceleration-capable buffers
is not OK.
2015-03-24 10:39:12 -07:00
Eric Anholt af3d747194 vc4: Make a new #define for making code conditional on the simulator.
I'd like to compile as much of the device-specific code as possible
when building for simulator, and using if (using_simulator) instead of
ifdefs helps.
2015-03-24 10:39:12 -07:00
Eric Anholt 9bafcf630a vc4: Add some useful debug printfs for miptrees.
I keep rewriting these.
2015-03-24 10:39:12 -07:00
Ilia Mirkin 43277fcd59 Revert "nv50,nvc0: remove bogus 64_FLOAT formats"
This reverts commit 20346808cf.

The conversion is actually done since these are the *B macro variants
and no vtx format is supplied, which makes them go through the translate
module.

This restores the following piglit tests to passing:

  draw-vertices user
  gl-2.0-vertexattribpointer

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-23 20:57:52 -04:00
Giuseppe Bilotta 76039b38f0 gallium: implement get_device_vendor() for existing drivers
The only hackish ones are llvmpipe and softpipe, which currently return
the same string as for get_vendor(), while ideally they should return
the CPU vendor.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-23 13:25:34 +00:00
Emil Velikov 7c7954b09d galahad: actually remove the driver
Should have been part of 429a4355259(galahad: remove driver). Seems like
I've erroneously committed the trimmed patch.

Reported-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-03-21 22:35:27 +00:00
Roland Scheidegger e8039208c4 llvmpipe: use global llvm context for PIPE_SUBSYSTEM_EMBEDDED
There's 2 reasons why we'd want to use the global context:
1) There still seems to be one memory "leak" left when using multiple llvm
contexts (it is not a true leak as the memory disappears into some still
addressable pool but nevertheless the memory consumption grows). See
http://cgit.freedesktop.org/~jrfonseca/llvm-jitstress/
2) These contexts get kinda big - even when disposing modules etc. after
compiling a shader the LLVMContext can easily be over 100kB. So when there's
lots of llvm contexts arounds it adds up.

The downside is that at least right now this is absolutely not thread safe,
so this only works safely in environments where multiple pipe contexts are not
used concurrently.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-21 01:52:03 +01:00
Dave Airlie 9d97cd2e3e u_primconvert: add primitive restart support
This add primitive restart support to the prim conversion.

This involves changing the API for the translate functions
as we need to pass the prim restart index and the original
number of indices into the translate functions.

primitive restart is support for quads, quad strips
and polygons.

This deal with the case where the actual number of output
primitives is less than the initially calculated number,
by filling the rest of the output primitives with the restart
index, the other option is to reduce the output prim number,
but that will make the generator code a bit messier.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-03-20 09:46:30 +10:00
Rob Clark aee26d292f freedreno/ir3: fix infinite recursion in sched
One more case we need to handle.  One of the src instructions for the
indirect could also end up being ourself.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-18 10:42:33 -04:00
Rob Clark 62cc003b7d freedreno: fix spelling
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-18 10:42:33 -04:00
Marek Olšák a984abdad3 radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords
radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.)

Discovered by Coverity. Reported by Ilia Mirkin.

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-18 12:04:27 +01:00
Emil Velikov 3f94a5afcb r600g: constify r600_shader_tgsi_instruction lists.
Massive list of constant data. Annotate it as such.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-17 23:52:39 +00:00
Emil Velikov 63cf2b4448 r600g: kill off r600_shader_tgsi_instruction::{tgsi_opcode,is_op3}
Both of which are no longer used. Use designated initializer to make
things obvious as people add/remove TGSI_OPCODEs.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-17 23:52:35 +00:00
Emil Velikov 5e68c6b322 r600g: use the tgsi opcode from parse.FullToken.FullInstruction
... rather than the local one in inst_info->tgsi_opcode.

This will allow us to simplify struct r600_shader_tgsi_instruction.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-17 23:52:32 +00:00
Marek Olšák b5f19db976 radeonsi: implement TGSI_OPCODE_BFI (v2)
v2: Don't use the intrinsics, the shader backend can recognize these
    patterns and generates optimal code automatically.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-16 14:58:19 +01:00
Marek Olšák d3723c614f radeonsi: add a helper for extracting bitfields from parameters (v2)
This will be used a lot (especially by tessellation).

v2: don't use the bfe intrinsic

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-16 14:58:19 +01:00
Marek Olšák dc39413640 radeonsi: move scratch reloc state setup
- move it to its own function
- do it after all states are emitted
- bump SI_MAX_DRAW_CS_DWORDS

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 567c8d7300 radeonsi: don't emit PA_SC_LINE_STIPPLE if not rendering lines
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 1f4bb38264 radeonsi: don't emit PA_SC_LINE_STIPPLE after every rasterizer state change
Do it only when the line stipple state is changed.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák f5832f3f9d radeonsi: move PA_SU_SC_MODE_CNTL to rasterizer state
This requires enabling the optional GL provoking vertex behavior for quads.

+ some cosmetic changes, so that the register is set exactly the same as
on r600.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 98a2398222 radeonsi: implement line and polygon smoothing
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 303d23e10d radeonsi: add shader code for smoothing
The fragment shader multiplies the alpha channel with gl_SampleMaskIn.
If blending is enabled, it looks like MSAA.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:19 +01:00
Marek Olšák 4f20a8f278 radeonsi: split sample locations into its own state atom
Sample locations are not updated as often as framebuffers.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák f7796a966d radeonsi: add basic code for overrasterization
This will be used for line and polygon smoothing.
This is GCN-only even though it's in shared code.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák 1921fa4304 radeonsi: small cleanup in si_shader_selector_key
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák 52ff1edc51 radeonsi: simplify accessing alpha pointer in si_llvm_emit_fs_epilogue
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák 955ebf2890 radeonsi: add support for easy opcodes from ARB_gpu_shader5
I have to use the BFE instrinsics, because BFE is one of the most complex
instructions that can't be matched easily. BFE has 3 conditional branches
and one of them is quite big.

In the isel DAG, lowered BFE has 27 nodes (including leafs).
2015-03-16 12:54:18 +01:00
Marek Olšák 755a2907a3 radeonsi: implement bit-finding opcodes from ARB_gpu_shader5
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák ca90cde81e radeonsi: implement gl_SampleMaskIn
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák f9fd0c4a55 radeonsi: add support for SQRT
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák d73c1c1304 radeonsi: add support for FMA
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
2015-03-16 12:54:18 +01:00
Marek Olšák dfea35666e gallium/radeon: don't use LLVMReadOnlyAttribute for ALU
None of the instructions use a pointer argument.
(+ small cosmetic changes)

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-16 12:54:18 +01:00
Marek Olšák 216543ea54 gallium: add FMA and DFMA opcodes (v3)
Needed by ARB_gpu_shader5.

v2: select DMAD for FMA with double precision
v3: add and select DFMA

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-16 12:54:18 +01:00
Rob Clark e92bc6b38e freedreno: update generated headers
Fix a3xx texture layer-size.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-15 18:00:19 -04:00
Rob Clark d3fb949c03 freedreno/ir3: remove old compiler
Now that piglit is no longer falling back to old compiler for any tests,
we can remove it.  Hurray \o/

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-15 13:27:03 -04:00
Rob Clark feb858b788 freedreno/ir3: avoid scheduler deadlock
Deadlock can occur if we schedule an address register write, yet some
instructions which depend on that address register value also depend on
other unscheduled instructions that depend on a different address
register value.  To solve this, before scheduling an address register
write, ensure that all the other dependencies of the instructions which
consume this address register are already scheduled.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-15 13:26:56 -04:00
Rob Clark 7208e96bb8 freedreno/ir3: bit of cleanup
Add an array_insert() macro to simplify inserting into dynamically sized
arrays, add a comment, and remove unused prototype inherited from the
original freedreno.git/fdre-a3xx test code, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-15 13:26:44 -04:00
Ilia Mirkin 620e29b748 freedreno: fix slice pitch calculations
For example if width were 65, the first slice would get 96 while the
second would get 32. However the hardware appears to expect the second
pitch to be 64, based on halving the 96 (and aligning up to 32).

This fixes texelFetch piglit tests on a3xx below a certain size. Going
higher they break again, but most likely due to unrelated reasons.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-03-13 16:05:16 -04:00
Ilia Mirkin 89b26d5a36 freedreno/a3xx: use the same layer size for all slices
We only program in one layer size per texture, so that means that all
levels must share one size. This makes the piglit test

bin/texelFetch fs sampler2DArray

have the same breakage as its non-array version instead of being
completely off, and makes

bin/ext_texture_array-gen-mipmap

start passing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
2015-03-13 16:05:16 -04:00
Samuel Pitoiset e5cd42ed9a nvc0: fix wrong max value for driver queries
The maximum value of a Gallium HUD's panel is automatically adjusted
when the current value is greater than the max. If we set the
pipe_query_driver_info::max_value to UINT64_MAX, the maximum value is
never adjusted and this results in a flat line instead of a pretty curve
which is correctly scaled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-09 20:47:05 -04:00
Alexandre Demers 7a37d5c3a4 r600g: Use R600_MAX_VIEWPORTS instead of 16
Lets define R600_MAX_VIEWPORTS instead of using 16 here and there
in the code when looping through viewports and scissors. It is
easier to understand what this number represents.

v2: Missed a case where R600_MAX_VIEWPORTS should have been used.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-03-09 23:02:05 +01:00
Marek Olšák c939231e72 r300g: fix sRGB->sRGB blits
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 21:22:22 +01:00
Marek Olšák 9953586af2 r300g: fix a crash when resolving into an sRGB texture
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 21:03:49 +01:00
Marek Olšák 113601086d r300g: use memset for clearing the shader key 2015-03-09 20:58:32 +01:00
Marek Olšák 4815c187b7 r300g: remove the broken SNORM->UNORM shader lowering pass
Not used anymore.
2015-03-09 20:58:32 +01:00
Marek Olšák 74a757f92f r300g: fix RGTC1 and LATC1 SNORM formats
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 20:58:32 +01:00
Stefan Dösinger f710b99071 r300g: Fix the ATI1N swizzle (RGTC1 and LATC1)
This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01
test as well as the precision part of Wine's 3dc format test (fd.o bug
89156).

The Z component seems to contain a lower precision version of the
result, probably a temporary value from the decompression computation.
The Y and W component contain different data that depends on the input
values as well, but I could not make sense of them (Not that I tried
very hard).

GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in
piglit, and both formats are affected by a compiler bug if they're
sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx,
which returns random garbage.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-03-09 20:58:32 +01:00
Tom Stellard 51b43c559f radeonsi: Add additional information to shader dumps
This adds SGPR count, VGPR count, shader size, LDS size, and scratch
usage to shader dumps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-09 13:53:33 +00:00
Tom Stellard bbfa1c3239 radeonsi/compute: Use value from compiler for COMPUTE_PGM_RSRC1.FLOAT_MODE
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-09 13:53:33 +00:00
Ilia Mirkin cb3eb43ad6 freedreno/ir3: get the # of miplevels from getinfo
This fixes ARB_texture_query_levels to actually return the desired
value.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-09 10:50:39 -04:00
Ilia Mirkin 8ac957a51c freedreno/ir3: fix array count returned by TXQ
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-09 10:50:39 -04:00
Ilia Mirkin f3dfe6513c freedreno: move fb state copy after checking for size change
Fixes: 1f3ca56b ("freedreno: use util_copy_framebuffer_state()")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
2015-03-09 10:50:39 -04:00
Rob Clark fd17db6fe5 freedreno: replace glsl130 debug flag with glsl120
Now that relative-dst works, we should never fall back to the old
compiler.  (Which is almost true, other than a couple edge case sched
fails in piglit).

So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with
a glsl120 flag to force GLSL 120 and !integers.

If this commit breaks any game/app/etc use FD_MESA_DEBUG=glsl120 as a
workaround and please let me know.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark 060d349920 freedreno/ir3: relative dst
To simplify RA, assign arrays that are written to first.  Since enough
dependency information is in the graph to preserve order of reads and
writes of array, so all SSA names for the array collapse into one, just
assign the entire thing by array-id.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark b7703212d8 freedreno/ir3: split out array_fanin() helper
We'll need this too for relative dst..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark 17754b70d7 freedreno/ir3: drop deref nodes
The meta-deref instruction doesn't really do what we need for relative
destination.  Instead, since each instruction can reference at most a
single address value, track the dependency on the address register via
instr->address.  This lets us express the dependency regardless of
whether it is used for dst and/or src.

The foreach_ssa_src{_n} iterator macros now also iterates the address
register so, at least in SSA form, the address register behaves as an
additional virtual src to the instruction.  Which is pretty much what
we want, as far as scheduling/etc.

TODO:
For now, the foreach_src{_n} iterators are unchanged.  We could wrap
the address in an ir3_register and make the foreach_src_{_n} iterators
behave the same way.  But that seems unnecessary at this point, since
we mainly care about the address dependency when in SSA form.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark f8f7548f46 freedreno/ir3: helpful iterator macros
I remembered that we are using c99.. which makes some sugary iterator
macros easier.  So introduce iterator macros to iterate all src
registers and all SSA src instructions.  The _n variants also return
the src #, since there are a handful of places that need this.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark 26b79ac3e4 freedreno/ir3: fix register usage calculations
For cat1 instructions, use reg() as well for relative src, to ensure
proper accounting of register usage.  Also, for relative instructions,
use reg->size rather than reg->wrmask to determine the number of
components read/written.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark 3ecc834e75 freedreno/ir3: couple tweaks for cmdline compiler
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark 0f797f7b7d freedreno/ir3: split up ssa_dst
And a couple other trivial renames, to prepare for relative dst.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Rob Clark 27648efa20 freedreno/ir3: fix failed assert in grouping
Turns out there are scenarios where we need to insert mov's in "front"
of an input.  Triggered by shaders like:

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL OUT[1], GENERIC[9]
  DCL SAMP[0]
  DCL TEMP[0], LOCAL
    0: MOV TEMP[0].xy, IN[1].xyyy
    1: MOV TEMP[0].w, IN[1].wwww
    2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY
    3: MOV OUT[1], TEMP[0]
    4: MOV OUT[0], IN[0]
    5: END

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-08 17:42:43 -04:00
Mark Janes b28c037d64 r300g: Fix build, invalid extern "C" around header inclusion.
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in r300 files.

When the helper to detect this issue was pushed to master, it broke
the build for the r300 driver.  This patch fixes the r300 build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-06 22:08:44 -05:00
Mark Janes c4b91a1f5c nouveau: Fix build, invalid extern "C" around header inclusion.
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in nouveau files.

When the helper to detect this issue was pushed to master, it broke
the build for the nouveau driver.  This patch fixes the nouveau build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-06 22:08:11 -05:00
Ilia Mirkin 20346808cf nv50,nvc0: remove bogus 64_FLOAT formats
There is no HW support for these and the VBO pusher doesn't know about
them. No need to, either, since the st will be lowering them to 2x32.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-03-06 22:06:05 -05:00
Chia-I Wu bca6c8572f ilo: clarify valid and preferred tilings
We did it right until the switch to gen_surface_tiling, which has
GEN8_TILING_W.  Generally, GEN8_TILING_W may be valid but not preferred.
2015-03-07 04:32:39 +08:00
Chia-I Wu bf061a3d2e ilo: clean up Gen6 WAs
Add a help function for each WA and make PIPE_CONTROL flags match the WA
descriptions.  Call gen6_wa_pre_pipe_contro() only before PIPE_CONTROLs.
Fix missing gen6_wa_pre_3dstate_vs_toggle() in the rectlist path.
2015-03-07 02:17:54 +08:00
Chia-I Wu ba5670fc50 ilo: add generic ilo_render_3dprimitive()
It replaces gen[6-8]_3dprimitive().
2015-03-07 01:45:52 +08:00
Chia-I Wu 8b2eecfbf8 ilo: add generic ilo_render_pipe_control()
It replaces gen[6-8]_pipe_control() and a direct gen6_PIPE_CONTROL() call in
ilo_render_emit_flush().
2015-03-07 01:40:23 +08:00
Chia-I Wu 35b713ad75 ilo: fix padding of linear sampler views
Should use the temporary variable in the loop instead of layout->bo_height.
2015-03-07 01:38:35 +08:00
Chia-I Wu dda4823844 ilo: do not check for interleaved_samples
interleaved_samples is only zero-initialized when layout_want_mcs() is called.
We should not check for it.  There is also no need to.
2015-03-07 01:38:35 +08:00
Chia-I Wu ebad062e9a ilo: enable L3 cache in MOCS
This enables L3 cache in MOCS almost everywhere.
2015-03-06 04:50:19 +08:00
Chia-I Wu c7d17f8a80 ilo: track if a ilo_view_surface is a scanout
Scanouts require a different cache type.
2015-03-06 04:43:20 +08:00
Chia-I Wu e7c74ef43d ilo: clean up SURFACE_STATE and BINDING_TABLE_STATE
Add ilo_builder_surface_pointer() to replace ilo_builder_surface_write().
Make Gen8+ take a different path in gen6_SURFACE_STATE().
2015-03-06 04:43:20 +08:00
Rob Clark 60096ed906 freedreno/ir3: fix silly typo for binning pass shaders
Was resulting in gl_PointSize write being optimized out, causing
particle system type shaders to hang if hw binning enabled.

Fixes neverball, OGLES2ParticleSystem, etc.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-05 15:36:47 -05:00
Chia-I Wu 4ddd981e40 ilo: add more convenient intel_bo_{ref,unref}()
They both check for NULL and intel_bo_ref() returns the referenced bo.  They
replace intel_bo_{reference,unreference}().
2015-03-06 02:25:03 +08:00
Chia-I Wu 70ef171e91 ilo: add intel_bo_set_tiling()
Make intel_winsys_alloc_bo() always allocate a linear bo, and add
intel_bo_set_tiling() to set the tiling.  Document the purpose of tiling.
2015-03-06 02:25:03 +08:00
Chia-I Wu 0ac706535a ilo: replace intel_tiling_mode by gen_surface_tiling
The former is used by the kernel driver to set up fence registers and to pass
tiling info across processes.  It lacks INTEL_TILING_W, which made our code
less expressive.
2015-03-06 02:25:03 +08:00
Chia-I Wu eb32ac1956 ilo: update genhw headers
The main change is non-inline <enum>s are now generated as C enums.
2015-03-06 02:25:03 +08:00
Mark Janes 237dcb4aa7 Fix invalid extern "C" around header inclusion.
System headers may contain C++ declarations, which cannot be given C
linkage.  For this reason, include statements should never occur
inside extern "C".

This patch moves the C linkage statements to enclose only the
declarations within a single header.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-03-05 10:21:40 -08:00
Chia-I Wu b5eb6f769d ilo: improve WA handling in rectlist path
Add wrappers for 3DPRIMITIVE to make sure we clear current_pipe_control_dw1
and deferred_pipe_control_dw1 after it.  Add missing
gen7_wa_post_ps_and_later().
2015-03-04 15:28:05 -07:00
Chia-I Wu 1424bdd61b ilo: clean up Gen7.5 WAs
These WAs

  gen7_wa_post_3dstate_push_constant_alloc_ps()
  gen7_wa_pre_vs()
  gen7_wa_pre_3dstate_sf_depth_bias()
  first half of gen7_wa_pre_depth()
  gen7_wa_post_ps_and_later()

are Gen7-specific.  Update copy-and-pasted gen8_wa_pre_depth() also.
2015-03-04 15:28:05 -07:00
Chia-I Wu 68d2e395d9 ilo: add ILO_DEBUG=hang
When set, detect and dump the hanging batch bufffer.
2015-03-05 04:52:49 +08:00
Chia-I Wu af4cff5d6f ilo: add some more winsys functions
Add intel_winsys_get_reset_stats(), intel_winsys_import_userptr(), and
intel_bo_map_async().  The latter two are stubs, but we are not going to use
them immediately either.
2015-03-04 13:42:17 -07:00
Matt Turner ade0b580e7 r300g: Check return value of snprintf().
Would have at least prevented the crash the previous patch fixed.

Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540970
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-04 11:15:09 -08:00
Matt Turner f5e2aa1324 r300g: Use PATH_MAX instead of limiting ourselves to 100 chars.
When built with Gentoo's package manager, the Mesa source directory
exists seven directories deep. The path to the .test file is too long
and is silently truncated, leading to a crash. Just use PATH_MAX.

Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540970
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-03-04 11:15:09 -08:00
Rob Clark b709adf7cc freedreno/ir3: fix old compiler after f6b2e8af74
If first_driver_param is left as zero (calloc'd struct), the result is
c0 getting clobbered.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-04 11:37:58 -05:00
Jose Fonseca d0b1c74b73 svga: Set MSVC2013 compat flags.
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-03-04 15:12:19 +00:00
Jose Fonseca 2c25008e8e softpipe,trace: Set MSVC 2008 compat flags.
Although we don't deploy these, we need to use them for debugging.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-03-04 15:12:17 +00:00
Jose Fonseca 00faf9f000 scons: Use -Werror MSVC compatibility flags per-directory.
Matching what we already do with autotools builds.

Reviewed-by: Brian Paul <brianp@vmware.com>
2015-03-04 15:12:06 +00:00
Rob Clark 8e67fd798e freedreno/a4xx: re-enable int (conditional on glsl130)
Re-enable integer, now that we can handle flat varyings.  Still, ofc,
conditional on FD_MESA_DEBUG=glsl130, until we can deprecate _old
compiler..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-03 10:41:00 -05:00
Rob Clark e9f2abe349 freedreno/ir3: handle flat bypass for a4xx
We may not need this for later a4xx patchlevels, but we do at least need
this for patchlevel 0.  Bypass bary.f for fetching varyings when flat
shading is needed (rather than configure via cmdstream).  This requires
a special dummy bary.f w/ (ei) flag to signal to scheduler when all
varyings are consumed.  And requires shader variants based on rasterizer
flatshade state to handle TGSI_INTERPOLATE_COLOR.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-03 10:41:00 -05:00
Rob Clark 9d732d3125 freedreno/ir3: add support for memory (cat6) instructions
Scheduled basically the same as texture (cat5) instructions, using (sy)
flag for synchronization.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-03 10:41:00 -05:00
Rob Clark 20b50a0712 freedreno/ir3: fix up cat6 instruction encodings
I think there is at least one more sub-encoding, but these two should be
enough to cover the common load/store instructions.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-03 10:41:00 -05:00
Rob Clark 583a8a8f65 freedreno/a3xx,a4xx: silence some warnings
fd3_emit.c: In function ‘fd3_emit_vertex_bufs’:
  fd3_emit.c:377:11: warning: unused variable ‘semantic’ [-Wunused-variable]
     uint8_t semantic = sem2name(vp->inputs[i].semantic);

and

  fd4_emit.c: In function ‘fd4_emit_vertex_bufs’:
  fd4_emit.c:304:11: warning: unused variable ‘semantic’ [-Wunused-variable]
     uint8_t semantic = sem2name(vp->inputs[i].semantic);

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-03-03 10:41:00 -05:00
Jose Fonseca 80c5bd7ef0 configure: Leverage gcc warn options to enable safe use of C99 features where possible.
The main objective of this change is to enable Linux developers to use
more of C99 throughout Mesa, with confidence that the portions that need
to be built with MSVC -- and only those portions --, stay portable.

This is achieved by using the appropriate -Werror= options only on the
places they need to be used.

Unfortunately we still need MSVC 2008 on a few portions of the code
(namely llvmpipe and its dependencies).  I hope to eventually eliminate
this so that we can use C99 everywhere, but there are technical/logistic
challenges (specifically, newer Windows SDKs no longer bundle MSVC,
instead require a full installation of Visual Studio, and that has
hindered adoption of newer MSVC versions on our build processes.)
Thankfully we have more directy control over our OpenGL driver, which is
why we're now able to migrate to MSVC 2013 for most of the tree.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-03-03 09:25:11 +00:00
Jose Fonseca 9a07435ff8 identity: Remove.
It's unmaintained, and most likely broken: I use trace driver every now
and then, and everytime I do I need to fix it up.

It's also unused: identity_screen_create is never called.

Above all, it's dead weight: if identity driver had the infrastructure
for other pass-through drivers (like trace and rbug), then it would make
sense on its own right.  But as it is implemmented, it's just another
driver to (forget) to update whenever there is a gallium interface
change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-03-02 14:12:46 +00:00
Kenneth Graunke 982723dfa2 Revert "configure: Leverage gcc warn options to enable safe use of C99 features where possible."
This reverts commit 79daa510c7.

I apparently hadn't done a clean build when testing this; it broke the
build for Tom, Ben, and myself.  We like the idea; let's try a v2.
2015-02-27 16:13:10 -08:00
Tom Stellard da85ab4b65 radeonsi/compute: Enable PIPE_SHADER_CAP_DOUBLES v2
v2:
  - Simplify ifdef

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-27 14:57:52 +00:00
Jose Fonseca 79daa510c7 configure: Leverage gcc warn options to enable safe use of C99 features where possible.
The main objective of this change is to enable Linux developers to use
more of C99 throughout Mesa, with confidence that the portions that need
to be built with MSVC -- and only those portions --, stay portable.

This is achieved by using the appropriate -Werror= options only on the
places they need to be used.

Unfortunately we still need MSVC 2008 on a few portions of the code
(namely llvmpipe and its dependencies).  I hope to eventually eliminate
this so that we can use C99 everywhere, but there are technical/logistic
challenges (specifically, newer Windows SDKs no longer bundle MSVC,
instead require a full installation of Visual Studio, and that has
hindered adoption of newer MSVC versions on our build processes.)
Thankfully we have more directy control over our OpenGL driver, which is
why we're now able to migrate to MSVC 2013 for most of the tree.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-02-27 14:30:36 +00:00
Vinson Lee 8170eba7e7 r300g/tests: Include stdio.h.
Fix build error.

  CC       compiler/tests/r300_compiler_tests-radeon_compiler_regalloc_tests.o
compiler/tests/radeon_compiler_regalloc_tests.c: In function ‘test_runner_rc_regalloc’:
compiler/tests/radeon_compiler_regalloc_tests.c:57:3: error: implicit declaration of function ‘fprintf’ [-Werror=implicit-function-declaration]
   fprintf(stderr, "Failed to load program\n");
   ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2015-02-26 21:01:32 -08:00
Brian Paul 40cfa0c347 radeon/compiler: include stdio.h
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89343
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-02-26 17:53:05 -07:00
Brian Paul 538e13d4a1 r300g: remove dependency on compiler.h
It only needs typical stdio.h and stdlib.h functions.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2015-02-26 08:38:38 -07:00
Rob Clark 864340219b freedreno: drop ARRAY_SIZE macro
Since now ARRAY_SIZE has been added to util/macros.h.  Fixes a bunch of:

  freedreno_util.h:79:0: warning: "ARRAY_SIZE" redefined
   #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
   ^
  In file included from ../../../../src/gallium/include/pipe/p_compiler.h:36:0,
                   from ../../../../src/gallium/include/pipe/p_context.h:31,
                   from freedreno_context.h:32,
                   from freedreno_context.c:29:
  ../../../../src/util/macros.h:29:0: note: this is the location of the previous definition
   #  define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
   ^

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-25 08:37:58 -05:00
Marek Olšák 1180e61a1b r600g,radeonsi: fix streamout after pipeline stats have been used
EVENT_TYPE_PIPELINESTAT_STOP disables streamout queries too.

Luckily, pipeline stats are enabled by default, so we don't even have to
emit EVENT_TYPE_PIPELINESTAT_START.

Tested on Hawaii, Bonaire, Redwood, RV730.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Marek Olšák fdf2c04737 radeonsi: small cleanup around current_rast_prim
- remove the last parameter of si_emit_rasterizer_prim_state
- remove the last unused parameter of si_emit_draw_registers
- use current_rast_prim in si_emit_draw_registers

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Marek Olšák 0b1f31ab7f radeonsi: set current_rast_prim in the right place
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Marek Olšák 4eb0ccf9e7 radeonsi: simplify obtaining a shader property in si_emit_clip_regs
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Marek Olšák 5349437154 radeonsi: only preload VertexID for the GS copy shader
The copy shader doesn't use any other preloaded VGPRs.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Marek Olšák ffd701e677 radeonsi: dump the shader key when dumping shaders
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Marek Olšák 93daf5a2f6 r600g,radeonsi: cleanup of hex literals
0x3F800000 -> fui(1.0)
0x00000000 -> 0

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Marek Olšák fa913a2dc6 radeonsi: set PA_SU_HARDWARE_SCREEN_OFFSET to 0
It was probably 0 already, but it doesn't hurt to set it.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-02-24 21:21:04 +01:00
Glenn Kennard d80701df8a r600g: Implement GL_ARB_draw_indirect for EG/CM
Requires Evergreen/Cayman and radeon kernel module
2.41.0 or newer.

Expected piglit fails due to hardware limitations:
* arb_draw_indirect-draw-arrays-prim-restart
  Restarts not applied for DrawArrays commands
* arb_draw_indirect-vertexid
  Base vertex offset is not included in vertex id

Marek: bump vgt_state num_dw by 3 (= space needed for one register write)

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2015-02-24 21:21:04 +01:00
Rob Clark dd70e78674 freedreno/a4xx: aniso filtering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-24 14:23:38 -05:00
Rob Clark c70097ae86 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-24 14:23:38 -05:00
Rob Clark daccbd27ce freedreno/a4xx: add ARB_instanced_arrays support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-24 14:23:38 -05:00
Rob Clark e13398714c freedreno/a4xx: handle index_bias (i.e. base_vertex)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-24 14:23:38 -05:00
Rob Clark 283bb4848e freedreno/a4xx: add support for vertexid and instanceid sysvals
ir3 bits of it already in place from a3xx patch..

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-24 14:23:38 -05:00
Rob Clark 4aef0d79ee freedreno/a4xx: pass number of instances to draw
a4xx has it's own draw packet, so needs equivalent update to what a3xx
already got.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-24 14:23:38 -05:00
Eric Anholt 49d3c6a8e6 vc4: Update to current kernel sources.
New BO create and mmap ioctls are added.  The submit ABI gains a flags
argument, and the pointers are fixed at 64-bit.  Shaders are now fixed at
the start of their BOs.
2015-02-24 13:49:12 +00:00
Eric Anholt 1d1e820a6d r600: Fix build after 984f306937
Same as for the CLAMP macro, undef it before including a header file that
tries to make fields with that name.
2015-02-24 13:49:12 +00:00
Matt Turner bfcdb84383 mesa: Use assert() instead of ASSERT wrapper.
Acked-by: Eric Anholt <eric@anholt.net>
2015-02-23 10:49:47 -08:00
Marek Olšák 050bf75c8b radeonsi: fix a warning caused by previous commit
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-02-23 11:45:00 +01:00
Marek Olšák 7820a11e3d radeonsi: fix point sprites
Broken by a27b74819a.

This fix is critical and should be ported to stable ASAP.

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
2015-02-23 11:40:55 +01:00
Rob Clark 51e335742e freedreno/a4xx: set PC_PRIM_VTX_CNTL.VAROUT properly
Fixes xonotic, some webgl stuff, and really pretty much anything with
more than 4 varyings.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-21 17:11:02 -05:00
Rob Clark fb1301e40a freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-21 17:11:02 -05:00
Rob Clark bdf023482a freedreno/a4xx: bit of cleanup
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-21 17:11:02 -05:00
Rob Clark e17437386c freedreno: implement fence
I never actually implemented the stubbed out fence stuff back in the
early days.  Fix that.

We'll need a few libdrm_freedreno changes to handle timeout properly,
so ignore that for now to avoid a libdrm_freedreno dependency bump.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-21 17:11:02 -05:00
Rob Clark 6855226653 freedreno/a2xx: fix increment in assert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88883
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-02-21 17:11:01 -05:00
Chia-I Wu 9fe81879c5 ilo: R32G32B32_FLOAT need no special care on Gen8+
Gen8+ must use VALIGN_4.  Unlike prior Gens, R32G32B32_FLOAT should supposedly
support VALIGN_4.
2015-02-21 11:33:54 +08:00
Chia-I Wu 226109436f ilo: 128 BPP formats can use TiledY on Gen7.5+
The restriction is lifted.
2015-02-21 11:33:54 +08:00
Ilia Mirkin f8e4792b22 nvc0: enable double support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:51:50 -05:00
Ilia Mirkin 5491458843 nvc0/ir: remove merge/split pairs to allow normal propagation to occur
Because the TGSI interface creates merges for each instruction source
and then splits them back out, there are a lot of unnecessary
merge/split pairs which do essentially nothing. The various modifier/etc
propagation doesn't know how to walk though those, so just remove them
when they're unnecessary.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:51:50 -05:00
Ilia Mirkin 93812dc10a nvc0/ir: add support for new TGSI double opcodes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:51:43 -05:00
Ilia Mirkin ef8f09be33 nvc0/ir: handle zero and negative sqrt arguments
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:28 -05:00
Ilia Mirkin 88127874a3 nvc0/ir: no instruction can load a double immediate
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:28 -05:00
Ilia Mirkin b87b498b88 nvc0/ir: fix lowering of RSQ/RCP/SQRT/MOD to work with F64
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:28 -05:00
Ilia Mirkin 93ebe91bae gm107/ir: fix F2F flipped stype/dtype flags
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:27 -05:00
Ilia Mirkin dbf4a674b9 gm107/ir: fix DSET boolean float flag
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:27 -05:00
Ilia Mirkin 727018bb0c gm107/ir: fix DMUL opcode encoding
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:27 -05:00
Ilia Mirkin 493ad88e1b gk110/ir: add emission of dadd/dmul/dmad opcodes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:27 -05:00
Ilia Mirkin fd0b1a4cbf nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-20 19:30:27 -05:00
Marek Olšák f5ac5e20b1 gallium/radeon: fix an uninitialized-variable warning 2015-02-20 20:20:10 +01:00
Ilia Mirkin c85a686d02 gallium: add new double-related shader caps to all the getters
Missed a few drivers in the earlier changes, this should fix up all the
ones that print unknown caps or don't have a default statement.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-02-20 14:09:25 -05:00
Brian Paul 71b155a2cb svga: add missing _DROUND,DFRACEXP_DLDEXP_SUPPORTED switch cases
To silence unhandled switch case warnings.
2015-02-20 08:09:40 -07:00
Marek Olšák 7692704b14 radeonsi: don't use SQC_CACHES to flush ICACHE and KCACHE on SI
This reverts 73c2b0d18c.

It doesn't seem to be reliable. It's probably missing a wait packet or
something, because it's just a register write and doesn't wait for anything.
SURFACE_SYNC at least seems to wait until the flush is done. Just guessing.

Let's not complicate things and revert this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88561

Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2015-02-20 12:06:22 +01:00
Eric Anholt 85316d059c vc4: Keep an array of pointers to instructions defining the temps around.
The optimization passes are always regenerating it and throwing it away,
but it's not hard to keep track of.
2015-02-19 23:35:17 -08:00
Eric Anholt 877b48a531 vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h.
I may want them in optimization passes, and they're not really particular
to the program translation stage.
2015-02-19 23:35:17 -08:00
Eric Anholt 14dc281c13 vc4: Enforce one-uniform-per-instruction after optimization.
This lets us more intelligently decide which uniform values should be put
into temporaries, by choosing the most reused values to push to temps
first.

total uniforms in shared programs: 13457 -> 13433 (-0.18%)
uniforms in affected programs:     1524 -> 1500 (-1.57%)
total instructions in shared programs: 40198 -> 40019 (-0.45%)
instructions in affected programs:     6027 -> 5848 (-2.97%)

I noticed this opportunity because with the NIR work, some programs were
happening to make different uniform copy propagation choices that
significantly increased instruction counts.
2015-02-19 23:35:17 -08:00
Eric Anholt 09c844fcd9 vc4: Rename add_uniform() to qir_uniform(). 2015-02-19 23:35:17 -08:00
Eric Anholt 96f6efc561 vc4: Shut up runtime warnings about new pipe caps. 2015-02-19 23:35:13 -08:00
Ilia Mirkin 5000a5f67b nv50: add PIPELINE_STATISTICS query support, based on nvc0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nick Tenney <nick.tenney@gmail.com>
2015-02-19 23:12:35 -05:00
Ilia Mirkin f883df74e0 svga: add missing :
Fixes: 924ee3f408 ("gallium: add shader cap for dldexp/dfracexp support")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-19 20:18:02 -05:00
Ilia Mirkin 924ee3f408 gallium: add shader cap for dldexp/dfracexp support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-02-19 19:32:52 -05:00
Ilia Mirkin 899d779cb7 gallium: add a cap to enable double rounding opcodes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2015-02-19 19:32:49 -05:00
Ilia Mirkin 069dab7576 freedreno: add missing PIPE_CAP_RESOURCE_FROM_USER_MEMORY to switch
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-02-19 00:25:03 -05:00