KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Rob Clark	fcc7d6323b	freedreno: enable a306 Whitelist adreno 306 (as found in msm8916/apq8016). Works pretty much out of the box, although the smaller GMEM size requires more tiles to fit 1920x1080, so bump up the max # of tiles as well. Since it is just whitelist + trivial change, it makes sense to land on all the active release branches. Note that a305c ends up with gpu-id "306", hence a306 ends up with gpu-id of "307". Apparently that is what happens when you let the marketing dept name things. Cc: "10.4" and "10.5" and "10.6" <mesa-stable@lists.freedesktop.org> Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-05-14 14:46:14 -04:00
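The following sketch shows why a smaller GMEM needs more tiles. The loop keeps splitting the 1920x1080 surface until one aligned bin's pixels fit in GMEM. Several values are illustrative assumptions, not freedreno's actual binning code or the real a305/a306 numbers: the function name, the 8 bytes per pixel (color plus depth), the 32-pixel alignment and the GMEM sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/*
 * Hypothetical binning sketch: grow a grid of bins until one bin's
 * color+depth data (cpp bytes per pixel) fits in gmem_bytes. Bin sizes
 * are rounded up to multiples of `align` pixels.
 */
static unsigned
count_tiles(unsigned width, unsigned height, unsigned cpp,
            uint32_t gmem_bytes, unsigned align)
{
   unsigned nbins_x = 1, nbins_y = 1;

   /* The smallest possible bin must fit, or the loop would never end. */
   assert((uint64_t)align * align * cpp <= gmem_bytes);

   for (;;) {
      unsigned bin_w = DIV_ROUND_UP(DIV_ROUND_UP(width, nbins_x), align) * align;
      unsigned bin_h = DIV_ROUND_UP(DIV_ROUND_UP(height, nbins_y), align) * align;

      if ((uint64_t)bin_w * bin_h * cpp <= gmem_bytes)
         return nbins_x * nbins_y;

      /* Split along the longer side to keep bins roughly square. */
      if (bin_w > bin_h)
         nbins_x++;
      else
         nbins_y++;
   }
}
```

The split sequence does not depend on the GMEM size. Any configuration that fits a smaller GMEM therefore also fits a larger one, so shrinking GMEM can only keep or raise the tile count. That is why a fixed maximum may need bumping.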
Samuel Pitoiset	175cbb447a	nvc0: remove unused nv50_tsc_wrap_mode() function Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-05-14 13:27:44 -04:00
Samuel Pitoiset	ac1ac94b38	nv50/ir: silence compiler warnings about mismatched tags These warnings have been detected by Clang 3.6. codegen/nv50_ir_from_tgsi.cpp:1319:10: warning: struct 'Source' was previously declared as a class [-Wmismatched-tags] const struct tgsi::Source *code; Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-05-14 13:27:44 -04:00
Samuel Pitoiset	70651b7041	nv50/ir: remove unused private field cycle to SchedDataCalculator Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-05-14 13:27:43 -04:00
Samuel Pitoiset	7469f2fd23	nv30: remove unused nvfx_fp_memcpy() function and comment nv40_fp_bra() The nv40_fp_bra() function in the same file is also unused but this is the only place where the nv30/nv40 isa is documented. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-05-14 13:27:43 -04:00
Samuel Pitoiset	48c84a36dd	nvc0: do not expose MP counters for nvf0 (GK110+) This fixes a crash when trying to monitor MP counters because compute support is not implemented for nvf0. Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-05-14 13:27:43 -04:00
Roland Scheidegger	adcf8f8a13	softpipe: enable ARB_texture_view Some bits were already there for texture views but some were missing. In particular for cube map views things needed to change a bit. For simplicity I ended up removing the separate face addr bit (just use the z bit) - cube arrays didn't use it already, so just follow the same logic there. (In theory using separate bits could allow for a better hash function, but I don't think anyone ever did any measurements of that, so it's probably not worth the trouble; if we were to reintroduce it we'd certainly want to use the same logic for cube arrays and cube maps.) Also extend the seamless cube sampling to cube arrays - as there were no piglit failures before, this is apparently untested, but things now generally work quite the same for cube textures and cube array textures so there hopefully shouldn't be any trouble... 49 new piglits, 47 pass, 2 fail (both due to fake multisampling). v2: incorporate Brian's feedback, add sampler view validation, function rename, formatting fixes. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-05-13 22:57:50 +02:00
Roland Scheidegger	e6c66f4fb0	llvmpipe: enable ARB_texture_view All the functionality was pretty much there, just not tested. Trivially fix up the missing pieces (take target info from view not resource), and add some missing bits for cubes. Also add some minimal debug validation to detect uninitialized target values in the view... 49 new piglits, 47 pass, 2 fail (both related to fake multisampling, not texture_view itself). No other piglit changes. v2: move sampler view validation to sampler view creation, update docs. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-05-13 22:57:50 +02:00
Ilia Mirkin	c696a318ef	nouveau: document nouveau_heap Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-05-12 18:58:49 -04:00
Ilia Mirkin	d06ce2f1df	nvc0: switch mechanism for shader eviction to be a while loop This aligns it to work similarly to nv50. However there's no library code there, so the whole thing can be freed. Here we end up with an allocated node that's not attached to a specific program. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86792 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-05-12 18:47:17 -04:00
Marek Olšák	79ffc08ae8	gallium: add PIPE_CAP_DEVICE_RESET_STATUS_QUERY Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-05-12 19:38:31 +02:00
Dave Airlie	9ab90c058f	r600: use pipe->hw prim convert from radeonsi This prevents future additions to PIPE_PRIM_ from causing regressions on r600g. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-05-11 06:43:18 +10:00
Rob Clark	1cbdafc47a	freedreno/ir3/nir: fix build break after `f752effa` Our lower if/else pass was missed when converting NIR to use linked lists rather than hashsets to track use/def sets. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-05-10 06:03:53 -04:00
Ilia Mirkin	da136dc07d	nv50/ir: only enable mul saturate on G200+ Commit `44673512a8` enabled support for saturating fmul. However experimentally this does not seem to work on the older chips. Restrict the feature to G200 (NVA0) and later. Reported-by: Pierre Moreau <pierre.morrow@free.fr> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90350 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Cc: mesa-stable@lists.freedesktop.org	2015-05-09 13:41:51 -04:00
Ilia Mirkin	7892210400	nvc0: reset the instanced elements state when doing blit using 3d engine Since we update num_vtxelts here, we could otherwise end up with stale instancing information in the upper bits which wouldn't otherwise get reset. (Also we run the risk of the previous draw having set the first element as instanced.) This appears as one of the causes for the test pointed out in fdo#90363 to fail on nvc0. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90363 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-05-09 13:36:23 -04:00
Ilia Mirkin	e9b1ea29bf	nvc0: keep track of PGRAPH state in nvc0_screen See identical commit for nv50. Destroying the current context and then creating a new one or switching to another existing context would cause the "current" state to not be properly initialized, so we save it off in the screen. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-05-09 13:36:23 -04:00
Ilia Mirkin	f617029db3	nv50: keep track of PGRAPH state in nv50_screen Normally this is kept in nv50_context, and on switching the active context, the state is copied from the previous context. However when the last context is destroyed, this is lost, and a new context might later be created. When the currently-active context is destroyed, save its state in the screen, and restore it when setting the current context. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90363 Reported-by: Matteo Bruni <matteo.mystral@gmail.com> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Matteo Bruni <matteo.mystral@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2015-05-09 13:36:23 -04:00
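The save/restore scheme described in this commit and the matching nvc0 one can be sketched as follows. The struct and function names (`hw_state`, `context_make_current`, `context_destroy`) are hypothetical simplifications, not the actual nv50/nvc0 code. The point is only that the screen keeps a snapshot of the hardware state once the current context goes away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified hardware state. */
struct hw_state {
   unsigned num_vtxelts;
   unsigned flags;
};

struct screen;

struct context {
   struct screen *screen;
   struct hw_state state; /* what this context believes the GPU holds */
};

struct screen {
   struct context *cur_ctx;     /* context that last programmed the GPU */
   struct hw_state save_state;  /* kept when cur_ctx is destroyed */
};

/* Switch to ctx, inheriting the state left behind by whoever ran last. */
static void
context_make_current(struct context *ctx)
{
   struct screen *screen;

   assert(ctx && ctx->screen);
   screen = ctx->screen;
   if (screen->cur_ctx == ctx)
      return;
   if (screen->cur_ctx)
      ctx->state = screen->cur_ctx->state;
   else
      ctx->state = screen->save_state; /* last context was destroyed */
   screen->cur_ctx = ctx;
}

/* Before the current context goes away, stash its state in the screen. */
static void
context_destroy(struct context *ctx)
{
   struct screen *screen = ctx->screen;

   if (screen->cur_ctx == ctx) {
      screen->save_state = ctx->state;
      screen->cur_ctx = NULL;
   }
}
```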
Jason Ekstrand	7a30668ad6	util: Move gallium's linked list to util The linked list in gallium is pretty much the kernel list and we would like to have a C-based linked list for all of mesa. Let's not duplicate and just steal the gallium one. Acked-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-05-08 17:16:13 -07:00
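In a kernel-style list, nodes are embedded in the objects themselves, and the head is a sentinel in a circular, doubly linked ring. Here is a minimal sketch using hypothetical names (`list_node`, `list_insert_tail`, and so on) rather than Mesa's actual util/list.h API:

```c
#include <assert.h>
#include <stddef.h>

/* Intrusive circular doubly linked list node (hypothetical names). */
struct list_node {
   struct list_node *prev, *next;
};

/* An empty list is a head that points at itself. */
static void
list_init(struct list_node *head)
{
   head->prev = head->next = head;
}

static void
list_insert_tail(struct list_node *head, struct list_node *item)
{
   item->prev = head->prev;
   item->next = head;
   head->prev->next = item;
   head->prev = item;
}

static void
list_remove(struct list_node *item)
{
   assert(item->prev && item->next); /* catches double removal */
   item->prev->next = item->next;
   item->next->prev = item->prev;
   item->prev = item->next = NULL;
}

static int
list_is_empty(const struct list_node *head)
{
   return head->next == head;
}

/* Recover the enclosing struct from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
   ((type *)((char *)(ptr) - offsetof(type, member)))
```

Because the links live inside each object, insertion and removal need no allocation, and `container_of` recovers the enclosing struct.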
Ilia Mirkin	c4ac09e30e	nv50/ir: only propagate saturate up if some actual folding took place The former logic would copy the saturate up to any mul with an immediate if there was a subsequent mul with a saturate. However we only want to do that if we collapsed 2 muls by multiplying their immediates (or were able to put the immediate in as a post-multiplier). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-05-08 18:56:56 -04:00
Ilia Mirkin	55b66dc4de	nv50/ir: add SHL to the list of U32 opcodes Having the wrong inferred type prevents a number of optimizations, including constant propagation (since float immediates work differently than integer immediates). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-05-06 20:50:03 -04:00
Vinson Lee	382b1a36e3	r600g: Fix Clang return-type build error. Fix Clang return-type error introduced with commit `96f164f6f0` "gallium: make pipe_context::begin_query return a boolean". CC r600_query.lo r600_query.c:443:3: error: non-void function 'r600_begin_query' should return a value [-Wreturn-type] return; ^ Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-05-06 12:21:34 -07:00
Chia-I Wu	ef5d4bcc3a	ilo: silence a compiler warning Silence ilo_query.c:120:7: warning: 'return' with no value, in function returning non-void since commit `96f164f6`.	2015-05-06 16:35:30 +08:00
Samuel Pitoiset	cea910bc28	nvc0: all queries use an unsigned 64-bits integer by default Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Martin Peres <martin.peres@free.fr>	2015-05-06 00:03:36 +03:00
Samuel Pitoiset	35a9286be6	nvc0: make begin_query return false when all MP counters are used Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Martin Peres <martin.peres@free.fr>	2015-05-06 00:03:36 +03:00
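The sketch below shows what returning false from begin_query can look like when a fixed pool of hardware counters is exhausted. The pool size, `struct counter_pool` and the function signatures are hypothetical. The real nvc0 code and the gallium `begin_query` hook differ.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_HW_COUNTERS 8 /* hypothetical number of counter slots */

struct query { int slot; };

struct counter_pool { bool used[NUM_HW_COUNTERS]; };

/* Claim a free counter slot; report failure instead of silently
 * continuing, so the caller can flag the monitoring session as failed. */
static bool
begin_query(struct counter_pool *pool, struct query *q)
{
   for (int i = 0; i < NUM_HW_COUNTERS; i++) {
      if (!pool->used[i]) {
         pool->used[i] = true;
         q->slot = i;
         return true;
      }
   }
   q->slot = -1;
   return false;
}

/* Release the slot held by a successfully begun query. */
static void
end_query(struct counter_pool *pool, struct query *q)
{
   assert(q->slot >= 0 && q->slot < NUM_HW_COUNTERS);
   pool->used[q->slot] = false;
   q->slot = -1;
}
```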
Samuel Pitoiset	ed7d3886cc	nvc0: define driver-specific query groups This patch defines "Driver statistics" and "MP counters" groups, but only the latter will be exposed through GL_AMD_performance_monitor. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr>	2015-05-06 00:03:36 +03:00
Samuel Pitoiset	96f164f6f0	gallium: make pipe_context::begin_query return a boolean GL_AMD_performance_monitor must return an error when a monitoring session cannot be started. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Martin Peres <martin.peres@free.fr>	2015-05-06 00:03:36 +03:00
Samuel Pitoiset	546ec980f8	gallium: replace pipe_driver_query_info::max_value by a union This allows queries to return different numeric types. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Martin Peres <martin.peres@free.fr>	2015-05-06 00:03:35 +03:00
Samuel Pitoiset	b620829b5e	gallium: add new fields to pipe_driver_query_info According to the spec of GL_AMD_performance_monitor, valid type values returned are UNSIGNED_INT, UNSIGNED_INT64_AMD, PERCENTAGE_AMD, FLOAT. This also introduces the new field group_id in order to categorize queries into groups. v2: add PIPE_DRIVER_QUERY_TYPE_BYTES v3: fix incorrect query type for radeon and svga drivers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Martin Peres <martin.peres@free.fr>	2015-05-06 00:03:35 +03:00
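The sketch below shows query metadata carrying a type, a group id and a union for the maximum value, as described in this commit and in 546ec980f8. All names are hypothetical stand-ins, not the actual `pipe_driver_query_info` definition. The mapping of each type to a union member is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

enum query_type {
   QUERY_TYPE_UINT64,
   QUERY_TYPE_UINT,
   QUERY_TYPE_FLOAT,
   QUERY_TYPE_PERCENTAGE,
   QUERY_TYPE_BYTES,
};

union query_value {
   uint64_t u64;
   uint32_t u32;
   float f;
};

struct query_info {
   const char *name;
   enum query_type type;        /* selects the active union member */
   union query_value max_value;
   unsigned group_id;           /* e.g. a "MP counters" group */
};

/* Read max_value as a double according to the declared type. */
static double
query_max_as_double(const struct query_info *info)
{
   switch (info->type) {
   case QUERY_TYPE_UINT64:
   case QUERY_TYPE_BYTES:
      return (double)info->max_value.u64;
   case QUERY_TYPE_UINT:
      return (double)info->max_value.u32;
   case QUERY_TYPE_FLOAT:
   case QUERY_TYPE_PERCENTAGE:
      return info->max_value.f;
   }
   assert(!"unknown query type");
   return 0.0;
}
```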
Chia-I Wu	4348046a2f	ilo: use ilo_image exclusively in core Initialize ilo_view_surface and ilo_zs_surface from ilo_image instead of ilo_texture.	2015-05-02 22:28:31 +08:00
Chia-I Wu	9b705ec32d	ilo: add ilo_image_can_enable_aux() It replaces ilo_texture_can_enable_hiz().	2015-05-02 22:14:07 +08:00
Chia-I Wu	430594c34f	ilo: make ilo_image more self-contained Add depth0, sample_count, and scanout to ilo_image.	2015-05-02 22:14:06 +08:00
Chia-I Wu	f6ca4084c7	ilo: add ilo_image_init_for_imported() It replaces ilo_image_update_for_imported_bo() and enables more error checking for imported textures.	2015-05-02 22:14:06 +08:00
Chia-I Wu	938c9b8cea	ilo: prepare for image init for imported bo Refactoring in preparation for ilo_image_init_for_imported().	2015-05-02 22:14:06 +08:00
Chia-I Wu	3f9415077b	ilo: constify ilo_image_params Make ilo_image_params const in functions that do not modify it.	2015-05-02 22:14:06 +08:00
Chia-I Wu	c209aa7a8f	ilo: improve readability of ilo_image Improve docs, rename struct fields, and reorder walk types. No real changes.	2015-05-02 22:14:06 +08:00
Chia-I Wu	9b72bf5bd2	ilo: move command builder to core	2015-05-02 22:14:06 +08:00
Chia-I Wu	9e24c49e64	ilo: move ilo_state_3d* to core ilo state structs (struct ilo_xxx_state) are moved as well.	2015-05-02 22:14:06 +08:00
Chia-I Wu	8ab18262c5	ilo: add ilo_buffer.h to core Rename the original ilo_buffer to ilo_buffer_resource to avoid name conflict.	2015-05-02 22:14:06 +08:00
Chia-I Wu	3afbeb115a	ilo: move BOs from ilo_texture to ilo_image We want to work with ilo_image instead of ilo_texture in core.	2015-05-02 22:14:06 +08:00
Chia-I Wu	ac47563cb4	ilo: move ilo_layout.[ch] to core as ilo_image.[ch] Move files and s/layout/image/.	2015-05-02 22:14:06 +08:00
Chia-I Wu	8252765532	ilo: add ilo_format.[ch] to core The original ilo_format.[ch] are removed.	2015-05-02 22:14:06 +08:00
Chia-I Wu	9b7080c8b3	ilo: add ilo_fence.h to core Implement pipe_fence_handle on top of ilo_fence.	2015-05-02 22:14:06 +08:00
Chia-I Wu	2182beb431	ilo: add ilo_dev_init() to core Move init_dev() from ilo_screen.c to core.	2015-05-02 22:14:06 +08:00
Chia-I Wu	7562f9e907	ilo: rename ilo_dev_info to ilo_dev With intel_winsys being embedded in it, drop the "_info" suffix.	2015-05-02 22:14:06 +08:00
Chia-I Wu	19351af53d	ilo: move intel_winsys to ilo_dev_info We want to use ilo_dev_info instead of ilo_screen in core.	2015-05-02 22:14:06 +08:00
Chia-I Wu	b3197fe5f4	ilo: add ilo_dev.h to core Move what remains in ilo_common.h (that is, ilo_dev_*) to ilo_dev.h.	2015-05-02 22:14:06 +08:00
Chia-I Wu	7bb4fa72c0	ilo: add ilo_debug.[ch] to core They consist of the debug helpers that used to live in ilo_common.h and ilo_screen.c.	2015-05-02 22:14:06 +08:00
Chia-I Wu	a5797873d0	ilo: add ilo_core.h to core ilo_core.h includes the common gallium headers that were included in ilo_common.h.	2015-05-02 22:14:05 +08:00
Chia-I Wu	bbe91576b7	ilo: move intel_winsys.h to core Add a new subdirectory and start moving files that do not depend on ilo_screen/ilo_context to it.	2015-05-02 22:14:05 +08:00
Ilia Mirkin	33f0d1138d	nvc0/ir: fix predicated PFETCH for real Commit `a9d08a250` accidentally didn't make use of the new src1 variable. Use it. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-04-30 02:02:47 -04:00
Ilia Mirkin	db269ae495	nv50/ir: fix asFlow() const helper for OP_JOIN Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-04-29 23:34:30 -04:00
Ilia Mirkin	a9d08a250a	nvc0/ir: fix predicated PFETCH emission src1 would contain the predicate, which would get emitted as a register source by an undiscerning srcId helper. Work around this in the same way as in emitTEX. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-04-29 23:34:22 -04:00
Ilia Mirkin	515ac907e6	gk110/ir: fix set with a register dest to not auto-set the abs flag This was causing src0 to always have the absolute value flag set. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2015-04-29 18:03:19 -04:00
Marek Olšák	a582b22c63	winsys/radeon: add a private interface for radeon_surface	2015-04-29 21:51:40 +02:00
Marek Olšák	dcfbc006b6	winsys/radeon: move radeon_winsys.h to drivers/radeon	2015-04-29 21:51:40 +02:00
Emil Velikov	b124dc2b70	r300: do not link against libdrm_intel Accidentally added since the introduction of the file. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-29 15:15:19 +01:00
Axel Davy	559342d01d	gallium/svga: Remove useless ARRAY_SIZE declaration This is already declared in util/macros.h Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2015-04-29 08:28:10 +02:00
Axel Davy	64880d073a	util/macros: Move DIV_ROUND_UP to util/macros.h Move DIV_ROUND_UP to a shared location accessible everywhere Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Axel Davy <axel.davy@ens.fr>	2015-04-29 08:28:10 +02:00
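A rounding-up division macro is usually written as below. This is a typical definition, shown as an assumption; check util/macros.h for the exact spelling. `tiles_for_width` is a hypothetical usage example.

```c
#include <assert.h>

/* Typical round-up integer division: ceil(A / B) for non-negative A, B > 0. */
#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B))

/* Hypothetical use: number of tile_w-wide tiles needed to cover width. */
static unsigned
tiles_for_width(unsigned width, unsigned tile_w)
{
   assert(tile_w > 0);
   return DIV_ROUND_UP(width, tile_w);
}
```

The `(A) + (B) - 1` term can overflow for operands near the type's maximum. The macro also evaluates its arguments more than once, so callers should avoid side effects in them.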
Ilia Mirkin	6fe0d4f035	nvc0/ir: flush denorms to zero in non-compute shaders This will set the FTZ flag (flush denorms to zero) on all opcodes that can take it. This resolves issues in Unigine Heaven 4.0 where there were solid-filled boxes popping up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89455 Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-28 20:17:03 -04:00
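Flush-to-zero means that subnormal (denormal) floating-point values are treated as a zero of the same sign. The sketch below emulates that behavior in C to make it concrete. `ftz` and `mul_ftz` are hypothetical helpers and say nothing about how the nvc0 FTZ flag is encoded.

```c
#include <assert.h>
#include <math.h>

/* Emulated flush-to-zero: a subnormal value becomes a zero that keeps
 * the original sign; every other value passes through unchanged. */
static float
ftz(float x)
{
   if (fpclassify(x) == FP_SUBNORMAL)
      return signbit(x) ? -0.0f : 0.0f;
   return x;
}

/* A multiply with FTZ applied to both sources and to the result. */
static float
mul_ftz(float a, float b)
{
   return ftz(ftz(a) * ftz(b));
}
```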
Ilia Mirkin	e312a69958	nvc0: expose GLSL version 410 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-28 12:48:22 -04:00
Marek Olšák	6d05396b00	r600g,radeonsi: add a driver query returning GPU load Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2015-04-28 16:05:45 +02:00
Marek Olšák	0b8e73a6ae	r600g,radeonsi: add driver queries for GPU temperature and shader+memory clocks Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2015-04-28 16:05:45 +02:00
Ilia Mirkin	9143940da2	gm107/ir: add lane/vertex count sysvals Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-27 21:25:29 -04:00
Ilia Mirkin	89e0b08794	gk110/ir: add support for writing per-patch and shader outputs Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-27 21:25:28 -04:00
Ilia Mirkin	52614f59b7	freedreno/a3xx: color masking works like a blend for some formats When there is a colormask active that does not cover all the channels, enable reading in the destination like with a combining blend operation. This fixes fbo-blending-formats on a3xx. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-27 20:17:07 -04:00
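A colormask that skips some of the format's channels must leave those channels untouched. That requires the current destination value, just as a blend that reads the destination does. Here is a sketch with hypothetical mask bits and helper names, assuming channels are stored in R, G, B, A order:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-channel colormask bits. */
enum { MASK_R = 1, MASK_G = 2, MASK_B = 4, MASK_A = 8 };

/* Channels actually stored by a format with nr_channels channels,
 * assuming they are R, RG, RGB or RGBA in that order. */
static unsigned
format_channel_mask(unsigned nr_channels)
{
   assert(nr_channels >= 1 && nr_channels <= 4);
   return (1u << nr_channels) - 1;
}

/* A mask that skips stored channels must preserve them, which needs the
 * destination value, just like a blend that reads the destination. */
static bool
needs_dst_read(bool blend_reads_dst, unsigned colormask, unsigned nr_channels)
{
   unsigned fmt_mask = format_channel_mask(nr_channels);
   bool partial_mask = (colormask & fmt_mask) != fmt_mask;

   return blend_reads_dst || partial_mask;
}
```

A mask bit for a channel the format does not store (for example alpha on an RG format) does not force a read.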
Ilia Mirkin	9fc3f47278	freedreno/a3xx: add support for S8 and Z32F_S8 Enables ARB_depth_buffer_float. There is no sampling support for interleaved Z32F_S8, so we store the two textures separately, one as Z32F, the other as S8. As a result, we need a lot of additional logic for restores and transfers. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-27 20:17:07 -04:00
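Storing Z32F_S8 as two separate surfaces means transfers have to split and re-interleave the data. The sketch below shows that conversion. It assumes the interleaved layout is a 32-bit float depth followed by a 32-bit word whose low 8 bits hold stencil, as in gallium's Z32_FLOAT_S8X24_UINT. The function names are hypothetical, and this is not freedreno's transfer code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split interleaved depth/stencil (assumed layout: float depth word, then
 * a word whose low 8 bits are stencil) into separate Z32F and S8 planes. */
static void
split_z32f_s8(const uint32_t *interleaved, size_t npixels,
              float *depth, uint8_t *stencil)
{
   assert(interleaved && depth && stencil);
   for (size_t i = 0; i < npixels; i++) {
      memcpy(&depth[i], &interleaved[2 * i], sizeof(float));
      stencil[i] = (uint8_t)(interleaved[2 * i + 1] & 0xff);
   }
}

/* Inverse: recombine the two planes into the interleaved layout. */
static void
merge_z32f_s8(const float *depth, const uint8_t *stencil, size_t npixels,
              uint32_t *interleaved)
{
   for (size_t i = 0; i < npixels; i++) {
      memcpy(&interleaved[2 * i], &depth[i], sizeof(float));
      interleaved[2 * i + 1] = stencil[i]; /* upper 24 bits left as zero */
   }
}
```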
Ilia Mirkin	1571da6ac3	freedreno/a3xx: add Z32F support 32-bit depth buffers are stored as unorm, and thus need special handling when moving to and from gmem. They are copied into gmem by writing depth, and resolved from gmem using a special resolve bit which apparently float-ifies the data. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-27 20:17:07 -04:00
Ilia Mirkin	0a4cb00c77	freedreno: add fd_transfer to wrap around pipe_transfer Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-27 20:17:07 -04:00
Ilia Mirkin	f5c1101996	freedreno/a3xx: add support for disabling depth clipping Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-27 20:17:07 -04:00
Zoë Blade	05e7f7f438	Fix a few typos Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-04-27 17:28:29 +03:00
Marek Olšák	db2415189a	radeonsi: set an optimal value for DB_Z_INFO.ZRANGE_PRECISION Required because of a VI hw bug. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-04-27 15:57:07 +02:00
Marek Olšák	bed98eef9a	radeonsi: remove deprecated and useless registers Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-04-27 15:56:27 +02:00
Marek Olšák	393b0e0531	radeonsi: remove useless includes Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-04-27 15:56:27 +02:00
Marek Olšák	d8269be1ce	gallium/radeon: print winsys info with R600_DEBUG=info Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-04-27 15:56:27 +02:00
Marek Olšák	ecc7f2ed91	gallium/radeon: don't crash when getting out-of-bounds TEMP references Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-04-23 16:14:39 +02:00
Dave Airlie	8a41cd2407	softpipe: fix stencil write to use an integer value This fixes a number of regressions since `61393bdcdc` u_tile: fix stencil texturing tests under softpipe Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89960 Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-23 08:32:30 +10:00
Rob Clark	cb24d3b7ad	freedreno: misc minor cleanups Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	1b58d8c2bf	freedreno/a4xx: (partial) gl_FragCoord.zw The bit to enable .z is still commented out, as it is triggering gpu hangs in 0ad. But at least gl_FragCoord.w works now, and we know what bits we are supposed to set for .z (with that uncommented all piglit fragcoord tests are passing). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	a869183123	freedreno/a4xx: primitive-restart This was the missing bit to get dolphin-emu working on a4xx. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	632ea2a113	freedreno/nir: sysval fixes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	13527df143	freedreno/a4xx: wire up integer texture sampling Similar to a3xx, the compiler needs to know the return type of the sam, etc, instructions. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	48a651e98c	freedreno/a4xx: formats updates/fixes Update formats table with new formats that Ilia has figured out, and fix sampling from srgb texture and integer vbo's. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	21ceedfd8b	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:27 -04:00
Emil Velikov	86919352e3	android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS ... to manage the LIBDRM*_CFLAGS. The former is the approach recommended by the Android build system developers, while the latter has been deprecated for quite some time. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-04-22 14:23:28 +01:00
Emil Velikov	413bc0a618	ilo: remove unused include from Android.mk Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>	2015-04-22 14:18:47 +01:00
Ilia Mirkin	0904774af1	freedreno/a3xx: enable polymode setting with non-fill modes Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-18 17:35:23 -04:00
Ilia Mirkin	6357601628	freedreno/a3xx: fix integer and 32-bit float border colors Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-18 17:35:23 -04:00
Ilia Mirkin	6895c3554e	freedreno/a3xx: add support for float R/RG render targets Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-18 17:35:23 -04:00
Rob Clark	95e68adcd9	freedreno/ir3/nir: few little fixes isaml needs to scale up coords based on LoD. Also fix bogus bary.f varying # when there are non-bary frag shader inputs. And use sub.s of a positive immediate rather than add.s of negative (since CP is better about figuring out that those can be collapsed into the cat2 instr). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 11:40:14 -04:00
Rob Clark	efbf14e893	freedreno/ir3/nir: lower if/else For now, completely flatten if/else blocks. That will almost certainly change once we have flow control. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 11:40:14 -04:00
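Flattening an if/else means evaluating both sides and selecting the result instead of branching. Below is a hypothetical before/after pair in C. The transformation is valid only when neither side has side effects such as stores or discards.

```c
#include <assert.h>

/* Original control flow. */
static float
shade_branchy(float x, float t)
{
   float r;
   if (x > t)
      r = x * 2.0f;
   else
      r = x + 1.0f;
   return r;
}

/* Flattened: both sides are always computed and a select picks one,
 * so no flow control is required. */
static float
shade_flat(float x, float t)
{
   float then_val = x * 2.0f;
   float else_val = x + 1.0f;
   int cond = x > t;

   return cond ? then_val : else_val;
}
```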
Rob Clark	e5e11b5baf	freedreno/a4xx: support for large shaders Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:50 -04:00
Rob Clark	20ea698c49	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:44 -04:00
Rob Clark	57f0d3b3c6	freedreno/ir3/nir: UBO support Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:36 -04:00
Rob Clark	87807e5cc5	freedreno/ir3: move out helper We'll also want it in NIR f/e for implementing UBO support. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:28 -04:00
Rob Clark	70b2f872ea	freedreno/a4xx: sysvals and UBOs Basically just sync up the cmdstream emit parts to match the changes already done on a3xx. Also, fix scheduling for mem instructions. This is needed on a4xx, and I am a bit surprised it isn't needed for a3xx. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:18 -04:00
Marek Olšák	b79c620663	radeonsi: add a debug option to compile shaders when they're created Tested-by: Tom Stellard <thomas.stellard@amd.com>	2015-04-16 18:36:29 +02:00
Emil Velikov	a7d018accf	radeonsi: remove bogus r600-- triple As mentioned by Michel Dänzer, for LLVM >= 3.6 we create the LLVMTargetMachine (with triple amdgcn--) as we set up the radeonsi context. For older LLVM or hardware (r600) the triple is always r600-- and is created at a later stage, in radeon_llvm_compile(). Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-04-16 14:15:19 +01:00
Glenn Kennard	17d69862a9	r600g/sb: Skip empty ALU clause while scheduling Fixes assert triggered by ext_transform_feedback-intervening-read output use_gs piglit test. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-16 12:43:20 +10:00
Eric Anholt	b229e6c7de	vc4: Don't try to use color load/stores to blit across format changes. We could potentially support the right combination of 8888 to 565, but the important thing for now is to not mix up our orderings of 8888. Fixes fbo-copyteximage regressions.	2015-04-15 16:50:23 -07:00
Eric Anholt	cff2e08c4c	vc4: Don't try to use color load/stores to do depth/stencil blits. Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which does blits to work around baselevel/lastlevel).	2015-04-15 16:50:23 -07:00
Eric Anholt	3a728d4dfb	vc4: Update the shadow texture for public textures on every draw. We don't know who else has written to it, so we'd better update it every time. This makes the gears spin in X again.	2015-04-15 16:50:23 -07:00
Eric Anholt	bd957b1b79	vc4: Hook up VC4_DEBUG=perf to some useful printfs.	2015-04-15 16:50:22 -07:00
Tom Stellard	e0994e0f97	radeon/llvm: Improve codegen for KILL_IF Rather than emitting one kill instruction per component of KILL_IF's src reg, we now or the components of the src register together and use the result as a condition for just one kill instruction. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 34872 -> 34848 (-0.07 %) VGPRS: 20696 -> 20676 (-0.10 %) Code Size: 749032 -> 748452 (-0.08 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 1184 -> 1160 (-2.03 %) VGPRS: 600 -> 580 (-3.33 %) Code Size: 13200 -> 12620 (-4.39 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Increases: SGPRS: 2 (0.00 %) VGPRS: 0 (0.00 %) Code Size: 0 (0.00 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 5 (0.01 %) VGPRS: 5 (0.01 %) Code Size: 25 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) * BY PERCENTAGE * Max Increase: SGPRS: 32 -> 40 (25.00 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 116 -> 96 (-17.24 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave * BY UNIT * Max Increase: SGPRS: 64 -> 72 (12.50 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 424 -> 356 (-16.04 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:37:12 +00:00
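TGSI's KILL_IF discards the fragment if any component of its source is negative. The sketch below contrasts emitting one kill per component with ORing the per-component tests first. The kill counter is a hypothetical stand-in for emitted instructions, not the radeon/llvm code.

```c
#include <assert.h>
#include <stdbool.h>

/* Before: one conditional kill per component. */
static bool
kill_if_per_component(const float src[4], int *kills_emitted)
{
   bool killed = false;

   for (int c = 0; c < 4; c++) {
      (*kills_emitted)++; /* one kill instruction per component */
      if (src[c] < 0.0f)
         killed = true;
   }
   return killed;
}

/* After: OR the per-component conditions, then a single kill. */
static bool
kill_if_combined(const float src[4], int *kills_emitted)
{
   /* Bitwise OR evaluates every comparison, as the shader would. */
   bool cond = (src[0] < 0.0f) | (src[1] < 0.0f) |
               (src[2] < 0.0f) | (src[3] < 0.0f);

   (*kills_emitted)++; /* a single kill instruction */
   return cond;
}
```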
Tom Stellard	c6d79ed289	radeon/llvm: Run LLVM's instruction combining pass This should improve code quality in general and will help with some future changes to how we emit kill instructions. shader-db shows a few regressions, but these don't seem to be the result of deficiencies in instcombine. They're mostly caused by the scheduler making different decisions than before. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 35056 -> 34872 (-0.52 %) VGPRS: 20624 -> 20696 (0.35 %) Code Size: 764372 -> 749032 (-2.01 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 13264 -> 13072 (-1.45 %) VGPRS: 8248 -> 8316 (0.82 %) Code Size: 486320 -> 470992 (-3.15 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 11264 -> 11264 (0.00 %) bytes per wave Increases: SGPRS: 6 (0.01 %) VGPRS: 20 (0.02 %) Code Size: 14 (0.01 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 32 (0.03 %) VGPRS: 8 (0.01 %) Code Size: 244 (0.25 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) * BY PERCENTAGE * Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 12 -> 20 (66.67 %) Code Size: 216 -> 224 (3.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 40 -> 32 (-20.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 368 -> 280 (-23.91 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave * BY UNIT * Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 28 -> 36 (28.57 %) Code Size: 39320 -> 40132 (2.07 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 72 -> 64 (-11.11 %) VGPRS: 48 -> 40 (-16.67 %) Code Size: 6272 -> 5852 (-6.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:37:05 +00:00
Tom Stellard	2569c7109d	radeonsi: Add header and footer to shader stat dump This makes it easier to parse. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:36:59 +00:00
Eric Anholt	1be329e64c	vc4: Add a blitter path using just the render thread. This accelerates the path for generating the shadow tiled texture when asked to sample from a raster texture (typical in glamor).	2015-04-13 23:20:46 -07:00
Eric Anholt	76d56752cc	vc4: Allow submitting jobs with no bin CL in validation. For blitting, we want to fire off an RCL-only job. This takes a bit of tweaking in our validation and the simulator support (and corresponding new code in the kernel).	2015-04-13 23:20:45 -07:00
Eric Anholt	43b20795b7	vc4: Move the blit code to a separate file. There will be other blit code showing up, and it seems like the place you'd look.	2015-04-13 23:20:45 -07:00
Eric Anholt	e214a59635	vc4: Separate out a bit of code for submitting jobs to the kernel. I want to be able to have multiple jobs being set up at the same time (for example, a render job to do a little fixup blit in the course of doing a render to the main FBO).	2015-04-13 23:20:45 -07:00
Eric Anholt	44b63cf5c0	vc4: When asked to sample from a raster texture, make a shadow tiled copy. So, it turns out my simulator doesn't quite match the hardware. And the errata about raster textures tells you most of what's wrong, but there's still stuff wrong after that. Instead, if we're asked to sample from raster, we'll just blit it to a tiled temporary. Raster textures should only be screen scanout, and word is that it's faster to copy to tiled using the tiling engine first than to texture from an entire raster texture, anyway.	2015-04-13 22:34:06 -07:00
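The shadow copy amounts to re-laying out a linear (raster) image into a tiled one before sampling. The sketch uses a deliberately simple tiling (4x4 tiles, tiles in row-major order, pixels row-major within a tile) purely to show the idea. It is not VC4's actual T-format, and the commit does the copy with the tiling engine rather than a CPU loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tile size; not VC4's real layout. */
#define TILE_W 4
#define TILE_H 4

/* Pixel offset of (x, y) in the simple tiled layout. */
static size_t
tiled_offset(unsigned x, unsigned y, unsigned width)
{
   unsigned tiles_per_row = (width + TILE_W - 1) / TILE_W;
   unsigned tile = (y / TILE_H) * tiles_per_row + (x / TILE_W);
   unsigned in_tile = (y % TILE_H) * TILE_W + (x % TILE_W);

   return (size_t)tile * TILE_W * TILE_H + in_tile;
}

/* Copy a raster image into a tiled shadow buffer. The caller allocates
 * `tiled` with width and height padded up to whole tiles. */
static void
raster_to_tiled(const uint32_t *raster, unsigned width, unsigned height,
                unsigned stride_px, uint32_t *tiled)
{
   assert(stride_px >= width);
   for (unsigned y = 0; y < height; y++)
      for (unsigned x = 0; x < width; x++)
         tiled[tiled_offset(x, y, width)] = raster[(size_t)y * stride_px + x];
}
```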
Eric Anholt	d04b07f8e2	vc4: Fix off-by-one in branch target validation.	2015-04-13 22:34:06 -07:00
Eric Anholt	7fa2f2e366	vc4: Use NIR-level lowering for idiv. This fixes the idiv tests in piglit.	2015-04-13 21:36:40 -07:00
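Hardware without an integer divider can lower integer division by estimating the quotient with a float reciprocal and then fixing it up with integer arithmetic. This sketch shows that general technique with hypothetical function names; it is not NIR's exact lowering sequence. Its correction loops may run many times for large operands, where a GPU lowering would more likely use a fixed number of refinement steps.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned divide from a float reciprocal estimate plus integer fix-up. */
static uint32_t
udiv_via_rcp(uint32_t n, uint32_t d)
{
   assert(d != 0);

   /* float keeps only 24 mantissa bits, so this estimate can be off. */
   float est = (float)n * (1.0f / (float)d);
   int64_t q = (int64_t)est;
   int64_t r = (int64_t)n - q * (int64_t)d;

   /* Adjust the estimate until the remainder lies in [0, d). */
   while (r < 0) {
      q--;
      r += d;
   }
   while (r >= (int64_t)d) {
      q++;
      r -= d;
   }
   return (uint32_t)q;
}

/* Signed divide, truncating toward zero like C, built on the above. */
static int32_t
idiv_via_rcp(int32_t n, int32_t d)
{
   assert(!(n == INT32_MIN && d == -1)); /* quotient would overflow */

   uint32_t un = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
   uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
   int64_t q = udiv_via_rcp(un, ud);

   return (int32_t)(((n < 0) != (d < 0)) ? -q : q);
}
```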
Eric Anholt	84ebaff1b7	vc4: Add a bunch of type conversions. These are required to get piglit's idiv tests working. The unsigned<->float conversions are wrong, but are good enough to get piglit's small ranges of values working.	2015-04-13 21:36:40 -07:00
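The commit does not say how the unsigned<->float conversions are wrong. One plausible failure mode, offered only as an assumption: if only a signed int->float conversion is available, reusing it for unsigned inputs works for values below 2^31. That covers small test ranges, but larger values come out negative. A full-range variant is included for contrast. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assume only a signed int->float conversion exists. */
static float
itof(int32_t i)
{
   return (float)i;
}

/* Naive u2f: reinterpret the bits as signed and convert. Correct for
 * u < 2^31; values with the top bit set come out negative. */
static float
u2f_naive(uint32_t u)
{
   int32_t i;

   memcpy(&i, &u, sizeof(i)); /* bit-for-bit reinterpretation */
   return itof(i);
}

/* Full-range u2f: convert the 16-bit halves separately. Each half and
 * the scaled high half are exact, so only the final add rounds. */
static float
u2f_full(uint32_t u)
{
   return itof((int32_t)(u >> 16)) * 65536.0f + itof((int32_t)(u & 0xffff));
}
```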
Eric Anholt	adae027260	vc4: Use the blit interface for updating shadow textures. This lets us plug in a better blit implementation and have it impact the shadow update, too.	2015-04-13 10:39:24 -07:00
Eric Anholt	39b6f7e76c	vc4: Remove dead fields from vc4_surface.	2015-04-13 10:39:24 -07:00
Eric Anholt	5100221ff7	vc4: Skip sending down the clear colors if not clearing.	2015-04-13 10:39:24 -07:00
Eric Anholt	725620f21d	vc4: Sync with kernel changes to relax BCL versus RCL validation. There was no reason to tie the two packets' values together.	2015-04-13 10:39:23 -07:00
Eric Anholt	cb88d2cfcb	vc4: Fix another space allocation mistake. We're over-allocating our BCL in vc4_draw.c, so this never mattered. However, new RCL-only blit support might end up here without having set up any BCL contents.	2015-04-13 10:39:02 -07:00
Eric Anholt	8eb9304ee7	vc4: Add missed accounting for the size of the semaphore. This wouldn't have mattered except in the worst case scenario RCL setup.	2015-04-13 10:33:30 -07:00
Rob Clark	b98c0262d1	freedreno/ir3/nir: couple little fixes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:41:03 -04:00
Rob Clark	1b936bb9f8	freedreno/ir3/nir: handle system values Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:57 -04:00
Rob Clark	715b2e0dbb	freedreno/ir3/nir: handle txs and query_levels tex ops These correspond to the tgsi TXQ opcode (plus sneak in a fix for two-sided color) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:43 -04:00
Rob Clark	97e8fc3fdd	freedreno/ir3/nir: split out tex helpers We'll need these in one or two other spots. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:36 -04:00
Rob Clark	6e8160d6e3	freedreno/ir3/nir: simplify emit_tex() Just build up arrays for src0/src1, and use create_collect().. Also add back missing .3d flag for 3d/cube textures. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:28 -04:00
Rob Clark	d5357c16cc	freedreno/ir3/cp: handle indirect properly I noticed some cases where we were trying to copy-propagate indirect src's into places they cannot go, like 2nd src for cat3 (mad, etc). Expand out valid_flags() to be aware of relativ flag, and fix up a few related spots. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:21 -04:00
Rob Clark	49be76166b	freedreno/ir3/sched: avoid getting stuck on addr conflicts When we get in a scenario where we cannot schedule any more instructions due to address register conflict, clone the instruction that writes the address register, and switch the remaining unscheduled users for the current address register over to the new clone. This is simpler and more robust than the previous attempt (which tried and sometimes failed to ensure all other dependencies of users of the address register were scheduled first).. hint it would try to schedule instructions that were not actually needed for any output value. We probably need to do the same with predicate register, although so far it isn't so heavily used so we aren't running into problems with it (yet). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:15 -04:00
Rob Clark	4cf4006674	freedreno/ir3/nir: add variable-indexing support A bit fugly.. try and make this cleaner.. note if we hoist all the get_addr() out of the loop we can drop the hashtable and just use create_addr().. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:09 -04:00
Rob Clark	972ce757d7	freedreno/ir3/asm: change assert to warning It probably should be an assert, but for now TGSI f/e isn't very good about dealing w/ CONST vs ABS/NEG. So for debug builds, print a warning instead of crashing with an assert for now. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:03 -04:00
Rob Clark	09cbd97a47	freedreno/ir3/nir: set first_driver_param Without this, a3xx breaks.. a4xx would too if it had already implemented support for passing driver params. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:39:56 -04:00
Rob Clark	f0e9a632a1	freedreno/ir3/cp: support to swap mad src's For a normal MAD (ie. not MADSH), if first source is gpr and second source is const, we can swap the first two sources to avoid needing a mov instruction. This gives back the biggest advantage TGSI f/e had over NIR f/e for common shaders, since TGSI f/e had this logic in the f/e. Note that doing this in copy-prop step has the advantage that it will also work for cases like: MOV TEMP[b], CONST[x] MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c] Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:39:46 -04:00
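The MAD source swap relies on multiplication being commutative: a*b+c equals b*a+c. Below is a minimal sketch of that idea on a hypothetical, simplified operand model; it is not ir3's real data structures. It assumes a const cannot be read directly in the second MAD slot. Real hardware operand rules have more constraints than this.

```c
#include <stdbool.h>

/* Hypothetical, simplified operands; real ir3 uses struct ir3_register
 * with flag bits, and these names are illustrative only. */
enum src_kind { SRC_GPR, SRC_CONST };

struct src {
    enum src_kind kind;
    int num;               /* register or const slot number */
};

struct mad {
    bool is_madsh;         /* the commit restricts the swap to normal MAD */
    struct src srcs[3];    /* dst = srcs[0] * srcs[1] + srcs[2] */
};

/* Try to fold a copy of const `c` into the second mad source, assuming
 * (simplification) that slot 1 can never read a const directly.
 * Returns true if the mov feeding slot 1 is no longer needed. */
static bool propagate_const_into_src1(struct mad *m, struct src c)
{
    if (m->is_madsh)
        return false;      /* only plain MAD is handled */
    if (m->srcs[0].kind != SRC_GPR)
        return false;      /* swapping would move a const into slot 1 */

    /* a * b + c == b * a + c, so swap the multiplicands and put the
     * const into slot 0, where it is assumed to be allowed. */
    m->srcs[1] = m->srcs[0];
    m->srcs[0] = c;
    return true;
}
```

In the commit's own example, `MOV TEMP[b], CONST[x]` followed by `MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c]` would become a single MAD that reads `CONST[x]` in its first slot.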
Roland Scheidegger	586536a4e1	gallivm: don't use control flow when doing indirect constant buffer lookups llvm goes crazy when doing that, using way more memory and time, though there's probably more to it - this points to a very much similar issue as fixed in `8a9f5ecdb1`. In any case I've seen a quite plain looking vertex shader with just ~50 simple tgsi instructions (but with a dozen or so such indirect constant buffer lookups) go from a terribly high ~440ms compile time (consuming 25MB of memory in the process) down to a still awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious improvements possible (but I have no clue why it's so slow...). The resulting shader is most likely also faster (certainly seemed so though I don't have any hard numbers as it may have been influenced by compile times) since generally fetching constants outside the buffer range is most likely an app error (that is we expect all indices to be valid). It is possible this fixes some mysterious vertex shader slowdowns we've seen ever since we are conforming to newer apis at least partially (the main draw loop also has similar looking conditionals which we probably could do without - if not for the fetch at least for the additional elts condition.) v2: use static vars for the fake bufs, minor code cleanups Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-04-09 01:32:30 +02:00
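The commit replaces an if/else around out-of-range constant fetches with straight-line code that reads from a static "fake" buffer. Here is a rough C analogue of that data flow. It is only an assumption-laden sketch: gallivm actually emits LLVM IR, and `fake_buf` and `fetch_const` are hypothetical names. The ternaries stand in for IR `select`s, though a C compiler may still choose to emit branches for them.

```c
#include <stddef.h>

/* Hypothetical zero-filled fallback, echoing the commit's "static vars
 * for the fake bufs". */
static const float fake_buf[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

/* Fetch buf[idx] with no if/else around the load: an out-of-range index
 * is redirected to element 0 of a static zero buffer by selecting the
 * base pointer and index, so a single unconditional load remains. */
static float fetch_const(const float *buf, size_t num_elems, size_t idx)
{
    int in_range = idx < num_elems;
    const float *base = in_range ? buf : fake_buf;
    size_t safe_idx = in_range ? idx : 0;
    return base[safe_idx];
}
```

Because valid indices are the expected case, this keeps the common path free of control flow. Bad indices still read defined memory.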
Glenn Kennard	f2947807c8	r600g/sb: Enable SB for geometry shaders Add SV_GEOMETRY_EMIT special variable type to track the implicit dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so GCM/scheduler doesn't reorder them. Mark emit instructions as unkillable so DCE doesn't eat them. Enable only for evergreen/cayman as there are a few unexplained GS piglit regressions on R6xx/R7xx with SB enabled otherwise. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:18:35 +10:00
Glenn Kennard	06bb68da4a	r600g/sb: Update last_cf for loops CF_END could end up emitted in the middle of a shader on cayman when there was a loop at the very end. Fixes glsl-1.50-geometry-end-primitive and ext_transform_feedback-geometry-shaders-basic piglit tests. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:18:17 +10:00
Ilia Mirkin	ae720c66cb	nv50,nvc0: limit the y-tiling of 3d textures to the first level's tiling We limit y-tiling to 0x20 when depth is involved. However the function is run for each miplevel, and the hardware expects miplevel 0 to have the highest tiling settings. Perform the y-tiling limit on all levels of a 3d texture, not just the ones that have depth. Fixes: texelFetch fs sampler3D 98x129x1-98x129x9 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nick Tenney <nick.tenney@gmail.com> # GT216 Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-04-06 23:06:55 -04:00
Dave Airlie	ad84689f73	r600g: fix op3 abs issue This code to handle absolute values on op3 srcs was a bit too simple, it really needs a temp reg per src, not one per channel, make it easier and let sb clean up the mess. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89831 Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-07 11:40:16 +10:00
Rob Clark	8b0b81339b	freedreno/ir3: add NIR compiler The NIR compiler frontend is an alternative to the TGSI f/e, producing the same ir3 IR and using the same backend passes for scheduling, etc. It is not enabled by default yet, as there are still some regressions. To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for example, xonotic or supertuxkart. With the NIR f/e, scalarizing and a number of other lowering steps happen in NIR, so we don't have to do them in ir3. Which simplifies the f/e and allows the lowered instructions to pass through other optimization stages. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:40 -04:00
Ilia Mirkin	700d949ea1	freedreno/a3xx: don't decode srgb on mem2gmem Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	b060b56772	freedreno/a3xx: pass sprite coord mode through to program emit Use the correct sprite replacement depending on the flip of the coord mode, using either T or 1-T depending on whether we have an upper-left or lower-left coordinate origin. This fixes all the point sprite piglits. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	1de72dfc8a	freedreno/a3xx: add UBO support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	c7811f56c2	freedreno/ir3: insert nop between sfu/mem operations Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	14dfd8cc43	freedreno: dirty context when reallocating a bound bo Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	bde2045fa2	freedreno: keep track of buffer valid ranges Copies nouveau_buffer and radeon_buffer. This allows a write to proceed to an uninitialized part of a buffer even when the GPU is using the previously-initialized portions. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
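Valid-range tracking lets a CPU write skip synchronization when it touches only bytes that were never initialized. Below is a minimal sketch of that idea using a single [start, end) interval, in the spirit of gallium's `util_range`. The struct and function names here are hypothetical, not the driver's code. A single interval over-approximates the written region, which is safe because it can only cause extra syncs, never missing ones.

```c
#include <stdbool.h>

/* Hypothetical valid-range tracker: one [start, end) interval covering
 * every byte that has ever been written to the buffer. */
struct valid_range { unsigned start, end; };

static void range_init(struct valid_range *r)
{
    r->start = ~0u;   /* empty range: start > end */
    r->end = 0;
}

static void range_add(struct valid_range *r, unsigned start, unsigned end)
{
    if (start < r->start) r->start = start;
    if (end > r->end) r->end = end;
}

/* A write to [start, end) that does not overlap initialized data cannot
 * clobber anything the GPU might be reading, so it needs no sync. */
static bool write_needs_sync(const struct valid_range *r,
                             unsigned start, unsigned end)
{
    return start < r->end && r->start < end;
}
```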
Ilia Mirkin	dacf22e0a3	freedreno: mark resources as being read so that writes flush the queue Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	2e1445c8f3	freedreno: don't bother setting resource timestamps Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to keep separate timestamps around. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	1fee3061d5	freedreno: add a reading flag to indicate gpu is reading rsc Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	ea0952a9db	freedreno: fix resource flushing confusion A resource flush is an upload of a hypothetically-staging texture to the GPU. For a UMA system, this will largely be a no-op or cache-maintenance. Move the render flush logic into transfer_map where it belongs, and clear out the transfer_flush function. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	bfb0a8eb69	freedreno: remove tex_resource pipe_sampler_view already contains a texture, remove the redundant tex_resource member which pointed at the same thing. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Rob Clark	6cd9c94ce4	freedreno/ir3: handle FRAG IN's without interpolation specified Fallback to picking based on semantic name. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	f513f006ce	freedreno/ir3/cmdline: add @const headers for immediates Since NIR f/e currently encodes immediates in instructions (rather than passing via const), we need to ensure that when const's are used they get initialized to the proper values. Otherwise comparing NIR to TGSI compiler, it will use proper immediate values in one case, and randomly initialize values in the other. Which confuses ir3test. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	6bc12bb5fd	freedreno/ir3/cmdline: remove hack for old compiler Since we dropped the old compiler, we don't need this hack anymore. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	f370e95421	freedreno/ir3: handle const/immed/abs/neg in cp Be smarter about propagating copies from const or immed, or with abs/neg modifiers. Also, realize that absneg.s and absneg.f are really "fancy" mov instructions. This opens up the possibility to remove more copies. It helps the TGSI frontend a bit, but will be really needed for the NIR f/e which builds everything up in SSA form (ie. will always insert a mov from const or immediate). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	104713d9f2	freedreno/ir3: split float/int abs/neg Even though in the end, they map to the same bits, the backend will need to be able to differentiate float abs/neg vs integer abs/neg. Rather than making the backend figure it out based on instruction opcode (which when combined with mov/absneg instructions, can be awkward), just split out different flags for each so the frontend can signal its intentions more clearly. Also, since (neg) for bitwise op's is actually a bitwise-not, split it out into bnot flag. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	203f37540a	freedreno/ir3: add ir3 builder helpers Add helpers for constructing SSA forms of instructions. Only partial cat5/cat6 coverage.. but we can add stuff as needed. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	b1c9fb9fca	freedreno/ir3: fix sam argument order comment Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	101142c401	xa: support for drivers which use NIR We need to pull in libnir.la and its dependency libglsl_util.la. Also, _mesa_error_no_memory() must be defined. Fortunately with libnir.la (vs pulling in all of libglsl.la) we don't also need libstdc++. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 09:24:17 -04:00
Ilia Mirkin	ba353935a3	nv50: allocate more offset space for occlusion queries Commit `1a170980a0` started writing to q->data[4]/[5] but kept the per-query space at 16, which meant that in some cases we would write past the end of the buffer. Rotate by 32, like nvc0 does. This ensures that we always have 32 bytes in front of us, and the data writes will go within the allocated space. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89679 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nick Tenney <nick.tenney@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-04-04 11:30:03 -04:00
Ilia Mirkin	01d3b750b3	nv50/ir: avoid folding immediates into imad operations Commit `09ee907266` added logic to fold immediates into mad operations, but the emission code is only there for fmad. Only allow it on float types. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 18:42:31 -04:00
Ilia Mirkin	603d28f32c	nv50/ir: fix imad emission when dst == src2 Commit `fb63df2215` added 4-byte mad support, but only supported emission for floats. Disable it for ints for now. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 18:35:59 -04:00
Eric Anholt	a9152376b4	vc4: Add support for nir_iabs. Tested using the GLSL 1.30 tests for integer abs(). Not currently used, but it was one of the new opcodes used by robclark's idiv lowering.	2015-04-02 10:32:35 -07:00
Ilia Mirkin	4a3c0e9950	freedreno/a3xx: add MRT support The hardware only supports 4 MRTs. It should be possible to emulate support for 8, but doesn't seem worth the trouble. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	6f4c1976f4	freedreno: convert blit program to array for each number of rts Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	d9992ab35a	freedreno: add support for laying out MRTs in gmem Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	602bc6c88d	freedreno: add core infrastructure support for MRTs Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	d13803c76f	freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS property This will enable the driver to tell which regids to link up to which MRT outputs. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	f27ec59084	freedreno/a3xx: add independent blend function support This is needed for MRT support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	8efa3e340d	freedreno: remove alpha key from ir3_shader This complication is unnecessary and makes MRTs more complicated and likely to generate tons of variants. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Stéphane Marchesin	70eed78cac	i915g: Implement EGL_EXT_image_dma_buf_import This adds all the plumbing to get EGL_EXT_image_dma_buf_import in i915g. Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>	2015-04-01 20:13:37 -07:00
Eric Anholt	26261bca21	vc4: Add shader-db dumping of NIR instruction count. I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).	2015-04-01 10:57:01 -07:00
Eric Anholt	73e2d4837d	vc4: Convert to consuming NIR. NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in master. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)	2015-04-01 10:57:01 -07:00
Eric Anholt	486dcfbbd9	vc4: Tell shader-db how big our UBOs are, if present. I had regressed them for a while with the NIR work.	2015-04-01 10:57:01 -07:00
Marcin Ślusarz	f9e2295560	nouveau: synchronize "scratch runout" destruction with the command stream When nvc0_push_vbo calls nouveau_scratch_done it does not mean scratch buffers can be freed immediately. It means "when hardware advances to this place in the command stream the scratch buffers can be freed". To fix it, just postpone scratch runout destruction until the current fence is signalled. The bug existed for a very long time. Nobody noticed, because "scratch runout" code path is rarely executed. Fixes hang at the very beginning of first mission in "Serious Sam 3" on nve7/gk107. It manifested as: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]] Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-31 22:04:31 +02:00
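The fix amounts to "free later, once the GPU has passed this point". Below is a minimal sketch of a fence-keyed deferred-free list showing that pattern. All names are hypothetical and do not correspond to nouveau's fence API. Fences are modeled as increasing sequence numbers, and sequence wraparound is ignored.

```c
#include <stdlib.h>

/* Hypothetical deferred-destruction entry: the pointer to release and the
 * fence sequence number that must signal before it is safe to free. */
struct deferred {
    void *ptr;
    unsigned fence_seq;
    struct deferred *next;
};

/* Queue ptr for release once the GPU reaches fence_seq, instead of
 * freeing it immediately. Returns 0 on success, -1 on allocation failure. */
static int defer_free(struct deferred **list, void *ptr, unsigned fence_seq)
{
    struct deferred *d = malloc(sizeof(*d));
    if (!d)
        return -1;
    d->ptr = ptr;
    d->fence_seq = fence_seq;
    d->next = *list;
    *list = d;
    return 0;
}

/* Called with the last fence the hardware has signalled: frees every
 * entry whose fence has passed and returns how many were released. */
static unsigned reap_deferred(struct deferred **list, unsigned signalled_seq)
{
    unsigned freed = 0;
    struct deferred **pp = list;
    while (*pp) {
        struct deferred *d = *pp;
        if (d->fence_seq <= signalled_seq) {
            *pp = d->next;      /* unlink before freeing */
            free(d->ptr);
            free(d);
            freed++;
        } else {
            pp = &d->next;
        }
    }
    return freed;
}
```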
Tom Stellard	4c53d2acbb	radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader types v2 v2: - Fix typo Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-31 15:40:51 +00:00
Leo Liu	a714fbacf7	radeon/vce: implement video usability information support This will help encoding VUI into the bitstream v2: make backward compatible Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2015-03-31 12:31:58 -04:00
Roland Scheidegger	489866938f	llvmpipe: enable ARB_texture_gather Just announce support for 4 components. While here also increase the max/min texel offsets (the limit is completely artificial, was chosen because that's what other hardware did, however there's other drivers using larger limits). Over a thousand little piglits skip->pass. v2: update docs/GL3.txt Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Roland Scheidegger	1863ed21ff	gallivm: simplify sampler interface This has got a bit out of control with more and more parameters added. Worse, whenever something in there changes all callees have to be updated for that, even though they don't really do much with any parameter in there except pass it on to the actual sampling function. Hence simply put almost everything into a struct. Also instead of relying on some arguments being NULL, be explicit and set this in a key (which is just reused for function generation for simplicity). (The code still relies on them being NULL in the end for now.) Technically there is a minimal functional change here for shadow sampling: whether shadow sampling is done is now determined explicitly by the texture function (either sample_c or the gl-style tex func inherit this from target) instead of the static texture state. These two should always match, however. Otherwise, it should generate all the same code. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Eric Anholt	1dcc1ee314	vc4: Drop integer multiplies with 0 to moves of 0. This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)	2015-03-30 12:57:45 -07:00
Eric Anholt	8c5dcdbccb	vc4: Add a constant folding pass. This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)	2015-03-30 12:57:45 -07:00
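The constant folding pass and the multiply-by-0 rewrite in these commits can both be pictured as turning an instruction into a MOV of a known value. Below is a sketch on a hypothetical toy IR. It is not vc4's `struct qinst` or its real pass. It handles only ADD and a 24-bit multiply, and it models that multiply as using the low 24 bits of each operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy IR, illustrative only. */
enum op { OP_MOV, OP_ADD, OP_MUL24 };
struct operand { bool is_const; uint32_t value; int temp; };
struct inst { enum op op; struct operand src[2]; };

/* Assumed 24-bit multiplier model: low 24 bits of each input, low 32 bits
 * of the product. */
static uint32_t mul24(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)(a & 0xffffff) * (b & 0xffffff));
}

/* Rewrite `in` into a MOV of a constant when its result is known at
 * compile time. Returns true if the instruction changed. */
static bool fold(struct inst *in)
{
    if (in->op == OP_MOV)
        return false;

    const struct operand *a = &in->src[0], *b = &in->src[1];
    uint32_t result;

    if (a->is_const && b->is_const) {
        /* both sources known: evaluate now */
        result = in->op == OP_ADD ? a->value + b->value
                                  : mul24(a->value, b->value);
    } else if (in->op == OP_MUL24 &&
               ((a->is_const && a->value == 0) ||
                (b->is_const && b->value == 0))) {
        result = 0;   /* x * 0 == 0 whatever x is */
    } else {
        return false;
    }

    in->op = OP_MOV;  /* MOV reads only src[0] */
    in->src[0] = (struct operand){ .is_const = true, .value = result, .temp = -1 };
    return true;
}
```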
Eric Anholt	c519c4d85e	vc4: Don't bother masking out the low 24 bits for integer multiplies The hardware just uses the low 24 bits, saving us an AND to drop the high bits. total uniforms in shared programs: 13433 -> 13423 (-0.07%) uniforms in affected programs: 356 -> 346 (-2.81%) total instructions in shared programs: 40003 -> 39989 (-0.03%) instructions in affected programs: 910 -> 896 (-1.54%)	2015-03-30 09:23:39 -07:00
Eric Anholt	5df8bf86fe	vc4: Make integer multiply use 24 bits for the low parts. The hardware uses the low 24 bits in integer multiplies, so we can have fewer high bits (and so probably drop them more frequently).	2015-03-30 09:23:39 -07:00
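One way to build a 32-bit integer multiply from a multiplier that uses only the low 24 bits of its inputs is shown below. This is a sketch of the arithmetic under the assumed hardware model, not vc4's actual lowering code. Write a = ah·2^24 + al and b = bh·2^24 + bl. The ah·bh term is a multiple of 2^48, so it vanishes mod 2^32. That leaves al·bl + ((ah·bl + al·bh) << 24). No explicit mask on the low parts is needed, because the 24-bit multiply ignores the high bits anyway.

```c
#include <stdint.h>

/* Model of a 24-bit multiplier: only the low 24 bits of each input are
 * used, and the low 32 bits of the product are returned (assumption). */
static uint32_t mul24(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)(a & 0xffffff) * (b & 0xffffff));
}

/* 32x32 -> 32 multiply from three 24-bit multiplies:
 *   low   = al * bl                (mod 2^32)
 *   cross = ah * bl + al * bh      (only its low 8 bits survive the shift)
 * ah * bh is dropped because it is a multiple of 2^48. */
static uint32_t imul32(uint32_t a, uint32_t b)
{
    uint32_t low = mul24(a, b);
    uint32_t cross = mul24(a >> 24, b) + mul24(a, b >> 24);
    return low + (cross << 24);
}
```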
Michel Dänzer	d64adc3a79	radeonsi: Cache LLVMTargetMachineRef in context instead of in screen Fixes a crash in genymotion with several threads compiling shaders concurrently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746 Cc: 10.5 <mesa-stable@lists.freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-30 15:15:10 +09:00
Ilia Mirkin	ee670c9efa	freedreno/a3xx: add support for point sprite coordinate replacement This does not (yet) support different coordinate origins, so the tests still fail due to fbo flipping. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	995f55a6ce	freedreno/a3xx: make vs-set point size work This appears to need the A2XX version of the point list, so select it at draw time if necessary. Experimentally, always using the A2XX version causes hangs when PSIZE isn't actually emitted. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	7fc5da8b93	freedreno/a3xx: point size should not be divided by 2 The division is probably a holdover from the days when the fixed point inline functions generated by headergen were broken. Also reduce the maximum point size to 4092 (vs 4096), which is what the blob does. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	738c8319ac	freedreno/a3xx: fix 3d texture layout The SZ2 field contains the layer size of a lower miplevel. It only contains 4 bits, which limits the maximum layer size it can describe. In situations where the next miplevel would be too big, the hardware appears to keep minifying the size until it hits one of that size. Unfortunately the hardware's ideas about sizes can differ from freedreno's which can still lead to issues. Minimize those by stopping minification as soon as possible. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	3735643df3	freedreno/a3xx: LAYERSZ2 appears to have no effect on arrays Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:40 -04:00
Roland Scheidegger	b2424fb030	llvmpipe: simplify address calculation for 4x4 blocks These functions looked quite complicated, even though what they actually did was trivial (ever since we dropped swizzled rendering). Also drop lookup of format block per bytes done for each block, and do it once per scene instead. This improves everybody's favorite "benchmark" by 3% or so, though lp_rast_shade_quads_all() which calls this shows up still quite high for a function which does little more than call the jit function. (This would most likely be much better handled by the jit function itself, the strides are passed through anyway already, though for being able to handle layers it would definitely add some complexity.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-28 02:59:42 +01:00
Ilia Mirkin	58030a8f99	nv50/ir/gk110: fix offset flag position for TXD opcode Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-27 19:02:19 -04:00
Ilia Mirkin	49b86007aa	nv50/ir: take postFactor into account when doing peephole optimizations Multiply operations can have a post-factor on them, which other ops don't support. Only perform the peephole optimizations when there is no post-factor involved. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758 Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-27 19:02:19 -04:00
Roland Scheidegger	8dad9455ff	gallivm: pass jit_context pointer through to sampling The callbacks used for getting the dynamic texture/sampler state were using the jit_context from the generated jit function. This works just fine, however that way it's impossible to generate separate functions for texture sampling, as will be done in the next commit. Hence, pass this pointer through all interfaces so it can be passed to a separate function (technically, it would probably be possible to extract this pointer from the current function instead, but this feels hacky and would probably require some more hacks if we'd use real functions instead of inlining all shader functions at some point). There should be no difference in the generated code for now. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-27 19:25:53 +01:00
Ilia Mirkin	1b87d73a9f	gallium/util: remove u_linkage Does not appear to be used in tree. Coverity spotted some errors in the bitmask stuff, but the whole thing appears to be unused. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-26 21:02:09 -04:00
Eric Anholt	7bc39c8418	vc4: Add a dump-the-surface-contents routine. This has been useful once again while trying to debug stride issues between render targets and texturing.	2015-03-24 10:39:12 -07:00
Eric Anholt	7f797e3d17	vc4: Fix pitch alignment of linear textures. Fixes some non-power-of-two texture rendering when I force ARGB8888 to raster.	2015-03-24 10:39:12 -07:00
Eric Anholt	b3ea377f86	vc4: Write the alignment of level width consistently in validation. 16 / cpp happens to be the same as utile_w on the only raster format supported (4 bytes per pixel), but simulator/hw source code generally talks in terms of utiles.	2015-03-24 10:39:12 -07:00
Eric Anholt	8975a09494	vc4: Fix use of a bool as an enum. The enum compared to was 0, so it worked out, but it sure looked wrong.	2015-03-24 10:39:12 -07:00
Eric Anholt	04605c21f6	vc4: Decide the HW's format before laying out the miptree. I'm experimenting with a workaround for raster texture misrendering on hardware, and this lets me look at the format chosen when computing strides.	2015-03-24 10:39:12 -07:00
Eric Anholt	25d60763d9	vc4: Use our device-specific ioctls for create/mmap. They don't do anything special for us, but I've been told by kernel maintainers that relying on dumb for my acceleration-capable buffers is not OK.	2015-03-24 10:39:12 -07:00
Eric Anholt	af3d747194	vc4: Make a new #define for making code conditional on the simulator. I'd like to compile as much of the device-specific code as possible when building for simulator, and using if (using_simulator) instead of ifdefs helps.	2015-03-24 10:39:12 -07:00
Eric Anholt	9bafcf630a	vc4: Add some useful debug printfs for miptrees. I keep rewriting these.	2015-03-24 10:39:12 -07:00
Ilia Mirkin	43277fcd59	Revert "nv50,nvc0: remove bogus 64_FLOAT formats" This reverts commit `20346808cf`. The conversion is actually done since these are the *B macro variants and no vtx format is supplied, which makes them go through the translate module. This restores the following piglit tests to passing: draw-vertices user gl-2.0-vertexattribpointer Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-23 20:57:52 -04:00
Giuseppe Bilotta	76039b38f0	gallium: implement get_device_vendor() for existing drivers The only hackish ones are llvmpipe and softpipe, which currently return the same string as for get_vendor(), while ideally they should return the CPU vendor. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-23 13:25:34 +00:00
Emil Velikov	7c7954b09d	galahad: actually remove the driver Should have been part of 429a4355259 (galahad: remove driver). Seems like I've erroneously committed the trimmed patch. Reported-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-03-21 22:35:27 +00:00
Roland Scheidegger	e8039208c4	llvmpipe: use global llvm context for PIPE_SUBSYSTEM_EMBEDDED There are 2 reasons why we'd want to use the global context: 1) There still seems to be one memory "leak" left when using multiple llvm contexts (it is not a true leak as the memory disappears into some still addressable pool but nevertheless the memory consumption grows). See http://cgit.freedesktop.org/~jrfonseca/llvm-jitstress/ 2) These contexts get kinda big - even when disposing modules etc. after compiling a shader the LLVMContext can easily be over 100kB. So when there are lots of llvm contexts around it adds up. The downside is that at least right now this is absolutely not thread safe, so this only works safely in environments where multiple pipe contexts are not used concurrently. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-21 01:52:03 +01:00
Dave Airlie	9d97cd2e3e	u_primconvert: add primitive restart support This adds primitive restart support to the prim conversion. This involves changing the API for the translate functions as we need to pass the prim restart index and the original number of indices into the translate functions. Primitive restart is supported for quads, quad strips and polygons. This deals with the case where the actual number of output primitives is less than the initially calculated number, by filling the rest of the output primitives with the restart index; the other option is to reduce the output prim number, but that will make the generator code a bit messier. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-03-20 09:46:30 +10:00
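Here is a minimal sketch of the padding approach described above, for the quad-list case only. The function is hypothetical and not the generated u_indices code. The output is sized up front as two triangles per four input indices. A restart index discards any partial quad. If restarts leave fewer triangles than planned, the unused tail is filled with the restart index.

```c
#include <stddef.h>

/* Hypothetical quad-list -> triangle-list translator with primitive
 * restart. `out` must hold (in_count / 4) * 6 entries. Returns the number
 * of triangles actually produced; the remaining output slots are padded
 * with `restart`. */
static size_t quads_to_tris_restart(const unsigned *in, size_t in_count,
                                    unsigned restart, unsigned *out)
{
    size_t out_len = (in_count / 4) * 6;   /* size computed up front */
    size_t o = 0, tris = 0, n = 0;
    unsigned q[4];

    for (size_t i = 0; i < in_count; i++) {
        if (in[i] == restart) {
            n = 0;                          /* drop any partial quad */
            continue;
        }
        q[n++] = in[i];
        if (n == 4) {
            /* split quad (v0, v1, v2, v3) into (v0, v1, v2) and (v0, v2, v3) */
            unsigned t[6] = { q[0], q[1], q[2], q[0], q[2], q[3] };
            for (int k = 0; k < 6; k++)
                out[o++] = t[k];
            tris += 2;
            n = 0;
        }
    }
    while (o < out_len)
        out[o++] = restart;                 /* pad the unused tail */
    return tris;
}
```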
Rob Clark	aee26d292f	freedreno/ir3: fix infinite recursion in sched One more case we need to handle. One of the src instructions for the indirect could also end up being ourself. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-18 10:42:33 -04:00
Rob Clark	62cc003b7d	freedreno: fix spelling Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-18 10:42:33 -04:00
Marek Olšák	a984abdad3	radeonsi: increase coords array size for radeon_llvm_emit_prepare_cube_coords radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.) Discovered by Coverity. Reported by Ilia Mirkin. Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-18 12:04:27 +01:00
Emil Velikov	3f94a5afcb	r600g: constify r600_shader_tgsi_instruction lists. Massive list of constant data. Annotate it as such. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-17 23:52:39 +00:00
Emil Velikov	63cf2b4448	r600g: kill off r600_shader_tgsi_instruction::{tgsi_opcode,is_op3} Both of which are no longer used. Use designated initializer to make things obvious as people add/remove TGSI_OPCODEs. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-17 23:52:35 +00:00
Emil Velikov	5e68c6b322	r600g: use the tgsi opcode from parse.FullToken.FullInstruction ... rather than the local one in inst_info->tgsi_opcode. This will allow us to simplify struct r600_shader_tgsi_instruction. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-17 23:52:32 +00:00
Marek Olšák	b5f19db976	radeonsi: implement TGSI_OPCODE_BFI (v2) v2: Don't use the intrinsics; the shader backend can recognize these patterns and generate optimal code automatically. Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 14:58:19 +01:00
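To show what a shift-and-mask "pattern" for BFI looks like, here is a sketch of the reference semantics of GLSL bitfieldInsert(). The helper name bfi() is our own. It shows the operation itself, not the LLVM IR that radeonsi emits.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of GLSL bitfieldInsert(base, insert, offset, bits),
 * written as plain shifts and masks. Assumes offset + bits <= 32, which
 * GLSL requires. */
static uint32_t bfi(uint32_t base, uint32_t insert, unsigned offset, unsigned bits)
{
   if (bits == 0)
      return base;
   /* (1u << 32) is undefined in C, so build the full-width mask directly */
   uint32_t field = bits == 32 ? 0xffffffffu : (1u << bits) - 1u;
   uint32_t mask = field << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}
```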
Marek Olšák	d3723c614f	radeonsi: add a helper for extracting bitfields from parameters (v2) This will be used a lot (especially by tessellation). v2: don't use the bfe intrinsic Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 14:58:19 +01:00
Marek Olšák	dc39413640	radeonsi: move scratch reloc state setup - move it to its own function - do it after all states are emitted - bump SI_MAX_DRAW_CS_DWORDS Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	567c8d7300	radeonsi: don't emit PA_SC_LINE_STIPPLE if not rendering lines Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	1f4bb38264	radeonsi: don't emit PA_SC_LINE_STIPPLE after every rasterizer state change Do it only when the line stipple state is changed. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	f5832f3f9d	radeonsi: move PA_SU_SC_MODE_CNTL to rasterizer state This requires enabling the optional GL provoking vertex behavior for quads. + some cosmetic changes, so that the register is set exactly the same as on r600. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	98a2398222	radeonsi: implement line and polygon smoothing Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	303d23e10d	radeonsi: add shader code for smoothing The fragment shader multiplies the alpha channel with gl_SampleMaskIn. If blending is enabled, it looks like MSAA. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:19 +01:00
Marek Olšák	4f20a8f278	radeonsi: split sample locations into its own state atom Sample locations are not updated as often as framebuffers. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	f7796a966d	radeonsi: add basic code for overrasterization This will be used for line and polygon smoothing. This is GCN-only even though it's in shared code. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	1921fa4304	radeonsi: small cleanup in si_shader_selector_key Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	52ff1edc51	radeonsi: simplify accessing alpha pointer in si_llvm_emit_fs_epilogue Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	955ebf2890	radeonsi: add support for easy opcodes from ARB_gpu_shader5 I have to use the BFE intrinsics, because BFE is one of the most complex instructions that can't be matched easily. BFE has 3 conditional branches and one of them is quite big. In the isel DAG, lowered BFE has 27 nodes (including leaves).	2015-03-16 12:54:18 +01:00
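The sketch below shows the reference semantics of GLSL bitfieldExtract(), which may help explain why BFE is harder to pattern-match than BFI. It has to handle a zero-width field, a full-width field, and, in the signed variant, sign extension. We are only assuming that these cases relate to the branches the commit mentions; the helper names are our own.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of GLSL bitfieldExtract() for uint.
 * Assumes offset + bits <= 32, which GLSL requires. */
static uint32_t ubfe(uint32_t value, unsigned offset, unsigned bits)
{
   if (bits == 0)
      return 0;                 /* zero-width field */
   if (bits == 32)
      return value;             /* full width (offset must be 0) */
   return (value >> offset) & ((1u << bits) - 1u);
}

/* Signed variant: extract the field, then sign-extend from bit (bits - 1). */
static int32_t ibfe(int32_t value, unsigned offset, unsigned bits)
{
   if (bits == 0)
      return 0;
   uint32_t field = ubfe((uint32_t)value, offset, bits);
   uint32_t sign = 1u << (bits - 1);
   return (int32_t)((field ^ sign) - sign);
}
```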
Marek Olšák	755a2907a3	radeonsi: implement bit-finding opcodes from ARB_gpu_shader5 Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	ca90cde81e	radeonsi: implement gl_SampleMaskIn Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	f9fd0c4a55	radeonsi: add support for SQRT Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	d73c1c1304	radeonsi: add support for FMA Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	dfea35666e	gallium/radeon: don't use LLVMReadOnlyAttribute for ALU None of the instructions use a pointer argument. (+ small cosmetic changes) Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-16 12:54:18 +01:00
Marek Olšák	216543ea54	gallium: add FMA and DFMA opcodes (v3) Needed by ARB_gpu_shader5. v2: select DMAD for FMA with double precision v3: add and select DFMA Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-16 12:54:18 +01:00
Rob Clark	e92bc6b38e	freedreno: update generated headers Fix a3xx texture layer-size. Signed-off-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-15 18:00:19 -04:00
Rob Clark	d3fb949c03	freedreno/ir3: remove old compiler Now that piglit is no longer falling back to old compiler for any tests, we can remove it. Hurray \o/ Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-15 13:27:03 -04:00
Rob Clark	feb858b788	freedreno/ir3: avoid scheduler deadlock Deadlock can occur if we schedule an address register write, yet some instructions which depend on that address register value also depend on other unscheduled instructions that depend on a different address register value. To solve this, before scheduling an address register write, ensure that all the other dependencies of the instructions which consume this address register are already scheduled. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-15 13:26:56 -04:00
Rob Clark	7208e96bb8	freedreno/ir3: bit of cleanup Add an array_insert() macro to simplify inserting into dynamically sized arrays, add a comment, and remove unused prototype inherited from the original freedreno.git/fdre-a3xx test code, etc. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-15 13:26:44 -04:00
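A macro like array_insert() can be sketched as follows. This is a hypothetical version: it assumes the array is tracked by separate count and capacity variables and grows by doubling from 16. It is not the actual ir3 definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: append val to a heap array tracked by a count and a
 * capacity, doubling the capacity (starting at 16) when it is full.
 * Allocation failure is treated as fatal here for brevity. */
#define array_insert(arr, count, cap, val) do {                        \
      if ((count) == (cap)) {                                           \
         (cap) = (cap) ? 2 * (cap) : 16;                                \
         (arr) = realloc((arr), (cap) * sizeof(*(arr)));                \
         if (!(arr))                                                    \
            abort();                                                    \
      }                                                                 \
      (arr)[(count)++] = (val);                                         \
   } while (0)
```

The do { ... } while (0) wrapper lets the macro be used like a single statement, including inside an unbraced if or for.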
Ilia Mirkin	620e29b748	freedreno: fix slice pitch calculations For example, if the width were 65, the first slice would get a pitch of 96 while the second would get 32. However, the hardware appears to expect the second pitch to be 64, based on halving the 96 (and aligning up to 32). This fixes texelFetch piglit tests on a3xx below a certain size. Going higher they break again, but most likely for unrelated reasons. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-03-13 16:05:16 -04:00
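The arithmetic in the slice pitch commit (65 → 96, then 32 versus 64) can be reproduced with a small sketch. It models the two behaviours only from the numbers given in the message: minifying the raw width versus halving the previous level's aligned pitch, with 32-texel alignment in both cases. The helper names are hypothetical, and this is not the freedreno layout code.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two). */
static unsigned align_pot(unsigned x, unsigned a)
{
   return (x + a - 1) & ~(a - 1);
}

/* Behaviour implied by the bug: minify the raw width, then align. */
static unsigned pitch_from_width(unsigned width0, unsigned level)
{
   unsigned w = width0 >> level;
   return align_pot(w ? w : 1, 32);
}

/* Behaviour the hardware appears to expect: halve the previous level's
 * aligned pitch, then align up to 32 again. */
static unsigned pitch_from_prev_pitch(unsigned width0, unsigned level)
{
   unsigned pitch = align_pot(width0, 32);
   for (unsigned l = 1; l <= level; l++) {
      unsigned half = pitch >> 1;
      pitch = align_pot(half ? half : 1, 32);
   }
   return pitch;
}
```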
Ilia Mirkin	89b26d5a36	freedreno/a3xx: use the same layer size for all slices We only program in one layer size per texture, so that means that all levels must share one size. This makes the piglit test bin/texelFetch fs sampler2DArray have the same breakage as its non-array version instead of being completely off, and makes bin/ext_texture_array-gen-mipmap start passing. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-03-13 16:05:16 -04:00
Samuel Pitoiset	e5cd42ed9a	nvc0: fix wrong max value for driver queries The maximum value of a Gallium HUD's panel is automatically adjusted when the current value is greater than the max. If we set the pipe_query_driver_info::max_value to UINT64_MAX, the maximum value is never adjusted and this results in a flat line instead of a pretty curve which is correctly scaled. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-09 20:47:05 -04:00
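Here is a minimal sketch of why a max_value of UINT64_MAX defeats the HUD's auto-scaling. It assumes a panel that raises its maximum only when a sample exceeds it and plots each sample as value/max. The struct and function names are hypothetical and do not come from the gallium HUD code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a HUD graph panel. */
struct panel {
   uint64_t max;   /* top of the y axis */
};

/* Grow the axis only when a sample exceeds the current max. */
static void panel_add_sample(struct panel *p, uint64_t value)
{
   if (value > p->max)
      p->max = value;
}

/* Height of a sample as a fraction of the panel (0.0 .. 1.0). */
static double panel_height(const struct panel *p, uint64_t value)
{
   return p->max ? (double)value / (double)p->max : 0.0;
}
```

Starting from UINT64_MAX, no sample can exceed the maximum, so every point is drawn at essentially zero height, which produces the flat line.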
Alexandre Demers	7a37d5c3a4	r600g: Use R600_MAX_VIEWPORTS instead of 16 Let's define R600_MAX_VIEWPORTS instead of using 16 here and there in the code when looping through viewports and scissors. It is easier to understand what this number represents. v2: Missed a case where R600_MAX_VIEWPORTS should have been used. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2015-03-09 23:02:05 +01:00
Marek Olšák	c939231e72	r300g: fix sRGB->sRGB blits Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 21:22:22 +01:00
Marek Olšák	9953586af2	r300g: fix a crash when resolving into an sRGB texture Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 21:03:49 +01:00
Marek Olšák	113601086d	r300g: use memset for clearing the shader key	2015-03-09 20:58:32 +01:00
Marek Olšák	4815c187b7	r300g: remove the broken SNORM->UNORM shader lowering pass Not used anymore.	2015-03-09 20:58:32 +01:00
Marek Olšák	74a757f92f	r300g: fix RGTC1 and LATC1 SNORM formats Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 20:58:32 +01:00
Stefan Dösinger	f710b99071	r300g: Fix the ATI1N swizzle (RGTC1 and LATC1) This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01 test as well as the precision part of Wine's 3dc format test (fd.o bug 89156). The Z component seems to contain a lower precision version of the result, probably a temporary value from the decompression computation. The Y and W component contain different data that depends on the input values as well, but I could not make sense of them (Not that I tried very hard). GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in piglit, and both formats are affected by a compiler bug if they're sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx, which returns random garbage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156 Signed-off-by: Marek Olšák <marek.olsak@amd.com> Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-03-09 20:58:32 +01:00
Tom Stellard	51b43c559f	radeonsi: Add additional information to shader dumps This adds SGPR count, VGPR count, shader size, LDS size, and scratch usage to shader dumps. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-09 13:53:33 +00:00
Tom Stellard	bbfa1c3239	radeonsi/compute: Use value from compiler for COMPUTE_PGM_RSRC1.FLOAT_MODE Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-09 13:53:33 +00:00
Ilia Mirkin	cb3eb43ad6	freedreno/ir3: get the # of miplevels from getinfo This fixes ARB_texture_query_levels to actually return the desired value. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-09 10:50:39 -04:00
Ilia Mirkin	8ac957a51c	freedreno/ir3: fix array count returned by TXQ Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-09 10:50:39 -04:00
Ilia Mirkin	f3dfe6513c	freedreno: move fb state copy after checking for size change Fixes: `1f3ca56b` ("freedreno: use util_copy_framebuffer_state()") Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-03-09 10:50:39 -04:00
Rob Clark	fd17db6fe5	freedreno: replace glsl130 debug flag with glsl120 Now that relative-dst works, we should never fall back to the old compiler. (Which is almost true, other than a couple edge case sched fails in piglit). So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with a glsl120 flag to force GLSL 120 and !integers. If this commit breaks any game/app/etc, use FD_MESA_DEBUG=glsl120 as a workaround and please let me know. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	060d349920	freedreno/ir3: relative dst To simplify RA, assign arrays that are written to first. Since the graph holds enough dependency information to preserve the order of reads and writes of an array, all SSA names for the array can collapse into one, so just assign the entire thing by array-id. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	b7703212d8	freedreno/ir3: split out array_fanin() helper We'll need this too for relative dst.. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	17754b70d7	freedreno/ir3: drop deref nodes The meta-deref instruction doesn't really do what we need for relative destination. Instead, since each instruction can reference at most a single address value, track the dependency on the address register via instr->address. This lets us express the dependency regardless of whether it is used for dst and/or src. The foreach_ssa_src{_n} iterator macros now also iterate the address register, so at least in SSA form the address register behaves as an additional virtual src to the instruction, which is pretty much what we want as far as scheduling etc. is concerned. TODO: For now, the foreach_src{_n} iterators are unchanged. We could wrap the address in an ir3_register and make the foreach_src{_n} iterators behave the same way. But that seems unnecessary at this point, since we mainly care about the address dependency when in SSA form. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
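The "address as an additional virtual src" idea, combined with the C99 iterator macros from the commit below, can be sketched like this. The instruction struct, ssa_src_n() and the macro are simplified, hypothetical stand-ins, not the ir3 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction: SSA sources plus an
 * optional instruction that produces the address register value. */
struct ir_instr {
   struct ir_instr *srcs[4];   /* NULL entries are non-SSA sources */
   unsigned nsrcs;
   struct ir_instr *address;   /* NULL if no relative addressing */
};

/* Source n for n < nsrcs; the address is exposed as virtual src nsrcs. */
static struct ir_instr *ssa_src_n(struct ir_instr *ins, unsigned n)
{
   if (n < ins->nsrcs)
      return ins->srcs[n];
   if (n == ins->nsrcs)
      return ins->address;
   return NULL;
}

/* C99 lets each loop declare its own variables. The inner one-shot loop
 * binds 'src' and skips NULL (non-SSA or absent) sources. Note that
 * 'break' in the body only leaves the inner loop. */
#define foreach_ssa_src_n(src, idx, ins)                                 \
   for (unsigned idx = 0; idx <= (ins)->nsrcs; idx++)                   \
      for (struct ir_instr *src = ssa_src_n((ins), idx); src; src = NULL)
```

Code that walks dependencies, such as a scheduler, then sees the address producer without any special case.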
Rob Clark	f8f7548f46	freedreno/ir3: helpful iterator macros I remembered that we are using c99.. which makes some sugary iterator macros easier. So introduce iterator macros to iterate all src registers and all SSA src instructions. The _n variants also return the src #, since there are a handful of places that need this. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	26b79ac3e4	freedreno/ir3: fix register usage calculations For cat1 instructions, use reg() as well for relative src, to ensure proper accounting of register usage. Also, for relative instructions, use reg->size rather than reg->wrmask to determine the number of components read/written. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	3ecc834e75	freedreno/ir3: couple tweaks for cmdline compiler Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	0f797f7b7d	freedreno/ir3: split up ssa_dst And a couple other trivial renames, to prepare for relative dst. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Rob Clark	27648efa20	freedreno/ir3: fix failed assert in grouping Turns out there are scenarios where we need to insert mov's in "front" of an input. Triggered by shaders like: VERT DCL IN[0] DCL IN[1] DCL OUT[0], POSITION DCL OUT[1], GENERIC[9] DCL SAMP[0] DCL TEMP[0], LOCAL 0: MOV TEMP[0].xy, IN[1].xyyy 1: MOV TEMP[0].w, IN[1].wwww 2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY 3: MOV OUT[1], TEMP[0] 4: MOV OUT[0], IN[0] 5: END Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-08 17:42:43 -04:00
Mark Janes	b28c037d64	r300g: Fix build, invalid extern "C" around header inclusion. A previous patch to fix header inclusion within extern "C" neglected to fix the occurrences of this pattern in r300 files. When the helper to detect this issue was pushed to master, it broke the build for the r300 driver. This patch fixes the r300 build. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477 Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-06 22:08:44 -05:00
Mark Janes	c4b91a1f5c	nouveau: Fix build, invalid extern "C" around header inclusion. A previous patch to fix header inclusion within extern "C" neglected to fix the occurrences of this pattern in nouveau files. When the helper to detect this issue was pushed to master, it broke the build for the nouveau driver. This patch fixes the nouveau build. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477 Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-06 22:08:11 -05:00
Ilia Mirkin	20346808cf	nv50,nvc0: remove bogus 64_FLOAT formats There is no HW support for these and the VBO pusher doesn't know about them. No need to, either, since the st will be lowering them to 2x32. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-06 22:06:05 -05:00
Chia-I Wu	bca6c8572f	ilo: clarify valid and preferred tilings We did it right until the switch to gen_surface_tiling, which has GEN8_TILING_W. Generally, GEN8_TILING_W may be valid but not preferred.	2015-03-07 04:32:39 +08:00
Chia-I Wu	bf061a3d2e	ilo: clean up Gen6 WAs Add a helper function for each WA and make PIPE_CONTROL flags match the WA descriptions. Call gen6_wa_pre_pipe_contro() only before PIPE_CONTROLs. Fix missing gen6_wa_pre_3dstate_vs_toggle() in the rectlist path.	2015-03-07 02:17:54 +08:00
Chia-I Wu	ba5670fc50	ilo: add generic ilo_render_3dprimitive() It replaces gen[6-8]_3dprimitive().	2015-03-07 01:45:52 +08:00
Chia-I Wu	8b2eecfbf8	ilo: add generic ilo_render_pipe_control() It replaces gen[6-8]_pipe_control() and a direct gen6_PIPE_CONTROL() call in ilo_render_emit_flush().	2015-03-07 01:40:23 +08:00
Chia-I Wu	35b713ad75	ilo: fix padding of linear sampler views Should use the temporary variable in the loop instead of layout->bo_height.	2015-03-07 01:38:35 +08:00
Chia-I Wu	dda4823844	ilo: do not check for interleaved_samples interleaved_samples is only zero-initialized when layout_want_mcs() is called. We should not check for it. There is also no need to.	2015-03-07 01:38:35 +08:00
Chia-I Wu	ebad062e9a	ilo: enable L3 cache in MOCS This enables L3 cache in MOCS almost everywhere.	2015-03-06 04:50:19 +08:00
Chia-I Wu	c7d17f8a80	ilo: track if an ilo_view_surface is a scanout Scanouts require a different cache type.	2015-03-06 04:43:20 +08:00
Chia-I Wu	e7c74ef43d	ilo: clean up SURFACE_STATE and BINDING_TABLE_STATE Add ilo_builder_surface_pointer() to replace ilo_builder_surface_write(). Make Gen8+ take a different path in gen6_SURFACE_STATE().	2015-03-06 04:43:20 +08:00
Rob Clark	60096ed906	freedreno/ir3: fix silly typo for binning pass shaders Was resulting in gl_PointSize write being optimized out, causing particle system type shaders to hang if hw binning enabled. Fixes neverball, OGLES2ParticleSystem, etc. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-05 15:36:47 -05:00
Chia-I Wu	4ddd981e40	ilo: add more convenient intel_bo_{ref,unref}() They both check for NULL and intel_bo_ref() returns the referenced bo. They replace intel_bo_{reference,unreference}().	2015-03-06 02:25:03 +08:00
Chia-I Wu	70ef171e91	ilo: add intel_bo_set_tiling() Make intel_winsys_alloc_bo() always allocate a linear bo, and add intel_bo_set_tiling() to set the tiling. Document the purpose of tiling.	2015-03-06 02:25:03 +08:00
Chia-I Wu	0ac706535a	ilo: replace intel_tiling_mode by gen_surface_tiling The former is used by the kernel driver to set up fence registers and to pass tiling info across processes. It lacks INTEL_TILING_W, which made our code less expressive.	2015-03-06 02:25:03 +08:00
Chia-I Wu	eb32ac1956	ilo: update genhw headers The main change is non-inline <enum>s are now generated as C enums.	2015-03-06 02:25:03 +08:00
Mark Janes	237dcb4aa7	Fix invalid extern "C" around header inclusion. System headers may contain C++ declarations, which cannot be given C linkage. For this reason, include statements should never occur inside extern "C". This patch moves the C linkage statements to enclose only the declarations within a single header. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-05 10:21:40 -08:00
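The layout this commit asks for can be sketched as a single hypothetical header and source pair, both named example. System headers are included at file scope, and the extern "C" block encloses only the header's own declarations. Compiled as C, the __cplusplus guards disappear.

```c
/* example.h: hypothetical header following the layout described above */
#ifndef EXAMPLE_H
#define EXAMPLE_H

/* System headers stay outside any extern "C" block, since they may
 * contain C++ declarations that cannot be given C linkage. */
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Only this header's own declarations get C linkage. */
size_t example_strlen(const char *s);

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_H */

/* example.c */
size_t example_strlen(const char *s)
{
   size_t n = 0;
   assert(s != NULL);
   while (s[n] != '\0')
      n++;
   return n;
}
```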
Chia-I Wu	b5eb6f769d	ilo: improve WA handling in rectlist path Add wrappers for 3DPRIMITIVE to make sure we clear current_pipe_control_dw1 and deferred_pipe_control_dw1 after it. Add missing gen7_wa_post_ps_and_later().	2015-03-04 15:28:05 -07:00
Chia-I Wu	1424bdd61b	ilo: clean up Gen7.5 WAs These WAs gen7_wa_post_3dstate_push_constant_alloc_ps() gen7_wa_pre_vs() gen7_wa_pre_3dstate_sf_depth_bias() first half of gen7_wa_pre_depth() gen7_wa_post_ps_and_later() are Gen7-specific. Update copy-and-pasted gen8_wa_pre_depth() also.	2015-03-04 15:28:05 -07:00
Chia-I Wu	68d2e395d9	ilo: add ILO_DEBUG=hang When set, detect and dump the hanging batch buffer.	2015-03-05 04:52:49 +08:00
Chia-I Wu	af4cff5d6f	ilo: add some more winsys functions Add intel_winsys_get_reset_stats(), intel_winsys_import_userptr(), and intel_bo_map_async(). The latter two are stubs, but we are not going to use them immediately either.	2015-03-04 13:42:17 -07:00
Matt Turner	ade0b580e7	r300g: Check return value of snprintf(). Would have at least prevented the crash the previous patch fixed. Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540970 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-04 11:15:09 -08:00
Matt Turner	f5e2aa1324	r300g: Use PATH_MAX instead of limiting ourselves to 100 chars. When built with Gentoo's package manager, the Mesa source directory exists seven directories deep. The path to the .test file is too long and is silently truncated, leading to a crash. Just use PATH_MAX. Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540970 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-04 11:15:09 -08:00
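The two r300g fixes above (a PATH_MAX-sized buffer and a checked snprintf() return value) can be combined in a short sketch. build_path() is a hypothetical helper, and the PATH_MAX fallback value is an assumption, because POSIX does not require PATH_MAX to be defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* assumed fallback; not guaranteed by POSIX */
#endif

/* Hypothetical helper: join dir and file into buf. Returns false on an
 * encoding error or on truncation, rather than silently using a cut-off
 * path. */
static bool build_path(char *buf, size_t size, const char *dir, const char *file)
{
   int len = snprintf(buf, size, "%s/%s", dir, file);
   /* snprintf returns the length the full string would have had, so
    * len >= size means the output was truncated. */
   return len >= 0 && (size_t)len < size;
}
```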
Rob Clark	b709adf7cc	freedreno/ir3: fix old compiler after `f6b2e8af74` If first_driver_param is left as zero (calloc'd struct), the result is c0 getting clobbered. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-04 11:37:58 -05:00
Jose Fonseca	d0b1c74b73	svga: Set MSVC2013 compat flags. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-04 15:12:19 +00:00
Jose Fonseca	2c25008e8e	softpipe,trace: Set MSVC 2008 compat flags. Although we don't deploy these, we need to use them for debugging. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-04 15:12:17 +00:00
Jose Fonseca	00faf9f000	scons: Use -Werror MSVC compatibility flags per-directory. Matching what we already do with autotools builds. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-03-04 15:12:06 +00:00
Rob Clark	8e67fd798e	freedreno/a4xx: re-enable int (conditional on glsl130) Re-enable integer, now that we can handle flat varyings. Still, ofc, conditional on FD_MESA_DEBUG=glsl130, until we can deprecate _old compiler.. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-03 10:41:00 -05:00
Rob Clark	e9f2abe349	freedreno/ir3: handle flat bypass for a4xx We may not need this for later a4xx patchlevels, but we do at least need this for patchlevel 0. Bypass bary.f for fetching varyings when flat shading is needed (rather than configure via cmdstream). This requires a special dummy bary.f w/ (ei) flag to signal to scheduler when all varyings are consumed. And requires shader variants based on rasterizer flatshade state to handle TGSI_INTERPOLATE_COLOR. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-03 10:41:00 -05:00
Rob Clark	9d732d3125	freedreno/ir3: add support for memory (cat6) instructions Scheduled basically the same as texture (cat5) instructions, using (sy) flag for synchronization. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-03 10:41:00 -05:00
Rob Clark	20b50a0712	freedreno/ir3: fix up cat6 instruction encodings I think there is at least one more sub-encoding, but these two should be enough to cover the common load/store instructions. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-03 10:41:00 -05:00
Rob Clark	583a8a8f65	freedreno/a3xx,a4xx: silence some warnings fd3_emit.c: In function ‘fd3_emit_vertex_bufs’: fd3_emit.c:377:11: warning: unused variable ‘semantic’ [-Wunused-variable] uint8_t semantic = sem2name(vp->inputs[i].semantic); and fd4_emit.c: In function ‘fd4_emit_vertex_bufs’: fd4_emit.c:304:11: warning: unused variable ‘semantic’ [-Wunused-variable] uint8_t semantic = sem2name(vp->inputs[i].semantic); Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-03-03 10:41:00 -05:00
Jose Fonseca	80c5bd7ef0	configure: Leverage gcc warn options to enable safe use of C99 features where possible. The main objective of this change is to enable Linux developers to use more of C99 throughout Mesa, with confidence that the portions that need to be built with MSVC, and only those portions, stay portable. This is achieved by using the appropriate -Werror= options only in the places they need to be used. Unfortunately we still need MSVC 2008 on a few portions of the code (namely llvmpipe and its dependencies). I hope to eventually eliminate this so that we can use C99 everywhere, but there are technical/logistic challenges (specifically, newer Windows SDKs no longer bundle MSVC but instead require a full installation of Visual Studio, and that has hindered adoption of newer MSVC versions in our build processes.) Thankfully we have more direct control over our OpenGL driver, which is why we're now able to migrate to MSVC 2013 for most of the tree. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-03-03 09:25:11 +00:00
Jose Fonseca	9a07435ff8	identity: Remove. It's unmaintained, and most likely broken: I use the trace driver every now and then, and every time I do I need to fix it up. It's also unused: identity_screen_create is never called. Above all, it's dead weight: if the identity driver had the infrastructure for other pass-through drivers (like trace and rbug), then it would make sense in its own right. But as it is implemented, it's just another driver to (forget) to update whenever there is a gallium interface change. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-02 14:12:46 +00:00
Kenneth Graunke	982723dfa2	Revert "configure: Leverage gcc warn options to enable safe use of C99 features where possible." This reverts commit `79daa510c7`. I apparently hadn't done a clean build when testing this; it broke the build for Tom, Ben, and myself. We like the idea; let's try a v2.	2015-02-27 16:13:10 -08:00
Tom Stellard	da85ab4b65	radeonsi/compute: Enable PIPE_SHADER_CAP_DOUBLES v2 v2: - Simplify ifdef Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-27 14:57:52 +00:00
Jose Fonseca	79daa510c7	configure: Leverage gcc warn options to enable safe use of C99 features where possible. The main objective of this change is to enable Linux developers to use more of C99 throughout Mesa, with confidence that the portions that need to be built with MSVC, and only those portions, stay portable. This is achieved by using the appropriate -Werror= options only in the places they need to be used. Unfortunately we still need MSVC 2008 on a few portions of the code (namely llvmpipe and its dependencies). I hope to eventually eliminate this so that we can use C99 everywhere, but there are technical/logistic challenges (specifically, newer Windows SDKs no longer bundle MSVC but instead require a full installation of Visual Studio, and that has hindered adoption of newer MSVC versions in our build processes.) Thankfully we have more direct control over our OpenGL driver, which is why we're now able to migrate to MSVC 2013 for most of the tree. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2015-02-27 14:30:36 +00:00
Vinson Lee	8170eba7e7	r300g/tests: Include stdio.h. Fix build error. CC compiler/tests/r300_compiler_tests-radeon_compiler_regalloc_tests.o compiler/tests/radeon_compiler_regalloc_tests.c: In function ‘test_runner_rc_regalloc’: compiler/tests/radeon_compiler_regalloc_tests.c:57:3: error: implicit declaration of function ‘fprintf’ [-Werror=implicit-function-declaration] fprintf(stderr, "Failed to load program\n"); ^ Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2015-02-26 21:01:32 -08:00
Brian Paul	40cfa0c347	radeon/compiler: include stdio.h Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89343 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-02-26 17:53:05 -07:00
Brian Paul	538e13d4a1	r300g: remove dependency on compiler.h It only needs typical stdio.h and stdlib.h functions. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-02-26 08:38:38 -07:00
Rob Clark	864340219b	freedreno: drop ARRAY_SIZE macro Since now ARRAY_SIZE has been added to util/macros.h. Fixes a bunch of: freedreno_util.h:79:0: warning: "ARRAY_SIZE" redefined #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) ^ In file included from ../../../../src/gallium/include/pipe/p_compiler.h:36:0, from ../../../../src/gallium/include/pipe/p_context.h:31, from freedreno_context.h:32, from freedreno_context.c:29: ../../../../src/util/macros.h:29:0: note: this is the location of the previous definition # define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) ^ Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-25 08:37:58 -05:00
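The commit fixes the redefinition warnings by dropping the local copy. For code that cannot drop its own copy, a common defensive alternative (not what this commit does) is to guard the definition so that it only applies when no shared header has already provided one.

```c
#include <assert.h>
#include <stddef.h>

/* Define ARRAY_SIZE only if util/macros.h (or another header) has not
 * already done so, avoiding "ARRAY_SIZE redefined" warnings.
 * Only valid for true arrays, never for pointers. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
#endif
```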
Marek Olšák	1180e61a1b	r600g,radeonsi: fix streamout after pipeline stats have been used EVENT_TYPE_PIPELINESTAT_STOP disables streamout queries too. Luckily, pipeline stats are enabled by default, so we don't even have to emit EVENT_TYPE_PIPELINESTAT_START. Tested on Hawaii, Bonaire, Redwood, RV730. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
Marek Olšák	fdf2c04737	radeonsi: small cleanup around current_rast_prim - remove the last parameter of si_emit_rasterizer_prim_state - remove the last unused parameter of si_emit_draw_registers - use current_rast_prim in si_emit_draw_registers Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
Marek Olšák	0b1f31ab7f	radeonsi: set current_rast_prim in the right place Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
Marek Olšák	4eb0ccf9e7	radeonsi: simplify obtaining a shader property in si_emit_clip_regs Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
Marek Olšák	5349437154	radeonsi: only preload VertexID for the GS copy shader The copy shader doesn't use any other preloaded VGPRs. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
Marek Olšák	ffd701e677	radeonsi: dump the shader key when dumping shaders Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
Marek Olšák	93daf5a2f6	r600g,radeonsi: cleanup of hex literals 0x3F800000 -> fui(1.0) 0x00000000 -> 0 Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
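The replacement 0x3F800000 -> fui(1.0) works because 0x3F800000 is the IEEE-754 single-precision bit pattern of 1.0. A helper of this kind reinterprets a float's bits as an unsigned integer. The version below is our own sketch using the classic union idiom, and it assumes binary32 floats.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a float's bits as a uint32_t (IEEE-754 binary32 assumed).
 * A sketch of what a fui()-style helper does, using a union type pun. */
static inline uint32_t fui(float f)
{
   union { float f; uint32_t u; } fi;
   fi.f = f;
   return fi.u;
}
```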
Marek Olšák	fa913a2dc6	radeonsi: set PA_SU_HARDWARE_SCREEN_OFFSET to 0 It was probably 0 already, but it doesn't hurt to set it. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-02-24 21:21:04 +01:00
Glenn Kennard	d80701df8a	r600g: Implement GL_ARB_draw_indirect for EG/CM Requires Evergreen/Cayman and radeon kernel module 2.41.0 or newer. Expected piglit fails due to hardware limitations: * arb_draw_indirect-draw-arrays-prim-restart Restarts not applied for DrawArrays commands * arb_draw_indirect-vertexid Base vertex offset is not included in vertex id Marek: bump vgt_state num_dw by 3 (= space needed for one register write) Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2015-02-24 21:21:04 +01:00
Rob Clark	dd70e78674	freedreno/a4xx: aniso filtering Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-24 14:23:38 -05:00
Rob Clark	c70097ae86	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-24 14:23:38 -05:00
Rob Clark	daccbd27ce	freedreno/a4xx: add ARB_instanced_arrays support Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-24 14:23:38 -05:00
Rob Clark	e13398714c	freedreno/a4xx: handle index_bias (i.e. base_vertex) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-24 14:23:38 -05:00
Rob Clark	283bb4848e	freedreno/a4xx: add support for vertexid and instanceid sysvals ir3 bits of it already in place from a3xx patch. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-24 14:23:38 -05:00
Rob Clark	4aef0d79ee	freedreno/a4xx: pass number of instances to draw a4xx has its own draw packet, so needs equivalent update to what a3xx already got. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-24 14:23:38 -05:00
Eric Anholt	49d3c6a8e6	vc4: Update to current kernel sources. New BO create and mmap ioctls are added. The submit ABI gains a flags argument, and the pointers are fixed at 64-bit. Shaders are now fixed at the start of their BOs.	2015-02-24 13:49:12 +00:00
Eric Anholt	1d1e820a6d	r600: Fix build after `984f306937` Same as for the CLAMP macro, undef it before including a header file that tries to make fields with that name.	2015-02-24 13:49:12 +00:00
Matt Turner	bfcdb84383	mesa: Use assert() instead of ASSERT wrapper. Acked-by: Eric Anholt <eric@anholt.net>	2015-02-23 10:49:47 -08:00
Marek Olšák	050bf75c8b	radeonsi: fix a warning caused by previous commit Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-02-23 11:45:00 +01:00
Marek Olšák	7820a11e3d	radeonsi: fix point sprites Broken by `a27b74819a`. This fix is critical and should be ported to stable ASAP. Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>	2015-02-23 11:40:55 +01:00
Rob Clark	51e335742e	freedreno/a4xx: set PC_PRIM_VTX_CNTL.VAROUT properly Fixes xonotic, some webgl stuff, and really pretty much anything with more than 4 varyings. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-21 17:11:02 -05:00
Rob Clark	fb1301e40a	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-21 17:11:02 -05:00
Rob Clark	bdf023482a	freedreno/a4xx: bit of cleanup Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-21 17:11:02 -05:00
Rob Clark	e17437386c	freedreno: implement fence I never actually implemented the stubbed out fence stuff back in the early days. Fix that. We'll need a few libdrm_freedreno changes to handle timeout properly, so ignore that for now to avoid a libdrm_freedreno dependency bump. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-21 17:11:02 -05:00
Rob Clark	6855226653	freedreno/a2xx: fix increment in assert Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88883 Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-02-21 17:11:01 -05:00
Chia-I Wu	9fe81879c5	ilo: R32G32B32_FLOAT needs no special care on Gen8+ Gen8+ must use VALIGN_4. Unlike prior Gens, R32G32B32_FLOAT should supposedly support VALIGN_4.	2015-02-21 11:33:54 +08:00
Chia-I Wu	226109436f	ilo: 128 BPP formats can use TiledY on Gen7.5+ The restriction is lifted.	2015-02-21 11:33:54 +08:00
Ilia Mirkin	f8e4792b22	nvc0: enable double support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:51:50 -05:00
Ilia Mirkin	5491458843	nvc0/ir: remove merge/split pairs to allow normal propagation to occur Because the TGSI interface creates merges for each instruction source and then splits them back out, there are a lot of unnecessary merge/split pairs which do essentially nothing. The various modifier/etc propagation doesn't know how to walk through those, so just remove them when they're unnecessary. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:51:50 -05:00
Ilia Mirkin	93812dc10a	nvc0/ir: add support for new TGSI double opcodes Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:51:43 -05:00
Ilia Mirkin	ef8f09be33	nvc0/ir: handle zero and negative sqrt arguments Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:28 -05:00
Ilia Mirkin	88127874a3	nvc0/ir: no instruction can load a double immediate Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:28 -05:00
Ilia Mirkin	b87b498b88	nvc0/ir: fix lowering of RSQ/RCP/SQRT/MOD to work with F64 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:28 -05:00
Ilia Mirkin	93ebe91bae	gm107/ir: fix F2F flipped stype/dtype flags Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:27 -05:00
Ilia Mirkin	dbf4a674b9	gm107/ir: fix DSET boolean float flag Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:27 -05:00
Ilia Mirkin	727018bb0c	gm107/ir: fix DMUL opcode encoding Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:27 -05:00
Ilia Mirkin	493ad88e1b	gk110/ir: add emission of dadd/dmul/dmad opcodes Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:27 -05:00
Ilia Mirkin	fd0b1a4cbf	nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-20 19:30:27 -05:00
Marek Olšák	f5ac5e20b1	gallium/radeon: fix an uninitialized-variable warning	2015-02-20 20:20:10 +01:00
Ilia Mirkin	c85a686d02	gallium: add new double-related shader caps to all the getters Missed a few drivers in the earlier changes, this should fix up all the ones that print unknown caps or don't have a default statement. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Dave Airlie <airlied@redhat.com>	2015-02-20 14:09:25 -05:00
Brian Paul	71b155a2cb	svga: add missing _DROUND,DFRACEXP_DLDEXP_SUPPORTED switch cases To silence unhandled switch case warnings.	2015-02-20 08:09:40 -07:00
Marek Olšák	7692704b14	radeonsi: don't use SQC_CACHES to flush ICACHE and KCACHE on SI This reverts `73c2b0d18c`. It doesn't seem to be reliable. It's probably missing a wait packet or something, because it's just a register write and doesn't wait for anything. SURFACE_SYNC at least seems to wait until the flush is done. Just guessing. Let's not complicate things and revert this. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88561 Cc: 10.5 <mesa-stable@lists.freedesktop.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2015-02-20 12:06:22 +01:00
Eric Anholt	85316d059c	vc4: Keep an array of pointers to instructions defining the temps around. The optimization passes are always regenerating it and throwing it away, but it's not hard to keep track of.	2015-02-19 23:35:17 -08:00
Eric Anholt	877b48a531	vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h. I may want them in optimization passes, and they're not really particular to the program translation stage.	2015-02-19 23:35:17 -08:00
Eric Anholt	14dc281c13	vc4: Enforce one-uniform-per-instruction after optimization. This lets us more intelligently decide which uniform values should be put into temporaries, by choosing the most reused values to push to temps first. total uniforms in shared programs: 13457 -> 13433 (-0.18%) uniforms in affected programs: 1524 -> 1500 (-1.57%) total instructions in shared programs: 40198 -> 40019 (-0.45%) instructions in affected programs: 6027 -> 5848 (-2.97%) I noticed this opportunity because with the NIR work, some programs were happening to make different uniform copy propagation choices that significantly increased instruction counts.	2015-02-19 23:35:17 -08:00
Eric Anholt	09c844fcd9	vc4: Rename add_uniform() to qir_uniform().	2015-02-19 23:35:17 -08:00
Eric Anholt	96f6efc561	vc4: Shut up runtime warnings about new pipe caps.	2015-02-19 23:35:13 -08:00
Ilia Mirkin	5000a5f67b	nv50: add PIPELINE_STATISTICS query support, based on nvc0 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nick Tenney <nick.tenney@gmail.com>	2015-02-19 23:12:35 -05:00
Ilia Mirkin	f883df74e0	svga: add missing : Fixes: `924ee3f408` ("gallium: add shader cap for dldexp/dfracexp support") Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-19 20:18:02 -05:00
Ilia Mirkin	924ee3f408	gallium: add shader cap for dldexp/dfracexp support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Dave Airlie <airlied@redhat.com>	2015-02-19 19:32:52 -05:00
Ilia Mirkin	899d779cb7	gallium: add a cap to enable double rounding opcodes Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Dave Airlie <airlied@redhat.com>	2015-02-19 19:32:49 -05:00
Ilia Mirkin	069dab7576	freedreno: add missing PIPE_CAP_RESOURCE_FROM_USER_MEMORY to switch Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-02-19 00:25:03 -05:00

13958 Commits