KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Rob Clark	1b58d8c2bf	freedreno/a4xx: (partial) gl_FragCoord.zw The bit to enable .z is still commented out, as it is triggering gpu hangs in 0ad. But at least gl_FragCoord.w works now, and we know what bits we are supposed to set for .z (with that uncommented all piglit fragcoord tests are passing). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	a869183123	freedreno/a4xx: primitive-restart This was the missing bit to get dolphin-emu working on a4xx. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	632ea2a113	freedreno/nir: sysval fixes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	13527df143	freedreno/a4xx: wire up integer texture sampling Similar to a3xx, the compiler needs to know the return type of the sam, etc, instructions. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	48a651e98c	freedreno/a4xx: formats updates/fixes Update formats table with new formats that Ilia has figured out, and fix sampling from srgb texture and integer vbo's. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:28 -04:00
Rob Clark	21ceedfd8b	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-22 13:20:27 -04:00
Emil Velikov	9450bd56be	gallium/targets/d3dadapter9: drop the libdrm prefix for drm.h The path is provided by libdrm.pc and already used appropriately by the build system. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-04-22 16:03:01 +01:00
Brian Paul	02e93be55e	cso: minor comment fix	2015-04-22 08:58:05 -06:00
Chih-Wei Huang	b0e33c2256	android: fix the building rules for Android 5.0 Android 5.0 allows modules to generate source into $OUT/gen, which will then be copied into $OUT/obj and $OUT/obj_$(TARGET_2ND_ARCH) as necessary. Modules will need to change calls to local-intermediates-dir into local-generated-sources-dir. The patch changes local-intermediates-dir into local-generated-sources-dir. If the Android version is less than 5.0, fallback to local-intermediates-dir. The patch also fixes the 64-bit building issue of Android 5.0. v2 [Emil Velikov] - Keep the LOCAL_UNSTRIPPED_PATH variable. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>	2015-04-22 15:53:35 +01:00
Emil Velikov	6fb8017866	android: add $(mesa_top)/src include to the whole of mesa Many parts of mesa already have the include with others depending on it but it's missing. Add it once at the top makefile and be done with it. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>	2015-04-22 14:26:22 +01:00
Emil Velikov	86919352e3	android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS ... to manage the LIBDRM*_CFLAGS. The former is the recommended approach by the Android build system developers while the latter has been deprecated for quite some time. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-04-22 14:23:28 +01:00
Emil Velikov	413bc0a618	ilo: remove unused include from Android.mk Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>	2015-04-22 14:18:47 +01:00
EdB	c1485f4b7d	clover: remove pre llvm 3.5.0 compatibility code Acked-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-04-20 18:11:04 +00:00
Nick Sarnie	645f77fe50	gallivm: Fix build against LLVM 3.7 SVN r235265 LLVM removed JITEmitDebugInfo from TargetOptions since they weren't used v2: Be consistent with the LLVM version check (Aaron Watry) Signed-off-by: Nick Sarnie <commendsarnex@gmail.com> Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2015-04-20 13:34:45 +09:00
Ilia Mirkin	b2e871bd48	indices: fix provoking vertex for quads/quadstrips This allows drivers to provide consistent flat shading for quads. Otherwise a driver that only supported tris would have to force last provoking vertex when drawing quads (and would have to say that quads don't follow the provoking vertex convention). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-04-18 18:27:22 -04:00
Ilia Mirkin	1cdb01d716	primconvert: select pv convention only from flatshade_first This should match to how drivers program hardware. flatshade relates to whether color inputs are interpolated, not the provoking vertex convention. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robclark@freedesktop.org>	2015-04-18 18:27:09 -04:00
Ilia Mirkin	0904774af1	freedreno/a3xx: enable polymode setting with non-fill modes Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-18 17:35:23 -04:00
Ilia Mirkin	6357601628	freedreno/a3xx: fix integer and 32-bit float border colors Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-18 17:35:23 -04:00
Ilia Mirkin	6895c3554e	freedreno/a3xx: add support for float R/RG render targets Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-18 17:35:23 -04:00
Tobias Nygren	cfab4ea9c6	adjust a couple of ifdefs to handle NetBSD correctly Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Tobias Nygren <tnn@NetBSD.org>	2015-04-17 12:04:48 -07:00
Rob Clark	95e68adcd9	freedreno/ir3/nir: few little fixes isaml needs to scale up coords based on LoD. Also fix bogus bary.f varying # when there are non-bary frag shader inputs. And use sub.s of a positive immediate rather than add.s of negative (since CP is better about figuring out that those can be collapsed into the cat2 instr). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 11:40:14 -04:00
Rob Clark	efbf14e893	freedreno/ir3/nir: lower if/else For now, completely flatten if/else blocks. That will almost certainly change once we have flow control. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 11:40:14 -04:00
Rob Clark	e5e11b5baf	freedreno/a4xx: support for large shaders Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:50 -04:00
Rob Clark	20ea698c49	freedreno: update generated headers Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:44 -04:00
Rob Clark	57f0d3b3c6	freedreno/ir3/nir: UBO support Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:36 -04:00
Rob Clark	87807e5cc5	freedreno/ir3: move out helper We'll also want it in NIR f/e for implementing UBO support. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:28 -04:00
Rob Clark	70b2f872ea	freedreno/a4xx: sysvals and UBOs Basically just sync up the cmdstream emit parts to match the changes already done on a3xx. Also, fix scheduling for mem instructions. This is needed on a4xx, and I am a bit surprised it isn't needed for a3xx. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-17 10:40:18 -04:00
Rob Clark	7a9063e7c7	gallium/ttn: fix TXF There is a level param stashed away in the .w component of the first src. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-17 10:34:15 -04:00
Rob Clark	ef7c4f39bf	gallium/ttn: add UBO support v2: move ishl into ttn (instead of driver backend) to keep the units consistent between immediate and indirect offsets Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-17 10:34:15 -04:00
Rob Clark	8efe20467b	gallium/ttn: minor cleanup v2: also use ttn_src_for_indirect() everywhere for addr access, rather than open-coding it for INPUT/CONST srcs v3: move ralloc out of ttn_src_for_indirect() into the one call site that needs a ptr Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-17 10:34:15 -04:00
Rob Clark	a3cce7a38e	gallium/ttn: add support for TXL2 Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-17 10:34:15 -04:00
Rob Clark	f44d836d7a	gallium/ttn: add support for texture offsets Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-17 10:34:14 -04:00
Jose Fonseca	8638e3ae1b	libgl-gdi: Prevent "pure virtual method called" error when. When running piglit w/ llvmpipe on Windows several tests terminate abnormally just when the test exits. The problem was that LLVMContextDispose was being called after LLVM global destructors. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2015-04-16 20:37:34 +01:00
Marek Olšák	b79c620663	radeonsi: add a debug option to compile shaders when they're created Tested-by: Tom Stellard <thomas.stellard@amd.com>	2015-04-16 18:36:29 +02:00
Emil Velikov	a7d018accf	radeonsi: remove bogus r600-- triple As mentioned by Michel Dänzer for LLVM >= 3.6 we create the LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi context. For older LLVM or hardware (r600) the triple is always r600-- and is created at a later stage - radeon_llvm_compile() Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2015-04-16 14:15:19 +01:00
Glenn Kennard	17d69862a9	r600g/sb: Skip empty ALU clause while scheduling Fixes assert triggered by ext_transform_feedback-intervening-read output use_gs piglit test. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-16 12:43:20 +10:00
Eric Anholt	b229e6c7de	vc4: Don't try to use color load/stores to blit across format changes. We could potentially support the right combination of 8888 to 565, but the important thing for now is to not mix up our orderings of 8888. Fixes fbo-copyteximage regressions.	2015-04-15 16:50:23 -07:00
Eric Anholt	cff2e08c4c	vc4: Don't try to use color load/stores to do depth/stencil blits. Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which does blits to work around baselevel/lastlevel).	2015-04-15 16:50:23 -07:00
Eric Anholt	3a728d4dfb	vc4: Update the shadow texture for public textures on every draw. We don't know who else has written to it, so we'd better update it every time. This makes the gears spin in X again.	2015-04-15 16:50:23 -07:00
Eric Anholt	bd957b1b79	vc4: Hook up VC4_DEBUG=perf to some useful printfs.	2015-04-15 16:50:22 -07:00
Brian Paul	11bfee4c3a	tgsi: also dump label for TGSI_OPCODE_BGNSUB opcode So we can see the label associated with subroutines. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2015-04-15 16:30:49 -06:00
Jose Fonseca	1aa50339d8	st/wgl: Couple of fixes to opengl32.dll's wglCreateContext/wglDeleteContext dispatch. - Use GetModuleHandle instead of LoadLibrary to avoid incrementing the opengl32.dll reference count (otherwise the opengl32.dll will linger in memory forever.) - Ensure we use our fake wglCreateContext/wglDeleteContext when using Mesa as a drop-in replacement for opengl32.dll Untested. Just noticed by accident. Reviewed-by: Brian Paul <brianp@vmware.com>	2015-04-15 09:58:38 +01:00
Tom Stellard	e0994e0f97	radeon/llvm: Improve codegen for KILL_IF Rather than emitting one kill instruction per component of KILL_IF's src reg, we now or the components of the src register together and use the result as a condition for just one kill instruction. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 34872 -> 34848 (-0.07 %) VGPRS: 20696 -> 20676 (-0.10 %) Code Size: 749032 -> 748452 (-0.08 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 1184 -> 1160 (-2.03 %) VGPRS: 600 -> 580 (-3.33 %) Code Size: 13200 -> 12620 (-4.39 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Increases: SGPRS: 2 (0.00 %) VGPRS: 0 (0.00 %) Code Size: 0 (0.00 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 5 (0.01 %) VGPRS: 5 (0.01 %) Code Size: 25 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) * BY PERCENTAGE * Max Increase: SGPRS: 32 -> 40 (25.00 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 116 -> 96 (-17.24 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave * BY UNIT * Max Increase: SGPRS: 64 -> 72 (12.50 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 424 -> 356 (-16.04 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:37:12 +00:00
Tom Stellard	c6d79ed289	radeon/llvm: Run LLVM's instruction combining pass This should improve code quality in general and will help with some future changes to how we emit kill instructions. shader-db shows a few regressions, but these don't seem to be the result of deficiencies in instcombine. They're mostly caused by the scheduler making different decisions than before. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 35056 -> 34872 (-0.52 %) VGPRS: 20624 -> 20696 (0.35 %) Code Size: 764372 -> 749032 (-2.01 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 13264 -> 13072 (-1.45 %) VGPRS: 8248 -> 8316 (0.82 %) Code Size: 486320 -> 470992 (-3.15 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 11264 -> 11264 (0.00 %) bytes per wave Increases: SGPRS: 6 (0.01 %) VGPRS: 20 (0.02 %) Code Size: 14 (0.01 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 32 (0.03 %) VGPRS: 8 (0.01 %) Code Size: 244 (0.25 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) * BY PERCENTAGE * Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 12 -> 20 (66.67 %) Code Size: 216 -> 224 (3.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 40 -> 32 (-20.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 368 -> 280 (-23.91 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave * BY UNIT * Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 28 -> 36 (28.57 %) Code Size: 39320 -> 40132 (2.07 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 72 -> 64 (-11.11 %) VGPRS: 48 -> 40 (-16.67 %) Code Size: 6272 -> 5852 (-6.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:37:05 +00:00
Tom Stellard	2569c7109d	radeonsi: Add header and footer to shader stat dump This makes it easier to parse. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-04-14 13:36:59 +00:00
Eric Anholt	1be329e64c	vc4: Add a blitter path using just the render thread. This accelerates the path for generating the shadow tiled texture when asked to sample from a raster texture (typical in glamor).	2015-04-13 23:20:46 -07:00
Eric Anholt	76d56752cc	vc4: Allow submitting jobs with no bin CL in validation. For blitting, we want to fire off an RCL-only job. This takes a bit of tweaking in our validation and the simulator support (and corresponding new code in the kernel).	2015-04-13 23:20:45 -07:00
Eric Anholt	43b20795b7	vc4: Move the blit code to a separate file. There will be other blit code showing up, and it seems like the place you'd look.	2015-04-13 23:20:45 -07:00
Eric Anholt	e214a59635	vc4: Separate out a bit of code for submitting jobs to the kernel. I want to be able to have multiple jobs being set up at the same time (for example, a render job to do a little fixup blit in the course of doing a render to the main FBO).	2015-04-13 23:20:45 -07:00
Eric Anholt	44b63cf5c0	vc4: When asked to sample from a raster texture, make a shadow tiled copy. So, it turns out my simulator doesn't quite match the hardware. And the errata about raster textures tells you most of what's wrong, but there's still stuff wrong after that. Instead, if we're asked to sample from raster, we'll just blit it to a tiled temporary. Raster textures should only be screen scanout, and word is that it's faster to copy to tiled using the tiling engine first than to texture from an entire raster texture, anyway.	2015-04-13 22:34:06 -07:00
Eric Anholt	d04b07f8e2	vc4: Fix off-by-one in branch target validation.	2015-04-13 22:34:06 -07:00
Eric Anholt	7fa2f2e366	vc4: Use NIR-level lowering for idiv. This fixes the idiv tests in piglit.	2015-04-13 21:36:40 -07:00
Eric Anholt	84ebaff1b7	vc4: Add a bunch of type conversions. These are required to get piglit's idiv tests working. The unsigned<->float conversions are wrong, but are good enough to get piglit's small ranges of values working.	2015-04-13 21:36:40 -07:00
Eric Anholt	adae027260	vc4: Use the blit interface for updating shadow textures. This lets us plug in a better blit implementation and have it impact the shadow update, too.	2015-04-13 10:39:24 -07:00
Eric Anholt	39b6f7e76c	vc4: Remove dead fields from vc4_surface.	2015-04-13 10:39:24 -07:00
Eric Anholt	5100221ff7	vc4: Skip sending down the clear colors if not clearing.	2015-04-13 10:39:24 -07:00
Eric Anholt	725620f21d	vc4: Sync with kernel changes to relax BCL versus RCL validation. There was no reason to tie the two packets' values together.	2015-04-13 10:39:23 -07:00
Eric Anholt	cb88d2cfcb	vc4: Fix another space allocation mistake. We're over-allocating our BCL in vc4_draw.c, so this never mattered. However, new RCL-only blit support might end up here without having set up any BCL contents.	2015-04-13 10:39:02 -07:00
Eric Anholt	8eb9304ee7	vc4: Add missed accounting for the size of the semaphore. This wouldn't have mattered except in the worst case scenario RCL setup.	2015-04-13 10:33:30 -07:00
Rob Clark	b98c0262d1	freedreno/ir3/nir: couple little fixes Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:41:03 -04:00
Rob Clark	1b936bb9f8	freedreno/ir3/nir: handle system values Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:57 -04:00
Rob Clark	715b2e0dbb	freedreno/ir3/nir: handle txs and query_levels tex ops These correspond to the tgsi TXQ opcode (plus sneak in a fix for two-sided color) Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:43 -04:00
Rob Clark	97e8fc3fdd	freedreno/ir3/nir: split out tex helpers We'll need these in one or two other spots. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:36 -04:00
Rob Clark	6e8160d6e3	freedreno/ir3/nir: simplify emit_tex() Just build up arrays for src0/src1, and use create_collect().. Also add back missing .3d flag for 3d/cube textures. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:28 -04:00
Rob Clark	d5357c16cc	freedreno/ir3/cp: handle indirect properly I noticed some cases where we were trying to copy-propagate indirect src's into places they cannot go, like 2nd src for cat3 (mad, etc). Expand out valid_flags() to be aware of relativ flag, and fix up a few related spots. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:21 -04:00
Rob Clark	49be76166b	freedreno/ir3/sched: avoid getting stuck on addr conflicts When we get in a scenario where we cannot schedule any more instructions due to address register conflict, clone the instruction that writes the address register, and switch the remaining unscheduled users for the current address register over to the new clone. This is simpler and more robust than the previous attempt (which tried and sometimes failed to ensure all other dependencies of users of the address register were scheduled first).. hint it would try to schedule instructions that were not actually needed for any output value. We probably need to do the same with predicate register, although so far it isn't so heavily used so we aren't running into problems with it (yet). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:15 -04:00
Rob Clark	4cf4006674	freedreno/ir3/nir: add variable-indexing support A bit fugly.. try and make this cleaner.. note if we hoist all the get_addr() out of the loop we can drop the hashtable and just use create_addr().. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:09 -04:00
Rob Clark	972ce757d7	freedreno/ir3/asm: change assert to warning It probably should be an assert, but for now TGSI f/e isn't very good about dealing w/ CONST vs ABS/NEG. So for debug builds, print a warning instead of crashing with an assert for now. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:40:03 -04:00
Rob Clark	09cbd97a47	freedreno/ir3/nir: set first_driver_param Without this, a3xx breaks.. a4xx would too if it had already implemented support for passing driver params. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:39:56 -04:00
Rob Clark	f0e9a632a1	freedreno/ir3/cp: support to swap mad src's For a normal MAD (ie. not MADSH), if first source is gpr and second source is const, we can swap the first two sources to avoid needing a mov instruction. This gives back the biggest advantage TGSI f/e had over NIR f/e for common shaders, since TGSI f/e had this logic in the f/e. Note that doing this in copy-prop step has the advantage that it will also work for cases like: MOV TEMP[b], CONST[x] MAD TEMP[d], TEMP[a], TEMP[b], TEMP[c] Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 11:39:46 -04:00
Rob Clark	fd65122a90	gallium/ttn: add support for system values So far just the system values that freedreno supports, so we may add more later. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:16 -04:00
Rob Clark	2faa878f13	gallium/ttn: fix TXD With TXD we also have the ddx/ddy sources (before the sampler). Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:16 -04:00
Rob Clark	ca3ae90490	gallium/ttn: add TXQ support (v2) Split out from ttn_tex() since it is kind of a weird instruction that maps to two NIR opcodes, and it was cleaner this way. v2: query_levels doesn't take any args Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:15 -04:00
Rob Clark	0b71451920	gallium/ttn: split out helper to get texture info We'll need this as well for TXQ. Split this out first to reduce noise in the next patch. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:43:15 -04:00
Rob Clark	96c0f9328d	gallium/ttn: add support for temp arrays Since the rest of NIR really would rather have these as variables rather than registers, create a nir_variable per array. But rather than completely re-arrange ttn to be variable based rather than register based, keep the registers. In the cases where there is a matching var for the reg, ttn_emit_instruction will append the appropriate intrinsic to get things back from the shadow reg into the variable. NOTE: this doesn't quite handle TEMP[ADDR[]] when the DCL doesn't give an array id. But those just kinda suck, and should really go away. AFAICT we don't get those from glsl. Might be an issue for some other state tracker. v2: rework to use load_var/store_var with deref chains v3: create new "burner" reg for temporarily holding the (potentially writemask'd) dest after each instruction; add load_var to initialize temporary dest in case not all components are overwritten v4: review comments: asserts and use ttn_src_for_indirect() in ttn_array_deref() so we can drop later patch converting to use vec1 for addr reg (since ttn_src_for_indirect() handles the imov to vec1 from tgsi addr component that we want) v5: rebase: new requirements about parent mem ctx for derefs Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-11 10:41:45 -04:00
Rob Clark	b91d987140	gallium/ttn: minor cleanup Extract tgsi_dst->Index into a local.. split out from 'gallium/ttn: add support for temp arrays' for noise reduction.. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-11 10:24:50 -04:00
Nick Sarnie	f9048ee3c8	gallivm: Fix build since llvm-3.7.0svn r234495 Revert `50e9fa2ed6` as LLVM reverted their change. Signed-off-by: Nick Sarnie <commendsarnex@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>	2015-04-10 13:30:23 -04:00
Vinson Lee	50e9fa2ed6	gallivm: Fix build since llvm-3.7.0svn r234460. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89963 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-04-09 10:41:26 -07:00
Roland Scheidegger	a873b79fa5	draw: (trivial) don't print the shader twice with GALLIVM_DEBUG=tgsi (or ir) Neither the shader nor the key change when doing elts or linear variant, so this was just annoying (probably mildly useful at some point when we printed the IR per function too). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-04-09 01:32:30 +02:00
Roland Scheidegger	586536a4e1	gallivm: don't use control flow when doing indirect constant buffer lookups llvm goes crazy when doing that, using way more memory and time, though there's probably more to it - this points to a very much similar issue as fixed in `8a9f5ecdb1`. In any case I've seen a quite plain looking vertex shader with just ~50 simple tgsi instructions (but with a dozen or so such indirect constant buffer lookups) go from a terribly high ~440ms compile time (consuming 25MB of memory in the process) down to a still awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious improvements possible (but I have no clue why it's so slow...). The resulting shader is most likely also faster (certainly seemed so though I don't have any hard numbers as it may have been influenced by compile times) since generally fetching constants outside the buffer range is most likely an app error (that is we expect all indices to be valid). It is possible this fixes some mysterious vertex shader slowdowns we've seen ever since we are conforming to newer apis at least partially (the main draw loop also has similar looking conditionals which we probably could do without - if not for the fetch at least for the additional elts condition.) v2: use static vars for the fake bufs, minor code cleanups Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-04-09 01:32:30 +02:00
Dave Airlie	6b722c390b	u_tile: fix warnings about incompatible casts. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 10:31:42 +10:00
Glenn Kennard	f2947807c8	r600g/sb: Enable SB for geometry shaders Add SV_GEOMETRY_EMIT special variable type to track the implicit dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so GCM/scheduler doesn't reorder them. Mark emit instructions as unkillable so DCE doesn't eat them. Enable only for evergreen/cayman as there are a few unexplained GS piglit regressions on R6xx/R7xx with SB enabled otherwise. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:18:35 +10:00
Glenn Kennard	06bb68da4a	r600g/sb: Update last_cf for loops CF_END could end up emitted in the middle of a shader on cayman when there was a loop at the very end. Fixes glsl-1.50-geometry-end-primitive and ext_transform_feedback-geometry-shaders-basic piglit tests. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:18:17 +10:00
Dave Airlie	61393bdcdc	u_tile: fix stencil texturing tests under softpipe arb_stencil_texturing-draw failed under softpipe because we got a float back from the texturing function, and then tried to U2F it, stencil texturing returns ints, so we should fix the tiling to retrieve the stencil values as integers not floats. Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-08 08:17:32 +10:00
Ilia Mirkin	ae720c66cb	nv50,nvc0: limit the y-tiling of 3d textures to the first level's tiling We limit y-tiling to 0x20 when depth is involved. However the function is run for each miplevel, and the hardware expects miplevel 0 to have the highest tiling settings. Perform the y-tiling limit on all levels of a 3d texture, not just the ones that have depth. Fixes: texelFetch fs sampler3D 98x129x1-98x129x9 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nick Tenney <nick.tenney@gmail.com> # GT216 Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-04-06 23:06:55 -04:00
Dave Airlie	ad84689f73	r600g: fix op3 abs issue This code to handle absolute values on op3 srcs was a bit too simple, it really needs a temp reg per src, not one per channel, make it easier and let sb clean up the mess. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89831 Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-07 11:40:16 +10:00
Rob Clark	8b0b81339b	freedreno/ir3: add NIR compiler The NIR compiler frontend is an alternative to the TGSI f/e, producing the same ir3 IR and using the same backend passes for scheduling, etc. It is not enabled by default yet, as there are still some regressions. To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for example, xonotic or supertuxkart. With the NIR f/e, scalarizing and a number of other lowering steps happen in NIR, so we don't have to do them in ir3. Which simplifies the f/e and allows the lowered instructions to pass through other optimization stages. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:40 -04:00
Ilia Mirkin	700d949ea1	freedreno/a3xx: don't decode srgb on mem2gmem Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	b060b56772	freedreno/a3xx: pass sprite coord mode through to program emit Use the correct sprite replacement depending on the flip of the coord mode, using either T or 1-T depending on whether we have an upper-left or lower-left coordinate origin. This fixes all the point sprite piglits. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	1de72dfc8a	freedreno/a3xx: add UBO support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	c7811f56c2	freedreno/ir3: insert nop between sfu/mem operations Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	14dfd8cc43	freedreno: dirty context when reallocating a bound bo Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	bde2045fa2	freedreno: keep track of buffer valid ranges Copies nouveau_buffer and radeon_buffer. This allows a write to proceed to an uninitialized part of a buffer even when the GPU is using the previously-initialized portions. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:35 -04:00
Ilia Mirkin	dacf22e0a3	freedreno: mark resources as being read so that writes flush the queue Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	2e1445c8f3	freedreno: don't bother setting resource timestamps Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to keep separate timestamps around. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	1fee3061d5	freedreno: add a reading flag to indicate gpu is reading rsc Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	ea0952a9db	freedreno: fix resource flushing confusion A resource flush is an upload of a hypothetically-staging texture to the GPU. For a UMA system, this will largely be a no-op or cache-maintenance. Move the render flush logic into transfer_map where it belongs, and clear out the transfer_flush function. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Ilia Mirkin	bfb0a8eb69	freedreno: remove tex_resource pipe_sampler_view already contains a texture, remove the redundant tex_resource member which pointed at the same thing. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-05 16:36:34 -04:00
Rob Clark	6cd9c94ce4	freedreno/ir3: handle FRAG IN's without interpolation specified Fallback to picking based on semantic name. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	f513f006ce	freedreno/ir3/cmdline: add @const headers for immediates Since NIR f/e currently encodes immediates in instructions (rather than passing via const), we need to ensure that when const's are used they get initialized to the proper values. Otherwise comparing NIR to TGSI compiler, it will use proper immediate values in one case, and randomly initialize values in the other. Which confuses ir3test. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	6bc12bb5fd	freedreno/ir3/cmdline: remove hack for old compiler Since we dropped the old compiler, we don't need this hack anymore. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	f370e95421	freedreno/ir3: handle const/immed/abs/neg in cp Be smarter about propagating copies from const or immed, or with abs/neg modifiers. Also, realize that absneg.s and absneg.f are really "fancy" mov instructions. This opens up the possibility to remove more copies. It helps the TGSI frontend a bit, but will be really needed for the NIR f/e which builds everything up in SSA form (ie. will always insert a mov from const or immediate). Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 16:36:34 -04:00
Rob Clark	104713d9f2	freedreno/ir3: split float/int abs/neg Even though in the end, they map to the same bits, the backend will need to be able to differentiate float abs/neg vs integer abs/neg. Rather than making the backend figure it out based on instruction opcode (which when combined with mov/absneg instructions, can be awkward), just split out different flags for each so the frontend can signal its intentions more clearly. Also, since (neg) for bitwise op's is actually a bitwise-not, split it out into bnot flag. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	203f37540a	freedreno/ir3: add ir3 builder helpers Add helpers for constructing SSA forms of instructions. Only partial cat5/cat6 coverage.. but we can add stuff as needed. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	b1c9fb9fca	freedreno/ir3: fix sam argument order comment Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 12:44:01 -04:00
Rob Clark	101142c401	xa: support for drivers which use NIR We need to pull in libnir.la and its dependency libglsl_util.la. Also, _mesa_error_no_memory() must be defined. Fortunately with libnir.la (vs pulling in all of libglsl.la) we don't also need libstdc++. Signed-off-by: Rob Clark <robclark@freedesktop.org>	2015-04-05 09:24:17 -04:00
Rob Clark	52282fa42d	gallium/ttn: MOD is an integer instruction Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-05 09:24:17 -04:00
Rob Clark	7579ae422a	gallium/ttn: add UMAD Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2015-04-05 09:24:17 -04:00
Ilia Mirkin	ba353935a3	nv50: allocate more offset space for occlusion queries Commit `1a170980a0` started writing to q->data[4]/[5] but kept the per-query space at 16, which meant that in some cases we would write past the end of the buffer. Rotate by 32, like nvc0 does. This ensures that we always have 32 bytes in front of us, and the data writes will go within the allocated space. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89679 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nick Tenney <nick.tenney@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>	2015-04-04 11:30:03 -04:00
Ilia Mirkin	01d3b750b3	nv50/ir: avoid folding immediates into imad operations Commit `09ee907266` added logic to fold immediates into mad operations, but the emission code is only there for fmad. Only allow it on float types. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 18:42:31 -04:00
Ilia Mirkin	603d28f32c	nv50/ir: fix imad emission when dst == src2 Commit `fb63df2215` added 4-byte mad support, but only supported emission for floats. Disable it for ints for now. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 18:35:59 -04:00
Eric Anholt	a9152376b4	vc4: Add support for nir_iabs. Tested using the GLSL 1.30 tests for integer abs(). Not currently used, but it was one of the new opcodes used by robclark's idiv lowering.	2015-04-02 10:32:35 -07:00
Ilia Mirkin	4a3c0e9950	freedreno/a3xx: add MRT support The hardware only supports 4 MRTs. It should be possible to emulate support for 8, but doesn't seem worth the trouble. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	6f4c1976f4	freedreno: convert blit program to array for each number of rts Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	d9992ab35a	freedreno: add support for laying out MRTs in gmem Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	602bc6c88d	freedreno: add core infrastructure support for MRTs Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	d13803c76f	freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS property This will enable the driver to tell which regids to link up to which MRT outputs. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	f27ec59084	freedreno/a3xx: add independent blend function support This is needed for MRT support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Ilia Mirkin	8efa3e340d	freedreno: remove alpha key from ir3_shader This complication is unnecessary and makes MRTs more complicated and likely to generate tons of variants. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-04-02 00:09:14 -04:00
Stéphane Marchesin	70eed78cac	i915g: Implement EGL_EXT_image_dma_buf_import This adds all the plumbing to get EGL_EXT_image_dma_buf_import in i915g. Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>	2015-04-01 20:13:37 -07:00
Emil Velikov	d99135b2e9	configure: nuke --with-max-{width,height} Unused as of commit 630ab0d27ba(mesa: remove last of MAX_WIDTH, MAX_HEIGHT). Update all the remaining references to the defines. v2: Use the correct variable name in the comments Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2015-04-01 19:43:34 +00:00
Emil Velikov	bd4925c6ac	gallium: ship tgsi_to_nir.h in the tarball Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2015-04-01 19:33:37 +00:00
Jose Fonseca	3321724c10	automake,scons: Put NIR source files in a separate var to fix SCons build. SCons does not build NIR yet. Trivial.	2015-04-01 19:49:09 +01:00
Jose Fonseca	7f0682cebf	automake: Fix out-of-source builds. Add include path for generated nir_opcodes.h. Trivial.	2015-04-01 19:48:09 +01:00
Eric Anholt	26261bca21	vc4: Add shader-db dumping of NIR instruction count. I was previously using temporary disables of VC4 optimization to show the benefits of improved NIR optimization, but this can get me quick and dirty numbers for NIR-only improvements without having to add hacks to disable VC4's code (disabling of which might hide ways that the NIR changes would hurt actual VC4 codegen).	2015-04-01 10:57:01 -07:00
Eric Anholt	73e2d4837d	vc4: Convert to consuming NIR. NIR brings us better optimization than I would have bothered to write within the driver, developers sharing future optimization work, and the ability to share device-specific lowering code that we and other GLES2-level drivers need. total uniforms in shared programs: 13421 -> 13422 (0.01%) uniforms in affected programs: 62 -> 63 (1.61%) total instructions in shared programs: 39961 -> 39707 (-0.64%) instructions in affected programs: 15494 -> 15240 (-1.64%) v2: Add missing imov support, and assert that there are no dest saturates. v3: Rebase on the target-specific algebraic series. v4: Rebase on gallium-includes-from-NIR changes in master. v5: Rebase on variables being in lists instead of hash tables. v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which I'm not committing)	2015-04-01 10:57:01 -07:00
Eric Anholt	783ad697d2	gallium: Add tgsi_to_nir to get a nir_shader for a TGSI shader. This will be used by the VC4 driver for doing device-independent optimization, and hopefully eventually replacing its whole IR. It also may be useful to other drivers for the same reason. v2: Add all of the instructions I was relying on tgsi_lowering to remove, and more. v3: Rebase on SSA rework of the builder. v4: Use the NIR ineg operation instead of doing a src modifier. v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I expect, again). v6: Fix handling of multi-channel KILL_IF sources. v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than a vector load_const. CSE doesn't recognize that srcs out of those channels are actually all the same. v8: Rebase on nir_builder auto-sizing, make the scalar arguments to non-ALU instructions actually be scalars. v9: Add support for if/loop instructions, additional texture targets, and untested support for indirect addressing on temps. v10: Rebase on master, drop bad comment about control flow and just choose the X channel, use int comparison opcodes in LIT for now, drop unused pipe_context argument.. v11: Fix translation of LRP (previously missed because I mis-translated back out), use nir_builder init helpers. v12: Rebase on master, adding explicit include of mtypes.h to get INTERP_QUALIFIER_* v13: Rebase on variables being in lists instead of hash tables, drop use of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder swizzle and fmov/imov_alu helpers, drop "struct" in front of nir_builder, use nir_builder directly as the function arg in a lot of cases, drop redundant members of ttn_compile that are also in nir_builder, drop some half-baked malloc failure handling. v14: The indirect uniform src0 should be scalar, not vector (noticed as odd by robclark, confirmed by cwabbott). Apply Ken's review to initialize s->num_uniforms and friends, skip ttn_channel for dot products, and use the simpler discard_if intrinsic. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v13) Acked-by: Rob Clark <robclark@freedesktop.org>	2015-04-01 10:57:01 -07:00
Eric Anholt	486dcfbbd9	vc4: Tell shader-db how big our UBOs are, if present. I had regressed them for a while with the NIR work.	2015-04-01 10:57:01 -07:00
Roland Scheidegger	e3252defd2	gallivm: (trivial) fix the logic deciding if function call should be used... Copy and paste bug with the img filter decision. Since there's only 2 different filters anyway just drop this bit.	2015-04-01 13:26:19 +02:00
Dave Airlie	8f7338f284	egl: add initial EGL_MESA_image_dma_buf_export v2.4 At the moment to get an EGL image to a dma-buf file descriptor, you have to use EGL_MESA_drm_image, and then use libdrm to convert this to a file descriptor. This extension just provides an API modelled on EGL_MESA_drm_image, to return a dma-buf file descriptor. v2: update spec for new API proposal add internal queries to get the fourcc back from intel driver. v2.1: add gallium pieces. v2.2: add offsets to spec and API, rename fd->fds, stride->strides in API. rewrite spec a bit more, add some q/a v2.3: add modifiers to query interface and 64-bit type for that (Daniel Stone) specify what happens to num fds vs num planes differences. (Chad Versace) v2.4: fix grammar (Daniel Stone) Signed-off-by: Dave Airlie <airlied@redhat.com>	2015-04-01 14:10:04 +10:00
Roland Scheidegger	611bd80f3b	gallivm: do some hack heuristic to disable texture functions We've seen some cases where performance can hurt quite a bit. Technically, the more simple the function the more overhead there is for using a function for this (and the less benefits this provides). Hence don't do this if we expect the generated code to be simple. There's an even more important reason why this hurts performance, which is shaders reusing the same unit with some of the same inputs, as llvm cannot figure out the calculations are the same if they are performed in the function (even just reusing the same unit without any input being the same provides such optimization opportunities though not very much). This is something which would need to be handled by IPO passes however.	2015-04-01 00:56:12 +02:00
Marcin Ślusarz	f9e2295560	nouveau: synchronize "scratch runout" destruction with the command stream When nvc0_push_vbo calls nouveau_scratch_done it does not mean scratch buffers can be freed immediately. It means "when hardware advances to this place in the command stream the scratch buffers can be freed". To fix it, just postpone scratch runout destruction after current fence is signalled. The bug existed for a very long time. Nobody noticed, because "scratch runout" code path is rarely executed. Fixes hang at the very beginning of first mission in "Serious Sam 3" on nve7/gk107. It manifested as: nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]] Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-31 22:04:31 +02:00
Tom Stellard	fda7558057	clover: Return CL_BUILD_ERROR for CL_PROGRAM_BUILD_STATUS when compilation fails v2 v2: - Don't use _errs map Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2015-03-31 15:40:51 +00:00
Tom Stellard	4c53d2acbb	radeonsi/compute: Default to the same PIPE_SHADER_CAP values as other shader types v2 v2: - Fix typo Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2015-03-31 15:40:51 +00:00
Leo Liu	a714fbacf7	radeon/vce: implement video usability information support This will help encoding VUI into the bitstream v2: make backward compatible Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2015-03-31 12:31:58 -04:00
Leo Liu	8e3668a7c0	st/omx/enc: export framerate to vce driver The framerate will be used for video usability info support by VCE driver Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2015-03-31 12:31:58 -04:00
Roland Scheidegger	489866938f	llvmpipe: enable ARB_texture_gather Just announce support for 4 components. While here also increase the max/min texel offsets (the limit is completely artificial, was chosen because that's what other hardware did, however there's other drivers using larger limits). Over a thousand little piglits skip->pass. v2: update docs/GL3.txt Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Roland Scheidegger	0753b135f6	gallivm: implement TG4 for ARB_texture_gather This is quite trivial, essentially just follow all the same code you'd use with linear min/mag (and no mip) filter, then just skip the filtering after looking up the texels in favor of direct assignment of the right channel to the result. (This is though not true for the multi-offset version if we'd want to support it - for this would probably need to do something along the lines of 4x nearest sampling due to the necessity of doing coord wrapping individually per texel.) Supports multi-channel formats. From the SM5 gather cap bit, should support non-constant offsets, plus shadow comparisons (the former untested), but not component selection (should be easy to implement but all this stuff is not really exposable anyway for now). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Roland Scheidegger	73c6914195	gallivm: add gather support to sampler interface Luckily thanks to the revamped interface this is a lot less work now... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Roland Scheidegger	1863ed21ff	gallivm: simplify sampler interface This has got a bit out of control with more and more parameters added. Worse, whenever something in there changes all callees have to be updated for that, even though they don't really do much with any parameter in there except pass it on to the actual sampling function. Hence simply put almost everything into a struct. Also instead of relying on some arguments being NULL, be explicit and set this in a key (which is just reused for function generation for simplicity). (The code still relies on them being NULL in the end for now.) Technically there is a minimal functional change here for shadow sampling: if shadow sampling is done is now determined explicitly by the texture function (either sample_c or the gl-style tex func inherit this from target) instead of the static texture state. These two should always match, however. Otherwise, it should generate all the same code. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 17:23:51 +02:00
Jose Fonseca	0fc5b80e7a	util/debug: Update MgwHelp link, drop BfdHelp link.	2015-03-31 09:42:06 +01:00
Michel Dänzer	b8797a7875	gallivm: Fix build against LLVM 3.7 SVN r233648 Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2015-03-31 15:05:01 +09:00
Eric Anholt	1dcc1ee314	vc4: Drop integer multiplies with 0 to moves of 0. This cleans up more instructions generated by uniform array indexing multiplies. total instructions in shared programs: 39989 -> 39961 (-0.07%) instructions in affected programs: 896 -> 868 (-3.12%)	2015-03-30 12:57:45 -07:00
Eric Anholt	8c5dcdbccb	vc4: Add a constant folding pass. This cleans up some pointless operations generated by the in-driver mul24 lowering (commonly generated by making a vec4 index for a matrix in a uniform array). I could fill in other operations, but pretty much anything else ought to be getting handled at the NIR level, I think. total uniforms in shared programs: 13423 -> 13421 (-0.01%) uniforms in affected programs: 346 -> 344 (-0.58%)	2015-03-30 12:57:45 -07:00
Eric Anholt	c519c4d85e	vc4: Don't bother masking out the low 24 bits for integer multiplies The hardware just uses the low 24 lines, saving us an AND to drop the high bits. total uniforms in shared programs: 13433 -> 13423 (-0.07%) uniforms in affected programs: 356 -> 346 (-2.81%) total instructions in shared programs: 40003 -> 39989 (-0.03%) instructions in affected programs: 910 -> 896 (-1.54%)	2015-03-30 09:23:39 -07:00
Eric Anholt	5df8bf86fe	vc4: Make integer multiply use 24 bits for the low parts. The hardware uses the low 24 bits in integer multiplies, so we can have fewer high bits (and so probably drop them more frequently).	2015-03-30 09:23:39 -07:00
Michel Dänzer	d64adc3a79	radeonsi: Cache LLVMTargetMachineRef in context instead of in screen Fixes a crash in genymotion with several threads compiling shaders concurrently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89746 Cc: 10.5 <mesa-stable@lists.freedesktop.org> Reviewed-by: Tom Stellard <thomas.stellard@amd.com>	2015-03-30 15:15:10 +09:00
Ilia Mirkin	ee670c9efa	freedreno/a3xx: add support for point sprite coordinate replacement This does not (yet) support different coordinate origins, so the tests still fail due to fbo flipping. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	995f55a6ce	freedreno/a3xx: make vs-set point size work This appears to need the A2XX version of the point list, so select it at draw time if necessary. Experimentally, always using the A2XX version causes hangs when PSIZE isn't actually emitted. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00
Ilia Mirkin	7fc5da8b93	freedreno/a3xx: point size should not be divided by 2 The division is probably a holdover from the days when the fixed point inline functions generated by headergen were broken. Also reduce the maximum point size to 4092 (vs 4096), which is what the blob does. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2015-03-28 14:54:41 -04:00


23528 Commits