mirrors/mesa - Frog Git

Commit Graph

Author	SHA1	Message	Date
Vedran Miletić	8e430ff8b0	clover: adapt to new error API since LLVM r286752 Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2016-11-14 15:50:29 +00:00
Tim Rowley	c8a51fa75d	swr: [rasterizer core] remove driverType Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:03:10 -06:00
Tim Rowley	ddc898aaf3	swr: [rasterizer archrast] move to pass by value Move to pass by value since most events are very small in size. We can look at pass by reference but will need to create multiple versions to handle temp objects. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:03:04 -06:00
Tim Rowley	23e459b606	swr: [rasterizer core] add mode for aux buffer in the SWR_SURFACE_STATE Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:59 -06:00
Tim Rowley	e9a3ad164d	swr: [rasterizer common] don't bleed NOMINMAX definition after <windows.h> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:53 -06:00
Tim Rowley	cd8d840ce1	swr: [rasterizer archrast] add events Added events for tracking early/late Depth and stencil events, TE patch info, GS prim info, and FrontEnd/BackEnd DrawEnd events. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:48 -06:00
Tim Rowley	7c3ca2e704	swr: [rasterizer core] fix culling issues - Do proper culling of wireframe triangles (including non-culling of degenerates) - Fix degenerate culling of CCW front-facing triangles in wireframe and conservative rast Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:42 -06:00
Tim Rowley	cee66dd2aa	swr: [rasterizer core/jitter] fix alpha test bug Alpha from render target 0 should always be used for alpha test for all render targets, according to GL and DX9 specs. Previously we were using alpha from the current render target. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:36 -06:00
Tim Rowley	5912552947	swr: [rasterizer core] various code style changes Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:31 -06:00
Tim Rowley	584b65ad44	swr: [rasterizer archrast] don't generate empty files Don't generate files when no events have been generated outside the header events. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:25 -06:00
Tim Rowley	e6f7d8a094	swr: [rasterizer archrast] fix open file handle limit issue Buffer events ourselves and write the contents to file when the buffer is full or when we're destroying the context. Previously, we were relying on ofstream to buffer for us. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:17 -06:00
Tim Rowley	2c697754a9	swr: [rasterizer archrast] fix double free issue Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:11 -06:00
Tim Rowley	dc8408920c	swr: [rasterizer core] separate frontend/backend stats enables Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:02:04 -06:00
Tim Rowley	937b7d8e5a	swr: [rasterizer core] 16-wide tile store nearly completed * All format combinations coded * Fully emulated on AVX2 and AVX * Known issue: the MSAA sample locations need to be adjusted for 8x2 Set ENABLE_AVX512_SIMD16 and USE_8x2_TILE_BACKEND to 1 in knobs.h to enable Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2016-11-14 09:00:59 -06:00
Jonas Pfeil	5debfeb86f	vc4: Add simulator kernel validation for multithreaded fragment shaders. This is Jonas Pfeil's code from the kernel, brought back to Mesa by anholt.	2016-11-12 19:21:46 -08:00
Eric Anholt	96ffee2d02	vc4: Mark threaded FSes as non-singlethread in the CL.	2016-11-12 19:21:46 -08:00
Eric Anholt	ace0d810e5	vc4: Flag the last thread switch in the program as the last. We don't allow the last thread switch to be inside control flow, to be sure that we hit the last state exactly once. If the last texturing was in control flow, fall back to single threaded.	2016-11-12 19:21:46 -08:00
Eric Anholt	67f72c5f5d	vc4: Add THRSW nodes after each tex sample setup in multithreaded mode. This is a suboptimal implementation, but Jonas Pfeil found that it was still a massive performance gain.	2016-11-12 19:21:46 -08:00
Eric Anholt	e3c620e868	vc4: Add some spec citations about texture fifo management.	2016-11-12 18:46:35 -08:00
Eric Anholt	fd2aff858b	vc4: Use ra14/rb14 as the spilling registers. This makes the raddr fixups compatible with FS threading.	2016-11-12 18:46:35 -08:00
Eric Anholt	755037173d	vc4: Add support for register allocation for threaded shaders. We have two major requirements: Make sure that only the bottom half of the physical reg space is used, and make sure that none of our values are live in an accumulator across a switch.	2016-11-12 18:46:35 -08:00
Eric Anholt	fdad4d2402	vc4: Split register class setup for physical files from accumulators.	2016-11-12 18:46:35 -08:00
Eric Anholt	8e704dca7f	vc4: Use register allocator CLASS_BIT_R0_R3 to clean up CLASS_B. We have had no reason to separate ability to store in an accumulator from ability to store in B, but with FS threading, we need to be able to force values to be stored only in the physical regfiles.	2016-11-12 18:46:35 -08:00
Eric Anholt	1ee503c74d	vc4: Add support for QPU scheduling of thread switch instructions. This is vaguely based off of Jonas Pfeil's thread switch support branch.	2016-11-12 18:46:35 -08:00
Eric Anholt	4f527f1260	vc4: Add a thread switch QIR instruction. This will eventually be generated at the QIR level, so that vc4_qir_schedule.c can arrange the separation of tex_strb from tex_result correctly. It will also be important so that register allocation sets the register classes appropriately for values that are live across the switch.	2016-11-12 18:46:35 -08:00
Eric Anholt	93cdae44de	vc4: Add a bit of QPU validation for threaded shaders. These are both bugs we've run into along the way writing multithreaded FS support.	2016-11-12 18:46:35 -08:00
Eric Anholt	977d8b526b	vc4: Fix register class handling of DDX/DDY arguments. I had this exactly backwards, but apparently the piglit tests were all landing in r0-r3 anyway. Cc: "13.0" <mesa-stable@lists.freedesktop.org>	2016-11-12 18:46:35 -08:00
Rob Clark	dfc001dccc	freedreno/ir3: fixup ralloc fallout Fixes fallout from `acc23b04` ("ralloc: remove memset from ralloc_size"). We were still depending on zero'd allocations in a couple of places. Signed-off-by: Rob Clark <robdclark@gmail.com>	2016-11-12 08:57:03 -05:00
Laurent Carlier	3ff9f8c532	clover: fix building since llvm r286566 pretty trivial fix	2016-11-11 19:45:22 +00:00
Samuel Pitoiset	561f2208bd	nvc0: support MP performance counters on Maxwell This adds some performance counters/metrics for SM50/SM52. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Pierre Moreau <pierre.morrow@free.fr>	2016-11-10 22:13:49 +01:00
Tim Rowley	b9578b683d	gallium: detect avx512 cpu features v3: fix check for xmm/ymm test v2: style code, add avx512 to cpu dump Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-11-10 15:03:21 -06:00
Marek Olšák	ce3f453f01	radeonsi: fix r600_texture::tc_compatible_htile htile_size is now always non-zero if HTILE is allocated. It seems to have caused no issues. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-10 18:34:55 +01:00
Marek Olšák	ce3189cbe6	radeonsi: accept is_store in image_fetch_rsrc instead of dcc_off Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-10 18:34:55 +01:00
Marek Olšák	f83b2f524a	radeonsi: don't rely on tgsi_scan::images_buffers the instruction knows the target Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-10 18:34:55 +01:00
Marek Olšák	4e00e20074	radeonsi: re-order cases in si_get_shader_param Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-10 18:34:55 +01:00
Marek Olšák	3f6e0063c8	radeonsi: increase MAX_CONTROL_FLOW_DEPTH AKA MaxIfDepth we don't want to lower deep IFs unconditionally Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-10 18:34:55 +01:00
Nicolai Hähnle	b21912e2e9	radeonsi: fix/silence unused variable warnings in optimized builds I'm leaving num_out_sgpr around since it's not in a fast path, and besides the compiler should be able to optimize it away easily. The alternative with #if/#endif would be extremely ugly. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2016-11-10 13:18:16 +01:00
Nicolai Hähnle	b46a9c570f	gallivm: fix [IU]MUL_HI regression harder The fix in commit `88f791db75` was insufficient for radeonsi because the vector case was not handled properly. It seems piglit only covers the scalar case, unfortunately. Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.* Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2016-11-10 13:17:10 +01:00
Ilia Mirkin	828faaef40	swr: correct setting of independentAlphaBlendEnable This setting is for whether color and alpha have different blend settings, not for whether blending is enabled on a per-RT basis. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-09 20:11:57 -05:00
Ilia Mirkin	5be635d5e4	swr: [rasterizer] add a .dir-locals.el to support 4-space indents Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-09 20:11:39 -05:00
Ilia Mirkin	36e5d68cad	swr: set halfz rasterizer setting Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-09 20:11:10 -05:00
Ilia Mirkin	4b5b87e7ab	swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping With ARB_clip_control, GL may also do 0..1 depth clipping, not just -1..1. This removes clip's reliance on driver type. DX users will need to be updated to set the new clipHalfZ flag to get proper clipping functionality. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-09 20:10:52 -05:00
Ilia Mirkin	4af25e7131	swr: fix support for inverted depth scales Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-09 20:10:44 -05:00
Ilia Mirkin	aed517f985	swr: [rasterizer jitter] fix logic op to work with unorm/snorm Most logic op usage is probably going to end up with normalized textures. Scale the floating point values and convert to integer before performing the logic operations. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>	2016-11-09 20:10:25 -05:00
Eric Anholt	08d51487e3	vc4: Clamp the shadow comparison value. Fixes piglit glsl-fs-shadow2D-clamp-z. Cc: <mesa-stable@lists.freedesktop.org>	2016-11-09 15:33:56 -08:00
Eric Anholt	e887341d3f	vc4: Don't pair up TLB scoreboard locking instructions early in QPU sched. Jonas Pfeil noticed that we were putting passthrough tlb_z writes early in the shader, despite QIR and QPU scheduling both trying to delay scoreboard locking for as long as possible. The problem was that when trying to pair up QPU instructions, at some point the passthrough tlb_z would be the last one available and it would get paired, even if the other half would open up other instructions to be scheduled and we could have paired tlb_z with something later in the program. Also, since passthrough z is just a mov, it pairs up really easily. The proper fix would probably be to flip the order of scheduling instructions so we went from bottom to top (also relevant for branch delay slot scheduling). However, we can do a quick fix here to just not schedule a TLB lock until there's nothing but TLB left in the program, at a slight instruction cost (est .61% cycle count in shader-db) but a major fragment shader parallelism win. glmark2 results: texture:texture-filter=linear: +1.24481% +/- 0.626117% (n=15) bump:bump-render=height: 1.24991% +/- 0.154793% (n=136,133 -- screensaver outliers removed)	2016-11-09 15:33:56 -08:00
Eric Anholt	695a2e2ffa	vc4: Print a reg pressure estimate in our reg allocation failure dump.	2016-11-09 15:33:56 -08:00
Eric Anholt	4d019bd703	vc4: Don't abort when a shader compile fails. It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails. Cc: <mesa-stable@lists.freedesktop.org>	2016-11-09 15:33:56 -08:00
Aaron Watry	1492633070	llvmpipe: Fix build after removal of deprecated attribute API v2 Applies on top of v3 of Tom's gallivm change. v2: - Tom Stellard: Use enums instead of strings. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Aaron Watry <awatry@gmail.com> CC: Tom Stellard <thomas.stellard@amd.com> CC: Jan Vesely <jan.vesely@rutgers.edu>	2016-11-09 20:13:27 +00:00
Tom Stellard	8bdd52c8f3	gallivm: Fix build after removal of deprecated attribute API v3 v2: Fix adding parameter attributes with LLVM < 4.0. v3: Fix typo. Fix parameter index. Add a gallivm enum for function attributes. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2016-11-09 20:13:27 +00:00

29279 Commits