KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Roland Scheidegger	bc86e829a5	gallivm: optimize lp_build_unpack_arith_rgba_aos slightly This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite the hack... That said, this path is taken for AoS sampling for small unorm (smaller than rgba8) formats, and it is completely hopeless even with those changes, with or without AVX. (Probably should have some code similar to the one in the llvmpipe fs backend code, using bit replication to extend to rgba8888 - rounding is not quite 100% accurate but if it's good enough there it should be here as well.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-01-05 23:59:38 +01:00
Roland Scheidegger	a03a2ac6fd	gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if we use single source vectors at a time, afterwards we immediately reassemble the vectors). For AVX though this isn't really possible, since we expect 128bit output already for a single 256bit input. (One day we should handle AVX2 which again would need multiple inputs, however there's the problem that we get different ordered output there and we don't want to reorder, so would need to be able to tell build_conv to handle upper and lower halves independently.) A similar strategy would probably work for 32->8bit too (if it doesn't hit the special case) but I'm going to try something different for that... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-01-05 23:59:38 +01:00
Roland Scheidegger	db7e786a25	llvmpipe: (trivial) minimally simplify mask construction SIMD instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need an all-zero as well as an all-one (for the pxor) reg. Also add code to avoid scalar expansion of i1 values which we definitely shouldn't do. There are problems with this though with llvm select interaction, so it's disabled (basically using llvm select instead of intrinsics may still produce atrocious code, even in cases where we figured it should not, albeit I think this could probably be fixed with some better selection of optimization passes, but I have zero idea there really). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2017-01-05 23:59:38 +01:00
Lionel Landwerlin	a8eeb089c0	anv: fix multiple creation with internal failure The specification section 9.4 says : When an application attempts to create many pipelines in a single command, it is possible that some subset may fail creation. In that case, the corresponding entries in the pPipelines output array will be filled with VK_NULL_HANDLE values. If any pipeline fails creation (for example, due to out of memory errors), the vkCreate*Pipelines commands will return an error code. The implementation will attempt to create all pipelines, and only return VK_NULL_HANDLE values for those that actually failed. Fixes : dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline v2: C is hard let's go shopping (Lionel) v3: Remove unnecessary condition in for loops (Lionel) v4: Document why we return on first failure (Eduardo) Move i declaration inside for() (Eduardo) v5: Move array cleanup out of loop (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2017-01-05 21:09:09 +00:00
Tim Rowley	33fa4c99f7	swr: [rasterizer core/common/jitter] gl_double support Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99214 Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-01-05 14:10:36 -06:00
Fredrik Höglund	b6670157d7	dri3: Fix MakeCurrent without a default framebuffer In OpenGL 3.0 and later it is legal to make a context current without a default framebuffer. This has been broken since DRI3 support was introduced. Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2017-01-05 20:52:01 +01:00
Marek Olšák	e16245b339	radeonsi: turn SDMA IBs into de-facto preambles of GFX IBs Draw calls no longer flush SDMA IBs. r600_need_dma_space is responsible for synchronizing execution between both IBs. Initial buffer clears and fast clears will stay unflushed in the SDMA IB (up to 64 MB) as long as the GFX IB isn't flushed either. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:24 +01:00
Marek Olšák	cba9d59362	radeonsi: implement SDMA-based buffer clearing for SI Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:24 +01:00
Marek Olšák	29d6a367a6	radeonsi: do all math in bytes in SI DMA code Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:24 +01:00
Marek Olšák	9e1aa81dfe	gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_space Call r600_dma_emit_wait_idle only when there is a possibility of a read-after-write hazard. Buffers not yet used by the SDMA IB don't have to wait. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:24 +01:00
Marek Olšák	3be8336440	gallium/radeon: move unrelated code from dma_emit_wait_idle to need_dma_space r600_dma_emit_wait_idle is going away in its current form. The only difference is that the moved code is executed before DMA calls instead of after them. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:24 +01:00
Marek Olšák	973d7cd90a	radeonsi: inline cik_sdma_do_copy_buffer Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:23 +01:00
Marek Olšák	067a3237b9	radeonsi: also wait for SDMA in the clear_buffer CPU fallback Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:23 +01:00
Marek Olšák	f6a1c2d883	radeonsi: simplify r600_resource typecasts in si_clear_buffer Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:23 +01:00
Marek Olšák	a31a92e7ef	radeonsi: always use SDMA for big buffer clears and first buffer uses Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:23 +01:00
Marek Olšák	69f489dfa1	radeonsi: use SDMA in rvid_buffer_clear on CIK-VI Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:23 +01:00
Marek Olšák	9a3296bf1c	radeonsi: use SDMA for initial clearing of DCC/CMASK/HTILE on CIK-VI Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:23 +01:00
Marek Olšák	d4c0ad4de8	radeonsi: implement SDMA-based buffer clearing for CIK-VI Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:43:23 +01:00
Marek Olšák	431742dbba	gallium/hud: increase the vertex buffer size for text Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:30:00 +01:00
Marek Olšák	6d54cd75a8	gallium/hud: add an option to sort items below graphs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:30:00 +01:00
Marek Olšák	80b8b9c8a4	gallium/hud: add an option to reset the color counter Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:30:00 +01:00
Marek Olšák	a57e071e9e	gallium/hud: allow more data sources per pane Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:30:00 +01:00
Marek Olšák	e8bb97ce30	gallium/hud: add an option to rename each data source useful for radeonsi performance counters v2: allow specifying both : and = Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:30:00 +01:00
Marek Olšák	d995115b17	gallium: remove TGSI_OPCODE_SUB It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:30:00 +01:00
Marek Olšák	a4ace98a97	gallium: remove TGSI_OPCODE_ABS It's redundant with the source modifier. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-01-05 18:30:00 +01:00
Axel Davy	09d09b219e	st/nine: Remove all usage of ureg_SUB in nine_shader This is required to drop gallium SUB. Signed-off-by: Axel Davy <axel.davy@ens.fr> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-01-05 18:30:00 +01:00
Axel Davy	67cda68bba	st/nine: Remove all usage of ureg_SUB in nine_ff This is required to remove gallium SUB. Signed-off-by: Axel Davy <axel.davy@ens.fr> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-01-05 18:30:00 +01:00
Axel Davy	caf93f5311	st/nine: Do not map SUB and ABS to their gallium equivalent. This is required for gallium SUB and ABS to be removed. Signed-off-by: Axel Davy <axel.davy@ens.fr> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2017-01-05 18:30:00 +01:00
Eric Anholt	dbe0dd11b9	configure: Fix another bashism. Reviewed-by: Matt Turner <mattst88@gmail.com>	2017-01-05 09:24:28 -08:00
Marek Olšák	3477f67057	st/mesa: fix a segfault when prog->sh.data is NULL Broken by: st/mesa: get Version from gl_program rather than gl_shader_program Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-01-05 17:11:03 +01:00
Emil Velikov	37f9262064	docs: add news item and link release notes for 13.0.3 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2017-01-05 16:07:53 +00:00
Emil Velikov	934792b846	docs: add sha256 checksums for 13.0.3 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit c8ece92ded9337b9ed60aa9568b41313025a1406)	2017-01-05 16:07:53 +00:00
Emil Velikov	5cd9660302	docs: add release notes for 13.0.3 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit bec04114d2612042bdf61183cfa3416b3a643b68)	2017-01-05 16:07:53 +00:00
Nayan Deshmukh	ee4b4791ab	st/va: fix incorrect argument in vl_compositor_cleanup This fixes the mistake introduced in commit `b6737a8bcd` Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2017-01-05 16:40:06 +01:00
Tim Rowley	68ddcc6c28	swr: remove unneeded llvm version check Old test caused breakage with llvm-svn (4.0.0svn), and not needed as the minimum required llvm version for swr is 3.6. Reviewed-by: George Kyriazis <george.kyriazis@intel.com>	2017-01-05 07:31:19 -06:00
George Kyriazis	36ad826548	swr: fix windows build break wrap lp_bld_type.h around extern "C". Windows decorates global variables, so when used from .cpp files, need to use an undecorated version. Also, removed related and unneeded code from swr_screen.cpp Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-01-05 07:30:18 -06:00
Marek Olšák	3753dc896d	radeonsi: update clip_regs if clip_disable changes to fix a hang This seems to fix the GPU hangs caused by: commit `ed3190b3f3` Author: Marek Olšák <marek.olsak@amd.com> Date: Sun Nov 13 18:41:43 2016 +0100 radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219 Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-01-05 14:01:18 +01:00
Marek Olšák	c7affbf687	st/mesa: enable GLSLOptimizeConservatively for drivers that want it GLSL compilation now takes 24% less time with the Gallium noop driver. I used my shader-db for the measurement. The difference for the whole radeonsi driver can be ~10%. The generated TGSI is mostly the same. For example, the compilation success rate with a TGSI->GCN bytecode converter without any optimizations is the same. Note that glsl_to_tgsi does its own copy propagation and simple register allocation. shader-db GCN report: - Talos spills fewer SGPRs. - DOTA 2 spills more SGPRs. - The average shader-db score is better, but it's just due to randomness. 29045 shaders in 17564 tests Totals: SGPRS: 1325929 -> 1325017 (-0.07 %) VGPRS: 1010808 -> 1010172 (-0.06 %) Spilled SGPRs: 1432 -> 1399 (-2.30 %) Spilled VGPRs: 93 -> 92 (-1.08 %) Private memory VGPRs: 688 -> 688 (0.00 %) Scratch size: 2540 -> 2484 (-2.20 %) dwords per thread Code Size: 39336732 -> 39342936 (0.02 %) bytes Max Waves: 217937 -> 217969 (0.01 %) Reviewed-by: Eric Anholt <eric@anholt.net>	2017-01-05 13:07:12 +01:00
Marek Olšák	96fe8834f5	glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively Reviewed-by: Eric Anholt <eric@anholt.net>	2017-01-05 13:07:12 +01:00
Marek Olšák	0a5018c1a4	mesa: add gl_constants::GLSLOptimizeConservatively to reduce the amount of GLSL optimizations for drivers that can do better. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-01-05 13:07:12 +01:00
Marek Olšák	e51baeb6c1	gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY Drivers with good compilers don't need aggressive optimizations before TGSI. Reviewed-by: Eric Anholt <eric@anholt.net>	2017-01-05 13:07:12 +01:00
Marek Olšák	d3cb79e043	glsl: run do_lower_jumps properly in do_common_optimizations so that backends don't have to run it manually Reviewed-by: Eric Anholt <eric@anholt.net>	2017-01-05 13:07:12 +01:00
Kenneth Graunke	7c6b714cd0	i965: Print VS output VUE map in Vulkan too. We need to move this to the shared layer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2017-01-05 01:55:27 -08:00
Kenneth Graunke	480d6c1653	i965: Fix last slot calculations If the VUE map has slots at the end which the shader does not write, then we'd "flush" (constructing an URB write) on the last output it actually wrote. Then, we'd construct another SEND with EOT, but with no actual payload data. That's not legal. For example, SSO programs have clip distance slots allocated no matter what, but the shader may not write them. If it doesn't write any user defined varyings, then the clip distance slots will be the last ones. Found while debugging dEQP-VK.tessellation.shader_input_output.gl_position_vs_to_tcs_to_tes Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>	2017-01-05 01:54:52 -08:00
Iago Toral Quiroga	8dc92a5613	docs: Mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as done for i965/hsw+ Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2017-01-05 09:34:36 +01:00
Iago Toral Quiroga	580c503ca2	docs: add GL_ARB_gpu_shader_fp64 and OpenGL 4.0 support for Intel Haswell. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2017-01-05 09:34:14 +01:00
Iago Toral Quiroga	a98f2e53e1	i965: add a kernel_features bitfield to intel screen We can use this to track various features that may or may not be supported by the hw / kernel. Currently, we usually do this by checking the generation and supported command parser versions in various places throughout the driver code. With this patch, we centralize all these checks in just one place at screen creation time, then we just query the bitfield wherever we need to check if a particular feature is supported. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-01-05 08:43:46 +01:00
Iago Toral Quiroga	e3123c8ca2	i965/gen7: Enable OpenGL 4.0 in Haswell when supported Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-01-05 08:43:46 +01:00
Iago Toral Quiroga	1f1b8def48	i965: get rid of brw->can_do_pipelined_register_writes Instead, check the screen field directly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-01-05 08:43:46 +01:00
Chris Wilson	02a44484f0	i965: Move the pipelined test for SO register access to the screen Moving the test to the screen places it alongside the other global HW feature tests that want to be shared between contexts. Also, we need to know if we support pipelined register writes at screen creation time so that we can tell if we can expose OpenGL 4.0 in gen7. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2017-01-05 08:43:46 +01:00
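
Commit bc86e829a5 above mentions "using bit replication to extend to rgba8888" for small unorm formats. The following is a minimal scalar sketch of that general technique, not the gallivm or llvmpipe code: the helper name `expand_unorm_to_8` and its signature are hypothetical, and the real code paths operate on LLVM IR vectors rather than single bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Mesa function): widen an n-bit unorm channel
 * value (1 <= bits <= 8) to 8 bits by repeating its bit pattern. */
static uint8_t
expand_unorm_to_8(uint8_t value, unsigned bits)
{
   assert(bits >= 1 && bits <= 8);
   assert(value < (1u << bits));

   unsigned result = 0;
   int shift = 8 - (int)bits;

   /* Place whole copies of the source pattern, starting at the top bit. */
   while (shift >= 0) {
      result |= (unsigned)value << shift;
      shift -= (int)bits;
   }

   /* If fewer than 'bits' positions remain, fill them with the
    * high bits of the source pattern. */
   if (shift > -(int)bits)
      result |= (unsigned)value >> -shift;

   return (uint8_t)result;
}
```

For a 5-bit channel this reduces to `(v << 3) | (v >> 2)`, so 31 maps to 255 and 0 maps to 0. The result can differ by one from exact rounding of `v * 255 / 31`: the 5-bit value 3 replicates to 24, while the exactly rounded result is 25, which matches the commit's remark that rounding is "not quite 100% accurate".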


87837 Commits
