87837 Commits

Author SHA1 Message Date
Roland Scheidegger bc86e829a5 gallivm: optimize lp_build_unpack_arith_rgba_aos slightly
This code uses a vector shift which has to be emulated on x86 unless
there's AVX2. Luckily in some cases we can actually avoid the shift
altogether, so do that.
Also make sure we hit the fast lp_build_conv() path when applicable,
albeit that's quite the hack...
That said, this path is taken for AoS sampling for small unorm (smaller
than rgba8) formats, and it is completely hopeless even with those
changes, with or without AVX.
(Probably should have some code similar to the one in the llvmpipe fs
backend code, using bit replication to extend to rgba8888 - rounding
is not quite 100% accurate but if it's good enough there it should be
here as well.)
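Here is a scalar sketch of the bit-replication idea mentioned above. It assumes an n-bit unorm channel with 1 <= n <= 8. `unorm_expand_replicate`, `unorm_expand_exact` and `unorm_expand_max_error` are illustrative names, not gallivm functions. Comparing replication against correctly rounded scaling, round(x * 255 / (2^n - 1)), shows why the rounding is "not quite 100% accurate". For 5-bit input the two differ by at most 1 (x = 3 gives 24 instead of 25), while 4-bit input expands exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: widen an n-bit unorm value to 8 bits by repeating
 * its bit pattern from the top down until all 8 bits are filled. */
static uint8_t unorm_expand_replicate(uint32_t x, unsigned n)
{
   assert(n >= 1 && n <= 8 && x < (1u << n));
   uint32_t result = 0;
   int filled = 0;
   while (filled < 8) {
      int shift = 8 - filled - (int)n;
      /* Place the next copy; the last copy may be partially cut off. */
      result |= shift >= 0 ? x << shift : x >> -shift;
      filled += (int)n;
   }
   return (uint8_t)result;
}

/* Correctly rounded reference: round(x * 255 / (2^n - 1)).
 * 2^n - 1 is odd, so an exact .5 tie can never occur. */
static uint8_t unorm_expand_exact(uint32_t x, unsigned n)
{
   assert(n >= 1 && n <= 8 && x < (1u << n));
   uint32_t max = (1u << n) - 1;
   return (uint8_t)((x * 255u + max / 2) / max);
}

/* Largest absolute difference between both expansions over all inputs. */
static unsigned unorm_expand_max_error(unsigned n)
{
   unsigned worst = 0;
   for (uint32_t x = 0; x < (1u << n); x++) {
      int d = (int)unorm_expand_replicate(x, n) - (int)unorm_expand_exact(x, n);
      unsigned ad = (unsigned)(d < 0 ? -d : d);
      if (ad > worst)
         worst = ad;
   }
   return worst;
}
```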

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Roland Scheidegger a03a2ac6fd gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto
If we only feed one source vector at a time, we cannot use pack intrinsics
(as we would only have a 64bit dst vector). lp_bld_conv_auto is
specifically designed to alter the length and number of destination vectors,
so this works just fine (if we used single source vectors at a time, we would
immediately reassemble the vectors afterwards anyway).
For AVX though this isn't really possible, since we expect 128bit output
already for a single 256bit input. (One day we should handle AVX2, which again
would need multiple inputs; however, there's the problem that we get differently
ordered output there and we don't want to reorder, so we would need to be able
to tell build_conv to handle upper and lower halves independently.)
A similar strategy would probably work for 32->8bit too (if it doesn't hit
the special case) but I'm going to try something different for that...
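The benefit of feeding two sources can be modelled in scalar C. An SSE-style unsigned saturating pack (packusdw semantics) always consumes two 4 x 32-bit registers and produces one 8 x 16-bit register. Packing a single source would therefore leave half of the 128-bit result unused. The types and functions below are hypothetical models, not the lp_bld_conv code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of 128-bit registers: 4 x i32 in, 8 x u16 out. */
typedef struct { int32_t v[4]; } vec4_i32;
typedef struct { uint16_t v[8]; } vec8_u16;

/* Unsigned saturation of one 32-bit lane to 16 bits. */
static uint16_t sat_u16(int32_t x)
{
   if (x < 0)
      return 0;
   if (x > 0xffff)
      return 0xffff;
   return (uint16_t)x;
}

/* Model of a two-source unsigned saturating pack:
 * lanes 0-3 of the result come from lo, lanes 4-7 from hi. */
static vec8_u16 pack_us_32_to_16(vec4_i32 lo, vec4_i32 hi)
{
   vec8_u16 out;
   for (int i = 0; i < 4; i++) {
      out.v[i] = sat_u16(lo.v[i]);
      out.v[i + 4] = sat_u16(hi.v[i]);
   }
   return out;
}

/* Convert an even number of 4 x i32 sources into half as many 8 x u16
 * destinations, changing both the vector length and the vector count. */
static void convert_32_to_16(const vec4_i32 *src, int nsrc, vec8_u16 *dst)
{
   assert(nsrc % 2 == 0);
   for (int i = 0; i < nsrc; i += 2)
      dst[i / 2] = pack_us_32_to_16(src[i], src[i + 1]);
}
```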

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Roland Scheidegger db7e786a25 llvmpipe: (trivial) minimally simplify mask construction
SIMD instruction sets usually have comparisons for equal, not unequal.
So use a different comparison against the mask itself, which also means
we need neither an all-zero nor an all-one register (for the pxor).
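Here is a scalar model of the comparison change. It assumes each lane's bit in the mask is tested as `1 << lane`. Testing `(mask & bit) == bit` gives the same per-lane result as `(mask & bit) != 0`, but it needs only an equality compare: no all-zero register to compare against and no pxor with all-ones to invert the result. The helper names are hypothetical and this is not the llvmpipe code.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* "Not equal" form: compare against zero, then invert with an all-ones xor. */
static void mask_via_neq(uint32_t mask, int32_t out[LANES])
{
   for (int i = 0; i < LANES; i++) {
      uint32_t bit = 1u << i;
      int32_t is_zero = ((mask & bit) == 0) ? -1 : 0; /* cmpeq vs. zero reg */
      out[i] = is_zero ^ -1;                           /* pxor with all-ones */
   }
}

/* "Equal" form: compare the masked value against the lane bit itself. */
static void mask_via_eq(uint32_t mask, int32_t out[LANES])
{
   for (int i = 0; i < LANES; i++) {
      uint32_t bit = 1u << i;
      out[i] = ((mask & bit) == bit) ? -1 : 0;         /* single cmpeq */
   }
}
```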

Also add code to avoid scalar expansion of i1 values, which we definitely
shouldn't do. There are problems with this due to llvm select
interaction, though, so it's disabled (using llvm select instead of
intrinsics may still produce atrocious code, even in cases where we
figured it should not; this could probably be fixed with a better
selection of optimization passes, but I have no real idea there).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2017-01-05 23:59:38 +01:00
Lionel Landwerlin a8eeb089c0 anv: fix multiple creation with internal failure
Section 9.4 of the specification says:

   When an application attempts to create many pipelines in a single
   command, it is possible that some subset may fail creation. In that
   case, the corresponding entries in the pPipelines output array will
   be filled with VK_NULL_HANDLE values. If any pipeline fails
   creation (for example, due to out of memory errors), the
   vkCreate*Pipelines commands will return an error code. The
   implementation will attempt to create all pipelines, and only
   return VK_NULL_HANDLE values for those that actually failed.
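The following minimal sketch models the behaviour described in the quoted specification text. It uses hypothetical stand-in types instead of the Vulkan headers. Every creation is attempted, failed entries are set to a null handle, and an error is returned if any creation failed. Per the v4 note below, the actual anv change returns on the first failure. This code therefore illustrates the spec wording rather than the driver fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VkResult and VkPipeline. */
typedef enum { RESULT_SUCCESS = 0, RESULT_ERROR_OUT_OF_MEMORY = -1 } result_t;
typedef void *handle_t;
#define NULL_HANDLE ((handle_t)NULL)

/* Hypothetical per-object constructor: writes *out on success. */
typedef result_t (*create_fn)(unsigned index, handle_t *out);

/* Attempt every creation; failed slots get NULL_HANDLE, and an error is
 * returned if any single creation failed. */
static result_t create_many(create_fn create, unsigned count, handle_t *out)
{
   assert(out != NULL || count == 0);
   result_t overall = RESULT_SUCCESS;
   for (unsigned i = 0; i < count; i++) {
      result_t r = create(i, &out[i]);
      if (r != RESULT_SUCCESS) {
         out[i] = NULL_HANDLE;
         overall = r;
      }
   }
   return overall;
}

/* Demo constructor that simulates an allocation failure for index 2. */
static int demo_storage[16];
static result_t demo_create_fail_on_2(unsigned index, handle_t *out)
{
   if (index == 2)
      return RESULT_ERROR_OUT_OF_MEMORY;
   *out = &demo_storage[index % 16];
   return RESULT_SUCCESS;
}
```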

Fixes:

   dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
   dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline

v2: C is hard let's go shopping (Lionel)

v3: Remove unnecessary condition in for loops (Lionel)

v4: Document why we return on first failure (Eduardo)
    Move i declaration inside for() (Eduardo)

v5: Move array cleanup out of loop (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2017-01-05 21:09:09 +00:00
Tim Rowley 33fa4c99f7 swr: [rasterizer core/common/jitter] gl_double support
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99214
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-01-05 14:10:36 -06:00
Fredrik Höglund b6670157d7 dri3: Fix MakeCurrent without a default framebuffer
In OpenGL 3.0 and later it is legal to make a context current without
a default framebuffer.

This has been broken since DRI3 support was introduced.

Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 20:52:01 +01:00
Marek Olšák e16245b339 radeonsi: turn SDMA IBs into de-facto preambles of GFX IBs
Draw calls no longer flush SDMA IBs. r600_need_dma_space is
responsible for synchronizing execution between both IBs.

Initial buffer clears and fast clears will stay unflushed in the SDMA IB
(up to 64 MB) as long as the GFX IB isn't flushed either.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák cba9d59362 radeonsi: implement SDMA-based buffer clearing for SI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák 29d6a367a6 radeonsi: do all math in bytes in SI DMA code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák 9e1aa81dfe gallium/radeon: prevent SDMA stalls by detecting RAW hazards in need_dma_space
Call r600_dma_emit_wait_idle only when there is a possibility of
a read-after-write hazard. Buffers not yet used by the SDMA IB don't
have to wait.
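One way to model the hazard check described above assumes the IB remembers each buffer it references, together with read/write usage. Under that model, a wait is forced when the destination is already referenced by the IB, or when a source has already been written by the IB. Buffers the IB has not touched never force a wait. `struct dma_ib_tracker` and `need_wait_idle` are hypothetical names, not the r600_need_dma_space implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TRACKED 64

enum { USAGE_READ = 1, USAGE_WRITE = 2 };

/* Hypothetical record of buffers referenced by the current SDMA IB. */
struct dma_ib_tracker {
   int ids[MAX_TRACKED];
   unsigned usage[MAX_TRACKED];
   unsigned count;
};

/* Usage flags with which the IB already references buffer id (0 = none). */
static unsigned ib_usage(const struct dma_ib_tracker *t, int id)
{
   for (unsigned i = 0; i < t->count; i++)
      if (t->ids[i] == id)
         return t->usage[i];
   return 0;
}

static void ib_add(struct dma_ib_tracker *t, int id, unsigned usage)
{
   for (unsigned i = 0; i < t->count; i++) {
      if (t->ids[i] == id) {
         t->usage[i] |= usage;
         return;
      }
   }
   assert(t->count < MAX_TRACKED);
   t->ids[t->count] = id;
   t->usage[t->count] = usage;
   t->count++;
}

/* Decide whether a wait-idle is needed before a DMA op that writes dst and
 * optionally (src >= 0) reads src, then record the op's references. */
static bool need_wait_idle(struct dma_ib_tracker *t, int dst, int src)
{
   bool wait = ib_usage(t, dst) != 0 ||                        /* WAW/WAR */
               (src >= 0 && (ib_usage(t, src) & USAGE_WRITE)); /* RAW */
   ib_add(t, dst, USAGE_WRITE);
   if (src >= 0)
      ib_add(t, src, USAGE_READ);
   return wait;
}
```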

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák 3be8336440 gallium/radeon: move unrelated code from dma_emit_wait_idle to need_dma_space
r600_dma_emit_wait_idle is going away in its current form.
The only difference is that the moved code is executed before DMA calls
instead of after them.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:24 +01:00
Marek Olšák 973d7cd90a radeonsi: inline cik_sdma_do_copy_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák 067a3237b9 radeonsi: also wait for SDMA in the clear_buffer CPU fallback
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák f6a1c2d883 radeonsi: simplify r600_resource typecasts in si_clear_buffer
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák a31a92e7ef radeonsi: always use SDMA for big buffer clears and first buffer uses
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák 69f489dfa1 radeonsi: use SDMA in rvid_buffer_clear on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák 9a3296bf1c radeonsi: use SDMA for initial clearing of DCC/CMASK/HTILE on CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák d4c0ad4de8 radeonsi: implement SDMA-based buffer clearing for CIK-VI
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:43:23 +01:00
Marek Olšák 431742dbba gallium/hud: increase the vertex buffer size for text
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák 6d54cd75a8 gallium/hud: add an option to sort items below graphs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák 80b8b9c8a4 gallium/hud: add an option to reset the color counter
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák a57e071e9e gallium/hud: allow more data sources per pane
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák e8bb97ce30 gallium/hud: add an option to rename each data source
Useful for radeonsi performance counters.

v2: allow specifying both : and =

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák d995115b17 gallium: remove TGSI_OPCODE_SUB
It's redundant with the source modifier.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Marek Olšák a4ace98a97 gallium: remove TGSI_OPCODE_ABS
It's redundant with the source modifier.
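Both this removal and the SUB removal rely on TGSI source modifiers: SUB a, b equals ADD a, -b, and ABS a equals MOV |a|. The sketch below illustrates these equivalences by modelling an operand with negate and absolute flags. The types are hypothetical, and the abs-before-negate order is an assumption stated in the code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an instruction source with TGSI-style modifiers. */
struct src_operand {
   float value;
   bool negate;
   bool absolute;
};

/* Assumed order: absolute value first, then negation (so -|x| works). */
static float read_src(struct src_operand s)
{
   float v = s.value;
   if (s.absolute && v < 0.0f)
      v = -v;
   return s.negate ? -v : v;
}

static float op_add(struct src_operand a, struct src_operand b)
{
   return read_src(a) + read_src(b);
}

static float op_mov(struct src_operand a)
{
   return read_src(a);
}

/* SUB a, b  ==  ADD a, -b */
static float emulate_sub(float a, float b)
{
   struct src_operand sa = { a, false, false };
   struct src_operand sb = { b, true, false };
   return op_add(sa, sb);
}

/* ABS a  ==  MOV |a| */
static float emulate_abs(float a)
{
   struct src_operand sa = { a, false, true };
   return op_mov(sa);
}
```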

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy 09d09b219e st/nine: Remove all usage of ureg_SUB in nine_shader
This is required to drop gallium SUB.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy 67cda68bba st/nine: Remove all usage of ureg_SUB in nine_ff
This is required to remove gallium SUB.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Axel Davy caf93f5311 st/nine: Do not map SUB and ABS to their gallium equivalent.
This is required for gallium SUB and ABS to be removed.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-01-05 18:30:00 +01:00
Eric Anholt dbe0dd11b9 configure: Fix another bashism.
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-01-05 09:24:28 -08:00
Marek Olšák 3477f67057 st/mesa: fix a segfault when prog->sh.data is NULL
Broken by:
   st/mesa: get Version from gl_program rather than gl_shader_program

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-05 17:11:03 +01:00
Emil Velikov 37f9262064 docs: add news item and link release notes for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-05 16:07:53 +00:00
Emil Velikov 934792b846 docs: add sha256 checksums for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit c8ece92ded9337b9ed60aa9568b41313025a1406)
2017-01-05 16:07:53 +00:00
Emil Velikov 5cd9660302 docs: add release notes for 13.0.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit bec04114d2612042bdf61183cfa3416b3a643b68)
2017-01-05 16:07:53 +00:00
Nayan Deshmukh ee4b4791ab st/va: fix incorrect argument in vl_compositor_cleanup
This fixes the mistake introduced in commit
b6737a8bcd

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-01-05 16:40:06 +01:00
Tim Rowley 68ddcc6c28 swr: remove unneeded llvm version check
The old test caused breakage with llvm-svn (4.0.0svn), and it is not needed,
as the minimum required llvm version for swr is 3.6.

Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
2017-01-05 07:31:19 -06:00
George Kyriazis 36ad826548 swr: fix windows build break
Wrap lp_bld_type.h in extern "C".
Windows decorates global variables, so when they are used from .cpp files,
we need to use the undecorated version.

Also, remove related, unneeded code from swr_screen.cpp.
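For reference, the usual `extern "C"` guard gives C symbols, including global variables, undecorated C linkage when their declarations are consumed from C++. The following is a minimal sketch of that guard. `example_counter` and `example_bump` are hypothetical names; the commit applies the same idea to lp_bld_type.h.

```c
/* --- header part (would normally live in a .h file) --- */
#ifdef __cplusplus
extern "C" {
#endif

/* Declarations that C++ code must see with C linkage, i.e. without
 * C++ name decoration (this applies to variables as well as functions). */
extern int example_counter;
int example_bump(int by);

#ifdef __cplusplus
}
#endif

/* --- implementation part (compiled as C) --- */
#include <assert.h>

int example_counter = 0;

int example_bump(int by)
{
   assert(by >= 0);
   example_counter += by;
   return example_counter;
}
```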

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-01-05 07:30:18 -06:00
Marek Olšák 3753dc896d radeonsi: update clip_regs if clip_disable changes to fix a hang
This seems to fix the GPU hangs caused by:

commit ed3190b3f3
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Sun Nov 13 18:41:43 2016 +0100

    radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99219

Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-01-05 14:01:18 +01:00
Marek Olšák c7affbf687 st/mesa: enable GLSLOptimizeConservatively for drivers that want it
GLSL compilation now takes 24% less time with the Gallium noop driver.
I used my shader-db for the measurement. The difference for the whole
radeonsi driver can be ~10%.

The generated TGSI is mostly the same. For example, the compilation success
rate with a TGSI->GCN bytecode converter without any optimizations is
the same. Note that glsl_to_tgsi does its own copy propagation and simple
register allocation.

shader-db GCN report:
- Talos spills fewer SGPRs.
- DOTA 2 spills more SGPRs.
- The average shader-db score is better, but it's just due to randomness.

29045 shaders in 17564 tests
Totals:
SGPRS: 1325929 -> 1325017 (-0.07 %)
VGPRS: 1010808 -> 1010172 (-0.06 %)
Spilled SGPRs: 1432 -> 1399 (-2.30 %)
Spilled VGPRs: 93 -> 92 (-1.08 %)
Private memory VGPRs: 688 -> 688 (0.00 %)
Scratch size: 2540 -> 2484 (-2.20 %) dwords per thread
Code Size: 39336732 -> 39342936 (0.02 %) bytes
Max Waves: 217937 -> 217969 (0.01 %)
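The percentages in this report are relative changes, (after - before) / before * 100, printed with two decimals. For example, Spilled SGPRs: (1399 - 1432) / 1432 * 100 = -2.30 %. The small helper below reproduces them; the function names are illustrative and not part of shader-db.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change in percent: (after - before) / before * 100. */
static double pct_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}

/* Print one line in the report's "before -> after (x.xx %)" shape. */
static void print_delta(const char *name, double before, double after)
{
   printf("%s: %.0f -> %.0f (%.2f %%)\n", name, before, after,
          pct_change(before, after));
}
```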

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák 96fe8834f5 glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák 0a5018c1a4 mesa: add gl_constants::GLSLOptimizeConservatively
to reduce the number of GLSL optimizations for drivers that can do better.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák e51baeb6c1 gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY
Drivers with good compilers don't need aggressive optimizations before TGSI.
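The following minimal sketch shows how a driver capability can flow into a GL constant that limits the amount of optimization. It uses stand-in types modelled loosely on the gallium `get_param` pattern. `mock_screen`, `MOCK_CAP_GLSL_OPTIMIZE_CONSERVATIVELY` and the pass counts are hypothetical placeholders, not the Mesa code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in types; these are not the real gallium or Mesa definitions. */
enum mock_cap { MOCK_CAP_GLSL_OPTIMIZE_CONSERVATIVELY };

struct mock_screen {
   int (*get_param)(struct mock_screen *screen, enum mock_cap cap);
};

struct mock_constants {
   bool GLSLOptimizeConservatively;
};

/* State tracker side: copy the driver's answer into the GL constants. */
static void init_constants(struct mock_screen *screen,
                           struct mock_constants *consts)
{
   consts->GLSLOptimizeConservatively =
      screen->get_param(screen, MOCK_CAP_GLSL_OPTIMIZE_CONSERVATIVELY) != 0;
}

/* Compiler side: choose how many (placeholder) passes to run. */
static int passes_to_run(const struct mock_constants *consts)
{
   const int all_passes = 12, conservative_passes = 3; /* arbitrary */
   return consts->GLSLOptimizeConservatively ? conservative_passes
                                             : all_passes;
}

static int driver_with_good_compiler(struct mock_screen *s, enum mock_cap c)
{
   (void)s;
   return c == MOCK_CAP_GLSL_OPTIMIZE_CONSERVATIVELY;
}

static int driver_without_backend_opts(struct mock_screen *s, enum mock_cap c)
{
   (void)s;
   (void)c;
   return 0;
}
```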

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Marek Olšák d3cb79e043 glsl: run do_lower_jumps properly in do_common_optimizations
so that backends don't have to run it manually

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-01-05 13:07:12 +01:00
Kenneth Graunke 7c6b714cd0 i965: Print VS output VUE map in Vulkan too.
We need to move this to the shared layer.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 01:55:27 -08:00
Kenneth Graunke 480d6c1653 i965: Fix last slot calculations
If the VUE map has slots at the end which the shader does not write,
then we'd "flush" (constructing a URB write) on the last output it
actually wrote.  Then, we'd construct another SEND with EOT, but with
no actual payload data.  That's not legal.

For example, SSO programs have clip distance slots allocated no matter
what, but the shader may not write them.  If it doesn't write any user
defined varyings, then the clip distance slots will be the last ones.

Found while debugging
dEQP-VK.tessellation.shader_input_output.gl_position_vs_to_tcs_to_tes

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
2017-01-05 01:54:52 -08:00
Iago Toral Quiroga 8dc92a5613 docs: Mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as done for i965/hsw+
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 09:34:36 +01:00
Iago Toral Quiroga 580c503ca2 docs: add GL_ARB_gpu_shader_fp64 and OpenGL 4.0 support for Intel Haswell.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2017-01-05 09:34:14 +01:00
Iago Toral Quiroga a98f2e53e1 i965: add a kernel_features bitfield to intel screen
We can use this to track various features that may or may not be supported
by the hw / kernel. Currently, we usually do this by checking the generation
and supported command parser versions in various places throughout the driver
code. With this patch, we centralize all these checks in just one place at
screen creation time, then we just query the bitfield wherever we need to
check if a particular feature is supported.
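The following minimal sketch shows the centralized approach: probe once at screen creation, store the results as bits, and query the bitfield elsewhere. The feature names, `struct screen` and the generation/command-parser thresholds are hypothetical placeholders, not the values i965 uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; the real driver defines its own names. */
enum kernel_feature {
   FEATURE_PIPELINED_REGISTER_WRITES = 1 << 0,
   FEATURE_COMPUTE_DISPATCH          = 1 << 1,
};

struct screen {
   unsigned kernel_features; /* filled once at screen creation */
};

/* All probing happens here, once; the thresholds are placeholders. */
static void screen_init_features(struct screen *s, int gen,
                                 int cmd_parser_version)
{
   assert(gen > 0);
   s->kernel_features = 0;
   if (gen >= 8 || cmd_parser_version >= 2)
      s->kernel_features |= FEATURE_PIPELINED_REGISTER_WRITES;
   if (gen >= 8 || cmd_parser_version >= 5)
      s->kernel_features |= FEATURE_COMPUTE_DISPATCH;
}

/* Everywhere else just queries the bitfield. */
static bool screen_has(const struct screen *s, enum kernel_feature f)
{
   return (s->kernel_features & (unsigned)f) != 0;
}
```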

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Iago Toral Quiroga e3123c8ca2 i965/gen7: Enable OpenGL 4.0 in Haswell when supported
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Iago Toral Quiroga 1f1b8def48 i965: get rid of brw->can_do_pipelined_register_writes
Instead, check the screen field directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00
Chris Wilson 02a44484f0 i965: Move the pipelined test for SO register access to the screen
Moving the test to the screen places it alongside the other global HW
feature tests that want to be shared between contexts.

Also, we need to know if we support pipelined register writes at
screen creation time so that we can tell if we can expose OpenGL 4.0
in gen7.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-01-05 08:43:46 +01:00