Commit Graph

97766 Commits

Author SHA1 Message Date
Brian Paul 8150690cac st/mesa: whitespace clean-ups in st_context.c
Trivial.
2017-11-15 16:12:43 -07:00
Brian Paul 0605a6cc89 st/mesa: move st_manager_destroy() earlier in file
To avoid forward declaration.

Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
2017-11-15 16:12:43 -07:00
Brian Paul 3a74eb3a9b st/mesa: move st_init_driver_flags() earlier in file
To get rid of forward declaration.

Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
2017-11-15 16:12:43 -07:00
Brian Paul 955cbdf120 docs: update llvmpipe.html build instructions 2017-11-15 16:12:42 -07:00
Wladimir J. van der Laan d61a914394 etnaviv: Add sampler TS support
Sampler TS is a hardware optimization that can be used when rendering
to textures. After rendering to a resource with TS enabled, the
texture unit can use this to bypass lookups to empty tiles. This also
means a resolve-in-place can be avoided to flush the TS.

This commit is also an optimization when not using sampler TS, as
resolve-in-place will now be skipped if a resource has no (valid) TS.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:54 +01:00
Wladimir J. van der Laan 59d76e7ab6 etnaviv: Flush TS cache before changing TS configuration
This is to make sure that the TS is properly flushed to memory before
rendering to a new surface starts.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:39 +01:00
Wladimir J. van der Laan 0d6d9b520b etnaviv: Add TS_SAMPLER formats to etnaviv_format
Sampler TS introduces yet another format enumeration for
renderable+textureable formats. Introduce it into the etnaviv_format
table as another column.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:26 +01:00
Wladimir J. van der Laan ade528edd1 etnaviv: Check that resource has a valid TS in etna_resource_needs_flush
Resources only need a resolve-to-itself if their TS is valid for any
level, not just if it happens to be allocated.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:27:09 +01:00
Wladimir J. van der Laan b24cb40188 etnaviv: rnndb update
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-11-15 23:26:53 +01:00
Dave Airlie 00bf875d55 radv: it isn't an error to not support a format or driver
This reverts two of the vk_error changes:

reporting unsupported format is common,
and testing non-amdgpu drivers and ignoring them is also common.

Fixes: cd64a4f70 (radv: use vk_error() everywhere an error is returned)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-16 06:12:42 +10:00
Kenneth Graunke 5da2b26dcb i965: Drop some reserved space remnants.
BATCH_RESERVED was deleted in commit 2c46a67b41 (i965: Delete
BATCH_RESERVED handling.)  The reserved_space field is dead code, and
the comments aren't useful these days.
2017-11-15 09:37:32 -08:00
Kenneth Graunke e48cc01be9 intel: Drop mtypes.h include from brw_compiler.h.
This isn't necessary and causes trouble for a project I'm working on.
2017-11-15 09:37:32 -08:00
Kenneth Graunke 0704702972 i965: Fold ABO state upload code into the SSBO/UBO state upload code.
Having this separate could potentially make programs that rebind atomics
but no other surfaces ever so slightly faster.  But it's a tiny amount
of code to add to the existing UBO/SSBO atom, and very related.

The extra atoms have a cost on every draw call, and so dropping some of
them would be nice.  This also reclaims a dirty bit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-15 09:37:32 -08:00
Kenneth Graunke ff964916dc i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code.
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case.  Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.

The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-15 09:37:32 -08:00
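
Note on ff964916dc: the combined binding-table section it describes (ABOs first, then SSBOs) can be pictured with a small index calculation. This is only a sketch. The struct and function names below are hypothetical and are not taken from the i965 sources; only the ordering comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout descriptor; field names are illustrative only. */
struct bt_layout {
    uint32_t ssbo_start; /* first slot of the combined ABO+SSBO section */
    uint32_t num_abos;   /* atomic counter buffers occupy the first slots */
    uint32_t num_ssbos;  /* shader storage buffers follow the ABOs */
};

/* Slot of atomic buffer i: ABOs sit at the start of the section. */
static uint32_t abo_slot(const struct bt_layout *l, uint32_t i)
{
    assert(i < l->num_abos);
    return l->ssbo_start + i;
}

/* Slot of SSBO j: offset past every ABO in the section. */
static uint32_t ssbo_slot(const struct bt_layout *l, uint32_t j)
{
    assert(j < l->num_ssbos);
    return l->ssbo_start + l->num_abos + j;
}
```

With ssbo_start = 10, two ABOs and three SSBOs, the ABOs land in slots 10 and 11 and the SSBOs in slots 12 to 14.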
Kenneth Graunke f48f52b030 i965: Make a better helper function for UBO/SSBO/ABO surface handling.
This fixes the missing AutomaticSize handling in the ABO code, removes
a bunch of duplicated code, and drops an extra layer of wrapping around
brw_emit_buffer_surface_state().

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-15 09:37:32 -08:00
Samuel Pitoiset 059d25a06d radv: add the vertex buffers BO to the list at bind time
This should reduce the overhead of adding a BO to the current
list, especially when the list is huge. Also, when a new pipeline
is bound, we only need to update the descriptor, the buffer objects
should already be in the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-15 09:01:07 +01:00
Samuel Pitoiset c665879455 radv: replace vb_dirty with RADV_CMD_DIRTY_VERTEX_BUFFER
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-15 09:01:05 +01:00
Samuel Pitoiset 8fd213277f radv: drop radv_cmd_dirty_mask_t typedef
I don't think we will need a 64-bit unsigned integer for the
dirty flags in the future, and there are still 20 bits left.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-15 09:01:01 +01:00
Samuel Pitoiset f697365058 radv: use an unsigned 32-bit integer for radv_queue::family_index
VkDeviceQueueCreateInfo::queueFamilyIndex is an unsigned 32-bit
integer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-15 09:00:59 +01:00
Samuel Pitoiset f9e1ff2464 radv: do not add the image BO in radv_set_dcc_need_cmask_elim_pred()
radv_fill_buffer() ensures that the image BO is added to the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-15 09:00:57 +01:00
Samuel Pitoiset 40290c805f radv: do not add the image BO in radv_set_color_clear_regs()
radv_fill_buffer() ensures that the image BO is added to the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-11-15 09:00:54 +01:00
Roland Scheidegger 65123ee62c r600: set the number type correctly for float rts in cb setup
Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp anyway).
Albeit r600 (not r700) setup still looks bugged to me due to never setting
BLEND_FLOAT32 which must be set according to docs...
Not sure if the hw really cares, no piglit change (on eg/juniper).

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Roland Scheidegger 570d5b7992 r600: use ieee version of rsq
Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, behavior for both rcp and rsq should now be the same for all of
r600, eg, cm, all using ieee versions (albeit note rsq retains the abs
behavior for everybody, which may not be a good idea ultimately).

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Roland Scheidegger 1c8d57a008 r600: use ieee version of rcp
r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it does so
since day 1) but there does not seem to be a valid reason for this, so make
it consistent. This seems now safer than before the previous commit (using
the dx10 clamp bit).
Note that rsq still uses clamped version (as before even though the table
may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
Will be changed separately for better regression tracking...

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
Roland Scheidegger 3835009796 r600: use DX10_CLAMP bit in shader setup
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045fee), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that right (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).

v2: set it in all shader stages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
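
Note on 3835009796: the CLAMP modifier behaviour described in the message can be modelled in software as below. This is a sketch of the description only, not hardware-accurate code. The function name and the boolean parameter are made up for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* Software model of the CLAMP output modifier as the commit message
 * describes it. The name and the dx10_clamp flag are hypothetical. */
static float clamp_modifier(float x, bool dx10_clamp)
{
    if (isnan(x))
        return dx10_clamp ? 0.0f : x; /* bit set: NaN becomes zero */
    if (x < 0.0f)
        return 0.0f;                  /* clamp to the [0,1] range */
    if (x > 1.0f)
        return 1.0f;
    return x;
}
```

With the flag set, a NaN input yields 0, which is the same result the message gives for min(1.0, max(x, 0.0)) when min and max pick the non-NaN operand.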
Roland Scheidegger aab0bfc648 r600: use min_dx10/max_dx10 instead of min/max
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-11-15 03:13:46 +01:00
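
Note on aab0bfc648: the two NaN policies contrasted in the message can be written out as plain C. This is a sketch under the assumption that the opcodes behave exactly as the message states. The function names are hypothetical and do not correspond to r600 ISA mnemonics.

```c
#include <math.h>

/* Number-preferring minimum: if exactly one operand is NaN, return the
 * other one. Models what the commit attributes to the dx10 variant. */
static float min_prefer_number(float a, float b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a < b ? a : b;
}

/* NaN-propagating minimum: any NaN operand wins. Models what the commit
 * attributes to the non-dx10 variant. */
static float min_prefer_nan(float a, float b)
{
    if (isnan(a))
        return a;
    if (isnan(b))
        return b;
    return a < b ? a : b;
}
```

The two functions agree on ordinary inputs and differ only when exactly one operand is NaN.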
Dave Airlie 3ceee04a4f r600: fix cubemap arrays
A lot of cubemap array piglits fail, port the texture type
picking code from radeonsi which seems to fix most of them.

For images I will port the rest of the code.

Fixes:
getteximage-depth gl_texture_cube_map_array-*
fbo-generatemipmap-cubemap array
getteximage-targets cube_array
amongst others.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-11-15 11:26:11 +10:00
Rob Clark 7676e71113 freedreno/a5xx: small comment fix
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:12:47 -05:00
Rob Clark d27318bdd0 freedreno/a5xx: indirect draw support
A couple failures in piglit tests w/ TF or gl_VertexID + indirect draws.
OTOH all the deqp tests pass (although they don't test those combinations).
I suspect this could be fixed by a firmware update, but I don't think
there is much we can do in mesa for that.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:10:58 -05:00
Rob Clark f383cf9d41 freedreno/a5xx: split out helper for pipeline stalls
We need a similar thing for indirect draws.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:10:51 -05:00
Rob Clark d74029bddc freedreno: update generated headers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-11-14 18:10:43 -05:00
Timothy Arceri 5041ea96a0 gallium/radeon: disable the cache when nir backend enabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-15 08:47:31 +11:00
Timothy Arceri 7273e9820e st/glsl_to_tgsi: use tgsi_get_gl_varying_semantic() for gs/tes outputs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-15 08:26:34 +11:00
Timothy Arceri bc308122cc gallium/tgsi: add tess output support to tgsi_get_gl_varying_semantic()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-15 08:26:34 +11:00
Timothy Arceri 4ae9f0b580 st/glsl_to_tgsi: make use of tgsi_get_gl_varying_semantic()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-15 08:26:34 +11:00
Timothy Arceri 3d21eb3b7d gallium/tgsi: add prim id to tgsi_get_gl_varying_semantic()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-15 08:26:34 +11:00
Anuj Phogat fc59546e9a i965: Make use of brw_load_register_imm32() helper function
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
2017-11-14 13:23:18 -08:00
Anuj Phogat 1dc45d75bb i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DW
Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-11-14 13:23:18 -08:00
Anuj Phogat 6165fda59b i965: Program DWord Length in MI_FLUSH_DW
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
2017-11-14 13:23:18 -08:00
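
Note on 1dc45d75bb and 6165fda59b: the dword counts come from the commit message (4 before gen8, 5 on gen8+). The length encoding below is an assumption: it presumes the DWord Length field stores the total dword count minus 2, which should be checked against the hardware documentation. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total size of MI_FLUSH_DW in dwords, per the commit message. */
static uint32_t mi_flush_dw_num_dwords(int gen)
{
    assert(gen > 0);
    return gen >= 8 ? 5 : 4;
}

/* Assumption: the length field is biased by 2, i.e. it stores the
 * total dword count minus 2. */
static uint32_t mi_flush_dw_length_field(int gen)
{
    return mi_flush_dw_num_dwords(gen) - 2;
}
```

Under that assumption the field would be 2 on gen7 and 3 on gen8 and later.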
Anuj Phogat 5d8164c428 anv/gen10: Enable float blend optimization
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-11-14 13:23:18 -08:00
Anuj Phogat 72a239266b intel/genxml: Add Cache Mode SubSlice Register to gen10.xml
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-11-14 13:23:18 -08:00
Anuj Phogat aacf1943c0 anv/gen10: Implement WaSampleOffsetIZ workaround
We already have this workaround in OpenGL driver.
See Mesa commit 3cf4fe2219.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
2017-11-14 13:23:18 -08:00
Andres Rodriguez 20e8dfcca9 mesa/st: add missing copyright headers to memoryobjects files
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-14 11:32:44 -08:00
Andres Rodriguez 60baf1a962 mesa: minor tidy up for memory object error strings
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-14 11:31:49 -08:00
Andres Rodriguez f7580e7204 broadcom/vc4: fix indentation in vc4_screen.c
Stumbled into this when adding a new PIPE_CAP.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-11-14 11:31:36 -08:00
Matt Turner a31d038208 Revert "intel/fs: Use a pure vertical stride for large register strides"
This reverts commit e8c9e65185.

With the actual bug fixed (by commit 6ac2d16901), this is not
necessary. I'm doubtful of its correctness in any case.
2017-11-14 11:24:08 -08:00
Matt Turner 6ac2d16901 i965/fs: Fix extract_i8/u8 to a 64-bit destination
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.

For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and then sign extend that word to a
quadword.

Fixes the following test on CHV, BXT, and GLK:
   KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-11-14 10:56:18 -08:00
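
Note on 6ac2d16901: the two widening strategies in the message can be mirrored in scalar C. This sketches the arithmetic only, not the generated EU instructions. The function names are hypothetical, and the byte is assumed to be the low byte of the 16-bit word that is read.

```c
#include <stdint.h>

/* Unsigned case: read a word, AND off the high byte, then widen the
 * result to a quadword. */
static uint64_t extract_u8_to_u64(uint16_t word)
{
    return (uint64_t)(word & 0x00ff);
}

/* Signed case: sign extend the byte to a word first, then sign extend
 * that word to a quadword. */
static int64_t extract_i8_to_i64(uint16_t word)
{
    int b = word & 0xff;                      /* raw byte, 0..255 */
    int16_t w = (int16_t)((b ^ 0x80) - 0x80); /* byte -> word, portable */
    return (int64_t)w;                        /* word -> quadword */
}
```

The xor-and-subtract step is one portable way to sign extend 8 bits in C without relying on implementation-defined narrowing conversions.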
Matt Turner cfcfa0b9cd i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK
Fixes the following tests on CHV, BXT, and GLK:
    KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
    dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
2017-11-14 10:56:18 -08:00
Tim Rowley d8489517a5 swr/rast: Faster emulated simd16 permute
Speed up simd16 frontend (default) on avx/avx2 platforms;
fixes performance regression caused by switch to simdlib.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-14 11:40:19 -06:00
Tim Rowley 439904847e swr/rast: Use gather instruction for i32gather_ps on simd16/avx512
Speed up avx512 platforms; fixes performance regression caused
by switch to simdlib.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org
2017-11-14 11:39:02 -06:00