Commit Graph

96160 Commits

Author SHA1 Message Date
Samuel Pitoiset 70f6b95862 radv: remove unused radv_meta_state::btoi::render_pass handle
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 6f1447c090 radv: do not check the number of levels when doing fast htile
We shouldn't reach this point because HTILE is only enabled
when the number of levels is 1.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 06dbe0722f radv: cleanup radv_device_finish_meta_XXX() helpers
Unnecessary to double check that handles are not NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 2084629b63 radv: select the pipeline outside of emit_fast_clear_flush()
It can't change during the decompression pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 331a4f885a radv: drop useless param in emit_depth_decomp()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 87f4e432e3 radv: drop useless check in depth_view_can_fast_clear()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 689930f670 radv: add radv_subpass_clear_attachment() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset a821771c56 radv: add radv_attachment_needs_clear() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 0a208122d7 radv: remove unused param in radv_handle_{cmask,dcc}_image_transition()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset db2e68b66b radv: add radv_vi_dcc_enabled() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 457306fa4c radv: do not need to double zero-init the meta state structures
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset af62984c8a radv: inline destroy_render_pass()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 84635ef3a3 radv: use pipeline handles instead of objects for meta clear operations
To be consistent with other meta operations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset a5f76d259b radv: inline blit2d_unbind_dst()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Samuel Pitoiset 219be27a09 radv: rework DCC/CMASK/FMASK/HTILE allocations
Add helpers and some comments to make the thing more readable.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-10-02 11:56:20 +02:00
Eric Engestrom 1262e828e7 meson: fix version typo + grammar
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-10-02 09:37:54 +01:00
Iago Toral Quiroga 5e584a9db7 i965: skip reading unused slots at the begining of the URB for the FS
We can start reading the URB at the first offset that contains varyings
that are actually read in the URB. We still need to make sure that we
read at least one varying to honor hardware requirements.

This helps alleviate a problem introduced with 99df02ca26 for
separate shader objects: without separate shader objects we assign
locations sequentially, however, since that commit we have changed the
method for SSO so that the VUE slot assigned depends on the number of
builtin slots plus the location assigned to the varying. This fixed
layout is intended to help SSO programs by avoiding on-the-fly recompiles
when swapping out shaders, however, it also means that if a varying uses
a large location number close to the maximum allowed by the SF/FS units
(31), then the offset introduced by the number of builtin slots can push
the location outside the range and trigger an assertion.

This problem is affecting at least the following CTS tests for
enhanced layouts:

KHR-GL45.enhanced_layouts.varying_array_components
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_components
KHR-GL45.enhanced_layouts.varying_locations

which use SSO and the the location layout qualifier to select such
location numbers explicitly.

This change helps these tests because for SSO we always have to include
things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is
very unlikely to read them, so by doing this we free builtin slots from
the fixed VUE layout and we avoid the tests to crash in this scenario.

Of course, this is not a proper fix, we'd still run into problems if someone
tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or
gl_CullDistancein in the FS, but that would be a much less common bug and we
can probably wait to see if anyone actually runs into that situation in a real
world scenario before making the decision that more aggresive changes are
required to support this without reverting 99df02ca26.

v2:
- Add a debug message when we skip clip distances (Ilia)
- we also need to account for this when we compute the urb setup
  for the fragment shader stage, so add a compiler util to compute
  the first slot that we need to read from the URB instead of
  replicating the logic in both places.

v3:
- Make the util more generic so it can account for all unused slots
  at the beginning of the URB, that will make it more useful (Ken).
- Drop the debug message, it was not what Ilia was asking for.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-02 08:27:13 +02:00
Matt Turner 3cfd6ad01c i965: Normalize types for FBL, FBH, etc
Allows the instructions to be compacted. The documentation claims that
some of these only accept UD types, even though the type doesn't change
the operation performed. Just normalize the types to ensure we get
instruction compaction.

The only functional changes are for FBL and CBIT (always use UD types)
and FBH (always use the same types).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-09-30 20:18:09 -07:00
Marek Olšák da3cf0e206 radeonsi: don't use the template keyword
for C++ editors

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-09-30 19:03:07 +02:00
Marek Olšák e90a2ed88e glx: don't use the template keyword
for C++ editors

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-09-30 19:03:07 +02:00
Marek Olšák 9592c43a96 gallium/vl: don't use the template keyword
for C++ editors

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-09-30 19:03:07 +02:00
Marek Olšák 874db83e24 egl/dri2: don't use the template keyword
for C++ editors

Reviewed-by: Brian Paul <brianp@vmware.com>
2017-09-30 19:03:07 +02:00
Benedikt Schemmer 3797a82e78 radeonsi/uvd: clean up si_video_buffer_create
V2: remove code duplication and one unnessecary variable, minor whitespace fix

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2017-09-30 19:03:07 +02:00
Marek Olšák e9cf64a67c radeonsi/uvd: fix planar formats broken since f70f6baaa3
Tested-by: Benedikt Schemmer <ben@besd.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
2017-09-30 19:03:07 +02:00
Roland Scheidegger 740a1618c3 gallium: add new LOD opcode
The operation performed is all the same as LODQ, but with the usual
differences between dx10 and GL texture opcodes, that is separate resource
and sampler indices (plus result swizzling, and setting z/w channels
to zero).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-09-30 02:58:09 +02:00
Kamil Páral d5e7ce28b5 drirc: whitelist glthread for Outlast
FPS increase 10-20% in starting locations on Core i5-4570 +
Radeon R9 270.
2017-09-29 20:53:32 +02:00
Jan Vesely 7148795665 travis: Add clover build using llvm-5.0
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-09-29 12:14:34 -04:00
Jan Vesely 8af90b59f9 travis: Add clover build using llvm-4.0
llvm-4 needs gcc 4.8:
http://releases.llvm.org/4.0.1/docs/ReleaseNotes.html#non-comprehensive-list-of-changes-in-this-release

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-09-29 12:14:34 -04:00
Jan Vesely b9a358a3e6 travis: Add clover build using llvm-3.9
Use r600,radeonsi instead of i915
Update binutils, new linker is required for llvm-3.9:
https://www.ubuntuupdates.org/package/core/trusty/universe/updates/binutils-2.26

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
2017-09-29 12:14:34 -04:00
Leo Liu 361d8f82c0 st/va: add dst rect to avoid scale on deint
For 1080p video transcode, the height will be scaled to 1088 when deint
to progressive buffer. Set dst rect to make sure no scale.

Fixes: 3ad8687 "st/va: use new vl_compositor_yuv_deint_full() to deint"

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andy Furniss <adf.lists@gmail.com>
2017-09-29 10:06:30 -04:00
Nicolai Hähnle d190bfc1ad radeonsi: emit DLDEXP and DFRACEXP TGSI opcodes
Note: this causes spurious regressions in some current piglit tests,
because the tests incorrectly assume that there is no denorm support for
doubles. I'm going to send out a fix for those tests as well.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:08:07 +02:00
Nicolai Hähnle 061303e4fd radeonsi: emit LDEXP opcode
The LLVM intrinsic has existed for a long time. The current name was
established in LLVM 3.9.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:08:04 +02:00
Nicolai Hähnle 6de5147d20 st/glsl_to_tgsi: use LDEXP when available
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:08:03 +02:00
Nicolai Hähnle cad959d901 gallium: add LDEXP TGSI instruction and corresponding cap
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:08:01 +02:00
Nicolai Hähnle 2b0bfc51de tgsi: infer that dst[1] of DFRACEXP is an integer
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:07:59 +02:00
Nicolai Hähnle 5cf279bf7e gallivm: add support for TGSI instructions with two outputs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:07:57 +02:00
Nicolai Hähnle 7af64b4d4a gallivm: add dst register index to lp_build_tgsi_context::emit_store
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:07:55 +02:00
Nicolai Hähnle 3c78215a1c tgsi: clarify the semantics of DFRACEXP
The status quo is quite the mess:

1. tgsi_exec will do a per-channel computation, and store the dst[0]
   result (significand) correctly for each channel. The dst[1] result
   (exponent) will be written to the first bit set in the writemask.
   So per-component calculation only works partially.

2. r600 will only do a single computation. It will replicate the
   exponent but not the significand.

3. The docs pretend that there's per-component calculation, but even
   get dst[0] and dst[1] confused.

4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions,
   and kind-of assumes that everything is replicated, generating this for
   the dvec4 case:

     DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
     DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
     DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
     DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw

Settle on the simplest behavior, which is single-component calculation
with replication, document it, and adjust tgsi_exec and r600.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:07:50 +02:00
Nicolai Hähnle dbe7fc00d5 tgsi: fix the documentation of DLDEXP
Sourcing the exponent for the zw destination pair from Z is consistent
with both tgsi_exec and gallivm. In practice, st_glsl_to_tgsi always
generates per-channel instructions anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:07:46 +02:00
Nicolai Hähnle d713af711d tgsi: infer that DLDEXP's second source has an integer type
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:07:33 +02:00
Nicolai Hähnle 93bf9c114b glsl/lower_instruction: handle denorms and overflow in ldexp correctly
GLSL ES requires both, and while GLSL explicitly doesn't require correct
overflow handling, it does appear to require handling input inf/denorms
correctly.

Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.*

Cc: mesa-stable@lists.freedesktop.org
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 12:07:08 +02:00
Nicolai Hähnle a208cd7ae4 util/queue: fix a race condition in the fence code
A tempting alternative fix would be adding a lock/unlock pair in
util_queue_fence_is_signalled. However, that wouldn't actually
improve anything in the semantics of util_queue_fence_is_signalled,
while making that test much more heavy-weight. So this lock/unlock
pair in util_queue_fence_destroy for "flushing out" other threads
that may still be in util_queue_fence_signal looks like the better
fix.

v2: rephrase the comment

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
2017-09-29 11:52:41 +02:00
Nicolai Hähnle c49400a03b r600: cleanup set_occlusion_query_state
This fixes a warning caused by the fork (note the change in the function
signature):

../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c: In function ‘r600_init_common_state_functions’:
../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c:2974:36: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
  rctx->b.set_occlusion_query_state = r600_set_occlusion_query_state;

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-29 11:47:37 +02:00
Nicolai Hähnle 5184a1e8ee r300: add missing case PIPE_SHADER_CAP_INT64_ATOMICS
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-29 11:47:34 +02:00
Nicolai Hähnle 797dd12c7b radeonsi: fix border color translation for integer textures
This fixes the extremely unlikely case that an application uses
0x80000000 or 0x3f800000 as border color for an integer texture and
helps in the also, but perhaps slightly less, unlikely case that 1 is
used as a border color.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:45:08 +02:00
Nicolai Hähnle 6eb9483912 radeonsi: clamp border colors for upgraded depth textures
The hardware does this automatically for unorm formats, but we need to
do it manually for unorm depth formats that have been upgraded to
Z32_FLOAT.

Fixes dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
and others.

Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:45:05 +02:00
Nicolai Hähnle 4c56e07029 radeonsi: clamp depth comparison value only for fixed point formats
The hardware usually does this automatically. However, we upgrade
depth to Z32_FLOAT to enable TC-compatible HTILE, which means the
hardware no longer clamps the comparison value for us.

The only way to tell in the shader whether a clamp is required
seems to be to communicate an additional bit in the descriptor
table. While VI has some unused bits in the resource descriptor,
those bits have unfortunately all been used in gfx9. So we use
an unused bit in the sampler state instead.

Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f
and many other tests in dEQP-GLES3.functional.texture.shadow.*

Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:44:50 +02:00
Nicolai Hähnle 7dfa891f32 radeonsi/gfx9: fix geometry shaders without output vertices
Not that those are super common or useful, but hey! Fun corner cases
of the API...

Fixes dEQP-GLES31.functional.geometry_shading.emit.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:43:09 +02:00
Nicolai Hähnle a6ea4c1b93 amd/common: save an instruction in the build_cube_select sequence
Avoid a v_cndmask: the absolute value is free due to input modifiers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:43:07 +02:00
Nicolai Hähnle 5be5c1e0fa amd/common: fix build_cube_select
Fix the custom cube coord selection sequence to be identical to
the hardware v_cubesc/tc and OpenGL spec. Affects texture sampling
with user-provided derivatives.

Fixes dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:43:04 +02:00