This re-enables and fixes support for stencil buffer.
It fixes 365 stencil related deqp tests. All tests that use INCR, INCR_WRAR,
DECR and DECR_WRAP as a stencil op still fail, but they also fail with the
blob, so we may ignore that for now.
We still have dEQP-GLES2.functional.depth_stencil_clear.depth_stencil_masked
failing, which is strange because it's the only one out of the
depth_stencil_clear.* set.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
ds_bpermute_b32/ds_permute_b32 are fine, I think
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
We don't create atomics with definitions if they are not used in NIR, but
our own DCE can remove the uses if an export turns out to be null.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
isel/label_instruction currently doesn't create these but we should
probably check anyway.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
This was fine because a literal used to mean that it didn't use modifiers,
but now VOP3 can take a literal on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Tested on Navi by using dEQP-VK.image.image_size.buffer.* and the GFX8
path with the size multipled by the stride.
dEQP-VK.image.image_size.buffer.* was also run with the tests modified to
use a 96bit format.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
RADV's LLVM backend and radeonsi does the same thing.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
The bo access needs to be freed prior to removing it from its hash
table. This prevents leaking them over time.
Signed-off-by: Daniel Ogorchock <daniel.ogorchock@garmin.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3308>
This field is for the primitive ID export to the fragment shader.
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only needed for NGG without passthrough mode or for NGG streamout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can't be enabled for geometry shaders, for NGG streamout and
for vertex shaders that export the primitive ID. NGG passthrough
requires that LDS isn't used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Per the semi-recently-released NVIDIA docs, when this bit is not
enabled, then the result for RT[0] will be used. So if e.g. only a
single RT is drawn to and it's not RT[2], the results will not be
visible. Fixes
GTF-GL45.gtf33.GL3Tests.explicit_attrib_location.explicit_attrib_location_pipeline
which was failing due to a frag shader outputting only to location=2.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This corresponds to gl_PrimitiveID and gl_Layer. When both of these are
stored in a single AST.64 or AST.128 operation, then it appears as
though the whole store fails. Fixes the recently extended
glsl-1.50-transform-feedback-builtins piglit, and also
gtf30.GL3Tests.transform_feedback.transform_feedback_builtins.
The issue was reproduced on GM107 and GP108 but not GK208 nor GK104.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Perhaps in a future implementation, such events could be passed back to
the driver, or queried directly. However for now, this is required for
GL 4.3 robustness contexts.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The fix was found by Karol Herbst a long time ago, but it was unclear
why it helped or if it would create additional problems. This change
adds a comment that explains what's going on, and in the process also
normalizes the nv50 implementation to match.
The coordinates which are fed to gl_Position map directly to pixel
coordinates, since the viewport transform is disabled. If the
framebuffer is MSAA, then that doesn't affect the pixel coordinates at
all, it's just that each pixel has multiple samples.
Note that this makes it really clear that this approach is inappropriate
for EXT_framebuffer_multisample_blit_scaled, and also the 3d path will
fail terribly for direct copies. Thankfully the 2d path normally takes
care of this.
Fixes KHR-GL43.packed_depth_stencil.blit.depth32f_stencil8 as well as
scaling issues in a number of EXT_framebuffer_multisample-related piglit
tests (although they continue to fail due to inaccuracies).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
lima doesn't support alpha test, flat shading, two-sided color nor
clip planes. We can enable these caps when corresponding hw features
are implemented in the driver.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Fixes some of dEQP-GLES2.functional.polygon_offset.* tests and shadows in Q3A.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Apparently Mali4x0 doesn't do viewport clipping, so anything rendered beyond viewport
is still rendered. Looks like we need to use scissors to do clipping.
Fixes most of dEQP-GLES2.functional.clipping.*, 6 out of 7 remaining failures
fail on blob as well. Remaining [1] fails on many other gallium drivers.
[1] dEQP-GLES2.functional.clipping.triangle_vertex.clip_three.clip_neg_x_neg_z_and_pos_x_pos_z_and_neg_x_neg_y_pos_z
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Apparently it doesn't depend on primitive type, the value
only depends on whether we specify point size via PLBU command --
bit 12 is set in this case
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
The state value of main_uniform_storage_index will be wrong for
add_parameter() when find_and_update_previous_uniform_storage()
finds a uniform if there is more than 1 uniform used in
multiple shader stages.
The new code is also simpler.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode,
as opposed to s2 (an SGPR-pair) in wave64 mode.
A couple of cases where this should have been fixed were omitted from
the previous patch by mistake.
Fixes: e0bcefc3a0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.
This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The liveness calculation done by the local CSE pass in order to prune
AEB entries whose sources are no longer live is currently inaccurate,
because the live intervals are calculated once at the beginning of the
pass, so they don't take into account any of the copy instructions
inserted by the CSE pass as it makes progress. However the IP counter
used in that calculation is based on the start_ip of the basic block,
which is updated automatically whenever any instructions are inserted
into the CFG. This causes the IP counter and liveness intervals to
get out of sync in programs with multiple basic blocks, causing the
CSE pass to toss AEB entries prematurely, which can lead to missed
optimization opportunities rather non-deterministically.
On BDW this leads to the following shader-db changes:
total instructions in shared programs: 14952488 -> 14951763 (-0.00%)
instructions in affected programs: 45416 -> 44691 (-1.60%)
helped: 40
HURT: 4
total spills in shared programs: 20989 -> 20970 (-0.09%)
spills in affected programs: 103 -> 84 (-18.45%)
helped: 3
HURT: 0
total fills in shared programs: 24981 -> 24926 (-0.22%)
fills in affected programs: 127 -> 72 (-43.31%)
helped: 3
HURT: 0
In addition it avoids a number of regressions in combination with some
of the optimization changes I'm working on for SIMD32, which would
have made CSE more effective... Causing it to be less effective
elsewhere in the program astonishingly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For uniform sample ID, only the first channel of msg_data will be
initialized. We need to pass that component only to the SEND message
for SIMD lowering to unzip the descriptor source correctly.
Fixes several dozens of conformance test failures with SIMD32 fragment
shaders enabled, including:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.dynamic_sample_number.*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The problem occured when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message. In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function. This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.
Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.
The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents invalid code from being emitted for ROR/ROL instructions in
SIMD32 shaders.
The problem can be reproduced with the following tests while forcing
SIMD32 to be used for fragment shaders:
piglit.shaders.glsl-rotate-left
piglit.shaders.glsl-rotate-right
However the issue could occur in production already with compute
shaders and a workgroup size large enough to trigger SIMD32 dispatch.
Fixes: 83fdec0f0d "intel/compiler: Enable the emission of ROR/ROL instructions"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The current implementation was broken for any integers between 2^24
and 2^30 (it would return zero for me on ICL). The reason is that for
such integers we wouldn't take the 'if (0 <= shiftCount)' early return
path, however 'shiftCount + 7' would be positive, leading to a
negative 'count' argument passed to __shift64RightJamming(), which
would give undefined results.
This reworks the affected conversion functions to use either
__shortShift64Left() or __shift64RightJamming() based on the sign of
the final shift count, which should avoid the problem. In addition
this should qualify as a clean-up/optimization -- This implementation
of the conversion functions translates to 7 instructions less than the
original on Intel hardware.
This fixes the 'KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot'
conformance tests on soft fp64 hardware with large enough subgroup
size (>16).
Fixes: d5cf6e92b4 "glsl: Add built-in functions to do uint64_to_fp32(uint64_t)"
Fixes: c9d333a6b7 "glsl: Add built-in functions to do int64_to_fp32(int64_t)"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
This patch prevents memory leak in get_version function in st_manager.c
This issue was found by valgrind:
16 bytes in 1 blocks are definitely lost in loss record 6 of 1,418
at 0x483CD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x63D9476: st_init_extensions (st_extensions.c:1679)
by 0x63B803B: get_version (st_manager.c:1271)
by 0x63B8124: st_api_query_versions (st_manager.c:1289)
by 0x63266EF: dri_init_screen_helper (dri_screen.c:583)
by 0x6321B12: dri2_init_screen (dri2.c:2110)
by 0x631AACC: driCreateNewScreen2 (dri_util.c:155)
by 0x5D58192: dri3_create_screen (dri3_glx.c:897)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
Fixes: eca8032f20 ("gallium: Add ARB_gl_spirv support")
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Previously, when cluster_size was set to 0, it always worked as if
the cluster size was 64. This commit fixes it in wave32 mode by
changing to work as if the cluster size was set to 32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>