This really, really helps on platforms where fabs() isn't free. A great
many shaders use a * frsq(fabs(fdot(a, a))) to normalize a vector.
Since the result of the fdot must be non-negative, the fabs can be
eliminated by an existing algebraic rule.
shader-db results:
r300 (run on R420 - X800XL)
total instructions in shared programs: 1369807 -> 1368550 (-0.09%)
instructions in affected programs: 59986 -> 58729 (-2.10%)
helped: 609
HURT: 0
total vinst in shared programs: 512899 -> 512861 (<.01%)
vinst in affected programs: 1522 -> 1484 (-2.50%)
helped: 36
HURT: 0
total sinst in shared programs: 260690 -> 260570 (-0.05%)
sinst in affected programs: 1419 -> 1299 (-8.46%)
helped: 120
HURT: 0
total consts in shared programs: 957295 -> 957230 (<.01%)
consts in affected programs: 849 -> 784 (-7.66%)
helped: 65
HURT: 0
LOST: 0
GAINED: 3
The 3 gained shaders are all vertex shaders from XCom: Enemy Unknown.
I'm guessing that game is never going to run on my X800XL. :)
i915
total instructions in shared programs: 791121 -> 780843 (-1.30%)
instructions in affected programs: 220170 -> 209892 (-4.67%)
helped: 2085
HURT: 0
total temps in shared programs: 47765 -> 47766 (<.01%)
temps in affected programs: 9 -> 10 (11.11%)
helped: 0
HURT: 1
total const in shared programs: 93048 -> 92983 (-0.07%)
const in affected programs: 784 -> 719 (-8.29%)
helped: 65
HURT: 0
LOST: 0
GAINED: 36
Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16702250 -> 16697908 (-0.03%)
instructions in affected programs: 119277 -> 114935 (-3.64%)
helped: 1065
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 4.08 x̃: 4
helped stats (rel) min: 0.48% max: 10.17% x̄: 3.66% x̃: 3.94%
95% mean confidence interval for instructions value: -4.26 -3.89
95% mean confidence interval for instructions %-change: -3.76% -3.56%
Instructions are helped.
total cycles in shared programs: 880772068 -> 880734134 (<.01%)
cycles in affected programs: 2134456 -> 2096522 (-1.78%)
helped: 941
HURT: 324
helped stats (abs) min: 2 max: 2180 x̄: 123.06 x̃: 44
helped stats (rel) min: 0.04% max: 49.96% x̄: 7.08% x̃: 3.81%
HURT stats (abs) min: 2 max: 2098 x̄: 240.33 x̃: 35
HURT stats (rel) min: 0.04% max: 77.07% x̄: 12.34% x̃: 3.00%
95% mean confidence interval for cycles value: -47.93 -12.04
95% mean confidence interval for cycles %-change: -2.87% -1.34%
Cycles are helped.
No shader-db changes on any other Intel platform.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17181>
The modifiers are u_vectors, but the code was trying to access them
as dynarrays. This resulted in a wrong number of modifiers, which then
later on would also lead to invalid reads used as modifiers.
In the case of the iris driver, a wrongly read number of modifiers > 0
would also trigger an error message.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6643
Fixes: b5848b2dac ("egl/wayland: use surface dma-buf feedback to allocate surface buffers")
Reviewed-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17180>
This is similar to vk_shader_module_to_nir only it takes a
VkPipelineShaderStageCreateInfo and handles
VK_KHR_graphics_pipeline_library semantics for when a
VkShaderModuleCreateInfo is provided instead of an actual module.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17196>
This change involves two enums:
* rogue_texstate.xml: All COMPRESSED_* members of FORMAT are moved
to FORMAT_COMPRESSED (without the prefix). A second field is added
to IMAGE_WORD0 (texformat_compressed) which overlaps with the
original (texformat), and
* rogue_pbestate.xml: REG_WORD0_LINESTRIDE was not a real enum; it's
removed entirely. It only has value when feature
pbe_stride_align_1pixel is present, so a FIXME comment was added to
this effect.
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17204>
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking rop_reads_dst suggests that it may be
null, but it has already been dereferenced on all paths leading to the
check.
Fixes: 94be0dd0b8 ("tu: Implement extendedDynamicState2LogicOp")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17099>
SSBO access works very differently from UBO access. Straddling
loads/stores isn't an issue, loads/stores instead must be aligned to the
element size and can have up to 4 components.
We support 16-bit access with SSBOs on a650+, and sometimes the
vectorizer tries to create a misaligned 32-bit access when combining
32-bit and 16-bit accesses. The UBO-focused logic didn't reject this,
which is now fixed. This fixes a number of VK-CTS regressions on a650+.
Fixes: bf49d4a084 ("freedreno/ir3: Enable load/store vectorization for SSBO access, too.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17040>
`x86_build-base-wine.sh` are usd to install wine and xvfb
`x86_build-mingw-patch.sh` are used to pull packages from msys2 that can be directly used.
`x86_build-mingw-source-deps.sh` are used to building llvm, libclc, clang, spirv-tools and directx-headers from source
xvfb are used to enable wgl tests on debian
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
The test can compile, but can not pass, so compile it but not running it
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
error message:
```
../../src/gallium/winsys/d3d12/wgl/d3d12_wgl_framebuffer.cpp:231:42: error: no matching function for call to 'operator new(sizetype, d3d12_wgl_framebuffer*&)'
231 | new (fb) struct d3d12_wgl_framebuffer();
| ^
<built-in>: note: candidate: 'void* operator new(long long unsigned int)'
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
Because we may compile mesa with both rtti=enabled and rtti=disabled because of LLVM
Fixes errors:
../../src/gallium/drivers/d3d12/d3d12_video_enc_h264.cpp:777:7: error: 'dynamic_cast' not permitted with '-fno-rtti'
777 | dynamic_cast<d3d12_video_bitstream_builder_h264 *>(pD3D12Enc->m_upBitstreamBuilder.get());
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16084>
Reprogram SF CLIP viewport pointer by not skipping its
dirty flag bit.
Many thanks to Lin, Shuicheng <shuicheng.lin@intel.com>,
Jerez Plata, Francisco <francisco.jerez.plata@intel.com>,
Graunke, Kenneth W <kenneth.w.graunke@intel.com>,
and others for their great help.
Signed-off-by: Zhang, Jianxun <jianxun.zhang@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17171>
We need to push loop nesting to handle this correctly -- at the end of
the innermost loop, the correct nesting is 1 (from the if), not 0.
Fixes assertion failure in
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.local.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_fragment,UnexpectedPass
dEQP-GLES2.functional.shaders.struct.uniform.dynamic_loop_nested_struct_array_vertex,UnexpectedPass
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17128>
In trying to figure out why an rpi3 job took so long, I wished that the
wget verbosity was hidden by default, and that it told me how long it took
like other sections do.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17146>
We had it set up for arm64 asan already, do it for everyone else too. In
cleaning up the duplication, this fixes a pasteo in rpi3 which had the
"artifacts: false" on the wrong job, causing it to do a slow download of
the mesa build from gitlab.
Doing this required also moving the ".use-debian/arm_test" in as well, so
that its "needs:" didn't overwrite ours if it appeared after us in the
consumer's "extends:"
Should save about 20 seconds on rpi3 jobs.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17146>
This is not one of the valid stages in the top level "stages:"
declaration, you're supposed to get your stage from your
test-source-dep.yml include. Avoids issues when .baremetal-test gets
included after your test-source-dep.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17146>
There are several places that should have supported the various sized
versions of bcsel and the various nir_op_[fi]csel_* opcodes. Rather
than enumerate the whole list, add a property.
v2: Make the comment for NIR_OP_IS_SELECTION more descriptive.
Suggested by Jason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
For example, the proof for this pattern
(('bcsel', ('flt', 'a@32', 0), 'b@32', 'c@32'), ('fcsel_ge', a, c, b)),
would be
bcsel(a < 0, b, c)
bcsel(!(a < 0), c, b)
bcsel(a >= 0, c, b)
fcsel_ge(a, c, b)
However, !(a < 0) => (a >= 0) is well known to produce different
results if `a` is NaN.
Instead of that replacement, use this replacement:
bcsel(a < 0, b, c)
bcsel(-0 < -a, b, c)
bcsel(0 < -a, b, c)
fcsel_gt(-a, b, c)
This is NaN-safe and exact.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Fixes: 0f5b3c37c5 ("nir: Add opcodes for fused comp + csel and optimizations")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17048>
This was missing, and the added validation caught it.
Fixes: 708c47e663 ("nir: Validate nir_tex_instr::dest_type bitsize")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17172>
All 4 jobs had a total of about 26 minutes of runner time, so squish them
onto 3 runners and use gbm for the .shader_tests to avoid X overhead and
hopefully succeed with full concurrency.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17172>
min/max pointsize clamping affects the value that must be used,
meaning that it may not be 1.0
in the case where clamping changes the value from 1.0, ensure the shader
export path is used if attenuation isn't enabled
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17145>