With TGSI, the driver allocates space for the alpha ref as a uniform and
adds a conditional discard to the shader. We could either replicate that
with NIR, or just set the flag saying we need the shader lowering and get
the same thing.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
There's no hardware instructions for them until then. These chips don't
expose the extension provinding the GLSL builtins for operations like
bfrev, but NIR can recognize the construct and optimize it to
bitfield_reverse, which pre-nvc0 would then fail to codegen. Prevents a
regression when moving to nir-to-tgsi. Other lower_bitfield flags are set
as well for when someone comes along and adds optimizations for them too.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
Also adjust the lowering pass to handle wide SSBO loads that we now emit
for the nir case.
This improves generated code quality since memoryopt can't
merge SSBO loads that end up predicated on a bounds check.
This also happens to fix a few test cases, only because the simpler generated
IR is less likely to trigger other compiler bugs. Eg on kepler with
NV50_PROG_USE_NIR=1, this fixes
arb_gpu_shader_fp64-fs-non-uniform-control-flow-ubo
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
This is important so you don't go comparing the number of instructions
emitted when you unrolled loops differently.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
For !8044 I'm working on getting all drivers to accept NIR. The NIR
compiler in the driver is apparently not quite ready, so use NIR-to-TGSI
instead. This is a net win in testcases working on my RV770 and Turks
cards (especially in some important piglit tests involving YUV dma-buf
decode), though it's not regression-free.
shader-db (R600):
total dw in shared programs: 8553412 -> 8358918 (-2.27%)
dw in affected programs: 7476702 -> 7282208 (-2.60%)
total gprs in shared programs: 217286 -> 213217 (-1.87%)
gprs in affected programs: 72747 -> 68678 (-5.59%)
total loops in shared programs: 398 -> 330 (-17.09%)
loops in affected programs: 68 -> 0
total cf in shared programs: 558835 -> 332768 (-40.45%)
cf in affected programs: 420475 -> 194408 (-53.76%)
shader-db (Turks):
total dw in shared programs: 14104598 -> 13556782 (-3.88%)
dw in affected programs: 12161972 -> 11614156 (-4.50%)
total gprs in shared programs: 321068 -> 313690 (-2.30%)
gprs in affected programs: 114899 -> 107521 (-6.42%)
total loops in shared programs: 736 -> 651 (-11.55%)
loops in affected programs: 111 -> 26 (-76.58%)
total cf in shared programs: 925771 -> 581226 (-37.22%)
cf in affected programs: 678600 -> 334055 (-50.77%)
total stack in shared programs: 27853 -> 27855 (<.01%)
stack in affected programs: 5 -> 7 (40.00%)
glmark2 terrain: 0.137649% +/- 0.0511938% (n=6)
glmark2 jellyfish: no change (n=8)
unigine valley (extreme) 5.36 -> 5.45 (n=1 it takes so long to run)
unigine heaven (basic) 16.13 -> 16.15 (n=1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
KHR-GL33.shaders.indexing.tmp_array.vertexid regressed with the switch to
NIR-to-TGSI because the shader got optimized enough to emit a read just
after writing to the array (the kind of situation where a non-rel write
would have been followed by a PV/PS read). The R600 and EG docs say you
always need to do this, but apparently some hardware gives you the right
answer anyway so we don't flag it on all of them.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
We are underprovisioned for Windows, almost certainly not running wide
enough on the insufficient number of slots we do have, and there are
also indications that the machine itself is having physical issues.
Disable it until it's fixed.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16055>
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:
...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.
As a consequence, this unit test:
...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.
This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.
Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called. This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.
Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
We know it's not conformant and that's OK.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16031>
NIR doesn't have that knob currently, so we end up throwing errors about
it being ignored.
This should fix cases of "tgsi_to_nir: unhandled TGSI property 23 = 1",
and presumably do better at DX9 muls on nv50 and r600.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
Otherwise, we get an ishl that the HW can't support, and a ushr if the NIR
ends up being lowered to ubo_vec4, which may not get constant-folded if
the offset was non-constant.
This matches what mesa/st uses for this arg to uniform lowering.
Fixes: #5971
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
On virglrenderer it is of interest to not re-use temporaries when we
want to handle precise, invariant, and highp/mediump with better
possibility for optimization.
v2: Force optimized RA if the number of registers is too large
(Emma: only 16 bit signed int are reserved for register indices)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16051>
r600 would end up looking for it past the end of its array of inputs
(which expected 1:1 ordering from declarations to driver locations).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
virglrenderer emits GLSL referencing all the swizzles, even if the write
mask doesn't contain them. This is a problem when the output is
TessLevelInner, which has only 2 elements.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
Same splitting method as store_output. Fixes regressions in virgl
with nir-to-tgsi.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
Found in virgl, where a glslparsertest accidentally gets its inputs
lowered to undefs, and 64-bit undefs don't get split by the normal
alu/intrinsic splitter (and would be hard to split because other passes
would see reconstruction of the vec4 from undefs and turn it back into
vec3/vec4 undef).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
I've tried to keep virglrenderer workarounds out of ntt, but this one
would be bothersome to do with tgsi_translate and TG4 is pretty low-stakes
for NTT consumers.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
Getting opengl32*.def consistence with Windows SDK.
Getting osmesa.mingw.def's gl* functions consistence with Windows SDK.
stw_* functions are cdecl, not stdcall, so there is no need mangling the symbol.
Fixes egl.def for x86
d3d10sw: Move the place of d3d10_sw.def to d3d10_sw.def.in
Fixes vulkan_lvp.def for x86
Fixes#5552
Remove stdcall-fixup
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14041>
gfxbench sets these between the gbuffer subpass and the following ones.
They should be no-ops as subpass dependencies. gfxbench vk-5-debug perf
12.8 -> 14.6 fps thanks to getting gmem on the gbuffer rendering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
gfxbench vk-5-normal has a shader that sampels into a texels[] array at
the top, then in a loop calls a GLSL function passing texels[] in by
value. This resulted in a copy to a temp inside the loop, which got
lowered to scratch stores since it was pretty big.
By doing find_array_copies(), we notice that it's equivalent to
copy_deref, then get to copy-propagate from the array at the top. Then we
only have to set up the scratch array outside of the loop and load_scratch
from it in the called function inside the loop. This also causes there to
be less spilling, stps 1144 -> 354 and ldps 826->36.
However, it doesn't seem to change performance on the test. So, while
this seems to be an improvement for the shader, and we could maybe even do
better by rematerializing the txl samples inside the loop instead of
storing the texture fetches to scratch in the first place, it doesn't
currently seem worth pursuing more optimization of this shader.
No change on freedreno shader-db.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
This was useful for comparing image allocations between gfxbench
gl_5_normal and vk_5_normal to see if rendering was generally equivalent
(formats, MSAA, UBWC choices, and notably gfxbench vk was choosing DXT5
instead of ASTC on non-android builds!)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
Refactored tu6_calculate_lrz_state and added comments.
1) If there is no depth write we could keep LRZ valid with any
compare op, we just have to temporary disable LRZ for incompatible
ops in such case.
2) Found that VK_COMPARE_OP_EQUAL is not compatible with LRZ,
and since it doesn't change LRZ buffer - LRZ could be just
temporary disabled. This fixes rendering of grass/trees in
PUBG mobile on angle.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6127
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16014>
The tgsi path already marked all aliasing loads of atomic counters with
CACHE_CG, so we don't need to emit a cctl. This patch uses the cache
flag on the atomic to model whether the L1 cache needs the stale
values to be flushed or not.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14386>
We were previously only emitting these for CAS, but all of the atomics
seem to need it.
Fixes spec@glsl-es-3.10@execution@fs-simple-atomic-counter-inc-dec-read
on kepler with NV50_PROG_USE_NIR=1
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14386>
There, no more C and C++ sources of the same base-name. We can do both
in one source.
This is our last C++ source file, so let's also clean away the C++20
mess in meson.build.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15816>
This does quite a lot in one go, simply because C and C++ are too
different to cleanly move from one language to another. But hopefully
this won't create too many rebase-issues.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15816>