this should be significantly more performant for the majority of cases,
since it's rare that shaders have multiple variants outside of unit tests,
so now there can just be a list of shaders that is iterated instead, with
the last-used shader as the first entry
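
As a rough illustration of the idea described above, the lookup can be
thought of as a move-to-front list. This is only a sketch under
assumptions: struct variant, struct variant_key and variant_lookup() are
hypothetical names, not the driver's actual types or functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant key and list entry, for illustration only. */
struct variant_key {
   unsigned bits;
};

struct variant {
   struct variant_key key;
   struct variant *next;
};

/* Walk the list looking for `key`. On a hit, unlink the entry and put it
 * at the head, so the last-used variant is the first one compared next
 * time. Returns NULL on a miss and leaves the list untouched. */
static struct variant *
variant_lookup(struct variant **head, const struct variant_key *key)
{
   struct variant **link = head;

   for (struct variant *v = *head; v; link = &v->next, v = v->next) {
      if (memcmp(&v->key, key, sizeof(*key)) != 0)
         continue;
      *link = v->next; /* unlink from the current position */
      v->next = *head; /* reinsert at the front */
      *head = v;
      return v;
   }
   return NULL;
}
```

With a single variant per shader, which is the common case named above,
the first comparison already hits and the loop never advances.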
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12842>
Get comfy.
llvmpipe coroutines have a stack frame. This is created by hooking
in malloc and coro.alloc and coro.size intrinsics.
LLVM has a CoroElide pass that is meant to allow that stack frame
to be done as an alloca in the caller instead of using the malloc path.
The CoroElide pass relies on the coroutine being inlined (fixed that).
The CoroElide pass relies on there being a direct connection between
coro.destroy(i8 *arg) and arg = coro.begin(id). However, due to the
way the compute shaders are launched, there is no way to ensure that
link. Fixing the CoroElide pass seems quite difficult. I considered
adding a flag to always force CoroElide so it does the right thing, but
I'm not sure how ugly that would end up.
My first attempt tried to preallocate the stacks at a fixed size;
this turned out to be naive, as the stack frame was not sized
like I expected. Instead, the first coro to run allocates enough for
everyone, which avoids the massive number of small allocations.
This removes coro malloc from a lot of profiles and shaves another 30s
or so from OpenCL ./conversions/test_conversions uchar_uin
(from 4.40m to just under 4m on my ryzen 7 1800x)
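
The "first coro allocates for everyone" scheme described above can be
sketched in plain C. This is an analogy only: the actual change lives in
generated LLVM IR, and struct coro_frame_pool, coro_frame_get() and
coro_frame_pool_free() are hypothetical names. The sketch also assumes
the frame size is only known at runtime and that callers do not race on
the first allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool holding the frames of every coroutine in one block. */
struct coro_frame_pool {
   char *base;         /* single allocation, NULL until the first request */
   size_t frame_size;  /* size of one frame, as reported at runtime */
   unsigned num_coros; /* number of coroutines sharing the allocation */
};

/* Return the frame for coroutine `idx`. The first caller allocates room
 * for all `num_coros` frames; later callers only compute an offset. */
static void *
coro_frame_get(struct coro_frame_pool *pool, unsigned idx,
               size_t frame_size, unsigned num_coros)
{
   if (!pool->base) {
      pool->base = malloc(frame_size * num_coros);
      if (!pool->base)
         return NULL;
      pool->frame_size = frame_size;
      pool->num_coros = num_coros;
   }
   if (idx >= pool->num_coros)
      return NULL;
   return pool->base + (size_t)idx * pool->frame_size;
}

/* Release the shared allocation once every coroutine has finished. */
static void
coro_frame_pool_free(struct coro_frame_pool *pool)
{
   free(pool->base);
   pool->base = NULL;
}
```

The point of the design is that one malloc/free pair replaces one pair
per coroutine.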
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12432>
This helps reduce the mutex lock/unlock overhead for the threadpool
if the work distributes evenly across the threads.
The CL CTS conversions tests really hit this, and this takes maybe 10-20s
off a 5min test run.
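
One way to picture the saving described above is a worker that claims a
batch of iterations per lock acquisition instead of a single iteration.
The sketch below is an assumption about the general shape, not the
actual threadpool code: struct task and task_claim_batch() are
hypothetical, and the caller is assumed to hold the pool mutex and to
pass a non-zero thread count.

```c
/* Hypothetical task state; assumed to be protected by the pool mutex. */
struct task {
   unsigned iter_total; /* total number of iterations in the task */
   unsigned iter_next;  /* first iteration not yet handed out */
};

/* Claim up to ceil(iter_total / num_threads) iterations in one go.
 * Stores the first claimed iteration in *start and returns how many were
 * claimed; 0 means the task has been fully handed out. */
static unsigned
task_claim_batch(struct task *t, unsigned num_threads, unsigned *start)
{
   unsigned batch = (t->iter_total + num_threads - 1) / num_threads;
   unsigned left = t->iter_total - t->iter_next;
   unsigned n = left < batch ? left : batch;

   *start = t->iter_next;
   t->iter_next += n;
   return n;
}
```

When the work divides evenly, each thread takes the lock roughly once
per task rather than once per iteration.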
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12432>
the mask can't entirely be calculated based on the integer parameters,
as it's possible for some of the "bind" slots to actually be unbinds,
so remove bits as necessary to fix this
also add some debug asserts to ensure I don't break this again for the
tenth time
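
The mask fixup described above can be sketched as follows. This is a
minimal illustration under assumptions: struct vertex_buffer and
bound_buffers_mask() are hypothetical, the "integer parameters" are
taken to be a start slot and a count, and a slot with a NULL resource is
taken to mean an unbind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vertex buffer binding. */
struct vertex_buffer {
   const void *resource; /* NULL means this slot is being unbound */
};

/* Start from the mask implied by (start_slot, count), then remove the
 * bits of slots that are really unbinds. */
static uint32_t
bound_buffers_mask(const struct vertex_buffer *buffers,
                   unsigned start_slot, unsigned count)
{
   assert(start_slot + count <= 32);

   uint32_t range = (count == 32 ? ~0u : (1u << count) - 1u) << start_slot;
   uint32_t mask = range;

   for (unsigned i = 0; i < count; i++) {
      if (!buffers[i].resource)
         mask &= ~(1u << (start_slot + i));
   }

   /* Debug check: no bit outside the requested range may be set. */
   assert((mask & ~range) == 0);
   return mask;
}
```

For three slots starting at slot 2 with the middle one unbound, the
range is 0x1c and the result is 0x14.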
Fixes: 6dd02a5139 ("zink: stop using util_set_vertex_buffers_mask()")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12871>
Although the index has to be dynamically uniform, if a few lanes are
never executed then they will contain 0, so it is important to read the
ssbo index from the first active lane.
Just loop over them all.
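
A scalar sketch of "loop over them all", assuming it means that each
active lane uses its own index instead of every lane trusting lane 0.
The real code is generated IR; NUM_LANES and ssbo_load_per_lane() are
hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LANES 4 /* hypothetical vector width */

/* For every active lane, load from the buffer selected by that lane's
 * own index. Inactive lanes may hold a stale 0 index, so they are skipped
 * and their result is left untouched. */
static void
ssbo_load_per_lane(const uint32_t *const *ssbos,
                   const bool active[NUM_LANES],
                   const uint32_t index[NUM_LANES],
                   const uint32_t offset[NUM_LANES],
                   uint32_t result[NUM_LANES])
{
   for (unsigned lane = 0; lane < NUM_LANES; lane++) {
      if (!active[lane])
         continue;
      result[lane] = ssbos[index[lane]][offset[lane]];
   }
}
```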
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12689>
Not all native displays are pointers to dereferenceable memory, e.g.
DCs on Windows. Don't bother dereferencing if no platforms are available
that can be detected that way.
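
The guard described above might look roughly like this. Everything here
is a sketch: the enum, the marker objects and
example_detect_native_display() are hypothetical, and the
HAVE_WAYLAND_PLATFORM / HAVE_DRM_PLATFORM macro names are assumed to be
how the build marks the platforms that are detected by dereferencing.

```c
#include <stddef.h>

enum example_platform {
   EXAMPLE_PLATFORM_INVALID,
   EXAMPLE_PLATFORM_WAYLAND,
   EXAMPLE_PLATFORM_DRM,
};

/* Hypothetical well-known addresses that a native display object of the
 * given platform would store in its first pointer-sized field. */
#ifdef HAVE_WAYLAND_PLATFORM
static const char example_wayland_marker = 0;
#endif
#ifdef HAVE_DRM_PLATFORM
static const char example_drm_marker = 0;
#endif

static enum example_platform
example_detect_native_display(void *native_display)
{
   if (!native_display)
      return EXAMPLE_PLATFORM_INVALID;

#if defined(HAVE_WAYLAND_PLATFORM) || defined(HAVE_DRM_PLATFORM)
   /* Only built when some platform can actually be detected this way;
    * otherwise the handle is never dereferenced (it may be a DC). */
   const void *first = *(const void *const *)native_display;
#ifdef HAVE_WAYLAND_PLATFORM
   if (first == &example_wayland_marker)
      return EXAMPLE_PLATFORM_WAYLAND;
#endif
#ifdef HAVE_DRM_PLATFORM
   if (first == &example_drm_marker)
      return EXAMPLE_PLATFORM_DRM;
#endif
#endif

   return EXAMPLE_PLATFORM_INVALID;
}
```

Built with neither macro defined, the function returns
EXAMPLE_PLATFORM_INVALID for any handle without touching the memory it
may or may not point to.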
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12727>
Add a stub EGL driver for Windows
Fix compiler issues in egl/main
Ensure Windows build produces libEGL.dll
Default EGL to enabled for Windows when building a Gallium driver
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12727>
Data imports need to be marked __declspec(dllimport), so
just export a function instead of data.
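
A small sketch of the pattern, with hypothetical names (EXAMPLE_EXPORT,
struct example_api, example_get_api()). Consumers of an exported
function do not need a __declspec(dllimport) declaration to link
correctly, whereas consumers of exported data do, so the data stays
private and only an accessor is exported.

```c
/* Hypothetical export macro; the non-Windows branches are assumptions
 * added so the sketch also builds elsewhere. */
#if defined(_WIN32)
#define EXAMPLE_EXPORT __declspec(dllexport)
#elif defined(__GNUC__)
#define EXAMPLE_EXPORT __attribute__((visibility("default")))
#else
#define EXAMPLE_EXPORT
#endif

struct example_api {
   int version;
};

/* The data itself is not exported. */
static const struct example_api example_api_table = { 1 };

/* Exported accessor: callers reach the data through a function call. */
EXAMPLE_EXPORT const struct example_api *
example_get_api(void)
{
   return &example_api_table;
}
```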
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12727>
EGL's native window is an HWND, so this removes the need to
call GetDC in the creation path there.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12727>
If it's 0, then it's looked up from the framebuffer for the specified HDC
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12727>