Otherwise, search_phi_bcsel() will be called with a buf_size that is
slightly lower than it has to be.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7748>
It should only recurse if there's enough space to add the phi sources.
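
As a rough illustration of the rule above, here is a minimal sketch of a
bounded recursive collector. It is not the Mesa code: struct node and
collect_leaves() are hypothetical stand-ins, the graph is assumed to be
acyclic (the sketch keeps no visited set), and a "phi" is just a node
with sources. A phi is expanded only when the buffer can hold one entry
per source; otherwise the phi itself is recorded as one opaque entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a value: a leaf, or a "phi" with sources. */
struct node {
   int value;
   size_t num_srcs;                 /* 0 for a leaf */
   const struct node *const *srcs;  /* num_srcs entries */
};

/* Collect the leaves reachable from n into buf, writing at most buf_size
 * entries.  Returns the number of entries written (always >= 1). */
static size_t
collect_leaves(const struct node *n, const struct node **buf, size_t buf_size)
{
   assert(buf_size >= 1);

   /* Only recurse if there is enough space for every source. */
   if (n->num_srcs > 0 && buf_size >= n->num_srcs) {
      size_t sources_left = n->num_srcs;
      size_t total = 0;
      for (size_t i = 0; i < n->num_srcs; i++) {
         sources_left--;
         /* Keep one slot in reserve for each source not yet visited, and
          * hand everything else that is still free to this source. */
         total += collect_leaves(n->srcs[i], buf + total,
                                 buf_size - total - sources_left);
      }
      return total;
   }

   /* Not a phi, or not enough room: record the node itself. */
   buf[0] = n;
   return 1;
}
```

Passing the full remaining space minus the reserved slots is what keeps
each recursive call from seeing a smaller buf_size than it needs to.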
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 72ac3f6026 ("nir: add nir_unsigned_upper_bound and nir_addition_might_overflow")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7748>
It seems that only the gfx queue doesn't support it, except on GFX10.3,
which supports it on all queues.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7732>
They're not really "push" anymore but that's because there is no such
thing as push constants in bindless shaders on Intel. They should be
fast enough, though. There is some room for debate here as to whether
we want to do the pull in NIR or push it into the back-end. The
advantage of doing it in the back-end is that it'd be easier to use
MOV_INDIRECT for indirect push constant access rather than falling back
to a dataport message.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
For triangle geometry, the hit attributes are always two floats which
contain the barycentric coordinates of the hit. For procedural
geometry, they're an arbitrary blob of data passed from the intersection
shader to the hit shaders. In our implementation, we stash that data
right after the HW RayQuery in the ray stack.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Unlike graphics and compute pipelines, Vulkan ray-tracing pipelines do
not have a single entrypoint. Instead, the raygen shader is specified
as a one-element shader binding table in the vkCmdTraceRay call. This
means that raygen shaders have to be bindless shaders just like any
other ray tracing shader. To launch them, we have a tiny compute shader
that acts as a trampoline and sets up the hotzone and uses btd_spawn to
fire off the raygen shader.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Most of the work for this is done for us by spirv_to_nir which gives us
a load_global from a memory address based on the shader_record_ptr
system values.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
This is a little bit more work than executeCallable() because we also
have to set up the MemRay data structure which the ray traversal
hardware uses to keep its state.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Both traceRay() and executeCallable() take a payload parameter which
gets passed from the caller to the callee and which the callee can write
to pass data back to the caller. We implement these by passing a
pointer to the data structure in the caller to the callee as the second
QWord on the callee's stack. Coming out of spirv_to_nir, the incoming call
payloads get the nir_var_shader_call_data variable mode allowing us to
easily identify them. Outgoing call payloads get assigned the
nir_var_shader_temp mode and will have been turned into function_temp by
nir_lower_global_vars_to_local. All we have to do is crawl the shader
looking for references to the nir_var_shader_call_data variable and
rewrite those to use the passed in pointer. nir_lower_explicit_io will
do the rest for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These are required for ray-tracing. There are many cases where the
ray-tracing hardware may decide to execute some but not all of our
shaders. In these cases, it needs a shader to execute at the end which
will pop the stack back to the shader which called traceRay().
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Each callable ray-tracing shader stage has to perform a return operation
at the end. In the case of raygen shaders, it retires the bindless
thread because the raygen shader is always the root of the call tree.
In the case of any-hit shaders, the default action is to accept the hit.
For callable, miss, and closest-hit shaders, it does a return operation.
The assumption is that the calling shader has placed a
BINDLESS_SHADER_RECORD address for the return in the first QWord of the
callee's scratch space. The return operation simply loads this value
and calls a btd_spawn intrinsic to jump to it.
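
The return convention described above can be modeled on the CPU as a
small sketch. Everything here is hypothetical scaffolding rather than
driver code: scratch is a plain array of QWords, struct spawn_log and
btd_spawn() merely record the requested jump target in place of the real
intrinsic, and prepare_call() stands in for whatever the caller does
before spawning the callee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the btd_spawn intrinsic: it only records
 * which shader record was requested. */
struct spawn_log {
   uint64_t last_record_addr;
   unsigned count;
};

static void
btd_spawn(struct spawn_log *log, uint64_t record_addr)
{
   log->last_record_addr = record_addr;
   log->count++;
}

/* Caller side: place the return BINDLESS_SHADER_RECORD address in the
 * first QWord of the callee's scratch space. */
static void
prepare_call(uint64_t *callee_scratch, uint64_t return_record_addr)
{
   assert(callee_scratch != NULL);
   callee_scratch[0] = return_record_addr;
}

/* Callee side: the return operation loads that QWord and jumps to it. */
static void
shader_return(const uint64_t *scratch, struct spawn_log *log)
{
   assert(scratch != NULL && log != NULL);
   btd_spawn(log, scratch[0]);
}
```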
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
In ray-tracing shader stages, we have a real call stack and so we can't
use the normal scratch mechanism. Instead, the invocation's stack lives
in a memory region of the RT scratch buffer that sits after the HW ray
stacks. We handle this by asking nir_lower_io to lower local variables
to 64-bit global memory access. Unlike nir_lower_io for 32-bit offset
scratch, when 64-bit global access is requested, nir_lower_io generates
an address calculation which starts from a load_scratch_base_ptr. We
then lower this intrinsic to the appropriate address calculation in
brw_nir_lower_rt_intrinsics.
When a COMPUTE_WALKER command is sent to the hardware with the BTD Mode
bit set to true, the hardware generates a set of stack IDs, one for each
invocation. These then get passed along from one shader invocation to
the next as we trace the ray. We can use those stack IDs to figure out
which stack our invocation needs to access. Because we may not be the
first shader in the stack, there's a per-stack offset that gets stored
in the "hotzone".
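
A minimal sketch of the kind of address calculation this implies is
shown below. The layout is an assumption made for illustration only:
the software stacks are taken to be a contiguous array of equally sized
regions indexed by stack ID, and the hotzone offset is taken to be a
byte offset within one region. The function and parameter names are
hypothetical and are not the ones used by brw_nir_lower_rt_intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: sw_stacks_base points just past the HW ray stacks, and
 * each stack ID owns stack_size_bytes of memory starting there. */
static uint64_t
invocation_stack_addr(uint64_t sw_stacks_base, uint32_t stack_size_bytes,
                      uint32_t stack_id, uint32_t hotzone_offset)
{
   /* The per-stack offset must stay inside this invocation's region. */
   assert(hotzone_offset < stack_size_bytes);

   return sw_stacks_base +
          (uint64_t)stack_id * stack_size_bytes +
          hotzone_offset;
}
```

The hotzone offset is what lets a shader that is not first in the call
chain start its frame after the frames of the shaders above it.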
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
These will eventually contain per-stage lowering for various ray-tracing
things. This is separate from brw_nir_lower_rt_intrinsics because, for
reasons that will become apparent later, brw_nir_lower_rt_intrinsics has
to be run very late in the compile process, right before brw_compile_bs.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The new intrinsics we added for doing address calculations are all
things we fetch from the RT_DISPATCH_GLOBALS struct. We could emit an
RT_DISPATCH_GLOBALS load at every point we want it and trust NIR to CSE
it for us but it's easier to use intermediate intrinsics.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The Intel bindless thread dispatch model is very simple. When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs. These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned. Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke. When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers. The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.
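
The local argument pointer calculation can be sketched as follows. The
struct and function names are hypothetical, and the offset is assumed to
be expressed in bytes; the real BINDLESS_SHADER_RECORD encoding may use
a different unit or field layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of what is passed to a bindless dispatch. */
struct bindless_dispatch_args {
   uint64_t global_arg_addr;     /* global argument address */
   uint32_t stack_id;            /* stack ID allocated by the hardware */
   uint64_t shader_record_addr;  /* BINDLESS_SHADER_RECORD to invoke */
};

/* Local argument pointer = record address + offset taken from the record
 * (assumed here to be a byte offset). */
static uint64_t
local_arg_ptr(const struct bindless_dispatch_args *args,
              uint32_t local_arg_offset)
{
   assert(args->shader_record_addr <= UINT64_MAX - local_arg_offset);
   return args->shader_record_addr + local_arg_offset;
}
```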
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The RT_DISPATCH_GLOBALS struct is half HW-defined by the ray-tracing
spec and half SW-defined. However, due to the addresses in it, it's
convenient for it to all be in GenXML.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
The cloned version is the one that has updated start and end bits
fields. We're about to start passing those through to a new
__gen_address function and we need the correct start/end in order to do
that reliably.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
This is the first of the HW data structures added for ray-tracing.
These are added to their own file because they're not really associated
with any hardware we've enabled in Mesa just yet. Eventually, these
will likely get folded into the appropriate genX.xml file as they are
hardware data structures and need to be tracked as such.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
Halt is like a return for the entire shader or exit() if you prefer to
think of it that way. Once an invocation hits a halt, it's 100% dead.
Any writes to output variables which happened before the halt do,
however, still apply.
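
As a loose CPU-side analogy (not NIR, and not how any driver implements
it), the semantics can be modeled with setjmp/longjmp: a halt reached
inside a nested helper abandons the whole "shader", not just the helper,
while the output write made before it remains visible. All names below
are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical model of one invocation and its single output variable. */
struct invocation {
   jmp_buf halt_target;
   int output;
};

/* Model of halt: leave the entire shader from wherever we are. */
static void
shader_halt(struct invocation *inv)
{
   longjmp(inv->halt_target, 1);
}

static void
helper(struct invocation *inv)
{
   inv->output = 42;   /* written before the halt, so it still applies */
   shader_halt(inv);
   inv->output = -1;   /* never executed */
}

static int
run_shader(struct invocation *inv)
{
   assert(inv != NULL);
   inv->output = 0;
   if (setjmp(inv->halt_target) == 0) {
      helper(inv);
      inv->output = 7; /* never executed: the halt ended the shader */
   }
   return inv->output;
}
```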
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
If valgrind is installed, these components need to find valgrind.h.
Fixes: 53f7d539cd ("util: Add helgrind support for simple_mtx")
Closes: #3876
Acked-by: Rob Clark <robclark@freedesktop.org>