Prologs (shader binaries inserted before the API shader binary) need to
know this, so that they won't change the input registers unintentionally.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The texture slot is expanded to 16 dwords containing 2 descriptors.
Those can be:
- Image and fmask, or
- Image and sampler state
By carefully choosing the locations, we can put all three into one slot,
with the fmask and sampler state being mutually exclusive.
This improves shaders in 2 ways:
- 2 user SGPRs are unused, shaders can use them as temporary registers now
- each pair of descriptors is always on the same cache line
v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask
at the same time
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.
This also removes the spi_ps_input states and moves the registers
to the PS state.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs
were offset by 1 if color_two_side was enabled, and not offset if it was not
enabled, which is a variation that's problematic if we want to have 1 variant
per shader and the variant doesn't care about color_two_side (that should be
handled by other bytecode attached at the beginning).
Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location
"num_inputs" if it's present. BCOLOR1 is next.
This also allows removing si_shader::nparam and
si_shader::ps_input_param_offset, which are useless now.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: drop inline keyword
drop radeon_llvm_dispose_kernel_module wrapper
v3: move definitions to .c file
use in radeonsi
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
It is only used during shader creation now, so no need to keep it around
afterwards.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This does change the behavior slightly:
If a shader writes COLOR[i] and that color buffer isn't bound,
the shader will export MRT_NULL instead and discard the IR tree that
calculates the output. The only exception is alpha-to-coverage, which
requires an alpha export.
v2: - update a comment about 16BPC
- account for MRTZ when when fixing alpha-test/kill
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The function will be extended to dump all binaries shaders will consist of,
so si_shader* makes sense here.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There will be 1 config per variant, which will be a union of configs
from {prolog, main, epilog}. For now, just add the structure.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
because the API pixel shader binary will not emulate alpha test one day,
so the KILL_ENABLE bit must be determined elsewhere.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow us to send shader debug info.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reduces the shader key for ES.
Use a fixed attrib location based on (semantic name, index).
The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.
There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.
This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.
The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.
This is also a prerequisite for multithreaded shader compilation.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>