Still disabled.
Only prologs & epilogs are compiled in draw calls, but each variant of those
is compiled only once per process.
VS is always compiled as hw VS.
TES is always compiled as hw VS.
LS and ES stages are always compiled on demand.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It only exports the primitive ID.
Also used by TES when it's compiled as VS.
The VS input location of the primitive ID input is v2.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Prologs (shader binaries inserted before the API shader binary) need to
know this, so that they won't change the input registers unintentionally.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The texture slot is expanded to 16 dwords containing 2 descriptors.
Those can be:
- Image and fmask, or
- Image and sampler state
By carefully choosing the locations, we can put all three into one slot,
with the fmask and sampler state being mutually exclusive.
This improves shaders in 2 ways:
- 2 user SGPRs are unused, shaders can use them as temporary registers now
- each pair of descriptors is always on the same cache line
v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask
at the same time
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.
This also removes the spi_ps_input states and moves the registers
to the PS state.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs
were offset by 1 if color_two_side was enabled, and not offset if it was not
enabled, which is a variation that's problematic if we want to have 1 variant
per shader and the variant doesn't care about color_two_side (that should be
handled by other bytecode attached at the beginning).
Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location
"num_inputs" if it's present. BCOLOR1 is next.
This also allows removing si_shader::nparam and
si_shader::ps_input_param_offset, which are useless now.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: drop inline keyword
drop radeon_llvm_dispose_kernel_module wrapper
v3: move definitions to .c file
use in radeonsi
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
It is only used during shader creation now, so no need to keep it around
afterwards.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This does change the behavior slightly:
If a shader writes COLOR[i] and that color buffer isn't bound,
the shader will export MRT_NULL instead and discard the IR tree that
calculates the output. The only exception is alpha-to-coverage, which
requires an alpha export.
v2: - update a comment about 16BPC
- account for MRTZ when when fixing alpha-test/kill
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The function will be extended to dump all binaries shaders will consist of,
so si_shader* makes sense here.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There will be 1 config per variant, which will be a union of configs
from {prolog, main, epilog}. For now, just add the structure.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
because the API pixel shader binary will not emulate alpha test one day,
so the KILL_ENABLE bit must be determined elsewhere.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow us to send shader debug info.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reduces the shader key for ES.
Use a fixed attrib location based on (semantic name, index).
The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.
There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.
This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>