Only non-indexed triangle lists and strips are supported. This increases
performance if there is something to cull.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
The value is not changed. I just use a different way to compute it.
The value will vary with NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This decreases memory usage, because serialized NIR is more compact.
The main shader part is compiled from nir_shader.
Monolithic shader variants are compiled from nir_binary.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We need two different values of the register, one for NGG and one for
legacy, in order to fix edge flags for the legacy pipeline.
Passing the ngg flag to emit_clip_regs would be too complicated,
so CONTEXT_REG_RMW is used for partial register updates.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Lowering PS inputs can eliminate some of them, which messes up
persp/linear barycentric coord usage info.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Legacy GS has to use Wave64, so TES before GS has to use Wave64 too.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
as_ngg is required by Wave32.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This can decrease LDS and/or memory usage for shader outputs when geometry
shaders or tessellation is used.
Only PS inputs support higher indices and those aren't eliminated by
kill_outputs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
- don't pass it via a parameter if it can be derived from other parameters
- set shader_type for ac_rtld_open
- use enum pipe_shader_type instead of unsigned
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
We need to tell PA to accept edge flags generated by the input assembler,
because decomposed primitives shouldn't draw inner edges.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With NGG, the VGT_GS_OUT_PRIM_TYPE can change without a shader change.
The VS_STATE is required for both streamout and culling from a vertex
shader without pre-compiling outprim-specific variants.
We could consider compiling specialized variants in the future. We
could also consider compiling the NGG logic as an epilog.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For pipelines without API GS. We will later expand this to cover NGG
geometry shaders as well.
Note that the vtx offset passed into the GS part is just the
vertex index multiplied by VGT_ESGS_RING_ITEMSIZE.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Also add the shader main part NGG variant, so that in principle
we can switch between legacy in NGG modes.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will make it easier to use LDS for other purposes in geometry
shaders in the future.
The lifetime of the esgs_ring variable is as follows:
- declared as [0 x i32] while compiling shader parts or monolithic shaders
- just before uploading, gfx9_get_gs_info computes (among other things)
the final ESGS ring size (this depends on both the ES and the GS shader)
- during upload, the "esgs_ring" symbol is given to ac_rtld as a shared
LDS symbol, which will lead to correctly laying out the LDS including
other LDS objects that may be defined in the future
- si_shader_gs uses shader->config.lds_size as the LDS size
This change depends on the LLVM changes for emitting LDS symbols into
the ELF file.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Upcoming changes to LLVM will emit LDS objects as symbols in the ELF
symbol table, with relocations that will be resolved with this change.
Callers will also be able to define LDS symbols that are shared between
shader parts. This will be used by radeonsi for the ESGS ring in gfx9+
merged shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since it can only be used for reading the config of an individual,
non-combined shader, it is not very reusable anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The prim discard compute shader bakes InstanceID into the output index buffer.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.
It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
The overall goal is to support unaligned loads from vertex buffers
natively on SI.
In the unaligned case, we fall back to the general case implementation in
ac_build_opencoded_load_format. Since this function is fully general,
we will also use it going forward for cases requiring fully manual format
conversions of dwords anyway.
This requires a different encoding of the fix_fetch array, which will now
contain the entire format information if a fixup is required.
Having to check the alignment of vertex buffers is awkward. To keep the
impact on the fast path minimal, the si_context will keep track of which
vertex buffers are (not) at least dword-aligned, while the
si_vertex_elements will note which vertex buffers have some (at most dword)
alignment requirement. Vertex buffers should be dword-aligned most of the
time, which allows a fast early-out in almost all cases.
Add the radeonsi_vs_fetch_always_opencode configuration variable for
testing purposes. Note that it can only be used reliably on LLVM >= 9,
because support for byte and short load is required.
v2:
- add a missing check to si_bind_vertex_elements
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
User are encouraged to switch to LLVM 7.0 released in September 2018.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
40% is the decrease in the LGKM counter (which includes SMEM too)
for the GFX9 LSHS stage.
This will make the LDS size slightly larger, but I wasn't able to increase
the patch stride without corruption, so I'm increasing the vertex stride.
color_interp_vgpr_index was declared as a generic char value.
Because signed values are used in this variable, the result
was not safe across architectures and crashed on ppc64[el]
and arm.
Declare color_interp_vgpr_index as a signed type.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
As precursor to moving init to common code, just rename the struct
and move it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
GS is tested, tessellation is untested.
Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It will contain more variables.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that it can be removed and replaced with inline VBO descriptors,
and the pointer can be packed in unused bits of VBO descriptors.
This also removes the pointer from merged TES-GS where it's useless.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If 32-bit pointers are supported, both pointers can be moved into s[0:1]
and then ESGS has exactly the same user data SGPR declarations as VS.
If 32-bit pointers are not supported, only one pointer can be moved into
s[0:1]. In that case, the 2nd pointer is moved before TCS constants,
so that the location is the same in HS and GS.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We need to account for SGPR locations in merged shaders.
This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations
Fixes: 79c2e7388c ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:
Thread 1 Thread 2
-------- --------
si_shader_select_with_key
begin compiling the first
variant
(guarded by sel->mutex)
si_bind_XX_shader
select first_variant by default
as state->current
si_shader_select_with_key
match state->current and early-out
Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.
The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).
It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.
Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>