Commit Graph

385 Commits

Author SHA1 Message Date
Marek Olšák 735a3ba007 radeonsi/gfx10: enable GS fast launch for triangles and strips with NGG culling
Only non-indexed triangle lists and strips are supported. This increases
performance if there is something to cull.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák 8db00a51f8 radeonsi/gfx10: implement NGG culling for 4x wave32 subgroups
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák aa2d846604 radeonsi/gfx10: move GE_PC_ALLOC setting to shader states
The value is not changed. I just use a different way to compute it.

The value will vary with NGG culling.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák 68586bdd21 radeonsi: remove useless #includes
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
2020-01-15 21:54:55 +00:00
Marek Olšák da2c12af4b radeonsi: move geometry shader code into si_shader_llvm_gs.c
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
2020-01-15 21:54:55 +00:00
Marek Olšák cf65c6f0d2 radeonsi: move VS_STATE.LS_OUT_PATCH_SIZE a few bits higher to make space there
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-15 15:06:31 -05:00
Marek Olšák 42112010a3 radeonsi: rename si_shader_create -> si_create_shader_variant for clarity
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák f4ba457e1e radeonsi: clean up si_shader_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 03950473df radeonsi: merge si_tessctrl_info into si_shader_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 5fa2ab831e radeonsi: fork tgsi_shader_info and tgsi_tessctrl_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 18aaceae8d radeonsi: rename si_shader_info -> si_shader_binary_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 7f4a54d5bd radeonsi: remove TGSI from comments
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 363b4027fc radeonsi: put up to 5 VBO descriptors into user SGPRs
gfx6-8: 1 VBO descriptor in user SGPRs
gfx9-10: 5 VBO descriptors in user SGPRs

We no longer pull up to 5 VBO descriptors from GTT when SDMA is disabled.

Totals from affected shaders:
SGPRS: 1110528 -> 1170528 (5.40 %)
VGPRS: 952896 -> 951936 (-0.10 %)
Spilled SGPRs: 83 -> 61 (-26.51 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 23766296 -> 22843920 (-3.88 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 179344 -> 179344 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-13 15:57:07 -05:00
Marek Olšák 312e04689a radeonsi: don't allow draw calls with uninitialized VS inputs
These always hang, because vertex buffer descriptors are not set up.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-13 15:57:07 -05:00
Marek Olšák 420fe1e7f9 radeonsi: remove TGSI
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-06 15:57:20 -05:00
Marek Olšák 17164d4e27 radeonsi/gfx10: don't declare any LDS for NGG if it's not used
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-27 13:50:57 -05:00
Marek Olšák 378444ce90 radeonsi: allow generating VS prologs with 0 inputs
If "ls_vgpr_fix" is set, we use a prolog, but it can have 0 inputs.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
2019-12-16 20:06:07 +00:00
Marek Olšák 442ef8c3e3 radeonsi: keep serialized NIR instead of nir_shader in si_shader_selector
This decreases memory usage, because serialized NIR is more compact.

The main shader part is compiled from nir_shader.
Monolithic shader variants are compiled from nir_binary.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-05 23:28:45 -05:00
Marek Olšák fff884e09d radeonsi/nir: implement pipe_screen::finalize_nir
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-23 21:12:52 -04:00
Marek Olšák 268e0e01f3 radeonsi/nir: simplify si_lower_nir signature
just a cleanup
2019-10-15 21:52:09 -04:00
Marek Olšák eec7b0a865 radeonsi: use simple_mtx_t instead of mtx_t
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-07 20:05:07 -04:00
Marek Olšák 4dde40908f radeonsi/gfx10: set PA_CL_VS_OUT_CNTL with CONTEXT_REG_RMW to fix edge flags
We need two different values of the register, one for NGG and one for
legacy, in order to fix edge flags for the legacy pipeline.

Passing the ngg flag to emit_clip_regs would be too complicated,
so CONTEXT_REG_RMW is used for partial register updates.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-27 16:16:08 -04:00
Marek Olšák 1426acf9e7 radeonsi/gfx10: remove incorrect ngg/pos_writes_edgeflag variables
It varies depending on si_shader_key::as_ngg.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-27 16:16:08 -04:00
Marek Olšák 223b3174bd radeonsi/nir: always lower ballot masks as 64-bit, codegen handles it
This fixes KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks.

This solution is better, because the IR isn't dependent on wave32.
2019-08-19 17:23:38 -04:00
Marek Olšák 5167ca27fa gallium: add TGSI_SEMANTIC_DEFAULT_OUTER/INNER_LEVEL
for radeonsi NIR support.
2019-08-12 14:52:17 -04:00
Marek Olšák 902dd50cf0 gallium: add AMD-specific compute TGSI enums
for tgsi_to_nir
2019-08-12 14:52:17 -04:00
Marek Olšák 6a2bdb8d01 gallium: add TGSI_PROPERTY_VS_BLIT_SGPRS_AMD for tgsi_to_nir
needed by radeonsi NIR support
2019-08-12 14:52:17 -04:00
Marek Olšák f064b530f6 radeonsi/gfx10: remove an obsolete VGT_REUSE_OFF workaround
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:09:01 -04:00
Marek Olšák e777720173 radeonsi/nir: lower PS inputs before scanning the shader
Lowering PS inputs can eliminate some of them, which messes up
persp/linear barycentric coord usage info.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-06 17:08:46 -04:00
Marek Olšák 9ac7d0a0e2 radeonsi: fix packing of key.mono.u.ps 2019-07-30 22:06:23 -04:00
Marek Olšák 8f72f137ad radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64
Legacy GS has to use Wave64, so TES before GS has to use Wave64 too.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák 88efb63caf radeonsi/gfx10: implement Wave32
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák a8a526c5cb radeonsi/gfx10: set as_ngg for GS prolog
as_ngg is required by Wave32.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák 0f30223cf4 radeonsi/gfx10: combine hw edgeflags with user edgeflags for correct behavior
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák 985a59e0d1 radeonsi/gfx10: don't compile the GS copy shader if it's 100% not needed
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák 1c99a13f89 radeonsi: decrease maximum supported GENERIC varying index from 42 to 31
This can decrease LDS and/or memory usage for shader outputs when geometry
shaders or tessellation is used.

Only PS inputs support higher indices and those aren't eliminated by
kill_outputs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák 3be4ed2fe1 radeonsi: fix and clean up shader_type passing
- don't pass it via a parameter if it can be derived from other parameters
- set shader_type for ac_rtld_open
- use enum pipe_shader_type instead of unsigned

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Marek Olšák f66ee5af2f radeonsi: determine the rasterization primitive type accurately (v2)
v2: reworked version to fix bugs and make it more efficient

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák 32694456f7 radeonsi/gfx10: jump over the shader query atomic if the queries are disabled
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák b680f723f8 radeonsi/gfx10: export correct PrimitiveID from NGG vertex shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Marek Olšák 6920f09f4b radeonsi/gfx10: fix GL_LINE polygon mode for decomposed primitives
We need to tell PA to accept edge flags generated by the input assembler,
because decomposed primitives shouldn't draw inner edges.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle 77e715541c radeonsi/gfx10: emit VGT_GS_OUT_PRIM_TYPE from draw and add it to VS_STATE
With NGG, the VGT_GS_OUT_PRIM_TYPE can change without a shader change.

The VS_STATE is required for both streamout and culling from a vertex
shader without pre-compiling outprim-specific variants.

We could consider compiling specialized variants in the future. We
could also consider compiling the NGG logic as an epilog.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 4ecc39e1aa radeonsi/gfx10: NGG geometry shader PM4 and upload
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 016a465d7d radeonsi/gfx10: implement gfx10_shader_ngg
For pipelines without API GS. We will later expand this to cover NGG
geometry shaders as well.

Note that the vtx offset passed into the GS part is just the
vertex index multiplied by VGT_ESGS_RING_ITEMSIZE.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 8ec60d3031 radeonsi/gfx10: add as_ngg shader key bit
Also add the shader main part NGG variant, so that in principle
we can switch between legacy in NGG modes.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle 226f650d92 radeonsi/gfx10: document NGG shader stages
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle b519ddc35c radeonsi/gfx9: declare LDS ESGS ring as an explicit symbol on LLVM >= 9
This will make it easier to use LDS for other purposes in geometry
shaders in the future.

The lifetime of the esgs_ring variable is as follows:
- declared as [0 x i32] while compiling shader parts or monolithic shaders
- just before uploading, gfx9_get_gs_info computes (among other things)
  the final ESGS ring size (this depends on both the ES and the GS shader)
- during upload, the "esgs_ring" symbol is given to ac_rtld as a shared
  LDS symbol, which will lead to correctly laying out the LDS including
  other LDS objects that may be defined in the future
- si_shader_gs uses shader->config.lds_size as the LDS size

This change depends on the LLVM changes for emitting LDS symbols into
the ELF file.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle f8315ae04b amd/rtld: layout and relocate LDS symbols
Upcoming changes to LLVM will emit LDS objects as symbols in the ELF
symbol table, with relocations that will be resolved with this change.

Callers will also be able to define LDS symbols that are shared between
shader parts. This will be used by radeonsi for the ESGS ring in gfx9+
merged shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle ca21ba2a08 radeonsi: inline si_shader_binary_read_config into its only caller
Since it can only be used for reading the config of an individual,
non-combined shader, it is not very reusable anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle bf8a1ca902 radeonsi: use the new run-time linker for shaders
v2:
- fix a memory leak

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle bf11c594dd radeonsi: return bool from si_shader_binary_upload
We didn't really use error codes anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 8b1343ca79 radeonsi: let si_shader_create return a boolean
We didn't really use error codes anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Nicolai Hähnle 77b05cc42d radeonsi: use ac_shader_config
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Marek Olšák c9b7a37b8f radeonsi: cull primitives with async compute for large draw calls
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:13:34 -04:00
Marek Olšák 4eb377d1c3 radeonsi: add si_vs_prolog_bits::unpack_instance_id_from_vertex_id:1
The prim discard compute shader bakes InstanceID into the output index buffer.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Nicolai Hähnle d814c21b1b radeonsi: overhaul the vertex fetch fixup mechanism
The overall goal is to support unaligned loads from vertex buffers
natively on SI.

In the unaligned case, we fall back to the general case implementation in
ac_build_opencoded_load_format. Since this function is fully general,
we will also use it going forward for cases requiring fully manual format
conversions of dwords anyway.

This requires a different encoding of the fix_fetch array, which will now
contain the entire format information if a fixup is required.

Having to check the alignment of vertex buffers is awkward. To keep the
impact on the fast path minimal, the si_context will keep track of which
vertex buffers are (not) at least dword-aligned, while the
si_vertex_elements will note which vertex buffers have some (at most dword)
alignment requirement. Vertex buffers should be dword-aligned most of the
time, which allows a fast early-out in almost all cases.

Add the radeonsi_vs_fetch_always_opencode configuration variable for
testing purposes. Note that it can only be used reliably on LLVM >= 9,
because support for byte and short load is required.

v2:
- add a missing check to si_bind_vertex_elements

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-13 17:07:23 +02:00
Timothy Arceri a004e95dd7 radeonsi/nir: create si_nir_opts() helper
We will make use of this in the following commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-05-01 09:41:07 +10:00
Marek Olšák 501ff90a95 radeonsi: rename r600_resource -> si_resource
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-22 13:32:18 -05:00
Timothy Arceri 2817a4ec0b radeonsi: remove unrequired param in si_nir_scan_tess_ctrl()
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-02 10:01:15 +11:00
Samuel Pitoiset 3fbdcd942f amd: remove support for LLVM 6.0
User are encouraged to switch to LLVM 7.0 released in September 2018.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-12-06 14:02:56 +01:00
Sonny Jiang 084cf3b966 radeonsi:optimizing SET_CONTEXT_REG for shaders vgt_vertex_reuse
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 19:04:13 -04:00
Sonny Jiang ce1d72609d radeonsi:optimizing SET_CONTEXT_REG for shaders Tessellation
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 19:04:13 -04:00
Sonny Jiang 4052624398 radeonsi:optimizing SET_CONTEXT_REG for shaders GS
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-10-05 19:04:13 -04:00
Marek Olšák c5442c1165 radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI 2018-08-29 15:31:42 -04:00
Marek Olšák 86b52d4236 radeonsi: reduce LDS stalls by 40% for tessellation
40% is the decrease in the LGKM counter (which includes SMEM too)
for the GFX9 LSHS stage.

This will make the LDS size slightly larger, but I wasn't able to increase
the patch stride without corruption, so I'm increasing the vertex stride.
2018-07-23 20:23:52 -04:00
Timothy Pearson e1621fda84 radeonsi: Use signed char for color_interp_vgpr_index
color_interp_vgpr_index was declared as a generic char value.
Because signed values are used in this variable, the result
was not safe across architectures and crashed on ppc64[el]
and arm.

Declare color_interp_vgpr_index as a signed type.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2018-07-18 13:31:29 -04:00
Dave Airlie 0eb65b4944 radeonsi: rename si_compiler -> ac_llvm_compiler
As precursor to moving init to common code, just rename the struct
and move it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:31:32 +10:00
Marek Olšák 32e413ca59 ac: move all LLVM module initialization into ac_create_module
This removes some ugly code around module initialization.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-07-02 14:34:39 -04:00
Marek Olšák 5a6414f135 radeonsi: implement vertex color clamping for tess and GS 2018-06-28 22:41:12 -04:00
Marek Olšák 034b385fc2 radeonsi: move VS_STATE_SGPR before draw SGPRs
for vertex color clamping.
2018-06-28 22:27:25 -04:00
Marek Olšák d77557c9db radeonsi: store compute local_size into tgsi_shader_info
This is kinda a hack, but it's enough for the shader cache.
2018-06-28 22:27:25 -04:00
Marek Olšák f154555733 radeonsi: clean up passing the is_monolithic flag for compilation
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák 2f65c67043 radeonsi: fix passing gl_ClipVertex for GS and tess
Also add the fprintf call.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák a7d61c0753 radeonsi: fix color inputs/outputs for GS and tess
GS is tested, tessellation is untested.

Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-05-25 16:46:00 -04:00
Marek Olšák e75fc8d033 radeonsi: move data_layout into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 797d673c9a radeonsi: move passmgr into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák c1823ff661 radeonsi: move target_library_info into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 43f0a10051 radeonsi: add triple into si_compiler
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 87eb597758 radeonsi: add struct si_compiler containing LLVMTargetMachineRef
It will contain more variables.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák a8abbbb172 radeonsi: remove r600_pipe_common.h
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák 4c5efc40f4 radeonsi: update copyrights
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák 2be6143032 radeonsi: implement GL_KHR_blend_equation_advanced
MSAA is supported using sample shading. Layered rendering and all texture
targets are also supported.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-02 13:55:25 -04:00
Marek Olšák 9b7db12815 radeonsi: remove chip_class parameter from si_lower_nir
We can get it from si_screen.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2018-03-08 14:58:16 -05:00
Timothy Arceri 70190a6567 radeonsi/nir: call ac_lower_indirect_derefs()
Fixes piglit tests:
tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec3-index-rd.shader_test
tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-05 14:09:23 +11:00
Timothy Arceri 561503e3bd radeonsi: add chip class to compiler_ctx_state
This will be used in the following patch.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-03-05 14:09:23 +11:00
Marek Olšák 8799eaed99 radeonsi: remove 2 unused user SGPRs from merged TES-GS with 32-bit pointers
The effect of the last 13 commits on user SGPR counts:

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-26 12:01:19 +01:00
Marek Olšák 3fa7a59d69 radeonsi: make SI_SGPR_VERTEX_BUFFERS the last user SGPR input
so that it can be removed and replaced with inline VBO descriptors,
and the pointer can be packed in unused bits of VBO descriptors.
This also removes the pointer from merged TES-GS where it's useless.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-26 12:01:08 +01:00
Marek Olšák 2d03c4cac8 radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRs
TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák 190e064e63 radeonsi: move 2nd-shader descriptor pointers into s[0:1]
If 32-bit pointers are supported, both pointers can be moved into s[0:1]
and then ESGS has exactly the same user data SGPR declarations as VS.

If 32-bit pointers are not supported, only one pointer can be moved into
s[0:1]. In that case, the 2nd pointer is moved before TCS constants,
so that the location is the same in HS and GS.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák 931ec80eeb radeonsi: implement 32-bit pointers in user data SGPRs (v2)
User SGPRs changes:
    VS:     14 ->  9
    TCS:    14 -> 10
    TES:    10 ->  6
    GS:      8 ->  4
    GSCOPY:  2 ->  1
    PS:      9 ->  5
    Merged VS-TCS: 24 -> 16
    Merged VS-GS:  18 -> 11
    Merged TES-GS: 18 -> 11

SGPRS: 2170102 -> 2158430 (-0.54 %)
VGPRS: 1645656 -> 1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size: 52094872 -> 52692540 (1.15 %) bytes
Max Waves: 371848 -> 372723 (0.24 %)

v2: - the shader cache needs to take address32_hi into account
    - set amdgpu-32bit-address-high-bits

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
2018-02-17 04:52:17 +01:00
Marek Olšák 148b48646b radeonsi: print shader-db stats for main parts, not final binaries
This is needed to get shader-db stats for LS,HS,ES,GS stages on gfx9.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-31 03:21:20 +01:00
Marek Olšák c02c9ee550 radeonsi: move max_simd_waves computation into a separate function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-31 03:21:20 +01:00
Timothy Arceri 452586b56a radeonsi: add dummy implementation of si_nir_scan_tess_ctrl()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Samuel Pitoiset 45872a0a6d radeonsi: make use of ac_get_spi_shader_z_format()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:23:25 +01:00
Nicolai Hähnle 239d2b5809 radeonsi: clarify that si_shader_selector::esgs_itemsize is set for the ES part
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-28 09:34:43 +01:00
Nicolai Hähnle df5ebe0c26 radeonsi/gfx9: fix VM fault with fetched instance divisors
We need to account for SGPR locations in merged shaders.

This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations

Fixes: 79c2e7388c ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-20 16:26:10 +01:00
Nicolai Hähnle 4f493c79ee radeonsi: use ready fences on all shaders, not just optimized ones
There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:

  Thread 1                         Thread 2
  --------                         --------
  si_shader_select_with_key
    begin compiling the first
    variant
    (guarded by sel->mutex)
                                   si_bind_XX_shader
                                     select first_variant by default
                                     as state->current
                                   si_shader_select_with_key
                                     match state->current and early-out

Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.

The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).

It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.

Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-11-09 11:37:51 +01:00
Marek Olšák 529cdce799 radeonsi: remove 'Authors:' comments
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-02 18:19:03 +01:00
Marek Olšák da0083f123 radeonsi: use postponed KILL only when derivatives are used
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00