Commit Graph

565 Commits

Author SHA1 Message Date
Pierre-Eric Pelloux-Prayer 5a05f9714b radeonsi: bump SI_NUM_SHADER_BUFFERS to 32
Some app uses more than 8 SSBOs (https://gitlab.freedesktop.org/mesa/mesa/-/issues/2946),
so increase SI_NUM_SHADER_BUFFERS to 32 (which allows 16 SSBOs).

Since we're now using a 64 bits number to track buffers, we could bump
SI_NUM_SHADER_BUFFERS to 48 but that would conflict with Mesa's
MAX_COMBINED_ATOMIC_BUFFERS limit (= 90).

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2122
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5632>
2020-06-30 09:23:14 +02:00
Marek Olšák da78d50bc8 radeonsi: make si_pm4_cmd_begin/end static and simplify all usages
There is no longer the confusing trailing si_pm4_cmd_end call.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603>
2020-06-26 07:02:57 +00:00
Marek Olšák 7b2a0f880b radeonsi: disallow adding BOs into si_pm4_state except 1 shader BO per state
The si_shader pointer is already there, so use it and remove the array
of BOs.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603>
2020-06-26 07:02:57 +00:00
Marek Olšák 428360662f radeonsi: don't add the tess ring buffers into the cs_preamble state
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603>
2020-06-26 07:02:57 +00:00
Marek Olšák 1c1d34a67a radeonsi: rename init_config states to cs_preamble states
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603>
2020-06-26 07:02:57 +00:00
Marek Olšák 3fec2f67c3 radeonsi: compact MRTs to save PS export memory space
If there are holes between color outputs (e.g. a shader exports MRT1, but
not MRT0), we can remove the holes by moving higher MRTs lower.

The hardware will remap the MRTs to their correct locations if we remove
holes in SPI_SHADER_COL_FORMAT but not CB_SHADER_MASK.

This is a performance optimization, but MRTs with holes are pretty rare,
so there is most likely no effect on any app.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5535>
2020-06-23 00:23:51 -04:00
Marek Olšák a23802bcb9 ac,radeonsi: start adding support for gfx10.3
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5383>
2020-06-09 16:17:36 +00:00
Marek Olšák a1602516d7 ac,radeonsi: replace == GFX10 with >= GFX10 where it's needed
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5383>
2020-06-09 16:17:36 +00:00
Marek Olšák 2a3806ffa3 amd: replace SH -> SA (shader array) in comments
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5184>
2020-05-26 06:00:54 -04:00
Marek Olšák 2cf46f2e3d ac/gpu_info: replace num_good_cu_per_sh with min/max_good_cu_per_sa
Perf counters use the new max number.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5184>
2020-05-26 06:00:54 -04:00
Marek Olšák 3cd96b5109 radeonsi: don't use INDIRECT_BUFFER within IBs
It's fragile. If I change the size or alignment, it hangs. Better safe than
sorry.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5095>
2020-05-23 03:44:44 -04:00
Axel Davy 45e69e7d11 radeonsi: Enable tgsi to nir disk cache
Enable the tgsi to nir cache for radeonsi.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4993>
2020-05-13 19:43:05 +00:00
Axel Davy 522bd414f3 ttn: Add new allow_disk_cache parameter
For now this parameter doesn't do anything.
It means the implementation is allowed to use
a cache on disk.

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4993>
2020-05-13 19:43:05 +00:00
Pierre-Eric Pelloux-Prayer 547e81655a radeonsi: don't print gs_copy_shader stats for shaderdb
Fixes: dbc86fa3de ("radeonsi: dump shader stats when hitting the live cache")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4607>
2020-05-05 12:26:02 +02:00
Marek Olšák b4fd8f1919 ac,radeonsi: simplify checking for Navi1x chips
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4698>
2020-04-24 10:38:54 +00:00
Pierre-Eric Pelloux-Prayer dbc86fa3de radeonsi: dump shader stats when hitting the live cache
With the introduction of the live shader cache, when a shader is
fetched from the cache no stats are printed for shaderdb.
So in a sequence like this: vs1, fs1, vs1, fs2, shaderdb may see
3 or 4 lines, depending on the threads being used.
If one run produces 3 lines while the other produces 4 lines, it
would compare vs1 stats with fs2 stats.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4355>
2020-04-02 08:31:37 +02:00
Pierre-Eric Pelloux-Prayer d7008fe46a radeonsi: switch to 3-spaces style
Generated automatically using clang-format and the following config:

AlignAfterOpenBracket: true
AlignConsecutiveMacros: true
AllowAllArgumentsOnNextLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: false
AlwaysBreakAfterReturnType: None
BasedOnStyle: LLVM
BraceWrapping:
  AfterControlStatement: false
  AfterEnum: true
  AfterFunction: true
  AfterStruct: false
  BeforeElse: false
  SplitEmptyFunction: true
BinPackArguments: true
BinPackParameters: true
BreakBeforeBraces: Custom
ColumnLimit: 100
ContinuationIndentWidth: 3
Cpp11BracedListStyle: false
Cpp11BracedListStyle: true
ForEachMacros:
  - LIST_FOR_EACH_ENTRY
  - LIST_FOR_EACH_ENTRY_SAFE
  - util_dynarray_foreach
  - nir_foreach_variable
  - nir_foreach_variable_safe
  - nir_foreach_register
  - nir_foreach_register_safe
  - nir_foreach_use
  - nir_foreach_use_safe
  - nir_foreach_if_use
  - nir_foreach_if_use_safe
  - nir_foreach_def
  - nir_foreach_def_safe
  - nir_foreach_phi_src
  - nir_foreach_phi_src_safe
  - nir_foreach_parallel_copy_entry
  - nir_foreach_instr
  - nir_foreach_instr_reverse
  - nir_foreach_instr_safe
  - nir_foreach_instr_reverse_safe
  - nir_foreach_function
  - nir_foreach_block
  - nir_foreach_block_safe
  - nir_foreach_block_reverse
  - nir_foreach_block_reverse_safe
  - nir_foreach_block_in_cf_node
IncludeBlocks: Regroup
IncludeCategories:
  - Regex:           '<[[:alnum:].]+>'
    Priority:        2
  - Regex:           '.*'
    Priority:        1
IndentWidth: 3
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyExcessCharacter: 100
SpaceAfterCStyleCast: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: false
SpacesInContainerLiterals: false

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4319>
2020-03-30 11:05:52 +00:00
Marek Olšák 4ef1c8d60b radeonsi/gfx10: fix the wave size for compute-based culling
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4269>
2020-03-28 00:58:34 +00:00
Marek Olšák 65e9239977 radeonsi: add num_vbos_in_user_sgprs into the shader cache key
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4269>
2020-03-28 00:58:34 +00:00
Marek Olšák 7ba5e94c50 ac: add radeon_info::use_late_alloc to control LATE_ALLOC globally
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4143>
2020-03-12 17:27:23 +00:00
Marek Olšák 7481c4be58 radeonsi: add a bug workaround for NGG - LATE_ALLOC_GS
Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4079>
2020-03-09 16:08:10 -04:00
Daniel Schürmann 9d64ad2fe7 radeonsi: lower discard to demote when FS_CORRECT_DERIVS_AFTER_KILL is enabled
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>
2020-03-09 12:29:32 +00:00
Marek Olšák c046551e60 radeonsi: print shader cache stats with AMD_DEBUG=cache_stats
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2929>
2020-01-24 20:29:29 -05:00
Marek Olšák 2fd3bb23ab radeonsi: restructure si_shader_cache_load_shader
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2929>
2020-01-24 20:29:29 -05:00
Marek Olšák 0db74f479b radeonsi: use the live shader cache
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2929>
2020-01-24 20:29:29 -05:00
Marek Olšák 7ce84b256e radeonsi: make si_compile_shader return bool
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3421>
2020-01-23 19:10:21 +00:00
Marek Olšák 735a3ba007 radeonsi/gfx10: enable GS fast launch for triangles and strips with NGG culling
Only non-indexed triangle lists and strips are supported. This increases
performance if there is something to cull.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák c377f45c18 radeonsi/gfx10: rewrite late alloc computation
- Use conservative late alloc when the number of CUs <= 6.
- Move the late alloc GS register to the GS shader state, so that it can be
  tuned for NGG culling.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák 8db00a51f8 radeonsi/gfx10: implement NGG culling for 4x wave32 subgroups
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák aa2d846604 radeonsi/gfx10: move GE_PC_ALLOC setting to shader states
The value is not changed. I just use a different way to compute it.

The value will vary with NGG culling.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák 41fef6fc09 radeonsi/gfx10: don't initialize VGPRs not used by NGG passthrough
v2: TES doesn't use the GS PrimitiveID

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák 42112010a3 radeonsi: rename si_shader_create -> si_create_shader_variant for clarity
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 03950473df radeonsi: merge si_tessctrl_info into si_shader_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 5fa2ab831e radeonsi: fork tgsi_shader_info and tgsi_tessctrl_info
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák 363b4027fc radeonsi: put up to 5 VBO descriptors into user SGPRs
gfx6-8: 1 VBO descriptor in user SGPRs
gfx9-10: 5 VBO descriptors in user SGPRs

We no longer pull up to 5 VBO descriptors from GTT when SDMA is disabled.

Totals from affected shaders:
SGPRS: 1110528 -> 1170528 (5.40 %)
VGPRS: 952896 -> 951936 (-0.10 %)
Spilled SGPRs: 83 -> 61 (-26.51 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 23766296 -> 22843920 (-3.88 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 179344 -> 179344 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-13 15:57:07 -05:00
Marek Olšák 312e04689a radeonsi: don't allow draw calls with uninitialized VS inputs
These always hang, because vertex buffer descriptors are not set up.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-13 15:57:07 -05:00
Marek Olšák fd84e422b6 radeonsi: clean up messy si_emit_rasterizer_prim_state
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 15:48:49 -05:00
Marek Olšák 898c9cb797 radeonsi: fix context roll tracking in si_emit_shader_vs
probably harmless, because we don't need to track context rolls on gfx10

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 15:48:39 -05:00
Marek Olšák 420fe1e7f9 radeonsi: remove TGSI
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-06 15:57:20 -05:00
Marek Olšák e79f55ff86 radeonsi/gfx10: improve performance for TES using PrimID but not exporting it
This field is really for the primitive export to the pixel shader.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-27 13:50:57 -05:00
Marek Olšák aa3df12fc2 radeonsi/gfx10: enable NGG passthrough for eligible shaders
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-27 13:50:57 -05:00
Marek Olšák aced18aa61 radeonsi/gfx10: simplify the tess_turns_off_ngg condition
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:36 -05:00
Marek Olšák 42f921387b radeonsi/gfx10: disable vertex grouping
based on PAL.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:34 -05:00
Marek Olšák 4675cb2019 radeonsi: initialize the per-context compiler on demand
This takes a noticable amount of time in piglit and some tests don't
need it.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-25 16:48:27 -05:00
Marek Olšák 3ef50b023e radeonsi/nir: fix compute shader crash due to nir_binary == NULL
This partially reverts 8b30114dda.

Fixes: 8b30114dda "radeonsi/nir: call nir_serialize only once per shader"
2019-11-08 16:47:59 -05:00
Marek Olšák 8b30114dda radeonsi/nir: call nir_serialize only once per shader
We were calling it twice.

First serialize it, then use it to compute the cache key.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-08 15:30:28 -05:00
Marek Olšák 442ef8c3e3 radeonsi: keep serialized NIR instead of nir_shader in si_shader_selector
This decreases memory usage, because serialized NIR is more compact.

The main shader part is compiled from nir_shader.
Monolithic shader variants are compiled from nir_binary.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-05 23:28:45 -05:00
Marek Olšák 62229e8949 radeonsi: use IR SHA1 as the cache key for the in-memory shader cache
instead of using whole IR binaries. This saves some memory.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-05 23:28:42 -05:00
Marek Olšák 4d1e43badb radeonsi: initialize shader compilers in threads on demand
It takes a noticable amount of time with piglit.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-10-28 21:36:18 -04:00
Marek Olšák fff884e09d radeonsi/nir: implement pipe_screen::finalize_nir
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-23 21:12:52 -04:00