Pierre-Eric Pelloux-Prayer
5a05f9714b
radeonsi: bump SI_NUM_SHADER_BUFFERS to 32
...
Some app uses more than 8 SSBOs (https://gitlab.freedesktop.org/mesa/mesa/-/issues/2946 ),
so increase SI_NUM_SHADER_BUFFERS to 32 (which allows 16 SSBOs).
Since we're now using a 64 bits number to track buffers, we could bump
SI_NUM_SHADER_BUFFERS to 48 but that would conflict with Mesa's
MAX_COMBINED_ATOMIC_BUFFERS limit (= 90).
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2122
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5632 >
2020-06-30 09:23:14 +02:00
Marek Olšák
da78d50bc8
radeonsi: make si_pm4_cmd_begin/end static and simplify all usages
...
There is no longer the confusing trailing si_pm4_cmd_end call.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603 >
2020-06-26 07:02:57 +00:00
Marek Olšák
7b2a0f880b
radeonsi: disallow adding BOs into si_pm4_state except 1 shader BO per state
...
The si_shader pointer is already there, so use it and remove the array
of BOs.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603 >
2020-06-26 07:02:57 +00:00
Marek Olšák
428360662f
radeonsi: don't add the tess ring buffers into the cs_preamble state
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603 >
2020-06-26 07:02:57 +00:00
Marek Olšák
1c1d34a67a
radeonsi: rename init_config states to cs_preamble states
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5603 >
2020-06-26 07:02:57 +00:00
Marek Olšák
3fec2f67c3
radeonsi: compact MRTs to save PS export memory space
...
If there are holes between color outputs (e.g. a shader exports MRT1, but
not MRT0), we can remove the holes by moving higher MRTs lower.
The hardware will remap the MRTs to their correct locations if we remove
holes in SPI_SHADER_COL_FORMAT but not CB_SHADER_MASK.
This is a performance optimization, but MRTs with holes are pretty rare,
so there is most likely no effect on any app.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5535 >
2020-06-23 00:23:51 -04:00
Marek Olšák
a23802bcb9
ac,radeonsi: start adding support for gfx10.3
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5383 >
2020-06-09 16:17:36 +00:00
Marek Olšák
a1602516d7
ac,radeonsi: replace == GFX10 with >= GFX10 where it's needed
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5383 >
2020-06-09 16:17:36 +00:00
Marek Olšák
2a3806ffa3
amd: replace SH -> SA (shader array) in comments
...
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5184 >
2020-05-26 06:00:54 -04:00
Marek Olšák
2cf46f2e3d
ac/gpu_info: replace num_good_cu_per_sh with min/max_good_cu_per_sa
...
Perf counters use the new max number.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5184 >
2020-05-26 06:00:54 -04:00
Marek Olšák
3cd96b5109
radeonsi: don't use INDIRECT_BUFFER within IBs
...
It's fragile. If I change the size or alignment, it hangs. Better safe than
sorry.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5095 >
2020-05-23 03:44:44 -04:00
Axel Davy
45e69e7d11
radeonsi: Enable tgsi to nir disk cache
...
Enable the tgsi to nir cache for radeonsi.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4993 >
2020-05-13 19:43:05 +00:00
Axel Davy
522bd414f3
ttn: Add new allow_disk_cache parameter
...
For now this parameter doesn't do anything.
It means the implementation is allowed to use
a cache on disk.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4993 >
2020-05-13 19:43:05 +00:00
Pierre-Eric Pelloux-Prayer
547e81655a
radeonsi: don't print gs_copy_shader stats for shaderdb
...
Fixes: dbc86fa3de
("radeonsi: dump shader stats when hitting the live cache")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4607 >
2020-05-05 12:26:02 +02:00
Marek Olšák
b4fd8f1919
ac,radeonsi: simplify checking for Navi1x chips
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4698 >
2020-04-24 10:38:54 +00:00
Pierre-Eric Pelloux-Prayer
dbc86fa3de
radeonsi: dump shader stats when hitting the live cache
...
With the introduction of the live shader cache, when a shader is
fetched from the cache no stats are printed for shaderdb.
So in a sequence like this: vs1, fs1, vs1, fs2, shaderdb may see
3 or 4 lines, depending on the threads being used.
If one run produces 3 lines while the other produces 4 lines, it
would compare vs1 stats with fs2 stats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4355 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4355 >
2020-04-02 08:31:37 +02:00
Pierre-Eric Pelloux-Prayer
d7008fe46a
radeonsi: switch to 3-spaces style
...
Generated automatically using clang-format and the following config:
AlignAfterOpenBracket: true
AlignConsecutiveMacros: true
AllowAllArgumentsOnNextLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: false
AlwaysBreakAfterReturnType: None
BasedOnStyle: LLVM
BraceWrapping:
AfterControlStatement: false
AfterEnum: true
AfterFunction: true
AfterStruct: false
BeforeElse: false
SplitEmptyFunction: true
BinPackArguments: true
BinPackParameters: true
BreakBeforeBraces: Custom
ColumnLimit: 100
ContinuationIndentWidth: 3
Cpp11BracedListStyle: false
Cpp11BracedListStyle: true
ForEachMacros:
- LIST_FOR_EACH_ENTRY
- LIST_FOR_EACH_ENTRY_SAFE
- util_dynarray_foreach
- nir_foreach_variable
- nir_foreach_variable_safe
- nir_foreach_register
- nir_foreach_register_safe
- nir_foreach_use
- nir_foreach_use_safe
- nir_foreach_if_use
- nir_foreach_if_use_safe
- nir_foreach_def
- nir_foreach_def_safe
- nir_foreach_phi_src
- nir_foreach_phi_src_safe
- nir_foreach_parallel_copy_entry
- nir_foreach_instr
- nir_foreach_instr_reverse
- nir_foreach_instr_safe
- nir_foreach_instr_reverse_safe
- nir_foreach_function
- nir_foreach_block
- nir_foreach_block_safe
- nir_foreach_block_reverse
- nir_foreach_block_reverse_safe
- nir_foreach_block_in_cf_node
IncludeBlocks: Regroup
IncludeCategories:
- Regex: '<[[:alnum:].]+>'
Priority: 2
- Regex: '.*'
Priority: 1
IndentWidth: 3
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyExcessCharacter: 100
SpaceAfterCStyleCast: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: false
SpacesInContainerLiterals: false
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4319 >
2020-03-30 11:05:52 +00:00
Marek Olšák
4ef1c8d60b
radeonsi/gfx10: fix the wave size for compute-based culling
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4269 >
2020-03-28 00:58:34 +00:00
Marek Olšák
65e9239977
radeonsi: add num_vbos_in_user_sgprs into the shader cache key
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4269 >
2020-03-28 00:58:34 +00:00
Marek Olšák
7ba5e94c50
ac: add radeon_info::use_late_alloc to control LATE_ALLOC globally
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4143 >
2020-03-12 17:27:23 +00:00
Marek Olšák
7481c4be58
radeonsi: add a bug workaround for NGG - LATE_ALLOC_GS
...
Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4079 >
2020-03-09 16:08:10 -04:00
Daniel Schürmann
9d64ad2fe7
radeonsi: lower discard to demote when FS_CORRECT_DERIVS_AFTER_KILL is enabled
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047 >
2020-03-09 12:29:32 +00:00
Marek Olšák
c046551e60
radeonsi: print shader cache stats with AMD_DEBUG=cache_stats
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2929 >
2020-01-24 20:29:29 -05:00
Marek Olšák
2fd3bb23ab
radeonsi: restructure si_shader_cache_load_shader
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2929 >
2020-01-24 20:29:29 -05:00
Marek Olšák
0db74f479b
radeonsi: use the live shader cache
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2929 >
2020-01-24 20:29:29 -05:00
Marek Olšák
7ce84b256e
radeonsi: make si_compile_shader return bool
...
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3421 >
2020-01-23 19:10:21 +00:00
Marek Olšák
735a3ba007
radeonsi/gfx10: enable GS fast launch for triangles and strips with NGG culling
...
Only non-indexed triangle lists and strips are supported. This increases
performance if there is something to cull.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák
c377f45c18
radeonsi/gfx10: rewrite late alloc computation
...
- Use conservative late alloc when the number of CUs <= 6.
- Move the late alloc GS register to the GS shader state, so that it can be
tuned for NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák
8db00a51f8
radeonsi/gfx10: implement NGG culling for 4x wave32 subgroups
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák
aa2d846604
radeonsi/gfx10: move GE_PC_ALLOC setting to shader states
...
The value is not changed. I just use a different way to compute it.
The value will vary with NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák
41fef6fc09
radeonsi/gfx10: don't initialize VGPRs not used by NGG passthrough
...
v2: TES doesn't use the GS PrimitiveID
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-20 16:16:11 -05:00
Marek Olšák
42112010a3
radeonsi: rename si_shader_create -> si_create_shader_variant for clarity
...
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák
03950473df
radeonsi: merge si_tessctrl_info into si_shader_info
...
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák
5fa2ab831e
radeonsi: fork tgsi_shader_info and tgsi_tessctrl_info
...
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2020-01-14 18:46:07 -05:00
Marek Olšák
363b4027fc
radeonsi: put up to 5 VBO descriptors into user SGPRs
...
gfx6-8: 1 VBO descriptor in user SGPRs
gfx9-10: 5 VBO descriptors in user SGPRs
We no longer pull up to 5 VBO descriptors from GTT when SDMA is disabled.
Totals from affected shaders:
SGPRS: 1110528 -> 1170528 (5.40 %)
VGPRS: 952896 -> 951936 (-0.10 %)
Spilled SGPRs: 83 -> 61 (-26.51 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 23766296 -> 22843920 (-3.88 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 179344 -> 179344 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-13 15:57:07 -05:00
Marek Olšák
312e04689a
radeonsi: don't allow draw calls with uninitialized VS inputs
...
These always hang, because vertex buffer descriptors are not set up.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-13 15:57:07 -05:00
Marek Olšák
fd84e422b6
radeonsi: clean up messy si_emit_rasterizer_prim_state
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 15:48:49 -05:00
Marek Olšák
898c9cb797
radeonsi: fix context roll tracking in si_emit_shader_vs
...
probably harmless, because we don't need to track context rolls on gfx10
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-08 15:48:39 -05:00
Marek Olšák
420fe1e7f9
radeonsi: remove TGSI
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2020-01-06 15:57:20 -05:00
Marek Olšák
e79f55ff86
radeonsi/gfx10: improve performance for TES using PrimID but not exporting it
...
This field is really for the primitive export to the pixel shader.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-27 13:50:57 -05:00
Marek Olšák
aa3df12fc2
radeonsi/gfx10: enable NGG passthrough for eligible shaders
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-27 13:50:57 -05:00
Marek Olšák
aced18aa61
radeonsi/gfx10: simplify the tess_turns_off_ngg condition
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:36 -05:00
Marek Olšák
42f921387b
radeonsi/gfx10: disable vertex grouping
...
based on PAL.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:34 -05:00
Marek Olšák
4675cb2019
radeonsi: initialize the per-context compiler on demand
...
This takes a noticable amount of time in piglit and some tests don't
need it.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-11-25 16:48:27 -05:00
Marek Olšák
3ef50b023e
radeonsi/nir: fix compute shader crash due to nir_binary == NULL
...
This partially reverts 8b30114dda
.
Fixes: 8b30114dda
"radeonsi/nir: call nir_serialize only once per shader"
2019-11-08 16:47:59 -05:00
Marek Olšák
8b30114dda
radeonsi/nir: call nir_serialize only once per shader
...
We were calling it twice.
First serialize it, then use it to compute the cache key.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-08 15:30:28 -05:00
Marek Olšák
442ef8c3e3
radeonsi: keep serialized NIR instead of nir_shader in si_shader_selector
...
This decreases memory usage, because serialized NIR is more compact.
The main shader part is compiled from nir_shader.
Monolithic shader variants are compiled from nir_binary.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-05 23:28:45 -05:00
Marek Olšák
62229e8949
radeonsi: use IR SHA1 as the cache key for the in-memory shader cache
...
instead of using whole IR binaries. This saves some memory.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-05 23:28:42 -05:00
Marek Olšák
4d1e43badb
radeonsi: initialize shader compilers in threads on demand
...
It takes a noticable amount of time with piglit.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-10-28 21:36:18 -04:00
Marek Olšák
fff884e09d
radeonsi/nir: implement pipe_screen::finalize_nir
...
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-23 21:12:52 -04:00