KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Samuel Pitoiset	f502bdf1ab	radv: only apply the MRT output NaN fixup to non-meta shaders We only want this workaround to be applied for game shaders. Cc: mesa-stable Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4163 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9048>	2021-02-16 09:03:31 +01:00
Rhys Perry	7ff805a19d	radv,aco: add radv_nir_compiler_options::wgp_mode Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>	2021-02-15 13:35:36 +00:00
Rhys Perry	6a770cae4b	radv: round up max_lds_per_simd / lds_per_wave If each SIMD has to get a different number of waves, report the maximum. One such situation is when a single-wave workgroup uses more than max_lds_per_simd. This change causes radv_get_max_waves() to report a single wave per SIMD instead of none. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>	2021-02-15 13:34:30 +00:00
Rhys Perry	267d7074d9	radv: use lds_{encode,alloc}_granularity This fixes an issue in radv_get_max_waves() where it aligned the LDS allocation to 512 bytes instead of 1024 on GFX10.3. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>	2021-02-15 13:34:30 +00:00
Rhys Perry	df61444ac4	radv: switch MaxWaves statistic to wave32 waves Always return the wave32 waves instead of wave64 waves because the wave32 wave count is more precise in the case of wave32. This also fixes usage of lds_per_wave in wave32. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>	2021-02-15 13:34:30 +00:00
Rhys Perry	43108824ec	radv: fix max_lds_per_simd on GFX10 num_simd_per_compute_unit was the number of SIMDs per compute unit, but lds_size_per_workgroup was the bytes of LDS per WGP. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8761>	2021-02-15 13:34:30 +00:00
Samuel Pitoiset	e02b1577d0	radv/winsys: remove the radv_amdgpu_winsys_bo::ws indirection This saves a 64-bit pointer from radv_amdgpu_winsys_bo and it's also common to pass a winsys pointer as the first parameter. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8859>	2021-02-08 08:45:38 +01:00
Bas Nieuwenhuizen	fdfd316d5b	radv: Implement VK_KHR_zero_initialize_workgroup_memory. Reuses the pass that was implemented for ANV. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8751>	2021-02-04 01:29:58 +00:00
Rhys Perry	0602d4ec69	radv: correctly enable WGP_MODE for tessellation control Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8811>	2021-02-03 11:27:50 +00:00
Rhys Perry	2338e4ad36	radv: correctly enable WGP_MODE for NGG and GS Previously, we would set WGP_MODE on GFX10+ and then only on GFX10. Because we used bitwise or, the result was WGP_MODE being set on GFX10+. We also set the wrong bit, S_00B848_WGP_MODE instead of S_00B228_WGP_MODE. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8811>	2021-02-03 11:27:50 +00:00
Jason Ekstrand	23ba48a0c7	vulkan: Make the debug_report implementation internal Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>	2021-02-01 18:54:25 +00:00
Jason Ekstrand	41318a5819	vulkan: Use vk_object_base::type for debug_report Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>	2021-02-01 18:54:25 +00:00
Jason Ekstrand	19d7cf0457	radv: Switch to the common VK_EXT_debug_report Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8676>	2021-02-01 18:54:25 +00:00
Bas Nieuwenhuizen	d938fcefb9	radv: Expose VK_KHR_workgroup_memory_explicit_layout. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8752>	2021-01-29 00:05:36 +01:00
James Park	2e81ed2a47	radv: Pointer arithmetic on char/uint8_t, not void Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7793>	2021-01-26 09:16:15 +00:00
Rhys Perry	af4c6605a8	radv: use nir_opt_access fossil-db (GFX10.3): Totals from 3231 (2.32% of 139391) affected shaders: SGPRs: 168654 -> 167454 (-0.71%); split: -0.72%, +0.00% VGPRs: 152352 -> 152416 (+0.04%) CodeSize: 13872836 -> 13806376 (-0.48%); split: -0.50%, +0.02% MaxWaves: 36640 -> 36634 (-0.02%) Instrs: 2639959 -> 2626852 (-0.50%); split: -0.52%, +0.03% Cycles: 77706000 -> 77496792 (-0.27%); split: -0.28%, +0.01% VMEM: 809496 -> 790847 (-2.30%); split: +2.06%, -4.36% SMEM: 267843 -> 253187 (-5.47%); split: +0.76%, -6.23% VClause: 61353 -> 60426 (-1.51%); split: -1.86%, +0.35% SClause: 95409 -> 92355 (-3.20%); split: -3.24%, +0.04% Copies: 194951 -> 196702 (+0.90%); split: -0.53%, +1.43% Branches: 84320 -> 84331 (+0.01%); split: -0.00%, +0.02% PreSGPRs: 110162 -> 110203 (+0.04%); split: -0.04%, +0.07% PreVGPRs: 127021 -> 127037 (+0.01%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6490>	2021-01-21 18:07:03 +00:00
Rhys Perry	dc19fe0e9f	radv,aco: use deref_buffer_array_length Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3993 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8163>	2021-01-21 11:53:12 +00:00
Rhys Perry	914c61d6c0	radv,aco: don't use MUBUF for multi-channel loads on GFX8 with robustness2 Fixes several dEQP-VK.robustness.robustness2.* tests on GFX8. Generations other than GFX8 don't fail the tests because bounds-checking is done using the index (making it per-vertex). fossil-db (Polaris): Totals from 1387 (0.99% of 140385) affected shaders: (no statistics affected) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Fixes: `03a0d39366` ("aco: use MUBUF in some situations instead of splitting vertex fetches") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7834>	2021-01-20 17:57:56 +00:00
Rhys Perry	12ea0143de	radv: fix max_waves estimation on GFX10.3 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8523>	2021-01-20 16:46:54 +00:00
Rhys Perry	dfe429eb41	nir/loop_unroll: unroll more aggressively if it can improve load scheduling Significantly improves performance of a Control compute shader. Also seems to increase FPS at the very start of the game by ~5% (RX 580, 1080p, medium settings, no MSAA). fossil-db (Sienna): Totals from 81 (0.06% of 139391) affected shaders: SGPRs: 3848 -> 4362 (+13.36%); split: -0.99%, +14.35% VGPRs: 4132 -> 4648 (+12.49%) CodeSize: 275532 -> 659188 (+139.24%) MaxWaves: 986 -> 906 (-8.11%) Instrs: 54422 -> 126865 (+133.11%) Cycles: 1057240 -> 750464 (-29.02%); split: -42.61%, +13.60% VMEM: 26507 -> 61829 (+133.26%); split: +135.56%, -2.30% SMEM: 4748 -> 5895 (+24.16%); split: +31.47%, -7.31% VClause: 1933 -> 6802 (+251.89%); split: -0.72%, +252.61% SClause: 1179 -> 1810 (+53.52%); split: -3.14%, +56.66% Branches: 1174 -> 1157 (-1.45%); split: -23.94%, +22.49% PreVGPRs: 3219 -> 3387 (+5.22%); split: -0.96%, +6.18% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6538>	2021-01-13 18:54:18 +00:00
Daniel Schürmann	fcd2ef23e5	radv: vectorize 16bit instructions Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6680>	2021-01-13 17:46:56 +00:00
Daniel Schürmann	d495a5c183	radv: enable .lower_ineg We already emit ineg as isub most of the time. The results are a bit mixed, but shouldn't really make a difference. A couple of additional copies are needed as isub writes scc. Totals from 5975 (4.29% of 139391) affected shaders: CodeSize: 31508648 -> 31509264 (+0.00%); split: -0.00%, +0.00% Instrs: 6073379 -> 6073531 (+0.00%); split: -0.00%, +0.00% Cycles: 47186280 -> 47187116 (+0.00%); split: -0.00%, +0.00% VMEM: 2528515 -> 2529139 (+0.02%); split: +0.03%, -0.01% SMEM: 596842 -> 596924 (+0.01%); split: +0.02%, -0.00% SClause: 280596 -> 280594 (-0.00%) Copies: 288554 -> 288669 (+0.04%); split: -0.00%, +0.04% PreSGPRs: 240390 -> 240397 (+0.00%) PreVGPRs: 349630 -> 349749 (+0.03%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8425>	2021-01-12 16:14:00 +00:00
Rhys Perry	f17de6a803	radv: add RADV_DEBUG=invariantgeom This can be used to work around a common class of bugs appearing as flickering. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8104>	2021-01-12 15:11:49 +00:00
Daniel Schürmann	bd8e84eb8d	nir: replace .lower_sub with .has_fsub and .has_isub This allows more fine-grained control over whether a backend supports one of these instructions. Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6597>	2021-01-11 19:13:51 +00:00
Rhys Perry	d95fe8a25e	radv: support SpvCapabilitySparseResidency Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>	2021-01-08 14:27:07 +00:00
Rhys Perry	4c67423e99	radv: implement is_sparse_texels_resident and sparse_residency_code_and Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775>	2021-01-08 14:27:07 +00:00
Samuel Pitoiset	7a464f4296	radv: track if VRS is enabled to apply a workaround on GFX10.3 On some chips, gl_FragCoord.z has to be adjusted. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7837>	2020-12-14 16:22:38 +00:00
Samuel Pitoiset	bf69d89b5a	radv: implement VK_KHR_fragment_shading_rate Only supported on GFX10.3+. Attachment Fragment Shading Rate is for later. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7837>	2020-12-14 16:22:38 +00:00
James Park	fe67fe688a	radv: Wrap pragmas with __GNUC__ to fix MSVC Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7866>	2020-12-02 11:27:01 +00:00
Samuel Pitoiset	04ea3d6501	radv: disable WGP_MODE for NGG on GFX10.3 Ported from RadeonSI, reducing the CU mask probably broke WGP mode. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7769>	2020-11-30 09:31:29 +00:00
Tony Wasserka	cba6ec309a	radv: Fix -Wshadow warnings Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7430>	2020-11-20 09:29:19 +00:00
Marek Olšák	cb20d58f45	nir: optimize nir_lower_discard_to_demote to lower discard/demote both ways This is smarter and also lowers demote to discard if helper invocations are not needed. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7586>	2020-11-12 21:02:05 +00:00
Eric Anholt	eda3e4e055	nir/builder: Add a name format arg to nir_builder_init_simple_shader(). This cleans up a bunch of gross sprintfs and keeps the caller from needing to remember to ralloc_strdup. I added a couple of '"%s", name ? name : ""' to radv where I didn't fully trace through whether a non-null name was being passed in. I also took the liberty of adding a basic name to a few shaders (pan_blit, unit tests) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>	2020-11-11 08:50:29 -08:00
Eric Anholt	5f992802f5	nir/builder: Drop the mem_ctx arg from nir_builder_init_simple_shader(). This looks a lot more simple now! Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>	2020-11-11 08:50:29 -08:00
Eric Anholt	4e9328e3b6	nir_builder: Return a new builder from nir_builder_init_simple_shader(). It's a little inline function, so we can just RAII it for better ergonomics. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>	2020-11-11 08:49:49 -08:00
Rhys Perry	86ef139bf4	radv: implement VK_EXT_shader_image_atomic_int64 The extension is only exposed on ACO and LLVM 11+ because of a LLVM bug. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7234>	2020-11-09 18:28:59 +00:00
Daniel Schürmann	fef8a4befd	radv: remove call to nir_lower_pack() The pack_* instructions are now lowered via nir_lower_alu_to_scalar() and unpack_* are not lowered anymore. These bitcasts are no-ops, and lowering prevents some optimizations like vectorization. Note: There are still some *_split variations remaining from different other NIR passes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527>	2020-10-28 10:14:26 +00:00
Daniel Schürmann	212be2a04e	radv: lower pack_[64/32]_* via nir_lower_alu_to_scalar() Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527>	2020-10-28 10:14:26 +00:00
Samuel Pitoiset	6d32fcaaaf	Revert "radv/aco: disable NGG GS support because it randomly hangs the GPU" This reverts commit `b84d1a0c42`. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7213>	2020-10-20 07:11:29 +00:00
James Park	af8d488ea5	util,ac,aco,radv: Cross-platform memstream API POSIX memstream is not available on Windows. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7143>	2020-10-19 03:37:42 -07:00
Daniel Schürmann	2f125908b3	radv,aco: lower_pack_half_2x16 This patch also optimizes pack_half_2x16(a, 0.0). Totals from 1949 (1.43% of 136546) affected shaders (RAVEN): SGPRs: 83376 -> 83336 (-0.05%) CodeSize: 3532144 -> 3512352 (-0.56%) Instrs: 660746 -> 660682 (-0.01%); split: -0.01%, +0.00% Cycles: 6780716 -> 6780472 (-0.00%); split: -0.00%, +0.00% VMEM: 990886 -> 990883 (-0.00%); split: +0.00%, -0.00% SMEM: 150506 -> 150538 (+0.02%); split: +0.05%, -0.03% SClause: 30595 -> 30594 (-0.00%); split: -0.01%, +0.00% Copies: 40801 -> 40729 (-0.18%) PreSGPRs: 52335 -> 52341 (+0.01%); split: -0.03%, +0.04% PreVGPRs: 45104 -> 45097 (-0.02%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777>	2020-10-14 15:31:38 +00:00
Samuel Pitoiset	e3e8d13ada	radv: move compiler statistics to ACO They are really specific to ACO. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061>	2020-10-14 15:09:34 +00:00
James Park	28d02b9d3e	ac,amd/llvm,radv: Initialize structs with {0} Necessary to compile with MSVC. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7123>	2020-10-14 12:15:23 +00:00
Samuel Pitoiset	b84d1a0c42	radv/aco: disable NGG GS support because it randomly hangs the GPU Disable ACO NGG GS until the random GPU hangs are fixed (one CTS run == one GPU hang here). No hangs so far after 5 full CTS runs with this disabled. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7108>	2020-10-14 13:52:42 +02:00
Rhys Perry	e1120f274f	nir: move divergence analysis options to nir_shader_compiler_options Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>	2020-10-13 12:47:21 +00:00
Rhys Perry	8850a63161	radv/aco,nir/lower_subgroups: don't lower elect ACO can implement this better. fossil-db (Navi): Totals from 33 (0.02% of 135946) affected shaders: SGPRs: 1736 -> 1744 (+0.46%) VGPRs: 1680 -> 1656 (-1.43%) CodeSize: 246160 -> 245916 (-0.10%); split: -0.14%, +0.04% MaxWaves: 449 -> 461 (+2.67%) Instrs: 48301 -> 48266 (-0.07%); split: -0.12%, +0.05% Cycles: 469740 -> 469240 (-0.11%); split: -0.18%, +0.08% Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>	2020-10-13 12:47:20 +00:00
Rhys Perry	8e981453ed	radv: use radv_optimize_nir() less in radv_link_shaders() fossil-db (Navi): Totals from 11 (0.01% of 137413) affected shaders: CodeSize: 99372 -> 99480 (+0.11%) Instrs: 19119 -> 19110 (-0.05%) Cycles: 222144 -> 222000 (-0.06%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6891>	2020-10-09 15:48:00 +00:00
Rhys Perry	55254f241f	radv: move optimizations in shader_compile_to_nir() to after io_to_scalar This results in at least one less radv_optimize_nir() iteration. No fossil-db changes. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6891>	2020-10-09 15:47:59 +00:00
Timur Kristóf	17ad2ade82	radv/aco: Use new GS lowering options for ACO with NGG GS. This makes it easier for ACO to implement NGG GS: 1. No need to keep track of vertex and primitive counts. 2. No need to discard incomplete primitives. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:14 +02:00
Timur Kristóf	2be99012e9	nir: Add ability to count emitted GS primitives. Add an option to nir_lower_gs_intrinsics which tells it to track the number of emitted primitives, not just vertices. Additionally, also make it per-stream. Also rename the set_vertex_count intrinsic to set_vertex_and_primitive_count. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>	2020-10-09 15:26:14 +02:00
Samuel Pitoiset	9aa5c7ce72	radv: use the same NIR compiler options for both compiler backends No changes, they are already similar. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6938>	2020-10-09 11:06:36 +02:00
Samuel Pitoiset	63049b0444	radv/llvm: do not lower sub To match ACO. Totals from 268 (0.20% of 136420) affected shaders: CodeSize: 1214060 -> 1214096 (+0.00%); split: -0.05%, +0.06% Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6938>	2020-10-09 11:06:34 +02:00
Samuel Pitoiset	a41bed243e	radv/llvm: do not lower nir_op_fsat To match ACO. fossils-db (Navi10): Totals from 20869 (15.30% of 136420) affected shaders: SGPRs: 1851128 -> 1851920 (+0.04%); split: -0.41%, +0.46% VGPRs: 1607360 -> 1608212 (+0.05%); split: -0.20%, +0.25% SpillSGPRs: 267331 -> 261350 (-2.24%); split: -3.67%, +1.43% CodeSize: 155460104 -> 155303508 (-0.10%); split: -0.21%, +0.11% MaxWaves: 179156 -> 178928 (-0.13%); split: +0.48%, -0.60% Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6932>	2020-10-08 12:38:04 +00:00
Tony Wasserka	76add3565e	radv: Fix unaligned memory access when writing specialization map entries Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6728>	2020-10-07 19:50:01 +00:00
Rhys Perry	19561f31a8	radv: remove trailing whitespace Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7043>	2020-10-07 11:53:23 +00:00
Samuel Pitoiset	6b0695c42a	radv/llvm: enable lower_unpack_half_2x16 To match ACO. fossils-db (Navi10): Totals from 294 (0.22% of 136420) affected shaders: SGPRs: 16504 -> 16496 (-0.05%) VGPRs: 19008 -> 19124 (+0.61%); split: -0.06%, +0.67% SpillVGPRs: 511 -> 476 (-6.85%); split: -7.63%, +0.78% CodeSize: 1688852 -> 1687932 (-0.05%); split: -0.10%, +0.05% Scratch: 305152 -> 307200 (+0.67%) MaxWaves: 2877 -> 2878 (+0.03%) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6937>	2020-10-05 12:42:42 +02:00
Samuel Pitoiset	cdf6d93498	radv/llvm: lower VS IO Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6912>	2020-10-05 08:06:12 +00:00
Samuel Pitoiset	1c4a21328e	radv/llvm: lower TCS IO Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6912>	2020-10-05 08:06:12 +00:00
Samuel Pitoiset	9615273907	radv/llvm: lower TES IO Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6912>	2020-10-05 08:06:12 +00:00
Samuel Pitoiset	6e339418a7	radv/llvm: lower GS IO The LLVM backend expects 64-bit IO to be lowered to 32-bit but it's unclear if we want to do that for ACO at this point. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6912>	2020-10-05 08:06:12 +00:00
Samuel Pitoiset	df63491594	radv/aco: lower IO for all stages outside of ACO Lowering IO for VS, TCS, TES and GS still has to be done for LLVM. No fossils-db change on NAVI10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6897>	2020-10-01 14:58:25 +00:00
Jason Ekstrand	d3fa7451a6	anv,radv,tu,val: Call nir_lower_io for push constants Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5275>	2020-09-30 07:20:39 +00:00
Samuel Pitoiset	291cfb1e41	radv: move lowering of FS outputs outside of ACO This enables lowering of FS outputs for RADV/LLVM. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6865>	2020-09-29 14:44:05 +00:00
Samuel Pitoiset	4dae9e53f6	radv: call nir_io_add_const_offset_to_base for FS outputs The store_output of RADV/LLVM expects the const offset to be 0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6865>	2020-09-29 14:44:05 +00:00
Samuel Pitoiset	778fe02f3b	radv/llvm: call nir_lower_io_to_vector with FS to fix array tests Fixes dEQP-VK.glsl.440.linkage.varying.component.frag_out.*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6883>	2020-09-29 10:00:50 +00:00
Samuel Pitoiset	1588644543	radv: lower deref operations for global memory for both backends To match ACO. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5316>	2020-09-29 07:24:35 +00:00
Kenneth Graunke	140f53e646	Revert "nir: replace lower_ffma and fuse_ffma with has_ffma" This reverts commit `939ddf3f67`. Intel has a separate pass for fusing FFMAs selectively. We split these flags in commit `1b72c31e1f` and the reasoning still stands. The patch being reverted was just a cleanup, so there should be no issue with reverting it. Acked-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6849>	2020-09-24 13:11:50 -07:00
Marek Olšák	939ddf3f67	nir: replace lower_ffma and fuse_ffma with has_ffma Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>	2020-09-24 12:29:11 +00:00
Marek Olšák	771aad3027	nir: split lower_ffma into lower_ffma16/32/64 AMD wants different behavior for each bit size Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6756>	2020-09-24 12:29:11 +00:00
Samuel Pitoiset	05b6612b4e	radv: do not lower UBO/SSBO access to offsets Use nir_lower_explicit_io instead of lowering to offsets. Extra (useless) additions are removed by lowering load_vulkan_descriptor to vec2(src.x, 0). fossils-db (Navi): Totals from 18236 (13.21% of 138013) affected shaders: SGPRs: 1172766 -> 1168278 (-0.38%); split: -0.89%, +0.50% VGPRs: 940156 -> 952232 (+1.28%); split: -0.08%, +1.37% SpillSGPRs: 30286 -> 31109 (+2.72%); split: -0.78%, +3.50% SpillVGPRs: 1893 -> 1909 (+0.85%) CodeSize: 87910396 -> 88113592 (+0.23%); split: -0.35%, +0.58% Scratch: 819200 -> 823296 (+0.50%) MaxWaves: 205535 -> 202102 (-1.67%); split: +0.05%, -1.72% Instrs: 17052527 -> 17113484 (+0.36%); split: -0.32%, +0.67% Cycles: 670794876 -> 669084540 (-0.25%); split: -0.38%, +0.13% VMEM: 5274728 -> 5388556 (+2.16%); split: +3.10%, -0.94% SMEM: 1196146 -> 1165850 (-2.53%); split: +2.06%, -4.60% VClause: 381463 -> 399217 (+4.65%); split: -1.08%, +5.73% SClause: 666216 -> 631135 (-5.27%); split: -5.44%, +0.18% Copies: 1292720 -> 1289318 (-0.26%); split: -1.28%, +1.01% Branches: 467336 -> 473028 (+1.22%); split: -0.67%, +1.89% PreSGPRs: 766459 -> 772175 (+0.75%); split: -0.53%, +1.28% PreVGPRs: 819746 -> 825327 (+0.68%); split: -0.05%, +0.73% Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6202>	2020-09-21 15:37:11 +00:00
Marek Olšák	ac55b1a9a6	nir: get ffma support from NIR options for nir_lower_flrp This also fixes the inverted last parameter of nir_lower_flrp in most drivers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6599>	2020-09-04 17:06:22 +00:00
Samuel Pitoiset	ebf2576862	radv,aco: disable opts if VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT Sounds useful to determine if ACO breaks a specific pipeline because of various optimizations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6487>	2020-09-04 06:59:45 +00:00
Marek Olšák	b7a6333ee4	amd/registers: switch to new generated register definitions Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6423>	2020-09-01 08:45:54 -04:00
Samuel Pitoiset	8301a43f27	radv: dump shader stats with VK_KHR_pipeline_executable_properties Instead of duplicating shader statistics in two different parts in the driver. This also now reports the LDS size in bytes instead of blocks with VK_AMD_shader_info. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6406>	2020-08-31 09:42:25 +02:00
Samuel Pitoiset	0d8ae4ac15	radv: fix setting EXCP_EN for different shader stages While TRAP_PRESENT is always at the same place, EXCP_EN can be different between shader stages. This sets it properly. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6452>	2020-08-26 08:12:22 +02:00
Samuel Pitoiset	8e97a61cfb	radv: enable the trap handler and configure the shader exceptions When TRAP_PRESENT is not enabled, all traps and exceptions are ignored. Only EXCP_EN.mem_viol is currently supported because the other exceptions have to be tested/validated first. EXCP_EN.mem_viol is used to detect any sort of invalid memory access like VM fault. When a memory violation is reported, the hw jumps to the trap handler. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6384>	2020-08-24 11:08:24 +00:00
Samuel Pitoiset	8fd2f5c16d	radv: add a small interface for creating the trap handler shader Similar to the GS copy shader except that NIR is unused because the shader is written directly using ACO IR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6384>	2020-08-24 11:08:24 +00:00
Jesse Natalie	d3faac7a15	nir: Add options to nir_lower_compute_system_values to control compute ID base lowering If no options are provided, existing intrinsics are used. If the lowering pass indicates there should be offsets used for global invocation ID or work group ID, then those instructions are lowered to include the offset. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891>	2020-08-21 22:07:05 +00:00
Jesse Natalie	2e1df6a17f	nir: Move compute system value lowering to a separate pass The actual variable -> intrinsic lowering stays where it is, but ops which convert one intrinsic to be implemented in terms of another have moved. Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891>	2020-08-21 22:07:05 +00:00
Eric Anholt	b3c822a0a8	radv: Move nir_opt_shrink_vectors() into the opt loop. Upcoming changes to opt_undef will result in this pass doing more work and generating vector MOVs that need re-scalarizing (which is inside of the main opt loop). Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6054>	2020-08-20 16:44:08 +00:00
Samuel Pitoiset	e901b901cb	radv,aco: report ACO errors/warnings back via VK_EXT_debug_report To help developers. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6318>	2020-08-20 08:15:08 +02:00
Connor Abbott	c77716294b	radv: Use an input for the layer when lowering input attachments Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5719>	2020-08-19 16:36:43 +00:00
Connor Abbott	d243bf1032	nir/lower_input_attachments: Support loading layer id as an input Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5719>	2020-08-19 16:36:43 +00:00
Connor Abbott	e72895767b	nir/lower_input_attachments: Refactor to use an options struct While we're at it, fold the details of how to load the fragcoord into load_fragcoord(). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5719>	2020-08-19 16:36:43 +00:00
Samuel Pitoiset	11781c0e49	radv: report the spirv-nir logs back to the application Via VK_EXT_debug_report to help debugging various SPIRV->NIR issues. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6223>	2020-08-10 13:19:21 +02:00
Samuel Pitoiset	bea8930468	radv: allow to force-enable LLVM internally for a specific shader stage For ACO debugging purposes, developers only. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6208>	2020-08-07 07:45:06 +00:00
Rhys Perry	6e2e77557e	radv/llvm: enable VK_KHR_memory_model Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6063>	2020-08-05 09:45:54 +00:00
Rhys Perry	da38e99eda	radv/aco: enable VK_KHR_memory_model Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6063>	2020-08-05 09:45:54 +00:00
Eric Anholt	d8c2f896db	amd: Swap from nir_opt_shrink_load() to nir_opt_shrink_vectors(). This should do much more trimming than shrink_load, and is a win on i965's vec4 and nir-to-tgsi. For scalar backends like this that don't need ALU shrinking, it still gets more load intrinsics covered. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6050>	2020-08-03 21:26:45 +00:00
Rhys Perry	cc3bc9493c	radv: use scoped barriers fossil-db (LLVM, Navi): Totals from 843 (0.62% of 135820) affected shaders: SGPRs: 40456 -> 40480 (+0.06%); split: -0.10%, +0.16% VGPRs: 39648 -> 39688 (+0.10%); split: -0.01%, +0.11% CodeSize: 2936164 -> 2932508 (-0.12%); split: -0.21%, +0.09% MaxWaves: 10828 -> 10827 (-0.01%) fossil-db changes seem to be due to SPIR-V -> NIR emitting a workgroup scope shared memory barrier instead of a group_memory_barrier. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5980>	2020-07-29 17:57:13 +00:00
Jason Ekstrand	5c5555a862	nir: Add a find_variable_with_[driver_]location helper We've hand-rolled this loop 10 places and those are just the ones I found easily. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>	2020-07-29 17:38:58 +00:00
Jason Ekstrand	caab46c1e9	nir: Take a shader and variable mode in nir_assign_io_var_locations Reviewed-by: Gert Wollny <gert.wollny@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>	2020-07-29 17:38:58 +00:00
Jason Ekstrand	2956d53400	nir: Add nir_foreach_shader_in/out_variable helpers Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>	2020-07-29 17:38:57 +00:00
Rhys Perry	cd392a10d0	radv/aco,aco: use scoped barriers fossil-db (Navi): Totals from 109 (0.08% of 132058) affected shaders: SGPRs: 5416 -> 5424 (+0.15%) CodeSize: 460500 -> 460508 (+0.00%); split: -0.07%, +0.07% Instrs: 87278 -> 87272 (-0.01%); split: -0.09%, +0.09% Cycles: 2241996 -> 2241852 (-0.01%); split: -0.04%, +0.04% VMEM: 33868 -> 35539 (+4.93%); split: +5.14%, -0.20% SMEM: 7183 -> 7184 (+0.01%); split: +0.36%, -0.35% VClause: 1857 -> 1882 (+1.35%) SClause: 2052 -> 2055 (+0.15%); split: -0.05%, +0.19% Copies: 6377 -> 6380 (+0.05%); split: -0.02%, +0.06% PreSGPRs: 3391 -> 3392 (+0.03%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4905>	2020-07-28 16:56:34 +00:00
Daniel Schürmann	af0bc71015	radv: call radv_nir_lower_ycbcr_textures after first optimizations There might still be tex instructions with undef texture/sampler before the first round of optimizations. No pipelinedb changes. Fixes: `14a12b771d` ('spirv: Rework our handling of images and samplers') Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6083>	2020-07-27 10:03:20 +00:00
Samuel Pitoiset	a1b237b9ef	radv: set LDS TCS size at shaders creation for GFX9+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5837>	2020-07-24 12:30:03 +00:00
Jason Ekstrand	196db51fc2	anv,turnip,radv,clover,glspirv: Run nir_copy_prop before nir_opt_deref We're about to make the SPIR-V -> NIR path generate a bit more complex SSA chains for certain derefs. This will ensure we don't regress anyone when we start making vec2's of derefs. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>	2020-07-23 22:43:21 -05:00
Samuel Pitoiset	6c1108d25b	radv: advertise VK_EXT_shader_atomic_float No hw support for float atomic add for buffer and (sparse) images. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6000>	2020-07-22 10:20:58 +02:00
Rhys Perry	ec9920e72b	radv: use lower_shuffle_to_swizzle_amd Affects a few shaders in Detroit: Become Human and Doom Eternal. fossil-db (Navi): Totals from 9 (0.01% of 135946) affected shaders: CodeSize: 31188 -> 25096 (-19.53%) Instrs: 6136 -> 4999 (-18.53%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5695>	2020-07-13 14:11:50 +00:00
Samuel Pitoiset	7324977e42	radv: remove the secure compile support feature Steam was the only client of this feature and it seems no longer used. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5869>	2020-07-13 08:56:44 +02:00
Samuel Pitoiset	26a48d8d35	radv: enable VK_AMD_shader_ballot on GFX6-7 with both compiler backends It gives +1-2 FPS with Doom Eternal on Pitcairn. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5659>	2020-06-29 07:40:05 +00:00
Daniel Schürmann	db0afb3800	radv: change use_aco -> use_llvm We are about to make ACO the default backend. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5445>	2020-06-25 15:16:28 +02:00
Samuel Pitoiset	a102896cff	radv: lower 64-bit dfloor on GFX6 for fixing precision issues GFX6 doesn't support v_floor_f64 and the precision of v_fract_f64 which is used to implement 64-bit floor is less than what Vulkan requires. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5609>	2020-06-25 12:09:08 +00:00
Samuel Pitoiset	c84f11e7b6	radv: lower 64-bit drcp/dsqrt/drsq for fixing precision issues The hardware precision of v_rcp_f64, v_sqrt_f64 and v_rsq_f64 is less than what Vulkan requires. This lowers using the Goldschmidt's algorithm to improve precision. Fixes dEQP-VK.glsl.builtin.precision_double.* on both compiler backends. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5609>	2020-06-25 12:09:08 +00:00
Bas Nieuwenhuizen	aa35670fd0	radv: Make radv_alloc_shader_memory static. Just a cleanup. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5578>	2020-06-24 13:00:02 +00:00
Bas Nieuwenhuizen	a5cb88eea4	radv: Handle mmap failures. Which can happen if we have too many mmaps active in the process. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5578>	2020-06-24 13:00:02 +00:00
Daniel Schürmann	f03a5f6cac	radv/aco: implement logic64 instead of lowering to make use of the scalar ALU Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5527>	2020-06-22 10:59:45 +00:00
Samuel Pitoiset	51fb3b09dc	radv/aco: enable FP16 features/extensions on GFX9+ This enables shaderFloat16, VK_AMD_gpu_shader_half_float and VK_AMD_gpu_shader_int16. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5347>	2020-06-17 18:12:51 +02:00
Samuel Pitoiset	6f21995f98	radv: add new drirc option radv_enable_mrt_output_nan_fixup To replace NaN from FS with zeros to fix game bugs. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5359>	2020-06-12 14:43:31 +02:00
Samuel Pitoiset	64f2d45c3b	radv/aco: enable shaderInt8 and VK_KHR_shader_float16_int8 on GFX6-GFX7 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Samuel Pitoiset	be4dd6abd1	radv/aco: enable shaderInt16 on GFX6-GFX7 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Samuel Pitoiset	b3aee3aa23	radv/aco: enable 8-bit/16-bit storage on GFX6-GFX7 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5226>	2020-06-09 21:25:38 +00:00
Marek Olšák	789cdab3b6	ac: align num_vgprs for gfx10.3 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5383>	2020-06-09 16:17:36 +00:00
Samuel Pitoiset	d7923c74d4	radv/llvm: expose VK_EXT_shader_demote_to_helper_invocation with LLVM 9+ It should already work with the LLVM backend. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5361>	2020-06-09 08:04:23 +02:00
Timothy Arceri	04dbf709ed	nir: add callback to nir_remove_dead_variables() This allows us to do API specific checks before removing variable without filling nir_remove_dead_variables() with API specific code. In the following patches we will use this to support the removal of dead uniforms in GLSL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>	2020-06-03 02:22:23 +00:00
Marek Olšák	116ec85012	ac: rename has_double_rate_fp16 -> has_packed_math_16bit Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003>	2020-06-02 16:29:25 -04:00
Samuel Pitoiset	b3c0f82841	radv: advertise VK_AMD_texture_gather_bias_lod Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5147>	2020-05-25 08:51:10 +02:00
Samuel Pitoiset	b1f0233077	radv: enable shaderResourceMinLod This feature was missing for unknown reasons. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989>	2020-05-14 10:05:44 +00:00
Samuel Pitoiset	178adfa6a8	radv: use the base object struct types Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4886>	2020-05-13 08:23:23 +02:00
Samuel Pitoiset	65458528fc	radv: use the common base object type for VkDevice Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4886>	2020-05-13 08:23:23 +02:00
Rhys Perry	5c5c2dd48f	radv/aco: enable 8/16-bit storage and int8/int16 on GFX8+ With this, Doom Eternal should now run with ACO on GFX8+. The generated 8/16-bit storage code is okay but the generated int8/int16 code is currently pretty bad but it works and apparently Doom Eternal doesn't actually use it (even though it requires it). Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4707>	2020-04-24 20:04:39 +01:00
Rhys Perry	03568249f9	radv: allocate larger shader memory slabs if needed Fixes dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 hang with ACO (features needed for the test are implemented in a later commit) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>	2020-04-24 18:52:54 +00:00
Jason Ekstrand	f4addfdde3	spirv: Use nir_const_value for spec constants When we originally wrote spirv_to_nir we didn't have a good scalar value union to handily use so we rolled our own thing for spec constants. Now that we have nir_const_value, we can use that and simplify a bunch of the spec constant logic. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4675>	2020-04-24 09:23:59 +00:00
Jason Ekstrand	a4885df9f8	radv: Properly handle all sizes of specialization constants cc: mesa-stable@lists.freedesktop.org Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4675>	2020-04-24 09:23:59 +00:00
Samuel Pitoiset	19aa68ae31	radv: set missing SHARED_VGPR_CNT for NGG VS and ACO shuffle is implemented with shared VGPRs with ACO and Wave64. Fixes dEQP-VK.subgroups.shuffle.framebuffer.subgroupshuffle*_vertex with Wave64. Fixes: `c24d9522da` ("radv: Enable ACO for NGG VS/TES, but disable NGG for ACO GS.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4595>	2020-04-17 16:11:17 +00:00
Samuel Pitoiset	1d74c6565d	radv: only expose shaderFloat16 for chips with double rate fp16 This disables shaderFloat16 on GFX8 because only GFX9+ supports double rate packed math. This improves consistency regarding other AMD Vulkan drivers and it makes no sense to enable that feature without packed math. This also reduces performance with Wolfenstein: Youngblood if fp16 is force-enabled on GFX8, while it's similar on GFX9. We might re-introduce that feature in the future with ACO support if it ends up being faster and correct. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4453>	2020-04-09 13:34:36 +02:00
Samuel Pitoiset	9f005f1f85	radv: enable lowering of GS intrinsics for the LLVM backend This replaces emit_vertex with: if (vertex_count < max_vertices) { emit_vertex_with_counter vertex_count ... vertex_count += 1 } Which is exactly what NIR->LLVM was doing but at NIR level. This pass is already called by ACO. pipeline-db changes on GFX10: Totals from affected shaders: SGPRS: 1952 -> 1912 (-2.05 %) VGPRS: 2112 -> 2044 (-3.22 %) Code Size: 189368 -> 185620 (-1.98 %) bytes Max Waves: 494 -> 491 (-0.61 %) No pipeline-db changes on other generations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4182>	2020-04-08 08:24:05 +02:00
Timur Kristóf	db2ee3686d	radv: Print shader stage before disassembly. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>	2020-04-07 11:29:35 +00:00
Rhys Perry	7e6aec6687	radv, aco: collect statistics if requested but executables are not Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>	2020-04-03 12:12:08 +00:00
Rhys Perry	ad2703653f	radv: add code for exposing compiler statistics Statistics will be added to ACO in later commits. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2965>	2020-04-03 12:12:08 +00:00
Marek Olšák	56cc10bd27	ac: unify denorm setting enforcement Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4196>	2020-03-17 20:47:48 +00:00
Samuel Pitoiset	c923de68dd	radv/gfx10: fix required ballot size with VK_EXT_subgroup_size_control If compute shaders require a specific subgroup size (ie. Wave32), we have to use the correct ballot size. Fixes dEQP-VK.subgroups.ballot_other.compute.*_requiredsubgroupSize. Fixes: `fb07fd4e6c` ("radv: implement VK_EXT_subgroup_size_control") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4215> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4215>	2020-03-17 12:45:01 +00:00
Samuel Pitoiset	672d106199	radv/gfx10: fix required subgroup size with VK_EXT_subgroup_size_control If compute shaders require a specific subgroup size (ie. Wave32), we have to return the correct one. Fixes dEQP-VK.subgroups.size_control.compute.required_subgroup_size_*. Fixes: `fb07fd4e6c` ("radv: implement VK_EXT_subgroup_size_control") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4215>	2020-03-17 12:45:01 +00:00
Samuel Pitoiset	2d295ab3f3	radv: add llvm_compiler_shader() helper To match aco_compile_shader(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4163> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4163>	2020-03-13 10:22:13 +00:00
Samuel Pitoiset	4d991c2de4	radv: remove unnecessary LLVM includes They are already included from src/amd/llvm. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4163>	2020-03-13 10:22:13 +00:00
Samuel Pitoiset	5ea32a6201	radv: remove radv_shader_variant::aco_used Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4163>	2020-03-13 10:22:13 +00:00
Samuel Pitoiset	3fea948177	radv: cleanup occurrences of use_aco everywhere Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4163>	2020-03-13 10:22:13 +00:00
Timur Kristóf	967eb23261	radv: Enable lowering dynamic quad broadcasts. This will lower dynamic quad broadcasts into something that both LLVM and ACO can understand. On hardware which supports shuffles, they are lowered to shuffle, on older hardware (GFX6-7) they will get lowered to constant quad broadcasts. Fixes dEQP-VK.subgroups.quad..subgroupquadbroadcast_nonconst_ Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4147> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4147>	2020-03-12 13:16:07 +00:00
Daniel Schürmann	bdd7587414	radv: use nir_lower_discard_to_demote to work around game bugs Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047>	2020-03-09 12:29:32 +00:00
Samuel Pitoiset	9432eb3e9c	ac: rename lds_size_per_cu to lds_size_per_workgroup It's more accurate. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3975>	2020-03-03 08:16:56 +01:00
Samuel Pitoiset	9204ad70f2	radv/gfx10: adjust the number of VGPRs used to compute waves Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3899>	2020-02-26 07:58:47 +00:00
Samuel Pitoiset	568f150409	radv/gfx10: adjust the LDS size used to compute waves It's 128KB per CU in WGP. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3899>	2020-02-26 07:58:47 +00:00
Samuel Pitoiset	b2531370c9	radv: remove RADV_DEBUG=nosisched and RADV_PERFTEST=sisched They are no longer useful. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3789>	2020-02-13 08:09:13 +00:00
Arcady Goldmints-Orlov	e9f83185a2	Rename nir_lower_constant_initializers to nir_lower_variable_initalizers This naming is clearer, as nir_variables can be initialized not just with a nir_constant but also with a pointer to another nir_variable. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>	2020-02-12 15:41:49 +00:00
Samuel Pitoiset	401bfe0283	radv: implement VK_AMD_shader_explicit_vertex_parameter Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2402 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>	2020-01-29 09:49:50 +00:00
Rhys Perry	72e9a23443	radv/aco: use ACO for GS copy shaders Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>	2020-01-24 13:35:07 +00:00
Rhys Perry	8bad100f83	aco: implement GS on GFX7-8 GS is the same on GFX6, but GFX6 isn't fully supported yet. v4: fix regclass v7: rebase after shader args MR Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>	2020-01-24 13:35:07 +00:00
Rhys Perry	40bb81c9dd	radv/aco,aco: implement GS on GFX9+ v2: implement GFX10 v3: rebase v7: rebase after shader args MR v8: fix gs_vtx_offset usage on GFX9/GFX10 v8: use unreachable() instead of printing intrinsic v8: rename output_state to ge_output_state v8: fix formatting around nir_foreach_variable() v8: rename some helpers in the scheduler v8: rename p_memory_barrier_all to p_memory_barrier_common v8: fix assertion comparing ctx.stage against vertex_geometry_gs Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>	2020-01-24 13:35:07 +00:00
Samuel Pitoiset	12fe19ba3b	radv: advertise VK_AMD_shader_fragment_mask Only for GFX8+ because it's untested on older generations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3304> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3304>	2020-01-23 10:48:02 +00:00
Samuel Pitoiset	e298e78a01	radv: advertise VK_AMD_shader_image_load_store_lod This extension allows using LOD with image read/write operations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2020-01-09 07:58:34 +01:00
Samuel Pitoiset	eda1b77cc2	radv: enable SpvCapabilityImageMSArray The Vulkan spec says that StorageImageMultisample and ImageMSArray SPIR-V capabilities must be enabled if the shaderStorageImageMultisample feature is supported. This fixes a warning with RenderDoc. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2212 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-12-12 18:52:08 +01:00
Samuel Pitoiset	3b51259f06	radv: remove dead shader input/output variables No pipeline-db changes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-12-04 08:04:05 +01:00
Samuel Pitoiset	c105e6169c	radv,ac/nir: lower deref operations for shared memory This shouldn't introduce any functional changes for RadeonSI when NIR is enabled because these operations are already lowered. pipeline-db (NAVI10/LLVM): SGPRS: 9043 -> 9051 (0.09 %) VGPRS: 7272 -> 7292 (0.28 %) Code Size: 638892 -> 621628 (-2.70 %) bytes LDS: 1333 -> 1331 (-0.15 %) blocks Max Waves: 1614 -> 1608 (-0.37 %) Found this while glancing at some F12019 shaders. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-29 21:58:18 +01:00
Connor Abbott	e7f4cadd02	radv: Replace supports_spill with explict_scratch_args The former was always true and hence dead code. We will want to explicitly declare the ring offset register with ACO, but we also want to declare the scratch offset too, and we can't try to disable it since ACO also supports spilling and the determination of whether spilling has to happen occurs well after setting up registers. So replace supports_spill with something that will actually be used for ACO. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 14:17:51 +01:00
Connor Abbott	b45c54ff8d	aco: Use radv_shader_args in aco_compile_shader() Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-11-25 14:17:51 +01:00
Connor Abbott	66c703b3e8	radv: Move argument declaration out of nir_to_llvm Now it's executed for ACO too. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-25 14:17:51 +01:00
Rhys Perry	d7b0d9a8d8	radv: enable FP16/FP64 denormals earlier and only for LLVM ACO sets this itself and will have to set it differently in the future to support shaderDenormFlushToZeroFloat64. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-11-15 17:36:21 +00:00
Samuel Pitoiset	519d9b30de	radv: remove useless RADV_DEBUG=unsafemath debug option This option is useless and shouldn't be used at all. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-15 09:07:34 +01:00
Rhys Perry	76544f632d	radv: adjust loop unrolling heuristics for int64 In particular, increase the cost of 64-bit integer division. Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom; with ACO used for GS, this creates shaders requiring a branch with >32767 dword offset. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-07 23:29:12 +00:00
Samuel Pitoiset	d3f9957de4	radv: determine shaders wavesize at pipeline level Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-11-06 09:20:34 +01:00
Samuel Pitoiset	d4e0bef1bb	radv: fix dumping SPIR-V into hang reports Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-30 13:02:08 +00:00
Timothy Arceri	07692f703f	radv: for secure compile exit early from radv_shader_variant_create() We don't have permission to be creating shared memory etc. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-26 13:04:12 +11:00
Rhys Perry	7453c1adff	radv: round vgprs/sgprs before calculating max_waves Note that ACO doesn't correctly round SGPR counts on GFX8/GFX9. pipeline-db (ACO/Vega): SGPRS: 11000 -> 11000 (0.00 %) VGPRS: 3120 -> 3120 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 164328 -> 164328 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 1125 -> 1000 (-11.11 %) v2: consider wave32 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-23 19:11:20 +01:00
Samuel Pitoiset	7c50214aab	radv: implement VK_KHR_shader_float_controls This exposes what's required for DX and this is what we already configure. The driver flushes denorms for FP32 and preserves them for FP16/FP64. Note that we can't allow both preserving and flushing denorms because this won't work for merged shaders. This will require LLVM to update the float mode register to make it work. Only enabled on GFX8+ with the LLVM path because it's untested on previous chips and ACO doesn't support it. This extension is required for SPIRV 1.4. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-18 16:55:58 +02:00
Samuel Pitoiset	cbd6f0a0c2	radv: implement VK_KHR_shader_clock NIR->LLVM and ACO already support nir_intrinsic_shader_clock. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-10-09 08:43:14 +02:00
Rhys Perry	a87b0f5141	radv/aco,aco: set lower_fmod This simplifies ACO and allows the lowered code to be optimized (in particular, constant folded). Totals from affected shaders: SGPRS: 1776 -> 1776 (0.00 %) VGPRS: 1436 -> 1436 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 203452 -> 203564 (0.06 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 103 -> 103 (0.00 %) At least some of the code size increase seems to be from literals being applied to instructions as a result of constant folding. v2: remove fmod/frem handling in init_context() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-04 14:00:46 +00:00
Samuel Pitoiset	5ebe1a17e9	radv: enable lower_fmod for the LLVM path This lowers fmod and frem at NIR level like RadeonSI. fmod is already lowered directly in NIR->LLVM, and frem will be lowered by LLVM anyways. This fixes an LLVM crash with: dEQP-VK.glsl.builtin.precision_fp16_storage32b.frem.compute.scalar. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-03 18:15:14 +02:00
Samuel Pitoiset	a2a68d551c	radv/gfx10: fix the ESGS ring size symbol Random hangs no longer happen, I'm actually not sure if they were related to this. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-10-02 21:50:40 +02:00
Daniel Schürmann	0fb27f1e5a	radv/aco: Don't lower subtractions 40228 shaders in 20236 tests Totals: SGPRS: 2045512 -> 2046496 (0.05 %) VGPRS: 1430856 -> 1430464 (-0.03 %) Spilled SGPRs: 1077 -> 1077 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 10348 -> 10348 (0.00 %) dwords per thread Code Size: 77202840 -> 77151832 (-0.07 %) bytes LDS: 863 -> 863 (0.00 %) blocks Max Waves: 260729 -> 260754 (0.01 %) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Timur Kristóf	30f0c0ea7d	radv: Add debug option to dump meta shaders. This new option can help debug shader compiler problems when there are issues with the meta shaders. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-26 13:36:49 +00:00
Timur Kristóf	a4fd8ba7e3	amd/common: Introduce ac_get_fs_input_vgpr_cnt. Add a function called ac_get_fs_input_vgpr_cnt which will return the number of input VGPRs used by an AMD shader. Previously, radv and radeonsi had the same code duplicated, but this commit also allows them to share this code. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-26 13:36:49 +00:00
Timur Kristóf	83eebdb507	radv: Set shared VGPR count in radv_postprocess_config. This commit allows RADV to set the shared VGPR count according to the shader config. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-26 13:36:49 +00:00
Rhys Perry	3c966fd688	aco,radv: rename record_llvm_ir/llvm_ir_string to record_ir/ir_string Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-26 11:08:47 +01:00
Rhys Perry	ec8ced9123	radv/aco: return a correct name and description for the backend IR Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-26 11:08:43 +01:00
Rhys Perry	6613b81327	aco,radv/aco: get disassembly for release builds if requested Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-26 11:08:09 +01:00
Daniel Schürmann	8b78cce433	radv: remove dead shared variables LLVM does this anyway, but for ACO we need to do it in NIR. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Daniel Schürmann	281262281b	radv/aco: enable VK_EXT_shader_demote_to_helper_invocation For now, this extension will only be enabled for ACO. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Daniel Schürmann	a70a998718	radv/aco: Setup alternate path in RADV to support the experimental ACO compiler LLVM remains default and ACO can be enabled with RADV_PERFTEST=aco. Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-19 12:10:00 +02:00
Marek Olšák	0692ae34e9	ac: move ac_get_num_physical_sgprs into radeon_info Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-09-18 14:39:06 -04:00
Marek Olšák	ca43006fd2	ac: move ac_get_max_wave64_per_simd into radeon_info Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-09-18 14:39:06 -04:00
Samuel Pitoiset	5ebc76471c	radv/gfx10: adjust the GS NGG scratch size for streamout It needs more space for multiple streams. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-16 12:08:22 +02:00
Samuel Pitoiset	a15b3bcf1a	radv/gfx10: add an option to switch from legacy to NGG streamout This internal option is turned off by default because NGG streamout still hangs. It seems like it's related to GDS, as with RadeonSI. That option will be turned on once all issues are resolved. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-16 12:08:22 +02:00
Samuel Pitoiset	538766792d	radv/gfx10: declare a LDS symbol for the NGG emit space This fixes some interactions when NGG GS is enabled. It fixes: - dEQP-VK.clipping.user_defined.clip_cull_distance_dynamic_index.geom - dEQP-VK.tessellation.geometry_interaction.passthrough.* For some reason, using the computed ESGS ring size randomly hangs with CTS. For now, just use the maximum LDS size for ESGS. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-10 09:27:01 +02:00
Samuel Pitoiset	a9af11f1fa	radv: fill shader info for all stages in the pipeline This shouldn't be in NIR->LLVM because ACO also needs the shader info. This will also help for computing some NGG values that are necessary for declaring LDS symbols. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-10 09:26:45 +02:00
Marek Olšák	d95afd8b9e	radeonsi/gfx10: fix wave occupancy computations Cc: 19.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-09-09 23:43:03 -04:00
Samuel Pitoiset	83499ac765	radv: merge radv_shader_variant_info into radv_shader_info Having two different structs is useless. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-09-06 15:52:03 +02:00
Vasily Khoruzhick	9367d2ca37	nir: allow specifying filter callback in lower_alu_to_scalar Set of opcodes doesn't have enough flexibility in certain cases. E.g. Utgard PP has vector conditional select operation, but condition is always scalar. Lowering all the vector selects to scalar increases instruction number, so we need a way to filter only those ops that can't be handled in hardware. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-06 01:51:28 +00:00
Connor Abbott	3f5b541fc8	radv: Call nir_propagate_invariant() Without this, invariant qualifiers don't do anything. Together with a fix to the game, this fixes flickering in No Man's Sky. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-09-05 14:05:46 +02:00
Connor Abbott	71a6794200	ac/nir: Enable nir_opt_large_constants vkpipeline-db numbers: Totals: SGPRS: 1740306 -> 1741322 (0.06 %) VGPRS: 1331124 -> 1331712 (0.04 %) Spilled SGPRs: 21201 -> 21316 (0.54 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 256 -> 256 (0.00 %) dwords per thread Code Size: 79022628 -> 78694788 (-0.41 %) bytes LDS: 6500 -> 6500 (0.00 %) blocks Max Waves: 301413 -> 301302 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 53633 -> 54649 (1.89 %) VGPRS: 53000 -> 53588 (1.11 %) Spilled SGPRs: 3454 -> 3569 (3.33 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5284232 -> 4956392 (-6.20 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 4239 -> 4128 (-2.62 %) Wait states: 0 -> 0 (0.00 %) (The biggest VGPR and max wave regression is due to unrolling a loop, which made the scheduler more aggressive, but in this case it's able to effectively hide latency so it's actually probably a win.) shader-db numbers with radeonsi NIR: Totals: SGPRS: 3526496 -> 3526512 (0.00 %) VGPRS: 2198576 -> 2198576 (0.00 %) Spilled SGPRs: 10463 -> 10463 (0.00 %) Spilled VGPRs: 86 -> 86 (0.00 %) Private memory VGPRs: 3182 -> 2528 (-20.55 %) Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread Code Size: 74117280 -> 74106140 (-0.02 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 775846 -> 775844 (-0.00 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 856 -> 872 (1.87 %) VGPRS: 680 -> 680 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 654 -> 0 (-100.00 %) Scratch size: 668 -> 0 (-100.00 %) dwords per thread Code Size: 49652 -> 38512 (-22.44 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 182 -> 180 (-1.10 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-05 12:21:46 +02:00
Connor Abbott	5dadbabb47	radv/radeonsi: Don't count read-only data when reporting code size We usually use these counts as a simple way to figure out if a change reduces the number of instructions or shrinks an instruction. However, since .rodata sections aren't executed, we shouldn't be counting their size for this analysis. Make the linker return the total executable size, and use it to report the more useful size in both drivers. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-09-05 12:21:35 +02:00
Samuel Pitoiset	cc3d36b5dd	radv: remove radv_init_llvm_target() helper RADV no longer uses specific LLVM options compared to the common code. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-30 09:33:21 +02:00
Samuel Pitoiset	8d44f83844	radv: move lowering PS inputs/outputs at the right place At shaders creation, just after NIR linking. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-30 09:29:31 +02:00
Samuel Pitoiset	151d6990ec	radv: gather info about PS inputs in the shader info pass It's the right place to do that. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-30 09:29:29 +02:00
Samuel Pitoiset	49f5ddd3ae	radv: make use of has_ls_vgpr_init_bug Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-27 08:04:51 +02:00
Samuel Pitoiset	2b9c371575	ac: add cpdma_prefetch_writes_memory to ac_gpu_info Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-27 08:04:29 +02:00
Samuel Pitoiset	1fd60db4a1	ac,radv,radeonsi: remove LLVM 7 support Now that LLVM 9 will be released soon, we will only support LLVM 8, 9 and master (10). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-23 08:12:34 +02:00
Samuel Pitoiset	e73d863a66	radv: allow to enable VK_AMD_shader_ballot only on GFX8+ Scans aren't implemented on SI/CIK. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-21 15:14:29 +02:00
Bas Nieuwenhuizen	2e763f7c87	radv: Use correct vgpr_comp_cnt for VS if both prim_id and instance_id are needed. Should take the max of the 2. Fixes: `ea337c8b7e` "radv/gfx10: fix VS input VGPRs with the legacy path" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-21 09:38:46 +00:00
Rhys Perry	7740149852	nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo v2: add to series v3: update Makefile.sources v4: don't remove a comment and break statement v4: use nir_can_move_instr Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Bas Nieuwenhuizen	8874af8ef4	radv: Keep shader info when needed. This allows enabling the shader info keeping on a per shader basis. Also disables the cache on a per shader basis. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	5444d3e0c2	radv: Use string for nir dumping. Reviewed-by: Dave Airlie <airlied@redhat.com> Allows us to easily dump all nir shaders for combined variants in vega and simplifies ownership.	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	739a2880f5	radv: Get max workgroup size without nir. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	290ca0c4dd	radv: Add utility function to calculate max waves. Not AC because a lot of it is data extraction out of radv structs. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	ba8d3c362b	radv: Properly use Wave64 for non-NGG GS and copy shader. Fixes: `8a86908e9a` "radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders" Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen	035406ecf7	radv: Put wave size in shader options/info. Instead of having the three values everywhere. This is also more future proof if we want the driver to make those decisions eventually. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen	72e7b7a00b	ac/nir,radv: Optimize bounds check for 64 bit CAS. When the application does not ask for robust buffer access. Only implemented the check in radv. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 21:21:55 +02:00
Samuel Pitoiset	96a5445559	radv/gfx10: use the correct target machine for Wave32 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:38 +02:00
Samuel Pitoiset	8a86908e9a	radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders It can be enabled with RADV_PERFTEST=gewave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:36 +02:00
Samuel Pitoiset	953bbacc23	radv/gfx10: add Wave32 support for fragment shaders It can be enabled with RADV_PERFTEST=pswave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:34 +02:00
Samuel Pitoiset	ea38565011	radv/gfx10: add Wave32 support for compute shaders It can be enabled with RADV_PERFTEST=cswave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 09:35:04 +02:00
Daniel Schürmann	45638e14fb	radv: Don't include radv_private.h from radv_shader.h This patch decouples radv_shader.h from any LLVM dependency. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-30 10:29:11 +02:00
Connor Abbott	a69ab1b7d2	radv: Delete unused local variables in optimization loop Totals from affected shaders: SGPRS: 376 -> 376 (0.00 %) VGPRS: 620 -> 560 (-9.68 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 292 -> 292 (0.00 %) dwords per thread Code Size: 20024 -> 20144 (0.60 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 25 -> 25 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-29 11:37:46 +02:00
Samuel Pitoiset	09abe571a2	radv/gfx10: emit streamout shader config Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:32 +02:00
Samuel Pitoiset	ea337c8b7e	radv/gfx10: fix VS input VGPRs with the legacy path For some reason, InstanceID is VGPR3 although StepRate0 is set to 1. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:21 +02:00
Samuel Pitoiset	9343c93e34	radv: fix dumping disassembly with RADV_DEBUG=shaders Fixes: `a20a9d0c5e` ("radv: dont store disasm string unless keep_shader_info flag set") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-23 10:22:29 +02:00
Daniel Schürmann	64b7386ee8	radv: move nir_opt_conditional_discard out of optimization loop This late optimization pass is only affected by nir_opt_if() and handles all cases in a single pass. It's enough to call it once after the optimization loop. No changes on vkpipeline-db. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 08:12:18 +02:00
Bas Nieuwenhuizen	451f030c06	radv: Fix uninitialized warning. For es_vgpr_comp_cnt. Fixes: `795adbbadd` "radv/gfx10: Add pipeline state support for tess." Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-21 01:39:08 +02:00
Marek Olšák	921c1d24d5	ac/rtld: add support for Wave32 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Dave Airlie	2ac2b98780	radv: fix crash in shader tracing. Enabling tracing, and then having a vmfault, can lead to a segfault before we print out the traces, as if a meta shader is executing and we don't have the NIR for it. Just pass the stage and give back a default. Fixes: `9b9ccee4d6` ("radv: take LDS into account for compute shader occupancy stats") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 11:00:25 +10:00
Timothy Arceri	a20a9d0c5e	radv: dont store disasm string unless keep_shader_info flag set This fixes the memory use regression from bug 111107. Fixes: `726a31df70` ("radv: Add the concept of radv shader binaries.") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111107	2019-07-18 00:25:55 +00:00
Samuel Pitoiset	07ff367442	radv/gfx10: implement VK_EXT_post_depth_coverage I did implement this extension a while ago but it didn't work on pre GFX10 for some reason. Now all CTS pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 08:32:39 +02:00
Samuel Pitoiset	edf1af696f	radv/gfx10: fallback to the legacy path if tess and extreme geometry This is unsupported and hangs. This fixes GPU hangs with dEQP-VK.tessellation.geometry_interaction.limits.output_required_*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 08:32:33 +02:00
Samuel Pitoiset	ed12be1b8f	radv/gfx10: enable OC_LDS_EN for NGG GS if the ES stage is TES Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 20:05:21 +02:00
Samuel Pitoiset	f0a90eddb6	radv/gfx10: allocate ESGS ring space for exporting PrimitiveID Only VS needs that. We shouldn't hardcode these values, but avoiding that is complicated for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 11:30:05 +02:00
Samuel Pitoiset	e68b55f5e3	radv/gfx10: set HS/GS/CS.WGP_MODE Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:12 +02:00
Samuel Pitoiset	2b6a089813	radv: tidy up radv_get_shader_name() and add NGG stages Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	5bbcb3f5bc	radv/gfx10: implement support for GS as NGG Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 15:45:53 +02:00
Bas Nieuwenhuizen	7286865f6d	radv/gfx10: Use correct ES shader for es_vgpr_comp_cnt for GS. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:51 +02:00
Connor Abbott	f18b8a1174	radv: Don't optimize after lowering FS inputs Currently this is done rather late in radv, after lowering booleans, so it isn't safe to run additional optimizations that may add e.g. 1-bit booleans. We could move the lowering parts earlier, but since right now we only lower FS inputs and by this point all indirects have been lowered away, there's no reason we should need to optimize anything. One shader from Devil May Cry 5 was getting optimized, but only because the optimization loop was working on 32-bit booleans which revealed an opportunity that was hidden with 1-bit booleans, and we generated a 1-bit boolean which is invalid. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111092 Fixes: `118a66df99` Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 10:10:20 +02:00
Samuel Pitoiset	3f50007ad8	radv: set correct number of VGPRs for GS on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:27 +02:00
Samuel Pitoiset	d2a8b63a2c	radv: fix computing the number of ES VGPRS for TES on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:14 +02:00
Bas Nieuwenhuizen	795adbbadd	radv/gfx10: Add pipeline state support for tess. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:26 +10:00
Connor Abbott	118a66df99	radv: Use NIR barycentric intrinsics We have to add a few lowering to deal with things that used to be dealt with inline when creating inputs. We also move the code that fills out the radv_shader_variant_info struct for linking purposes to radv_shader.c, as it's no longer tied to the NIR->LLVM lowering. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:18:25 +02:00
Connor Abbott	27f0c3c15e	radv: Make FragCoord a sysval load_fragcoord is already handled in common code for radeonsi, so we don't need to do anything to handle it. However, there were some passes creating NIR with the varying, so we switch them over to the sysval. In the case of nir_lower_input_attachments which is used by both radv and anv, we add handling for both until intel switches to using a sysval. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Daniel Schürmann	e41e932e57	radv: Lower input attachments in NIR. v2 (Connor) - Fix warning in release mode using MAYBE_UNUSED Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Samuel Pitoiset	ee21bd7440	radv/gfx10: implement NGG support (VS only) This needs to be cleaned up a bit, and it probably contains missing stuff and/or bugs. This doesn't fix the "half of the triangles" issue. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen	aeb5b1a998	radv/gfx10: Set MEM_ORDERED flags on shaders. Scattered because depending on stage they are at offset 24/25/27/30. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	352365c5e2	radv/gfx10: do not set stream output shader config Transform feedback is really different on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	4c82094b7b	radv/gfx10: implement radv_fill_shader_variant() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:39 +02:00
Bas Nieuwenhuizen	5ff651c0a7	radv: Move more stuff to variant create time. Due to them depending on the linker result. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	726a31df70	radv: Add the concept of radv shader binaries. This simplifies a bunch of stuff by (1) Keeping all the things in a single allocation, making things easier for the cache. (2) creating a shader_variant creation helper. This is immediately put to use by creating rtld shader binaries. This is the main reason for the binaries, as we need to do the linking at upload time, i.e. post caching. We do not enable rtld yet. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	43f2f01cc8	radv: Add export_prim_id to the shader variant info. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	7469516244	radv: Merge rsrc1/rsrc2 fields with the config fields. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Nicolai Hähnle	74a26af913	amd/common/gfx10: add register JSON A small number of fields now need new disambiguation. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Sagar Ghuge	456557a837	nir: Add lower_rotate flag and set to true in all drivers Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Samuel Pitoiset	d8b079e4c7	radv: rework how the number of VGPRs is computed Just a cleanup, it shouldn't change anything. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:27 +02:00
Samuel Pitoiset	f4d2c47cf6	radv: the number of VGPR_COMP_CNT for GS is expected to be 0 on GFX8 Just move around the switch case. GFX9+ is handled below. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:19 +02:00
Samuel Pitoiset	b4477fa4d4	radv: reduce number of VGPRs for TESS_EVAL if primitive ID is not used We only need 2. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:17 +02:00
Daniel Schürmann	0daeb1d127	amd/common: lower bitfield_extract to ubfe/ibfe. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	48a75e7af0	amd/common: lower bitfield_insert to bfm & bitfield_select Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	c58dff753c	radv: enable AMD_shader_ballot with RADV_PERFTEST_SHADER_BALLOT ('shader_ballot') Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Daniel Schürmann	7a858f274c	spirv/nir: add support for AMD_shader_ballot and Groups capability This commit also renames existing AMD capabilities: - gcn_shader -> amd_gcn_shader - trinary_minmax -> amd_trinary_minmax Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Nicolai Hähnle	f480b8aaa4	amd/common: use generated register header	2019-06-03 20:05:20 -04:00
Caio Marcelo de Oliveira Filho	e45bf01940	spirv: Change spirv_to_nir() to return a nir_shader spirv_to_nir() returned the nir_function corresponding to the entrypoint, as a way to identify it. There's now a bool is_entrypoint in nir_function and also a helper function to get the entry_point from a nir_shader. The return type reflects better what the function name suggests. It also helps drivers avoid the mistake of reusing internal shader references after running NIR_PASS on it. When using NIR_TEST_CLONE or NIR_TEST_SERIALIZE, those would be invalidated right in the first pass executed. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho	a3bfdacb6c	radv: Don't re-use entry_point pointer from spirv_to_nir Replace its uses with checking for is_entrypoint and calling nir_shader_get_entrypoint(). This is a preparation to change spirv_to_nir() return type. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho	31a7476335	spirv, radv, anv: Replace ptr_type with addr_format Instead of setting the glsl types of the pointers for each resource, set the nir_address_format, from which we can derive the glsl_type, and in the future the bit pattern representing a NULL pointer. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Samuel Pitoiset	d7501834cd	radv: add a workaround for Monster Hunter World and LLVM 7&8 The load/store optimizer pass doesn't handle WaW hazards correctly and this is the root cause of the reflection issue with Monster Hunter World. AFAIK, it's the only game that is affected by this issue. This is fixed with LLVM r361008, but we need a workaround for older LLVM versions unfortunately. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-17 11:41:19 +02:00
Marek Olšák	ccfcb9d818	ac: rename SI-CIK-VI to GFX6-GFX7-GFX8 Acked-by: Dave Airlie <airlied@redhat.com> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.	2019-05-15 20:54:10 -04:00
Jonathan Marek	d0bff89159	nir: allow specifying a set of opcodes in lower_alu_to_scalar This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:10:41 +00:00
Ian Romanick	1f1007a4ed	nir: Initialize lower_flrp_progress everywhere I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: `d41cdef2a5` ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989	2019-05-09 10:03:51 -07:00
Timothy Arceri	e19a8fe033	radv: call constant folding before opt algebraic The pattern of calling opt algebraic first seems to have originated in i965. The order in OpenGL drivers generally doesn't matter because the GLSL IR optimisations do constant folding before opt algebraic. However in Vulkan drivers calling opt algebraic first can result in missed constant folding opportunities. vkpipeline-db results (VEGA64): Totals from affected shaders: SGPRS: 3160 -> 3176 (0.51 %) VGPRS: 3588 -> 3580 (-0.22 %) Spilled SGPRs: 52 -> 44 (-15.38 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 12 -> 12 (0.00 %) dwords per thread Code Size: 261812 -> 261036 (-0.30 %) bytes LDS: 7 -> 7 (0.00 %) blocks Max Waves: 346 -> 348 (0.58 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-08 19:45:01 +10:00
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Juan A. Suarez Romero	06c9d7f9f9	radv: enable descriptor indexing capabilities This enables the remaining capabilities in SPV_EXT_descriptor_indexing. Fixes: `0e10790558` "radv: Enable VK_EXT_descriptor_indexing." Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-30 09:23:23 +02:00
Bas Nieuwenhuizen	5c3467e74a	radv: Run the new ycbcr lowering pass. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Samuel Pitoiset	b3e3440c87	radv: add VK_NV_compute_shader_derivates support Only computeDerivativeGroupLinear is supported for now. All crucible tests pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-22 14:51:57 +02:00
Samuel Pitoiset	9cf55b022d	radv: add VK_KHR_shader_atomic_int64 but disable it for now No support for 64-bit compare&swap atomic operations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-17 21:59:56 +02:00
Samuel Pitoiset	ecbe6cb805	radv: sort the shader capabilities alphabetically Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-16 09:14:22 +02:00
Samuel Pitoiset	14f03978ed	radv: enable VK_KHR_shader_float16_int8 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-15 10:43:55 +02:00
Timothy Arceri	e30804c602	nir/radv: remove restrictions on opt_if_loop_last_continue() When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The codesize increases is related to Wolfenstein II it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-09 11:29:41 +10:00
Samuel Pitoiset	c25f63872b	radv: partially enable VK_KHR_shader_float16_int8 Only 8-bit integers for now, float16 requires a bit more work. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 18:53:59 +02:00
Rhys Perry	0af95f0ffc	radv: lower 16-bit flrp Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-01 09:58:48 +02:00
Samuel Pitoiset	8a6e61cc52	radv: do not lower frexp_exp and frexp_sig Hardware has two instructions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-28 13:02:51 +01:00
Samuel Pitoiset	23d30f4099	spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass This lowering isn't needed for RADV because AMDGCN has two instructions. It will be disabled for RADV in an upcoming series. While we are at it, factorize a little bit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-22 19:41:46 +01:00
Rhys Perry	037f11d42e	radv: enable VK_KHR_8bit_storage Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:27 +01:00
Jason Ekstrand	08f804ec0c	anv,radv,turnip: Lower TG4 offsets with nir_lower_tex v2: turn on for turnip as well (Karol Herbst) Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Samuel Pitoiset	71ffa00fc6	radv: enable lower_mul_2x32_64 Fixes: `58bcebd987` ("spirv: Allow [i/u]mulExtended to use new nir opcode") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-06 22:41:20 +01:00
Bas Nieuwenhuizen	13ab63bb62	radv: Implement VK_EXT_buffer_device_address. v2: Also update the release notes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:37:38 +01:00
Samuel Pitoiset	5e7f800f32	radv: fix build Fixes: `9b9ccee4d6` ("radv: take LDS into account for compute shader occupancy stats") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-01 15:31:55 +01:00
Timothy Arceri	9b9ccee4d6	radv: take LDS into account for compute shader occupancy stats Ported from `d205faeb6c`. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-01 22:25:30 +11:00
Timothy Arceri	a53d68d318	ac/radv/radeonsi: add ac_get_num_physical_sgprs() helper Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-01 22:25:30 +11:00
Bas Nieuwenhuizen	ead54d4a42	radv/winsys: Set winsys bo priority on creation. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-29 15:56:41 +01:00
Karol Herbst	9b24028426	nir: rename nir_var_function to nir_var_function_temp Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	d0c6ef2793	nir: rename global/local to private/function memory the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all thrads of a work group the solution where everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:51:46 +01:00
Jason Ekstrand	05d72d6d48	spirv: Sort supported capabilities Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-07 18:41:15 -06:00
Jason Ekstrand	63b9aa2e25	spirv: Add support for using derefs for UBO/SSBO access For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	adc155a815	spirv: Add explicit pointer types Instead of baking in uvec2 for UBO and SSBO pointers and uint for push constant and shared memory pointers, make it configurable. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	fc9c4f89b8	nir: Move propagation of cast derefs to a new nir_opt_deref pass We're going to want to do more deref optimizations going forward and this gives us a central place to do them. Also, cast propagation will get a bit more complicated with the addition of ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Samuel Pitoiset	9606310081	radv: enable shaderStorageImageMultisample feature on GFX8+ Untested on older chips. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 18:01:19 +01:00
Samuel Pitoiset	0a7e767e58	radv: drop the amdgpu-skip-threshold=1 workaround for LLVM 8 This workaround has been introduced by `135e4d434f` for fixing DXVK GPU hangs with many games. It is no longer needed since LLVM r345718. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 12:09:57 +01:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Samuel Pitoiset	3fbdcd942f	amd: remove support for LLVM 6.0 Users are encouraged to switch to LLVM 7.0 released in September 2018. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-06 14:02:56 +01:00
Nicolai Hähnle	8c97abc066	radv: include LLVM IR in the VK_AMD_shader_info "disassembly" Helpful for debugging compiler backend problems: this allows us to easily retrieve the LLVM IR from RenderDoc. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-09 14:54:37 +01:00
Samuel Pitoiset	b4eb029062	radv: implement VK_EXT_transform_feedback This implementation should work and potential bugs can be fixed during the release candidates window anyway. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:10:58 +01:00
Jason Ekstrand	28bb6abd1d	nir/validate: Print when the validation failed Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Timothy Arceri	3a95396f3c	radv: use nir_shrink_vec_array_vars() Totals from affected shaders: SGPRS: 1096 -> 1096 (0.00 %) VGPRS: 1192 -> 1056 (-11.41 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 100940 -> 94384 (-6.49 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 100 -> 112 (12.00 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from Batman Arkham City. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Timothy Arceri	8086fa1bcd	radv: use nir_split_array_vars() We call in the opt loop in case another pass results in an array with indirect access being turned into direct access. Totals from affected shaders: SGPRS: 512 -> 496 (-3.12 %) VGPRS: 456 -> 452 (-0.88 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 40040 -> 39664 (-0.94 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 41 -> 43 (4.88 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from Batman Arkham City. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Timothy Arceri	06675711e7	radv: use nir_opt_find_array_copies() Totals from affected shaders: SGPRS: 1112 -> 1112 (0.00 %) VGPRS: 1492 -> 1196 (-19.84 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 112172 -> 101316 (-9.68 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 93 -> 98 (5.38 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from "Batman: Arkham City" over DXVK. The pass detects that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and allows us to avoid copying all of the input data and then indirecting on it with if-ladders, instead we just do indirect indexing. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Timothy Arceri	9d5b106b2e	radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars Totals from affected shaders: SGPRS: 2856 -> 2856 (0.00 %) VGPRS: 3236 -> 3248 (0.37 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 236560 -> 233548 (-1.27 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 277 -> 283 (2.17 %) Wait states: 0 -> 0 (0.00 %) Even in the cases where we have increased VGPR use it appears the NIR is improved significantly. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Timothy Arceri	72e4287e8f	radv: make use of nir_lower_load_const_to_scalar() This allows NIR to CSE more operations. LLVM does this also so the impact is limited, however doing this in NIR allows other opts to make progress. For example in radeonsi more loops are unrolled in Civilization Beyond Earth. The actual pipeline-db stats are not overwhelming but even in the negatively affected shaders the NIR is clearly better. It just happens that the code shuffling and in some cases calls to max rather than a flt result in the final output from LLVM not giving as good numbers. However this is an incremental opt that further passes build off so the change should be made IMO. Totals from affected shaders: SGPRS: 20192 -> 20184 (-0.04 %) VGPRS: 19516 -> 19524 (0.04 %) Spilled SGPRs: 437 -> 444 (1.60 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1527444 -> 1522276 (-0.34 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 1018 -> 1016 (-0.20 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-25 09:31:22 +10:00

654 Commits