KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Bas Nieuwenhuizen	d97c892584	radv: Set the user SGPR MSB for Vega. Otherwise using 32 user SGPRs would be broken. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-16 12:50:58 +02:00
Jason Ekstrand	44ec31cd75	nir: Drop the vs_inputs_dual_locations option It was very inconsistently handled; the only things that made use of it were glsl_to_nir, glspirv, and nir_gather_info. In particular, nir_lower_io completely ignored it so anyone using nir_lower_io on 64-bit vertex attributes was going to be in for a shock. Also, as of the previous commit, it's set by every driver that supports 64-bit vertex attributes. There's no longer any reason to have it be an option so let's just delete it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-06 16:07:50 -05:00
Samuel Pitoiset	24ee53231d	radv: remove dead variables after splitting per member structs Otherwise, nir_lower_clip_cull_distance_arrays might report wrong number of output clips/culls because it relies on shader output variables and some of them might be dead. This fixes a rendering issue with Dolphin and Super Mario Sunshine. Fixes: `b0c643d8f5` ("spirv: Use NIR per-member splitting") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107610 CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-22 13:57:18 +02:00
Dave Airlie	b88468f15c	radv: return binary code_size not variant code size to cache The code sizes return here get passed to the cache shader insert function, which then memcpy from the code ptr, and causes all sorts of valgrind errors like: ==6755== Invalid read of size 8 ==6755== at 0x4C32FEE: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1021) ==6755== by 0x2305D4C7: radv_pipeline_cache_insert_shaders (radv_pipeline_cache.c:416) ==6755== by 0x2305791D: radv_create_shaders (radv_pipeline.c:2158) ==6755== by 0x2305C523: radv_pipeline_init (radv_pipeline.c:3404) ==6755== by 0x2305C890: radv_graphics_pipeline_create (radv_pipeline.c:3515) ==6755== by 0x230188AB: radv_device_init_meta_blit_color (radv_meta_blit.c:871) ==6755== by 0x2301D50E: radv_device_init_meta_blit_state (radv_meta_blit.c:1278) ==6755== by 0x23011893: radv_device_init_meta (radv_meta.c:352) ==6755== by 0x2300744B: radv_CreateDevice (radv_device.c:1576) ==6755== by 0x5187D0F: ??? (in /usr/lib64/libvulkan.so.1.1.77) ==6755== by 0x518F6A3: ??? (in /usr/lib64/libvulkan.so.1.1.77) ==6755== by 0x5192A42: vkCreateDevice (in /usr/lib64/libvulkan.so.1.1.77) ==6755== Address 0x22a58548 is 4 bytes after a block of size 116 alloc'd ==6755== at 0x4C2EBAB: malloc (vg_replace_malloc.c:299) ==6755== by 0x23089DC4: ac_elf_read (ac_binary.c:144) ==6755== by 0x23090A60: ac_compile_module_to_binary (ac_llvm_helper.cpp:162) ==6755== by 0x23053F06: compile_to_memory_buffer (radv_llvm_helper.cpp:58) ==6755== by 0x23053F06: radv_compile_to_binary (radv_llvm_helper.cpp:98) ==6755== by 0x23052769: ac_llvm_compile (radv_nir_to_llvm.c:3394) ==6755== by 0x23052823: ac_compile_llvm_module (radv_nir_to_llvm.c:3418) ==6755== by 0x23053C05: radv_compile_nir_shader (radv_nir_to_llvm.c:3542) ==6755== by 0x23061B4E: shader_variant_create (radv_shader.c:580) ==6755== by 0x23061CFD: radv_shader_variant_create (radv_shader.c:634) ==6755== by 0x23057765: radv_create_shaders (radv_pipeline.c:2123) ==6755== by 0x2305C523: radv_pipeline_init (radv_pipeline.c:3404) ==6755== by 0x2305C890: radv_graphics_pipeline_create (radv_pipeline.c:3515) Since we are just inserting the code into the cache, we can avoid these bad reads and data in the cache by just using the binary code size here. Fixes: `939e5a382` (radv: add padding for the UMR disassembler) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-07-28 06:20:20 +10:00
Daniel Schürmann	62024fa775	radv: enable VK_KHR_16bit_storage extension / 16bit storage features Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-23 23:16:26 +02:00
Danylo Piliaiev	494a206229	radv: Fix incorrect assumption about ternary operator precedence Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-07-19 10:04:27 +02:00
Dave Airlie	6f3aee40f9	radv: using tls to store llvm related info and speed up compiles (v10) This uses the common compiler passes abstraction to help radv avoid fixed cost compiler overheads. This uses a linked list per thread stored in thread local storage, with an entry in the list for each target machine. This should remove all the fixed overheads setup costs of creating the pass manager each time. This takes a demo app time to compile the radv meta shaders on nocache and exit from 1.7s to 1s. It also has been reported to take the startup time of uncached shaders on RoTR from 12m24s to 11m35s (Alex) v2: fix llvm6 build, inline emit function, handle multiple targets in one thread v3: rebase and port onto new structure v4: rename some vars (Bas) v5: drag all code into radv for now, we can refactor it out later for radeonsi if we make it shareable v6: use a bit more C++ in the wrapper v7: logic bugs fixed so it actually runs again. v8: rebase on top of radeonsi changes. v9: drop some C++ headers, cleanup list entry v10: use pop_back (didn't have enough caffeine) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-10 07:58:03 +10:00
Dave Airlie	7398913a62	ac/radv: move llvm compiler info to struct and init in one place This ports radv to the shared code, however due to a bug in LLVM version prior to 7, radv cannot add target info at this stage, as it would leak one for every shader compile, however I'd prefer to keep this llvm damage in the shared code, since it isn't the driver at fault here. We just add a flag to denote if the driver can support leaking the target info or not, and the common code does the right thing depending on the llvm version. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 10:29:16 +10:00
Dave Airlie	35c82af539	radv/radeonsi: add a check ir tm options This doesn't do much yet, but it makes it easier to move the code to a common shared code base. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 05:32:35 +10:00
Dave Airlie	e1387eaf12	radv: create/destroy passmgr at the higher level. This is prep work for moving this to a per-thread struct Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 05:31:05 +10:00
Dave Airlie	f2b3e96e75	radv: drop copy of ac_create_target_machine. Once we split the init once stuff out, this can be shared again. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 05:15:35 +10:00
Dave Airlie	473be16c74	ac/radv: split the non-common init_once code from the common target code. (v2) This just splits out the non-shared code and reuses ac_get_llvm_target in radv. v2: rebase on Marek's patch - fixup brace position/whitespace Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 05:15:23 +10:00
Samuel Pitoiset	939e5a3823	radv: add padding for the UMR disassembler Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-02 10:42:17 +02:00
Samuel Pitoiset	bcbd8dd6c9	radv: enable VK_EXT_shader_stencil_export The driver already supports exporting the stencil value. The following CTS test now pass: dEQP-VK.pipeline.shader_stencil_export.op_replace Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-26 10:40:10 +02:00
Bas Nieuwenhuizen	8c4f430d43	radv: Enable lower_io_to_temporaries after deref changes. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 21:23:06 -07:00
Rob Clark	d143f6c856	move lower_deref_instrs Signed-off-by: Rob Clark <robdclark@gmail.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Bas Nieuwenhuizen	3573570afe	radv: Disable lower_io_to_temporaries during deref changes. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:54:00 -07:00
Jason Ekstrand	c11833ab24	nir,spirv: Rework function calls This commit completely reworks function calls in NIR. Instead of having a set of variables for the parameters and return value, nir_call_instr now simply has a number of sources which get mapped to load_param intrinsics inside the functions. It's up to the client API to build an ABI on top of that. In SPIR-V, out parameters are handled by passing the result of a deref through as an SSA value and storing to it. The virtue of this approach can be seen by how much it allows us to delete from core NIR. In particular, nir_inline_functions gets halved and goes from a fairly difficult pass to understand in detail to almost trivial. It also simplifies spirv_to_nir somewhat because NIR functions never were a good fit for SPIR-V. Unfortunately, there is no good way to do this without a mega-commit. Core NIR and SPIR-V have to be changed at the same time. This also requires changes to anv and radv because nir_inline_functions couldn't handle deref instructions before this change and can't work without them after this change. Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:15:58 -07:00
Jason Ekstrand	b0c643d8f5	spirv: Use NIR per-member splitting Before, we were doing structure splitting in spirv_to_nir. Unfortunately, this doesn't really work when you think about passing struct pointers into functions. Doing it later in NIR is a much better plan. Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:15:57 -07:00
Jason Ekstrand	74212c2414	anv,i965,radv,st,ir3: Call nir_lower_deref_instrs This inserts a call to nir_lower_deref_instrs at every call site of glsl_to_nir, spirv_to_nir, and prog_to_nir. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-06-22 20:15:54 -07:00
Eric Engestrom	d85fef1e34	radv: fix reported number of available VGPRs It's a bit late to round up after an integer division. Fixes: `de88979413` "radv: Implement VK_AMD_shader_info" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Alex Smith <asmith@feralinteractive.com>	2018-06-18 17:08:22 +01:00
Samuel Pitoiset	bfca15e16a	radv: add RADV_DEBUG=checkir This allows to run the LLVM verifier pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-15 15:54:08 +02:00
Samuel Pitoiset	135e4d434f	radv: add a workaround for DXVK hangs by setting amdgpu-skip-threshold Workaround for bug in llvm that causes the GPU to hang in presence of nested loops because there is an exec mask issue. The proper solution is to fix LLVM but this might require a bunch of work. This fixes a bunch of GPU hangs that happen with DXVK. Vega10: Totals from affected shaders: SGPRS: 110456 -> 110456 (0.00 %) VGPRS: 122800 -> 122800 (0.00 %) Spilled SGPRs: 7478 -> 7478 (0.00 %) Spilled VGPRs: 36 -> 36 (0.00 %) Code Size: 9901104 -> 9922928 (0.22 %) bytes Max Waves: 7143 -> 7143 (0.00 %) Code size slightly increases because it inserts more branch instructions but that's expected. I don't see any real performance changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105613 Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-09 14:16:49 +02:00
Bas Nieuwenhuizen	38933c1151	radv: Add option to print errors even in optimized builds. Errors are not that common of a case so we can eat a slight perf hit in having to call a function and do a runtime check. In turn this makes debugging random errors happening for end users easier, because they don't have to have a debug build on hand. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-05-31 11:51:23 +02:00
Samuel Pitoiset	38a8c5903b	radv: call nir_lower_io_to_temporaries for VS, GS, TES and FS Do not lower FS inputs because this moves all load_var instructions at beginning of shaders and because interp_var_at_sample (and friends) seem broken. That might be eventually enabled later on if we really want to preload all FS inputs at beginning. Polaris10: Totals from affected shaders: SGPRS: 54072 -> 54264 (0.36 %) VGPRS: 38580 -> 38124 (-1.18 %) Spilled SGPRs: 652 -> 652 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Code Size: 2128116 -> 2127380 (-0.03 %) bytes Max Waves: 8048 -> 8086 (0.47 %) Vega10: Totals from affected shaders: SGPRS: 52616 -> 52656 (0.08 %) VGPRS: 37536 -> 37116 (-1.12 %) Spilled SGPRs: 828 -> 828 (0.00 %) Code Size: 2043756 -> 2042672 (-0.05 %) bytes Max Waves: 9176 -> 9254 (0.85 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-24 09:18:57 +02:00
Samuel Pitoiset	ded1509587	radv: call nir_split_var_copies() before nir_lower_var_copies() This does nothing special currently because we don't create any copy_var instructions, but this is needed for the next patch. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-24 09:18:54 +02:00
Samuel Pitoiset	d8a61d3232	radv: set amdgpu-32bit-address-high-bits LLVM attribute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-22 15:53:15 +02:00
Samuel Pitoiset	6211799aff	radv: remove the radv_finishme() when compiling shaders Having an entrypoint different than "main" doesn't mean we have multiple shaders per module. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-17 13:48:24 +02:00
Samuel Pitoiset	1e86eaf7d8	radv: remove radv_device::llvm_supports_spill It's always true. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-17 13:48:21 +02:00
Samuel Pitoiset	8ade3e4684	radv: allow to dump the GS copy shader with RADV_DEBUG="shaders" Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-05-14 12:38:00 +02:00
Timothy Arceri	ce188813bf	radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR linking optimisations and only run over the NIR optimisation loop once similar to the GLSLOptimizeConservatively constant used by some GL drivers. We need to run over the opts at least once to avoid errors in LLVM (e.g. dead vars it can't handle) and also to reduce the time spent compiling the IR in LLVM. With this change the Blacksmith Unity demos compilation times go from 329760 ms -> 299881 ms when using Wine and DXVK. V2: add bit to radv_pipeline_key Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246	2018-05-13 09:58:33 +10:00
Samuel Pitoiset	3a410f0afc	radv: minor cleanups in radv_fill_shader_variant() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-05-11 12:35:05 +02:00
Iago Toral Quiroga	2d648e5ba3	compiler/lower_64bit_packing: rename the pass to be more generic It can do 32-bit packing too now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-05-03 11:40:26 +02:00
Marek Olšák	43f0a10051	radeonsi: add triple into si_compiler Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Benedikt Schemmer <ben at besd.de> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-04-27 17:56:04 -04:00
Dave Airlie	f77caa7411	ac/radv/radeonsi: refactor max simd waves into common code. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-24 09:08:33 +10:00
Bas Nieuwenhuizen	dffdef6737	radv: Add Vega M support. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-04-19 16:36:21 +02:00
Bas Nieuwenhuizen	0e10790558	radv: Enable VK_EXT_descriptor_indexing. This adds everything except non-uniform indexing, which needs a bit more work and testing. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-04-18 22:56:54 +02:00
Daniel Schürmann	f2c6a55061	radv: enable subgroup capabilities Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-14 01:03:15 +02:00
Samuel Pitoiset	466aba9fa2	radv: add RADV_NUM_PHYSICAL_VGPRS constant Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-09 14:28:13 +02:00
Samuel Pitoiset	2f7bb93146	radv: add radv_get_num_physical_sgprs() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-09 14:28:13 +02:00
Samuel Pitoiset	acf60abc54	radv: enable VK_EXT_shader_viewport_index_layer The driver already supports exporting the Layer and ViewportIndex built-ins from vertex or tessellation shaders. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-04-03 14:05:46 +02:00
Daniel Schürmann	b91cd5dba4	radv: enable VK_AMD_shader_trinary_minmax extension Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-03-29 01:29:39 +02:00
Timothy Arceri	9a243eccae	radv: don't lower indirects until after opts have run Noticed while passing by. Not sure if it impacts anything, but likely to impact GFX9 more than anything else since we lower inputs, outputs and locals there. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-20 15:01:44 +11:00
Dave Airlie	e8d9b7ab02	radv: lower constant initializers on output variables earlier If a shader only writes to an output via a constant initializer we need to lower it before we call nir_remove_dead_variables so that this pass sees the stores from the initializer and doesn't kill the output. Fixes test failures in new work-in-progress CTS tests: dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output.float This is ported from anv: `99b57daf4a` anv/pipeline: lower constant initializers on output variables earlier from Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-03-19 19:29:40 +00:00
Samuel Pitoiset	e96a1d27dc	radv: run nir_opt_move_load_ubo Polaris10: SGPRS: 108560 -> 107856 (-0.65 %) VGPRS: 74576 -> 74520 (-0.08 %) Spilled SGPRs: 7375 -> 7113 (-3.55 %) Code Size: 4273464 -> 4274364 (0.02 %) bytes Max Waves: 9434 -> 9446 (0.13 %) Vega10: Totals from affected shaders: SGPRS: 108264 -> 107576 (-0.64 %) VGPRS: 69068 -> 69000 (-0.10 %) Spilled SGPRs: 7221 -> 6959 (-3.63 %) Code Size: 3800796 -> 3801496 (0.02 %) bytes Max Waves: 10687 -> 10709 (0.21 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-03-16 09:58:19 +01:00
Dave Airlie	010d055aae	radv: drop tess offchip layout for tcs. This removes the last TCS specific user sgpr. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:54 +00:00
Samuel Pitoiset	81818662a5	radv: record LLVM IR when debugging shaders If AMD_shader_info or RADV_TRACE_FILE is used we might need to keep trace of LLVM IR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-15 17:20:03 +01:00
Samuel Pitoiset	d07edf5fdf	radv: add dump_shader to the NIR compiler options Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-15 17:20:00 +01:00
Alejandro Piñeiro	50767214a7	spirv/radv: add AMD_gcn_shader capability, remove current extensions So now, during spirv_to_nir, it uses the capability instead of the extension. Note that what we are really doing here is treating SPV_AMD_gcn_shader as other supported extensions. SPV_AMD_gcn_shader is not the first SPV extension supported. For example, the capability draw_parameters infers if the extension SPV_KHR_shader_draw_parameters is supported or not. This could be seen as counter-intuitive, and that it would be easier to define which extensions are supported, and base our checks on that, but we need to take into account that some capabilities are optional from core, and others came from new extensions. Also this commit would make the implementation of ARB_spirv_extensions easier. v2: AMD_gcn_shader capability renamed to gcn_shader (Daniel Schürmann) Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-15 12:08:25 +01:00
Samuel Pitoiset	fbe694562b	ac/nir: move ac_nir_compiler_options and friends to radv folder Also replace ac_ by radv_. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 16:54:23 +01:00
Samuel Pitoiset	237229430f	ac: move ac_shader_info to radv folder This is RADV specific code. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 16:54:21 +01:00
Samuel Pitoiset	b2653007b9	ac/nir: move all RADV related code to radv_nir_to_llvm.c Now the "ac/nir" prefix will really be the shared code between RadeonSI and RADV, that might avoid confusions in the future. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 14:05:06 +01:00
Daniel Schürmann	ffbf75cde4	radv: enable AMD_gcn_shader extension Signed-off-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-07 23:09:58 +01:00
Bas Nieuwenhuizen	5240fddb9d	radv: Add trivial device group implementation. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-07 21:18:35 +01:00
Bas Nieuwenhuizen	8f9af587a2	radv: Add minimal subgroup support. Deliberately not implementing workgroup scopes as that is not needed for core vulkan. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-07 21:18:35 +01:00
Samuel Pitoiset	e96e6f60f7	radv: report the scratch private memory size with shader stats Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-03-06 10:38:42 +01:00
Timothy Arceri	0f2c7341e8	ac/radv: move lower_indirect_derefs() to ac_nir_to_llvm.c Until llvm handles indirects better we will need to use these workarounds in the radeonsi backend also. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-05 14:09:23 +11:00
Samuel Pitoiset	7aa008d1d7	radv: enable lowering of fpow to fexp2 and flog2 There is no fpow in hardware, so it's always lowered somewhere, but it appears that lowering at NIR level is better. Figured while comparing compute shaders between RadeonSI and RADV. Polaris10: Totals from affected shaders: SGPRS: 18936 -> 18904 (-0.17 %) VGPRS: 12240 -> 12220 (-0.16 %) Spilled SGPRs: 2809 -> 2809 (0.00 %) Code Size: 718116 -> 719848 (0.24 %) bytes Max Waves: 1409 -> 1410 (0.07 %) Vega10: Totals from affected shaders: SGPRS: 18392 -> 18392 (0.00 %) VGPRS: 12008 -> 11920 (-0.73 %) Spilled SGPRs: 3001 -> 2981 (-0.67 %) Code Size: 777444 -> 778788 (0.17 %) bytes Max Waves: 1503 -> 1504 (0.07 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-22 20:40:47 +01:00
Bas Nieuwenhuizen	05d84ed68a	radv: Always lower indirect derefs after nir_lower_global_vars_to_local. Otherwise new local variables can cause hangs on vega. CC: <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105098 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-02-15 23:45:59 +01:00
Samuel Pitoiset	3488a3f033	radv: run nir_opt_shrink_load LLVM can't shrink loads. Polaris10: Totals from affected shaders: SGPRS: 62528 -> 59955 (-4.11 %) VGPRS: 44708 -> 44616 (-0.21 %) Spilled SGPRs: 16 -> 8 (-50.00 %) Code Size: 1355504 -> 1355172 (-0.02 %) bytes Max Waves: 11710 -> 11670 (-0.34 %) Vega10: Totals from affected shaders: SGPRS: 51448 -> 50371 (-2.09 %) VGPRS: 39140 -> 39048 (-0.24 %) Spilled SGPRs: 16 -> 16 (0.00 %) Code Size: 1307188 -> 1304296 (-0.22 %) bytes Max Waves: 11312 -> 11292 (-0.18 %) This reduces SGPRs spilling in MadMax, and it also reduces number of SGPRs in DOW3 and F12017. The number of waves slightly decreases in F1 but I don't see any performance changes after benchmarking it. Talos and Serious Sam are not affected because they don't use any push constants. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-02-06 23:08:44 +01:00
Timothy Arceri	5b8de4bdff	nir: add vs_inputs_dual_locations compiler option Allows nir drivers to either use a single or dual locations for vs double inputs. i965 uses dual locations for both OpenGL and Vulkan drivers, for now gallium OpenGL drivers only use a single location. The following patch will also make use of this option when calling nir_shader_gather_info(). Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-01-30 09:08:47 +11:00
Samuel Pitoiset	33e6e5e6a4	radv: add an option that allows to dump pre-optimization ir With RADV_DEBUG=preoptir. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-01-22 12:28:33 +01:00
Bas Nieuwenhuizen	0f89f9b8eb	radv: Replace an assert with unreachable. Otherwise we get uninitialized variable warnings for es_vgpr_comp_cnt. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-01-19 00:38:45 +01:00
Timothy Arceri	f0d74ecce8	radv/radeonsi/nir: lower 64bit flrp Fixes a bunch of arb_gpu_shader_fp64 piglit tests for example: generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-mix-double-double-double.shader_test Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-01-13 18:04:40 +11:00
Samuel Pitoiset	4e701cf75c	radv/gfx9: calculate the number of ES VGPRs for merged shaders Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-01-10 12:31:53 +01:00
Samuel Pitoiset	232c418af5	radv/gfx9: enable LDS for GS only if the ES type is TES Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-01-10 12:31:51 +01:00
Samuel Pitoiset	b462ceb482	radv/gfx9: do not load VGPR1 when GS uses points or lines VGPR1 is only needed for topology that needs 3 offsets like triangles or quads. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-01-08 21:24:53 +01:00
Samuel Pitoiset	a3c2a86757	radv: make shader BOs read-only for the GPU Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-01-08 21:24:51 +01:00
Samuel Pitoiset	2670ebb584	radv/gfx9: reduce the number of input VGPRs for the GS stage This can still be improved, but let's start with this. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-01-04 18:43:25 +01:00
Samuel Pitoiset	4237c3d645	radv: properly load unused gl_LocalInvocationID/gl_WorkGroupID components F1 2017 looks good now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-19 21:26:25 +01:00
Samuel Pitoiset	bb01661918	Revert "radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components" This reverts commit `2294d35b24`. We can't do this without adjusting the input SGPRs/VGPRs logic. For now, just revert it. I will send a proper solution later. It fixes a rendering issue in F1 2017 that CTS didn't catch up. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-18 11:50:02 +01:00
Samuel Pitoiset	90c3bf0789	radv: do not load the local invocation index when it's unused Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-14 22:22:26 +01:00
Samuel Pitoiset	2294d35b24	radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components We should also not load the input SGPRs and VGPRS, but let's start with this for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-14 22:22:06 +01:00
Jason Ekstrand	e19c623128	spirv: Convert the supported_extensions struct to spirv_options This is a bit more general and lets us pass additional options into the spirv_to_nir pass beyond what capabilities we support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2017-12-02 08:09:11 -08:00
Samuel Pitoiset	921986b580	radv: do not dump meta shaders with RADV_DEBUG=shaders It's really annoying and this pollutes the output especially when a bunch of non-meta shaders are compiled. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-12-01 11:38:26 +01:00
Samuel Pitoiset	cd64a4f705	radv: use vk_error() everywhere an error is returned For consistency and it might help for debugging purposes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-11-13 11:05:26 +01:00
Dave Airlie	3bf8be41b8	radv: pre-calculate user_data_0 registers and store in pipeline There's no point recalculating these the whole time on descriptor emission, just store them at pipeline creation. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-11-06 21:44:49 +00:00
Alex Smith	134a40d2a6	radv: Fix -Wformat-security issue Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103513 Fixes: `de88979413` ("radv: Implement VK_AMD_shader_info") Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-10-30 10:58:56 +01:00
Alex Smith	de88979413	radv: Implement VK_AMD_shader_info This allows an app to query shader statistics and get a disassembly of a shader. RenderDoc git has support for it, so this allows you to view shader disassembly from a capture. When this extension is enabled on a device (or when tracing), we now disable pipeline caching, since we don't get the shader debug info when we retrieve cached shaders. v2: Improvements to resource usage reporting v3: Disassembly string must be null terminated (string_buffer's length does not include the terminator) v4: Fixed LDS reporting. (Bas) Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-10-29 00:28:45 +02:00
Dave Airlie	a639d40f13	radv: add support for local bos. (v3) This uses the new kernel interfaces for reduced cs overhead, We only set the local flag for memory allocations that don't have a dedicated allocation and ones that aren't imports. v2: add to all the internal buffer creation paths. v3: missed some command submission paths, handle 0/empty bo lists. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-10-26 23:59:28 +01:00
Timothy Arceri	f0a2bbd1a4	radv: move nir print after linking is done We now have linking optimisations so we want to delay dumping the nir until after these are complete. Fixes: `06f05040eb` (radv: Link shaders) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-10-24 10:41:38 +11:00
Bas Nieuwenhuizen	c07d719e8b	radv: Disallow indirect outputs for GS on GFX9 as well. Since it also uses the output vector before writing to memory. Fixes: `e38685cc62` 'Revert "radv: disable support for VEGA for now."' Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2017-10-23 00:27:44 +02:00
Bas Nieuwenhuizen	6ce550453f	radv: Don't use vgpr indexing for outputs on GFX9. Due to LLVM bugs. Fixes a bunch of dEQP-VK.glsl.indexing.* tests. Fixes: `e38685cc62` 'Revert "radv: disable support for VEGA for now."' Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-10-22 02:36:37 +02:00
Jason Ekstrand	59fb59ad54	nir: Get rid of nir_shader::stage It's redundant with nir_shader::info::stage. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2017-10-20 12:49:17 -07:00
Bas Nieuwenhuizen	73749caf0e	radv: calculate and emit GFX9 GS registers to pipeline state. Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-10-20 06:23:47 +01:00
Timothy Arceri	087e010b2b	radv: copy indirect lowering settings from radeonsi It looks the original indirect mask was probably copied from ANV. Sascha Willems demo results: tessellation ~4000 -> ~4200 fps V2: continue lowering local indirects due to llvm deficiencies. Tested-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-10-20 08:01:26 +11:00
Bas Nieuwenhuizen	228325f4b7	radv: Modify rsrc1/rsrc2 generation for merged tess. No OC_LDS_EN for HS, and the included LS vgpr_comp_cnt is at a different offset. Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-10-19 22:25:44 +02:00
Bas Nieuwenhuizen	91b033f4f6	radv: Update GFX9 user data regs for GS/tess. Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-10-19 22:25:27 +02:00
Bas Nieuwenhuizen	ce03c119ce	radv: Add code to compile merged shaders. Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-10-19 22:25:23 +02:00
Bas Nieuwenhuizen	a996ed1f9b	ac/nir: Change interface to allow multiple source shaders. Reviewed-by: Dave Airlie <airlied@redhat.com>	2017-10-19 22:24:47 +02:00
Bas Nieuwenhuizen	06f05040eb	radv: Link shaders. Here we make use of NIR the linking helpers to remove unused varyings. Sascha Willems demo results: computecullandlod 39 -> 41 fps pipelines ~6100 -> ~6200 fps Signed-off-by: Bas Nieuwenhuizen <basni@google.com> Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Dave Airlie <airlied@redhat.com>	2017-10-18 09:19:35 +11:00
Timothy Arceri	7664aaf331	radv: remove duplicate debug_flags field Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-10-12 08:52:38 +11:00
Dave Airlie	2c61594d84	radv: lower ffma in nir. So it appears the Vulkan SPIR-V fma opcode can be equivalent to a mad operation, and the fma hw opcode on AMD hw is issued like a double opcode so is slower. Also the radeonsi stack does this. This appears to improve performance on a number of games from Feral, and thanks to Feral for noticing the problem. I'm reposting this one as Marek indicated he thinks this is what we should be doing on AMD hw. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Cc: "17.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-10-11 07:31:27 +10:00
Marek Olšák	7b697c8b78	amd: move r600d_common.h into r600g Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-10-09 16:27:06 +02:00
Samuel Pitoiset	844ae722c4	radv: dump SPIRV when a GPU hang is detected Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-10-04 19:37:08 +02:00
Samuel Pitoiset	a2a350a3be	radv: dump NIR when a GPU hang is detected This looks a bit ugly to me, but the existing codepath is not terribly elegant as well. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-10-04 19:37:08 +02:00
Samuel Pitoiset	80b8d9f7e7	radv: add radv_shader_dump_stats() helper To dump the shader stats when a hang is detected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-09-14 10:37:57 +02:00
Dave Airlie	64d9bd149a	radv/nir: call opt_remove_phis after trivial continues. With the shaders in the ssao demo, the nir_opt_if wasn't working properly without this, after this the if gets optimised so that loop unrolling gets called. (loop unrolling fails due to instruction count, but at least it gets to do that.) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Cc: "17.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-09-13 21:13:03 +01:00
Samuel Pitoiset	885d75760b	radv: keep track of the disasm string in debug mode only This will allow to dump the active shaders when a hang is detected. Only the ASM will be dumped for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-09-08 17:18:17 +02:00
Samuel Pitoiset	92db23f3f9	radv: add shader_variant_create() helper function Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-09-08 17:17:40 +02:00
Samuel Pitoiset	47efc5264a	radv: drop 'dump' parameters from some shader related functions The device object contains the debug flags. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-09-08 17:17:40 +02:00
Samuel Pitoiset	d4d777317b	radv: move shaders related code to radv_shader.c Reduce size of radv_pipeline.c and improve code isolation. More code can probably moved but it's a start. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2017-09-08 17:17:40 +02:00

652 Commits