KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Samuel Pitoiset	6cb455c418	radv: remove unused shader_info parameter in ac_compile_llvm_module() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-30 09:33:17 +02:00
Samuel Pitoiset	9aaca90123	radv: remove some unused fields from radv_shader_context Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-30 09:33:15 +02:00
Samuel Pitoiset	9f2fd23f99	ac: drop now useless lookup_interp_param from ABI Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-30 08:23:56 +02:00
Samuel Pitoiset	a63719db6a	ac: import linear/perspective PS input parameters from radv/radeonsi Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-30 08:23:54 +02:00
Samuel Pitoiset	49f5ddd3ae	radv: make use of has_ls_vgpr_init_bug Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-27 08:04:51 +02:00
Connor Abbott	b7acf38073	ac/nir: Remove gfx9_stride_size_workaround_for_atomic The workaround was entirely in common code, and it's needed in radeonsi too so just always do it when necessary. Fixes KHR-GL45.shader_image_load_store.advanced-allStages-oneImage on gfx9 with LLVM 8. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-26 11:00:49 +02:00
Samuel Pitoiset	1fd60db4a1	ac,radv,radeonsi: remove LLVM 7 support Now that LLVM 9 will be released soon, we will only support LLVM 8, 9 and master (10). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-23 08:12:34 +02:00
Marek Olšák	223b3174bd	radeonsi/nir: always lower ballot masks as 64-bit, codegen handles it This fixes KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks. This solution is better, because the IR isn't dependent on wave32.	2019-08-19 17:23:38 -04:00
Bas Nieuwenhuizen	739a2880f5	radv: Get max workgroup size without nir. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	035406ecf7	radv: Put wave size in shader options/info. Instead of having the three values everywhere. This is also more future proof if we want the driver to make those decisions eventually. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen	72e7b7a00b	ac/nir,radv: Optimize bounds check for 64 bit CAS. When the application does not ask for robust buffer access. Only implemented the check in radv. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 21:21:55 +02:00
Samuel Pitoiset	8a86908e9a	radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders It can be enabled with RADV_PERFTEST=gewave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:36 +02:00
Samuel Pitoiset	953bbacc23	radv/gfx10: add Wave32 support for fragment shaders It can be enabled with RADV_PERFTEST=pswave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:34 +02:00
Samuel Pitoiset	ea38565011	radv/gfx10: add Wave32 support for compute shaders It can be enabled with RADV_PERFTEST=cswave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 09:35:04 +02:00
Daniel Schürmann	45638e14fb	radv: Don't include radv_private.h from radv_shader.h This patch decouples radv_shader.h from any LLVM dependency. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-30 10:29:11 +02:00
Samuel Pitoiset	383c2e625a	radv/gfx10: declare streamout user SGPRs Required for legacy streamout. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:30 +02:00
Samuel Pitoiset	ea337c8b7e	radv/gfx10: fix VS input VGPRs with the legacy path For some reasons, InstanceID is VGPR3 although StepRate0 is set to 1. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:21 +02:00
Samuel Pitoiset	915abbe932	radv/gfx10: update descriptors for inline uniform blocks Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:42 +02:00
Samuel Pitoiset	d76746c1ff	radv/gfx10: emit the GS NGG prologue before the nested barrier Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:39 +02:00
Marek Olšák	81091a5183	ac: create the LLVM builder in ac_llvm_context_init Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	eb54b8c222	ac: create the LLVM module for Wave32 or Wave64 in ac_llvm_context_init Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	9e467d111b	ac: initial Wave32 support in LLVM build helpers Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Dave Airlie	2ac2b98780	radv: fix crash in shader tracing. Enabling tracing, and then having a vmfault, can leads to a segfault before we print out the traces, as if a meta shader is executing and we don't have the NIR for it. Just pass the stage and give back a default. Fixes: `9b9ccee4d6` ("radv: take LDS into account for compute shader occupancy stats") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 11:00:25 +10:00
Samuel Pitoiset	8315dbe419	radv/gfx10: do not always execute a barrier before the second shader With NGG, empty waves may still be required to export data. This fixes dEQP-VK.ycbcr.format._unorm.geometry_. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-18 10:06:34 +02:00
Samuel Pitoiset	07ff367442	radv/gfx10: implement VK_EXT_post_depth_coverage I did implement this extension a while ago but it didn't work on pre GFX10 for some reasons. Now all CTS pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 08:32:39 +02:00
Samuel Pitoiset	17464d205c	radv: pass output values to radv_emit_stream_output() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:58 +02:00
Samuel Pitoiset	219dc1b25c	radv: restore an assertion in handle_vs_outputs() The NGG GS epilogue no longers call that function so the assertion is just useless now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:53 +02:00
Samuel Pitoiset	68603b767f	radv/gfx10: emit ES outputs of TES when it's not NGG Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:51 +02:00
Samuel Pitoiset	1b2bfeaaaa	radv: fix gathering clip/cull distance masks for GS For NGG, the driver relies on the VS outinfo struct. This fixes dEQP-VK.clipping.user_defined.clip__vert_tess_geom_ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 10:09:37 +02:00
Samuel Pitoiset	994253b400	radv/gfx10: add missing conversions for 16-bit exports This fixes dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_* Found with RADV_DEBUG=checkir Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-16 08:12:34 +02:00
Samuel Pitoiset	d8844533af	radv: remove unused code in radv_export_param() It was hack for geometry shaders. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-16 08:12:20 +02:00
Samuel Pitoiset	b650f3d197	radv/gfx10: export the PrimitiveID for ES stages (VS or TES) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 11:30:10 +02:00
Samuel Pitoiset	8175f6269b	radv/gfx10: declare an external symbol for the ESGS ring It will be used for stream output but for now only declares it if VS and if the PrimitiveID needs to be exported. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 11:30:08 +02:00
Samuel Pitoiset	4478f14327	radv/gfx10: fix crash when emitting NGG GS prologue ac_nir_context is initialized after the driver emits the NGG GS prologue so it's likely to crash. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 08:51:53 +02:00
Samuel Pitoiset	958ee4c21a	radv: report shader stage name when dumping LLVM IR For debugging purposes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	c6fa4de15d	radv/gfx10: do not set alignment on the ngg_emit pointer This is invalid and this fixes a crash in LLVM. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	df0a23ad1e	radv/gfx10: fix exporting clip/cull distances for GS This fixes dEQP-VK.clipping.user_defined.clip_distance.geom. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	edcd2bc833	radv/gfx10: fix exporting the subpass view index for GS This fixes dEQP-VK.multiview.geometry. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:20 +02:00
Bas Nieuwenhuizen	0a8ef756d3	radv/gfx10: Fix NGG GS output mask handlings for LDS indexing. In emit_vertex we optimize storage if the output mask does not have all bits set. Do the same in the epilogue so the indices actually match up. Fixes dEQP-VK.geometry.input.basic_primitive.points because it outputs PSIZE with an output mask of 1, which cause the generic attribute for the color to be loaded from the wrong indices. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:59 +02:00
Bas Nieuwenhuizen	f5982917ff	radv/gfx10: Simplify output mask handling for NGG GS. We only ever get in this function for a NGG GS proper. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:58 +02:00
Bas Nieuwenhuizen	7515f41c78	radv/gfx10: Do GS prologue outside of gs_threads if. Mirror radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:56 +02:00
Samuel Pitoiset	5bbcb3f5bc	radv/gfx10: implement support for GS as NGG Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 15:45:53 +02:00
Samuel Pitoiset	51e2124a4b	radv: switch to the new VS exports path It will help for GS as NGG on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:37:02 +02:00
Samuel Pitoiset	f616d80a7a	radv: set the slot_index correctly for VARYING_SLOT_CLIP_DIST1 For selecting a different SQ_EXP_POS target. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:59 +02:00
Samuel Pitoiset	c4ab33378a	radv: add a new function for exporting VS outputs Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:57 +02:00
Samuel Pitoiset	ac0edc369c	radv: implement new path for exporting generic varyings Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:55 +02:00
Samuel Pitoiset	0b368fc8c3	radv: use the generic export path for clip/cull distances When they are exported to the next stage. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:52 +02:00
Samuel Pitoiset	f653e5c1d6	radv: remove an extra memcpy when exporting clip/cull distances Cleanup. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:50 +02:00
Samuel Pitoiset	3303bc8b74	radv: remove extra code for exporting LayerID to the next stage Now that the output usage mask is set to 0x1 the LayerID is correctly exported in the loop above. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 15:17:08 +02:00
Bas Nieuwenhuizen	14291342ec	radv: Add a common member in the union to make things more clear. This clarifies that the struct can be used when the shader can be one of VS/TES. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-09 09:59:07 +00:00
Bas Nieuwenhuizen	f9070743a9	Revert "radv: keep track of whether NGG is used for GS on GFX10" This reverts commit `63e0675d98`. The GS is merged with the preceding shader and since the preceding shader will have as_ngg set the final binary will have is_ngg set. So we do not need the gs key here. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-09 09:59:07 +00:00
Samuel Pitoiset	63e0675d98	radv: keep track of whether NGG is used for GS on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:19 +02:00
Samuel Pitoiset	2974df819e	radv: set max workgroup size to 128 for TES as NGG on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:12 +02:00
Samuel Pitoiset	53c75f17ec	radv: fix allocating USER SGPRs on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:11 +02:00
Bas Nieuwenhuizen	9a8e4a07ad	radv/gfx10: Add tess eval ngg shader support. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:20 +10:00
Connor Abbott	118a66df99	radv: Use NIR barycentric intrinsics We have to add a few lowering to deal with things that used to be dealt with inline when creating inputs. We also move the code that fills out the radv_shader_variant_info struct for linking purposes to radv_shader.c, as it's no longer tied to the NIR->LLVM lowering. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:18:25 +02:00
Connor Abbott	27f0c3c15e	radv: Make FragCoord a sysval load_fragcoord is already handled in common code for radeonsi, so we don't need to do anything to handle it. However, there were some passes creating NIR with the varying, so we switch them over to the sysval. In the case of nir_lower_input_attachments which is used by both radv and anv, we add handling for both until intel switches to using a sysval. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Daniel Schürmann	e41e932e57	radv: Lower input attachments in NIR. v2 (Connor) - Fix warning in release mode using MAYBE_UNUSED Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Daniel Schürmann	c65e880a65	radv: Implement nir_intrinsic_load_layer_id(). Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:14:53 +02:00
Bas Nieuwenhuizen	4d118ad44a	radv/gfx10: Move NGG output handling outside of giant if-statement. In merged shaders we put a big if around each shader, so both stages can have a different number of threads. However, the NGG output code still needs to run if the first shader is not executed. This can happen when there are more gs threads than vs/es threads, or when there are 0 es/vs threads (why? no clue). Fixes: `ee21bd7440` "radv/gfx10: implement NGG support (VS only)" Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-08 01:49:54 +02:00
Samuel Pitoiset	ee21bd7440	radv/gfx10: implement NGG support (VS only) This needs to be cleaned up a bit, and it probably contains missing stuff and/or bugs. This doesn't fix the "half of the triangles" issue. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen	9e37609d0b	radv: Combine vs and tes output keys parts. That way the same deref is valid for both shader stages. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	ce3b5d4c17	radv/gfx10: do not declare streamout SGPRS Streamout is completely different on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	4c31f3dcc0	radv/gfx10: fix PS exports for SPI_SHADER_32_AR Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:39 +02:00
Samuel Pitoiset	34b185cc43	radv/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Marek Olšák	8a71f60194	ac: replace glc,slc with cache_policy for loads cosmetic change Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:38:56 -04:00
Marek Olšák	a29e781961	ac: replace glc,slc with cache_policy for stores cosmetic change Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:38:54 -04:00
Bas Nieuwenhuizen	6a220e67ce	radv: Switch to using rtld. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	5ff651c0a7	radv: Move more stuff to variant create time. Due to them depending on the linker result. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	726a31df70	radv: Add the concept of radv shader binaries. This simplifies a bunch of stuff by (1) Keeping all the things in a single allocation, making things easier for the cache. (2) creating a shader_variant creation helper. This is immediately put to use by creating rtld shader binaries. This is the main reason for the binaries, as we need to do the linking at upload time, i.e. post caching. We do not enable rtld yet. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	43f2f01cc8	radv: Add export_prim_id to the shader variant info. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	15046ef7c8	radv: use last nir shader to determine stage in postprocessing Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Samuel Pitoiset	d8b079e4c7	radv: rework how the number of VGPRs is computed Just a cleanup, it shouldn't change anything. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:27 +02:00
Samuel Pitoiset	d5004f60be	radv: only export clip/cull distances if PS reads them The only exception is the GS copy shader which emits them unconditionally. Totals from affected shaders: SGPRS: 71320 -> 71008 (-0.44 %) VGPRS: 54372 -> 54240 (-0.24 %) Code Size: 2952628 -> 2941368 (-0.38 %) bytes Max Waves: 9689 -> 9723 (0.35 %) This helps Dota2, Doom, GTAV and Hitman 2. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-27 08:56:37 +02:00
Connor Abbott	3bf8981c51	ac,radeonsi: Always mark buffer stores as inaccessiblememonly inaccessiblememonly means that it doesn't modify memory accesible via normal LLVM pointers. This lets LLVM's dead store elimination, memcpy forwarding, etc. ignore functions with this attribute. We don't represent descriptors as pointers, so this property is always true of buffer and image stores. There are plans to represent descriptors via pointers, but this just means that now nothing is inaccessiblememonly, as LLVM will then understand loads/stores via its usual alias analysis. Radeonsi was mistakenly only setting it if the driver could prove that there were no reads, and then it was cargo-culted into ac_llvm_build and ac_llvm_to_nir. Rip it out of everything. statistics with nir enabled: Totals from affected shaders: SGPRS: 152 -> 152 (0.00 %) VGPRS: 128 -> 132 (3.12 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 9324 -> 9244 (-0.86 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 17 -> 17 (0.00 %) Wait states: 0 -> 0 (0.00 %) The only difference was a manhattan31 shader. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-19 14:08:27 +02:00
Samuel Pitoiset	33f4e04d5a	ac,radv: do not emit vec3 for raw load/store on SI It's unsupported, only load/store format with vec3 are supported. Fixes: `6970a9a6ca` ("ac,radv: remove the vec3 restriction with LLVM 9+")" Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-04 08:47:26 +02:00
Nicolai Hähnle	f480b8aaa4	amd/common: use generated register header	2019-06-03 20:05:20 -04:00
Marek Olšák	486bc1e17e	ac: use amdgpu-flat-work-group-size Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-03 14:32:47 -04:00
Samuel Pitoiset	6970a9a6ca	ac,radv: remove the vec3 restriction with LLVM 9+ This changes requires LLVM r356755. 32706 shaders in 16744 tests Totals: SGPRS: 1448848 -> 1455984 (0.49 %) VGPRS: 1016684 -> 1016220 (-0.05 %) Spilled SGPRs: 25871 -> 25815 (-0.22 %) Spilled VGPRs: 122 -> 122 (0.00 %) Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread Code Size: 55324500 -> 55301152 (-0.04 %) bytes Max Waves: 235660 -> 235586 (-0.03 %) Totals from affected shaders: SGPRS: 293704 -> 300840 (2.43 %) VGPRS: 246716 -> 246252 (-0.19 %) Spilled SGPRs: 159 -> 103 (-35.22 %) Scratch size: 188 -> 180 (-4.26 %) dwords per thread Code Size: 8653664 -> 8630316 (-0.27 %) bytes Max Waves: 60811 -> 60737 (-0.12 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 11:30:08 +02:00
Marek Olšák	ccfcb9d818	ac: rename SI-CIK-VI to GFX6-GFX7-GFX8 Acked-by: Dave Airlie <airlied@redhat.com> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.	2019-05-15 20:54:10 -04:00
Marek Olšák	6b0b8f132a	ac: use 1D GEPs for descriptors and constants just a cleanup Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-14 15:15:11 -04:00
Bas Nieuwenhuizen	f53ebfb450	radv: Do not use extra descriptor space for the 3rd plane. While ImageFormatProperties returns the number of internal descriptors, it turns out that applications do not need to actually allocate more descriptors in the descriptor pool. So if we make descriptors with more planes larger we have to be convervative and always allocate space for the larger descriptors which is a waste given the low usage of this ext. So let us make use of the fact that 3plane formats all have the same formats & dimensions for the last two planes. This way we only need the first half of the descriptor of the 3rd plane and can share the second half of the second plane. This allows us to use 16 bytes for the descriptor which nicely fits into the 16 bytes that are unused right next to the sampler. Fixes: `5564c38212` "radv: Update descriptor sets for multiple planes." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-12 23:02:44 +00:00
Samuel Pitoiset	4f18c43d1d	radv: apply the indexing workaround for atomic buffer operations on GFX9 Because the new raw/struct intrinsics are buggy with LLVM 8 (they weren't marked as source of divergence), we fallback to the old instrinsics for atomic buffer operations only. This means we need to apply the indexing workaround for GFX9. The load/store operations still use the new LLVM 8 intrinsics. The fact that we need another workaround is painful but we should be able to clean up that a bit once LLVM 7 support will be dropped. This fixes a GPU hang with AC Odyssey and some rendering problems with Nioh. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110573 Fixes: `31164cf5f7` ("ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-03 17:59:12 +02:00
Samuel Pitoiset	62001f3dff	radv: only need to force emit the TCS regs on Vega10 and Raven1 Other GFX9 chips aren't affected. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 22:29:01 +02:00
Samuel Pitoiset	6162543999	radv: do not need to force emit the TCS regs on Vega20 This chip doesn't need the fixup. This fixes a bunch of dEQP-VK.tessellation tests and avoid random GPU hangs. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 09:24:05 +02:00
Bas Nieuwenhuizen	5564c38212	radv: Update descriptor sets for multiple planes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	8d2654a419	radv: Support VK_EXT_inline_uniform_block. Basically just reserve the memory in the descriptor sets. On the shader side we construct a buffer descriptor, since AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken. This fully supports update after bind and variable descriptor set sizes. However, the limits are somewhat arbitrary and are mostly about finding a reasonable division of a 2 GiB max memory size over the set. v2: - rebased on top of master (Samuel) - remove the loading resources rework (Samuel) - only load UBO descriptors if it's a pointer (Samuel) - use LLVMBuildPtrToInt to avoid IR failures (Samuel) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)	2019-04-19 09:21:47 +02:00
Samuel Pitoiset	d5befdbe4a	radv: always load 3 channels for formats that need to be shuffled This fixes a rendering issue with Hellblade and DXVK. Fixes: `a66b186beb` ("radv: use typed buffer loads for vertex input fetches") Reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-15 11:35:52 +01:00
Samuel Pitoiset	045fae0f73	ac: add ac_build_{struct,raw}_tbuffer_load() helpers The struct version sets IDXEN=1, while the raw version sets IDXEN=0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-13 14:15:05 +01:00
Samuel Pitoiset	a66b186beb	radv: use typed buffer loads for vertex input fetches This drastically reduces the number of SGPRs because the driver now uses descriptors per vertex binding, instead of per vertex attribute format. 29077 shaders in 15096 tests Totals: SGPRS: 1354285 -> 1282109 (-5.33 %) VGPRS: 909896 -> 908800 (-0.12 %) Spilled SGPRs: 24840 -> 24811 (-0.12 %) Code Size: 49221144 -> 48986628 (-0.48 %) bytes Max Waves: 243930 -> 244229 (0.12 %) Totals from affected shaders: SGPRS: 390648 -> 318472 (-18.48 %) VGPRS: 288432 -> 287336 (-0.38 %) Spilled SGPRs: 94 -> 65 (-30.85 %) Code Size: 11548412 -> 11313896 (-2.03 %) bytes Max Waves: 86460 -> 86759 (0.35 %) This gives a really tiny boost. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-13 13:31:11 +01:00
Timothy Arceri	54522d0506	nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc() Replace done using: find ./src -type f -exec sed -i -- \ 's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Bas Nieuwenhuizen	c0110477b5	radv: Interpolate less aggressively. Seems like dxvk used integer builtins without setting the flat interpolation decoration. I believe in the current spec the app is required to set these, but in the meantime to avoid breaking things in stable releases (and so close to release for 19.0), only expand the interpolation to float16 and struct (which cannot be builtins as our spirv parser lowers the builtin block). Fixes: `f324784104` "radv: Allow interpolation on non-float types." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-26 18:51:35 +00:00
Bas Nieuwenhuizen	f324784104	radv: Allow interpolation on non-float types. In particular structs containing floats and 16-bit floating point types. Fixes: `62024fa775` "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Fixes: `da29594636` "spirv: Only split blocks" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-22 17:06:55 +01:00
Bas Nieuwenhuizen	a1fdd4a4a7	radv: Fix float16 interpolation set up. float16 types can have non-flat interpolation so set up the HW correctly for that. Fixes: `62024fa775` "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-22 17:06:55 +01:00
Bas Nieuwenhuizen	1ef2855692	radv: Handle clip+cull distances more generally as compact arrays. Needed for https://gitlab.freedesktop.org/mesa/mesa/merge_requests/248 . That MR keeps the clip and cull arrays split. So we have to handle - compact arrays with location_frac != 0 - VARYING_SLOT_CLIP_DIST1 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-20 22:49:52 +00:00
Bas Nieuwenhuizen	572854e706	radv: Clean up a bunch of compiler warnings. Random unused vars. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-20 03:21:09 +01:00
Rhys Perry	0ca550e01a	radv: ensure export arguments are always float So that the signature is correct and consistent, the inputs to a export intrinsic should always be 32-bit floats. This and the previous commit fixes a large amount crashes from dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_* tests Fixes: `b722b29f10` ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:03:22 +00:00
Rhys Perry	64065aa504	radv: bitcast 16-bit outputs to integers 16-bit outputs are stored as 16-bit floats in the outputs array, so they have to be bitcast. Fixes: `b722b29f10` ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:03:18 +00:00
Samuel Pitoiset	52bdb043af	radv: fix invalid element type when filling vertex input default values The elements added into a vector should have the same type as the first one, otherwise this hits an assertion in LLVM. Fixes: `4b3549c084` ("radv: reduce the number of loaded channels for vertex input fetches") reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-16 15:33:18 +01:00
Bas Nieuwenhuizen	4b03a19a0b	radv: Use correct num formats to detect whether we should be use 1.0 or 1. normalized and scaled formats also return floats. Fixes: `4b3549c084` ("radv: reduce the number of loaded channels for vertex input fetches") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-15 20:24:16 +00:00
Samuel Pitoiset	227df98fa6	radv: fix radv_fixup_vertex_input_fetches() We should check that num_channels is 4, otherwise that breaks the world. Sorry for the short breakage. Fixes: `4b3549c084` ("radv: reduce the number of loaded channels for vertex input fetches") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-14 09:44:35 +01:00
Samuel Pitoiset	4b3549c084	radv: reduce the number of loaded channels for vertex input fetches It's unnecessary to load more channels than the vertex attribute format. The remaining channels are filled with 0 for y and z, and 1 for w. 29077 shaders in 15096 tests Totals: SGPRS: 1321605 -> 1318869 (-0.21 %) VGPRS: 935236 -> 932252 (-0.32 %) Spilled SGPRs: 24860 -> 24776 (-0.34 %) Code Size: 49832348 -> 49819464 (-0.03 %) bytes Max Waves: 242101 -> 242611 (0.21 %) Totals from affected shaders: SGPRS: 93675 -> 90939 (-2.92 %) VGPRS: 58016 -> 55032 (-5.14 %) Spilled SGPRs: 172 -> 88 (-48.84 %) Code Size: 2862740 -> 2849856 (-0.45 %) bytes Max Waves: 15474 -> 15984 (3.30 %) This mostly helps Croteam games (Talos/Sam2017). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-14 09:10:56 +01:00
Samuel Pitoiset	bd1186572f	radv: add support for push constants inlining when possible This removes some scalar loads from shaders, but it increases the number of SET_SH_REG packets. This is currently basic but it could be improved if needed. Inlining dynamic offsets might also help. Original idea from Dave Airlie. 29077 shaders in 15096 tests Totals: SGPRS: 1321325 -> 1357101 (2.71 %) VGPRS: 936000 -> 932576 (-0.37 %) Spilled SGPRs: 24804 -> 24791 (-0.05 %) Code Size: 49827960 -> 49642232 (-0.37 %) bytes Max Waves: 242007 -> 242700 (0.29 %) Totals from affected shaders: SGPRS: 290989 -> 326765 (12.29 %) VGPRS: 244680 -> 241256 (-1.40 %) Spilled SGPRs: 1442 -> 1429 (-0.90 %) Code Size: 8126688 -> 7940960 (-2.29 %) bytes Max Waves: 80952 -> 81645 (0.86 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:25:54 +01:00
Samuel Pitoiset	8364ffe823	radv: keep track of the number of remaining user SGPRs Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:25:52 +01:00
Samuel Pitoiset	5806d99984	radv: gather more info about push constants This is needed in order to inline some push constants when possible. This also adds a new helper for initializing the pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:25:34 +01:00
Samuel Pitoiset	5e7f800f32	radv: fix build Fixes: `9b9ccee4d6` ("radv: take LDS into account for compute shader occupancy stats") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-01 15:31:55 +01:00
Timothy Arceri	9b9ccee4d6	radv: take LDS into account for compute shader occupancy stats Ported from `d205faeb6c`. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-01 22:25:30 +11:00
Samuel Pitoiset	afeef3cacf	radv: set noalias/dereferenceable LLVM attributes based on param types Instead of using this useless array_params_mask variable. This should set these two attributes to streamout buffers too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 16:30:38 +01:00
Samuel Pitoiset	320b058d32	radv: simplify allocating user SGPRS for descriptor sets Unnecesary to check the current stages if desc_set_used_mask is used. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 16:30:36 +01:00
Samuel Pitoiset	d1994ed229	radv: remove radv_userdata_info::indirect field Always false. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 16:30:33 +01:00
Timothy Arceri	0907ae35ad	radv/ac: fix some fp16 handling Fixes: `b722b29f10` ("radv: add support for 16bit input/output") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 10:41:48 +11:00
Samuel Pitoiset	378e2d2414	radv: fix computing number of user SGPRs for streamout buffers Streamout buffers are emitted like push constants. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-25 15:36:16 +01:00
Samuel Pitoiset	83cc87ead4	radv: drop unused code related to 16 sample locations The driver only supports up to 8 sample locations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-18 13:26:24 +01:00
Bas Nieuwenhuizen	76b12fa564	radv: Only use 32 KiB per threadgroup on Stoney. Causes hangs on some machines. What works for dEQP-VK.tessellation.shader_input_output.barrier: - running num_patches = 6 (which limits LDS to 32 KiB) - running num_patches = 8, and artificially cutting LDS size at 32 KiB. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-14 19:58:27 +00:00
Rhys Perry	ee8488ea3b	ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is enabled with DXVK. It also fixes artifacts with Fallout 4's god rays with DXVK. Various piglit interpolateAt*() tests under NIR are also fixed. v2: formatting fix update commit message to include Fallout 4 and the Fixes tag Fixes: `f4e499ec79` ('radv: add initial non-conformant radv vulkan driver') Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106595 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>	2019-01-09 14:57:07 +00:00
Samuel Pitoiset	3fbdcd942f	amd: remove support for LLVM 6.0 User are encouraged to switch to LLVM 7.0 released in September 2018. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-06 14:02:56 +01:00
Bas Nieuwenhuizen	dd0172e865	radv: Use structured intrinsics instead of indexing workaround for GFX9. These force the index to be used in the instruction so we don't need the workaround. Totals: SGPRS: 1321642 -> 1321802 (0.01 %) VGPRS: 943664 -> 943788 (0.01 %) Spilled SGPRs: 28468 -> 28480 (0.04 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 52415292 -> 52338932 (-0.15 %) bytes LDS: 400 -> 400 (0.00 %) blocks Max Waves: 233903 -> 233803 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 238344 -> 238504 (0.07 %) VGPRS: 232732 -> 232856 (0.05 %) Spilled SGPRs: 13125 -> 13137 (0.09 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 15752712 -> 15676352 (-0.48 %) bytes LDS: 139 -> 139 (0.00 %) blocks Max Waves: 31680 -> 31580 (-0.32 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-19 23:36:00 +01:00
Dave Airlie	fcf15a007d	radv/xfb: don't increase offset by component mask start. This is incorrect, the offset is into the buffer, and it's legal to write loc 0,0 -> buffer0, offset 0 loc 0,1 -> buffer1, offset 0 This fixes a bunch of piglits running on my zink xfb code on radv. Fixes: `6c21645046` (radv: emit stream outputs for vertex and tessellation stages) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-31 23:48:10 +00:00
Samuel Pitoiset	f8d0337299	radv: add multiple streams support for the GS copy shader Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	6c21645046	radv: emit stream outputs for vertex and tessellation stages Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	19f1b49236	radv: declare streamout SGPRs Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	fe551ec122	radv: allow to emit a vertex to a specified stream This is required for GS multiple streams support. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	a59f1b06ef	radv: allow to use up to 4 GSVS ring buffers For all streams. We basically just need to update the base address and compute a stride for every stream. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Dave Airlie	600d8ecb57	radv: remove unsigned comparison against 0 The value is always >= 0 here. Found by coverity Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-11 10:19:20 +10:00
Marek Olšák	a668c8d6ba	ac: define all address spaces properly	2018-10-06 21:50:09 -04:00
Bas Nieuwenhuizen	0dd8189f15	radv: Only allow 16 user SGPRs for compute on GFX9+. Apparently for compute there are only 16 instead of the 32 for the graphics path. Fixes dEQP-VK.binding_model.descriptorset_random.sets16.noarray.ubolimitlow.sbolimitlow.imglimitlow.noiub.comp.0 CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-16 12:50:58 +02:00
Samuel Pitoiset	9de062ef20	radv: fix setting global locations for indirect descriptors Indirect descriptors only need one entry, we don't have to emit a location for every descriptors. Fixes GPU hangs with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	aa30205929	radv: handle loc->indirect correctly for the first descriptor This was wrong for descriptor #0 when all of them are indirect. This is because indirect_offset was 0 and we emitted a "normal" descriptor pointer for nothing. While we are at it remove radv_userdata_info::indirect_offset which is useless. CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	b9f6521157	radv: bump the maximum number of arguments to 64 Bumping to 64 should be safe enough. Fixes some crashes with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	c28ea92947	radv: tidy up ac_setup_rings() for the GSVS rings Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	40fb8c7fca	radv: fix setting the number of entries for GSVS on VI+ According to RadeonSI, it's unnecessary to multiply by the stride. That field seems to always be 64. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	a006c24237	radv: always compute the number of components from the output mask That removes two special cases for clip/cull distances. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	9447e91329	radv: emit data contiguously in the GS->VS ring buffer Instead of having holes. The other ring parameters like offset and stride can be updated later. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	fbc064a5b4	radv: make use of the output usage mask in GS copy shader This is just for consistency because LLVM can detect and remove unused loads. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	18464d298b	radv: make use of ac_unpack_param() instead of ac_build_bfe() Same code is generated because LLVM ends up by using bfe, but that seems cleaner to me. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	7355e9326b	radv: remove dead code in scan_shader_output_decl() Never used. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	e9acf069b2	radv: remove radv_shader_context::num_output_{clips,culls} Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	a6a6441c75	radv: adjust the cull dist mask in scan_shader_output_decl() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	ea778e760c	radv: get length of the clip/cull distances array from usage mask Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	732679c25e	radv: do not recompute the output usage mask for clipdist twice The shader info pass takes care of this now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	6f47df3129	radv: fix passing clip/cull distances from VS to PS CTS doesn't test input clip/cull distances for the fragment shader stage, which explains why this was totally broken. I wrote a simple test locally that works now. This fixes a crash with GTA V and DXVK. Note that we are exporting unused parameters from the vertex shader now, but this can't be optimized easily because we don't keep the fragment shader info... Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107477 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-31 17:34:36 +02:00
Samuel Pitoiset	0608349232	radv: use ac_build_imad() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-22 09:17:40 +02:00
Bas Nieuwenhuizen	011a811652	radv: Revert divisor = 0 case for vertex attribute extension. Seems like DXVK depends on that and it might get reverted upstream. Since apps are not supposed to use 0 in v2 anyway, we should be safe implementing the old behavior there. Fixes: `66e12451ac` "radv: Update to new VK_EXT_vertex_attribute_divisor to version 2." CC: 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-16 11:13:19 +02:00
Bas Nieuwenhuizen	66e12451ac	radv: Update to new VK_EXT_vertex_attribute_divisor to version 2. Behavior wrt firstInstance got changed, and a divisor of 0 has been disallowed. The new version of the ext got published in specification 1.1.81. Sending to stable since the only known user is DXVK, which needs this for correctness. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> CC: 18.2 <mesa-stable@lists.freedesktop.org>	2018-08-14 22:13:09 +02:00
Samuel Pitoiset	ff0d553818	radv: fix adjusting vertex fetches since 16bit support Move the integer conversion after the fixup. This fixes some regressions with dEQP-VK.pipeline.vertex_input.single_attribute.mat4.as_a2r10g10b10* Fixes: `b722b29f10` ("radv: add support for 16bit input/output") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-26 08:57:43 +02:00
Daniel Schürmann	b722b29f10	radv: add support for 16bit input/output Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-23 23:16:25 +02:00
Dave Airlie	6f3aee40f9	radv: using tls to store llvm related info and speed up compiles (v10) This uses the common compiler passes abstraction to help radv avoid fixed cost compiler overheads. This uses a linked list per thread stored in thread local storage, with an entry in the list for each target machine. This should remove all the fixed overheads setup costs of creating the pass manager each time. This takes a demo app time to compile the radv meta shaders on nocache and exit from 1.7s to 1s. It also has been reported to take the startup time of uncached shaders on RoTR from 12m24s to 11m35s (Alex) v2: fix llvm6 build, inline emit function, handle multiple targets in one thread v3: rebase and port onto new structure v4: rename some vars (Bas) v5: drag all code into radv for now, we can refactor it out later for radeonsi if we make it shareable v6: use a bit more C++ in the wrapper v7: logic bugs fixed so it actually runs again. v8: rebase on top of radeonsi changes. v9: drop some C++ headers, cleanup list entry v10: use pop_back (didn't have enough caffeine) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-10 07:58:03 +10:00
Samuel Pitoiset	5e5a28d52a	radv: reduce CPU overhead in radv_flush_descriptors() The number of enabled descriptors for a given pipeline stage can be computed at compile time. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-09 13:56:58 +02:00
Marek Olšák	4695984dbc	ac: fold LLVMContext creation into ac_llvm_context_init Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-07-04 15:48:18 -04:00
Dave Airlie	7398913a62	ac/radv: move llvm compiler info to struct and init in one place This ports radv to the shared code, however due to a bug in LLVM version prior to 7, radv cannot add target info at this stage, as it would leak one for every shader compile, however I'd prefer to keep this llvm damage in the shared code, since it isn't the driver at fault here. We just add a flag to denote if the driver can support leaking the target info or not, and the common code does the right thing depending on the llvm version. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 10:29:16 +10:00
Dave Airlie	e1387eaf12	radv: create/destroy passmgr at the higher level. This is prep work for moving this to a per-thread struct Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-07-04 05:31:05 +10:00
Dave Airlie	97d9b88447	radv: port to use common passmgr code. This adds a inline always pass, but otherwise should work the same. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-07-04 05:30:34 +10:00
Marek Olšák	32e413ca59	ac: move all LLVM module initialization into ac_create_module This removes some ugly code around module initialization. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-07-02 14:34:39 -04:00
Samuel Pitoiset	70c1bee187	radv: do not use an user SGPR for the sample position offset We know the number of samples at compile time. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-20 13:21:42 +02:00
Samuel Pitoiset	20170865db	radv: don't store the number of samples as log2 Needed for the following patch. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-20 13:21:42 +02:00
Samuel Pitoiset	5917761e3d	radv: fix emitting the TCS regs on GFX9 The primitive ID is NULL and this generates an invalid select instruction which crashes because one operand is NULL. This fixes crashes in The Long Journey Home, Quantum Break and Just Cause 3 with DXVK. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106756 CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-16 10:18:51 +02:00
Samuel Pitoiset	bfca15e16a	radv: add RADV_DEBUG=checkir This allows to run the LLVM verifier pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-06-15 15:54:08 +02:00
Samuel Pitoiset	45eb24fedf	radv: run the EarlyCSEMemSSA LLVM pass It's recommended by the instruction combining pass, and RadeonSI also runs it. This pass used to segfault with one shader of F12017 in the past, but it no longer crashes. Maybe the LLVM IR generated by RADV has changed. Polaris10: Totals from affected shaders: SGPRS: 441352 -> 441648 (0.07 %) VGPRS: 310888 -> 300784 (-3.25 %) Spilled SGPRs: 13576 -> 12983 (-4.37 %) Code Size: 22560328 -> 22420544 (-0.62 %) bytes Max Waves: 40755 -> 41366 (1.50 %) Vega10: Totals from affected shaders: SGPRS: 442848 -> 442000 (-0.19 %) VGPRS: 310396 -> 300460 (-3.20 %) Spilled SGPRs: 13708 -> 12906 (-5.85 %) Code Size: 22479428 -> 22336216 (-0.64 %) bytes Max Waves: 45783 -> 46506 (1.58 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-25 14:24:14 +02:00
Samuel Pitoiset	75e919c045	radv: fix computation of user sgprs for 32-bit pointers With 32-bit pointers we only need one user SGPR per desc set. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-22 15:53:29 +02:00
Samuel Pitoiset	c5536fc813	radv: drop user_sgpr_info::sgpr_count It's only used inside allocate_user_sgprs(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-22 15:53:26 +02:00
Samuel Pitoiset	36a4d6d081	radv: add support for 32-bit pointers in user data SGPRs We still use 64-bit GPU pointers for all ring buffers because llvm.amdgcn.implicit.buffer.ptr doesn't seem to support 32-bit GPU pointers for now. This can be improved later anyways. Vega10: Totals from affected shaders: SGPRS: 1008722 -> 1026710 (1.78 %) VGPRS: 706580 -> 707136 (0.08 %) Spilled SGPRs: 22555 -> 22209 (-1.53 %) Spilled VGPRs: 75 -> 75 (0.00 %) Code Size: 34819208 -> 35202140 (1.10 %) bytes Max Waves: 175423 -> 175086 (-0.19 %) Polaris10: Totals from affected shaders: SGPRS: 1029849 -> 1036517 (0.65 %) VGPRS: 709984 -> 708872 (-0.16 %) Spilled SGPRs: 22672 -> 22309 (-1.60 %) Spilled VGPRs: 82 -> 66 (-19.51 %) Scratch size: 76 -> 60 (-21.05 %) dwords per thread Code Size: 34915336 -> 35309752 (1.13 %) bytes Max Waves: 151221 -> 151677 (0.30 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-22 15:53:22 +02:00
Samuel Pitoiset	b654ef5808	radv: add set_loc_shader_ptr() helper This helper will hep for switching to 32-bit GPU pointers. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-22 15:53:20 +02:00
Samuel Pitoiset	d8a61d3232	radv: set amdgpu-32bit-address-high-bits LLVM attribute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-22 15:53:15 +02:00
Samuel Pitoiset	73df16dcee	radv: fix centroid interpolation It's legal to set the centroid and sample interpolation modes when MSAA disabled. So, we have to initialize the centroid inputs because the hardware doesn't. This fixes rendering issues with DXVK and The Witness, World of Warcraft, Trackmania and probably more games. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106315 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102390 CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-21 13:57:46 +02:00
Samuel Pitoiset	03c4816093	radv: pass radv_nir_compiler_options directly to create_llvm_function() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-05-18 11:07:01 +02:00
Marek Olšák	f9eb1ef870	amd: remove support for LLVM 4.0 It doesn't support GFX9. Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-05-17 14:54:41 -04:00
Samuel Pitoiset	1fba2e10b3	radv: only declare the ESGS rings for pre GFX9 chips GFX9 uses LDS instead. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-17 14:14:20 +02:00
Samuel Pitoiset	56d53ed1d6	radv: do not emit unnecessary ES output stores GFX9: Totals from affected shaders: SGPRS: 472 -> 464 (-1.69 %) VGPRS: 576 -> 584 (1.39 %) Code Size: 45432 -> 44324 (-2.44 %) bytes Max Waves: 40 -> 40 (0.00 %) VI: SGPRS: 720 -> 720 (0.00 %) VGPRS: 728 -> 728 (0.00 %) Code Size: 45348 -> 43992 (-2.99 %) bytes Max Waves: 120 -> 120 (0.00 %) This affects Rise of Tomb Raider and the three Vulkan demos that use a geometry shader (geometryshader, deferredshadows and viewportarray). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-17 14:14:13 +02:00
Samuel Pitoiset	a6e44d1271	radv: do not emit unnecessary GS output stores Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-17 14:14:11 +02:00
Samuel Pitoiset	97b179570c	radv: reduce the number of parameters export by the GS copy shader By using the geometry shader output usage mask. This improves all Vulkan demos that use a geometry shader (ie. geometryshader, deferredshadows, viewportarray). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-14 21:38:23 +02:00
Samuel Pitoiset	ea43d935ab	radv: run the shader info pass before emitting the GS copy shader For further optimizations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-05-14 21:38:19 +02:00
Bas Nieuwenhuizen	3d4d388e39	radv: Fix up 2_10_10_10 alpha sign. Pre-Vega HW always interprets the alpha for this format as unsigned, so we have to implement a fixup to do the sign correctly for signed formats. v2: Improve indexing mess. CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106480 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-05-14 18:58:20 +02:00
Samuel Pitoiset	efc10949cc	radv: move ac_build_if_state on top of radv_nir_to_llvm.c These helpers will be needed for future work. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-05-11 12:35:07 +02:00
Grazvydas Ignotas	4fdce205dd	radv: assorted typo fixes Trivial. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-05-10 11:50:46 +03:00
Bas Nieuwenhuizen	6ff98dbf7c	radv: Implement VK_EXT_vertex_attribute_divisor. Pretty straight forward, just pass the divisors through the shader key and then do a LLVM divide. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-04-12 22:57:23 +02:00
Mike Lothian	7e144ace95	ac/nir: Fix include for LLVMAddPromoteMemoryToRegisterPass Include llvm-c/Transforms/Utils.h with the newest LLVM 7 Signed-of-by: Mike Lothian <mike@fireburn.co.uk> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-04-02 14:27:29 -04:00
Bas Nieuwenhuizen	4503ff760c	ac/nir: Add workaround for GFX9 buffer views. On GFX9 whether the buffer size is interpreted as elements or bytes depends on whether IDXEN is enabled in the instruction. If the index is a constant zero, LLVM optimizes IDXEN to 0. Now the size in elements is interpreted in bytes which of course results in out of bounds accesses. The correct fix is most likely to disable the LLVM optimization, but we need something to work with LLVM <= 6.0. radeonsi does the max between stride and element count on the CPU but that results in the size intrinsics returning the wrong size for the buffer. This would cause CTS errors for radv. v2: Also include the store changes. Fixes: `e38685cc62` 'Revert "radv: disable support for VEGA for now."' Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-03-29 00:03:03 +02:00
Timothy Arceri	92fa89a08d	ac/radeonsi: pass bindless bool to load_sampler_desc() We also fix the base_index for bindless by using the driver location. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-03-28 12:56:16 +11:00
Samuel Pitoiset	4e9b0b39b5	radv: only enable one channel when exporting prim id It's a 32-bit integer like the layer. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-20 21:54:48 +01:00
Dave Airlie	32791a0502	radv: don't export NULL layer. We have some cases where in subpass we want the layer but having it be 0 and loaded in the frag shader without the vertex shader exporting it is fine. So don't export the layer if we don't have a value to put in it. Fixes: `d4c74aed7a` (radv/multiview: mark layer_input if we have input attachments.) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-03-19 21:36:48 +00:00
Dave Airlie	8f052a3e25	radv: handle exporting view index to fragment shader. (v1.1) The fragment shader was trying to read this, but nothing was exporting it from the vertex shader. This handles it like the prim id export. Fixes: dEQP-VK.multiview.secondary_cmd_buffer.* dEQP-VK.multiview.index.fragment_shader.* v1.1: updated to use 0x1 (Samuel) Fixes: `e3265c10c8` (radv: Implement multiview draws.) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-19 01:20:00 +00:00
Dave Airlie	9d0d806332	radv: drop geometry stride user sgpr. This removes the other geometry specific user sgpr. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:23:21 +00:00
Dave Airlie	6f051549c3	radv: get rid of geometry user sgpr for num entries. This drops one of the geometry specific user sgprs, we can work this out at compile time. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:23:17 +00:00
Dave Airlie	9188bd78d7	radv: migrate lds size calculations to shader gen. This moves the lds_size calcs into the shader so we have all the size stuff in one file. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:23:12 +00:00
Dave Airlie	384aced65e	radv: drop scanning the tess shader in the nir code. This drops the now unneeded scanning and results in favour of the ones in the info. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:23:08 +00:00
Dave Airlie	bf9a0ea853	radv/tess: remove last chunk of tess sgprs This removes the last TES-specifc user sgpr. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:23:01 +00:00
Dave Airlie	6db44d6a8c	radv: pass num_patches to tes from tcs TES needs num_patches to do some of the calculations. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:58 +00:00
Dave Airlie	010d055aae	radv: drop tess offchip layout for tcs. This removes the last TCS specific user sgpr. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:54 +00:00
Dave Airlie	ee31cff856	radv: drop tcs_out_offsets Move all calculations to shader generation. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:47 +00:00
Dave Airlie	b0460bbf1c	radv: drop tcs_out_layout Move all calculations to shader generation. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:43 +00:00
Dave Airlie	6adf99165c	radv/tess: drop tcs_in_layout setting completely. Inline all calcs at shader creation. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:37 +00:00
Dave Airlie	f343d11ae7	radv: drop ls_out_layout const. We can precalculate input_vertex_size at compile time. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:32 +00:00
Dave Airlie	2012dae19a	radv: migrate unique index info shader info (v2) This just moves this function to an inline so the shader_info pass can use it. v2: use inline (Samuel) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-16 05:22:19 +00:00
Samuel Pitoiset	81818662a5	radv: record LLVM IR when debugging shaders If AMD_shader_info or RADV_TRACE_FILE is used we might need to keep trace of LLVM IR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-15 17:20:03 +01:00
Samuel Pitoiset	d07edf5fdf	radv: add dump_shader to the NIR compiler options Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-15 17:20:00 +01:00
Samuel Pitoiset	50fcca328c	radv: pass the NIR compiler options to ac_compile_llvm_module() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-15 17:19:58 +01:00
Dave Airlie	27a5e5366e	radv: mark all tess output for an indirect access. If a shader does a tcs store with an indirect access, we were only marking the first spot as used. For indirect access we always now mark all slots used by the variable. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 Fixes: `94f9591995` (radv/ac: add support for TCS/TES inputs/outputs.) Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-14 11:18:54 +10:00
Dave Airlie	4f0c89d66c	ac/nir: pass the nir variable through tcs loading. I was going to have to add another parameter to this monster, so we should just pass the nir_variable in, I can't find any reason this would be a bad idea. This needed for the next fix. Fixes: `94f9591995` (radv/ac: add support for TCS/TES inputs/outputs.) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-14 11:18:54 +10:00
Dave Airlie	f9de2d409b	radv: get correct offset into LDS for indexed vars. This seems more correct to me, since if we have an array of floats they'll be vec4 aligned, and if we do af[2], we want the const index to increase by 2 slots in the non compact case. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 Fixes: `94f9591995` (radv/ac: add support for TCS/TES inputs/outputs.) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-03-14 11:18:54 +10:00
Samuel Pitoiset	7c83430672	ac/nir: rename radeon_llvm_reg_index_soa() to ac_llvm_reg_index_soa() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 16:54:28 +01:00
Samuel Pitoiset	fbe694562b	ac/nir: move ac_nir_compiler_options and friends to radv folder Also replace ac_ by radv_. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 16:54:23 +01:00
Samuel Pitoiset	237229430f	ac: move ac_shader_info to radv folder This is RADV specific code. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 16:54:21 +01:00
Samuel Pitoiset	2cfba40eea	ac/nir: move ac_shader_variant_info and friends to radv folder Also replace ac_ by radv_. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 16:54:16 +01:00
Samuel Pitoiset	b2653007b9	ac/nir: move all RADV related code to radv_nir_to_llvm.c Now the "ac/nir" prefix will really be the shared code between RadeonSI and RADV, that might avoid confusions in the future. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-03-13 14:05:06 +01:00

... 3 4 5 6 7 ...

404 Commits