KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Arcady Goldmints-Orlov	8762f29e9c	broadcom/compiler: Add a v3d_compile argument to vir_set_[pu]f Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8933>	2021-02-12 07:05:33 +00:00
Iago Toral Quiroga	6630825dcf	broadcom/compiler: let QPUs stall on TMU input/config overflows We have been trying to avoid this by tracking fifo usages in the driver and flushing all outstanding TMU sequences if we overflowed any of these, however, this is actually not the most efficient strategy. Instead, we would like to flush only enough operations to get things going again, which is better for pipelining. Doing that in the driver would require some additional work, but thankfully, it is not required, since this seems to be what the hardware does automatically, so we can just remove overflow tracking for these two fifos and enjoy the benefits. This also further improves shader-db stats: total instructions in shared programs: 8975062 -> 8955145 (-0.22%) instructions in affected programs: 1637624 -> 1617707 (-1.22%) helped: 4050 HURT: 2241 Instructions are helped. total threads in shared programs: 236802 -> 237042 (0.10%) threads in affected programs: 252 -> 492 (95.24%) helped: 122 HURT: 2 Threads are helped. total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%) sfu-stalls in affected programs: 4744 -> 4435 (-6.51%) helped: 1248 HURT: 1051 Sfu-stalls are helped. total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%) inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%) helped: 4050 HURT: 2239 Inst-and-stalls are helped. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	ecd654bf00	broadcom/compiler: support pipelining of image load/store instructions Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	0bdc6dca6c	broadcom/compiler: refactor image load/store TMU emission code This mostly moves code around to group together the code involved with actually emitting a TMU sequence. This will make it a bit easier to then implement pipelining while reusing this code, similar to how we handled other cases of TMU pipelining. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	be45960d3e	broadcom/compiler: support pipelining of tex instructions This follows the same idea as for TMU general instructions of reusing the existing infrastructure to first count required register writes and flush outstanding TMU dependencies, and then emit the actual writes, which requires that we split the code that decides about register writes to a helper. We also need to start using a component mask instead of the number of components that we need to read with a particular TMU operation. v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Iago Toral Quiroga	197090a3fc	broadcom/compiler: implement pipelining for general TMU operations This creates the basic infrastructure to implement TMU pipelining and applies it to general TMU. Follow-up patches will expand this to texture and image/load store operations. TMU pipelining means that we don't immediately end TMU sequences, and instead, we postpone the thread switch and LDTMU (for loads) or TMUWT (for stores) until we really need to do them. For loads, we may need to flush them if another instruction reads the result of a load operation. We can detect this because in that case ntq_get_src() will not find the definition for that ssa/reg (since we have not emitted the LDTMU instructions for it yet), so when that happens, we flush all pending TMU operations and then try again to find the definition for the source. We also need to flush pending TMU operations when we reach the end of a control flow block, to prevent the case where we emit a TMU operation in a block, but then we read the result in another block possibly under control flow. It is also required to flush across barriers and discards to honor their semantics. Since this change doesn't implement pipelining for texture and image load/store, we also need to flush outstanding TMU operations if we ever have to emit one of these. This will be corrected with follow-up patches. Finally, the TMU has 3 fifos where it can queue TMU operations. These fifos have limited capacity, depending on the number of threads used to compile the shader, so we also need to ensure that we don't have too many outstanding TMU requests and flush pending TMU operations if a new TMU operation would overflow any of these fifos. While overflowing the Input and Config fifos only leads to stalls (which we want to avoid anyway), overflowing the Output fifo is incorrect and would end up with a broken shader. This means that we need to know how many TMU register writes are required to emit a TMU operation and use that information to decide if we need to flush pending TMU operations before we emit any register writes for the new TMU operation. v2: fix TMU flushing for NIR registers reads (jasuarez) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>	2021-02-04 10:33:10 +00:00
Alejandro Piñeiro	429c336412	broadcom/compiler: separate texture/sampler info from v3d_key So far the v3d compiler has them combined, as for OpenGL both are the same. This change is intended to fit the v3d compiler better with Vulkan, where they are separate concepts. Note that NIR has them separate for a long time, both on nir_variable and on some NIR lowerings. v2: (from Iago feedback) * Use key->num_tex/sampler_used to iterate through the array * Fill up num_samplers_used on v3d, assert that is the same that num_tex_used if possible. v3: (Iago) * Assert num_tex/samplers_used is smaller that tex/sampler array size. v4: Update assert mentioned on v3 to use <= instead of < (detected by CI) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> squash! broadcom/compiler: separate texture/sampler info from v3d_key Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>	2020-11-14 15:59:02 +00:00
Arcady Goldmints-Orlov	0b30336906	broadcom/compiler: Handle non-SSA destinations for tex instructions The NIR that is given to the VIR compiler is not in SSA form, and so the v3d*_vir_emit_tex() functions must be able to handle both SSA and register destinations. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7318>	2020-11-05 09:03:46 +00:00
Alejandro Piñeiro	b9dd7e30a6	v3d/tex: avoid to ask back for a sampler state if not needed So far we were not asking the driver for the sampler state if we could just use the default P1 values. But even if we need to fill P1 (for example to fill up the output type of the format), if the texture operation doesn't need a sampler, we can let that field as NULL (so default values) and avoid calling back the driver for a sampler. This is not mandatory for OpenGL (as we always have a sampler object), although still a good to have. For Vulkan this is needed, as we don't have a sampler object in that case. v2: reword comment (Eric) Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>	2020-10-13 21:21:31 +00:00
Alejandro Piñeiro	f8946bd705	v3d/tex: handle correctly coordinates for cube/cubearrays images When fetching for cube maps, we need to interpret them as 2d texture arrays, being the third coordinate the index for the face. Fixes Vulkan CTS tests like the following using v3dv: dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.fragment.single_descriptor.cube_base_mip dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.compute.multiple_descriptor_sets.multiple_contiguous_descriptors.cube_array_base_mip Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5675>	2020-07-03 08:14:57 +00:00
Alejandro Piñeiro	f7fcbe9830	v3d/tex: use TMUSLOD register if possible TMUSLOD register is the same that TMUS but having the same effect that setting disable_autolod on the TMU configuration parameter 2. So using that register is potentially more efficient, as in several cases we would be able to skip writing P2. One case where we can't use it is for texture cube maps, as we need to use TMUSCM. v2: don't put a comment in the middle of the conditions (Iago) Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>	2020-05-11 23:52:46 +00:00
Alejandro Piñeiro	c3af695bb0	v3d/tex: set up default values for Configuration Parameter 1 if possible Texture access has three configuration parameters, P0 (texture), P1 (sampler) and P2(lookup). P1 and P2 are optional, but if P2 is needed (like for example to set the offset for texelFetchOffset), then you need to set P1. But until now when setting up P1 we were asking the driver to fill up the address with the shader state. But in that case we can just fill that address with the default value NULL. So let's avoid asking the driver to fill that default values, and do it directly on the compiler. This is a good-to-have on OpenGL, and likely would be needed on Vulkan. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>	2020-05-11 23:52:46 +00:00
Alejandro Piñeiro	50c2c76ea3	v3d/tex: only look up the 2nd texture gather offset for 1d non-arrays Commit `1bc71e8b65` already did that for the 3rd offset, but it also needs to do it for the 2nd (to handle 1d array). Fixes assertion failures with Vulkan CTS tests using 1darray targets. Seems that there isn't too many 1darray tests on OpenGL CTS, and OpenGL-ES don't support 1d arrays, but the same problem could arise eventually on OpenGL. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>	2020-05-11 23:52:46 +00:00
Alejandro Piñeiro	ad460c5dd6	v3d: support for textureQueryLOD Fixes all the ARB_texture_query_lod piglit tests, and needed to get the Vulkan CTS textureQueryLOD passing with the ongoing Vulkan driver. Note that LOD Query bit flag became only available on V42 of the hw, but the v3d40_tex is using V41 as reference. In order to avoid setting up the infrastructure to support both v41 and v42, we manually set the bit if the device version is the correct one. We also fix how the ARB_texture_query_lod (so EXT_texture_query_lod) is exposed. Before this commit it was always exposed (wrongly as it was not really supported). Now it is exposed for devinfo.ver >= 42. v2: move _need_sampler helper to nir.h (Eric Anholt) Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>	2020-04-22 23:43:23 +02:00
Alejandro Piñeiro	9967c26ae6	v3d/tex: Configuration Parameter 1 can be only skipped if P2 can be skipped too Configuration Parameter packets 1 and 2 are marked as optional, but it is not clearly stated whether you can skip only P1 when P2 is needed. In practice, the sequence P0 - no P1 - P2 seems to cause problems: at least on the simulator, it appears that the sampler info is still accessed. So let's just be conservative, and only skip P1 configuration if we can skip P2 configuration too. Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>	2020-04-22 23:39:34 +02:00
Alejandro Piñeiro	d0b644d9f9	v3d/tex: don't configure tmu config 1 if not needed TMU configuration parameter 1 configures the sampler for the texture operation. But there are some texture operations that don't need a sampler. Skipping the configuration could provide a small perf improvement on OpenGL. On the incoming Vulkan driver, it would allow us to avoid setting up an unneeded sampler. Note that we still need to add the sampler configuration parameter if the output is 32-bit, as it is on the sampler where we configure that info. Also, note that for images this is done by comparing against an unpacked p1 default. But in order to do that it is necessary to go through the code that fills up the unpacked p1. We can skip that too. Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>	2020-04-22 23:38:18 +02:00
Eric Anholt	12cf484d02	v3d: Ask the state tracker to lower image accesses off of derefs. This saves a bunch of hassle in handling derefs in the backend, and would be needed for reasonable handling of dynamic indexing of image arrays. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>	2020-02-24 18:25:02 +00:00
Jose Maria Casanova Crespo	d983055184	v3d: Fix predication with atomic image operations Fixes dEQP test: dEQP-GLES31.functional.synchronization.inter_call.with_memory_barrier.image_atomic_multiple_interleaved_write_read Fixes piglit test: spec/glsl-es-3.10/execution/cs-image-atomic-if-else.shader_test Fixes: `6281f26f06` ("v3d: Add support for shader_image_load_store.") Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-11-20 11:20:55 +01:00
Iago Toral Quiroga	46182fc1da	v3d: add new flag dirty TMU cache at v3d_compiler That we set for any TMU write on spills and general tmu. It is then used as part of v3d_emit_gl_shader_state later. v2: add a new flag instead at v3d_compiler instead of dirty the flag at v3dx if there is any spill (change suggested by Eric, added by Alejandro) v3: set this for anything that is not a load and do it also in v3d40_vir_emit_image_load_store (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-10-18 14:08:52 +02:00
Jason Ekstrand	c9a4793de8	v3d: Use the correct opcodes for signed image min/max Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-21 17:19:55 +00:00
Jason Ekstrand	951cf94521	nir: Add explicit signs to image min/max intrinsics This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-21 17:19:55 +00:00
Alejandro Piñeiro	85b78f96a6	v3d: use inc/dec tmu operation with image atomic sub/add of 1 This allows to remove a mov of 1/-1, as it is implicit with the operation. As with atomic inc/dec/add, usual shader-db set doesn't include any GLES shader using it. So using as workaround vk-gl-cts shaders, we get this: total instructions in shared programs: 1217013 -> 1217006 (<.01%) instructions in affected programs: 53 -> 46 (-13.21%) helped: 2 HURT: 0 One of the helped shader went from 40 to 34 instructions. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:51:22 +02:00
Alejandro Piñeiro	2e22879115	v3d: refactor some code from v3d40_vir_emit_image_load_store And moved to new auxiliar method v3d40_image_load_store_tmu_op, equivalent to the nir_to_nir v3d_general_tmu_op, to clean-up a little. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:49:29 +02:00
Eric Anholt	24587ae8ae	v3d: Assert that we do request the normal texturing return data. An unused tex should be DCEed, but if it wasn't we'd run into trouble with not doing a TMUWT.	2019-04-26 12:42:30 -07:00
Eric Anholt	1bc71e8b65	v3d: Only look up the 3rd texture gather offset for non-arrays. Fixes assertion failures in the CTS since Karol's cleanup when NIR started noticing that we were reading an invalid component. Fixes: `5450f1c9fb` ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")	2019-04-16 12:07:59 -07:00
Karol Herbst	5450f1c9fb	v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-07 15:13:36 +02:00
Eric Anholt	4739181a16	v3d: Switch implicit uniforms over to being any qinst->uniform != ~0. I'm not sure why I didn't do this before -- it's clearly much simpler to add dumping of the extra thing than to have it as another implicit source.	2019-03-05 12:57:39 -08:00
Eric Anholt	f7769b5121	v3d: Fix the autotools build. Noticed while looking at the gitlab-CI MR.	2019-01-29 14:00:27 -08:00
Eric Anholt	6281f26f06	v3d: Add support for shader_image_load_store. This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).	2019-01-14 15:40:55 -08:00
Eric Anholt	8847370424	v3d: Use the core tex lowering. Even without any clever optimization on the unpack operations, this gives us a useful value for the channels read field, which we can use to avoid ldtmu instructions to the no-op register. instructions in affected programs: 890712 -> 881974 (-0.98%)	2019-01-04 15:59:59 -08:00
Eric Anholt	906fca1b4b	v3d: Add support for non-constant texture offsets. Fixes dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat and others.	2018-12-30 08:05:11 -08:00
Eric Anholt	47caefc7b4	v3d: Force sampling from base level for tg4. This is what the GLSL ES 310 spec tells us to do, but apparently the "gather mode" flag doesn't imply it in the HW. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear	2018-12-30 08:05:11 -08:00
Eric Anholt	29927e7524	v3d: Drop in a bunch of notes about performance improvement opportunities. These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.	2018-12-14 17:48:01 -08:00
Eric Anholt	f2ea936f48	v3d: Skip emitting texture config parameter 2 if it's just the defaults. shader-db: total instructions in shared programs: 91275 -> 90768 (-0.56%) instructions in affected programs: 20702 -> 20195 (-2.45%)	2018-07-23 10:21:43 -07:00
Eric Anholt	26f830d9fc	v3d: Add an assert that we don't provide an invalid texture return words. The docs had an update noting this restriction, so reflect it in the code.	2018-07-16 14:39:59 -07:00
Eric Anholt	778594ae12	v3d: Limit shader threading according to our maximum TMU fifo usage. Fixes simulator assertion failures in dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment and similar complicated cases.	2018-06-15 16:09:39 -07:00
Eric Anholt	5aaea3c4a0	broadcom/vc5: Add compiler support for V3D 4.x texturing.	2018-01-12 21:56:57 -08:00

37 Commits
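
The flush rules described in commits 197090a3fc and 6630825dcf can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Mesa implementation: the names `tmu_state`, `tmu_queue_load`, `tmu_read_src`, `tmu_end_block` and the capacity of 8 words are all hypothetical. It tracks only the Output fifo, since the later commit leaves Input/Config overflows to the hardware, and it flushes when a new load would overflow, when a pending result is read, or when a block ends.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capacity: per the commit message, the real limit depends on
 * the number of threads the shader is compiled for. */
#define OUTPUT_FIFO_CAPACITY 8
#define MAX_PENDING 16

/* One queued load whose result has not been fetched yet. */
struct pending_load {
    int dest_id;   /* id of the value the load will define */
    int num_words; /* output fifo words the result occupies */
};

struct tmu_state {
    struct pending_load pending[MAX_PENDING];
    int num_pending;
    int output_fifo_words; /* words reserved by all pending loads */
    int flush_count;       /* number of flushes performed so far */
};

/* Stand-in for ending the TMU sequences: a real compiler would emit the
 * thread switch and the result fetches here. */
static void tmu_flush(struct tmu_state *s)
{
    if (s->num_pending == 0)
        return;
    s->num_pending = 0;
    s->output_fifo_words = 0;
    s->flush_count++;
}

/* Queue a load, flushing first if its result would overflow the output
 * fifo (or the bookkeeping array of this model). */
static void tmu_queue_load(struct tmu_state *s, int dest_id, int num_words)
{
    assert(num_words > 0 && num_words <= OUTPUT_FIFO_CAPACITY);
    if (s->num_pending == MAX_PENDING ||
        s->output_fifo_words + num_words > OUTPUT_FIFO_CAPACITY)
        tmu_flush(s);
    s->pending[s->num_pending].dest_id = dest_id;
    s->pending[s->num_pending].num_words = num_words;
    s->num_pending++;
    s->output_fifo_words += num_words;
}

static bool tmu_is_pending(const struct tmu_state *s, int dest_id)
{
    for (int i = 0; i < s->num_pending; i++) {
        if (s->pending[i].dest_id == dest_id)
            return true;
    }
    return false;
}

/* Reading a value whose load is still pending forces a flush of all
 * pending operations before the source can be used. */
static void tmu_read_src(struct tmu_state *s, int src_id)
{
    if (tmu_is_pending(s, src_id))
        tmu_flush(s);
}

/* Nothing may stay pending across the end of a control flow block. */
static void tmu_end_block(struct tmu_state *s)
{
    tmu_flush(s);
}
```

In this model, queuing two 4-word loads fills the fifo without flushing; a third load triggers one flush before it is queued, and reading that third result triggers a second one.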