KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Eric Anholt	1f44053301	freedreno+turnip: Upload large shader constants as a UBO. Right now if the shader indirects on some large constant array, we see NIR load_consts (usually from the const file) of its contents into general registers, then indirection on the GPRs. This often results in register allocation failures, as it's easy to go beyond the ~256 dwords of registers per invocation. By moving the large constants to a UBO, we can load an arbitrary number of them. They also can be theoretically moved to the constant reg file (~2k dwords), though you're unlikely to hit this path without an indirect load on your large constant, and we don't yet let UBO indirect loads get moved to constant regs. This possibly won't work out right if we have 16-bit load_constants, but without other MRs in flight we won't see 16-bit temps to be lowered to this. This allows 2 kerbal-space-program shaders to compile that previously would fail, and fixes the new dEQP-VK and -GLES2 tests I wrote that dynamically index a 40-element temporary array of float/vec2/vec3/vec4 with constant element initializers. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2789 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>	2020-11-16 13:55:41 -08:00
Rob Clark	cf9ef90066	freedreno/ir3: Add pass to deal with load_uniform base offsets With indirect load_uniform, we can only encode 10b of constant base offset. This pass detects problematic cases and peels out the high bits of the base offset. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7612>	2020-11-13 22:44:04 +00:00
Connor Abbott	9e063b01b7	ir3: Switch tess lowering to use location Clip & cull distances, which are compact arrays, exposed a lot of holes because they can take up multiple slots and partially overlap. I wanted to eliminate our dependence on knowing the layout of the variables, as this can get complicated with things like partially overlapping arrays, which can happen with ARB_enhanced_layouts or with clip/cull distance arrays. This means no longer changing the layout based on whether the i/o is part of an array or not, and no longer matching producer <-> consumer based on the variables. At the end of the day we have to match things based on the user-specified location, so for simplicity this switches the entire i/o handling to be based off the user location rather than the driver location. This means that the primitive map may be a little bigger, but it reduces the complexity because we never have to build a table mapping user location to driver location, and it reduces the amount of work done at link time in the SSO case. It also brings us closer to what the other drivers do. While here, I also fixed the handling of component qualifiers, which was another thing broken with clip/cull distances. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6959>	2020-10-23 11:09:18 +00:00
Connor Abbott	6982e8510b	ir3, tu: Run optimization loop twice This call to ir3_optimize_nir() mirrors what st/mesa does for us in Gallium, and will be necessary for cross-stage linking and the multiview lowering. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6515>	2020-09-29 16:16:05 +00:00
Jonathan Marek	f472c98443	freedreno/ir3: add support for a650 tess shared storage A650 uses LDL/STL, and the "local_primitive_id" in tess ctrl shader comes from bits 16-21 in the header instead of 0-5. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5764>	2020-07-08 02:30:23 +00:00
Rob Clark	d3ae559378	freedreno/ir3: add ir3_finalize_nir() The next step is to hook this into pscreen->finalize_nir() so it can come before the state tracker's shader-caching. Unfortunately we still need to do lower_io after mesa/st, so that is split out into a post-finalize pass. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>	2020-06-26 08:43:22 -07:00
Rob Clark	82815bc980	freedreno/ir3: split ubo analysis/lowering passes Since binning pass variants share the same const_state with their draw-pass counterpart, we should re-use the draw-pass variant's ubo range analysis. So split the two functions of the existing pass into two parts. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>	2020-06-21 00:52:02 +00:00
Rob Clark	8f11cc4cad	freedreno/ir3: move output_loc to variant This moves the last bit of important state to be serialized from ir3_shader to ir3_shader_variant. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Rob Clark	640ff0e847	freedreno/ir3: move const_state back to variant For shader-cache, we want to not have anything important in `ir3_shader`. And to have shader variants with lower const size limits (to properly handle cross-stage limits), we also want variants to be able to have their own const_state. But we still need binning pass shaders to align with their draw pass counterpart so that the same const emit can be used for both passes. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Connor Abbott	65660622a1	ir3: Split out variant-specific lowering and optimizations It seems a lot of the lowerings being run the second time were unnecessary. In addition, when const_state is moved to the variant, then it will become impossible to know ahead of time whether a variant needs additional optimizing, which means that ir3_key_lowers_nir() needs to go away. The new approach should have the same effect, since it skips running lowerings that are unnecessary and then skips the opt loop if no optimizations made progress, but it will work better when we move ir3_nir_analyze_ubo_ranges() to be after variant creation. The one maybe controversial thing I did is to make nir_opt_algebraic_late() always happen during variant lowering. I wanted to avoid code duplication, and it seems to me that we should push the _late variants as far back as possible so that later opt_algebraic runs don't miss out on optimization opportunities. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Eric Anholt	486b894307	freedreno/ir3: Account for driver params in UBO max const upload. The const state setup needs to be able to push its driver params, so account for them in the analyze_ubo_ranges. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>	2020-06-05 13:36:29 -07:00
Eric Anholt	112c65825f	freedreno/a6xx: Use LDC for UBO loads. It saves addressing math, but may cause multiple loads to be done and bcseled due to NIR not giving us good address alignment information currently. I don't have any workloads I know of using non-const-uploaded UBOs, so I don't have perf numbers for it. This makes us match the GLES blob's behavior, and turnip (other than being bindful). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>	2020-05-14 00:10:43 +00:00
Kristian H. Kristensen	dd8d257a30	freedreno/ir3: Lower GS builtins before lowering IO We mostly got away with replacing a store_output with a store_var, but for complex types like structs, that doesn't work. Once the IO has been lowered from vars to intrinsic, we've lost the deref chains and can't properly shadow the outputs. This commit moves the GS lowering up so we do it before the output variables get lowered to store_output. This way the pass works much like nir_lower_io_to_temporaries() and cleanly shadows the outputs. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>	2020-05-01 16:26:31 +00:00
Kristian H. Kristensen	79355fd901	freedreno/ir3: Add ir3_nir_lower_to_explicit_input() pass This pass lowers per-vertex input intrinsics to load_shared_ir3. This was open coded in the TCS and GS lowering passes before - this way we can share it. Furthermore, we'll need to run the rest of the GS lowering earlier (before lowering IO) so we need to split off this part that operates on the IO intrinsics first. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>	2020-05-01 16:26:31 +00:00
Kristian H. Kristensen	b7bfccf085	freedreno/ir3: Rename ir3_nir_lower_to_explicit_io We rename it to ir3_nir_lower_to_explicit_output, since it only handles output and we'll add a lowering pass for input next. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>	2020-05-01 16:26:31 +00:00
Connor Abbott	274f3815a5	ir3: Plumb through bindless support Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>	2020-04-09 15:56:55 +00:00
Brian Ho	012773be26	turnip: Configure VPC for geometry shaders This commit updates tu6_emit_vpc to selectively emit GS-specific configuration. Most of this is repurposed from fd6_program.c. This also refactors `link_geometry_stages` to ir3_nir_lower_tess.c so it can be shared between fd and tu. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4436>	2020-04-07 14:13:20 +00:00
Kristian H. Kristensen	56ed835bff	freedreno/ir3: Extend geometry lowering pass to handle tessellation VS and TCS pass varyings the same way as VS and GS do. TCS then writes entire patch to a system memory BO and TES eventually reads back from the BO once the TE starts generating vertices. TES outputs vertices the same way as VS and GS, except when there's a GS as well, in which case TES passes varyings to GS same way the VS would. In addition, the TCS needs a little bit of control flow massaging so that it only runs for valid invocations, and needs a couple of unknown instructions to synchronize with the TE. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-11-07 16:36:59 -08:00
Eduardo Lima Mitev	2a0d45ae6c	freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch The pass should run once at the end of shader compilation, for a4xx onwards. It iterates texture sampling instructions and marks those eligible for pre-dispatch by changing the tex op from 'tex' to 'tex_prefetch'. An instruction is eligible if: * The coordinate is a vector where all its components come from a shader input. * The order of the components match exactly that of the input (no swizzles). * The instruction is in the 'main' function, and in the outermost block. The first two restrictions were arrived at empirically, so more testing could tighten or loosen them. The 3rd restriction is there to allow moving the instructions eligible for pre-dispatch to the beginning of the shader, so that we don't block the registers holding the result for too long. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-18 21:11:54 +00:00
Kristian H. Kristensen	8e16fb1528	freedreno/ir3: Implement lowering passes for VS and GS This introduces two new lowering passes. One to lower VS to explicit outputs using STLW and one to lower GS to load input using LDLW and implement the GS specific functionality. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-17 13:43:53 -07:00
Eric Anholt	01d0bad9ef	freedreno: Remove silly return from ir3_optimize_nir(). We only ever return the shader we were passed in (but internally modified). Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-21 17:14:43 -07:00
Eduardo Lima Mitev	340277ad71	ir3/nir: Add new NIR AlgebraicPass for lowering imul Currently, ir3 backend compiler is lowering integer multiplication from: dst = a * b to: dst = (al * bl) + (ah * bl << 16) + (al * bh << 16) by emitting this code: mull.u tmp0, a, b ; mul low, i.e. al * bl madsh.m16 tmp1, a, b, tmp0 ; mul-add shift high mix, i.e. ah * bl << 16 madsh.m16 dst, b, a, tmp1 ; i.e. al * bh << 16 which at that point has very low chances of being optimized. This patch adds a new nir_algebraic.AlgebraicPass to perform this lowering during NIR algebraic optimization passes, giving it a better chance for optimizing the resulting code. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Rob Clark	b15c46e6bf	freedreno/ir3: move const_state to ir3_shader For a6xx, we construct/emit a single VS const state used for both binning pass and draw pass. So far we were mostly getting lucky that there were not (obvious) mismatches between the const_state (like different lowered immediates) between the binning and draw pass VS ir3_shader_variant. And I guess this situation will come up more as GS and tess is added into the equation. Since really everything about the const state is not specific to the variant, move this. The main exception is lowered immediates, but these are the last to appear in the layout, and it doesn't hurt for each new shader variant to just append any immed's it lowers to the end of the immediate state. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	5690f83bb5	freedreno/ir3: split out const_state setup Next patch moves const_state to ir3_shader, before the compile context is created. So move the code around in prep to call it earlier. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	23e7a34466	freedreno/ir3: consolidate const state Combine the offsets of different parts of the constant space with (what was formerly known as) ir3_driver_const_layout. Bunch of churn, but no functional change. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	2f0b9d2249	freedreno/ir3: lower load_barycentric_at_offset Calculates i,j at specified offset within a pixel. A new load_size_ir3 intrinsic is used in conjunction with fddx/fddy to translate the offset into primitive space and adjust the i,j from load_barycentric_pixel accordingly. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	c4f423aa36	freedreno/ir3: lower load_barycentric_at_sample This lowers load_barycentric_at_sample to load_sample_pos_from_id plus load_barycentric_at_offset. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	fc865de777	freedreno/ir3: add pass to move varying loads Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Kristian H. Kristensen	3c8779af32	freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS This commit turns on the gallium cap and adds a pass to lower the load_ubo intrinsics for block 0 back to load_uniform intrinsics and adjust the backend where the cap switches units from vec4s to dwords. As we stop using ir3_glsl_type_size() for uniform layout, this also corrects an issue where we would allocate a vec4 slot for samplers in uniforms, fixing: dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-27 13:26:02 -07:00
Eduardo Lima Mitev	9dd0cfafc9	ir3/nir: Add a new pass 'ir3_nir_lower_io_offsets' This NIR->NIR pass implements offset computations that are currently done on the IR3 backend compiler, to give NIR a better chance of optimizing them. For now, it supports lowering the dword-offset computation for SSBO instructions. It will take an SSBO intrinsic and replace it with the new ir3-specific version that adds an extra source. That source will hold the SSA value resulting from inserting a division by 4 (an SHR op) of the original byte-offset source already provided by NIR in one of the intrinsic sources. Note that on a6xx the original byte-offset is not needed, so we could potentially replace that source instead of adding a new one. But to keep things simple and consistent we always add the new source and a6xx will just ignore the original one. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
Rob Clark	aa0fed10d3	freedreno: move ir3 to common location Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be re-used by some future vulkan driver. The parts that are gallium specific have been refactored out and remain in the gallium driver. Getting the move done now so that it can happen before further refactoring to support a6xx specific instructions. NOTE also removes ir3_cmdline compiler tool from autotools build since that was easier than fixing it and I normally use meson build. Waiting patiently for the day that we can remove everything from the autotools build. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00

31 Commits
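
The imul lowering described in commit 340277ad71 splits each 32-bit operand into 16-bit halves and sums three partial products. The following is only a minimal sketch of that arithmetic in plain C, not the NIR pass or the ir3 instruction sequence. The names `imul_lowered` and `imul_lowered_selfcheck` are hypothetical and do not come from the Mesa tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit multiply built from 16-bit halves,
 * following dst = (al * bl) + (ah * bl << 16) + (al * bh << 16).
 * The ah * bh term is omitted because it would be shifted left by
 * 32 bits and so contributes nothing to the low 32 bits. */
static uint32_t imul_lowered(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xffffu, ah = a >> 16;
    uint32_t bl = b & 0xffffu, bh = b >> 16;

    uint32_t low  = al * bl;          /* low x low partial product  */
    uint32_t mix1 = (ah * bl) << 16;  /* high(a) x low(b), shifted  */
    uint32_t mix2 = (al * bh) << 16;  /* low(a) x high(b), shifted  */

    /* Unsigned arithmetic wraps modulo 2^32, as a * b does. */
    return low + mix1 + mix2;
}

/* Hypothetical self-check: compare against the native multiply. */
static void imul_lowered_selfcheck(void)
{
    assert(imul_lowered(3u, 5u) == 3u * 5u);
    assert(imul_lowered(0x12345678u, 0x9abcdef0u) == 0x12345678u * 0x9abcdef0u);
    assert(imul_lowered(0xffffffffu, 0xffffffffu) == 0xffffffffu * 0xffffffffu);
}
```

Calling `imul_lowered_selfcheck()` from a test harness checks the decomposition against the native 32-bit multiply for a few operands, including the wrap-around case.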