KonstantinSeurer/mesa

Commit Graph

Author	SHA1	Message	Date
Jason Ekstrand	9750164c09	nir: Rename get_buffer_size to get_ssbo_size This makes it explicit that this intrinsic is only for SSBOs. For the v3dv driver, we'll be adding a get_ubo_size intrinsic and we want to be able to distinguish between the two. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6812>	2020-09-22 13:34:12 +00:00
Gert Wollny	7ab804dbb4	freedreno/ir3: set lower_uniforms_to_ubo compiler flag Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6316>	2020-09-16 10:07:42 +00:00
Jonathan Marek	52534c3a86	freedreno/ir3: add view_zero to shader key Does the same thing as layer_zero, but for VARYING_SLOT_VIEWPORT. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5832>	2020-09-15 16:18:45 +00:00
Marek Olšák	ac55b1a9a6	nir: get ffma support from NIR options for nir_lower_flrp This also fixes the inverted last parameter of nir_lower_flrp in most drivers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6599>	2020-09-04 17:06:22 +00:00
Eric Anholt	2b25240993	freedreno/ir3: Replace our custom vec4 UBO intrinsic with the shared lowering. This gets us fewer comparisons in the shaders that we need to optimize back out, and reduces backend code. total instructions in shared programs: 11547270 -> 7219930 (-37.48%) total full in shared programs: 334268 -> 319602 (-4.39%) Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6378>	2020-08-24 09:53:36 -07:00
Rob Clark	4f060549be	freedreno/ir3: lower local_index using local_id Somehow this works ok with the full compiler stack, but not in ir3_cmdline. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6189>	2020-08-05 21:00:44 +00:00
Boris Brezillon	9e23925991	freedreno: Initialize lower_int64_options to a proper value We're trying to get rid of the options argument passed to nir_lower_int64() and use the nir_options.lower_int64_options instead. But before we can do that we must patch nir_lower_int64() callers that don't have this field properly set. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5588>	2020-07-30 16:54:24 +00:00
Jason Ekstrand	2956d53400	nir: Add nir_foreach_shader_in/out_variable helpers Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>	2020-07-29 17:38:57 +00:00
Connor Abbott	b559d26c74	freedreno/ir3: Fix SSBO size for bindless SSBO's We theoretically could push these sizes to the const file opportunistically, which appears to be what the blob does, but the maximum number of SSBO's is way too big to do that unconditionally. Just use resinfo to get the size for now. Fixes on turnip: dEQP-VK.ssbo.unsized_array_length.* Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6012>	2020-07-21 19:53:32 +00:00
Icecream95	314ba5e174	nir: Add a face_sysval argument to nir_lower_two_sided_color This is needed for handling drivers that use an input for loading the face, for example Panfrost with Midgard GPUs. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Rob Clark <robdclark@chromium.org> Tested-by: Urja Rannikko <urjaman@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5915>	2020-07-17 14:50:26 +00:00
Jonathan Marek	ffb6eb6d5d	freedreno/ir3: run nir_opt_loop_unroll in optimization loop GL driver was relying on this being done by gallium, but there might be new loops to unroll during optimizations and turnip needs it. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5818>	2020-07-09 23:30:33 +00:00
Jonathan Marek	f472c98443	freedreno/ir3: add support for a650 tess shared storage A650 uses LDL/STL, and the "local_primitive_id" in tess ctrl shader comes from bits 16-21 in the header instead of 0-5. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5764>	2020-07-08 02:30:23 +00:00
Connor Abbott	4f91345f49	ir3: Add layer_zero variant bit Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5732>	2020-07-07 08:10:47 +00:00
Ilia Mirkin	836d41d772	ir3: use empirical size for params as used by the shader For example only some UCPs may be used by the shader, triggering asserts that too many consts are being uploaded. While we're at it, also fix the const size when loading UCPs, since otherwise it doesn't correspond to what the shader is actually using. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5752>	2020-07-06 23:57:51 +00:00
Jason Ekstrand	36a9046848	freedreno: Only call nir_lower_io on shader_in/out Gallium drivers should never see nir_var_uniform because gallium lowers regular uniforms to a UBO. No GL driver should ever see nir_var_mem_shared either, because that's lowered in GLSL IR. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>	2020-07-06 19:54:30 +00:00
Rob Clark	d3ae559378	freedreno/ir3: add ir3_finalize_nir() The next step is to hook this into pscreen->finalize_nir() so it can come before the state tracker's shader-caching. Unfortunately we still need to do lower_io after mesa/st, so that is split out into a post-finalize pass. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>	2020-06-26 08:43:22 -07:00
Connor Abbott	9edff0cfd4	ir3: Support variants with different constlen's This provides the mechanism for compiling variants with a reduced constlen. The next patch provides the policy for choosing which to reduce. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5607>	2020-06-26 09:34:33 +00:00
Jonathan Marek	16a9e233da	freedreno/ir3: add support for load_draw_id This is part of adding VK_KHR_shader_draw_parameters for turnip. IR3_DP_VTXID_BASE/IR3_DP_VTXCNT_MAX offsets are changed to match what CP_DRAW_INDIRECT_MULTI requires. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5635>	2020-06-25 15:57:45 +00:00
Rob Clark	3065c4bf92	freedreno/ir3: switch PIPE_CAP_TGSI_TEXCOORD We don't really need the varying remapping, and it seems to somehow happen twice when shader-cache comes into the picture. But we can just choose not to have this problem. Now that everything is using the ir3_point_sprite() helper, we can flip this pipe cap without it being a massive flag-day. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5595>	2020-06-24 22:29:28 +00:00
Rob Clark	82815bc980	freedreno/ir3: split ubo analysis/lowering passes Since binning pass variants share the same const_state with their draw-pass counterpart, we should re-use the draw-pass variant's ubo range analysis. So split the two functions of the existing pass into two parts. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>	2020-06-21 00:52:02 +00:00
Rob Clark	8f11cc4cad	freedreno/ir3: move output_loc to variant This moves the last bit of important state to be serialized from ir3_shader to ir3_shader_variant. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Rob Clark	640ff0e847	freedreno/ir3: move const_state back to variant For shader-cache, we want to not have anything important in `ir3_shader`. And to have shader variants with lower const size limits (to properly handle cross-stage limits), we also want variants to be able to have their own const_state. But we still need binning pass shaders to align with their draw pass counterpart so that the same const emit can be used for both passes. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Rob Clark	00926954c3	freedreno/ir3: un-embed const_state Make it an rzalloc'd ptr instead of embedded struct, so it can serve as the mem ctx for immediates. This gets rid of needing to explicitly free the immediates, so one less thing to deal with when moving const_state. (Also, after we move const_state to the shader variant, we won't need one for binning pass variants) Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Connor Abbott	65660622a1	ir3: Split out variant-specific lowering and optimizations It seems a lot of the lowerings being run the second time were unnecessary. In addition, when const_state is moved to the variant, then it will become impossible to know ahead of time whether a variant needs additional optimizing, which means that ir3_key_lowers_nir() needs to go away. The new approach should have the same effect, since it skips running lowerings that are unnecessary and then skips the opt loop if no optimizations made progress, but it will work better when we move ir3_nir_analyze_ubo_ranges() to be after variant creation. The one maybe controversial thing I did is to make nir_opt_algebraic_late() always happen during variant lowering. I wanted to avoid code duplication, and it seems to me that we should push the _late variants as far back as possible so that later opt_algebraic runs don't miss out on optimization opportunities. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Rob Clark	91ed8b7fe3	freedreno/ir3: drop shader->num_ubos The only difference between this and `const_state->num_ubos` was that the latter is counting # of ubos loaded via `ldg` (based on UBO addrs in push-consts). But turns out there isn't really any reason to care. Instead just add an early return in the one code-path that cares about the number of `ldg` UBOs. This gets rid of one more thing we need to move from `ir3_shader` to `ir3_shader_variant`. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Rob Clark	70fbd48b3a	freedreno/ir3: move ubo_state into const_state As with const_state, this will also need to move into the variant. To simplify that, just move it into the const_state itself, since after all it is related. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>	2020-06-19 13:16:57 +00:00
Eric Anholt	486b894307	freedreno/ir3: Account for driver params in UBO max const upload. The const state setup needs to be able to push its driver params, so account for them in the analyze_ubo_ranges. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>	2020-06-05 13:36:29 -07:00
Eric Anholt	9e58ab09ff	freedreno/ir3: Drop unnecessary alignment of pushed UBO size. The analysis pass gives us vec4-aligned size, and all of our other constbuf allocations here are in vec4 units, so we can just divide by 16. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>	2020-06-05 13:36:29 -07:00
Timothy Arceri	04dbf709ed	nir: add callback to nir_remove_dead_variables() This allows us to do API specific checks before removing variable without filling nir_remove_dead_variables() with API specific code. In the following patches we will use this to support the removal of dead uniforms in GLSL. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>	2020-06-03 02:22:23 +00:00
Rob Clark	cf21b76383	freedreno/ir3: use lower_wrmasks pass Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2020-05-13 20:24:53 -07:00
Eric Anholt	112c65825f	freedreno/a6xx: Use LDC for UBO loads. It saves addressing math, but may cause multiple loads to be done and bcseled due to NIR not giving us good address alignment information currently. I don't have any workloads I know of using non-const-uploaded UBOs, so I don't have perf numbers for it. This makes us match the GLES blob's behavior, and turnip (other than being bindful). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>	2020-05-14 00:10:43 +00:00
Eric Anholt	ab93a631b4	freedreno: Trim num_ubos to just the ones we haven't lowered to constbuf. With the upcoming LDC usage in the GL driver, we don't want to be uploading descriptors for every UBO when they aren't actually in use. Trimming NIR's num_ubos will avoid that, and cleans up num_ubo handling elsewhere right now. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>	2020-05-14 00:10:43 +00:00
Eric Anholt	d5176c453e	freedreno/ir3: Move i/o offset lowering after analyze_ubo_ranges. I found that when moving more UBOs to load_ubo_ir3, analyze_ubo_ranges would move things back in a broken way. We can just run this pass later and drop the _ir3 path. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>	2020-05-14 00:10:43 +00:00
Kristian H. Kristensen	dd8d257a30	freedreno/ir3: Lower GS builtins before lowering IO We mostly got away with replacing a store_output with a store_var, but for complex types like structs, that doesn't work. Once the IO has been lowered from vars to intrinsic, we've lost the deref chains and can't properly shadow the outputs. This commit moves the GS lowering up so we do it before the output variables get lowered to store_output. This way the pass works much like nir_lower_io_to_temporaries() and cleanly shadows the outputs. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>	2020-05-01 16:26:31 +00:00
Kristian H. Kristensen	79355fd901	freedreno/ir3: Add ir3_nir_lower_to_explicit_input() pass This pass lowers per-vertex input intrinsics to load_shared_ir3. This was open coded in the TCS and GS lowering passes before - this way we can share it. Furthermore, we'll need to run the rest of the GS lowering earlier (before lowering IO) so we need to split off this part that operates on the IO intrinsics first. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>	2020-05-01 16:26:31 +00:00
Kristian H. Kristensen	b7bfccf085	freedreno/ir3: Rename ir3_nir_lower_to_explicit_io We rename it to ir3_nir_lower_to_explicit_output, since it only handles output and we'll add a lowering pass for input next. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>	2020-05-01 16:26:31 +00:00
Jonathan Marek	065068c66a	freedreno/ir3: run nir_lower_pack This lowers pack_32_2x16/unpack_32_2x16 into the scalar versions of those instructions. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>	2020-04-27 18:40:03 +00:00
Jonathan Marek	42093bb694	nir: add pack_32_2x16_split/unpack_32_2x16_split lowering The new option replaces the two other _split lowering options, since there's no need for separate options. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4738>	2020-04-27 18:40:03 +00:00
Rob Clark	a0de0db0e4	freedreno/ir3: small cleanup and comments Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>	2020-03-27 22:41:36 +00:00
Hyunjun Ko	1ee2ad584c	freedreno/ir3: enable nir_opt_loop_unroll on a6xx If precision lowering happens at GLSL IR, loop_analysis at IR doesn't work as expected since it can't handle things like: "(expression bool < (expression float16_t f2fmp (var_ref ndx) ) (constant float16_t (1.000000)) )" So we'd rather do this optimization at the NIR stage. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>	2020-03-24 23:21:21 +00:00
Eric Anholt	e4baff9081	freedreno: Switch to using lowered image intrinsics. This cuts out a bunch of deref chain walking that the compiler can do for us. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728>	2020-02-24 18:25:02 +00:00
Jonathan Marek	a3a70588c0	freedreno/ir3: support load_base_instance Not supported by hardware, uses same mechanism as base vertex. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3162>	2019-12-19 15:13:40 -05:00
Jonathan Marek	c7c5a84cf3	freedreno/ir3: lower pack/unpack ops Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>	2019-12-16 19:20:07 -05:00
Jonathan Marek	b936143327	freedreno/ir3: lower mul_2x32_64 lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-12-16 13:37:09 -05:00
Eric Anholt	f58ef5d481	turnip: Lower usub_borrow. Fixes dEQP-VK.glsl.builtin.function.integer.usubborrow.uvec2_mediump_fragment. Reviewed-by: Jonathan Marek <jonathan@marek.ca> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986>	2019-12-16 04:52:09 +00:00
Neil Roberts	37f5395783	freedreno/ir3: Enabling lowering 16-bit flrp Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-11-20 14:09:43 +01:00
Kristian H. Kristensen	7272e8a709	freedreno/ir3: Allocate const space for tessellation parameters The tessellation stages need size and stride of the patch layout as well as locations of attributes in the patch. The tessellation stages also use two system memory BOs and need the iovas of those. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-11-07 16:40:27 -08:00
Kristian H. Kristensen	56ed835bff	freedreno/ir3: Extend geometry lowering pass to handle tessellation VS and TCS pass varyings the same way as VS and GS does. TCS then writes entire patch to a system memory BO and TES eventually reads back from the BO once the TE starts generating vertices. TES outputs vertices the same way as VS and GS, except when there's a GS as well, in which case TES passes varyings to GS same way the VS would. In addition, the TCS needs a little bit of control flow massaging so that it only runs for valid invocations, and needs a couple of unknown instructions to synchronize with the TE. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-11-07 16:36:59 -08:00
Kristian H. Kristensen	8621fbc37b	freedreno/ir3: Add tessellation field to shader key Whether we're tessellating and which primitives the TES outputs affects the entire pipeline so let's add a field to the key to track that. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-11-07 16:36:56 -08:00
Rhys Perry	8b98d0954e	nir/lower_idiv: add new llvm-based path v2: make variable names snake_case v2: minor cleanups in emit_udiv() v2: fix Panfrost build failure v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature v4: remove nir_op_urcp v5: drop nv50 path v5: rebase v6: add back nv50 path v6: add comment for nir_lower_idiv_path enum v7: rename _nv50/_llvm to _fast/_precise v8: fix etnaviv build failure Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-10-21 18:49:46 +00:00
Rob Clark	5e08f070f0	nir: add nir_lower_amul pass Lower amul to either imul or imul24, depending on whether 24b is enough bits to calculate an offset within the thing being dereferenced. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-10-18 15:08:54 -07:00
Kristian H. Kristensen	0293d14719	freedreno/ir3: Implement primitive layout intrinsics This implements the load_vs_primitive_stride_ir3, load_vs_vertex_stride_ir3 and load_primitive_location_ir3 intrinsics, used for getting the primitive layout strides and locations. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-17 13:43:53 -07:00
Kristian H. Kristensen	8e16fb1528	freedreno/ir3: Implement lowering passes for VS and GS This introduces two new lowering passes. One to lower VS to explicit outputs using STLW and one to lower GS to load input using LDLW and implement the GS specific functionality. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-10-17 13:43:53 -07:00
Erik Faye-Lund	71c0dcf266	nir: support feeding state to nir_lower_clip_[vg]s Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-10-17 10:41:36 +02:00
Erik Faye-Lund	eb3047c094	nir: support lowering clipdist to arrays This allows us to make sure clipdist is emitted as a scalar array rather than two vec4s. This matches SPIR-V semantics, and will be useful for Zink. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-10-17 10:41:36 +02:00
Marek Olšák	cebc38ff60	nir: add nir_shader_compiler_options::lower_to_scalar This will replace PIPE_SHADER_CAP_SCALAR_ISA. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-10-10 15:49:18 -04:00
Daniel Schürmann	10e508c815	freedreno: Enable the nir_opt_algebraic_late() pass. Reviewed-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-09-30 09:44:10 +00:00
Kristian H. Kristensen	cc4fe81145	freedreno/a6xx: Turn on vectorize_io We want this for tessellation eventually, but we can turn it on now. Shader-db results: total instructions in shared programs: 8612905 -> 8611387 (-0.02%) instructions in affected programs: 164952 -> 163434 (-0.92%) total dwords in shared programs: 11952000 -> 11950560 (-0.01%) dwords in affected programs: 68096 -> 66656 (-2.11%) total full in shared programs: 315019 -> 315009 (<.01%) full in affected programs: 1642 -> 1632 (-0.61%) total constlen in shared programs: 2463654 -> 2463654 (0.00%) constlen in affected programs: 0 -> 0 total (ss) in shared programs: 152379 -> 152409 (0.02%) (ss) in affected programs: 1503 -> 1533 (2.00%) total (sy) in shared programs: 96473 -> 96525 (0.05%) (sy) in affected programs: 654 -> 706 (7.95%) total max_sun in shared programs: 1172454 -> 1172472 (<.01%) max_sun in affected programs: 104 -> 122 (17.31%) Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-09-18 16:59:10 -07:00
Vasily Khoruzhick	9367d2ca37	nir: allow specifying filter callback in lower_alu_to_scalar Set of opcodes doesn't have enough flexibility in certain cases. E.g. Utgard PP has vector conditional select operation, but condition is always scalar. Lowering all the vector selects to scalar increases instruction number, so we need a way to filter only those ops that can't be handled in hardware. Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-09-06 01:51:28 +00:00
Jason Ekstrand	951cf94521	nir: Add explicit signs to image min/max intrinsics This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-21 17:19:55 +00:00
Rob Clark	4a188e4215	freedreno/ir3: track # of driver params To avoid emitting unneeded const state. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rhys Perry	da8ed68aca	nir: replace nir_move_load_const() with nir_opt_sink() This is mostly the same as nir_move_load_const() but can also move undef instructions, comparisons and some intrinsics (being careful with loops). v2: actually delete nir_move_load_const.c v3: fix nir_opt_sink() usage in freedreno v3: update Makefile.sources v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def v4: handle if uses v4: fix handling of nested loops v5: re-write adjust_block_for_loops v5: re-write setting of use_block for if uses Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Sagar Ghuge	456557a837	nir: Add lower_rotate flag and set to true in all drivers Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Eric Anholt	852704976a	freedreno: Stop treating UBO 0 specially in UBO uploading. ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we need to upload (all of it, since it will lower indirect UBO 0 accesses from load_ubo back to indirection on the constant buffer). Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-24 14:23:07 -07:00
Daniel Schürmann	165b7f3a44	nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Eric Anholt	4449572c47	freedreno: Only upload UBO pointers for UBOs that haven't been lowered. total constlen in shared programs: 2485933 -> 2462236 (-0.95%) Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-21 17:14:43 -07:00
Eric Anholt	01d0bad9ef	freedreno: Remove silly return from ir3_optimize_nir(). We only ever return the shader we were passed in (but internally modified). Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-21 17:14:43 -07:00
Kenneth Graunke	c7d1b52a2c	nir: Combine lower_fmod16/32 back into a single lower_fmod. We originally had a single lower_fmod option. In commit `2ab2d2e5`, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit `ca31df6f`. Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware that needs lowering for one bitsize and not the other. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	fa56a3795f	gallium: Drop lower_fmod64 from drivers that don't support doubles. Neither freedreno nor nv50 expose PIPE_CAP_DOUBLES, so there's no fmod64 to be lowered. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Jonathan Marek	d0bff89159	nir: allow specifying a set of opcodes in lower_alu_to_scalar This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:10:41 +00:00
Rob Clark	b15c46e6bf	freedreno/ir3: move const_state to ir3_shader For a6xx, we construct/emit a single VS const state used for both binning pass and draw pass. So far we were mostly getting lucky that there were not (obvious) mismatches between the const_state (like different lowered immediates) between the binning and draw pass VS ir3_shader_variant. And I guess this situation will come up more as GS and tess is added into the equation. Since really everything about the const state is not specific to the variant, move this. The main exception is lowered immediates, but these are the last to appear in the layout, and it doesn't hurt for each new shader variant to just append any immed's it lowers to the end of the immediate state. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	5690f83bb5	freedreno/ir3: split out const_state setup Next patch moves const_state to ir3_shader, before the compile context is created. So move the code around in prep to call it earlier. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	23e7a34466	freedreno/ir3: consolidate const state Combine the offsets of different parts of the constant space with (what was formerly known as) ir3_driver_const_layout. Bunch of churn, but no functional change. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Christian Gmeiner	4e110eca42	nir: nir_shader_compiler_options: drop native_integers Drivers which do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 07:35:52 +02:00
Rob Clark	650246523b	freedreno/ir3: fb read support Lower load_output to txf_ms_fb and add support for the new texture fetch instruction. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	32925f4072	freedreno/ir3: fix shader variants vs UBO analysis Otherwise we zero out the state again, but all the UBO loads that we could lower are already lowered. End result is that we didn't emit the uniforms for lowered UBO access in any case where multiple shader variants are used. Fixes: `893425a607` freedreno/ir3: Push UBOs to constant file Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Karol Herbst	fe8c57e859	freedreno/ir3: use nir_src_as_uint in a few places v2 (Jason Ekstrand): - Add even more places Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	6279074de1	nir: Get rid of global registers We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Timothy Arceri	e30804c602	nir/radv: remove restrictions on opt_if_loop_last_continue() When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The code size increase is related to Wolfenstein II; it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-09 11:29:41 +10:00
Rob Clark	7ff6705b8d	freedreno/ir3: convert to "new style" frag inputs Add support for load_barycentric_pixel, load_interpolated_input, and friends. For now, this retains support for old-style inputs, which can probably be dropped with some ttn work. Prep work for sample-shading support. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Kristian H. Kristensen	3c8779af32	freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS This commit turns on the gallium cap and adds a pass to lower the load_ubo intrinsics for block 0 back to load_uniform intrinsics and adjust the backend where the cap switches units from vec4s to dwords. As we stop using ir3_glsl_type_size() for uniform layout, this also corrects an issue where we would allocate a vec4 slot for samplers in uniforms, fixing: dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-27 13:26:02 -07:00
Rob Clark	3d8349048b	freedreno/ir3: additional lowering For some things that show up when we expose higher glsl. TODO: check blob traces to see if we have instructions for some of this? I guess we don't but worth a check. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Jason Ekstrand	08f804ec0c	anv,radv,turnip: Lower TG4 offsets with nir_lower_tex v2: turn on for turnip as well (Karol Herbst) Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Eduardo Lima Mitev	2e4525883f	ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics These intrinsics have the offset in dwords already computed in the last source, so the change here is basically using that instead of emitting the ir3_SHR to divide the byte-offset by 4. The improvement in shader stats is significant, of up to ~15% in instruction count in some cases. Tested only on a5xx. shader-db is unfortunately not very useful here because shaders that use SSBO require GLSL versions that are not supported by freedreno yet. For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*' are helped. A random case: dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2 with current master: ; CL prog 14/1: 1252 instructions, 0 half, 48 full ; 8 const, 8 constlen ; 61 (ss), 43 (sy) with the SSBO dword-offset moved to NIR: ; CL prog 14/1: 1053 instructions, 0 half, 45 full ; 7 const, 7 constlen ; 34 (ss), 73 (sy) The SHR previously emitted for every single SSBO instruction disappears in most cases, and the dword-offset ends up embedded in the STGB instruction as immediate in many cases as well. There are also a few of those tests that are currently failing on register allocation, that start to pass as a result of reducing the pressure. At least these, probably more: dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24 dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6 dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17 dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14 dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5 dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7 No regressions observed with relevant CTS and piglit tests. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
Rob Clark	ad25948261	freedreno/ir3: turn on [iu]mul_high Which also requires uadd_carry lowering Until recently this was lowered in glsl ir so it went unnoticed that we weren't lowering it. Fixes: `1d8994a63b` glsl: [u/i]mulExtended optimization for GLSL Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-08 18:44:57 -05:00
Rob Clark	db1fa21374	freedreno/a6xx: vertex_id is not _zero_based Fixes dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.builtin_variable.vertex_id Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Eric Anholt	338d399fd0	freedreno: Use the NIR lowering for isign. I think this will save an instruction and hopefully not increase any other costs (possibly the immediate -1 and 1?), but I haven't actually tested. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 00:32:30 +00:00
Karol Herbst	9b24028426	nir: rename nir_var_function to nir_var_function_temp Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	d0c6ef2793	nir: rename global/local to private/function memory the naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all threads of a work group the solution everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:51:46 +01:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Rob Clark	aa0fed10d3	freedreno: move ir3 to common location Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be re-used by some future vulkan driver. The parts that are gallium specific have been refactored out and remain in the gallium driver. Getting the move done now so that it can happen before further refactoring to support a6xx specific instructions. NOTE also removes ir3_cmdline compiler tool from autotools build since that was easier than fixing it and I normally use meson build. Waiting patiently for the day that we can remove everything from the autotools build. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00

193 Commits